English Pages 408 [401] Year 1999
Lecture Notes in Artificial Intelligence Subseries of Lecture Notes in Computer Science Edited by J. G. Carbonell and J. Siekmann
Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1730
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Michael Gelfond Nicola Leone Gerald Pfeifer (Eds.)
Logic Programming and Nonmonotonic Reasoning 5th International Conference, LPNMR ’99 El Paso, Texas, USA, December 2-4, 1999 Proceedings
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors
Michael Gelfond
University of Texas at El Paso, Department of Computer Science
El Paso, TX 79916, USA
E-mail: [email protected]
Nicola Leone, Gerald Pfeifer
Technische Universität Wien, Institut für Informationssysteme 184/2
Favoritenstraße 9-11, A-1040 Vienna, Austria
E-mail: {leone,pfeifer}@dbai.tuwien.ac.at
Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme Logic programming and nonmonotonic reasoning : 5th international conference ; proceedings / LPNMR ’99, El Paso, Texas, USA, December 2 - 4, 1999. Michael Gelfond . . . (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 (Lecture notes in computer science ; Vol. 1730 : Lecture notes in artificial intelligence) ISBN 3-540-66749-0
CR Subject Classification (1998): I.2.3-4, F.4.1, D.1.6 ISBN 3-540-66749-0 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. © Springer-Verlag Berlin Heidelberg 1999 Printed in Germany Typesetting: Camera-ready by author SPIN: 10704004 06/3142 – 5 4 3 2 1 0
Printed on acid-free paper
Preface

This volume consists of the refereed papers presented at the Fifth International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR’99) held at El Paso, Texas, in December 1999. LPNMR’99 is the fifth in a series of international meetings on logic programming and nonmonotonic reasoning. Four previous meetings were held in Washington, U.S.A., in 1991, in Lisbon, Portugal, in 1993, in Lexington, U.S.A., in 1995, and in Dagstuhl, Germany, in 1997. The aim of the LPNMR conferences is to facilitate interactions between researchers interested in logic-based programming languages and database systems and researchers who work in the areas of knowledge representation and nonmonotonic reasoning. In addition to presentations of accepted papers, the conference will feature talks by four invited speakers: Marco Cadoli, Vladimir Lifschitz, David McAllester, and Leora Morgenstern.

Many people contributed to the success of the LPNMR’99 conference. Special thanks are due to the program committee and the additional reviewers for their careful evaluation of the submitted papers. We would also like to thank Gopal Gupta and Danny De Schreye for their efforts in coordinating the schedules of ICLP’99 and LPNMR’99, and Georg Gottlob, chair of the LPNMR steering committee, who provided continuous advice and support to the program chairs. The conference was financially supported by the University of Texas at El Paso, and Compulog Net provided support for a European invited speaker.
December 1999
Michael Gelfond Nicola Leone Gerald Pfeifer
Conference Organization
Program Co-Chairs Michael Gelfond (University of Texas at El Paso, USA) Nicola Leone (Vienna University of Technology, Austria)
Program Committee
Jose Julio Alferes (Universidade de Evora, Portugal)
Chitta Baral (University of Texas at El Paso, USA)
Nicole Bidoit (Université de Bordeaux 1, France)
Jürgen Dix (University of Koblenz, Germany)
Thomas Eiter (Vienna University of Technology, Austria)
Fangzhen Lin (The Hong Kong University of Science and Technology, China)
Jack Minker (University of Maryland, USA)
Anil Nerode (Cornell University, USA)
Ilkka Niemela (Helsinki University of Technology, Finland)
Dino Pedreschi (University of Pisa, Italy)
Pasquale Rullo (University of Calabria, Rende, Italy)
Chiaki Sakama (Wakayama University, Japan)
V.S. Subrahmanian (University of Maryland, USA)
Francesca Toni (Imperial College, London, U.K.)
Miroslaw Truszczynski (University of Kentucky at Lexington, USA)
Hudson Turner (University of Minnesota at Duluth, USA)
Moshe Y. Vardi (Rice University, USA)
Jia-Huai You (University of Alberta, Canada)
Publicity Chair Gerald Pfeifer (Vienna University of Technology, Austria)
Additional Reviewers
Roberto Barbuti, Stefan Brass, Krysia Broda, Francesco Buccafurri, Carlos Damasio, Alexander Dekhtyar, Phan Minh Dung, Uwe Egly, Wolfgang Faber, Sergio Greco, Jeff Horty, Katsumi Inoue, Tomi Janhunen, Chris Johnson, Antonis Kakas, Hirofumi Katsuno, Vladimir Lifschitz, Jorge Lobo, Thomas Lukasiewicz, Sofian Maabout, Giuseppe Manco, Victor Marek, Cristinel Mateis, Yuji Matsumoto, Iara Mora, Mirco Nanni, Luigi Palopoli, Luis Moniz Pereira, Gerald Pfeifer, Inna Pivkina, Salvatore Ruggieri, Fariba Sadri, Francesco Scarcello, Dietmar Seipel, Hirohisa Seki, Patrik Simons, Terry Swift, Hans Tompits, Kewen Wang, Ulrich Zukowski
Table of Contents

Contributed Papers
Fixed Parameter Complexity in AI and Nonmonotonic Reasoning (G. Gottlob, F. Scarcello, M. Sideri), p. 1
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power (T. Janhunen), p. 19
Locally Determined Logic Programs (D. Cenzer, J. B. Remmel, A. Vanderbilt), p. 34
Annotated Revision Programs (V. Marek, I. Pivkina, M. Truszczyński), p. 49
Belief, Knowledge, Revisions, and a Semantics of Non-Monotonic Reasoning (J. Sefranek), p. 63
An Argumentation Framework for Reasoning about Actions and Changes (A. Kakas, R. Miller, F. Toni), p. 78
Representing Transition Systems by Logic Programs (V. Lifschitz, H. Turner), p. 92
Transformations of Logic Programs Related to Causality and Planning (E. Erdem, V. Lifschitz), p. 107
From Causal Theories to Logic Programs (Sometimes) (F. Lin, K. Wang), p. 117
Monotone Expansion of Updates in Logical Databases (M. Dekhtyar, A. Dikovsky, S. Dudakov, N. Spyratos), p. 132
Updating Extended Logic Programs through Abduction (C. Sakama, K. Inoue), p. 147
LUPS – A Language for Updating Logic Programs (J. J. Alferes, L. M. Pereira, H. Przymusinska, T. Przymusinski), p. 162
Pushing Goal Derivation in DLP Computations (W. Faber, N. Leone, G. Pfeifer), p. 177
Linear Tabulated Resolution for Well Founded Semantics (Y. Shen, L. Yuan, J. You, N. Zhou), p. 192
A Case Study in Using Preference Logic Grammars for Knowledge Representation (B. Cui, T. Swift, D. S. Warren), p. 206
Minimal Founded Semantics for Disjunctive Logic Programming (S. Greco), p. 221
On the Role of Negation in Choice Logic Programs (M. De Vos, D. Vermeir), p. 236
Approximating Reiter's Default Logic (T. Linke, T. Schaub), p. 247
Coherent Well-founded Annotated Logic Programs (C. V. Damásio, L. M. Pereira, T. Swift), p. 262
Many-Valued Disjunctive Logic Programs with Probabilistic Semantics (T. Lukasiewicz), p. 277
Extending Disjunctive Logic Programming by T-norms (C. Mateis), p. 290
Extending the Stable Model Semantics with More Expressive Rules (P. Simons), p. 305
Stable Model Semantics for Weight Constraint Rules (I. Niemelä, P. Simons, T. Soininen), p. 317
Towards First-Order Nonmonotonic Reasoning (R. Rosati), p. 332
Comparison of Sceptical NAF-Free Logic Programming Approaches (G. Antoniou, M.J. Maher, Billington, G. Governatori), p. 347
Characterizations of Classes of Programs by Three-Valued Operators (P. Hitzler, A. K. Seda), p. 357

Invited Talks
Using LPNMR for Problem Specification and Code Generation (Abstract) (M. Cadoli), p. 372
Answer Set Planning (Abstract) (V. Lifschitz), p. 373
World-Modeling vs. World-Axiomatizing (D. McAllester), p. 375
Practical Nonmonotonic Reasoning: Extended Inheritance Techniques to Solve Real-World Problems (L. Morgenstern), p. 389

Author Index, p. 391
Fixed-Parameter Complexity in AI and Nonmonotonic Reasoning

Georg Gottlob (1), Francesco Scarcello (1), and Martha Sideri (2)

(1) Institut für Informationssysteme, Technische Universität Wien, A-1040 Wien, Paniglgasse 16, Austria. {gottlob,scarcell}@dbai.tuwien.ac.at
(2) Department of Computer Science, Athens University of Economics and Business, Athens, Greece. [email protected]
Abstract. We study the fixed-parameter complexity of various problems in AI and nonmonotonic reasoning. We show that a number of relevant parameterized problems in these areas are fixed-parameter tractable. Among these problems are constraint satisfaction problems with bounded treewidth and fixed domain, restricted satisfiability problems, propositional logic programming under the stable model semantics where the parameter is the size of a feedback vertex set of the program's dependency graph, and circumscriptive inference from a positive k-CNF restricted to models of bounded size. We also show that circumscriptive inference from a general propositional theory, when attention is restricted to models of bounded size, is fixed-parameter intractable and is actually complete for a novel fixed-parameter complexity class.

Keywords: Complexity, Fixed-parameter Tractability, Nonmonotonic Reasoning, Constraint Satisfaction, Prime Implicants, Logic Programming, Stable Models, Circumscription.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 1–18, 1999.
© Springer-Verlag Berlin Heidelberg 1999

1 Introduction

Many hard decision or computation problems are known to become tractable if a problem parameter is fixed or bounded by a fixed value. For example, the well-known NP-hard problems of checking whether a graph has a vertex cover of size at most k, and of computing such a vertex cover if so, become tractable if the integer k is a fixed constant rather than part of the problem instance. Similarly, the NP-complete problem of finding a clique of size k in a graph becomes tractable for every fixed k. Note, however, that there is an important difference between these problems:
– The vertex cover problem is solvable in linear time for every fixed constant k. Thus the problem is not only polynomially solvable for each fixed k, but, moreover, can be solved in time bounded by a polynomial $p_k$ whose degree does not depend on k.
– The best known algorithms for finding a clique of size k in a graph are all exponential in k (typically, they require runtime $n^{\Omega(k/2)}$). Thus, for fixed k, the problem is solvable in time bounded by a polynomial $p_k$ whose degree depends crucially on k.

Problems of the first type are called fixed-parameter (short: fp) tractable, while problems of the second type can be classified as fixed-parameter intractable [9]. Fixed-parameter tractability is clearly a highly desirable feature. The theory of parameterized complexity, mainly developed by Downey and Fellows [9,7,6], deals with general techniques for proving that certain problems are fp-tractable, and with the classification of fp-intractable problems into a hierarchy of fixed-parameter complexity classes.

In this paper we study the fixed-parameter complexity of a number of relevant AI and NMR problems. In particular, we show that the following problems are all fixed-parameter tractable (the parameters to be fixed are given in square brackets after the problem description):
– Constraint Satisfiability and computation of the solution to a constraint satisfaction problem (CSP) [fixed parameters: (cardinality of) domain and treewidth of constraint scopes].
– Satisfiability of CNF [fixed parameter: treewidth of variable connection graph].
– Prime Implicants of a q-CNF [fixed parameters: maximal number q of literals per clause and size of the prime implicants to be computed].
– Propositional logic programming [fixed parameter: size of a minimal feedback vertex set of the atom dependency graph].
– Circumscriptive inference from a positive q-CNF [fixed parameters: maximal number q of literals per clause and size of the models to be considered].

We believe that these results are useful both for a better understanding of the computational nature of the above problems and for the development of smart parameterized algorithms for the solution of these and related problems.
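The contrast between vertex cover and clique can be made concrete with a small Python sketch. The vertex cover routine is the textbook bounded search tree (an illustrative choice; the text does not commit to a particular algorithm): its running time is O(2^k · m) for m edges, so the exponential factor depends on k alone. The clique routine simply tries every k-subset of vertices, so the degree of its polynomial grows with k. Both function names are hypothetical.

```python
from itertools import combinations


def vertex_cover_at_most_k(edges, k):
    """Return a vertex cover with at most k vertices, or None if none exists.

    Bounded search tree: every cover contains an endpoint of the first edge,
    so branch on both endpoints. The recursion depth is at most k, giving at
    most 2^(k+1) - 1 calls of O(m) work each: O(2^k * m) time for m edges.
    """
    if not edges:
        return set()                # nothing left to cover
    if k == 0:
        return None                 # edges remain but the budget is used up
    u, v = edges[0]
    for w in (u, v):
        remaining = [e for e in edges if w not in e]
        cover = vertex_cover_at_most_k(remaining, k - 1)
        if cover is not None:
            return cover | {w}
    return None


def has_clique(adj, k):
    """Try every k-subset of vertices: O(n^k * k^2) time, so the degree of
    the polynomial grows with k. `adj` maps each vertex to its neighbor set."""
    return any(all(v in adj[u] for u, v in combinations(group, 2))
               for group in combinations(adj, k))


triangle = [(1, 2), (2, 3), (1, 3)]
print(vertex_cover_at_most_k(triangle, 1))  # None
print(vertex_cover_at_most_k(triangle, 2))  # {1, 2}
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(has_clique(adj, 3), has_clique(adj, 4))  # True False
```

For every fixed k the first routine is linear in the number of edges, which is the fp-tractable behaviour described above; the second is polynomial for fixed k but only with an exponent that depends on k.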
We also study the complexity of circumscriptive inference from a general propositional theory when attention is restricted to models of size k. This problem, referred to as small model circumscription (SMC), is easily seen to be fixed-parameter intractable, but it does not seem to be complete for any of the fp-complexity classes defined by Downey and Fellows. We introduce the new class $\Sigma_2 W[\mathrm{SAT}]$ as a miniaturized version of the class $\Sigma_2^P$ of the polynomial hierarchy, and prove that SMC is complete for $\Sigma_2 W[\mathrm{SAT}]$. This seems natural, given that the nonparameterized problem corresponding to SMC is $\Sigma_2^P$-complete [10]. Note, however, that completeness results for parameterized classes are more difficult to obtain. In fact, to obtain our completeness result we had to resort to the general version of circumscription (called P;Z-circumscription), where the propositional letters of the theory to be circumscribed are partitioned into two subsets P and Z, and only the atoms in P are minimized, while those in Z can float. The restricted problem, where P consists of all atoms and Z is empty, does not seem to be complete for $\Sigma_2 W[\mathrm{SAT}]$, even though its non-parameterized version is still $\Sigma_2^P$-complete [10].
The paper is organized as follows. In Section 2 we state the relevant formal definitions related to fixed-parameter complexity. In Section 3 we deal with constraint satisfaction problems. In Section 4 we study fp-tractable satisfiability problems. In Section 5 we deal with logic programming. Finally, in Section 6 we study the problem of circumscriptive inference with small models.
2 Parameterized Complexity
Parameterized complexity [9] deals with parameterized problems, i.e., problems with an associated parameter. Any instance S of a parameterized problem P can be regarded as consisting of two parts: the "regular" instance $I_S$, which is usually the input instance of the classical (non-parameterized) version of P, and the associated parameter $k_S$, usually of integer type.

Definition 1. A parameterized problem P is fixed-parameter tractable if there is an algorithm that correctly decides, for input S, whether S is a yes instance of P in time $f(k_S)\,O(n^c)$, where $n = |I_S|$ is the size of $I_S$, $k_S$ is the parameter, c is a constant, and f is an arbitrary function.

A notion of problem reduction specific to the theory of parameterized complexity has been defined.

Definition 2. A parameterized problem P fp-reduces to a parameterized problem P' by an fp-reduction if there exist two functions f, f' and a constant c such that we can associate to any instance S of P an instance S' of P' satisfying the following conditions: (i) the parameter $k_{S'}$ of S' is $f(k_S)$; (ii) the regular instance $I_{S'}$ is computable from S in time $f'(k_S)\,|I_S|^c$; (iii) S is a yes instance of P if and only if S' is a yes instance of P'.

A parameterized class of problems C is a (possibly infinite) set of parameterized problems. A problem P is C-complete if P ∈ C and every problem P' ∈ C is fp-reducible to P.

A hierarchy of fp-intractable classes, called the W-hierarchy, has been defined to characterize the degree of fp-intractability of different parameterized problems. The classes of the W-hierarchy are related by the following chain of inclusions:

$$W[1] \subseteq W[2] \subseteq \cdots \subseteq W[\mathrm{SAT}] \subseteq W[P]$$

where, for each natural number t > 0, the class W[t] is defined via a suitable family of bounded-depth Boolean circuits of weft t, i.e., circuits with at most t gates of unbounded fan-in on any path from an input to the output.
The most prominent W[1]-complete problem is the parameterized version of clique, where the parameter is the clique size. W[1] can be characterized as the class of parameterized problems that fp-reduce to parameterized CLIQUE. Similarly, W[2] can be characterized as the class of parameterized problems that fp-reduce to parameterized Hitting Set, where the parameter is the size of the hitting set.
A k-truth value assignment for a formula E is a truth value assignment which assigns true to exactly k propositional variables of E. Consider the following problem.

Parameterized SAT
Instance: A Boolean formula E.
Parameter: k.
Question: Does there exist a k-truth value assignment satisfying E?

W[SAT] is the class of parameterized problems that fp-reduce to parameterized SAT. W[SAT] is contained in W[P], which is defined analogously using Boolean circuits instead of formulae. It is not known whether any of the above inclusions is proper; however, all the classes are conjectured to be distinct. The AW-hierarchy has been defined in order to deal with some problems that do not fit the W-classes [9]. The AW-hierarchy represents, in a sense, the parameterized counterpart of PSPACE in the classical complexity setting. In this paper we are mainly interested in the class AW[SAT]. Consider the following problem.

Parameterized QBFSAT
Instance: A quantified Boolean formula $\Phi = Q_1^{k_1} x_1\, Q_2^{k_2} x_2 \cdots Q_n^{k_n} x_n\, E$.
Parameter: $k = \langle k_1, k_2, \ldots, k_n \rangle$.
Question: Is $\Phi$ valid?

(Here, $\exists^{k_i} x$ denotes the choice of some $k_i$-truth value assignment for the variables x, and $\forall^{k_j} x$ denotes all choices of $k_j$-truth value assignments for the variables x.) AW[SAT] is the class of parameterized problems that fp-reduce to parameterized QBFSAT.
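For intuition about Parameterized SAT, the following sketch decides it by brute force over all k-subsets of variables. It assumes, for illustration only, that the formula E is given as a Python predicate over the set of variables assigned true; the names `k_truth_assignments` and `example_formula` are hypothetical. Its cost is about n^k formula evaluations: polynomial for each fixed k, but with a degree that grows with k, so it is not an fp-algorithm.

```python
from itertools import combinations


def k_truth_assignments(variables, formula, k):
    """Yield each set of exactly k variables such that making them true (and
    every other variable false) satisfies `formula`, a predicate over the set
    of true variables. All C(n, k) <= n^k candidate sets are tried."""
    for chosen in combinations(sorted(variables), k):
        true_vars = frozenset(chosen)
        if formula(true_vars):
            yield true_vars


def example_formula(true_vars):
    """E = (a or b) and not (a and c)."""
    a, b, c = ("a" in true_vars), ("b" in true_vars), ("c" in true_vars)
    return (a or b) and not (a and c)


print(list(k_truth_assignments({"a", "b", "c"}, example_formula, 2)))
```

With k = 2 the sketch finds the two satisfying 2-truth value assignments {a, b} and {b, c}; {a, c} is rejected because it makes both a and c true.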
3 Constraint Satisfaction Problems, Bounded Treewidth, and FP-Tractability
In this section we prove that Constraint Satisfaction Problems of bounded treewidth over a fixed domain are fp-tractable. To obtain this result we need a number of definitions. In Section 3.1 we give a very general definition of CSPs; in Section 3.2 we define the treewidth of CSP problems and quote some recent results; in Section 3.3 we show the main tractability result.

3.1 Definition of CSPs

An instance of a constraint satisfaction problem (CSP) (also constraint network) is a triple I = (Var, U, C), where Var is a finite set of variables, U is a finite domain of values, and $C = \{C_1, C_2, \ldots, C_q\}$ is a finite set of constraints. Each constraint $C_i$ is a pair $(S_i, r_i)$, where $S_i$ is a list of variables of length $m_i$ called the constraint scope, and $r_i$ is an $m_i$-ary relation over U, called the constraint relation. (The tuples of $r_i$ indicate the allowed combinations of simultaneous values for the variables $S_i$.) A solution to a CSP instance is a substitution $\vartheta : Var \to U$ such that $S_i\vartheta \in r_i$ for each $1 \le i \le q$. The problem of deciding whether a CSP instance has any solution is called constraint satisfiability (CS). (This definition is taken almost verbatim from [17].)
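The definition translates directly into data. In the sketch below, a constraint is a pair (scope, relation) with the relation stored as a Python set of allowed tuples, and a solution is a dict playing the role of the substitution ϑ. The exhaustive solver is only a baseline for illustration (exponential in |Var|), not a method from the text; the function names are hypothetical.

```python
from itertools import product


def is_solution(theta, constraints):
    """theta: dict Var -> U. Each constraint is (scope, relation), where scope
    is a tuple of variables and relation a set of allowed value tuples; the
    check is S_i theta in r_i for every constraint."""
    return all(tuple(theta[x] for x in scope) in rel for scope, rel in constraints)


def solve_cs(variables, domain, constraints):
    """Constraint satisfiability by exhaustive search over all |U|^|Var|
    substitutions; returns a solution or None."""
    variables = list(variables)
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if is_solution(theta, constraints):
            return theta
    return None


neq = {(0, 1), (1, 0)}                       # the relation X != Y over U = {0, 1}
path = [(("X", "Y"), neq), (("Y", "Z"), neq)]
print(solve_cs(["X", "Y", "Z"], [0, 1], path))                        # {'X': 0, 'Y': 1, 'Z': 0}
print(solve_cs(["X", "Y", "Z"], [0, 1], path + [(("X", "Z"), neq)]))  # None
```

The second call encodes 2-coloring a triangle, which has no solution.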
To any CSP instance I = (Var, U, C), we associate a hypergraph H(I) = (V, H), where V = Var and $H = \{ var(S) \mid (S, r) \in C \}$, where var(S) denotes the set of variables in the scope S of a constraint. Let H(I) = (V, H) be the constraint hypergraph of a CSP instance I. The primal graph of I is a graph G(I) = (V, E), having the same set of variables (vertices) as H(I) and an edge connecting any pair of variables X, Y ∈ V such that {X, Y} ⊆ h for some h ∈ H.

3.2 Treewidth of CSPs

The treewidth of a graph is a measure of the degree of cyclicity of a graph.

Definition 3 ([20]). A tree decomposition of a graph G = (V, F) is a pair ⟨T, λ⟩, where T = (N, E) is a tree, and λ is a labeling function associating to each vertex p ∈ N a set of vertices λ(p) ⊆ V, such that the following conditions are satisfied:
1. for each vertex b of G, there is a p ∈ N such that b ∈ λ(p);
2. for each edge {b, d} ∈ F, there is a p ∈ N such that {b, d} ⊆ λ(p);
3. for each vertex b of G, the set {p ∈ N | b ∈ λ(p)} induces a (connected) subtree of T.
The width of the tree decomposition is $\max_{p \in N} |\lambda(p)| - 1$. The treewidth of G is the minimum width over all its tree decompositions.

Bodlaender [3] has shown that, for each fixed k, there is a linear time algorithm for checking whether a graph G has treewidth at most k and, if so, computing a tree decomposition of G of width at most k. Thus, the problem of computing a tree decomposition of a graph of width k is fp-tractable in the parameter k.

The treewidth of a CSP instance I is the treewidth of its primal graph G(I). Accordingly, a tree decomposition of I is a tree decomposition of G(I).

3.3 FP-Tractable CSPs
Constraint Satisfaction is easily seen to be NP-complete. Moreover, the parameterized version, where the parameter is the total size of all constraint scopes, is W[1]-complete, and thus presumably not fp-tractable. This follows from well-known results on conjunctive query evaluation [8,19], which is equivalent to constraint satisfaction (cf. [2,18,15]). Therefore, bounded treewidth CSP is also W[1]-hard and hence presumably fp-intractable. Indeed, the CSPs having total size of the constraint scopes ≤ k form a subclass of the CSPs having treewidth ≤ k. Note that, for each fixed k, CSPs of width ≤ k can be evaluated in time $O(n^k \log n)$ [16]. In this section we show, however, that if we additionally fix the size of the domain U as a parameter, then bounded treewidth CSP is fixed-parameter tractable. It is worth noting that the general CSP problem remains NP-complete even for a constant domain U. (See, e.g., the 3-SAT problem discussed below.)
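The primal graph and the three conditions of Definition 3, which Theorem 1 below takes as its starting point, can be checked mechanically. The sketch assumes the tree T is supplied as a list of edges over the keys of `bags` (it does not verify that T really is a tree) and computes the width as in Definition 3. It only checks a given decomposition; finding one of width at most k is the job of Bodlaender's algorithm, which is not implemented here. All names are hypothetical.

```python
from itertools import combinations


def primal_graph(constraints):
    """Edges of G(I): every pair of distinct variables that share a scope."""
    edges = set()
    for scope, _relation in constraints:
        for x, y in combinations(sorted(set(scope)), 2):
            edges.add(frozenset((x, y)))
    return edges


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check conditions 1-3 of Definition 3. `bags` maps each tree node p to
    lambda(p); `tree_edges` lists the edges (p, q) of T."""
    if not all(any(b in bag for bag in bags.values()) for b in vertices):
        return False                                   # condition 1 fails
    if not all(any(e <= bag for bag in bags.values()) for e in edges):
        return False                                   # condition 2 fails
    for b in vertices:                                 # condition 3
        nodes = {p for p, bag in bags.items() if b in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:                                   # DFS restricted to `nodes`
            p = stack.pop()
            for x, y in tree_edges:
                q = y if x == p else x if y == p else None
                if q in nodes and q not in seen:
                    seen.add(q)
                    stack.append(q)
        if seen != nodes:
            return False                               # nodes containing b are disconnected
    return True


def width(bags):
    return max(len(bag) for bag in bags.values()) - 1


neq = {(0, 1), (1, 0)}
G = primal_graph([(("X", "Y"), neq), (("Y", "Z"), neq)])
good = {1: {"X", "Y"}, 2: {"Y", "Z"}}
bad = {1: {"X", "Y"}, 2: {"Z"}, 3: {"Y", "Z"}}
print(is_tree_decomposition({"X", "Y", "Z"}, G, [(1, 2)], good), width(good))  # True 1
print(is_tree_decomposition({"X", "Y", "Z"}, G, [(1, 2), (2, 3)], bad))        # False
```

In the `bad` decomposition every vertex and edge is covered, but the nodes whose bags contain Y (nodes 1 and 3) are separated by node 2, so condition 3 fails.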
Theorem 1. Constraint Satisfaction with parameters treewidth k and universe size u = |U| is fp-tractable. So is the problem of computing a solution of a CSP problem with parameters k and u.

Proof. (Sketch.) Let I = (Var, U, C) be a CSP instance having treewidth k and |U| = u. We exhibit an fp-transformation from I to an equivalent instance I' = (Var, U, C'). We assume w.l.o.g. that no constraint scope S in I contains multiple occurrences of variables. (In fact, such occurrences can easily be removed by a simple preprocessing of the input instance.) Note that, from the bound k on the treewidth, it follows that each constraint scope contains at most k + 1 variables (the variables of a scope form a clique in G(I), and a clique on m vertices forces treewidth at least m − 1), and thus the constraint relations have arity at most k + 1. Let ⟨T = (V, E), λ⟩ be a k-width tree decomposition of G(I) such that |V| ≤ c|G(I)|, for a fixed predetermined constant c. (This is always possible because Bodlaender's algorithm runs in linear time.) For each vertex p ∈ V, I' has a constraint $C_p = (S, r) \in C'$, where the scope S is a list containing the variables belonging to λ(p), and r is the associated relation, computed as described below. The relations associated to the constraints of I' are computed through the following two steps:
1. For each constraint C' = (S', r') ∈ C', initialize r' as $U^{|var(S')|}$, i.e., the |var(S')|-fold Cartesian product of the domain U with itself.
2. For each constraint C = (S, r) ∈ C, let C' = (S', r') ∈ C' be any constraint of I' such that var(S) ⊆ var(S'). Such a constraint must exist by definition of tree decomposition of the primal graph G(I), since var(S) is a clique of G(I) and every clique is contained in some λ(p). Modify r' as follows:
$$r' := \{\, t' \in r' \mid \exists\, \vartheta \text{ such that } S'\vartheta = t' \text{ and } S\vartheta \in r \,\}.$$
(In database terms, r' is semijoin-reduced by r.)
It is not hard to see that the instance I' is equivalent to I, in that they have exactly the same set of solutions. Note that the size of I' is at most $|U|^{k+1}\,(c|G(I)|)$, and, for fixed k and u, even computing I' from I is feasible in linear time. Thus the reduction is actually an fp-reduction. The resulting instance I' is an acyclic constraint satisfaction problem which is equivalent to an acyclic conjunctive query over a fixed database [15]. Checking whether such a query has a nonempty result and, in the positive case, computing a single tuple of the result, is feasible in linear time by Yannakakis' well-known algorithm [24]. □

Note that, since CSP is equivalent to conjunctive query evaluation, the above result immediately gives us a corollary on the program complexity of conjunctive queries, i.e., the complexity of evaluating conjunctive queries over a fixed database [23]. The following result complements some recent results on fixed-parameter tractability of database problems by Papadimitriou and Yannakakis [19].

Corollary 1. The evaluation of Boolean conjunctive queries is fp-tractable if the parameters are the treewidth of the query and the size of the database universe. Moreover, evaluating a nonboolean conjunctive query is fp-tractable in the input and output size w.r.t. the treewidth of the query and the size of the database universe.
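A minimal Python sketch of the construction in the proof of Theorem 1, assuming a tree decomposition of G(I) is already available (Bodlaender's algorithm itself is not implemented). `bag_relations` performs Steps 1 and 2; `solve_acyclic` then evaluates the resulting acyclic instance with a bottom-up semijoin pass followed by a top-down extension pass, in the spirit of Yannakakis' algorithm. The semijoins use nested loops, so this sketch does not attain the linear bound cited in the proof. All names and the dict-based tuple encoding are hypothetical.

```python
from itertools import product


def bag_relations(domain, constraints, bags):
    """Steps 1-2 of the proof: one relation per tree node p over lambda(p).
    Tuples are stored as dicts Var -> value; variables must be sortable."""
    rels = {}
    for p, bag in bags.items():
        scope = sorted(bag)
        # Step 1: start from the full Cartesian product U^{|lambda(p)|}
        rels[p] = [dict(zip(scope, t)) for t in product(domain, repeat=len(scope))]
    for scope, relation in constraints:
        # Step 2: some bag covers the scope (a scope is a clique of G(I))
        p = next(p for p, bag in bags.items() if set(scope) <= bag)
        rels[p] = [t for t in rels[p] if tuple(t[x] for x in scope) in relation]
    return rels


def agree(t, s):
    """True if two partial assignments coincide on their shared variables."""
    return all(t[x] == s[x] for x in t.keys() & s.keys())


def solve_acyclic(rels, tree_edges, root):
    """Bottom-up semijoins, then top-down extension of one root tuple.
    Mutates `rels`; returns a solution dict or None."""
    parent, order = {root: None}, [root]
    for p in order:                          # BFS; `order` grows while iterating
        for x, y in tree_edges:
            q = y if x == p else x if y == p else None
            if q is not None and q not in parent:
                parent[q] = p
                order.append(q)
    for q in reversed(order[1:]):            # children are handled before parents
        p = parent[q]
        rels[p] = [t for t in rels[p] if any(agree(t, s) for s in rels[q])]
    if not rels[root]:
        return None                          # the instance has no solution
    solution = dict(rels[root][0])
    for q in order[1:]:                      # parents are handled before children
        solution.update(next(s for s in rels[q] if agree(solution, s)))
    return solution


neq = {(0, 1), (1, 0)}
constraints = [(("X", "Y"), neq), (("Y", "Z"), neq)]
bags = {1: {"X", "Y"}, 2: {"Y", "Z"}}
rels = bag_relations([0, 1], constraints, bags)
print(solve_acyclic(rels, [(1, 2)], root=1))  # {'X': 0, 'Y': 1, 'Z': 0}
```

The top-down pass never gets stuck: after the bottom-up pass every surviving parent tuple has an agreeing child tuple, and by condition 3 of Definition 3 a child bag shares with earlier bags only variables of its parent bag.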
4 FP-Tractable Satisfiability Problems

4.1 Bounded-width CNF Formulae
As an application of our general result on fp-tractable CSPs, we show that a relevant satisfiability problem is also fp-tractable. The graph G(F) of a CNF formula F has as vertices the set of propositional variables occurring in F, and has an edge {x, y} iff the propositional variables x and y occur together in a clause of F. The treewidth of F is defined to be the treewidth of the associated graph G(F).

Theorem 2. CNF Satisfiability with parameter treewidth k is fp-tractable. So is the problem of computing a model of a CNF formula with parameter k.

Proof. (Sketch.) We fp-transform a CNF formula F into a constraint satisfaction instance I(F) = (Var, U, C) defined as follows. Var contains a variable $X_p$ for each propositional variable p occurring in F; U = {0, 1}; and for each clause D of F, I(F) contains a constraint (S, r) where the constraint scope S is the list containing all variables $X_p$ such that p is a propositional variable occurring in D, and the constraint relation $r \subseteq U^{|D|}$ consists of all tuples corresponding to truth value assignments satisfying D. Clearly, every model of F corresponds to a solution of I(F) and vice versa. Thus, in particular, F is satisfiable if and only if I(F) is a positive CSP instance. Since G(F) is isomorphic to G(I(F)), F and I(F) have the same treewidth. Moreover, every clause of a CNF formula F of treewidth k contains at most k + 1 distinct variables. Therefore, our reduction is feasible in time $O(2^k|F|)$ and is thus an fp-reduction w.r.t. parameter k. By this fp-reduction, fp-tractability of CNF-SAT with the treewidth parameter follows from the fp-tractability of CSPs w.r.t. treewidth (here with fixed universe size u = 2), as stated in Theorem 1. □

4.2 CNF with Short Prime Implicants
The problem of finding the prime implicants of a CNF formula is relevant to a large number of different areas, e.g., diagnosis, knowledge compilation, and many other AI applications. Clearly, the set of the prime implicants of a CNF formula F can be viewed as a compact representation of the satisfying truth assignments for F. It is worth noting that the restriction of Parameterized SAT to CNF formulae is fp-intractable. More precisely, deciding whether a CNF formula F has a k-truth value assignment is W[2]-complete [9]. (Recall that a k-truth value assignment assigns true to exactly k propositional variables.) Nevertheless, we identified a very natural parameterized version of satisfiability which is fp-tractable: we simply take as the parameter the length of the prime implicants of the Boolean formula. Given a q-CNF formula F, the Short Prime Implicants problem (SPI) is the problem of computing the (consistent) prime implicants of F having length ≤ k, with parameters k and q.
Theorem 3. SPI is fixed-parameter tractable.

Proof. (Sketch.) Let F be a q-CNF formula. W.l.o.g., assume that F does not contain tautological clauses. We generate a set $IM_k(F)$ of implicants of F from which it is possible to compute the set of all prime implicants of F having length ≤ k. (This is very similar to the well-known procedure of generating vertex covers of bounded size, cf. [5,9].) Pick an arbitrary clause C of F. Clearly, each implicant I of F must contain at least one literal of C. We construct an edge-labeled tree t whose vertices are clauses of F as follows. The root of t is C. Each nonleaf vertex D has an edge labeled ℓ to a child, for each literal ℓ ∈ D. As the child at the end of this edge, attach any clause E of F which does not intersect the set of all edge labels on the path from the root to the current position. A branch is closed if no such clause exists or the length of the path is k. For each root-leaf branch β of the tree, let I(β) be the set containing the ≤ k literals labeling the edges of β. Check whether I(β) is a consistent implicant of F and, if so, add I(β) to the set $IM_k(F)$. It is easy to see that the size of the tree t is bounded by $q^k$, and that for every prime implicant S of F having length ≤ k, S ⊆ I holds for some implicant I ∈ $IM_k(F)$. Moreover, note that there are at most $q^k$ implicants in $IM_k(F)$. For each implicant I ∈ $IM_k(F)$, the set of all consistent prime implicants of F included in I can be easily obtained from I in time $O(2^k|F|)$. It follows that SPI is fp-tractable w.r.t. parameters q and k. □
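The branching procedure of the proof, together with the final filtering step, can be sketched in Python as follows. Literals are encoded as nonzero integers, with -v standing for the negation of variable v (an encoding chosen for illustration), and clauses as frozensets. Unlike the proof, which checks consistency only at the leaves, this sketch prunes a branch as soon as it would contain complementary literals; no consistent implicant is lost, since every literal chosen on the branch leading to a prime implicant S belongs to S. Function names are hypothetical.

```python
from itertools import combinations


def is_implicant(lits, cnf):
    """For non-tautological clauses, a consistent literal set implies the CNF
    iff it shares a literal with every clause."""
    return all(clause & lits for clause in cnf)


def candidate_implicants(cnf, k):
    """The bounded search tree: branch on the literals of a clause not yet
    hit by the chosen literals, up to depth k (at most q^k leaves)."""
    found = set()

    def grow(chosen):
        open_clause = next((c for c in cnf if not (c & chosen)), None)
        if open_clause is None:
            found.add(chosen)                 # every clause is hit: an implicant
        elif len(chosen) < k:
            for lit in open_clause:
                if -lit not in chosen:        # prune inconsistent branches early
                    grow(chosen | {lit})

    grow(frozenset())
    return found


def short_prime_implicants(cnf, k):
    primes = set()
    for imp in candidate_implicants(cnf, k):
        # the at most 2^k subsets of imp that are still implicants
        subs = [frozenset(s) for r in range(len(imp) + 1)
                for s in combinations(imp, r) if is_implicant(frozenset(s), cnf)]
        # the inclusion-minimal ones are prime implicants of F
        primes.update(s for s in subs if not any(t < s for t in subs))
    return primes


# F = (x1 or x2) and (not x1 or x3)
F = [frozenset({1, 2}), frozenset({-1, 3})]
print(sorted(sorted(p) for p in short_prime_implicants(F, 2)))  # [[-1, 2], [1, 3], [2, 3]]
print(short_prime_implicants(F, 1))                               # set()
```

An inclusion-minimal implicant among the subsets of a candidate is prime, because all of its proper subsets are also subsets of that candidate and none of them implies F.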
5 Logic Programs with Negation
Logic programming with negation under the stable model semantics [14] is a well-studied form of nonmonotonic reasoning. A literal L is either an atom A (called positive) or a negated atom ¬A (called negative). Literals A and ¬A are complementary; for any literal L, we denote by ¬.L its complementary literal, and for any set Lit of literals, ¬.Lit = {¬.L | L ∈ Lit}. A normal clause is a rule of the form

$$A \leftarrow L_1, \ldots, L_m \qquad (m \geq 0) \tag{1}$$
where A is an atom and each $L_i$ is a literal. A normal logic program is a finite set of normal clauses. A normal logic program P is stratified [1] if there is an assignment str(·) of integers 0, 1, ... to the predicates p in P such that, for each clause r in P, the following holds: if p is the predicate in the head of r and q the predicate in an $L_i$ from the body, then str(p) ≥ str(q) if $L_i$ is positive, and str(p) > str(q) if $L_i$ is negative. The reduct of a normal logic program P by a Herbrand interpretation I [14], denoted $P^I$, is obtained from P as follows: first remove every clause r with a negative literal L in the body such that ¬.L ∈ I, and then remove all negative literals from the remaining rules. An interpretation I of a normal logic program P is a stable model of P [14] if I is the least Herbrand model of $P^I$.
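The reduct and the stable model test translate into a short Python sketch for propositional programs. A rule A ← B1, ..., Bj, ¬C1, ..., ¬Cl is encoded, by assumption, as a triple (A, {B1, ..., Bj}, {C1, ..., Cl}). The least model of the negation-free reduct is computed by naive fixpoint iteration, and the enumeration simply tests every subset of atoms, so it is exponential and meant only to illustrate the definitions. Function names are hypothetical.

```python
from itertools import combinations


def reduct(program, interp):
    """P^I: drop each rule having a body literal ¬C with C in I,
    then drop the negative literals of the remaining rules."""
    return [(head, pos) for head, pos, neg in program if not (neg & interp)]


def least_model(positive_program):
    """Least Herbrand model of a negation-free program (naive fixpoint)."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_program:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model


def is_stable_model(program, interp):
    return least_model(reduct(program, interp)) == set(interp)


def stable_models(program):
    """Brute-force enumeration over all subsets of atoms (exponential)."""
    atoms = sorted({h for h, _, _ in program}
                   | {a for _, pos, neg in program for a in pos | neg})
    return [set(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r) if is_stable_model(program, set(c))]


even = [("a", frozenset(), frozenset({"b"})),   # a <- ¬b
        ("b", frozenset(), frozenset({"a"}))]   # b <- ¬a
odd = [("p", frozenset(), frozenset({"p"}))]    # p <- ¬p
print(stable_models(even))   # [{'a'}, {'b'}]
print(stable_models(odd))    # []
```

The two example programs show that a normal program may have several stable models or none at all.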
Fixed-Parameter Complexity in AI and Nonmonotonic Reasoning
In general, a normal logic program P may have zero, one, or multiple (even exponentially many) stable models. Denote by stabmods(P) the set of stable models of P. It is well known that every stratified logic program has a unique stable model, which can be computed in linear time. The following problems are the main decision and search problems in the context of logic programming.
Main logic programming problems. Let P be a logic program.
1. Consistency: Determine whether P admits a stable model.
2. Brave Reasoning: Check whether a given literal is true in some stable model of P.
3. Cautious Reasoning: Check whether a given literal is true in every stable model of P.
4. SM Computation: Compute an arbitrary stable model of P.
5. SM Enumeration: Compute the set of all stable models of P.
For a normal logic program P, the dependency graph G(P) is a labeled directed graph (V, A) where V is the set of atoms occurring in P and A is a set of edges such that (p, q) ∈ A if there exists a rule r ∈ P having p in its head and q in its body. Moreover, if q appears negatively in the body, then the edge (p, q) is labeled with the symbol ¬. The undirected dependency graph G*(P) of P is the undirected version of G(P). A feedback vertex set of an undirected (directed) graph G is a subset S of the vertices of G such that every cycle (directed cycle) contains at least one vertex in S. Clearly, if a feedback vertex set is removed from G, then the resulting graph is acyclic. The feedback width of G is the minimum size of its feedback vertex sets. It was shown by Downey and Fellows [9,5] that determining whether an undirected graph has feedback width at most k and, in the positive case, finding a feedback vertex set of size at most k, is fp-tractable w.r.t. the parameter k. Let P be a logic program defined over a set U of propositional atoms. A partial truth value assignment (p.t.a.) for P is a truth value assignment to a subset U' of U. If τ is a p.t.a. for P, denote by P[τ] the program obtained from P as follows:
– eliminate all rules whose body contains a literal contradicting τ;
– eliminate from every rule body all literals made true by τ.
The following lemma is easy to verify.
Lemma 1. Let M be a stable model of some logic program P, and let τ be a p.t.a. consistent with M. Then M is a stable model of P[τ].
Theorem 4. The logic programming problems (1–5) listed above are all fp-tractable w.r.t. the feedback width of the dependency graph of the logic program.
Proof. (Sketch.) Given a logic program P whose graph G*(P) has feedback width k, compute in linear time (see [9]) a feedback vertex set S for G*(P) s.t. |S| = k. Consider the set T of all the 2^k partial truth value assignments to the atoms in S.
G. Gottlob, F. Scarcello, and M. Sideri
For each p.t.a. τ ∈ T, P[τ] is a stratified program whose unique stable model M_τ can be computed in linear time. For each τ ∈ T, compute M_τ and check whether M_τ ∈ stabmods(P) (this can be done in linear time, too, if suitable data structures are used). Let Σ = {M_τ | M_τ ∈ stabmods(P)}. Since Σ ⊆ stabmods(P) by definition, it suffices to show that every stable model M of P belongs to Σ. Indeed, let τ be the p.t.a. on S determined by M. By Lemma 1, M is a stable model of P[τ]; since P[τ] is stratified, its unique stable model is M_τ, and hence M = M_τ ∈ Σ. Thus, P has at most 2^k stable models, whose computation is fp-tractable and actually feasible in linear time for fixed k. Therefore, problem 5 above (SM Enumeration) is fp-tractable, and the fp-tractability of all the other problems follows. □
It appears that an overwhelmingly large number of "natural" logic programs have very low feedback width, thus the technique presented here seems to be very useful in practice. Note, however, that the technique does not apply to some important and rather obvious cases. In fact, the method does not take into account the direction and the labeling of the arcs in the dependency graph G(P). Hence, positive programs with large feedback width are not recognized as tractable, although they are trivially tractable. The same applies, for instance, to stratified programs having large feedback width, or to programs whose high feedback width is exclusively due to positive cycles. Unfortunately, it is not known whether computing feedback vertex sets of size k is fixed-parameter tractable for directed graphs [9].
Another observation leading to a possible improvement is the following. Call an atom p of a logic program P malignant if it lies on at least one simple cycle of G(P) containing a marked (=negated) edge. Call an atom benign if it is not malignant. It is easy to see that only malignant atoms can be responsible for a large number of stable models.
In particular, every stratified program contains only benign atoms and has exactly one stable model. This suggests the following improved procedure:
– Identify the set of benign atoms occurring in P;
– Drop these benign vertices from G*(P), yielding H(P);
– Compute a feedback vertex set S of size ≤ k of H(P);
– For each p.t.a. τ over S, compute the unique stable model M_τ of P[τ], check whether it is actually a stable model of P, and if so, output M_τ.
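The last step of this procedure, which is also the core of the proof of Theorem 4, can be sketched in Python as follows. The sketch assumes a hypothetical encoding of each rule as a (head, positive body, negative body) triple, takes the set S as an input rather than computing a feedback vertex set, and raises an error if some P[τ] turns out not to be stratified. M_τ is obtained by a textbook stratum-by-stratum evaluation; all names are illustrative.

```python
from itertools import product

# Hypothetical encoding: A <- B1,...,Bm, not C1,...,not Cn is the triple
# (A, frozenset({B1,...,Bm}), frozenset({C1,...,Cn})).

def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def least_model(program):
    """Least model of a negation-free program (naive fixpoint)."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos, _ in program:
            if head not in model and pos <= model:
                model.add(head)
                changed = True
    return model

def is_stable(program, interp):
    reduct = [(h, pos, frozenset()) for h, pos, neg in program if not (neg & interp)]
    return least_model(reduct) == interp

def simplify(program, tau):
    """P[tau]: drop rules whose body contradicts tau, then delete the
    remaining body literals over atoms of tau (all of them are made true)."""
    result = []
    for h, pos, neg in program:
        if any(tau.get(a) is False for a in pos) or any(tau.get(a) is True for a in neg):
            continue
        result.append((h, frozenset(a for a in pos if a not in tau),
                          frozenset(a for a in neg if a not in tau)))
    return result

def stratified_model(program):
    """Unique stable model of a stratified program, computed stratum by stratum."""
    atoms = {a for h, pos, neg in program for a in (h, *pos, *neg)}
    level = dict.fromkeys(atoms, 0)
    for _ in range(len(atoms) + 1):   # relax str(.) until it stabilizes
        changed = False
        for h, pos, neg in program:
            need = max([level[a] for a in pos] + [level[a] + 1 for a in neg] + [0])
            if need > level[h]:
                level[h], changed = need, True
        if not changed:
            break
    else:
        raise ValueError("program is not stratified")
    model = set()
    for s in range(max(level.values(), default=0) + 1):
        layer = [r for r in program if level[r[0]] == s]
        changed = True
        while changed:                # negative literals refer to lower strata only
            changed = False
            for h, pos, neg in layer:
                if h not in model and pos <= model and not (neg & model):
                    model.add(h)
                    changed = True
    return model

def enumerate_stable_models(program, S):
    """Check M_tau against P for each of the 2^|S| assignments tau over S."""
    S, found = sorted(S), []
    for values in product([False, True], repeat=len(S)):
        m = stratified_model(simplify(program, dict(zip(S, values))))
        if is_stable(program, m) and m not in found:
            found.append(m)
    return found

# p <- not q.   q <- not p.   r <- p.   with S = {p}
P = [rule("p", neg=["q"]), rule("q", neg=["p"]), rule("r", pos=["p"])]
assert enumerate_stable_models(P, {"p"}) == [{"q"}, {"p", "r"}]
```

The final stability test against P is what filters out spurious candidates: different assignments τ may yield the same M_τ, or an M_τ that disagrees with τ on S.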
It is easy to see that the above procedure correctly computes the stable models of P. Unfortunately, as shown by the next theorem, it is unlikely that its first step (identifying the benign atoms) can be carried out in polynomial time.
Theorem 5. Determining whether an atom of a propositional logic program is benign is NP-complete.
Proof. (Sketch.) This follows by a rather simple reduction from the NP-complete problem of deciding whether, for two pairs of vertices ⟨x1, y1⟩ and ⟨x2, y2⟩ of a directed graph G, there are two vertex-disjoint paths linking x1 to y1 and x2 to y2 [12]. A detailed explanation will be given in the full paper. □
We thus propose a related improvement, which is somewhat weaker, but tractable. An atom p of a logic program P is called weakly malignant if it lies on at least one simple cycle of G*(P) containing a marked (=negated) edge. An atom is called strongly benign if it is not weakly malignant.
Lemma 2. Determining whether an atom of a propositional logic program is strongly benign or weakly malignant can be done in polynomial time.
Proof. (Sketch.) It is sufficient to show that determining whether a vertex p of an undirected graph G with Boolean edge labels lies on a simple cycle containing a marked edge is feasible in polynomial time. This can be solved by checking, for each marked edge ⟨y1, y2⟩ of G and for each pair of neighbours x1, x2 of p, whether the graph G − {p} contains two vertex-disjoint paths linking x1 to y1 and x2 to y2, respectively. The latter is feasible in polynomial time by a result of Robertson and Seymour [21]. □
We next present an improved algorithm for enumerating the stable models of a logic program P based on the feedback width of a suitable undirected graph associated to P.
Modular Stable Model Enumeration procedure (MSME).
1. Compute the set 𝒞 of the strongly connected components (s.c.c.) of G(P);
2. For each s.c.c. C ∈ 𝒞, let P_C be the set of rules of P that "define" atoms belonging to C, i.e., P_C contains any rule r ∈ P whose head belongs to C;
3. Determine the set UC ⊆ 𝒞 of the s.c.c. of G(P) whose corresponding program P_C is not stratified;
4. For each s.c.c. C ∈ UC, compute the set SB(C) of strongly benign atoms occurring in P_C;
5. Let P' = ⋃_{C ∈ UC} P_C;
6. Let H(P') be the subgraph of G*(P') obtained by dropping every vertex p occurring in some set of strongly benign atoms SB(C) for some C ∈ UC;
7. Compute a feedback vertex set S of size ≤ k of H(P');
8. For each p.t.a. τ over S, compute the unique stable model M_τ of P[τ], check whether it is actually a stable model of P, and if so, output M_τ.
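Steps 1–3 of MSME can be illustrated with a short Python sketch. It assumes a hypothetical (head, positive body, negative body) encoding of rules and uses Kosaraju's algorithm for the strongly connected components (any linear-time s.c.c. algorithm would do). It relies on the observation that, because every atom of an s.c.c. C reaches every other atom of C, the program P_C fails to be stratified exactly when some negative edge of G(P) joins two atoms of C.

```python
from collections import defaultdict

# Hypothetical encoding: a rule is (head, frozenset(pos body), frozenset(neg body)).

def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def dependency_edges(program):
    """Edges (p, q, negative) of G(P): p heads a rule whose body contains q."""
    edges = set()
    for h, pos, neg in program:
        edges |= {(h, q, False) for q in pos} | {(h, q, True) for q in neg}
    return edges

def strongly_connected_components(vertices, edges):
    """Kosaraju's algorithm (recursive DFS, adequate for small graphs)."""
    succ, pred = defaultdict(set), defaultdict(set)
    for p, q, _ in edges:
        succ[p].add(q)
        pred[q].add(p)
    order, seen = [], set()

    def visit(v):                     # first pass: record finishing order
        seen.add(v)
        for w in succ[v]:
            if w not in seen:
                visit(w)
        order.append(v)

    for v in vertices:
        if v not in seen:
            visit(v)
    component = {}

    def assign(v, root):              # second pass on the reversed graph
        component[v] = root
        for w in pred[v]:
            if w not in component:
                assign(w, root)

    for v in reversed(order):
        if v not in component:
            assign(v, v)
    groups = defaultdict(set)
    for v, root in component.items():
        groups[root].add(v)
    return [frozenset(g) for g in groups.values()]

def unstratified_components(program):
    """Steps 1-3 of MSME: the s.c.c. C of G(P) whose program P_C is not stratified."""
    edges = dependency_edges(program)
    atoms = {a for h, pos, neg in program for a in (h, *pos, *neg)}
    return [C for C in strongly_connected_components(atoms, edges)
            if any(is_neg and p in C and q in C for p, q, is_neg in edges)]

# p <- not q.   q <- not p.   r <- p.   s <- not r.
P = [rule("p", neg=["q"]), rule("q", neg=["p"]),
     rule("r", pos=["p"]), rule("s", neg=["r"])]
assert unstratified_components(P) == [frozenset({"p", "q"})]
```

Here {s} is not reported: its negative edge leads to r, which lies in a different component, so P_{s} is stratified.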
The feedback width of the graph H(P') is called the weak feedback-width of the dependency graph of P. The following theorem follows from the fp-tractability of computing feedback vertex sets of size k for undirected graphs and from well-known modular computation methods for stable model semantics [11].
Theorem 6. The logic programming problems (1–5) listed above are all fp-tractable w.r.t. the weak feedback-width of the dependency graph of the logic program.
Note that the methods used in this section can be adapted to show fixed-parameter tractability results for extended versions of logic programming, such as disjunctive logic programming, and for other types of nonmonotonic reasoning. In the case of disjunctive logic programming, it is sufficient to extend the dependency graph to contain a labeled directed edge between every pair of atoms occurring together in a rule head.
A different perspective on the computation of stable models has recently been considered in [22], where the size of stable models is taken as the fixed parameter. It turns out that computing small stable models is fixed-parameter intractable, whereas computing large stable models is fixed-parameter tractable if the parameter is the number of rules in the program.
6 The Small Model Circumscription Problem
In this section we study the fixed-parameter complexity of a tractable parametric variant of circumscription, where the attention is restricted to models of small cardinality.
6.1 Definition of Small Model Circumscription
The Small Model Circumscription Problem (SMC) is defined as follows. Given a propositional theory T over a set of atoms A = P ∪ Z, and given a propositional formula ϕ over vocabulary A, decide whether ϕ is satisfied in a model M of T such that:
– M is of small size, i.e., at most k propositional atoms are true in M (written |M| ≤ k); and
– M is P;Z-minimal w.r.t. all other small models¹, i.e., for each model M' of T such that |M'| ≤ k, M' ∩ P ⊆ M ∩ P implies M ∩ P ⊆ M' ∩ P.
This problem appears to be a miniaturization of the classical problem of (brave) reasoning with minimal models. We believe that SMC is useful, since in many contexts one has large theories but is mainly interested in small models (e.g., in abductive diagnosis). Clearly, for each fixed k, SMC is tractable. In fact, it suffices to enumerate |A|^k candidate interpretations in an outer loop and, for each such interpretation M, check whether M ⊨ T, M ⊨ ϕ, and M is P;Z-minimal. The latter can be done by an inner loop enumerating all small interpretations and performing some easy checking tasks. It is also not hard to see that SMC is fp-intractable. In fact, the Hitting Set problem, which was shown to be W[2]-complete [9], can be fp-reduced to SMC and can actually be regarded as the restricted version of SMC where P = A, Z = ∅, and T consists of a CNF having only positive literals. In Section 6.2 we present the fp-tractable subclass of this version of SMC, where the maximum clause length in the theory is taken as an additional parameter. However, in Section 6.3 we show that, as soon as the set Z of floating variables is not empty, this problem becomes fp-intractable. Since brave reasoning under minimal models was shown to be Σ₂^P-complete in [10], and is thus one level above the complexity of classical reasoning, it would be interesting to determine the precise fixed-parameter complexity of the general version of SMC w.r.t. the parameter k. This problem too is tackled in Section 6.3.
¹ In this paper, whenever we speak about P;Z-minimality, we mean minimality as defined here.
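The naive algorithm for fixed k described above can be sketched in Python as follows. Representing T and ϕ as Python predicates over the set of true atoms is an assumption made for brevity, and the function names are hypothetical. Minimality is checked in the sense used in the proofs below: no small model of T is strictly smaller than M on the atoms of P. The running time is polynomial for each fixed k but grows like |A|^(2k), so this is not an fp-algorithm.

```python
from itertools import combinations

# Assumed representation: an interpretation is the frozenset of its true
# atoms; T and phi are Python predicates over interpretations.

def small_interpretations(atoms, k):
    """All interpretations with at most k true atoms (O(|A|^k) of them)."""
    for r in range(min(k, len(atoms)) + 1):
        for combo in combinations(sorted(atoms), r):
            yield frozenset(combo)

def smc(T, phi, P, Z, k):
    """Is phi true in some small model of T that is P;Z-minimal among small models?"""
    small_models = [M for M in small_interpretations(P | Z, k) if T(M)]
    for M in small_models:
        # '<' on frozensets is strict inclusion.
        if phi(M) and not any((N & P) < (M & P) for N in small_models):
            return True
    return False

# Hitting Set as SMC: T = (a or b) and (b or c), P = {a, b, c}, Z empty.
T = lambda M: bool(M & {"a", "b"}) and bool(M & {"b", "c"})
P, Z = frozenset("abc"), frozenset()
assert smc(T, lambda M: "a" in M, P, Z, 2)              # {a, c} is minimal
assert not smc(T, lambda M: {"a", "b"} <= M, P, Z, 2)   # {b} is smaller on P
```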
6.2 A Tractable Restriction of SMC
We restrict SMC by requiring that the theory T be a q-CNF with no negative literal occurring in it, and by minimizing over all atoms occurring in the theory. The problem Restricted Small Model Circumscription (RSMC) is thus defined as SMC except that T is required to be a purely positive q-CNF formula, the "floating" set Z is empty, and the parameters are the maximum size k of the models to be considered and the maximum number q of literals in the largest conjunct (=clause) of T.
Theorem 7. RSMC is fixed-parameter tractable.
Proof. (Sketch.) Since T is positive and Z = ∅, the minimal models of T to be considered are exactly the prime implicants of T having size ≤ k. By Theorem 3, computing these prime implicants for a q-CNF theory is fp-tractable w.r.t. the parameters k and q. Thus, the theorem easily follows. □
6.3 The Fixed-Parameter Complexity of SMC
We first show that the slight modification of the fp-tractable problem RSMC where Z ≠ ∅ is fp-intractable, and in fact W[SAT]-hard. The problem Positive Small Model Circumscription (PSMC) is defined as SMC except that T is required to be a purely positive q-CNF formula, and the parameters are the maximum size k of the models to be considered and the maximum clause length q.
Let us define the Boolean formula count_k(x), where x = (x_1, ..., x_n) is a list of variables:

$$A = \bigwedge_{1 \le i \le n} \; \bigwedge_{1 \le j \le \min\{i, k+1\}} \left( \left( \bigvee_{j-1 \le t \le i-1} \Big( q_t^{j-1} \wedge \bigwedge_{t+1 \le s \le i-1} \neg x_s \Big) \wedge x_i \right) \equiv q_i^j \right)$$

$$B = \bigwedge_{k+1 \le r \le n} \neg q_r^{k+1} \qquad\qquad C = \bigvee_{k \le r \le n} q_r^{k}$$

$$\mathrm{count}_k(x) = A \wedge B \wedge C$$

Intuitively, in any satisfying truth value assignment for count_k(x), the propositional variable q_i^j gets the value true iff x_i is the j-th true variable among x_1, ..., x_i. Note that the size of count_k(x) is O(kn²). The variables x_1, ..., x_n in the formula above are called the external variables of the formula, while all the other variables occurring in the formula are called private variables. Whenever a theory T contains a count subformula, we assume w.l.o.g. that the private variables of this subformula do not occur in T outside the subformula. In particular, if T contains two count subformulas, then their sets of private variables are disjoint.
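The intended reading of count_k(x) can be checked mechanically on small instances. The Python sketch below evaluates A ∧ B ∧ C for given values of x and of the private variables q_i^j, and enumerates all assignments to the private variables in order to test the first and third statements of Lemma 3 for n = 4 and k = 2. Two assumptions are ours, not the text's: k ≥ 1, and the boundary variable q_0^0 is read as true (with q_t^0 false for t ≥ 1), which makes q_i^1 mean "x_i is the first true variable". All names are hypothetical.

```python
from itertools import product

# Private variable q_i^j is stored under the key (i, j),
# for 1 <= i <= n and 1 <= j <= min(i, k+1).
# Assumption: q_0^0 is true and q_t^0 is false for t >= 1.

def q_keys(n, k):
    return [(i, j) for i in range(1, n + 1) for j in range(1, min(i, k + 1) + 1)]

def count_k_holds(x, q, k):
    """Evaluate A and B and C of count_k(x); x[i-1] holds the value of x_i."""
    n = len(x)

    def Q(t, j):
        return t == 0 if j == 0 else q[(t, j)]

    for i, j in q_keys(n, k):                              # formula A
        rhs = x[i - 1] and any(
            Q(t, j - 1) and not any(x[s - 1] for s in range(t + 1, i))
            for t in range(j - 1, i))
        if rhs != q[(i, j)]:
            return False
    if any(q[(r, k + 1)] for r in range(k + 1, n + 1)):    # formula B
        return False
    return any(q[(r, k)] for r in range(k, n + 1))         # formula C

def extensions(x, k):
    """All assignments to the private variables that satisfy count_k(x)."""
    keys = q_keys(len(x), k)
    result = []
    for bits in product([False, True], repeat=len(keys)):
        q = dict(zip(keys, bits))
        if count_k_holds(x, q, k):
            result.append(q)
    return result

# count_k(x) has an extension iff exactly k of the x_i are true; that
# extension is then unique and makes exactly k private variables true.
n, k = 4, 2
for x in product([False, True], repeat=n):
    exts = extensions(x, k)
    assert len(exts) == (1 if sum(x) == k else 0)
    assert all(sum(q.values()) == k for q in exts)
```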
Lemma 3. Let F be a formula and x a list of variables occurring in F. Then
– F ∧ count_k(x) is satisfiable if and only if there exists a truth value assignment σ for F assigning true to exactly k variables from x;
– every k-truth value assignment σ satisfying F can be extended in a unique way to an assignment σ' satisfying F ∧ count_k(x);
– every satisfying truth value assignment for F ∧ count_k(x) assigns true to exactly k private variables of count_k(x) and true to exactly k variables from x.
Theorem 8. PSMC is W[SAT]-hard. The problem remains hard even for 2-CNF theories.
Proof. (Sketch.) Let Φ be a Boolean formula over propositional variables {x_1, ..., x_n}. We fp-reduce the W[SAT]-complete problem of deciding whether there exists a k-truth value assignment satisfying Φ to an instance of PSMC where the maximum model size is 2k + 1 and the maximum clause length is 2. Let Φ' = Φ ∧ count_k(x_1, ..., x_n), and let y_1, ..., y_m be the private variables of the count_k subformula. Moreover, let T be the following positive 2-CNF theory:
(p ∨ x_1) ∧ · · · ∧ (p ∨ x_n) ∧ (p ∨ y_1) ∧ · · · ∧ (p ∨ y_m).
We take P = {p}, Z = {x_1, ..., x_n, y_1, ..., y_m}, and ϕ = Φ'. Note that a set M is a P;Z-minimal model of T having size ≤ 2k + 1 if and only if M = {p} ∪ S, where S is any subset of Z such that |S| ≤ 2k. By Lemma 3, every satisfying truth value assignment for Φ' must make true exactly k variables from {x_1, ..., x_n} and exactly k variables from the set of private variables of count_k. It follows that there exists a P;Z-minimal model M of T such that |M| ≤ 2k + 1 and M satisfies Φ' if and only if there exists a k-truth value assignment satisfying Φ. □
Let us now focus on the general SMC problem, where both arbitrary theories are considered and floating variables are permitted. It does not appear that SMC is contained in W[SAT].
On the other hand, it can be seen that SMC is contained in AW[SAT], but it does not seem to be hard (and thus complete) for this class. In fact, AW[SAT] is the miniaturization of PSPACE and not of Σ₂^P. No classes corresponding to the levels of the polynomial hierarchy have been defined so far in the theory of fixed-parameter intractability. Nonmonotonic reasoning problems, such as SMC, seem to require the definition of such classes. We next define the exact correspondent of Σ₂^P at the fixed-parameter level.
Definition of the class Σ₂W[SAT]. Σ₂W[SAT] is defined similarly to AW[SAT], but the quantifier prefix is restricted to Σ₂.
Parameterized QBF₂SAT.
Instance: A quantified Boolean formula ∃^{k1}x ∀^{k2}y E.
Parameter: k = ⟨k1, k2⟩.
Question: Is ∃^{k1}x ∀^{k2}y E valid? (Here, ∃^{k1}x denotes the choice of some k1-truth value assignment for the variables x, and ∀^{k2}y denotes all choices of k2-truth value assignments for the variables y.)
Definition 4. Σ₂W[SAT] is the set of all problems that fp-reduce to Parameterized QBF₂SAT.
Membership of SMC in Σ₂W[SAT]. Let the problem Parameterized QBF₂SAT≤ be the variant of Parameterized QBF₂SAT where the quantifiers ∃^{k1}x and ∀^{k2}y are replaced by quantifiers ∃^{≤k1}x and ∀^{≤k2}y with the following meaning. ∃^{≤k1}x α means that there exists a truth value assignment making at most k1 propositional variables from x true such that α is valid. Symmetrically, ∀^{≤k2}y α means that α is valid for every truth value assignment making at most k2 propositional variables from y true.
Lemma 4. Parameterized QBF₂SAT≤ is in Σ₂W[SAT].
Proof. (Sketch.) It suffices to show that Parameterized QBF₂SAT≤ is fp-reducible to Parameterized QBF₂SAT. Let

$$\Phi = \exists^{\le k_1} x_1 x_2 \ldots x_n \; \forall^{\le k_2} y_1 y_2 \ldots y_m \; E(x_1, \ldots, x_n, y_1, \ldots, y_m)$$

be an instance of Parameterized QBF₂SAT≤. It is easy to see that the following instance Φ' of Parameterized QBF₂SAT is equivalent to Φ:

$$\Phi' = \exists^{2k_1} x_1 \ldots x_n x'_1 \ldots x'_n \; \forall^{2k_2} y_1 \ldots y_m y'_1 \ldots y'_m \; E(x_1 \wedge x'_1, \ldots, x_n \wedge x'_n, y_1 \wedge y'_1, \ldots, y_m \wedge y'_m),$$

where x'_1, ..., x'_n, y'_1, ..., y'_m are new variables and E(x_1 ∧ x'_1, ..., x_n ∧ x'_n, y_1 ∧ y'_1, ..., y_m ∧ y'_m) is obtained from E by substituting x_i ∧ x'_i for x_i (1 ≤ i ≤ n) and y_j ∧ y'_j for y_j (1 ≤ j ≤ m). □
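The doubling construction of Lemma 4 can be tested by brute force on small instances. In this Python sketch, E is a random Boolean function given by a truth table, assignments are 0/1 tuples, and all names are hypothetical. The instance sizes are chosen with n ≥ 2k1 and m ≥ 2k2, so that every assignment with at most k1 (resp. k2) true variables can be padded to one with exactly 2k1 (resp. 2k2) true copies; smaller sizes are not exercised here.

```python
import random
from itertools import combinations

def exactly(n, k):
    """All 0/1 tuples of length n with exactly k ones (k-truth value assignments)."""
    for ones in combinations(range(n), k):
        yield tuple(int(i in ones) for i in range(n))

def at_most(n, k):
    for j in range(min(k, n) + 1):
        yield from exactly(n, j)

def qbf2(E, n, m, k1, k2, quantify):
    """Brute-force value of the prefix 'exists x, forall y' over the given ranges."""
    return any(all(E(x, y) for y in quantify(m, k2)) for x in quantify(n, k1))

def doubled(E, n, m):
    """E(x1 & x1', ..., y1 & y1', ...) over (x, x') and (y, y'), as in Lemma 4."""
    def E2(xx, yy):
        x = tuple(a & b for a, b in zip(xx[:n], xx[n:]))
        y = tuple(a & b for a, b in zip(yy[:m], yy[m:]))
        return E(x, y)
    return E2

def random_function(n, m, rng):
    """A random Boolean function of n + m variables, stored as a truth table."""
    table = rng.getrandbits(2 ** (n + m))
    return lambda x, y: bool((table >> int("".join(map(str, x + y)), 2)) & 1)

rng = random.Random(0)
n, m, k1, k2 = 2, 2, 1, 1            # chosen so that n >= 2*k1 and m >= 2*k2
for _ in range(500):
    E = random_function(n, m, rng)
    assert qbf2(E, n, m, k1, k2, at_most) == \
        qbf2(doubled(E, n, m), 2 * n, 2 * m, 2 * k1, 2 * k2, exactly)
```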
Theorem 9. SMC is in Σ₂W[SAT].
Proof. (Sketch.) By Lemma 4, it is sufficient to show that every SMC instance S can be fp-reduced to an equivalent instance Φ(S) of Parameterized QBF₂SAT≤. Let S = (A = P ∪ Z, T(P, Z), ϕ, k) be an SMC instance, where P = {p_1, ..., p_n} and Z = {z_1, ..., z_m}. Let P' = {p'_1, ..., p'_n} and Z' = {z'_1, ..., z'_m} be two sets of fresh variables. Φ(S) is defined as follows:

$$\exists^{\le k} p_1 \ldots p_n z_1 \ldots z_m \; \forall^{\le k} p'_1 \ldots p'_n z'_1 \ldots z'_m \; \Big[ T(P, Z) \wedge \varphi \wedge \Big( T(P', Z') \Rightarrow \Big( \bigwedge_{1 \le i \le n} (p_i \equiv p'_i) \vee \bigvee_{1 \le i \le n} (p'_i \wedge \neg p_i) \Big) \Big) \Big],$$

where T(P', Z') is obtained from T(P, Z) by substituting p'_i for p_i (1 ≤ i ≤ n) and z'_j for z_j (1 ≤ j ≤ m).
The first part of Φ(S) guesses a model M of T with at most k atoms among P ∪ Z which satisfies ϕ. The second part makes sure that M is P;Z-minimal by checking that each small model M' of T either coincides with M on the P variables, or has at least one P variable true whereas the same variable is false in M. Hence, T bravely entails ϕ under small model P;Z circumscription if and only if Φ(S) is valid. □
Σ₂W[SAT]-hardness of SMC.
Theorem 10. SMC is Σ₂W[SAT]-hard, and thus Σ₂W[SAT]-complete.
Proof. (Sketch.) We show that Parameterized QBF₂SAT is fp-reducible to SMC. Let Φ be the following instance of Parameterized QBF₂SAT:
∃^{k1} x_1 x_2 ... x_n ∀^{k2} y_1 y_2 ... y_m E(x_1, ..., x_n, y_1, ..., y_m).
We define a corresponding instance of SMC S(Φ) = (A = P ∪ Z, T, ϕ = w, k = 2k1 + 2k2 + 1), where w is a fresh variable, T = (E(x, y) ⇒ w) ∧ count_{k1}(x) ∧ count_{k2}(y), P = x ∪ {w}, and Z consists of all the other variables occurring in T, namely, the variables in y and the private variables of the two count subformulae. We prove that Φ is valid if and only if S(Φ) is a yes instance of SMC.
(Only if part.) Assume Φ is valid. Then there exists a k1-truth value assignment σ to the variables x such that, for every k2-truth value assignment to the variables y, the formula E is satisfied. Let M be an interpretation for T constructed as follows. M contains the k1 variables from x which are made true by σ and the first k2 variables of y; in addition, M contains w and the k1 + k2 private variables which make the two count subformulae true. This is possible by Lemma 3. It is easy to see that M is a model of T. We now show that M is a P;Z-minimal model of T. Assume that M' is a P;Z-smaller model. Due to the count_{k1}(x) subformula, M' must contain exactly k1 atoms from x, and therefore M and M' coincide w.r.t. the x atoms. It follows that w ∉ M'. However, M' also contains exactly k2 atoms from y (due to count_{k2}(y)), so by validity of Φ and the construction of M, M' ⊨ E holds, and therefore M' ⊨ w as well.
Contradiction.
(If part.) Assume there exists a P;Z-minimal model M of T such that M ⊨ w and |M| ≤ k. Note that, by Lemma 3, M contains exactly k1 true variables from x and exactly k2 true variables from y. Towards a contradiction, assume that Φ is not valid. Then for every k1-truth value assignment σ to the variables x, there exists a k2-truth value assignment σ' to the variables y such that σ ∪ σ' falsifies E. In particular, for the k1 variables from x which are true in M, it is possible to make true exactly k2 variables from y such that the formula E is not satisfied. Consider now the interpretation M' containing these k1 + k2 true variables plus the k1 + k2 private variables made true by the two count subformulae. M' is a model of T whose P variables coincide with those of M except for w, which belongs to M but not to M'. Therefore, M is not P;Z-minimal, a contradiction. Finally, note that the transformation from Φ to S(Φ) is an fp-reduction. Indeed, it is feasible in polynomial time and is just linear in k. □
Corollary 2. Parameterized QBF₂SAT≤ is Σ₂W[SAT]-complete.
Proof. (Sketch.) Membership follows from Lemma 4, which shows that this problem belongs to Σ₂W[SAT]. Hardness follows from Theorem 10, which shows that SMC is Σ₂W[SAT]-hard, together with the proof of Theorem 9, which fp-reduces SMC to Parameterized QBF₂SAT≤. □
Downey and Fellows [9] pointed out that completeness proofs for fixed-parameter intractability classes are generally more involved than classical intractability proofs. Note that this is also the case for the above proof, where we had to deal with subtle counting issues. A straightforward downscaling of the standard Σ₂^P-completeness proof for propositional circumscription appears not to be possible. In particular, observe that we have obtained our completeness result for a very general version of propositional minimal model reasoning, where there are variables to be minimized (P) and floating variables (Z). It is well known that minimal model reasoning remains Σ₂^P-complete even if all variables of a formula are minimized (i.e., if Z is empty). This result does not seem to carry over to the setting of fixed-parameter intractability. Clearly, this problem, being a restricted version of SMC, is in Σ₂W[SAT]. Moreover, it is easy to see that the problem is hard for W[2] and thus fixed-parameter intractable. However, we were not able to show that the problem is complete for any class in the range from W[2] to Σ₂W[SAT], and leave this issue as an open problem.
Open Problem. Determine the fixed-parameter complexity of SMC when all variables of the theory T are to be minimized.
References
1. K. Apt, H. Blair, and A. Walker. Towards a Theory of Declarative Knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pp. 89–148. Morgan Kaufmann, Washington DC, 1988.
2. W. Bibel. Constraint Satisfaction from a Deductive Viewpoint. Artificial Intelligence, 35:401–413, 1988.
3. H. L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25(6):1305–1317, 1996.
4. A. K. Chandra and P. M. Merlin. Optimal Implementation of Conjunctive Queries in Relational Databases. In ACM Symp. on Theory of Computing (STOC'77), pp. 77–90, 1977.
5. R. G. Downey and M. R. Fellows. Fixed Parameter Tractability and Completeness. Congressus Numerantium, 87:161–187, 1992.
6. R. G. Downey and M. R. Fellows. Fixed Parameter Intractability (Extended Abstract). In Proc. of Structure in Complexity Theory, IEEE, pp. 36–50, 1992.
7. R. G. Downey and M. R. Fellows. Fixed Parameter Tractability and Completeness I: Basic Results. SIAM J. Comput., 24:873–921, 1995.
8. R. G. Downey and M. R. Fellows. On the Parametric Complexity of Relational Database Queries and a Sharper Characterization of W[1]. In Combinatorics Complexity and Logics, Proceedings of DMTCS'96, pp. 164–213, Springer, 1996.
9. R. G. Downey and M. R. Fellows. Parameterized Complexity. Springer, New York, 1999.
10. T. Eiter and G. Gottlob. Propositional Circumscription and Extended Closed World Reasoning are Π₂^P-complete. Theoretical Computer Science, 114(2):231–245, 1993. Addendum 118:315.
11. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive Datalog. ACM Trans. on Database Syst., 22(3):364–418, September 1997.
12. S. Fortune, J. E. Hopcroft, and J. Wyllie. The Directed Subgraph Homeomorphism Problem. Theoretical Computer Science, 10(2):111–121, 1980.
13. M. R. Garey and D. S. Johnson. Computers and Intractability. A Guide to the Theory of NP-completeness. Freeman and Comp., NY, USA, 1979.
14. M. Gelfond and V. Lifschitz. The Stable Model Semantics for Logic Programming. In Logic Programming: Proc. Fifth Intl Conference and Symposium, pp. 1070–1080, Cambridge, Mass., 1988. MIT Press.
15. G. Gottlob, N. Leone, and F. Scarcello. The Complexity of Acyclic Conjunctive Queries. Technical Report DBAI-TR-98/17, available on the web as: http://www.dbai.tuwien.ac.at/staff/gottlob/acyclic.ps, or by email from the authors. An extended abstract concerning part of this work has been published in Proc. of the IEEE Symposium on Foundations of Computer Science (FOCS'98), pp. 706–715, Palo Alto, CA, 1998.
16. G. Gottlob, N. Leone, and F. Scarcello. A Comparison of Structural CSP Decomposition Methods. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 394–399, Stockholm, 1999.
17. P. Jeavons, D. Cohen, and M. Gyssens. Closure Properties of Constraints. JACM, 44(4), 1997.
18. Ph. G. Kolaitis and M. Y. Vardi. Conjunctive-Query Containment and Constraint Satisfaction. Proc. of Symp. on Principles of Database Systems (PODS'98), 1998.
19. C. H. Papadimitriou and M. Yannakakis. On the Complexity of Database Queries. In Proc. of Symp. on Principles of Database Systems (PODS'97), pp. 12–19, Tucson, Arizona, 1997.
20. N. Robertson and P. D. Seymour. Graph Minors II. Algorithmic Aspects of Tree-Width. J. Algorithms, 7:309–322, 1986.
21. N. Robertson and P. D. Seymour. Graph Minors XX. Wagner's Conjecture. To appear.
22. M. Truszczyński. Computing Large and Small Stable Models. In Proc. of the 16th International Conference on Logic Programming (ICLP'99), Las Cruces, New Mexico. To appear.
23. M. Vardi. Complexity of Relational Query Languages. In Proceedings 14th STOC, pages 137–146, San Francisco, 1982.
24. M. Yannakakis. Algorithms for Acyclic Database Schemes. In Proc. of Int. Conf. on Very Large Data Bases (VLDB'81), pp. 82–94, C. Zaniolo and C. Delobel Eds., Cannes, France, 1981.
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
Tomi Janhunen⋆
Helsinki University of Technology
Laboratory for Theoretical Computer Science
P.O.Box 5400, FIN-02015 HUT, Finland
[email protected]
Abstract. This paper reports on systematic research which aims to classify non-monotonic logics by their expressive power. The classification is based on translation functions that satisfy three important criteria: polynomiality, faithfulness and modularity (PFM for short). The basic method for classification is to prove that PFM translation functions exist (or do not exist) between certain logics. As a result, non-monotonic logics can be arranged to form a hierarchy. This paper gives an overview of the current expressive power hierarchy (EPH) and investigates semi-normal default logic as well as prerequisite-free and semi-normal default logic in order to locate their exact positions in the hierarchy.
1 Introduction
Non-monotonic reasoning (NMR) has a rich variety of formalizations such as McCarthy's circumscription [18], Moore's autoepistemic logic [19] and Reiter's default logic [21]. These non-monotonic logics were proposed about twenty years ago, and since then their interrelations have been extensively studied [3,5,6,7,9,11,12,22]. This line of research has concentrated on finding translation functions that transform a theory of one non-monotonic logic into a theory of another such that the sets of conclusions associated with the former theory are preserved (to a reasonable degree) in this transformation. A number of variants of non-monotonic logics have also been proposed. Let us just mention some of these approaches: parallel circumscription by Lifschitz [13], strong autoepistemic logic by Marek and Truszczyński [16] and syntactically restricted forms of default logic such as normal default logic and prerequisite-free default logic (see e.g. [4]). Naturally, the interconnections of these variants to their predecessors have also been analyzed (see e.g. [1,5,6,8,15,16,20,23]). The translation functions proposed in the literature provide means to measure the expressive power of the non-monotonic logics involved: a non-monotonic logic can capture expressions of another non-monotonic logic via a translation function. The tightness of this relationship depends on the requirements imposed on translation functions. Our recent experiences indicate that these requirements
⋆ The support from the Academy of Finland (project 43963) is gratefully acknowledged.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 19–33, 1999. © Springer-Verlag Berlin Heidelberg 1999
affect the results on expressiveness in very delicate ways [9]. We have adopted three requirements from earlier approaches, namely polynomiality, faithfulness and modularity (PFM), as the basis of our framework. In particular, the modularity requirement has turned out to be useful when one wants to differentiate non-monotonic logics by their expressive power [5,6,9]. The author has used PFM translation functions systematically for classifying non-monotonic logics on the basis of their expressive power. So far, our analysis [10] has covered eight non-monotonic logics, giving rise to a hierarchy (EPH) shown in Fig. 1. In this paper, we analyze the expressive powers of semi-normal default logic (SNDL) as well as prerequisite-free and semi-normal default logic (PSNDL) in order to locate the exact positions of these logics in EPH. We proceed as follows. In Section 2, we describe the requirements for PFM translation functions and give an overview of the current EPH. Five syntactic variants of default logic are introduced in Section 3. The expressive powers of semi-normal default logic and of prerequisite-free and semi-normal default logic are then analyzed in Sections 4 and 5, respectively. The paper ends with conclusions in Section 6.
2 PFM Translations and Expressive Power Hierarchy
In this section, we introduce the three basic requirements for translation functions. Let us first introduce some notation and terminology needed in order to formulate these requirements. We write L(A) to introduce a propositional language L based on a set of propositional atoms A. We let ⟨X, T⟩ stand for a non-monotonic theory where T ⊆ L is its propositional subtheory and X stands for any set(s) of syntactic elements which are specific to the non-monotonic logic L in question (such as a set of defaults D in Reiter's default logic). The sets of conclusions associated with a non-monotonic theory ⟨X, T⟩ are called extensions (or expansions) of ⟨X, T⟩, and they determine the semantics of ⟨X, T⟩. We consider only finite non-monotonic theories, and let ||⟨X, T⟩|| denote the length of ⟨X, T⟩, i.e., the number of symbol occurrences needed to represent ⟨X, T⟩. The three requirements for translation functions are formulated as follows.
Definition 1. A translation function Tr : L1 → L2 is
(i) polynomial iff for all ⟨X, T⟩ ∈ L1, the time required to compute Tr(⟨X, T⟩) is polynomial in ||⟨X, T⟩||;
(ii) faithful iff for all ⟨X, T⟩ ∈ L1, the propositionally consistent extensions of ⟨X, T⟩ ∈ L1 and Tr(⟨X, T⟩) ∈ L2 are in one-to-one correspondence and coincide up to the propositional language L of T; and
(iii) modular iff for all ⟨X, T⟩ ∈ L1, the translation Tr(⟨X, T⟩) = ⟨X', T' ∪ T⟩, where ⟨X', T'⟩ = Tr(⟨X, ∅⟩).
Polynomiality is a reasonable requirement from the computational point of view: translations should be computable in polynomial time and space. A faithful translation preserves the semantics of a non-monotonic theory ⟨X, T⟩, which is determined by its extensions. We require a one-to-one correspondence of propositionally consistent extensions, which supports both brave and cautious reasoning with extensions in a straightforward way. Moreover, the languages associated with ⟨X, T⟩ and Tr(⟨X, T⟩) may extend the propositional language L(A) of T,
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
but a faithful translation function is supposed to preserve the extensions of $\langle X, T\rangle$ up to $\mathcal{L}$. The modularity requirement demands that a translation function provides a fixed translation for $X$ which is independent of $T$. Consequently, there is no need to recompute $\mathrm{Tr}(\langle X, \emptyset\rangle)$ whenever $T$ is updated. A more detailed discussion and a comparison (e.g. with [5,17]) can be found in [9,10]. For the sake of brevity, we say that a translation function is PFM if it satisfies the three requirements of Definition 1. We also note that any composition of PFM translation functions is also PFM [9].

PFM translation functions provide us with a framework for analyzing the relative expressive power of non-monotonic logics: if there is a PFM translation function from one non-monotonic logic $\mathcal{L}_1$ to another non-monotonic logic $\mathcal{L}_2$, then $\mathcal{L}_2$ is considered to be at least as expressive as $\mathcal{L}_1$ (denoted by $\mathcal{L}_1 \longrightarrow \mathcal{L}_2$). If, in addition, there are no PFM translation functions in the opposite direction (denoted by $\mathcal{L}_2 \not\longrightarrow \mathcal{L}_1$), then we say that $\mathcal{L}_1$ is less expressive than $\mathcal{L}_2$ (denoted by $\mathcal{L}_1 \Longrightarrow \mathcal{L}_2$). If there are PFM translation functions in both directions, then $\mathcal{L}_1$ and $\mathcal{L}_2$ are equally expressive (denoted by $\mathcal{L}_1 \longleftrightarrow \mathcal{L}_2$). If $\mathcal{L}_1 \not\longrightarrow \mathcal{L}_2$ and $\mathcal{L}_2 \not\longrightarrow \mathcal{L}_1$ hold simultaneously, then $\mathcal{L}_1$ and $\mathcal{L}_2$ are incomparable with respect to $\longrightarrow$ (denoted by $\mathcal{L}_1 \not\longleftrightarrow \mathcal{L}_2$). Note that $\longrightarrow$ is a preorder on (non-monotonic) logics and $\Longrightarrow$ is a strict partial order among the equivalence classes induced by $\longrightarrow$.
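The closure of PFM translations under composition is taken from [9]. As a small check of its modularity part only, the following derivation (our own, assuming that $\mathrm{Tr}_1\colon \mathcal{L}_1 \to \mathcal{L}_2$ and $\mathrm{Tr}_2\colon \mathcal{L}_2 \to \mathcal{L}_3$ are modular) writes $\langle X_1, T_1\rangle = \mathrm{Tr}_1(\langle X, \emptyset\rangle)$ and $\langle X_2, T_2\rangle = \mathrm{Tr}_2(\langle X_1, \emptyset\rangle)$ and applies condition (iii) of Definition 1 twice:

```latex
\begin{align*}
(\mathrm{Tr}_2 \circ \mathrm{Tr}_1)(\langle X, T\rangle)
  &= \mathrm{Tr}_2(\langle X_1, T_1 \cup T\rangle)
  && \text{(modularity of } \mathrm{Tr}_1\text{)}\\
  &= \langle X_2, T_2 \cup T_1 \cup T\rangle
  && \text{(modularity of } \mathrm{Tr}_2\text{)}\\
(\mathrm{Tr}_2 \circ \mathrm{Tr}_1)(\langle X, \emptyset\rangle)
  &= \langle X_2, T_2 \cup T_1\rangle
  && \text{(the case } T = \emptyset\text{)}
\end{align*}
```

Hence $(\mathrm{Tr}_2 \circ \mathrm{Tr}_1)(\langle X, T\rangle) = \langle X', T' \cup T\rangle$ with $\langle X', T'\rangle = (\mathrm{Tr}_2 \circ \mathrm{Tr}_1)(\langle X, \emptyset\rangle)$, as condition (iii) requires. Polynomiality composes because a polynomial-time $\mathrm{Tr}_1$ produces output of polynomial length; faithfulness needs the separate argument given in [9].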
Fig. 1: Expressive Power Hierarchy of Non-monotonic Logics (EPH). [Figure not reproduced. Classes in the direction of increasing expressive power: CL; PNDL and CIRC; NDL (left) and AEL and PDL (right); DL, SAEL and PL.]

The author has analyzed the interrelations of eight non-monotonic logics in terms of PFM translation functions [10]. Using the relations $\longrightarrow$ and $\Longrightarrow$, it is possible to form the hierarchy of non-monotonic logics shown in Figure 1. Classical propositional logic (CL) is included in the hierarchy in order to complete our view. The most expressive class contains default logic (DL) [21], strong autoepistemic logic (SAEL) as well as priority logic (PL) [24]. Below this class, there are two less expressive but mutually incomparable classes. The one on the left contains normal default logic (NDL), while the one on the right contains autoepistemic logic (AEL) [19] and prerequisite-free default logic (PDL), which are of equal expressive power. Below these two classes, there is a less expressive class containing parallel circumscription (CIRC) [13] as well as prerequisite-free and normal default logic (PNDL), which are equally expressive. The classes of EPH indicate some astonishing relationships in light of earlier expressiveness results (cf. [1,5]): AEL and PDL are of equal expressive power and less expressive than DL and SAEL. NDL, PDL and PNDL are all syntactically restricted forms of default logic, and the classes of EPH make clear that syntactic restrictions tend to decrease the expressive power of the corresponding variants of DL. This motivates the goal of this paper, which is to investigate the expressiveness of DL under the semi-normality restriction.
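The way $\longrightarrow$, $\Longrightarrow$ and $\longleftrightarrow$ organize the logics of Figure 1 into classes can be illustrated with a small Python sketch. It treats the arrows of the figure as the known PFM translations, closes them under composition (PFM translations compose [9]) and reads off equivalence classes and comparisons. The edge list is our own reading of Figure 1 and of the text above (in particular, CL is placed directly below the class of PNDL and CIRC). The sketch also assumes that every relationship not implied by these edges fails; that is what the non-translatability results behind EPH establish, not something the code derives.

```python
from itertools import product

# Our reading of Figure 1: a pair (L1, L2) records a PFM translation L1 --> L2.
EPH_EDGES = {
    ("CL", "PNDL"), ("PNDL", "CIRC"), ("CIRC", "PNDL"),
    ("PNDL", "NDL"), ("PNDL", "PDL"), ("PDL", "AEL"), ("AEL", "PDL"),
    ("NDL", "DL"), ("PDL", "DL"),
    ("DL", "SAEL"), ("SAEL", "PL"), ("PL", "DL"),
}


def preorder(edges):
    """Reflexive-transitive closure of -->, computed with Warshall's algorithm."""
    logics = sorted({name for edge in edges for name in edge})
    reach = {(x, x) for x in logics} | set(edges)
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(logics, repeat=3):
        if (i, k) in reach and (k, j) in reach:
            reach.add((i, j))
    return logics, reach


def compare(l1, l2, reach):
    """Relation between two logics, assuming absent arrows mean 'no translation'."""
    forward, backward = (l1, l2) in reach, (l2, l1) in reach
    if forward and backward:
        return "<->"           # equally expressive
    if forward:
        return "=>"            # l1 is less expressive than l2
    if backward:
        return "<="            # l2 is less expressive than l1
    return "incomparable"


def classes(logics, reach):
    """Equivalence classes of the preorder (mutual translatability)."""
    return {frozenset(y for y in logics if (x, y) in reach and (y, x) in reach)
            for x in logics}


logics, reach = preorder(EPH_EDGES)
print(compare("NDL", "PDL", reach))    # incomparable
print(compare("AEL", "SAEL", reach))   # =>
print(compare("CIRC", "PNDL", reach))  # <->
```

With these edges, the five classes of Figure 1 come out as the equivalence classes of the preorder.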
T. Janhunen

3 Syntactic Variants of Default Logic
In order to define syntactic variants of default logic, we begin with a short introduction to Reiter's default logic [21]. A default theory is a pair $\langle D, T\rangle$ where $T \subseteq \mathcal{L}$ and $D$ is a set of default rules (or defaults) of the form $\frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma}$ such that $n \geq 0$ and the prerequisite $\alpha$, the justifications $\beta_1,\ldots,\beta_n$ and the consequent $\gamma$ of the rule are sentences of $\mathcal{L}$. We let $\mathrm{Jf}(D)$ ($\mathrm{Cq}(D)$) stand for the set of justifications (consequents) that appear in a set of defaults $D$. Marek and Truszczyński [17] reduce a set of defaults $D$ with respect to a propositional theory $E \subseteq \mathcal{L}$ to a set of inference rules $D_E$ which contains an inference rule $\frac{\alpha}{\gamma}$ whenever there is a default rule $\frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$ such that $E \cup \{\beta_i\}$ is consistent for all $0 < i \leq n$.

The closure of a theory $T \subseteq \mathcal{L}$ under a set of inference rules $R$ is denoted by $\mathrm{Cn}^R(T)$. This theory is the least theory $E \subseteq \mathcal{L}$ which (C1) contains $T$, (C2) is closed under propositional consequence and (C3) is closed under the rules of $R$, i.e. whenever $\frac{\alpha}{\gamma} \in R$ and $\alpha \in E$, then also $\gamma \in E$ [17, Theorem 3.7]. It is possible to capture the closure $\mathrm{Cn}^R(T)$ by introducing a notion of a proof from $T$ as follows. A sequence $\frac{\alpha_1}{\gamma_1},\ldots,\frac{\alpha_n}{\gamma_n}$ of rules of $R$ is an $R$-proof of $\varphi \in \mathcal{L}$ from $T \subseteq \mathcal{L}$ iff $T \cup \{\gamma_1,\ldots,\gamma_n\} \models \varphi$ and $T \cup \{\gamma_1,\ldots,\gamma_{i-1}\} \models \alpha_i$ holds for each $0 < i \leq n$ (cf. [17] for a slightly different system where the rules of $R$ are incorporated into a propositional proof system). The following definition of extensions for a default theory $\langle D, T\rangle$ is equivalent to Reiter's original definition [21].
Definition 2 (Marek and Truszczyński [17]). A theory $E \subseteq \mathcal{L}$ is an extension of a default theory $\langle D, T\rangle$ in $\mathcal{L}$ if and only if $E = \mathrm{Cn}^{D_E}(T)$.

Normality and semi-normality are examples of syntactic restrictions proposed for defaults (see e.g. [4,14]). Normal defaults have the form $\frac{\alpha:\gamma}{\gamma}$ while semi-normal defaults are of the form $\frac{\alpha:\gamma\wedge\beta}{\gamma}$. A default theory $\langle D, T\rangle$ is called normal if $D$ contains only normal defaults. The fragment of DL corresponding to normal default theories under Reiter's extensions is called normal DL (NDL). Semi-normal default theories and semi-normal DL (SNDL) are defined analogously. A default of the form $\frac{\top:\beta_1,\ldots,\beta_n}{\gamma}$ is called prerequisite-free, and the shorthand $\frac{:\beta_1,\ldots,\beta_n}{\gamma}$ is often used for such a default. A default theory $\langle D, T\rangle$ is called prerequisite-free if every default of $D$ is prerequisite-free. Prerequisite-free DL (PDL) is the fragment of DL corresponding to prerequisite-free default theories under Reiter's extensions. It is also possible to combine prerequisite-freedom with the normality and semi-normality conditions. This gives rise to prerequisite-free and normal default logic (PNDL) and prerequisite-free and semi-normal default logic (PSNDL). In these logics, defaults are of the forms $\frac{:\beta}{\beta}$ and $\frac{:\gamma\wedge\beta}{\gamma}$, respectively. The expressive powers of PDL, NDL and PNDL have already been analyzed [10]. This paper extends the analysis to cover SNDL and PSNDL.

Lemma 1 (Marek and Truszczyński [17]). If $E \subseteq \mathcal{L}$ is an extension of a default theory $\langle D, T\rangle$ in $\mathcal{L}$, then $E = \mathrm{Cn}(T \cup \Gamma)$ where $\Gamma \subseteq \{\gamma \mid \frac{\alpha}{\gamma} \in D_E\}$.
4 Classifying SNDL in EPH
The goal of this section is to locate the exact position of SNDL in EPH. We start by explaining how the current hierarchy provides lower and upper bounds for the expressiveness of SNDL. Firstly, a semi-normal default is a special case of an ordinary default, which implies that DL is at least as expressive as SNDL. Secondly, a semi-normal default $\frac{\alpha:\gamma\wedge\beta}{\gamma}$ becomes equivalent to a normal one when $\beta$ equals $\top$ (it is customary to omit $\beta$ in this case). Thus semi-normal defaults are able to express anything that normal defaults do, indicating that SNDL is at least as expressive as NDL. This is how we end up with a preliminary setting NDL $\longrightarrow$ SNDL $\longrightarrow$ DL, but the strictness of these relationships remains open. There is a significant difference between NDL and SNDL: a normal default theory always has at least one extension [17, p. 107], but this is not guaranteed for a semi-normal default theory. This is demonstrated by our next example.

Example 1. Consider a semi-normal set of defaults $C = \{\frac{f:p\wedge\neg q}{p}, \frac{f:q\wedge\neg r}{q}, \frac{f:r\wedge\neg p}{r}\}$ and theories $\emptyset$ and $T = \{f\}$. The default theory $\langle C, \emptyset\rangle$ has exactly one extension, $E = \mathrm{Cn}(\emptyset)$, in which none of the given defaults is applicable, since the common prerequisite $f$ of the rules cannot be derived. In $\langle C, T\rangle$, however, this critical prerequisite is given directly in $T$, and the default theory $\langle C, T\rangle$ has no extensions. This is because the consequents and justifications of the three rules are circularly interdependent such that an extension cannot be established: for instance, an extension containing $p$ would block the third rule, so $r$ would be absent, which enables the second rule and forces $q$, which in turn blocks the first rule, the only way to derive $p$. A variant of $C$ without prerequisites appears in the literature (see e.g. [2,17]).
The set of defaults $C$ provides us with a means to rule out extensions of any default theory $\langle D, T\rangle$, provided that $f$, $p$, $q$ and $r$ are new atoms with respect to $\langle D, T\rangle$. For instance, if we want to exclude the extensions of $\langle D, T\rangle$ that contain $a$ but not $b$, we extend $D$ to the set of defaults $D' = D \cup C \cup \{\frac{a:\neg b}{f}\}$. Given an extension $E$ of $\langle D, T\rangle$ that contains $a$ but not $b$, it can be shown that $E$ is not an extension of $\langle D', T\rangle$. In fact, the possibility that extensions do not exist is a distinctive feature of SNDL, and it suffices to distinguish SNDL from NDL as far as expressiveness is concerned.

Theorem 1. SNDL $\not\longrightarrow$ NDL.

Proof. Let us assume that there is a PFM translation function $\mathrm{Tr}$ that transforms semi-normal default theories into normal ones. Let us recall the set of defaults $C$ given in Example 1 and define a set of semi-normal defaults $D = C \cup \{\frac{:f\wedge\neg a}{f}\}$. Let $\langle D', T'\rangle$ be the translation $\mathrm{Tr}(\langle D, \emptyset\rangle)$. Note that $\langle D, \emptyset\rangle$ does not have extensions (since $a$ can never be derived, the added default always derives $f$, after which the defaults of $C$ behave as in Example 1), but the normality of $D'$ guarantees the existence of an extension $E'$ for $\langle D', T'\rangle$ [17, p. 106]. As $\mathrm{Tr}$ is faithful, $E'$ must be inconsistent. As shown by Marek and Truszczyński [17, p. 106], a normal default theory $\langle D', T'\rangle$ has an inconsistent extension $E'$ if and only if $T'$ is inconsistent. Thus $T'$ must be inconsistent. Then consider a theory $T = \{a\}$. The default theory $\langle D, T\rangle$ has a unique extension $E = \mathrm{Cn}(\{a\})$ (the added default is blocked, so $f$ is never derived), which is also propositionally consistent. The translation $\mathrm{Tr}(\langle D, T\rangle)$ is $\langle D', T' \cup T\rangle$ by the modularity of $\mathrm{Tr}$. However, the theory $T' \cup T$ is also inconsistent and thus $\langle D', T' \cup T\rangle$ has only an inconsistent extension $E' = \mathcal{L}'$. But this contradicts the faithfulness of the translation function $\mathrm{Tr}$. □
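The claims of Example 1, and the two facts about $D = C \cup \{\frac{:f\wedge\neg a}{f}\}$ used in the proof of Theorem 1, can be checked mechanically for such small theories. Definition 2 and Lemma 1 suggest a naive procedure: guess $\Gamma$ among the consequents of $D$, put $E = \mathrm{Cn}(T \cup \Gamma)$, compute the reduct $D_E$, and test whether $E = \mathrm{Cn}^{D_E}(T)$. The Python sketch below follows that recipe. It is an illustration under our own assumptions, not an algorithm from the paper: formulas are written as Python boolean expressions over atom names (`True` standing for $\top$), a theory is identified with its set of models over a fixed finite list of atoms (an empty model set is the inconsistent extension), and the enumeration is exponential, so it only suits toy examples.

```python
from functools import lru_cache
from itertools import combinations, product


def models(formula, atoms):
    """Indices of the valuations over `atoms` that satisfy `formula`, a Python
    boolean expression such as "f and p and not q" (True plays the role of ⊤)."""
    code = compile(formula, "<formula>", "eval")
    valuations = product([False, True], repeat=len(atoms))
    return frozenset(i for i, bits in enumerate(valuations)
                     if eval(code, {"__builtins__": {}}, dict(zip(atoms, bits))))


def extensions(defaults, theory, atoms):
    """All extensions of <defaults, theory> in the sense of Definition 2, each
    returned as its model set (frozenset() is the inconsistent extension).
    A default alpha:beta_1,...,beta_n/gamma is a triple (alpha, [betas], gamma)."""
    @lru_cache(maxsize=None)
    def mod(phi):
        return models(phi, atoms)

    def cn(formulas):
        # Models of Cn(formulas): intersect the individual model sets.
        result = mod("True")
        for phi in formulas:
            result = result & mod(phi)
        return result

    consequents = sorted({gamma for _, _, gamma in defaults})
    found = []
    for k in range(len(consequents) + 1):
        # Lemma 1: every extension has the form Cn(T u Gamma) for such a Gamma.
        for guess in combinations(consequents, k):
            e = cn(list(theory) + list(guess))
            # Reduct D_E: keep alpha/gamma iff each beta_i is consistent with E.
            reduct = [(alpha, gamma) for alpha, betas, gamma in defaults
                      if all(e & mod(beta) for beta in betas)]
            # Cn^{D_E}(T): fire rules whose prerequisite is entailed, to a fixpoint.
            closure, changed = cn(theory), True
            while changed:
                changed = False
                for alpha, gamma in reduct:
                    if closure <= mod(alpha) and not closure <= mod(gamma):
                        closure = closure & mod(gamma)
                        changed = True
            if closure == e and e not in found:   # Definition 2: E = Cn^{D_E}(T)
                found.append(e)
    return found


ATOMS = ["f", "p", "q", "r"]
C = [("f", ["p and not q"], "p"),   # Example 1
     ("f", ["q and not r"], "q"),
     ("f", ["r and not p"], "r")]
print(len(extensions(C, [], ATOMS)))       # 1: E = Cn(∅)
print(len(extensions(C, ["f"], ATOMS)))    # 0: no extensions

D = C + [("True", ["f and not a"], "f")]   # the theory from the proof of Theorem 1
print(len(extensions(D, [], ATOMS + ["a"])))     # 0
print(len(extensions(D, ["a"], ATOMS + ["a"])))  # 1: E = Cn({a})
```

Representing theories by model sets keeps consistency and entailment checks to set operations; the price is that the atom list must be fixed in advance.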
Theorem 1 indicates that NDL $\Longrightarrow$ SNDL, i.e. NDL is less expressive than SNDL. This suggests that we should next analyze whether the relationship SNDL $\longrightarrow$ DL is strict or not. Thus we have to consider the possibilities of obtaining a PFM translation from standard DL into SNDL. The central problem is how a default rule $\frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma}$ is represented in terms of semi-normal defaults. The main questions that arise in this respect are (i) how the consistency of the justifications $\beta_1,\ldots,\beta_n$ is tested and (ii) on what conditions the consequent $\gamma$ is supposed to be inferable. To answer these questions, we propose a translation function as follows. The translation function $\mathrm{Tr}_{\mathrm{SN}}$ introduces a new atom $c_\beta$ (meaning that $\beta$ is consistent) for each justification $\beta \in \mathrm{Jf}(D)$ and a new atom $b_d$ (meaning that $d$ is blocked) for each default $d \in D$.

Definition 3. Let $D$ be any set of defaults and $C$ the set of defaults of Example 1. For an individual default $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$, the translation $\mathrm{Tr}_{\mathrm{SN}}(d) = \{\frac{:b_d\wedge\neg c_{\beta_1}}{b_d},\ldots,\frac{:b_d\wedge\neg c_{\beta_n}}{b_d}\} \cup \{\frac{\alpha:\gamma\wedge\neg b_d}{\gamma}, \frac{\alpha\wedge\neg\gamma:f\wedge\neg b_d}{f}\}$. For the default theory $\langle D, T\rangle$, the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle) = \langle C \cup \{\frac{:c_\beta\wedge\beta}{c_\beta} \mid \beta \in \mathrm{Jf}(D)\} \cup \bigcup_{d \in D} \mathrm{Tr}_{\mathrm{SN}}(d), T\rangle$.
The defaults introduced by $\mathrm{Tr}_{\mathrm{SN}}$ have the following purposes with regard to justifications $\beta \in \mathrm{Jf}(D)$ and defaults $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$.
(i) Semi-normal defaults of the form $\frac{:c_\beta\wedge\beta}{c_\beta}$ test the consistency of the justifications that appear in $D$. For each consistent justification $\beta$, the atom $c_\beta$ is concluded by the rule.
(ii) For each default $d$, the semi-normal defaults $\frac{:b_d\wedge\neg c_{\beta_1}}{b_d},\ldots,\frac{:b_d\wedge\neg c_{\beta_n}}{b_d}$ test whether each of the justifications $\beta_1,\ldots,\beta_n$ is consistent. If not, then $b_d$ is derived by one of the rules to indicate that $d$ is blocked (as one of its justifications is not consistent).
(iii) The semi-normal default $\frac{\alpha:\gamma\wedge\neg b_d}{\gamma}$ is a rewrite of the original default $d$. The consistency of the justifications $\beta_1,\ldots,\beta_n$ is verified by checking that $\neg b_d$ is consistent, i.e. that $b_d$ cannot be derived. Note that this amounts to testing that $d$ is not blocked.
(iv) The consequent $\gamma$ of $d$ appears as an additional justification of the preceding default in order to establish semi-normality. This leads to a complication that has to be resolved by introducing the rules of $C$ as well as a semi-normal default of the form $\frac{\alpha\wedge\neg\gamma:f\wedge\neg b_d}{f}$. Such a default detects cases where $d$ is applicable ($\alpha$ can be derived and each of $\beta_1,\ldots,\beta_n$ is consistent with $E$) but $\gamma$ is inconsistent with $E$ ($\neg\gamma$ can also be derived). Because $\frac{\alpha:\gamma\wedge\neg b_d}{\gamma}$ tests the consistency of $\gamma$, it can derive neither $\gamma$ nor a propositional contradiction¹ in this case. This is why $\frac{\alpha\wedge\neg\gamma:f\wedge\neg b_d}{f}$ and $C$ are needed to ensure that no extension can result in this case.

This is how we utilize the nonexistence of extensions as a substitute for the propositional inconsistency of extensions. These are equivalent under the notion of faithfulness introduced in Section 2. Let us demonstrate the consistency checking mechanism in practice as follows.

Example 2. Consider a set of defaults $D = \{\frac{\top:}{a}\}$ and theories $T_1 = \emptyset$ and $T_2 = \{\neg a\}$ based on the language $\mathcal{L}(\{a\})$. The default theory $\langle D, T_1\rangle$ has a unique extension $E_1 = \mathrm{Cn}(\{a\})$, while the default theory $\langle D, T_2\rangle$ has only one extension $E_2 = \mathcal{L}$, which is propositionally inconsistent.
¹ This is possible with justification-free defaults, as Example 2 demonstrates.
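The translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T_i\rangle) = \langle C \cup D', T_i\rangle$, where $D'$ is the set of defaults $\mathrm{Tr}_{\mathrm{SN}}(\frac{\top:}{a}) = \{\frac{:a\wedge\neg b_d}{a}, \frac{\neg a:f\wedge\neg b_d}{f}\}$. The default theory $\langle C \cup D', T_1\rangle$ has a unique extension $E'_1 = \mathrm{Cn}(\{a\})$, so that $E_1 = E'_1 \cap \mathcal{L}$. On the other hand, the default theory $\langle D', T_2\rangle$ has an anomalous extension $E'_2 = \mathrm{Cn}(\{\neg a, f\})$, but the defaults in $C$ ensure that $\langle C \cup D', T_2\rangle$ has no extensions (recall Example 1 and the discussion after it). Thus we have a one-to-one correspondence of the propositionally consistent extensions of $\langle D, T_i\rangle$ and the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T_i\rangle) = \langle C \cup D', T_i\rangle$ such that the extensions coincide up to the language $\mathcal{L}$ of $T_i$.

Example 2 suggests that $\mathrm{Tr}_{\mathrm{SN}}$ is faithful. Our next objective is to show that this is indeed the case. We need a subsidiary result on the effects of adding a set of new literals $L$ to a propositional theory $T$ (note that $M \subseteq \mathcal{A}$ is a model of $T$ and $M' \subseteq \mathcal{A}'$ is a model of $L$ $\Leftrightarrow$ $M \cup M' \subseteq \mathcal{A} \cup \mathcal{A}'$ is a model of $T \cup L$).

Lemma 2. Let $T \subseteq \mathcal{L}(\mathcal{A})$ and let $L$ be a set of literals based on a set of atoms $\mathcal{A}'$ such that $\mathcal{A} \cap \mathcal{A}' = \emptyset$. Thus $T \cup L$ is a propositional theory in $\mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$. Let $\varphi \in \mathcal{L}$ be any sentence and $l$ any literal based on $\mathcal{A}'$.
– If $L$ is consistent, then (i) $T \cup L \models \varphi \Leftrightarrow T \models \varphi$, (ii) $\varphi$ is consistent with $T \cup L$ $\Leftrightarrow$ $\varphi$ is consistent with $T$, and (iii) $T$ is consistent $\Leftrightarrow$ $T \cup L$ is consistent.
– If $T$ is consistent, then (i) $T \cup L \models l \Leftrightarrow L \models l$, (ii) $l$ is consistent with $T \cup L$ $\Leftrightarrow$ $l$ is consistent with $L$, and (iii) $T \cup L$ is consistent $\Leftrightarrow$ $L$ is consistent.
– If $L$ is consistent, then $L \models l \Leftrightarrow l \in L$.
– $T \cup L$ is consistent $\Leftrightarrow$ $T$ is consistent and $L$ is consistent.

To make precise the relationship between the propositionally consistent extensions of $\langle D, T\rangle$ and those of the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$, we introduce mappings $\mathrm{Ext}_{\mathrm{SN}}$ and $\mathrm{Ext}$ (for propositionally consistent and closed theories) that are later shown to establish a one-to-one correspondence between these classes of extensions.

Definition 4. Let $\langle D, T\rangle$ be a default theory in $\mathcal{L}(\mathcal{A})$ and $\mathcal{A}' = \{f, p, q, r\} \cup \{c_\beta \mid \beta \in \mathrm{Jf}(D)\} \cup \{b_d \mid d \in D\}$.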
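Let $\langle D', T\rangle$ be the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$, which has the language $\mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$. For every propositionally closed theory $E \subseteq \mathcal{L}$, let $\mathrm{Ext}_{\mathrm{SN}}(E) = \mathrm{Cn}(E \cup A) \subseteq \mathcal{L}'$, where $A \subseteq \mathcal{A}'$ is a set of atoms containing (i) the atom $c_\beta$ for each justification $\beta \in \mathrm{Jf}(D)$ that is consistent with $E$ and (ii) the atom $b_d$ for each default $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$ having a justification $\beta_i$ which is not consistent with $E$. For every propositionally closed theory $E' \subseteq \mathcal{L}'$, let $\mathrm{Ext}(E') = E' \cap \mathcal{L}$.

Let us then precompute the reduction $D'_{E'}$ as far as possible, given propositionally consistent and closed theories $E \subseteq \mathcal{L}$ and $E' = \mathrm{Ext}_{\mathrm{SN}}(E)$.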
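Lemma 3. Assume the definitions given in Definition 4. If $E \subseteq \mathcal{L}(\mathcal{A})$ is a propositionally consistent and closed theory and $A \subseteq \mathcal{A}'$ is a set of new atoms, then $D'$ and $E' = \mathrm{Cn}(E \cup A) \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ satisfy the following for all justifications $\beta \in \mathrm{Jf}(D)$ and for all defaults $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$:
(R1) $\frac{\top}{c_\beta} \in D'_{E'}$ $\Leftrightarrow$ $\beta$ is consistent with $E$,
(R2) $\frac{\top}{b_d} \in D'_{E'}$ $\Leftrightarrow$ $c_{\beta_i} \notin A$ for some justification $\beta_i$,
(R3) $\frac{\alpha}{\gamma} \in D'_{E'}$ $\Leftrightarrow$ ($\gamma$ is consistent with $E$ and $b_d \notin A$),
(R4) $\frac{\alpha\wedge\neg\gamma}{f} \in D'_{E'}$ $\Leftrightarrow$ $b_d \notin A$,
(R5) $\frac{f}{p} \in D'_{E'}$ $\Leftrightarrow$ $q \notin A$, $\frac{f}{q} \in D'_{E'}$ $\Leftrightarrow$ $r \notin A$, and $\frac{f}{r} \in D'_{E'}$ $\Leftrightarrow$ $p \notin A$.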
Proof sketch. Use the definition of $D'_{E'}$ and Lemma 2 repeatedly. □
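Lemma 4. Let $\langle D', T\rangle$ be the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$ as given in Definition 3. If $E' \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ is a propositionally consistent extension of $\langle D', T\rangle$, then (i) $E = E' \cap \mathcal{L} = \mathrm{Cn}(T \cup \Gamma)$ is propositionally closed and consistent, and (ii) $E' = \mathrm{Cn}(T \cup \Gamma \cup A) = \mathrm{Cn}(E \cup A)$, where $\Gamma \subseteq \{\gamma \in \mathcal{L} \mid \frac{\alpha}{\gamma} \in D'_{E'}\}$ and $A = \{a \in \mathcal{A}' \mid \frac{\top}{a} \in D'_{E'}\}$. Moreover, $A \cap \{f, p, q, r\} = \emptyset$.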
Proof sketch. Let $E' = \mathrm{Cn}^{D'_{E'}}(T)$ be a propositionally consistent extension of $\langle D', T\rangle$, implying that $E = E' \cap \mathcal{L}$ is propositionally closed and consistent. Then Lemma 1 implies that $E' = \mathrm{Cn}(T \cup \Gamma')$ where $\Gamma' \subseteq \{\gamma \mid \frac{\alpha}{\gamma} \in D'_{E'}\}$. The possible members of $D'_{E'}$ are listed in Lemma 3. Thus we can partition $\Gamma'$ into two disjoint sets of consequents $\Gamma = \Gamma' \cap \mathcal{L}$ and $A = \Gamma' \cap \mathcal{A}'$. Note that $\Gamma \subseteq \{\gamma \in \mathcal{L} \mid \frac{\alpha}{\gamma} \in D'_{E'}\}$ and $A \subseteq \{a \in \mathcal{A}' \mid \frac{\alpha}{a} \in D'_{E'}\}$. Then $E' = \mathrm{Cn}(T \cup \Gamma \cup A)$ and $E = E' \cap \mathcal{L}$ imply $E = \mathrm{Cn}(T \cup \Gamma)$ by Lemma 2. Thus $E' = \mathrm{Cn}(E \cup A)$ also holds by closure properties. Moreover, if we assume that $f \in A$, we can establish that $p \in A \Leftrightarrow \frac{f}{p} \in D'_{E'}$ (and similarly for $q$ and $r$ by symmetry). Together with R5, we obtain $p \in A \Leftrightarrow \frac{f}{p} \in D'_{E'} \Leftrightarrow q \notin A \Leftrightarrow \frac{f}{q} \notin D'_{E'} \Leftrightarrow r \in A \Leftrightarrow \frac{f}{r} \in D'_{E'} \Leftrightarrow p \notin A$, a contradiction. Hence $f \notin A$, and $f \notin E'$ follows by Lemma 2. By Lemma 3, the only rule of $D'_{E'}$ having $p$ as a consequent is $\frac{f}{p}$ (if present by R5). Since $E' = \mathrm{Cn}^{D'_{E'}}(T)$ is consistent and $f \notin E'$, we know that $p \notin E'$ and $p \notin A$ by Lemma 2. Thus $q \notin A$ and $r \notin A$ for symmetry reasons, and $A \cap \{f, p, q, r\} = \emptyset$. Lemma 3 implies that $A = \{a \in \mathcal{A}' \mid \frac{\top}{a} \in D'_{E'}\}$, as the rules of $D'_{E'}$ that have $f$, $p$, $q$ or $r$ as a consequent are not applicable. □
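Before the general faithfulness proof, Definition 3 can be run on Example 2. The self-contained Python sketch below builds $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$ and counts extensions by brute force, following Definition 2 and Lemma 1 (guess $\Gamma$, compute the reduct, compare with the closure). All representation choices are our own assumptions: formulas are Python boolean expressions over atom names with `True` for $\top$, the generated names `c_i` (one per justification) and `b_j` (one per default) stand for $c_\beta$ and $b_d$, these names and $f, p, q, r$ are assumed not to occur in $\langle D, T\rangle$, and the enumeration is exponential.

```python
from functools import lru_cache
from itertools import combinations, product


def models(formula, atoms):
    """Indices of the valuations over `atoms` satisfying a Python boolean expression."""
    code = compile(formula, "<formula>", "eval")
    return frozenset(i for i, bits in enumerate(product([False, True], repeat=len(atoms)))
                     if eval(code, {"__builtins__": {}}, dict(zip(atoms, bits))))


def extensions(defaults, theory, atoms):
    """Model sets of the extensions of <defaults, theory> (Definition 2, Lemma 1)."""
    @lru_cache(maxsize=None)
    def mod(phi):
        return models(phi, atoms)

    def cn(formulas):
        result = mod("True")
        for phi in formulas:
            result = result & mod(phi)
        return result

    consequents = sorted({gamma for _, _, gamma in defaults})
    found = []
    for k in range(len(consequents) + 1):
        for guess in combinations(consequents, k):
            e = cn(list(theory) + list(guess))
            reduct = [(alpha, gamma) for alpha, betas, gamma in defaults
                      if all(e & mod(beta) for beta in betas)]
            closure, changed = cn(theory), True
            while changed:
                changed = False
                for alpha, gamma in reduct:
                    if closure <= mod(alpha) and not closure <= mod(gamma):
                        closure, changed = closure & mod(gamma), True
            if closure == e and e not in found:
                found.append(e)
    return found


C = [("f", ["p and not q"], "p"), ("f", ["q and not r"], "q"),
     ("f", ["r and not p"], "r")]                        # Example 1


def tr_sn(defaults, theory):
    """Sketch of Tr_SN (Definition 3). Returns (D', T, new atoms); the names
    c_i (one per justification) and b_j (one per default) are our own choice."""
    justs = sorted({beta for _, betas, _ in defaults for beta in betas})
    c = {beta: f"c_{i}" for i, beta in enumerate(justs)}
    out = list(C)
    # :c_beta & beta / c_beta records that beta is consistent.
    out += [("True", [f"{c[beta]} and ({beta})"], c[beta]) for beta in justs]
    for j, (alpha, betas, gamma) in enumerate(defaults):
        b = f"b_{j}"
        # :b_d & ~c_beta_i / b_d marks d as blocked.
        out += [("True", [f"{b} and not {c[beta]}"], b) for beta in betas]
        # alpha : gamma & ~b_d / gamma is the semi-normal rewrite of d.
        out.append((alpha, [f"({gamma}) and not {b}"], gamma))
        # alpha & ~gamma : f & ~b_d / f triggers C when gamma is inconsistent.
        out.append((f"({alpha}) and not ({gamma})", [f"f and not {b}"], "f"))
    new_atoms = ["f", "p", "q", "r"] + list(c.values()) + [f"b_{j}" for j in range(len(defaults))]
    return out, list(theory), new_atoms


# Example 2: D = {⊤:/a}; by modularity the translated defaults do not depend on T.
D = [("True", [], "a")]
D_sn, _, new = tr_sn(D, [])
atoms = ["a"] + new
print(len(extensions(D_sn, [], atoms)))        # 1, and it entails a
print(extensions(D_sn, ["not a"], atoms))      # []
```

Because $\mathrm{Tr}_{\mathrm{SN}}$ is modular, the same translated defaults serve both $T_1 = \emptyset$ and $T_2 = \{\neg a\}$, and the output matches the correspondence described in Example 2.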
Propositions 1 and 2 establish that $\mathrm{Ext}_{\mathrm{SN}}$ and $\mathrm{Ext}$ are mappings between the propositionally consistent extensions of a default theory $\langle D, T\rangle$ and the propositionally consistent extensions of the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$.

Proposition 1. If a theory $E \subseteq \mathcal{L}$ is a propositionally consistent extension of a default theory $\langle D, T\rangle$ in $\mathcal{L}(\mathcal{A})$, then $\mathrm{Ext}_{\mathrm{SN}}(E) \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ given in Definition 4 is a propositionally consistent extension of $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$.

Proof sketch. Assume the definitions given in Definition 4. Let $E = \mathrm{Cn}^{D_E}(T)$ be a propositionally consistent extension of $\langle D, T\rangle$. It follows by Lemma 2 that $E' = \mathrm{Ext}_{\mathrm{SN}}(E) = \mathrm{Cn}(E \cup A)$ is also propositionally consistent. The conditions R1–R5 in Lemma 3 are also satisfied. The proof of $E' = \mathrm{Cn}^{D'_{E'}}(T)$ follows.
(⊆) Let us first establish $A \subseteq \mathrm{Cn}^{D'_{E'}}(T)$. (i) If $c_\beta \in A$ for a justification $\beta \in \mathrm{Jf}(D)$, it follows that $\beta$ is consistent with $E$ by the definition of $A$. Thus $\frac{\top}{c_\beta} \in D'_{E'}$ by R1 and $c_\beta \in \mathrm{Cn}^{D'_{E'}}(T)$ follows. (ii) If $b_d \in A$ for a default $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$, then some justification $\beta_i$ of $d$ is not consistent with $E$ and $c_{\beta_i} \notin A$ by the definition of $A$. Thus $\frac{\top}{b_d} \in D'_{E'}$ by R2 and $b_d \in \mathrm{Cn}^{D'_{E'}}(T)$ is the case. It follows by (i) and (ii) that $A \subseteq \mathrm{Cn}^{D'_{E'}}(T)$. It remains to establish $E = \mathrm{Cn}^{D_E}(T) \subseteq \mathrm{Cn}^{D'_{E'}}(T)$. It can be proved by induction on the lengths of $D_E$-proofs that if $\varphi \in \mathcal{L}$ is $D_E$-provable from $T$ in $k$ steps, then $\varphi \in \mathrm{Cn}^{D'_{E'}}(T)$. Thus $E \cup A \subseteq \mathrm{Cn}^{D'_{E'}}(T)$, implying that also $E' = \mathrm{Cn}(E \cup A) \subseteq \mathrm{Cn}^{D'_{E'}}(T)$.
(⊇) It suffices to show that $E'$ satisfies the closure properties C1–C3 of $\mathrm{Cn}^{D'_{E'}}(T)$. It is clear that $T \subseteq E'$, as $T \subseteq E \subseteq E'$, and that $E'$ is propositionally closed. Moreover, it can be shown using R1–R5 that $E' = \mathrm{Cn}(E \cup A)$ is closed under the rules of $D'_{E'}$. This requires one to check all the rules $\frac{\alpha'}{\gamma'}$ of $D'_{E'}$ of the forms $\frac{\top}{c_\beta}$, $\frac{\top}{b_d}$, $\frac{\alpha}{\gamma}$, $\frac{\alpha\wedge\neg\gamma}{f}$, $\frac{f}{p}$, $\frac{f}{q}$ and $\frac{f}{r}$. If we assume that $\frac{\alpha'}{\gamma'} \in D'_{E'}$ (the conditions are given by R1–R5) and $\alpha' \in E'$, we can establish that $\gamma' \in E'$ as well. □
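Proposition 2. Let $\langle D, T\rangle$ be a default theory in $\mathcal{L}(\mathcal{A})$ and let $\langle D', T\rangle$ be the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$ as given in Definition 3. If a theory $E' \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ is a propositionally consistent extension of $\langle D', T\rangle$, then the theory $E = \mathrm{Ext}(E') = E' \cap \mathcal{L}$ is a propositionally consistent extension of $\langle D, T\rangle$.

Proof sketch. Let $E' \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ be a propositionally consistent extension of $\langle D', T\rangle$ and $E = E' \cap \mathcal{L}$. It follows by Lemma 4 that $E$ is propositionally closed as well as consistent and $E' = \mathrm{Cn}(E \cup A)$ for the set of atoms $A = \{a \in \mathcal{A}' \mid \frac{\top}{a} \in D'_{E'}\}$, which satisfies $A \cap \{f, p, q, r\} = \emptyset$. The proof of $E = \mathrm{Cn}^{D_E}(T)$ follows.
(⊆) Using the conditions R1–R5, it can be shown for all $\varphi \in \mathcal{L}$ that $\varphi \in \mathrm{Cn}^{D'_{E'}}(T)$ implies $\varphi \in \mathrm{Cn}^{D_E}(T)$ by induction on the lengths of $D'_{E'}$-proofs.
(⊇) Let us establish that $E = E' \cap \mathcal{L}$ has the closure properties C1–C3 of $\mathrm{Cn}^{D_E}(T)$. (C1) Since $T \subseteq \mathcal{L}$ and $T \subseteq E'$, it holds that $T \subseteq E = E' \cap \mathcal{L}$. (C2) As noted already, $E$ is propositionally closed. (C3) Consider any $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$ such that $\frac{\alpha}{\gamma} \in D_E$ and $\alpha \in E$. The former implies that each justification $\beta_i$ is consistent with $E$. Thus $\frac{\top}{c_{\beta_i}} \in D'_{E'}$ by R1 and $c_{\beta_i} \in A$ for each $\beta_i$. It follows by R2 that $\frac{\top}{b_d} \notin D'_{E'}$, so that $b_d \notin A$ by the definition of $A$. Thus $\frac{\alpha\wedge\neg\gamma}{f} \in D'_{E'}$ by R4. Now assuming that $\gamma$ is not consistent with $E$ implies that $\neg\gamma \in E$, $\alpha\wedge\neg\gamma \in E \subseteq E'$ and $f \in E'$, a contradiction. Hence $\gamma$ is consistent with $E$ and $\frac{\alpha}{\gamma} \in D'_{E'}$ by R3. Then $\alpha \in E$ implies $\alpha \in E'$ as well as $\gamma \in E' = \mathrm{Cn}^{D'_{E'}}(T)$. Since $\gamma \in \mathcal{L}$, it follows that $\gamma \in E$. Thus $E$ is closed under the rules of $D_E$. □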
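Using the mappings $\mathrm{Ext}_{\mathrm{SN}}$ and $\mathrm{Ext}$, we can establish the desired one-to-one correspondence for propositionally consistent extensions.

Proposition 3. The propositionally consistent extensions of a default theory $\langle D, T\rangle$ and the translation $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$ are in one-to-one correspondence.

Proof sketch. Let $E_1$ and $E_2$ be two propositionally consistent extensions of $\langle D, T\rangle$ such that $\mathrm{Ext}_{\mathrm{SN}}(E_1) = \mathrm{Ext}_{\mathrm{SN}}(E_2)$. It follows by Definition 4 and Lemma 2 that $\mathrm{Cn}(E_1 \cup A_1) = \mathrm{Cn}(E_2 \cup A_2)$, where $A_1 \subseteq \mathcal{A}'$, $A_2 \subseteq \mathcal{A}'$, $E_1 = \mathrm{Cn}(E_1 \cup A_1) \cap \mathcal{L}$ and $E_2 = \mathrm{Cn}(E_2 \cup A_2) \cap \mathcal{L}$. Thus $E_1 = E_2$ and $\mathrm{Ext}_{\mathrm{SN}}$ is injective. Let $E'_1$ and $E'_2$ be two propositionally consistent extensions of $\mathrm{Tr}_{\mathrm{SN}}(\langle D, T\rangle)$ such that $\mathrm{Ext}(E'_1) = \mathrm{Ext}(E'_2)$, i.e. $E'_1 \cap \mathcal{L} = E'_2 \cap \mathcal{L} = E$. Let $i \in \{1, 2\}$. It holds by Lemma 4 that $E'_i = \mathrm{Cn}(E \cup A_i)$ where $A_i = \{a \in \mathcal{A}' \mid \frac{\top}{a} \in D'_{E'_i}\}$. Thus $\frac{\top}{c_\beta} \in D'_{E'_1} \Leftrightarrow \frac{\top}{c_\beta} \in D'_{E'_2}$ and $c_\beta \in A_1 \Leftrightarrow c_\beta \in A_2$ by R1 and the definitions of $A_1$ and $A_2$. Consequently, also $b_d \in A_1 \Leftrightarrow b_d \in A_2$ by R2 and the preceding equivalence. Recall also that $A_1 \cap \{f, p, q, r\} = A_2 \cap \{f, p, q, r\} = \emptyset$ by Lemma 4. Thus $A_1 = A_2$ as well as $E'_1 = E'_2$.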
The mappings $\mathrm{Ext}_{\mathrm{SN}}$ and $\mathrm{Ext}$ are inverses of each other, as $\mathrm{Ext}(\mathrm{Ext}_{\mathrm{SN}}(E)) = \mathrm{Ext}_{\mathrm{SN}}(E) \cap \mathcal{L} = \mathrm{Cn}(E \cup A) \cap \mathcal{L} = E$ by Lemma 2. □

Let us now state the main result of this paper: DL and SNDL are of equal expressive power according to the measure provided by PFM translations.

Theorem 2. DL $\longleftrightarrow$ SNDL.

Proof. ($\longrightarrow$) $\mathrm{Tr}_{\mathrm{SN}}$ is obviously polynomial and modular. The one-to-one correspondence of propositionally consistent extensions is established in Propositions 1–3. Proposition 2 implies that these extensions coincide up to $\mathcal{L}$. Therefore $\mathrm{Tr}_{\mathrm{SN}}$ is also faithful. ($\longleftarrow$) The translation function $\mathrm{Tr}_{\mathrm{I}}(\langle D, T\rangle) = \langle D, T\rangle$ is PFM. □

Marek and Truszczyński [17, Theorem 5.19] propose a translation function $\mathrm{Tr}_{\mathrm{MT}}$ that transforms a default theory $\langle D, T\rangle$ into a weak semi-normal default theory $\langle D', T\rangle$, where $D'$ contains semi-normal defaults or justification-free defaults (of the form $\frac{\alpha:}{\gamma}$) that correspond to monotonic inference rules. The function $\mathrm{Tr}_{\mathrm{MT}}$ translates a default $d = \frac{\alpha:\beta_1,\ldots,\beta_n}{\gamma} \in D$ into the following defaults: $\frac{\alpha:c^d_1\wedge\beta_1}{c^d_1},\ldots,\frac{\alpha:c^d_n\wedge\beta_n}{c^d_n}$ and $\frac{c^d_1\wedge\ldots\wedge c^d_n:}{\gamma}$. The default for checking the consistency of a justification $\beta_i$ is semi-normal and almost like ours, except that $\alpha$ is used as a prerequisite. The last rule (which controls the derivation of the consequent $\gamma$ of the original rule) is not semi-normal. Marek and Truszczyński establish that $\mathrm{Tr}_{\mathrm{MT}}$ is PFM, so that DL $\longleftrightarrow$ WSNDL (weak semi-normal DL). However, this result does not yet establish that DL and SNDL are of equal expressive power. This is because weak semi-normal theories have a richer syntax and are therefore at least as expressive as semi-normal default theories (i.e. SNDL $\longrightarrow$ WSNDL). In order to establish SNDL $\longleftrightarrow$ WSNDL, the key problem is to express justification-free defaults in terms of semi-normal ones. The translational technique behind $\mathrm{Tr}_{\mathrm{SN}}$ provides a solution: a justification-free default $\frac{\alpha:}{\gamma}$ can be expressed using the semi-normal defaults $\frac{\alpha:\gamma}{\gamma}$ and $\frac{\alpha\wedge\neg\gamma:f}{f}$ together with the set of defaults $C$ of Example 1.

From the historical perspective, it is also worth mentioning earlier work [14] by Łukaszewicz, who considers the possibilities of translating a default $\frac{\alpha:\beta}{\gamma}$ into a semi-normal default $\frac{\alpha:\gamma\wedge\beta}{\gamma}$ as well as into a normal default $\frac{\alpha:\gamma\wedge\beta}{\gamma\wedge\beta}$. This is because he argues that normal defaults are the only defaults that one needs in practice. The first step yields a default that resembles the default $\frac{\alpha:\gamma\wedge\neg b_d}{\gamma}$ introduced by $\mathrm{Tr}_{\mathrm{SN}}$, but Łukaszewicz does not provide a consistency checking mechanism (as $\mathrm{Tr}_{\mathrm{SN}}$ does). Thus the set of defaults addressed in Example 2 cannot be faithfully captured in his approach. On the other hand, Theorem 1 and EPH indicate that the second translation considered by Łukaszewicz cannot be faithful.
5 Classifying PSNDL in EPH
As already explained in Section 3, there is a further way to constrain defaults, namely by denying prerequisites. This is how we end up with defaults of the form $\frac{:\gamma\wedge\beta}{\gamma}$, which are special cases of both semi-normal and prerequisite-free defaults.
This suggests that PSNDL $\longrightarrow$ SNDL and PSNDL $\longrightarrow$ PDL. However, the author [10] has already established that PDL $\Longrightarrow$ DL, which implies PDL $\Longrightarrow$ SNDL by Theorem 2. Moreover, SNDL $\longrightarrow$ PSNDL would imply SNDL $\longrightarrow$ PDL by the relationship PSNDL $\longrightarrow$ PDL and the fact that compositions of PFM translation functions are PFM. But this contradicts PDL $\Longrightarrow$ SNDL. Thus SNDL $\not\longrightarrow$ PSNDL is the case and PDL gives an upper bound for the expressiveness of PSNDL. To examine whether PDL provides a tight bound, we have to consider the possibilities of systematically translating prerequisite-free defaults of the form $\frac{:\beta_1,\ldots,\beta_n}{\gamma}$ into prerequisite-free and semi-normal ones. The function $\mathrm{Tr}_{\mathrm{SN}}$ provides a natural starting point for our considerations, as it produces semi-normal defaults. The set of defaults $C$ involved in $\mathrm{Tr}_{\mathrm{SN}}$ has prerequisites, and we have to make some rearrangements in order to translate $C$ into a set of prerequisite-free and semi-normal defaults (denoted by $C'$ below).

Example 3. Consider a prerequisite-free and semi-normal set of defaults $C' = \{\frac{:p\wedge\neg q}{p}, \frac{:q\wedge\neg r}{q}, \frac{:r\wedge\neg p}{r}, \frac{:p\wedge\neg f}{p}, \frac{:q\wedge\neg f}{q}, \frac{:r\wedge\neg f}{r}\}$ and theories $\emptyset$ and $T = \{f\}$. The default theory $\langle C', \emptyset\rangle$ has exactly one extension, $E = \mathrm{Cn}(\{p, q, r\})$, in which the last three of the given defaults are applicable and the justifications of the first three rules are inconsistent with $E$. In $\langle C', T\rangle$, however, the atom $f$ is given directly in $T$, which prevents the applicability of the last three defaults, so that $p$, $q$ and $r$ cannot be derived by them. Then the first three defaults are again circularly interdependent (in analogy to the defaults of $C$ when $f$ can be derived) such that no extension results.

The translation function $\mathrm{Tr}_{\mathrm{SN}}$ also produces defaults of the forms $\frac{\alpha:\gamma\wedge\neg b_d}{\gamma}$ and $\frac{\alpha\wedge\neg\gamma:f\wedge\neg b_d}{f}$ that are not prerequisite-free. Whenever $\alpha = \top$, these defaults reduce to defaults of the forms $\frac{:\gamma\wedge\neg b_d}{\gamma}$ and $\frac{\neg\gamma:f\wedge\neg b_d}{f}$. The last category still has a prerequisite, and we have to express the consistency check of $\gamma$ in some other way. For these reasons, the translation function $\mathrm{Tr}_{\mathrm{PSN}}$ introduces a new atom $c_\gamma$ (meaning that $\gamma$ is consistent) for the consequents $\gamma \in \mathrm{Cq}(D)$ as well.

Definition 5. Let $D$ be any set of prerequisite-free defaults and $C'$ the set of defaults of Example 3. For an individual default $d = \frac{:\beta_1,\ldots,\beta_n}{\gamma} \in D$, the translation $\mathrm{Tr}_{\mathrm{PSN}}(d) = \{\frac{:b_d\wedge\neg c_{\beta_1}}{b_d},\ldots,\frac{:b_d\wedge\neg c_{\beta_n}}{b_d}\} \cup \{\frac{:\gamma\wedge\neg b_d}{\gamma}, \frac{:f\wedge\neg b_d\wedge\neg c_\gamma}{f}\}$, and for $\langle D, T\rangle$, $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T\rangle) = \langle C' \cup \{\frac{:c_\varphi\wedge\varphi}{c_\varphi} \mid \varphi \in \mathrm{Jf}(D) \cup \mathrm{Cq}(D)\} \cup \bigcup_{d \in D} \mathrm{Tr}_{\mathrm{PSN}}(d), T\rangle$.
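Example 3 and Definition 5 can be replayed in the same brute-force fashion. The self-contained Python sketch below checks the two claims of Example 3 and applies $\mathrm{Tr}_{\mathrm{PSN}}$ to the prerequisite-free default $\frac{\top:}{a}$ of Example 2. The representation is our own assumption: formulas are Python boolean expressions with `True` for $\top$, a prerequisite-free default is a triple whose prerequisite is `"True"`, the generated names `c_i` and `b_j` stand for $c_\varphi$ and $b_d$, these names and $f, p, q, r$ are assumed fresh, and the extension check (Definition 2 with the guess of Lemma 1) is exponential.

```python
from functools import lru_cache
from itertools import combinations, product


def models(formula, atoms):
    """Indices of the valuations over `atoms` satisfying a Python boolean expression."""
    code = compile(formula, "<formula>", "eval")
    return frozenset(i for i, bits in enumerate(product([False, True], repeat=len(atoms)))
                     if eval(code, {"__builtins__": {}}, dict(zip(atoms, bits))))


def extensions(defaults, theory, atoms):
    """Model sets of the extensions of <defaults, theory> (Definition 2, Lemma 1)."""
    @lru_cache(maxsize=None)
    def mod(phi):
        return models(phi, atoms)

    def cn(formulas):
        result = mod("True")
        for phi in formulas:
            result = result & mod(phi)
        return result

    consequents = sorted({gamma for _, _, gamma in defaults})
    found = []
    for k in range(len(consequents) + 1):
        for guess in combinations(consequents, k):
            e = cn(list(theory) + list(guess))
            reduct = [(alpha, gamma) for alpha, betas, gamma in defaults
                      if all(e & mod(beta) for beta in betas)]
            closure, changed = cn(theory), True
            while changed:
                changed = False
                for alpha, gamma in reduct:
                    if closure <= mod(alpha) and not closure <= mod(gamma):
                        closure, changed = closure & mod(gamma), True
            if closure == e and e not in found:
                found.append(e)
    return found


# Example 3: C' = {:p&~q/p, :q&~r/q, :r&~p/r, :p&~f/p, :q&~f/q, :r&~f/r}.
C_PRIME = [("True", [f"{x} and not {y}"], x)
           for x, y in [("p", "q"), ("q", "r"), ("r", "p"), ("p", "f"), ("q", "f"), ("r", "f")]]


def tr_psn(defaults, theory):
    """Sketch of Tr_PSN (Definition 5) for prerequisite-free defaults given as
    ("True", [beta_1, ..., beta_n], gamma). The names c_i and b_j are ours."""
    phis = sorted({beta for _, betas, _ in defaults for beta in betas}
                  | {gamma for _, _, gamma in defaults})
    c = {phi: f"c_{i}" for i, phi in enumerate(phis)}
    out = list(C_PRIME)
    # :c_phi & phi / c_phi for every justification and consequent phi.
    out += [("True", [f"{c[phi]} and ({phi})"], c[phi]) for phi in phis]
    for j, (_, betas, gamma) in enumerate(defaults):
        b = f"b_{j}"
        out += [("True", [f"{b} and not {c[beta]}"], b) for beta in betas]
        out.append(("True", [f"({gamma}) and not {b}"], gamma))
        # :f & ~b_d & ~c_gamma / f replaces the prerequisite ~gamma used by Tr_SN.
        out.append(("True", [f"f and not {b} and not {c[gamma]}"], "f"))
    new_atoms = ["f", "p", "q", "r"] + list(c.values()) + [f"b_{j}" for j in range(len(defaults))]
    return out, list(theory), new_atoms


ATOMS = ["f", "p", "q", "r"]
print(len(extensions(C_PRIME, [], ATOMS)))     # 1: E = Cn({p, q, r})
print(extensions(C_PRIME, ["f"], ATOMS))       # []

# The default of Example 2, {⊤:/a}, is prerequisite-free as well.
D_psn, _, new = tr_psn([("True", [], "a")], [])
atoms = ["a"] + new
print(len(extensions(D_psn, [], atoms)))       # 1
print(extensions(D_psn, ["not a"], atoms))     # []
```

For $T = \emptyset$ the single extension of the translation is $\mathrm{Cn}(\{a, c_a, p, q, r\})$ (with $c_a$ named `c_0` in the code), which is $\mathrm{Ext}_{\mathrm{PSN}}(\mathrm{Cn}(\{a\}))$ and agrees with the shape predicted by Lemma 7 ($f \notin A$, $\{p, q, r\} \subseteq A$).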
In this translation, the semi-normal default $\frac{\neg\gamma:f\wedge\neg b_d}{f}$ is expressed using a default $\frac{:f\wedge\neg b_d\wedge\neg c_\gamma}{f}$, which is prerequisite-free and semi-normal. If the refined justification $\neg b_d\wedge\neg c_\gamma$ is consistent, then the justifications of the translated default $d = \frac{:\beta_1,\ldots,\beta_n}{\gamma}$ are all consistent, but $\gamma$ is not consistent. This is exactly the case when $\frac{:\gamma\wedge\neg b_d}{\gamma}$ cannot be applied due to the semi-normality of the default (although it should be applied to derive a propositional contradiction). Thus it is natural to derive $f$ by the former rule in this case in order to prevent an extension where $d$ is not properly applied. Let us then define a mapping (a revision of $\mathrm{Ext}_{\mathrm{SN}}$).
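Definition 6. Let $\langle D, T\rangle$ be a prerequisite-free default theory in $\mathcal{L}(\mathcal{A})$ and $\mathcal{A}' = \{f, p, q, r\} \cup \{c_\varphi \mid \varphi \in \mathrm{Jf}(D) \cup \mathrm{Cq}(D)\} \cup \{b_d \mid d \in D\}$. Let $\langle D', T\rangle$ be the translation $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T\rangle)$ in $\mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$. For every propositionally closed theory $E \subseteq \mathcal{L}$, let $\mathrm{Ext}_{\mathrm{PSN}}(E) = \mathrm{Cn}(E \cup A) \subseteq \mathcal{L}'$, where $A \subseteq \mathcal{A}'$ is a set of atoms containing (i) the atom $c_\varphi$ for each justification or consequent $\varphi \in \mathrm{Jf}(D) \cup \mathrm{Cq}(D)$ that is consistent with $E$, (ii) the atom $b_d$ for each default $d = \frac{:\beta_1,\ldots,\beta_n}{\gamma} \in D$ having a justification $\beta_i$ which is not consistent with $E$, and (iii) the atoms $p$, $q$ and $r$.

The following three lemmas correspond to Lemmas 3, 1 and 4, respectively.

Lemma 5. If $E \subseteq \mathcal{L}(\mathcal{A})$ is propositionally consistent and closed and $A \subseteq \mathcal{A}'$, then $E' = \mathrm{Cn}(E \cup A) \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ satisfies the following for all $\varphi \in \mathrm{Jf}(D) \cup \mathrm{Cq}(D)$ and for all $d = \frac{:\beta_1,\ldots,\beta_n}{\gamma} \in D$:
(PR1) $\frac{\top}{c_\varphi} \in D'_{E'}$ $\Leftrightarrow$ $\varphi$ is consistent with $E$,
(PR2) $\frac{\top}{b_d} \in D'_{E'}$ $\Leftrightarrow$ $c_{\beta_i} \notin A$ for some justification $\beta_i$,
(PR3) $\frac{\top}{\gamma} \in D'_{E'}$ $\Leftrightarrow$ ($\gamma$ is consistent with $E$ and $b_d \notin A$),
(PR4) $\frac{\top}{f} \in D'_{E'}$ $\Leftrightarrow$ ($b_d \notin A$ and $c_\gamma \notin A$),
(PR5) $\frac{\top}{p} \in D'_{E'}$ $\Leftrightarrow$ $f \notin A$, $\frac{\top}{q} \in D'_{E'}$ $\Leftrightarrow$ $f \notin A$, $\frac{\top}{r} \in D'_{E'}$ $\Leftrightarrow$ $f \notin A$,
(PR6) $\frac{\top}{p} \in D'_{E'}$ $\Leftrightarrow$ $q \notin A$, $\frac{\top}{q} \in D'_{E'}$ $\Leftrightarrow$ $r \notin A$, and $\frac{\top}{r} \in D'_{E'}$ $\Leftrightarrow$ $p \notin A$.
(Here PR5 refers to the rules obtained from the last three defaults of $C'$ and PR6 to those obtained from the first three.)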
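Lemma 6. If $E \subseteq \mathcal{L}$ is an extension of a prerequisite-free default theory $\langle D, T\rangle$ in $\mathcal{L}$, then $E = \mathrm{Cn}(T \cup \Gamma)$ where $\Gamma = \{\gamma \mid \frac{\top}{\gamma} \in D_E\}$.

Lemma 7. Let $\langle D', T\rangle$ be the translation $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T\rangle)$ as given in Definition 5. If $E' \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ is a propositionally consistent extension of $\langle D', T\rangle$, then (i) $E = E' \cap \mathcal{L} = \mathrm{Cn}(T \cup \Gamma)$ is propositionally closed and consistent, and (ii) $E' = \mathrm{Cn}(T \cup \Gamma \cup A) = \mathrm{Cn}(E \cup A)$, where $\Gamma = \{\gamma \in \mathcal{L} \mid \frac{\top}{\gamma} \in D'_{E'}\}$ and $A = \{a \in \mathcal{A}' \mid \frac{\top}{a} \in D'_{E'}\}$. Moreover, $f \notin A$, but $\{p, q, r\} \subseteq A$.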
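Proof sketch. Let $E' = \mathrm{Cn}^{D'_{E'}}(T)$ be a propositionally consistent extension of $\langle D', T\rangle$, so that $E = E' \cap \mathcal{L}$ is propositionally closed and consistent. Lemma 6 implies that $E' = \mathrm{Cn}(T \cup \Gamma')$ where $\Gamma' = \{\gamma \mid \frac{\top}{\gamma} \in D'_{E'}\}$, as $D'$ is prerequisite-free. Let us partition $\Gamma'$ into $\Gamma = \Gamma' \cap \mathcal{L}$ and $A = \Gamma' \cap \mathcal{A}'$ by the structure of $D'$. Then $E' = \mathrm{Cn}(T \cup \Gamma \cup A)$ and $E = E' \cap \mathcal{L}$ imply $E = \mathrm{Cn}(T \cup \Gamma)$ and $E' = \mathrm{Cn}(E \cup A)$ by Lemma 2. If $f \in A$, PR5 and PR6 imply that $p \in A \Leftrightarrow \frac{\top}{p} \in D'_{E'} \Leftrightarrow q \notin A \Leftrightarrow \frac{\top}{q} \notin D'_{E'} \Leftrightarrow r \in A \Leftrightarrow \frac{\top}{r} \in D'_{E'} \Leftrightarrow p \notin A$, a contradiction. Hence $f \notin A$ and $\{\frac{\top}{p}, \frac{\top}{q}, \frac{\top}{r}\} \subseteq D'_{E'}$ (by PR5), implying that $\{p, q, r\} \subseteq A$. □

We are ready to establish that $\mathrm{Ext}_{\mathrm{PSN}}$ and $\mathrm{Ext}$ are mappings between the propositionally consistent extensions of a default theory $\langle D, T\rangle$ and the propositionally consistent extensions of the translation $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T\rangle)$.

Proposition 4. If a theory $E \subseteq \mathcal{L}$ is a propositionally consistent extension of a prerequisite-free default theory $\langle D, T\rangle$ in $\mathcal{L}(\mathcal{A})$, then $\mathrm{Ext}_{\mathrm{PSN}}(E) \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ given in Definition 6 is a propositionally consistent extension of $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T\rangle)$.

Proof. Assume the definitions given in Definition 6. Let $E \subseteq \mathcal{L}$ be a propositionally consistent extension of $\langle D, T\rangle$. Since $E$ is propositionally consistent, it is clear by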
Classifying Semi-Normal Default Logic on the Basis of its Expressive Power
Lemma 2 that $E' = \mathrm{Ext}_{\mathrm{PSN}}(E) = \mathrm{Cn}(E \cup A)$ is also propositionally consistent. Moreover, the conditions PR1–PR6 given in Lemma 5 are met. On the basis of these conditions and the definition of $A$, the reduction $D'_{E'}$ contains the following rules for each $d = \frac{:\beta_1, \ldots, \beta_n}{\gamma} \in D$: (i) the rule $\frac{\top}{c_\varphi}$ for each $\varphi \in \mathrm{Jf}(D) \cup \mathrm{Cq}(D)$ that is consistent with $E$, (ii) the rule $\frac{\top}{b_d}$ if some of the justifications $\beta_i$ is not consistent with $E$, (iii) the rule $\frac{\top}{\gamma}$ if $\gamma$ is consistent with $E$ and each of the justifications $\beta_1, \ldots, \beta_n$ is consistent with $E$, and (iv) the rules $\frac{\top}{p}$, $\frac{\top}{q}$ and $\frac{\top}{r}$. It follows by the items above and the definition of $A$ that $\mathrm{Cn}^{D'_{E'}}(T) = \mathrm{Cn}(T \cup \Gamma \cup A)$, where $\Gamma$ contains a consequent $\gamma$ of a default $\frac{:\beta_1, \ldots, \beta_n}{\gamma}$ if and only if $\gamma$ is consistent with $E$ and $\frac{\top}{\gamma} \in D_E$. Let us
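then establish $\mathrm{Cn}(E \cup A) = \mathrm{Cn}^{D'_{E'}}(T)$. The key observation is that E satisfies $E = \mathrm{Cn}^{D_E}(T) = \mathrm{Cn}(T \cup \{\gamma \mid \frac{\top}{\gamma} \in D_E\})$ by Lemma 6. Since E is propositionally consistent, it follows that each consequent $\gamma$ for which $\frac{\top}{\gamma} \in D_E$ is necessarily consistent with E. Thus $E = \mathrm{Cn}(T \cup \Gamma)$ and $E' = \mathrm{Cn}(E \cup A) = \mathrm{Cn}(T \cup \Gamma \cup A) = \mathrm{Cn}^{D'_{E'}}(T)$ is a propositionally consistent extension of $\langle D', T \rangle$. □

Proposition 5. Let $\langle D, T \rangle$ be a prerequisite-free default theory in $\mathcal{L}(\mathcal{A})$ and let $\langle D', T \rangle$ be the translation $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T \rangle)$ as given in Definition 6. If a theory $E' \subseteq \mathcal{L}'(\mathcal{A} \cup \mathcal{A}')$ is a propositionally consistent extension of $\langle D', T \rangle$, then the theory $E = \mathrm{Ext}(E') = E' \cap \mathcal{L}$ is a propositionally consistent extension of $\langle D, T \rangle$.

Proof. Let $E'$ be a propositionally consistent extension of $\langle D', T \rangle$. It follows by Lemma 7 that $E' = \mathrm{Cn}^{D'_{E'}}(T) = \mathrm{Cn}(E \cup A)$, where $A \subseteq \mathcal{A}'$ and $E = E' \cap \mathcal{L}$ is propositionally closed and consistent. The proof of $E = \mathrm{Cn}^{D_E}(T)$ follows.

(⊆) Consider any $\varphi \in E$. It follows that $\varphi \in \mathcal{L}$ and $T \cup \Gamma \models \varphi$ by Lemma 7. Note that $\gamma \in \Gamma$ whenever $d = \frac{:\beta_1, \ldots, \beta_n}{\gamma} \in D$ is such that $\frac{\top}{\gamma} \in D'_{E'}$. By PR3, this is the case if and only if $\gamma$ is consistent with E and $b_d \notin A$. This implies that $\frac{\top}{b_d} \notin D'_{E'}$ by Lemma 7. Thus $\{c_{\beta_1}, \ldots, c_{\beta_n}\} \subseteq A$ by PR2, and $\{\frac{\top}{c_{\beta_1}}, \ldots, \frac{\top}{c_{\beta_n}}\} \subseteq D'_{E'}$ by Lemma 7. Then PR1 implies that each of $\beta_1, \ldots, \beta_n$ is consistent with E, so that $\frac{\top}{\gamma} \in D_E$ and $\gamma \in \mathrm{Cn}^{D_E}(T)$. Thus $\Gamma \subseteq \mathrm{Cn}^{D_E}(T)$. Since also $T \subseteq \mathrm{Cn}^{D_E}(T)$ and $T \cup \Gamma \models \varphi$ holds, it follows that $\varphi \in \mathrm{Cn}^{D_E}(T)$, because $\mathrm{Cn}^{D_E}(T)$ is propositionally closed.

(⊇) (C1) As $T \subseteq \mathcal{L}$ and $T \subseteq E'$, it holds that $T \subseteq E = E' \cap \mathcal{L}$. (C2) As noted already, E is propositionally closed. (C3) Let $d = \frac{:\beta_1, \ldots, \beta_n}{\gamma}$ be any default from D and assume that $\frac{\top}{\gamma} \in D_E$. It follows that each of the justifications $\beta_1, \ldots, \beta_n$ is consistent with E. Then the rules $\frac{\top}{c_{\beta_1}}, \ldots, \frac{\top}{c_{\beta_n}}$ belong to $D'_{E'}$ (by PR1), implying that $\{c_{\beta_1}, \ldots, c_{\beta_n}\} \subseteq A$. But then PR2 implies $\frac{\top}{b_d} \notin D'_{E'}$ as well as $b_d \notin A$. Moreover, $\frac{\top}{\gamma} \in D_E$ implies that $\gamma \in E$ and that $\gamma$ is consistent with E, as E is consistent. It follows by PR3 that the rule $\frac{\top}{\gamma} \in D'_{E'}$ and $\gamma \in E'$. Since $\gamma \in \mathcal{L}$, we know that $\gamma \in E$ holds as well. Thus E is closed under $D_E$. □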
The one-to-one correspondence of extensions is established in analogy to Proposition 3. Given this correspondence, the function $\mathrm{Tr}_{\mathrm{PSN}}$ is PFM, and PSNDL and PDL reside in the same class of EPH. This is a straightforward analogue of Theorem 2, and we omit the proof of Theorem 3.
T. Janhunen
Proposition 6. The propositionally consistent extensions of a prerequisite-free default theory $\langle D, T \rangle$ and the propositionally consistent extensions of the translation $\mathrm{Tr}_{\mathrm{PSN}}(\langle D, T \rangle)$ are in one-to-one correspondence.

Theorem 3. PDL ←→ PSNDL.
6 Conclusions
This paper continues earlier research by the author on classifying non-monotonic logics on the basis of their expressive power. The framework [10] is based on the notion of a polynomial, faithful and modular (PFM) translation function that systematically maps theories of one non-monotonic logic to theories of another such that the semantics of theories is preserved. The existence or nonexistence of such translations between particular non-monotonic logics can then be used as a criterion to rank non-monotonic logics on the basis of their expressive power. This gives rise to the expressive power hierarchy (EPH) of non-monotonic logics. This paper analyzes semi-normal default logic in order to locate its position in EPH. A PFM translation function is presented in order to establish the main result of this paper (Theorem 2): semi-normal default logic (SNDL) and Reiter's default logic (DL) are of equal expressive power, i.e. SNDL ←→ DL. Thus semi-normality is an example of a syntactic restriction that does not affect the expressiveness of DL. In contrast, normal and prerequisite-free defaults are already less expressive, as NDL and PDL reside lower in EPH. The result of Theorem 2 has some interesting consequences. The first relates to Reiter's original definition of defaults, which are assumed to have at least one justification (n > 0), as noted in [17, p. 71]. Recall that we allow justification-free defaults (n ≥ 0). However, Theorem 2 indicates that justification-free defaults do not increase the expressiveness of DL. Moreover, Theorem 2 implies that arbitrary defaults can be translated into defaults that have only a single justification (n = 1). This tightens Marek and Truszczyński's result that unitary defaults (0 ≤ n ≤ 1) are sufficient [17, Corollary 5.20]. The equality of SNDL and DL also implies that SNDL ↛ NDL, i.e. there is no PFM translation function from SNDL to normal default logic (NDL), since it is already known that DL ↛ NDL [10].
Nevertheless, a direct counter-example is given in this paper for illustrative purposes (see Theorem 1). The structure of EPH implies that SNDL has greater expressive power than Moore's autoepistemic logic (AEL), so that SNDL ↛ AEL holds, in harmony with [5]. The effects of prerequisite-freedom are also evaluated in this paper in conjunction with semi-normality. The resulting logic PSNDL turns out to be less expressive than SNDL. In fact, PSNDL resides in the same class as PDL, as indicated by Theorem 3. This is intuitive: since DL ←→ SNDL, it is natural to expect that PDL ←→ PSNDL, as the syntaxes of PDL and PSNDL are obtained by constraining those of DL and SNDL in the same way. One of the implications of Theorem 3 is also that NDL ↮ PSNDL. This indicates that NDL has features (rules that are close to monotonic inference rules) that cannot be captured in PSNDL, and vice versa (the existence of extensions is not guaranteed in PSNDL).
References
1. P.A. Bonatti and T. Eiter. Querying disjunctive databases through nonmonotonic logics. Theoretical Computer Science, 160:321–363, 1996.
2. D.W. Etherington. Formalizing nonmonotonic reasoning systems. Artificial Intelligence, 31:41–85, 1987.
3. D.W. Etherington. Relating default logic and circumscription. In Proceedings of IJCAI’87, pages 489–494, Milan, Italy, August 1987. Morgan Kaufmann.
4. D.W. Etherington. Reasoning with Incomplete Information. Pitman, London, 1988.
5. G. Gottlob. Translating default logic into standard autoepistemic logic. Journal of the Association for Computing Machinery, 42(2):711–740, 1995.
6. T. Imielinski. Results on translating defaults to circumscription. Artificial Intelligence, 32:131–146, 1987.
7. T. Janhunen. Representing autoepistemic introspection in terms of default rules. In Proceedings of ECAI’96, pages 70–74, Budapest, Hungary, 1996. John Wiley.
8. T. Janhunen. Separating disbeliefs from beliefs in autoepistemic reasoning. In J. Dix, U. Furbach, and A. Nerode, editors, Proceedings of LPNMR’97, pages 132–151, Dagstuhl, Germany, July 1997. Springer-Verlag. LNAI 1265.
9. T. Janhunen. On the intertranslatability of autoepistemic, default and priority logics, and parallel circumscription. In Proceedings of JELIA’98, pages 216–232, Dagstuhl, Germany, October 1998. Springer-Verlag. LNAI 1489.
10. T. Janhunen. On the intertranslatability of non-monotonic logics. Annals of Mathematics and Artificial Intelligence (issue on JELIA’98). Accepted for publication.
11. K. Konolige. On the relation between default and autoepistemic logic. Artificial Intelligence, 35:343–382, 1988.
12. K. Konolige. On the relation between autoepistemic logic and circumscription. In Proceedings of IJCAI’89, pages 1213–1218, Detroit, Michigan, USA, August 1989.
13. V. Lifschitz. Computing circumscription. In Proceedings of IJCAI’85, pages 121–127, Los Angeles, California, USA, August 1985. Morgan Kaufmann.
14. W. Lukaszewicz. Two results on default logic. In Proceedings of IJCAI’85, pages 459–461, Los Angeles, California, August 1985.
15. W. Marek, G.F. Schwarz, and M. Truszczyński. Modal nonmonotonic logics: Ranges, characterization, computation. Journal of the ACM, 40(4):963–990, 1993.
16. W. Marek and M. Truszczyński. Modal logic for default reasoning. Annals of Mathematics and Artificial Intelligence, 1:275–302, 1990.
17. W. Marek and M. Truszczyński. Nonmonotonic Logic: Context-Dependent Reasoning. Springer-Verlag, Berlin, 1993.
18. J. McCarthy. Circumscription - a form of non-monotonic reasoning. Artificial Intelligence, 13:27–39, 1980.
19. R.C. Moore. Semantical considerations on nonmonotonic logic. In Proceedings of IJCAI’83, pages 272–279, Karlsruhe, FRG, August 1983. Morgan Kaufmann.
20. I. Niemelä. A unifying framework for nonmonotonic reasoning. In Proceedings of ECAI’92, pages 334–338, Vienna, Austria, August 1992. John Wiley.
21. R. Reiter. A logic for default reasoning. Artificial Intelligence, 13:81–132, 1980.
22. G. Schwarz. On embedding default logic into Moore’s autoepistemic logic. Artificial Intelligence, 80:349–359, 1996.
23. M. Truszczyński. Modal interpretations of default logic. In Proceedings of IJCAI’91, pages 393–398, Sydney, Australia, August 1991. Morgan Kaufmann.
24. X. Wang, J.-H. You, and L.Y. Yuan. Nonmonotonic reasoning by monotonic inferences with priority constraints. In Proceedings of the 2nd International Workshop on Non-Monotonic Extensions of LP, pages 91–109. Springer, 1996. LNAI 1216.
Locally Determined Logic Programs

Douglas Cenzer¹, Jeffrey B. Remmel², and Amy Vanderbilt¹

¹ Department of Mathematics, University of Florida, P.O. Box 118105, Gainesville, Florida 32611, fax: 352-392-8357
² Department of Mathematics, University of California at San Diego, La Jolla, CA 92093
Abstract. In general, the set of stable models of a recursive propositional logic program can be quite complex. For example, it follows from results of Marek, Nerode, and Remmel [8] that there exist finite predicate logic programs and recursive propositional logic programs which have stable models but no hyperarithmetic stable models. In this paper, we shall define several conditions which ensure that a recursive logic program has a stable model which is recursive.
1 Introduction
The stable model semantics of logic programs has been extensively studied. Unfortunately, the set of stable models of a recursive propositional logic program with negation, or even of a finite predicate logic program with negation, can be quite complex. For example, in [7], it is shown that for any recursive propositional logic program P, there is an infinitely branching recursive tree $T_P$ such that there is an effective 1:1 degree-preserving correspondence between the set of stable models of P and the set of infinite paths through $T_P$. In [8] it is shown that given any infinitely branching recursive tree T, there exists a recursive propositional logic program $P_T$ such that there is an effective 1:1 correspondence between the set of infinite paths through T and the set of stable models of $P_T$. Moreover, in [8], it is shown that the same result holds if we replace recursive logic programs by finite predicate logic programs with negation. These results imply that the set of stable models of a recursive propositional logic program or a finite predicate logic program can be extremely complex. For example, it follows from these results that there is a finite predicate logic program which has a stable model but has no stable model which is hyperarithmetic. The main motivation for this paper was to develop conditions on recursive logic programs P which would guarantee the existence of a well-behaved stable model for P, i.e. a stable model of P which is recursive or possibly even polynomial time. In this paper, we shall give several conditions which guarantee that a recursive propositional logic program has a recursive stable model. We should note that there are a number of conditions in the literature which guarantee that a recursive propositional logic program has a stable model of relatively low complexity with respect to the arithmetic hierarchy. Clearly, the first such condition
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 34–48, 1999.
© Springer-Verlag Berlin Heidelberg 1999
is to consider recursive Horn logic programs. In that case, it is implicit in [11] and explicitly proved in [1] that the least model of a recursive Horn program is a recursively enumerable (r.e.) set and that every r.e. set can appear as the least model of a recursive Horn program. Another important class of logic programs is the class of stratified logic programs, where one can single out a particular model called the perfect model. This model is the unique stable model of such a program. Apt and Blair [3] showed that a recursive logic program with n strata must have a perfect model which is $\Sigma^0_n$ and that there is a recursive logic program P with n strata such that the perfect model of P is $\Sigma^0_n$-complete. In [8], Marek, Nerode, and Remmel considered two conditions: (i) being locally finite, that is, each atom of the Herbrand base of P has at most finitely many minimal derivations from P, and (ii) being rsp, that is, there is an effective procedure to find these possible derivations. They showed that these conditions ensure that there is a highly recursive tree $T_P$ and an effective 1:1 degree-preserving correspondence between the set of stable models of P and the set of infinite paths through $T_P$. Here a tree T is highly recursive if T is recursive, finitely branching, and there is an effective procedure which, given any node η ∈ T, produces the set of nodes in T which immediately extend η. One consequence of this fact is that a recursive rsp logic program which has a stable model always has a stable model M whose jump is recursive in $0'$. In addition, Marek, Nerode, and Remmel [9] generalized Reiter's concept of normal default theories to logic programs (FC-normal logic programs, in the language of [9]) and showed that FC-normal logic programs always have a stable model which is r.e. in $0'$. The outline of this paper is as follows. In section 2, we shall define the concept of proof schemes and of FC-normal logic programs, which will be crucial for later developments.
In section 3, we shall introduce the new notion of a locally determined logic program. Given a recursive logic program P and an effective listing $H_P = \{a_0, a_1, \ldots\}$ of the atoms of the Herbrand base $H_P$ of P, we say that n is a level of P if, roughly, the following holds: whenever there is a proof scheme p for a sentence $a_i$ with $i \leq n$, then there exists a proof scheme q only involving elements from $\{a_0, \ldots, a_n\}$ and their negations such that the restraint set of q is contained in the restraint set of p. We then say that P is locally determined if for every $k \geq 0$, there is an $n_k \geq k$ such that $n_k$ is a level of P. We say that P is effectively locally determined if one can effectively find such an $n_k$ from k. We shall show in section 3 that if P is an effectively locally determined recursive logic program, then there is a highly recursive tree $T_P$ such that there is an effective 1:1 degree-preserving correspondence between the stable models of P and the set of infinite paths through $T_P$. Thus being effectively locally determined is another condition, much like the rsp property, which reduces the complexity of the set of stable models of P. In section 4, we shall introduce several strengthenings of local determinedness which will ensure that a recursive logic program always has a recursive stable model.
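The notion of a level can be illustrated on a finite fragment. The Python sketch below abstracts each proof scheme to three hypothetical fields: the index of its conclusion, the indices of all atoms occurring in it (positively or negated), and the indices of its restraint set. It then checks the rough definition of a level against a given finite list of schemes. The names `Scheme`, `is_level` and `least_level_at_least` are illustrative only. For a genuine recursive program the set of proof schemes is infinite, so searching up to a fixed `bound` is merely a stand-in for finding $n_k$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scheme:
    conclusion: int        # i such that the scheme proves a_i
    atoms: frozenset       # indices of all atoms occurring (positively or negated)
    restraints: frozenset  # indices of the atoms in its restraint set

def is_level(n, schemes):
    """n is a level if every scheme for some a_i (i <= n) is matched by a scheme
    for a_i that uses only a_0..a_n and whose restraint set is contained in it."""
    for s in schemes:
        if s.conclusion > n:
            continue
        if not any(t.conclusion == s.conclusion
                   and all(a <= n for a in t.atoms)
                   and t.restraints <= s.restraints
                   for t in schemes):
            return False
    return True

def least_level_at_least(k, schemes, bound):
    """Smallest level n with k <= n <= bound, or None if there is none."""
    for n in range(k, bound + 1):
        if is_level(n, schemes):
            return n
    return None

schemes = [
    Scheme(0, frozenset({0, 3}), frozenset({3})),  # a_0 needs a_3 to stay out
    Scheme(2, frozenset({1, 2}), frozenset({1})),
    Scheme(5, frozenset({5, 7}), frozenset()),     # a_5 is proved using a_7
]
print(least_level_at_least(0, schemes, 10), least_level_at_least(5, schemes, 10))  # 3 7
```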
2 Propositional Logic Programs, Proof Schemes, and Normality
In this section, we shall introduce several key notions which will be used in later sections. In particular, we shall carefully define the notion of recursive logic programs. Then we shall define the notion of proof schemes, which will lead to the definitions of locally finite programs and rsp programs. Finally, we shall describe the extension of Reiter's concept of normal default theories to recursive logic programs, following [9]. A program clause is an expression of the form

$$C = p \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m \qquad (1)$$
where $p, q_1, \ldots, q_n, r_1, \ldots, r_m$ are atomic formulas in some propositional language $\mathcal{L}$. A program is a set of clauses of the form (1). A clause C is called a Horn clause if m = 0. We let Horn(P) denote the set of all Horn clauses of P. $H_P$ is the Herbrand base of P, that is, the set of all atomic formulas of the language of P. If P is a program and $M \subseteq H_P$ is a subset of the Herbrand base, define the operator $T_{P,M} : \mathcal{P}(H_P) \to \mathcal{P}(H_P)$, where $T_{P,M}(I)$ is the set of all p such that there exists a clause $C = p \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m$ in P such that $q_1 \in I, \ldots, q_n \in I$ and $\{r_1, \ldots, r_m\} \cap M = \emptyset$. The operator $T_{P,M}$ is a monotonic finitizable operator, see [2], and hence possesses a least fixpoint $F_{P,M}$. Given a program P and $M \subseteq H_P$, the Gelfond-Lifschitz reduct of P is defined as follows. For every clause C of P, execute the following operation: if some atom a belongs to M and its negation ¬a appears in C, then eliminate C altogether. In the remaining clauses, eliminate all the negated atoms. The resulting program $P^{GL}_M$ is a Horn propositional program (possibly infinite), and hence possesses a least Herbrand model. If that least model of $P^{GL}_M$ coincides with M, then M is called a stable model for P. Gelfond and Lifschitz [6] proved that every stable model of P is a minimal model of P and that M is a stable model of P iff $M = F_{P,M}$. Having characterized stable models as fixpoints of (parametrized) operators, consider the form of elements of $F_{P,M}$. A P,M-derivation of an atom p is a sequence $\langle p_1, \ldots, p_s \rangle$ such that (i) $p_s = p$ and (ii) for every $i \leq s$, either "$p_i \leftarrow$" is a member of P or there is a clause $C = p_i \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m$ such that $C \in P$, $q_1, \ldots, q_n \in \{p_1, \ldots, p_{i-1}\}$, and $r_1, \ldots, r_m \notin M$. It is easy to show that $F_{P,M}$ is the set of all atoms possessing a P,M-derivation.
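For finite propositional programs the reduct, the stable-model test and P,M-derivations can be checked by brute force. The Python sketch below assumes a hypothetical encoding in which the clause $p \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m$ is the tuple `(p, (q1, ..., qn), (r1, ..., rm))`. All function names are illustrative. Because $T_{P,M}$ coincides with the immediate-consequence operator of $P^{GL}_M$, computing the least model of the reduct yields $F_{P,M}$.

```python
from itertools import chain, combinations

# Hypothetical encoding: p <- q1..qn, not r1..not rm  is  (p, (q1, ..., qn), (r1, ..., rm)).

def gl_reduct(program, M):
    """Drop every clause with some r_j in M, then drop the negated atoms."""
    return [(p, pos, ()) for (p, pos, neg) in program if not set(neg) & M]

def least_model(horn):
    """Least Herbrand model of a finite Horn program (iterate to the fixpoint)."""
    model, changed = set(), True
    while changed:
        changed = False
        for p, pos, _ in horn:
            if p not in model and all(q in model for q in pos):
                model.add(p)
                changed = True
    return model

def is_stable(program, M):
    M = set(M)
    return least_model(gl_reduct(program, M)) == M

def stable_models(program):
    """Brute force over all subsets of the finitely many atoms of the program."""
    atoms = sorted({p for p, _, _ in program}
                   | {a for _, pos, neg in program for a in pos + neg})
    subsets = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(s) for s in subsets if is_stable(program, s)]

def is_pm_derivation(seq, program, M):
    """True iff seq = [p_1, ..., p_s] is a P,M-derivation of its last atom."""
    M = set(M)
    for i, p_i in enumerate(seq):
        earlier = set(seq[:i])
        if not any(head == p_i and set(pos) <= earlier and not set(neg) & M
                   for head, pos, neg in program):
            return False
    return len(seq) > 0

P = [("a", (), ("b",)), ("b", (), ("a",))]  # a <- not b.   b <- not a.
print(stable_models(P))  # [{'a'}, {'b'}]
```

Only the atoms $r_j$ that actually occur negated in the derivation are consulted in M. This is the finiteness observation that the proof schemes formalize below.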
Thus M is a stable model of the program P if and only if M consists exactly of those atoms which possess a P,M-derivation. The property that a sequence $\langle p_1, \ldots, p_s \rangle$ is a P,M-derivation of an atom p does not depend on the whole set M but only on the intersection of M and a certain finite set of atoms that occur in the derivation. In order that the sequence $\langle p_1, \ldots, p_s \rangle$ be a P,M-derivation of an atom $p_s$, some atoms must be left out of the set M. Each derivation depends on a finite number of such omitted atoms.
In other words, if we classify the atoms according to whether they are “in” or “out” of M, the property that a sequence $\langle p_1, \ldots, p_s \rangle$ is a P,M-derivation depends only on whether a finite number of elements are out of M. The notion of a proof scheme formalizes this idea. A (P-)proof scheme for an atom p is a sequence $S = \langle \langle p_i, C_i, U_i \rangle \rangle_{i=1}^{s}$ of triples such that for each triple $\langle p_i, C_i, U_i \rangle$, $p_i \in H_P$, $C_i \in P$ is a clause with head $p_i$, and $U_i$ is a finite subset of $H_P$. Such a sequence S is a proof scheme for p if (1) $p_s = p$, and for every i (2) $C_i = p_i \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m$, where $\{q_1, \ldots, q_n\} \subseteq \{p_1, \ldots, p_{i-1}\}$ and $U_i = U_{i-1} \cup \{r_1, \ldots, r_m\}$ (with $U_0 = \emptyset$). We call p the conclusion of S, written p = cln(S), and the set $U_s$ the support of S, written supp(S). We say that a subset $M \subseteq H_P$ admits a proof scheme $S = \langle \langle p_i, C_i, U_i \rangle \rangle_{i=1}^{s}$ if $M \cap U_s = \emptyset$. The following proposition, due to Marek, Nerode, and Remmel [7], characterizes stable models in terms of the existence of proof schemes.

Proposition 1. Let $M \subseteq H_P$. Then M is a stable model of P if and only if (1) for every $p \in M$, there is a proof scheme S for p such that M admits S, and (2) for every $p \notin M$, there is no proof scheme S for p such that M admits S.

As stated in the introduction, restrictions on the number of proof schemes greatly reduce the possible complexity of the set of stable models of a recursive logic program P. But how many derivation schemes for an atom p can there be? If we allow P to be infinite, then it is easy to construct an example with infinitely many derivations of a single atom. Moreover, given two proof schemes, one can insert one into the other (increasing the sets $U_i$ appropriately in this process, with obvious restrictions). Thus various clauses $C_i$ may be immaterial to the purpose of deriving p. This leads us to introduce a natural relation ≺ on proof schemes using a well-known device from proof theory.
Namely, we define $S_1 \prec S_2$ if $S_1$ and $S_2$ have the same conclusion and every clause appearing in $S_1$ also appears in $S_2$. Then a minimal proof scheme for p is defined to be a proof scheme S for p such that whenever S' is a proof scheme for p and S' ≺ S, then S ≺ S'. Note that ≺ is reflexive and transitive, but ≺ is not antisymmetric. However, it is well-founded. That is, given any proof scheme S, there is an S' such that S' ≺ S and for every S'', if S'' ≺ S', then S' ≺ S''. Moreover, the associated equivalence relation S ≡ S', defined by S ≺ S' and S' ≺ S, has finite equivalence classes.

Example 1. Let $P_1$ be the following program:
C1: p(0) ← ¬q(Y).
C2: nat(0) ← .
C3: nat(s(X)) ← nat(X).
Then the atom p(0) possesses infinitely many minimal proof schemes. For instance, each one-element sequence $S_i = \langle \langle p(0), C_1\Theta_i, \{q(s^i(0))\} \rangle \rangle$, where $\Theta_i$ is the operation of substituting $s^i(0)$ for Y, is a minimal proof scheme for p(0). However, if program $P_2$ is the result of replacing clause C1 by C1': q(s(Y)) ← ¬q(Y), each atom possesses only finitely many minimal proof schemes.
We shall call a program P locally finite if for every atom p, there are only finitely many minimal proof schemes with conclusion p. If P is locally finite and $p \in H_P$, we let $D_p$ denote the union of all supports of minimal proof schemes of p. Clearly, for any $M \subseteq H_P$, the question of whether p has a P,M-derivation depends only on $M \cap D_p$. This implies that if P is locally finite, when we attempt to construct a subset $M \subseteq H_P$ which is a stable model for P, we can apply a straightforward (although still infinite) tree construction to produce such an M, if such an M exists at all. Next, we need to make the notion of a recursive program precise. First, assume that we have a Gödel numbering of the elements of the Herbrand base $H_P$. Thus, we can think of each element of the Herbrand base as a natural number. If $p \in H_P$, write c(p) for the code or Gödel number of p. Let ω = {0, 1, 2, ...}. Assume [·,·] is a fixed recursive pairing function which maps ω × ω onto ω and which has recursive projection functions $\pi_1$ and $\pi_2$, defined by $\pi_i([x_1, x_2]) = x_i$ for all $x_1$ and $x_2$ and i ∈ {1, 2}. Code a finite sequence $\langle x_1, \ldots, x_n \rangle$ for n ≥ 3 by the usual inductive definition $[x_1, \ldots, x_n] = [x_1, [x_2, \ldots, x_n]]$. Next, code finite subsets of ω via “canonical indices”. The canonical index of the empty set, ∅, is the number 0, and the canonical index of a nonempty set $\{x_0, \ldots, x_n\}$, where $x_0 < \ldots < x_n$, is $\sum_{j=0}^{n} 2^{x_j}$. Let $F_k$ denote the finite set whose canonical index is k. Once finite sets and sequences of natural numbers have been coded, we can code more complex objects such as clauses, proof schemes, etc. as follows. Let the code c(C) of a clause $C = p \leftarrow q_1, \ldots, q_n, \neg r_1, \ldots, \neg r_m$ be [c(p), k, l], where k is the canonical index of the finite set $\{c(q_1), \ldots, c(q_n)\}$, and l is the canonical index of the finite set $\{c(r_1), \ldots, c(r_m)\}$. Similarly, let the code c(S) of a proof scheme $S = \langle \langle p_i, C_i, U_i \rangle \rangle_{i=1}^{s}$ be $[s, [[c(p_1), c(C_1), c(U_1)], \ldots, [c(p_s), c(C_s), c(U_s)]]]$, where for each i, $c(U_i)$ is the canonical index of the finite set of codes of the elements of $U_i$. The first coordinate of the code of a proof scheme is the length of the proof scheme. Once we have defined the codes of proof schemes, then for locally finite programs we can define the code of the set $D_p$ consisting of the union of the supports of all minimal proof schemes for p. Finally, we code recursive sets as natural numbers. Let $\varphi_0, \varphi_1, \ldots$ be an effective list of all partial recursive functions, where $\varphi_e$ is the partial recursive function computed by the e-th Turing machine. By definition, a (recursive) index of a recursive set R is an e such that $\varphi_e$ is the characteristic function of R. Call a program P recursive if the set of codes of the Herbrand universe $H_P$ is recursive and the set of codes of the clauses of the program P is recursive. If P is a recursive program, then by an index of P we mean the code of a pair [u, p], where u is an index of the recursive set of all codes of elements in $H_P$ and p is an index of the recursive set of the codes of all clauses in P. For the rest of this paper we shall identify an object with its code as described above. This means that we shall think of the Herbrand universe of a program, and the program itself, as subsets of ω, and of clauses, proof schemes, etc. as elements of ω. We also need to define various types of recursive trees and $\Pi^0_1$ classes. Let $\omega^{<\omega}$ be the set of all finite sequences from ω and let $2^{<\omega}$ … F to the truth values.

An interpretation I for P is a total mapping from $B_P$ to {T, M, U, F}. We denote by $I^T$ (resp. $I^M$, $I^U$, $I^F$) the set of atoms whose truth value according to I is true (resp. must-be-true, undefined, false). Intuitively, the mbt truth value is assigned to atoms that cannot be derived from any program rule at the current computation step, but must eventually be true in the answer set to be computed.
These atoms are not immediately taken as true in order to guarantee the “supportedness” of the interpretation at hand. This is a main peculiarity of answer sets w.r.t. ordinary models: any atom p belonging to an answer set of P has a rule which supports p. Formally, p is true w.r.t. a given answer set S if and only if there exists a rule r ∈ P such that p ∈ H(r), H(r) − {p} is false w.r.t. S, and B(r) is true w.r.t. S − {p}. Thus, enforcing supportedness is a principal difference between DLP systems and satisfiability solvers (like, e.g., the Davis-Putnam procedure). Given an interpretation I for P, the function $val_I$ is defined as follows: for any atom $p \in B_P$, $val_I(p) = I(p)$, and $val_I(not\ p) = not\ val_I(p)$. Accordingly, for a ground rule r ∈ P, we define $val_I(H(r))$ (resp. $val_I(B(r))$) as the maximum (minimum) value assigned by $val_I$ over the literals in H(r) (B(r)). If H(r) = ∅, i.e., r is a constraint, then $val_I(H(r)) = F$. A rule r is satisfied w.r.t. I if the truth value of its head is not less than the truth value of its body, i.e. $val_I(H(r)) \geq val_I(B(r))$. Moreover, for every atom p, we define support(p) as the set of rules r in ground(P) with p ∈ H(r) such that $val_I(H(r) - \{p\}) < M$ and $val_I(B(r)) > F$, i.e., the rules which can potentially be used to derive the truth of p “starting” from the interpretation I. |support(p)| denotes the cardinality of support(p). The procedure det_cons is shown in Figure 2. Given an interpretation I for P, it extends I by what we call the deterministic consequences of I w.r.t. P. It can assign F, M or T to any undefined atom, but can only assign T to mbt atoms. det_cons also detects inconsistencies, e.g. if some mbt atom should get the value F. If det_cons has modified the interpretation I for the ground program P during an iteration of the repeat loop of Step 2, the Boolean variable modified is true at the end of that iteration, and the loop is repeated. If any inconsistency is detected, the procedure immediately aborts by means of an exit instruction.
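The four-valued evaluation, rule satisfaction and support(p) can be sketched in Python for ground rules. This is a hedged illustration, not the dlv data structures. Several choices here are assumptions rather than statements from the text:

- the truth ordering F < U < M < T, inferred from the comparisons "≥ M", "< M" and "> F" used above;
- the negation table, in particular not M = F, since an mbt atom can only evolve into true;
- the value T for an empty body;
- the encoding of a rule as `(head_atoms, body_literals)` with literals `(atom, is_positive)`.

The names `TV`, `val_head`, `val_body` and `support` are illustrative.

```python
from enum import IntEnum

class TV(IntEnum):
    # Assumed truth ordering F < U < M < T, matching the comparisons in the text.
    F = 0
    U = 1
    M = 2
    T = 3

def neg(v):
    # Assumption: not T = F, not F = T, not U = U, and not M = F.
    return {TV.T: TV.F, TV.F: TV.T, TV.U: TV.U, TV.M: TV.F}[v]

# A ground rule is (head_atoms, body_literals); a literal is (atom, is_positive).
def val_lit(I, lit):
    atom, positive = lit
    return I[atom] if positive else neg(I[atom])

def val_head(I, head):
    return max((I[a] for a in head), default=TV.F)  # empty head (constraint) -> F

def val_body(I, body):
    return min((val_lit(I, lit) for lit in body), default=TV.T)  # empty body -> T

def satisfied(I, rule):
    head, body = rule
    return val_head(I, head) >= val_body(I, body)

def support(I, rules, p):
    """Rules with p in the head, other head atoms below M, and body above F."""
    return [r for r in rules
            if p in r[0]
            and val_head(I, [a for a in r[0] if a != p]) < TV.M
            and val_body(I, r[1]) > TV.F]

r1 = (("a", "b"), ())                  # a v b.
r2 = ((), (("a", True), ("b", True)))  # :- a, b.
I = {"a": TV.U, "b": TV.F}
print(support(I, [r1, r2], "a") == [r1], satisfied(I, r1), satisfied(I, r2))  # True False True
```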
Steps 4–12 focus on rules which are not satisfied (in the 4-valued sense described above). If the head of the rule is false and its body is either true or mbt, then the procedure exits returning contradiction = true, because there is no way
Pushing Goal Derivation in DLP Computations
Procedure det_cons(P: Program; var I: Interpretation; var contradiction: Boolean)
(* Computes the deterministic consequences for P w.r.t. I *)
var modified: Boolean;
begin
(1)    contradiction := false;
(2)    repeat
(3)      modified := false;
         (* Enforce satisfaction of all rules *)
(4)      for each rule r ∈ ground(P) not satisfied w.r.t. I do
(5)        if val_I(B(r)) ≥ M and val_I(H(r)) = F then
(6)          contradiction := true; exit procedure;
(7)        else if val_I(B(r)) ≥ M and val_I(H(r) − {p}) = F for a p ∈ H(r) then
(8)          I(p) := val_I(B(r)); modified := true;
(9)        if val_I(H(r)) = F and val_I(B(r) − {L}) ≥ M for some undefined literal L ∈ B(r) then
(10)         modified := true;
(11)         if L is a positive literal p then I(p) := F;
(12)         else (* L is a negative literal not p *) I(p) := M;
         end for;
         (* Ensure supportedness *)
(13)     if |support(p)| = 0 and I(p) ≥ M for some atom p then
(14)       contradiction := true; exit procedure;
(15)     for each atom p s.t. I(p) = U and |support(p)| = 0 do
(16)       I(p) := F;
(17)     for each atom p s.t. I(p) ≥ M and |support(p)| = 1 do
           Let r be the (unique) rule in support(p);
(18)       for each undefined atom q ∈ (H(r) − {p}) do
(19)         I(q) := F; modified := true;
(20)       for each undefined positive literal q ∈ B(r) do
(21)         I(q) := M; modified := true;
(22)       for each undefined negative literal not q ∈ B(r) do
(23)         I(q) := F; modified := true;
         end for;
(24)   until not modified
end procedure
Fig. 2. Function for computing the deterministic consequences
to satisfy r (recall that true and false atoms cannot be changed and mbt can evolve only into true). Steps 7–12 enforce the satisfaction of a rule r ∈ P if this can be done deterministically, i.e., by changing the value of exactly one literal occurring in r. Consider Steps 7–8: if the truth value of B(r) according to I is X, where X is at least M, i.e., mbt or true, and every atom in the head of r is false except for one atom p, we can draw a deterministic consequence. We enforce the satisfaction of r by raising the truth value of p up to the value of B(r).
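The rule-satisfaction part of Figure 2 (Steps 4–12) can be sketched in Python as follows. This covers only those steps: the supportedness Steps 13–23 are omitted, and it is not the dlv implementation, which the text states runs in linear time using dedicated data structures. The rule encoding and the names `TV`, `rule_pass` and `propagate_rules` are hypothetical. The truth ordering F < U < M < T and the negation table (notably not M = F) are also assumptions.

```python
from enum import IntEnum

class TV(IntEnum):
    F = 0  # assumed truth ordering F < U < M < T
    U = 1
    M = 2
    T = 3

def neg(v):
    # Assumption: not M = F, since an mbt atom can only evolve into true.
    return {TV.T: TV.F, TV.F: TV.T, TV.U: TV.U, TV.M: TV.F}[v]

def val_head(I, head):
    return max((I[a] for a in head), default=TV.F)

def val_body(I, body):
    return min((I[a] if pos else neg(I[a]) for a, pos in body), default=TV.T)

def rule_pass(I, rules):
    """One sweep of Steps 4-12 over ground rules (head_atoms, body_literals);
    updates I in place and returns (modified, contradiction)."""
    modified = False
    for head, body in rules:
        hv, bv = val_head(I, head), val_body(I, body)
        if hv >= bv:                       # Step 4: skip rules already satisfied
            continue
        if bv >= TV.M:
            if hv == TV.F:                 # Steps 5-6: no way to satisfy r
                return modified, True
            for p in head:                 # Steps 7-8: exactly one non-false head atom
                if val_head(I, [a for a in head if a != p]) == TV.F:
                    I[p] = bv
                    modified = True
                    break
        elif hv == TV.F:                   # Steps 9-12: the single undefined body literal
            for k, (atom, positive) in enumerate(body):
                rest = body[:k] + body[k + 1:]
                if I[atom] == TV.U and val_body(I, rest) >= TV.M:
                    I[atom] = TV.F if positive else TV.M
                    modified = True
                    break
    return modified, False

def propagate_rules(I, rules):
    """Repeat rule_pass until nothing changes; returns True on contradiction."""
    while True:
        modified, contradiction = rule_pass(I, rules)
        if contradiction:
            return True
        if not modified:
            return False

rules = [
    (("a",), ()),                       # a.
    (("b",), (("a", True),)),           # b :- a.
    ((), (("c", True), ("a", True))),   # :- c, a.
    ((), (("d", False),)),              # :- not d.
]
I = {x: TV.U for x in "abcd"}
contradiction = propagate_rules(I, rules)
print(contradiction, {k: v.name for k, v in I.items()})
# False {'a': 'T', 'b': 'T', 'c': 'F', 'd': 'M'}
```

As in the text, the constraint `:- not d.` makes d mbt rather than true, because declaring d true would not guarantee its supportedness.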
W. Faber, N. Leone, and G. Pfeifer
For instance, if p is either undefined or mbt w.r.t. I and $val_I(B(r)) = T$, then I is modified by assigning the value T to p, denoted by I(p) := T in the algorithm. Note that this is the only step which can assign the value true to an atom, that is, det_cons assigns the value true to an atom p only if p is “supported”. Now, consider Steps 9–12: if the head of r is false, but its body is at least mbt except for one undefined literal L, then L should get the truth value false in order to satisfy r. Note that, if L is a negative literal not p, this is accomplished by setting I(p) := M. Indeed, declaring p true would not guarantee the “supportedness” of this atom. Steps 13–23 draw deterministic conclusions following the “supportedness” principle; they are a main novelty of our approach. In particular, if a true or mbt atom has no supporting rule according to the interpretation I, then we get a contradiction (Step 14), while an undefined atom without any supporting rule can be declared false (Steps 15–16). If a true or mbt atom p has only one supporting rule r, i.e., support(p) = {r}, then r must be able to derive the truth of p. Thus, we enforce that p is derivable from r by assigning suitable truth values to every undefined literal occurring in r (see Steps 18–23). This is a sort of backward propagation step: from the truth of the head we derive that all body literals must be true.

Example 1. Consider the program from Figure 1 applied to (the encoding of) the graph of Figure 3, starting with the “empty” interpretation I, where $I^T = I^M = I^F = \emptyset$ and $I^U = B_P$. By rule (i), reached(a) is immediately derived (by Steps 4–8). The constraint (v) essentially serves as a query that ensures that all nodes are indeed reached. As node(X) is true for all nodes, reached(X) is derived as mbt by means of Steps 9–12, for each X ∈ {a, b, c, d, e}. The mbt atom reached(b) is only derivable by a single ground instance³ of rule (ii), namely reached(b) :- reached(a), inPath(a,b).
At this point, the backward propagation step described above comes into play and sets inPath(a,b) to mbt (Steps 17–21). Then, support(outPath(a,b)) becomes empty, since the only rule with outPath(a,b) in the head also contains the mbt atom inPath(a,b). Thus, outPath(a,b) is derived as false (Steps 15–16). In turn, this causes Steps 4–8 to derive inPath(a,b) as true. Now we easily derive reached(b) as true from (ii). Likewise, inPath(c,d) and outPath(c,d) are derived as true and false, respectively, in analogy to inPath(a,b) and outPath(a,b). Each node in a Hamiltonian path has exactly one outgoing arc, and indeed Steps 9–12 derive inPath(a,c) and inPath(a,e) as false, which in turn sets outPath(a,c) and outPath(a,e) to true. Now we derive inPath(b,c) and inPath(d,e) as true in the same way we derived inPath(a,b) above, and eventually obtain reached(c), reached(d), and reached(e). That is, starting from an “empty” interpretation, a single invocation of det_cons has deterministically and efficiently found the Hamiltonian path for this graph.

³ Note that the instantiation procedure of dlv generates only ground rules that are constructible from the facts in the input ([4]).

Pushing Goal Derivation in DLP Computations
Fig. 3. Example graph 1 (nodes a, b, c, d, e)
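To make the supportedness steps more concrete, here is a minimal Python sketch of a propagation loop in the spirit of det_cons. It is not the dlv implementation: it covers only a forward step (cf. Steps 4–8) and the supportedness steps (cf. Steps 13–23), and it omits Steps 9–12. It also rests on assumptions of ours: the linear truth order F < U < M < T, the value of not a (true if a is false, false if a is mbt or true), and the reading of support(p) as the rules with p in the head whose body is not false and whose other head atoms are neither true nor mbt. The names `Rule`, `body_value`, `support`, and `propagate` are hypothetical.

```python
from dataclasses import dataclass

# Assumed linear truth order, matching comparisons such as U <= I(p) <= M.
ORDER = {"F": 0, "U": 1, "M": 2, "T": 3}
# Assumed value of the literal "not a" for each value of the atom a.
NOT = {"F": "T", "U": "U", "M": "F", "T": "F"}


@dataclass(frozen=True)
class Rule:
    head: tuple          # disjunction of atoms
    pos: tuple = ()      # positive body atoms
    neg: tuple = ()      # atoms a occurring as "not a" in the body


def body_value(rule, I):
    """Minimum over the body literals; an empty body counts as true."""
    vals = [I[a] for a in rule.pos] + [NOT[I[a]] for a in rule.neg]
    return min(vals, key=ORDER.get, default="T")


def support(rules, I, p):
    """Rules that could still derive p (our reading of support(p))."""
    return [r for r in rules
            if p in r.head and body_value(r, I) != "F"
            and all(I[q] in ("F", "U") for q in r.head if q != p)]


def propagate(rules, I):
    """Return (new interpretation, contradiction) once a fixpoint is reached."""
    I = dict(I)
    changed = True
    while changed:
        changed = False
        # Forward step: true body and a single non-false head atom -> true.
        for r in rules:
            if body_value(r, I) == "T":
                open_atoms = [q for q in r.head if I[q] != "F"]
                if len(open_atoms) == 1 and I[open_atoms[0]] in ("U", "M"):
                    I[open_atoms[0]] = "T"
                    changed = True
        # Supportedness steps.
        for p in I:
            sup = support(rules, I, p)
            if not sup:
                if I[p] in ("T", "M"):
                    return I, True           # a needed atom has no support
                if I[p] == "U":
                    I[p] = "F"               # unsupported undefined atom is false
                    changed = True
            elif len(sup) == 1 and I[p] in ("T", "M"):
                r = sup[0]                   # backward propagation from the head
                for a in r.pos:
                    if I[a] == "U":
                        I[a], changed = "M", True
                for a in r.neg + tuple(q for q in r.head if q != p):
                    if I[a] == "U":
                        I[a], changed = "F", True
    return I, False


if __name__ == "__main__":
    # Fragment of Example 1: reached(b) is mbt and has a single supporting rule.
    rules = [
        Rule(("reached(a)",)),
        Rule(("reached(b)",), pos=("reached(a)", "inPath(a,b)")),
        Rule(("inPath(a,b)", "outPath(a,b)")),
    ]
    start = {"reached(a)": "U", "reached(b)": "M",
             "inPath(a,b)": "U", "outPath(a,b)": "U"}
    print(propagate(rules, start))
```

On this fragment the loop retraces the chain of Example 1: reached(b) forces inPath(a,b) to mbt, outPath(a,b) loses its only support and becomes false, and the forward step then makes inPath(a,b) and reached(b) true.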
Recall that a total model S for P is an interpretation satisfying the following properties: (i) every rule of P is satisfied w.r.t. S, and (ii) $S^M \cup S^U = \emptyset$, i.e., every atom is either true or false w.r.t. S. The partial order $\preceq$ on the four truth values is defined through the following relationships: $U \preceq F$, $U \preceq M$, $U \preceq T$, and $M \preceq T$; moreover, $X \preceq X$ for any $X \in \{F, U, M, T\}$. We say that an interpretation $I'$ extends an interpretation I if, for each atom p, $I(p) \preceq I'(p)$. Intuitively, $I'$ represents more concrete knowledge than I does. The correctness of our algorithm relies on the following property of det_cons.

Theorem 1. Let P and I be the program and the interpretation, resp., given as inputs to det_cons, and denote by $I'$ and $\mathit{contradiction}'$ the values of the variables I and contradiction, resp., at the end of the procedure. Then,
1. $I'$ extends I;
2. every answer set S for P which extends I, extends $I'$ as well;
3. if $\mathit{contradiction}'$ holds, then no answer set $S'$ for P extends I.

It is worth noting that det_cons has been implemented very carefully in our system. By using sophisticated data structures for representing rules and interpretations, it runs in linear time, i.e., in time $O(\|P\| + \|I\|)$, where $\|\cdot\|$ denotes the size of an object.
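The partial order and the “extends” relation used in Theorem 1 can be stated operationally. The Python sketch below encodes exactly the relationships listed above; representing interpretations as dictionaries from atoms to F, U, M, T over the same set of atoms is our assumption, and `leq` and `extends` are hypothetical names.

```python
# Pairs (x, y) with x strictly below y in the partial order on truth values.
STRICTLY_BELOW = {("U", "F"), ("U", "M"), ("U", "T"), ("M", "T")}


def leq(x, y):
    """x ⪯ y: reflexive closure of the relationships listed in the text."""
    return x == y or (x, y) in STRICTLY_BELOW


def extends(I_new, I_old):
    """I_new extends I_old iff I_old(p) ⪯ I_new(p) for every atom p."""
    return all(leq(I_old[p], I_new[p]) for p in I_old)


if __name__ == "__main__":
    before = {"p": "U", "q": "M", "r": "F"}
    after = {"p": "F", "q": "T", "r": "F"}
    print(extends(after, before))   # True: every value moved up or stayed
    print(extends(before, after))   # False: e.g. p would go from F back to U
```

Item 1 of Theorem 1 says that `extends(I_after, I_before)` holds for every call of det_cons. Note that F and T are incomparable, so no atom can ever flip between true and false in this order.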
4 Overall Model Generation Algorithm
In this section we briefly review the model generation algorithm (Model Generator) of the dlv system, in order to show how the new notion of mbt and the new function det_cons are employed. The Model Generator (MG) produces a set of interpretations that are “candidates” for answer sets, which are then submitted to the Model Checker for verification. The Model Generator essentially relies on a backtracking technique which explores the search space for computing all answer sets.
Basically, the MG works as follows: (1) derive what is deterministically derivable from the program, (2) make an “educated” guess for one of the literals which have not been decided yet, and (3) propagate the consequences of this choice. This process is applied recursively until either a contradiction arises or no further guess can be made. In the former case, MG backtracks and modifies the last choice; in the latter case, we have an answer set candidate and the Model Checker is called. If the candidate is not an answer set, backtracking is performed. To formalize what we have called an “educated guess”, we introduce the concept of a possibly-true (PT) literal:

Definition 1. Let I be an interpretation for P. A positive PT literal of P w.r.t. I is a positive literal p such that $U \le I(p) \le M$ and there exists a rule $r \in \mathit{ground}(P)$ for which all of the following conditions hold:
1. $p \in H(r)$;
2. $\mathit{val}_I(H(r)) < T$ (i.e., the head is not true w.r.t. I);
3. $\mathit{val}_I(B(r)) = T$ (i.e., the body is true w.r.t. I).

A negative PT literal of P w.r.t. I is an undefined negative literal not q such that there exists a rule $r \in \mathit{ground}(P)$ for which all of the following conditions hold:
1. $\mathit{not}\ q \in B^-(r)$;
2. $\mathit{val}_I(H(r)) < T$ (i.e., the head is not true w.r.t. I);
3. $\mathit{val}_I(B^+(r)) = T$ (i.e., the positive part of the body is true w.r.t. I);
4. $\mathit{val}_I(B^-(r)) \ge U$ (i.e., no negative literal of the body is false w.r.t. I).
The set of all (positive and negative) PT literals of P w.r.t. I is denoted by $PT_P(I)$.

Example 2. Consider the program P = {a ∨ b :- c, not d. e :- c, not f.} and let I = {c, not d} be an interpretation for P. Then, we have three PT literals of P w.r.t. I: a, b, and not f. Indeed, the body of the first rule is true while its head is not, which makes a and b positive PT literals; in the second rule the positive body c is true and f is undefined, which makes not f a negative PT literal.

The actual algorithm for computing answer sets is shown in Figure 4. There, isAnswerSet is a function which returns true iff $I^T$ is an answer set for P. It is worth noting that the essence of the MG, based on the notion of PT literals, has not significantly changed w.r.t. previous versions; the reader is referred to [10,3,4] for further details on the other features.
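Definition 1 translates almost literally into code. The Python sketch below computes $PT_P(I)$ for a ground program given as (head, positive body, negative body) triples; the linear order F < U < M < T and the value assigned to not a (false when a is mbt or true) are our assumptions, and `pt_literals` is a hypothetical name. Run on Example 2, it yields a, b, and not f.

```python
ORDER = {"F": 0, "U": 1, "M": 2, "T": 3}
NOT = {"F": "T", "U": "U", "M": "F", "T": "F"}   # assumed value of "not a"


def conj(vals):
    """Truth value of a conjunction; the empty conjunction is true."""
    return min(vals, key=ORDER.get, default="T")


def disj(vals):
    """Truth value of a disjunction; the empty disjunction (constraint) is false."""
    return max(vals, key=ORDER.get, default="F")


def pt_literals(rules, I):
    """rules: iterable of (head, pos, neg) tuples of atoms; I: atom -> F/U/M/T."""
    result = set()
    for head, pos, neg in rules:
        if disj(I[a] for a in head) == "T":
            continue                                  # condition 2 fails
        pos_val = conj(I[a] for a in pos)
        neg_vals = [NOT[I[a]] for a in neg]
        # Positive PT literals: whole body true, head atom undefined or mbt.
        if conj([pos_val] + neg_vals) == "T":
            result |= {a for a in head if I[a] in ("U", "M")}
        # Negative PT literals: positive body true, no negative literal false.
        if pos_val == "T" and conj(neg_vals) != "F":
            result |= {"not " + a for a in neg if I[a] == "U"}
    return result


if __name__ == "__main__":
    # Example 2: a v b :- c, not d.   e :- c, not f.   with I = {c, not d}.
    P = [(("a", "b"), ("c",), ("d",)), (("e",), ("c",), ("f",))]
    I = {"a": "U", "b": "U", "c": "T", "d": "F", "e": "U", "f": "U"}
    print(sorted(pt_literals(P, I)))   # ['a', 'b', 'not f']
```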
5 Heuristics
In this section we focus on the question of how to select PT literals in line (6) of ComputeAnswerSets in Figure 4 so that the likelihood of finding an answer set is maximized.
Algorithm ComputeAnswerSets
Input: A ground DLP program P.
Output: The answer sets of P (if any).

Procedure ComputeAnswerSets(I: Interpretation)
(* The procedure outputs all answer sets of P *)
var Q: SetOfLiterals; L: Literal;
(1)   det_cons(P, I, contradiction);
(2)   if contradiction then exit procedure;
(3)   if (PT_P(I) = ∅) then (* I^T ∪ I^M is a model of P *)
(4)       if (I^M = ∅) and isAnswerSet(P, I^T) then
(5)           output I^T; (* I^T is an answer set *)
      else
(6)       Take a literal L from PT_P(I); (* Assume the truth of a PT literal *)
(7)       if L is a negative literal not p then
(8)           I(p) := F;
          else (* L is a positive literal *)
(9)           I(L) := T;
(10)      ComputeAnswerSets(I);
          (* At this point all answer sets containing I ∪ {L} have been generated *)
          (* L must be false in following computations *)
(11)      if L is a negative literal not p then
(12)          I(p) := M;
          else
(13)          I(L) := F;
(14)      ComputeAnswerSets(I);
end procedure

var I: Interpretation;
begin (* Main *)
    I^T := ∅; I^F := ∅; I^M := ∅; I^U := B_P;
    ComputeAnswerSets(I);
end.

Fig. 4. Algorithm for the Computation of Answer Sets
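Fig. 4 delegates the final test to isAnswerSet, whose internals are not described here. As a naive stand-in, and not the Model Checker used by dlv, the Python sketch below decides whether a set S of atoms is an answer set of a ground disjunctive program without strong negation, following the definition of [8]: S must be a minimal model of the reduct of P w.r.t. S. The minimality test enumerates all proper subsets of S, so it is exponential and only meant for toy examples; the function names are hypothetical.

```python
from itertools import combinations


def reduct(rules, S):
    """Gelfond-Lifschitz reduct: delete each rule whose negative body contains
    an atom of S, and drop the negative bodies of the remaining rules."""
    return [(head, pos) for head, pos, neg in rules if not set(neg) & S]


def is_model(reduced, M):
    """Every rule whose positive body is in M has a head atom in M; a rule
    with an empty head (a constraint) is violated whenever its body holds."""
    return all(not set(pos) <= M or bool(set(head) & M) for head, pos in reduced)


def is_answer_set(rules, S):
    """rules: (head, pos, neg) tuples of atoms; S: candidate set of atoms."""
    S = frozenset(S)
    red = reduct(rules, S)
    if not is_model(red, S):
        return False
    # S must also be minimal: no proper subset of S is a model of the reduct.
    return not any(is_model(red, frozenset(sub))
                   for k in range(len(S))
                   for sub in combinations(sorted(S), k))


if __name__ == "__main__":
    P = [(("a", "b"), (), ())]                       # a v b.
    for S in [set(), {"a"}, {"b"}, {"a", "b"}]:
        print(sorted(S), is_answer_set(P, S))
```

For the one-rule program a ∨ b, only {a} and {b} pass: {a, b} is a model but not a minimal one, which is exactly the situation the isAnswerSet call in line (4) has to rule out.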
To this end we employ so-called “lookahead”, that is, we temporarily assume the truth of one PT literal at a time⁴ and perform the deterministic derivations, i.e., we apply the det_cons function. On the basis of the changes derived during this lookahead, we then decide which PT literal should be taken. (Note that the smodels system [15,16] also employs lookahead, but with a completely different heuristic.)

Definition 2. An mbt atom p is said to be of level n (w.r.t. an interpretation I) if |support(p)| = n (w.r.t. I).

⁴ For a negative literal this means assigning false to its atom.
During the lookahead we record the following counters for each PT literal p:
- $mbt^-(p)$: the overall number of eliminated mbt atoms (mbt atoms which became true);
- $mbt^+(p)$: the overall number of inserted mbt atoms (undefined atoms which became mbt);
- $mbt_2^-(p)$: the number of eliminated mbt atoms of level 2;
- $mbt_2^+(p)$: the number of inserted mbt atoms of level 2;
- $mbt_3^-(p)$: the number of eliminated mbt atoms of level 3;
- $mbt_3^+(p)$: the number of inserted mbt atoms of level 3.

The respective level is w.r.t. the interpretation at the moment the mbt atom is assigned true. In addition, we define some difference functions:

$$\Delta_{mbt}(p) = mbt^-(p) - mbt^+(p), \qquad \Delta_{mbt_2}(p) = mbt_2^-(p) - mbt_2^+(p), \qquad \Delta_{mbt_3}(p) = mbt_3^-(p) - mbt_3^+(p).$$

Concerning the heuristic itself, we have defined a heuristic relation over the set of PT literals as follows:

Definition 3. Given two PT literals a and b, we define an ordering relation > as follows. If $(mbt^-(a) = 0 \land mbt^-(b) > 0) \lor (mbt^-(a) > 0 \land mbt^-(b) = 0)$, then $a > b \Leftrightarrow mbt^-(a) > mbt^-(b)$; otherwise, a > b holds if one of the following conditions applies:
1. $\Delta_{mbt}(a) > \Delta_{mbt}(b)$;
2. $\Delta_{mbt_2}(a) > \Delta_{mbt_2}(b) \land \Delta_{mbt}(a) = \Delta_{mbt}(b)$;
3. $\Delta_{mbt_3}(a) > \Delta_{mbt_3}(b) \land \Delta_{mbt}(a) = \Delta_{mbt}(b) \land \Delta_{mbt_2}(a) = \Delta_{mbt_2}(b)$.
Further, let a = b be true if $a \not> b \land b \not> a$.

In other words, if exactly one of $mbt^-(a)$ and $mbt^-(b)$ is zero, we prefer the PT literal for which $mbt^-$ is non-zero. Otherwise (i.e., both $mbt^-(a)$ and $mbt^-(b)$ are zero or both are non-zero), we prefer the one for which the overall number of mbt atoms becomes smaller. If this number is equal, we prefer the one for which the number of mbt atoms of level 2 becomes smaller. If this number is also equal, we use the number of mbt atoms of level 3. If all three differences coincide, we consider the two literals to be equal. The reasoning behind this relation is that the mbt atoms can be viewed as constraints which are not yet satisfied but must eventually be satisfied in any answer set. So the fewer mbt atoms there are, the smaller the distance to an answer set. Additionally, mbt atoms of level 2 and 3 are the ones which are the “hardest” to satisfy (observe that mbt atoms of level 1 are always derived by det_cons). The purpose of the test whether exactly one of $mbt^-(a)$ or $mbt^-(b)$ is zero is that in this case we want to avoid preferring a PT literal which only introduces
new mbt atoms but does not eliminate any, over one which eliminates some but introduces more (the former is like a “null action”). The guessing step in the Model Generator (line (6) in ComputeAnswerSets in Figure 4) takes a PT literal which is a maximum w.r.t. ≥.

Example 3. Consider again the program for computing Hamiltonian paths shown in Figure 1, now together with the encoding of the graph depicted in Figure 5 plus start(a).

Fig. 5. Example graph 2 (nodes a, b, c, d, e)
By the first call to det_cons, only reached(a) is set to true, while reached(b), reached(c), reached(d), and reached(e) are assigned mbt because of the single-literal constraints obtained from (v) (see Appendix A). The choice rule (iii) is instantiated with the arcs (see Appendix A); these rules supply the PT literals (all of which are positive). Note that the rules defining the predicate reached are instantiated in such a way that reached(n) occurs in the head of exactly two rules for each node n (apart from a), because each of these nodes has exactly two incoming arcs. Each reached(n) (n ∈ {b, …, e}) needs support, but it is not yet known which of the two rules will eventually supply it. To evaluate the heuristic relation, we perform a lookahead: for each PT literal L, we assume L true, compute its deterministic consequences (by a call to det_cons), and store the values of the respective mbt counters. Let us first consider the PT literal inPath(a,b): upon assuming it true, we immediately derive reached(b) as true, and thus eliminate an mbt atom of level 2 (since it occurs in the head of two unsatisfied rules). By statements (9)–(11) in det_cons we derive falsity for inPath(a,c), inPath(a,d), inPath(a,e), and inPath(d,b), reflecting the fact that no two arcs in the Hamiltonian path may begin in the same node or end in the same node (constraints (iv) in Figure 1). After that, for each of reached(c), reached(d), and reached(e) only one supporting rule is left, so we can infer that the yet undefined positive body literals of these rules (inPath(b,c), inPath(c,d), inPath(d,e)) are mbt. Moreover, since each of them occurs in the head of exactly one rule and the body of this rule is true, we infer them as true immediately afterwards, and eventually we also infer reached(c), reached(d), and reached(e) as true. These steps are visualized in Figure 6, where bold arcs are in the Hamiltonian path, while dashed arcs are not.
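The claim that each reached(n), n ≠ a, heads exactly two ground rules can be checked mechanically. The Python sketch below grounds rules (ii) and (iii) over the arcs of example graph 2, which we read off the ground rules in Appendix A; the rule shapes reached(Y) :- reached(X), inPath(X,Y) and inPath(X,Y) v outPath(X,Y) follow the ground instances shown in this paper, and the helper names are hypothetical. This string-based grounding is only an illustration, not dlv's instantiator.

```python
from collections import Counter

# Arcs of example graph 2, as they appear in the ground rules of Appendix A.
ARCS = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
        ("b", "c"), ("c", "d"), ("d", "b"), ("d", "e")]


def ground_choice_rules(arcs):
    """One disjunctive rule per arc (rule (iii)); these supply the PT literals."""
    return [f"inPath({x},{y}) v outPath({x},{y})." for x, y in arcs]


def ground_reached_rules(arcs):
    """One reachability rule per arc (rule (ii)), paired with its head atom."""
    return [(f"reached({y})", f"reached({y}) :- reached({x}), inPath({x},{y}).")
            for x, y in arcs]


def rules_per_head(arcs):
    """Number of ground rules with reached(n) in the head, for each node n."""
    return Counter(head for head, _ in ground_reached_rules(arcs))


if __name__ == "__main__":
    for _, rule in ground_reached_rules(ARCS):
        print(rule)
    print(rules_per_head(ARCS))
```

Since every node other than a has exactly two incoming arcs, every reached(n) with n ≠ a ends up with two candidate supporting rules, which is why these atoms start out as mbt atoms of level 2.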
Fig. 6. Steps during lookahead for inPath(a,b) (three snapshots of example graph 2, nodes a–e)
In total, the deterministic derivation has generated 3 new mbt atoms (all of which have subsequently been derived as true) and eliminated 7 mbt atoms, one of which was of level 2. All PT literals and their corresponding heuristic-relevant function values, ordered by ≥, are shown in Table 1. Those which are not listed (inPath(a,c), inPath(a,d), outPath(b,c), outPath(c,d)) generate an inconsistency during propagation.

| PT literal | $mbt^-$ | $mbt^+$ | $mbt_2^-$ | $mbt_2^+$ | $mbt_3^-$ | $mbt_3^+$ |
|---|---|---|---|---|---|---|
| inPath(a,b) | 7 | 3 | 1 | 0 | 0 | 0 |
| outPath(a,e) | 8 | 4 | 0 | 0 | 0 | 0 |
| outPath(d,b) | 8 | 4 | 0 | 0 | 0 | 0 |
| inPath(d,e) | 7 | 3 | 0 | 0 | 0 | 0 |
| inPath(a,e) | 4 | 3 | 1 | 0 | 0 | 0 |
| outPath(a,b) | 5 | 4 | 0 | 0 | 0 | 0 |
| inPath(d,b) | 4 | 3 | 0 | 0 | 0 | 0 |
| outPath(a,b) | 5 | 4 | 0 | 0 | 0 | 0 |
| outPath(a,c) | 1 | 1 | 0 | 0 | 0 | 0 |
| outPath(a,d) | 1 | 1 | 0 | 0 | 0 | 0 |
| inPath(b,c) | 0 | 0 | 0 | 0 | 0 | 0 |
| inPath(c,d) | 0 | 0 | 0 | 0 | 0 | 0 |
Table 1. PT literals and their values, ordered by ≥
Thus, following the heuristic, the PT literal inPath(a,b) is chosen by our computation. Its propagation by det_cons then immediately leads to the computation of the Hamiltonian path: thanks to the heuristic, a single choice was sufficient. Note that performing lookahead has an additional merit: if an inconsistency is detected during the propagation of a PT literal, we can set it to false and apply det_cons, thus pruning the search tree considerably.
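Definition 3 amounts to a lexicographic comparison, which turns the selection in line (6) of Fig. 4 into a plain maximum. In the Python sketch below (hypothetical names; the counters are taken as given rather than computed by an actual lookahead), the first key component encodes the special case in which exactly one literal eliminates no mbt atom, and the remaining components are $\Delta_{mbt}$, $\Delta_{mbt_2}$, and $\Delta_{mbt_3}$. Applied to the rows of Table 1 (leaving out the repeated outPath(a,b) row), it picks inPath(a,b) and ranks the rows in the order of the table, with equal literals keeping their input order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lookahead:
    """mbt counters recorded while looking ahead on one PT literal."""
    name: str
    mbt_minus: int            # eliminated mbt atoms (all levels)
    mbt_plus: int             # inserted mbt atoms (all levels)
    mbt2_minus: int = 0
    mbt2_plus: int = 0
    mbt3_minus: int = 0
    mbt3_plus: int = 0


def preference_key(c):
    """Larger key = preferred literal, following Definition 3."""
    return (c.mbt_minus > 0,                  # eliminating something beats a "null action"
            c.mbt_minus - c.mbt_plus,         # Delta_mbt
            c.mbt2_minus - c.mbt2_plus,       # Delta_mbt2
            c.mbt3_minus - c.mbt3_plus)       # Delta_mbt3


def choose(candidates):
    """A maximum w.r.t. >= (the first one encountered, if several are equal)."""
    return max(candidates, key=preference_key)


def rank(candidates):
    """Most preferred first; equal literals keep their input order."""
    return sorted(candidates, key=preference_key, reverse=True)


# Rows of Table 1 (the repeated outPath(a,b) row is left out).
TABLE1 = [
    Lookahead("inPath(a,b)", 7, 3, 1, 0),
    Lookahead("outPath(a,e)", 8, 4),
    Lookahead("outPath(d,b)", 8, 4),
    Lookahead("inPath(d,e)", 7, 3),
    Lookahead("inPath(a,e)", 4, 3, 1, 0),
    Lookahead("outPath(a,b)", 5, 4),
    Lookahead("inPath(d,b)", 4, 3),
    Lookahead("outPath(a,c)", 1, 1),
    Lookahead("outPath(a,d)", 1, 1),
    Lookahead("inPath(b,c)", 0, 0),
    Lookahead("inPath(c,d)", 0, 0),
]

if __name__ == "__main__":
    print(choose(reversed(TABLE1)).name)   # inPath(a,b)
```

Encoding the relation as a sort key also makes it evident that it is a total preorder, so a maximum w.r.t. ≥ always exists.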
6 Some Experimental Results
We have conducted a number of experiments in order to show the usefulness of the various techniques introduced in this paper. To this end we have compared several versions of dlv. The first one is the release of February 10th, 1999. It contains only a small part of det_cons, notably without the notion of mbt atoms and without the part ensuring supportedness. Heuristics are not included either.
The second one is the release of April 6th, 1999. It contains the fully implemented first part of det_cons, i.e., statements (4)–(12). The part ensuring supportedness is missing (as in the previous version), and heuristics are not yet included. The third version is the release of May 28th, 1999. It contains the full implementation of det_cons, but heuristics are not included. Finally, the fourth version is the third one enriched by heuristics; its public release is dated June 8th, 1999. We have benchmarked a set of blocksworld instances, all of which except P5 are taken from [6]. We use an encoding of the problem domain which is different from the one in [6], but which is also derived from an encoding in an action language. The domain encoding and the instances are given in Appendix B.

[Figure: Blocksworld Examples P1 to P5. Runtime in seconds (0–800) per instance (1–5) for four versions of dlv: without enhancements; with must-be-true (without supportedness); with full det_cons; with full det_cons and heuristics.]
Since we chose the Hamiltonian path problem as a running example, we have also generated a random graph with 25 nodes and 60 arcs⁵ and run the program of Figure 1 on it with an arbitrarily picked starting node (node 0). Version 1 could not find a Hamiltonian path within 1000 seconds, while version 2 found one in 716 seconds, and version 3 took 750 seconds. With heuristics enabled, dlv was able to find a path in 12.7 seconds.
⁵ Generated by the Stanford GraphBase [9] using random_graph(25,60,0,0,0,0,0,1,1,60).

References
1. W. Chen and D. S. Warren. Computation of Stable Models and Its Integration with Logical Query Processing. IEEE Transactions on Knowledge and Data Engineering, 8(5):742–757, 1996.
2. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. The Diagnosis Frontend of the dlv System. AI Communications – The European Journal on Artificial Intelligence, 12(1–2):99–111, 1999.
3. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. A Deductive System for Nonmonotonic Reasoning. In Proc. LPNMR ’97, pages 363–374.
4. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. Progress Report on the Disjunctive Deductive Database System dlv. In Proc. FQAS ’98, pages 145–160.
5. T. Eiter, N. Leone, C. Mateis, G. Pfeifer, and F. Scarcello. The KR System dlv: Progress Report, Comparisons and Benchmarks. In Proc. KR ’98, pages 406–417.
6. E. Erdem. Applications of Logic Programming to Planning: Computational Experiments. Unpublished draft, 1999.
7. M. Fitting. A Kripke-Kleene semantics for logic programs. Journal of Logic Programming, 2(4):295–312, 1985.
8. M. Gelfond and V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
9. D. E. Knuth. The Stanford GraphBase: a platform for combinatorial computing. ACM Press, New York, 1994.
10. N. Leone, P. Rullo, and F. Scarcello. Disjunctive stable models: Unfounded sets, fixpoint semantics and computation. Information and Computation, 135(2):69–112, June 1997.
11. V. Lifschitz. Action Languages, Answer Sets and Planning. In K. Apt, V. W. Marek, M. Truszczyński, and D. S. Warren, editors, The Logic Programming Paradigm - A 25-Year Perspective, pages 357–373. Springer Verlag, 1999.
12. V. W. Marek and M. Truszczyński. Stable Models and an Alternative Logic Programming Paradigm. In K. Apt, V. W. Marek, M. Truszczyński, and D. S. Warren, editors, The Logic Programming Paradigm - A 25-Year Perspective, pages 375–398. Springer Verlag, 1999.
13. J. Minker. On Indefinite Data Bases and the Closed World Assumption. In Proc. CADE ’82, pages 292–308.
14. I. Niemelä. Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, May 1998.
15. I. Niemelä and P. Simons. Smodels - an implementation of the stable model and well-founded semantics for normal logic programs. In Proc. LPNMR ’97, pages 420–429.
16. P. Simons. Towards constraint satisfaction through logic programs and the stable model semantics. Research Report A47, Digital Systems Laboratory, Department of Computer Science, Helsinki University of Technology, Finland.
A Instantiation of the Hamiltonian Path Program

Here are the rules of the Hamiltonian path program in Figure 1, instantiated with example graph 2 of Figure 5:

:- inPath(a,b), inPath(a,c).
:- inPath(a,b), inPath(a,d).
:- inPath(a,b), inPath(a,e).
:- inPath(a,c), inPath(a,b).
:- inPath(a,c), inPath(a,d).
:- inPath(a,c), inPath(a,e).
:- inPath(a,d), inPath(a,b).
:- inPath(a,d), inPath(a,c).
:- inPath(a,d), inPath(a,e).
:- inPath(d,b), inPath(d,e).
:- inPath(d,e), inPath(d,b).
:- inPath(a,b), inPath(d,b).
:- inPath(d,b), inPath(a,b).
:- inPath(a,c), inPath(b,c).
:- inPath(b,c), inPath(a,c).
:- inPath(c,d), inPath(a,d).
:- inPath(a,d), inPath(c,d).
:- inPath(d,e), inPath(a,e).
:- inPath(a,e), inPath(d,e).
:- not reached(b).
:- not reached(c).
:- not reached(d).
:- not reached(e).

inPath(a,b) ∨ outPath(a,b).
inPath(a,c) ∨ outPath(a,c).
inPath(a,d) ∨ outPath(a,d).
inPath(a,e) ∨ outPath(a,e).
inPath(b,c) ∨ outPath(b,c).
inPath(c,d) ∨ outPath(c,d).
inPath(d,b) ∨ outPath(d,b).
inPath(d,e) ∨ outPath(d,e).

reached(b) :- reached(a), inPath(a,b).
reached(b) :- reached(d), inPath(d,b).
reached(c) :- reached(a), inPath(a,c).
reached(c) :- reached(b), inPath(b,c).
reached(d) :- reached(c), inPath(c,d).
reached(d) :- reached(a), inPath(a,d).
reached(e) :- reached(d), inPath(d,e).
reached(e) :- reached(a), inPath(a,e).

B The Blocksworld Domain and Instances
% specification of the move action
move(B,L,T) v -move(B,L,T) :- block(B), location(L), actiontime(T), B <> L.

% the effects of moving a block
on(B,L,T1) :- move(B,L,T), #succ(T,T1).
-on(B,L,T1) :- move(B,_,T), on(B,L,T), #succ(T,T1).

% move preconditions
% a block can be moved only when it’s clear
:- move(B,L,T), on(B1,B,T).
% if a block is moved onto another block, the latter must be clear
:- move(B,B1,T), on(B2,B1,T), block(B1).
% concurrent actions are not allowed
:- move(B,_,T), move(B1,_,T), B <> B1.
:- move(_,L,T), move(_,L1,T), L <> L1.

% inertia
on(B,L,T1) :- on(B,L,T), not -on(B,L,T1), #succ(T,T1).

% time at which actions can be initiated
actiontime(T) :- T < #maxint, #int(T).

% location definition (blocks are defined in the problem instances)
true.
location(t) :- true.
location(B) :- block(B).

[Figure: block configurations of the blocksworld instances P1–P5]

| Problem | blocks | steps |
|---|---|---|
| P1 | 4 | 4 |
| P2 | 5 | 6 |
| P3 | 8 | 8 |
| P4 | 11 | 9 |
| P5 | 11 | 11 |
Linear Tabulated Resolution for the Well-Founded Semantics

Yi-Dong Shen¹⋆, Li-Yan Yuan², Jia-Huai You², and Neng-Fa Zhou³

¹ Department of Computer Science, Chongqing University, Chongqing 400044, P.R. China, [email protected]
² Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada T6G 2H1, {yuan, you}@cs.ualberta.ca
³ Department of Computer and Information Science, Brooklyn College, The City University of New York, New York, NY 11210-2889, USA, [email protected]
Abstract. Global SLS-resolution and SLG-resolution are two representative mechanisms for top-down evaluation of the well-founded semantics of general logic programs. Global SLS-resolution is linear but suffers from infinite loops and redundant computations. In contrast, SLG-resolution resolves infinite loops and redundant computations by means of tabling, but it is not linear. The distinctive advantage of a linear approach is that it can be implemented using a simple, efficient stack-based memory structure like that in Prolog. In this paper we present a linear tabulated resolution for the well-founded semantics, which resolves the problems of infinite loops and redundant computations while preserving the linearity. For non-floundering queries, the proposed method is sound and complete for general logic programs with the bounded-term-size property.
1 Introduction
Two representative methods have been presented in the literature for top-down evaluation of the well-founded semantics of general logic programs: Global SLS-resolution [5,6] and SLG-resolution [2,3]. Global SLS-resolution is a direct extension of SLDNF-resolution [4], which treats infinite derivations as failed and infinite recursions through negation as undefined. Like SLDNF-resolution, it is linear in the sense that for any derivation $G_0 \Rightarrow_{C_1,\theta_1} G_1 \Rightarrow \cdots \Rightarrow_{C_i,\theta_i} G_i$ with $G_i$ the latest generated goal, it makes the next derivation step either by expanding $G_i$ by resolving a subgoal in $G_i$ with a program clause, i.e., $G_i \Rightarrow_{C_{i+1},\theta_{i+1}} G_{i+1}$, or by expanding $G_{i-1}$ via backtracking. The distinctive advantage of a linear approach is that it can be implemented using a simple, efficient stack-based memory structure (like that in Prolog). However, Global SLS-resolution inherits
⋆ Currently on leave at the Department of Computing Science, University of Alberta, Canada. Email: [email protected]
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 192–205, 1999. © Springer-Verlag Berlin Heidelberg 1999
from SLDNF-resolution two serious problems: infinite loops and redundant computations.

SLG-resolution (similarly, Tabulated SLS-resolution [1]) is a tabling mechanism for top-down evaluation of the well-founded semantics. The main idea of tabling is to store intermediate results of relevant subgoals and then use them to solve variants of the subgoals whenever needed. Since no variant subgoals will be recomputed by applying the same set of program clauses, infinite loops can be avoided and redundant computations substantially reduced. Like other existing tabling mechanisms, SLG-resolution adopts the solution-lookup mode. That is, all nodes in a search tree/forest are partitioned into two subsets, solution nodes and lookup nodes. Solution nodes produce child nodes using program clauses, whereas lookup nodes produce child nodes using answers in the tables. As an illustration, consider the derivation $p(X) \Rightarrow_{C_{p_1},\theta_1} q(X) \Rightarrow_{C_{q_1},\theta_2} p(Y)$. Assume that no answers of p(X) have been derived. Since p(Y) is a variant of p(X) and thus a lookup node, the next derivation step is to expand p(X) against a program clause, instead of expanding the latest generated goal p(Y). Clearly, such a derivation is not linear. Because of this non-linearity, SLG-resolution can neither be implemented using an efficient stack-based memory structure nor utilize useful strictly sequential operators such as cuts in Prolog. This is evidenced by the fact that a well-known tabling system, XSB, which is an implementation of SLG-resolution [7,8,9], disallows clauses like p(.) ← ..., t(.), !, ... where t(.) is a tabled subgoal, because the tabled predicate t occurs in the scope of a cut [9].

The objective of our research is to develop a linear tabling method for top-down evaluation of the well-founded semantics of general logic programs, which resolves infinite loops and redundant computations without sacrificing the linearity of SLDNF-resolution.
In an earlier paper [11], we presented a linear tabling mechanism called TP-resolution for positive logic programs (“TP” for “Tabulated Prolog”). In TP-resolution, each node in a search tree can act both as a solution node and as a lookup node, regardless of when and where it is generated. This represents an essential difference from existing tabling approaches. The main idea is as follows: for any selected subgoal A at a node $N_i$ labeled with a goal $G_i$, we first try to use an answer I in the table of A to generate a child node $N_{i+1}$, which is labeled by the resolvent of $G_i$ and I. If no such answers are available in the table, we resolve A against program clauses in top-down order, except when the derivation has stepped into a loop at $N_i$. In such a case, the subgoal A skips the clause that is being used by its ancestor subgoal that is a variant of A. For example, for the derivation $p(X) \Rightarrow_{C_{p_1},\theta_1} q(X) \Rightarrow_{C_{q_1},\theta_2} p(Y)$, we expand p(Y) by resolving it against the program clause next to $C_{p_1}$. Thanks to its linearity, TP-resolution can be implemented by extending any existing Prolog abstract machine, such as the WAM [14] or ATOAM [15]. In this paper, we extend TP-resolution to TPWF-resolution, which computes the well-founded semantics of general logic programs. The extension is nontrivial because of possible infinite recursions through negation. In addition to
Y.-D. Shen et al.
the strategy for clause selection adopted by TP-resolution, TPWF-resolution uses two critical mechanisms to deal with infinite recursions through negation. One is making assumptions for negative loop subgoals whose truth values are currently undecided, and the other is performing answer iteration to derive complete answers of loop subgoals. For non-floundering queries, TPWF-resolution is sound and complete for general logic programs with the bounded-term-size property. Section 2 gives an illustrative example outlining these main ideas, and Section 3 defines TPWF-trees based on these strategies. Section 4 presents the definition of TPWF-resolution and discusses its properties.

1.1 Notation and Terminology
Variables begin with a capital letter, and predicates, functions, and constants with a lower-case letter. By $\bar{E}$ we denote a list/tuple $(E_1, \ldots, E_m)$ of elements. Let $\bar{X} = (X_1, \ldots, X_m)$ be a tuple of variables and $\bar{I} = (I_1, \ldots, I_m)$ a tuple of terms. By $\bar{X}/\bar{I}$ we denote an mgu $\{X_1/I_1, \ldots, X_m/I_m\}$. By p(.) we refer to any atom with the predicate p, and by $p(\bar{X})$ to an atom p(.) that contains the list $\bar{X}$ of distinct variables. For instance, if $p(\bar{X}) = p(W, a, f(Y, W), Z)$, then $\bar{X} = (W, Y, Z)$. By a variant of an atom (resp. subgoal or term) A we mean an atom (resp. subgoal or term) $A'$ that is the same as A up to variable renaming.¹ A set of atoms (resp. subgoals or terms) that are variants of each other are called variant atoms (resp. variant subgoals or variant terms). Moreover, for any element E, by E being in a set S we understand that a variant of E is in S.

For convenience of describing our method, we use four truth values: t (true), f (false), u (undefined), and $u^*$ (temporarily undefined), with $\neg t = f$, $\neg f = t$, $\neg u = u$, and $\neg u^* = u^*$. As its name suggests, $u^*$ is used as a temporary truth value when the truth value (t, f, or u) of a subgoal is currently undecided (due to the occurrence of loops). In addition to $f \wedge V = f$ and $t \wedge V = V$ for any $V \in \{t, f, u, u^*\}$, we have $u \wedge u^* = u^*$. Let A be an atom. By $A^*$ we refer to an answer A with truth value $u^*$. Finally, clauses in a program with the same head predicate p are numbered sequentially, with $C_{p_i}$ referring to its i-th clause (i > 0).
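Two of the notions just introduced lend themselves to a small executable illustration: variant checking and the four-valued connectives. In the Python sketch below, terms are nested tuples (functor, arg1, …) and variables are strings starting with an upper-case letter; this representation is ours, not the paper's. For the conjunction we assume the ordering f < u* < u < t and take the minimum; this reproduces every case stated above, while the cases u ∧ u = u and u* ∧ u* = u*, which the text leaves implicit, are our assumption. All names are hypothetical.

```python
def is_var(t):
    """Variables are strings beginning with an upper-case letter."""
    return isinstance(t, str) and t[:1].isupper()


def is_variant(a, b):
    """True iff a and b are identical up to a one-to-one renaming of variables."""
    fwd, bwd = {}, {}

    def walk(x, y):
        if is_var(x) and is_var(y):
            # Record the pairing on first sight; afterwards it must be respected.
            return fwd.setdefault(x, y) == y and bwd.setdefault(y, x) == x
        if isinstance(x, tuple) and isinstance(y, tuple):
            return len(x) == len(y) and all(walk(s, t) for s, t in zip(x, y))
        return not is_var(x) and not is_var(y) and x == y

    return walk(a, b)


# Four truth values; u* is written "u*".
NEG = {"t": "f", "f": "t", "u": "u", "u*": "u*"}
# Our encoding: conjunction is the minimum under f < u* < u < t.
RANK = {"f": 0, "u*": 1, "u": 2, "t": 3}


def conj(*vals):
    """Conjunction as the minimum under RANK (the empty conjunction is t)."""
    return min(vals, key=RANK.get, default="t")


if __name__ == "__main__":
    print(is_variant(("p", "X"), ("p", "Y")))             # True
    print(is_variant(("p", "X", "X"), ("p", "Y", "Z")))   # False
    print(conj("u", "u*"), NEG["u*"])                     # u* u*
```

The two dictionaries in `is_variant` enforce that the renaming is one-to-one, so p(X, Y) is not a variant of p(Z, Z); and every term is a variant of itself, in line with footnote 1.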
2 Main Ideas
In this section, we outline the main ideas of TPWF-resolution through an illustrative example.

Example 1. Consider the following program:

P1:  p(X) ← q(X).     $C_{p_1}$
     p(a).            $C_{p_2}$
     q(X) ← ¬r.       $C_{q_1}$
     q(X) ← w.        $C_{q_2}$
     q(X) ← p(X).     $C_{q_3}$
     r ← ¬s.          $C_{r_1}$
     s ← ¬r.          $C_{s_1}$
     w ← ¬w, v.       $C_{w_1}$

¹ By this definition, A is a variant of itself.
Let $G_0 = \leftarrow p(X)$ be the query (top goal). Reasoning in the same way as Prolog,² we successively generate the nodes $N_0$–$N_7$ as shown in Fig. 1. Obviously, Prolog would repeat the loop between $N_3$ and $N_7$ infinitely. However, we break the loop by not allowing $N_7$ to select the clause $C_{r_1}$, which is being used by $N_3$. This leaves $N_7$ with no clause to unify with, which leads to backtracking. Since the loop is negative in the sense that it goes through negation, $N_7$ should not be failed by falsifying r at this moment. Instead, r is assumed to be temporarily undefined (i.e., $r = u^*$). By definition, $r = u^*$ (at $N_7$) means $\neg r = u^*$ (at $N_6$), so that $s = u^*$ (at $N_5$) is derived. For the same reason, $r = u^*$ (at $N_3$) is derived.

Fig. 1. TPWF-derivations (derivation tree for $G_0$ with nodes $N_0$–$N_{14}$ and answer nodes U and T)
We use two data structures, UA and UD, to keep atoms that are assumed and derived to be temporarily undefined, respectively. Therefore, after these steps, UA = {r} and UD = {s, r}. We then return to $N_3$. Since $N_3$ is the top node of the loop, before failing it via backtracking we need to be sure that r has obtained its complete set of answers (r = t, r = u, or r = f). This is achieved by performing answer iteration via the loop. That is, we regenerate the loop to see if any new answers can be derived, until we reach a fixpoint. We use a flag variable NEW, with NEW = 0 initially. Whenever a new answer with truth value t or u is derived for any subgoal, NEW is set to 1. Before starting an iterate, we set NEW = 0 and UA = UD = {}. Answer iteration stops at the end of an iterate in which NEW = 0 and (UA ⊆ UD or UD = {}). The fact that NEW = 0 and UA ⊆ UD indicates that the truth values of all atoms in UA depend entirely on how they are assumed in the negative loop, which, under the well-founded semantics [12], amounts to saying that these truth values are undefined.

² That is, we use the following control strategy: Depth-first (for goal selection) + Left-most (for subgoal selection) + Top-down (for clause selection) + Last-first (for backtracking).

Since so far no answer with truth value t or u has been derived (i.e., NEW = 0) and UA = {r} ⊂ UD = {s, r}, the termination condition of answer iteration is satisfied. Therefore, we change the truth values of all atoms in UD from temporarily undefined to undefined (i.e., r = s = u) and memorize the new answers in the respective tables. After the completion of answer iteration, we set UA = UD = {}. By definition, r = u (at $N_3$) means ¬r = u (at $N_2$), which leads to an answer node U for the top goal (see Fig. 1). That is, we have q(X) = u and p(X) = u, which are memorized in their tables.

Now we backtrack to $N_1$ and try the next clause for q(X). Applying $C_{q_2}$ and $C_{w_1}$ leads to $N_8$–$N_{10}$, which form another negative loop. In the same way as above, we assume $w = u^*$ and put w into UA. So $\neg w = u^*$, which leads to the node $N_{11}$. Since v is false, we backtrack to $N_9$ and then to $N_8$, with NEW = 0, UA = {w}, and UD = {}. Again, before leaving $N_8$ via backtracking, we need to complete the answers of w by means of answer iteration via the loop. Obviously, the termination condition of answer iteration is satisfied. Here NEW = 0, w ∈ UA, and w ∉ UD suggest that w cannot be inferred from the program, whatever truth values we assign to the temporarily undecided subgoals in UA. Under the well-founded semantics, this implies that w is false. So we set w = f and return to $N_1$ again. Applying $C_{q_3}$ leads to $N_{12}$.
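The termination test of answer iteration, and the way Example 1 turns temporary values into final ones, fit in a few lines. The Python sketch below is our reading of the two cases in this example, with hypothetical names; the precise definitions are those of the TPWF-trees in Section 3. Atoms in UD become undefined, and atoms that were only assumed, never derived, become false.

```python
def iteration_finished(new, ua, ud):
    """Stop answer iteration when NEW = 0 and (UA ⊆ UD or UD is empty)."""
    return new == 0 and (ua <= ud or not ud)


def close_loop(ua, ud):
    """Final values once iteration stops (our reading of Example 1):
    derived temporarily undefined atoms become u; atoms that were only
    assumed and never derived become f."""
    values = {a: "u" for a in ud}
    values.update({a: "f" for a in ua - ud})
    return values


if __name__ == "__main__":
    # Loop through N3: UA = {r}, UD = {s, r}.
    print(iteration_finished(0, {"r"}, {"s", "r"}), close_loop({"r"}, {"s", "r"}))
    # Loop through N8: UA = {w}, UD = {}.
    print(iteration_finished(0, {"w"}, set()), close_loop({"w"}, set()))
```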
We see that there is a loop between N0 and N12. Instead of selecting Cp1, which is being used by N0, we use Cp2 to unify against p(X), which leads to an answer node T with mgu X/a. That is, p(a) = t and q(a) = t, which are added to the tables of p(X) and q(X), respectively (NEW is then set to 1). Since the loop N0 → N1 → N12 is positive, we backtrack to N1 and then to N0, making no assumption. This time, we have NEW = 1, UA = {} and UD = {}. Since N0 is the top loop node and NEW = 1, we do answer iteration by regenerating the loop, which leads to N0 → N13 → N14. Since Cp1 is being used by N0 and Cp2 has already been used (by N12, with the answer stored in the table of p(X)), p(X) at N14 has no clause to unify with. So we backtrack to N13 and then to N0. Now NEW = 0 and UA = UD = {}, so we end the iteration. Since N0 is the root, the evaluation of G0 terminates. The derived answers are: p(a) = q(a) = t, p(b) = q(b) = u for any b ≠ a, r = s = u, and w = v = f. We see that these answers constitute the well-founded model for P1. □
Linear Tabulated Resolution for the Well-Founded Semantics
The tabulated resolution shown in Example 1 is clearly linear. Moreover, it resolves infinite loops and redundant computations without losing any answers. The main points are summarized as follows:
1. Tabling. Tables are used to store intermediate results, which is the basis of all tabulated resolutions.
2. Clause selection. Without loops, clauses are selected in the same way as in Prolog, except that clauses that have been used before are not reapplied, because the complete set of answers derived via those clauses has already been memorized in the related tables. For example, N14 skips Cp2 because the clause has already been used by N12. This avoids redundant computations. When a loop occurs, however, clauses that are being used by ancestor loop subgoals are skipped. For example, Cr1, Cw1 and Cp1 are skipped by N7, N10 and N12, respectively. This breaks infinite loops.
3. Assumption. For a positive loop subgoal, backtracking proceeds in the same way as in Prolog (see N12 and N14). A negative loop subgoal whose truth value is currently undecided, however, is assumed temporarily undefined before being failed (see N7 and N10). Temporarily undefined values are removed (from tables) when their t or u counterparts are derived. This guarantees the correctness of answers.
4. Answer iteration. Before leaving a loop by failing its top loop node (e.g. N3, N8 and N0), iteration is carried out to derive the complete answers of loop subgoals. Without iteration, we would miss answers, because some clauses have been skipped to break infinite loops. The process of answer iteration is briefly described as follows. Let Nt be the top loop node. We first check whether the termination condition is satisfied (i.e. NEW = 0 and (UA ⊆ UD or UD = {})). If not, we start an iterate by setting NEW = 0 and UA = UD = {}. The iterate regenerates the loop (e.g. N0 → N13 → N14 in Fig. 1). During the iterate, NEW, UA and UD are updated accordingly.
At the end of the iterate, i.e. when we come back to the top loop node Nt and try to fail it via backtracking, we distinguish among the following cases:
– NEW = 1, which means that at least one new answer, with truth value t or u, has been derived (and added to the related table) during the iterate. So we start a new iterate to seek more answers.
– NEW = 0 and (UA ⊆ UD or UD = {}). We stop the iteration, replacing all temporarily undefined answers by undefined ones. After this, the answers of all subgoals involved in the loop are completed (i.e. the tables of these subgoals contain all of their answers). We attach a flag COMP to each table, with COMP = 1 meaning that the table is completed. For any subgoal A whose table flag COMP is 1, an instance A′ of A is true if A′ = t is in the table of A, undefined if A′ = u is in the table but A′ = t is not, and false if neither is in the table. For instance, in the above example, p(a) = t and p(X) = u being in the table of p(X) shows that p(a) is true and p(b) is undefined for any b ≠ a.
– Otherwise. Let UC = UA − UD. Since NEW = 0, for all subgoals in UC we cannot infer any new answers from the program whatever
truth values we assign to the temporarily undecided subgoals in UA. This implies that the answers of these subgoals have been completed, so we set the flag COMP of their tables to 1. Since the subgoals in UD − UC are still temporarily undecided, we start the next iterate. The iteration will terminate provided that the program has the bounded-term-size property [13].
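The case analysis performed whenever control returns to the top loop node can be read as a small decision procedure. The Python sketch below is a hypothetical rendering, not the authors' implementation: subgoals are plain strings, `run_iterate` is an assumed callback that resets NEW = 0 and UA = UD = {}, regenerates the loop, and reports the resulting (NEW, UA, UD), and `max_iterates` is a safety bound added here, since termination is only guaranteed under the bounded-term-size property.

```python
from typing import Callable, List, Set, Tuple

# (NEW, UA, UD) as observed when control returns to the top loop node.
State = Tuple[bool, Set[str], Set[str]]

def classify(new: bool, ua: Set[str], ud: Set[str]) -> Tuple[str, Set[str]]:
    """Return the action to take and the subgoals UC whose tables become complete."""
    if new:
        # A new t or u answer was derived: start another iterate.
        return "iterate", set()
    if ua.issubset(ud) or not ud:
        # Termination: u* answers become u and all loop tables are completed.
        return "stop", set()
    # Otherwise: UC = UA - UD cannot gain answers, so their tables are completed.
    return "iterate", ua - ud

def answer_iteration(initial: State, run_iterate: Callable[[], State],
                     max_iterates: int = 1000) -> Tuple[List[str], Set[str]]:
    """Drive answer iteration at a top loop node (sketch)."""
    state = initial
    log: List[str] = []
    completed: Set[str] = set()
    for _ in range(max_iterates):
        action, uc = classify(*state)
        log.append(action)
        completed |= uc
        if action == "stop":
            return log, completed
        # run_iterate is assumed to reset NEW, UA, UD and regenerate the loop.
        state = run_iterate()
    raise RuntimeError("no fixpoint within max_iterates")

# Top loop node N0 in Example 1: NEW = 1 when N0 is first reached,
# and one iterate (N0 -> N13 -> N14) then derives nothing new.
iterates = iter([(False, set(), set())])
print(answer_iteration((True, set(), set()), lambda: next(iterates)))
# (['iterate', 'stop'], set())
```

The same `classify` reproduces the two other stops in Example 1: at N3 (UA = {r} ⊆ UD = {s, r}) and at N8 (UD = {}).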
3 TPWF-Trees
In this section we define TPWF-trees, which are the basis of TPWF-resolution. We begin by defining tables.
3.1 Tables
Let P be a logic program and p(X) an atom. Let P contain exactly Np clauses with a head p(.). A table for p(X), denoted TB(p(X)), is a four-tuple (p(X), T, C, COMP), where
1. T = {T1, T2}, with T1 and T2 storing answers of p(X) with truth values t and u, respectively;
2. C is a vector of Np elements keeping the status of the clauses Cpi w.r.t. p(X): C[i] = 0 (resp. C[i] = 1) means that the clause Cpi is no longer available (resp. still available) to p(X);
3. COMP ∈ {0, 1}, with COMP = 1 indicating that the answers of p(X) have been completed.
For convenience, we use TB(p(X)) → t_answer[i] and TB(p(X)) → u_answer[i] to refer to the i-th answer in T1 and T2, respectively, TB(p(X)) → clause_status[i] to refer to the status of Cpi w.r.t. p(X), and TB(p(X)) → COMP to refer to the flag COMP. When a table TB(p(X)) is created, T1 = T2 = {}, the status of every clause is initialized to 1, and COMP = 0. Answers in a table are read sequentially from T1 followed by T2. When T1 = T2 = {} and COMP = 1, p(X) = f.
Example 2. Consider again the program P1 in Example 1. After node N14 is generated (see Fig. 1), we have the following tables:
TB(p(X)): (p(X), {{p(a)}, {p(X)}}, {1, 0}, 0)
TB(q(X)): (q(X), {{q(a)}, {q(X)}}, {0, 0, 1}, 0)
TB(r): (r, {{}, {r}}, {0}, 1)
TB(s): (s, {{}, {s}}, {0}, 1)
TB(w): (w, {{}, {}}, {0}, 1) □
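As a rough illustration (not the authors' implementation), the four-tuple table and the earlier rule for reading a completed table can be sketched in Python. The term encoding is a hypothetical choice made here: tuples are compound terms such as ("p", "X") for p(X), capitalised strings are variables, and anything else is a constant; "A′ is in the table" is interpreted as "A′ is an instance of some stored answer", which is what makes p(b) undefined via the stored answer p(X).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

def is_var(term) -> bool:
    # Hypothetical encoding: capitalised strings are variables.
    return isinstance(term, str) and term[:1].isupper()

def instance_of(specific, general, binding=None) -> bool:
    """One-way matching: True if `specific` is an instance of `general`."""
    binding = {} if binding is None else binding
    if is_var(general):
        if general in binding:
            return binding[general] == specific
        binding[general] = specific
        return True
    if isinstance(general, tuple):
        return (isinstance(specific, tuple) and len(specific) == len(general)
                and all(instance_of(s, g, binding) for s, g in zip(specific, general)))
    return specific == general

@dataclass
class Table:
    """TB(p(X)) = (p(X), {T1, T2}, C, COMP)."""
    goal: object
    t_answers: List[object] = field(default_factory=list)    # T1
    u_answers: List[object] = field(default_factory=list)    # T2
    clause_status: List[int] = field(default_factory=list)   # C[i] = 1: Cpi still available
    comp: int = 0                                            # COMP

    @classmethod
    def create(cls, goal, n_clauses: int) -> "Table":
        # New table: T1 = T2 = {}, all clauses available, COMP = 0.
        return cls(goal, clause_status=[1] * n_clauses)

    def answers(self) -> Iterator[Tuple[object, str]]:
        # Answers are read sequentially from T1 followed by T2.
        for a in self.t_answers:
            yield a, "t"
        for a in self.u_answers:
            yield a, "u"

    def truth_of(self, atom) -> Optional[str]:
        """Truth value of an instance of the goal; None while COMP = 0."""
        if self.comp != 1:
            return None
        if any(instance_of(atom, a) for a in self.t_answers):
            return "t"
        if any(instance_of(atom, a) for a in self.u_answers):
            return "u"
        return "f"

# TB(p(X)) with the answers of Example 2, assumed already completed (COMP = 1).
tb_p = Table(("p", "X"), [("p", "a")], [("p", "X")], [1, 0], comp=1)
print(tb_p.truth_of(("p", "a")), tb_p.truth_of(("p", "b")))  # t u
tb_w = Table.create("w", 1)
tb_w.comp = 1
print(tb_w.truth_of("w"))  # f
```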
From Fig. 1 we observe that each node in the tree has a unique name (index) Ni that is labeled by a goal Gi , so that the left-most subgoal A1 = A (or A1 = ¬A) of Gi is uniquely determined by Ni . In order to keep track of A1 that resolves against both program clauses and tabled answers, we attach to Ni
three pointers. Ni → t_answer_ptr and Ni → u_answer_ptr point to an answer in TB(A) → t_answer and TB(A) → u_answer, respectively. Ni → clause_ptr points to a clause whose head is unifiable with A. This leads to the following definition.
Definition 1. Let Gi be a goal ← A1, ..., Am (m ≥ 1). By “register a node Ni with Gi” we mean the following: (1) label Ni with Gi; (2) create the above three pointers for Ni, which, unless otherwise specified, are initialized to null.
We assume two table functions: memo(.) and lookup(.). Let Ni be a node with the left-most subgoal A. Let I be an answer of A with truth type S ∈ {t, u, u∗}. When TB(A) contains no answer with truth value t that is a variant of or more general than I, memo(Ni, I, S) adds I to TB(A) as follows. When S = t, add I to the end of TB(A) → t_answer, set TB(A) → COMP = 1 if I is a variant of A, and remove from TB(A) → u_answer all J/J∗ with J an instance/variant of I. Otherwise, if S = u (resp. S = u∗), add I (resp. I∗) to the end of TB(A) → u_answer provided that it contains no answer that is a variant of or more general than I (resp. I∗). Let Ni and A be as above, and let I and S be variables used for caching an answer and its truth type. lookup(Ni, I, S) fetches from TB(A) an answer and its truth type into I and S, respectively. If no answer is available in TB(A), I = null.
3.2 Resolvants
We now discuss how to resolve subgoals against program clauses as well as tabled answers. Let Ni be a node labeled by a goal Gi = ← A1, ..., Am (m ≥ 1) with A1 = p(X). Consider evaluating A1 using a program clause C = A ← B1, ..., Bn (n ≥ 0), where A1θ = Aθ.³ In Prolog, we would generate a new node labeled with the goal Gi+1 = ← (B1, ..., Bn, A2, ..., Am)θ, where the mgu θ is consumed by all Aj (j > 1) although the proof of A1θ has not yet been completed (produced). In our tabulated resolution, however, we apply the PMF (Prove-Memorize-Fetch) mode to resolve subgoals against clauses and tabled answers [11]. That is, we first prove (B1, ..., Bn)θ. If it is true with some mgu θ1, which means that A1θθ1 is true, we memorize the answer in the table TB(A1) if it is new. We then fetch an answer p(I) with truth type S from TB(A1) and apply it to the remaining subgoals of Gi. The process is depicted in Fig. 2. Clearly, the PMF mode preserves the original set of answers of A1. Moreover, since only new answers of A1 are added to the table, repeated answers of A1 are never applied to the remaining subgoals of Gi. The PMF mode is readily realized using the two table functions memo(.) and lookup(.). That is, after resolving the subgoal A1 with the clause C, Ni gets a child node Ni+1 labeled with the goal Gi+1 = ← (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si), A2, ..., Am. Note that the propagation of θ is blocked by the subgoal lookup(Ni, Ii, Si) because the consumption (fetch) must come after the production (prove and memorize).
³ Here and throughout, we assume that C has been standardized apart to share no variables with Gi.
[Figure: Resolve A1 against C (A1θ = Aθ) → Prove (B1, ..., Bn)θ (A1θθ1 is true) → Memorize A1θθ1 in TB(A1) → Fetch an answer p(I) with truth type S from TB(A1) → Apply (X/I, S) to A2, ..., Am]
Fig. 2. The PMF mode for resolving subgoals.
Observe that after the proof of A1 is reduced to the proof of (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si) by applying a program clause C, the truth value of an answer of A1 to be memorized must be the logical AND of the truth values of the answers of all Bjθ. This AND computation is carried out incrementally. Initially we have memo(Ni, p(X)θ, S′) with S′ = t. Then, for j = 1 to n, if Bjθ gets an answer Bjθθ′ with truth type S, the memo(.) subgoal is updated to memo(Ni, p(X)θθ′, S′ ∧ S). This leads to the following definition.
Definition 2. Let G1 = ← A1, ..., Am be a goal, θ an mgu, and S ∈ {t, u, u∗}. The resultant of applying (θ, S) to G1 is the goal G2 = ← (A1, ..., Ak−1)θ, A′kθ, Ak+1, ..., Am, where Ak is the left-most subgoal of the form memo(.) (if G1 contains no memo(.), k = m) and A′k is Ak with its answer type S′ ∈ {t, u, u∗} changed to S′ ∧ S.
The concept of resolvants of TPWF-resolution is then defined based on the PMF mode.
Definition 3. Let Ni be a node labeled by a goal Gi = ← A1, ..., Am (m ≥ 1).
1. If A1 = p(X), let C be a clause A ← B1, ..., Bn with Aθ = A1θ. Then
a) the resolvant of Gi and C is the goal Gi+1 = ← (B1, ..., Bn)θ, memo(Ni, p(X)θ, t), lookup(Ni, Ii, Si), A2, ..., Am;
b) if p(I) is an answer of A1 with truth type S, the resolvant of Gi and p(I) with S is the resultant of applying (X/I, S) to ← A2, ..., Am.
2. If A1 = ¬B with B a ground atom, and B is the answer with truth type S ∈ {f, u, u∗}, then the resolvant of Gi and B with S is the resultant of applying ({}, ¬S) to ← A2, ..., Am.
3. If A1 is memo(Nh, q(I), S) and A2 is lookup(Nh, Ih, Sh), let q(X) be the left-most subgoal at node Nh; then (after executing the two functions) the resolvant of Gi and Ih with truth type Sh is the resultant of applying (X/Ih, Sh) to ← A3, ..., Am.
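The bookkeeping behind the PMF mode (memo(.) and the incremental AND of Definition 2) can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: terms use a hypothetical tuple encoding with capitalised strings as variables, u and u∗ answers share one list with a star flag, and because the text does not spell out u ∧ u∗, `and_truth` assumes the ordering u∗ < u < t and takes the minimum, so a conjunction involving a temporarily undefined answer stays temporarily undefined. `memo` follows the memo(.) description literally; in particular, only a new t answer removes existing u/u∗ answers.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

def is_var(term) -> bool:
    # Hypothetical encoding: capitalised strings are variables.
    return isinstance(term, str) and term[:1].isupper()

def instance_of(specific, general, binding=None) -> bool:
    """One-way matching: True if `specific` is an instance of `general`."""
    binding = {} if binding is None else binding
    if is_var(general):
        if general in binding:
            return binding[general] == specific
        binding[general] = specific
        return True
    if isinstance(general, tuple):
        return (isinstance(specific, tuple) and len(specific) == len(general)
                and all(instance_of(s, g, binding) for s, g in zip(specific, general)))
    return specific == general

def variant(a, b) -> bool:
    # Mutual instances are variants (equal up to variable renaming).
    return instance_of(a, b) and instance_of(b, a)

RANK = {"u*": 0, "u": 1, "t": 2}

def and_truth(s1: str, s2: str) -> str:
    """S' ∧ S over {t, u, u*}; u ∧ u* = u* is an assumption of this sketch."""
    return min(s1, s2, key=RANK.__getitem__)

def negate(s: str) -> str:
    """¬S for S in {f, u, u*}, as used for negative subgoals."""
    return {"f": "t", "u": "u", "u*": "u*"}[s]

@dataclass
class Table:
    goal: object
    t_answers: List[object] = field(default_factory=list)
    u_answers: List[Tuple[object, bool]] = field(default_factory=list)  # (answer, is_starred)
    comp: int = 0

def memo(tb: Table, answer, s: str) -> bool:
    """Add `answer` with truth type s to tb as described for memo(.); True if added."""
    if any(instance_of(answer, a) for a in tb.t_answers):
        return False  # a t answer that is a variant of or more general than it exists
    if s == "t":
        tb.t_answers.append(answer)
        if variant(answer, tb.goal):
            tb.comp = 1  # the goal itself is true: the table is complete
        # Drop every u / u* answer J that is an instance or variant of the new answer.
        tb.u_answers = [(j, st) for j, st in tb.u_answers if not instance_of(j, answer)]
        return True
    starred = s == "u*"
    if any(st == starred and instance_of(answer, j) for j, st in tb.u_answers):
        return False  # an equally starred, variant or more general answer exists
    tb.u_answers.append((answer, starred))
    return True

# PMF for p(X) via a clause with two body subgoals answered with t and u.
tb = Table(goal=("p", "X"))
s = "t"  # S' starts as t
for body_type in ["t", "u"]:
    s = and_truth(s, body_type)
memo(tb, ("p", "b"), s)    # memorized as an undefined answer
memo(tb, ("p", "b"), "t")  # a later t answer removes it from the u answers
print(tb.u_answers, tb.t_answers)  # [] [('p', 'b')]
```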
3.3 Ancestor Lists and Loops
Loop checking is a principal feature of TPWF-resolution (see Example 1). Positive and negative loops are determined based on ancestor lists that are associated with subgoals.
Definition 4 ([10] with slight modification). An ancestor list AL_A is associated with each subgoal A in a tree (see the TPWF-tree below), and is defined recursively as follows.
1. If A is at the root, then AL_A = {}.
2. Let A be at node Ni+1. If A inherits a subgoal A′ (by copying or instantiation) from its parent node Ni, then AL_A = AL_A′; else if A is in the resolvant of a subgoal B at node Ni and a clause B′ ← A1, ..., An with Bθ = B′θ (i.e. A = Aiθ for some 1 ≤ i ≤ n), then AL_A = {(Ni, B)} ∪ AL_B.
3. Let Gi = ← ¬A, ... be the goal at Ni, which has a child node Ni+1 labeled by the goal Gi+1 = ← A (the edge from Ni to Ni+1 is dotted; see Fig. 1). Then AL_A = {¬} ∪ AL_¬A.
Let Gi at node Ni and Gk at node Nk be two goals in a derivation, and let A and A′ be the left-most subgoals of Gi and Gk, respectively. If A is in the ancestor list of A′, i.e. (Ni, A) ∈ AL_A′, the proof of A needs the proof of A′. In such a case, we call A (resp. Ni) an ancestor subgoal of A′ (resp. ancestor node of Nk). In particular, if A is both an ancestor subgoal and a variant of A′, i.e. an ancestor variant subgoal of A′, we say the derivation goes into a loop. The loop is negative if there is a ¬ ahead of (Ni, A) in AL_A′; otherwise, it is positive. For example, the ancestor list of the subgoal r at N7 in Fig. 1 is AL_r = {¬, (N5, s), ¬, (N3, r), ¬, (N1, q(X)), (N0, p(X))} and the ancestor list of the subgoal p(X) at N12 is AL_p(X) = {(N1, q(X)), (N0, p(X))}. There is a negative loop between N3 and N7, and a positive loop between N0 and N12.
3.4 Control Strategy
Although in principle the tabulated approach presented in this paper is effective for any fixed control strategy, we choose to use the so-called TP-strategy, which is the Prolog control strategy enhanced with mechanisms for selecting tabled answers.
Definition 5 ([11]). By TP-strategy we mean: Depth-first (for goal selection) + Left-most (for subgoal selection) + Table-first (for program and table selection) + Top-down (for the selection of tabled answers and program clauses) + Last-first (for backtracking).
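The Table-first and Top-down components fix the order in which alternatives for a single subgoal are tried. The Python sketch below illustrates only that ordering; the function name and the string encoding of answers and clauses are hypothetical, and it deliberately omits the COMP, top-loop and UA checks that the algorithm below performs before falling back to clauses.

```python
from typing import Iterator, Sequence, Tuple

def tp_alternatives(t_answers: Sequence[str], u_answers: Sequence[str],
                    clauses: Sequence[str], clause_status: Sequence[int]
                    ) -> Iterator[Tuple[str, str]]:
    """Yield the alternatives for one subgoal in Table-first, Top-down order."""
    # Table-first: tabled answers are tried before program clauses.
    # Top-down: T1 (t answers) before T2 (u answers), then clauses in program order.
    for answer in t_answers:
        yield "t-answer", answer
    for answer in u_answers:
        yield "u-answer", answer
    for clause, status in zip(clauses, clause_status):
        if status == 1:  # clauses marked "no longer available" are skipped
            yield "clause", clause

# The table of p(X) from Example 2: one t answer, one u answer, Cp2 used up.
for alt in tp_alternatives(["p(a)"], ["p(X)"], ["Cp1", "Cp2"], [1, 0]):
    print(alt)
```

A generator keeps its position between calls, much like a choice point; Last-first backtracking would correspond to resuming the most recently created such generator first, i.e. keeping them on a LIFO stack.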
3.5 Algorithm for Building TPWF-Trees
In order to simplify the presentation, we assume that every subgoal has a table and that the flag variables COMP (in tables) and NEW are updated automatically. Moreover, we assume that whenever an atom A is assumed undefined (i.e. A = u∗ is assumed), A is added to UA, and that whenever A = u∗ is derived (memorized), A is added to UD (automatically). We assume that when selecting clauses to resolve with subgoals, all clauses whose status is “no longer available” are automatically skipped. Finally, we assume a function return(A, S), which returns an answer A with truth type S. The truth type of return(A, S) is updated in the same way as that of memo(_, _, S). TPWF-trees are constructed based on the TP-strategy using the following algorithm.
Definition 6 (TPWF-Algorithm). Let P be a logic program, A an atom, and G0 = ← A, return(A, t). Let AL_A = {} be the ancestor list of A. The TPWF-tree TF_G0 of P ∪ {G0} is constructed by applying the following algorithm until the answer NO or FLOUND is returned.
tpwf(G0, AL_A):
1. Root Node: Register the root N0 with G0 and goto 2.
2. Node Expansion: Let Ni be the latest registered node labeled by Gi = ← A1, ..., Am (m > 0). Register Ni+1 as a child of Ni with Gi+1 if Gi+1 can be obtained as follows.
Case 2.1: A1 is return(A′, S). Return (A′, S) if S ≠ u∗. When S = t or S = u, set Gi+1 = T or Gi+1 = U, respectively. Goto 3 with N = Ni.
Case 2.2: A1 is memo(Nh, I, S) and A2 is lookup(Nh, Ih, Sh). Execute the two table functions. If Ih = null, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and Ih with truth type Sh and goto 2.
Case 2.3: A1 = ¬B. If B is non-ground, set Gi+1 = FD and return FLOUND. Get an answer from TB(B). Let I be the answer with truth type S.
Case 2.3.1: I ≠ null. If S = t, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and I with S and goto 2.
Case 2.3.2: I = null.
When TB(B) → COMP = 1: if TB(B) → u_answer ≠ {}, then goto 3 with N = Ni; else set Gi+1 = ← A2, ..., Am and goto 2. Otherwise, let G0′ = ← B, return(B, t) and AL_B = {¬} ∪ AL_A1. Call tpwf(G0′, AL_B) until NO or FLOUND is returned. If FLOUND is returned, then set Gi+1 = FD and return FLOUND; else apply the answers in TB(B) to A1 (repeat Case 2.3) and then goto 3 with N = Ni.
Case 2.4: A1 = p(X). Get an answer I with truth type S from TB(A1). If I ≠ null, then set Gi+1 to the resolvant of Gi and I with S and goto 2; else
Case 2.4.1: TB(A1) → COMP = 1. Goto 3 with N = Ni.
Case 2.4.2: Ni is a top loop node. Do answer iteration and then goto 3 with N = Ni.
Case 2.4.3: A1 ∈ UA. If A1∗ is in TB(A1) → u_answer, then goto 3 with N = Ni; else set Gi+1 to the resolvant of Gi and A1 with truth type u∗, and goto 2.⁴
Case 2.4.4: Otherwise. If no loop occurs (i.e. A1 has no ancestor variant subgoal), then resolve A1 with the first available clause; else resolve A1 with the first clause below the one that is being used by its closest ancestor variant subgoal. If such a clause Cpj exists, then set Gi+1 to the resolvant of Gi and Cpj and goto 2; else goto 3 with N = Ni, assuming A1 = u∗ if the loop is negative.
3. Backtracking: If N is the root, return NO. Let Nf be the parent node of N with the left-most subgoal Af. If Af is a function, goto 3 with N = Nf. Otherwise, if N was generated from Nf by resolving Af with a clause Cj and Af is not involved in any loop, set TB(Af) → clause_status[j] = 0. Goto 2 with Nf as the latest registered node.
The input of the TPWF-Algorithm consists of a top goal G0 = ← A, return(A, t) and an ancestor list AL_A. Its output is either FLOUND, indicating that G0 is floundered, or NO, showing that there are no more answers for A, or (A′, S), meaning that A′ is an answer of A with truth type S ∈ {t, u}. Observe that, as in SLDNF-resolution [4], when A1 = ¬B we may build a new tree for B (Case 2.3.2). In SLDNF-resolution, the two SLDNF-trees are totally independent, which may lead to infinite negative loops. TPWF-resolution, however, connects the two TPWF-trees via the ancestor list {¬} ∪ AL_A1, so that negative loops can be detected effectively (see Fig. 1).
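Case 2.4.4 combines the loop check of Definition 4 with clause selection. The Python sketch below is a hypothetical rendering, not the authors' implementation: ancestor lists are Python lists written most recent entry first (as in the AL_r example), `NEG` stands for ¬, atoms use a tuple encoding with capitalised strings as variables, variants are detected by canonical renaming of variables, and `clause_in_use` is an assumed map from each ancestor node to the index of the clause it is currently using.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

NEG = "¬"  # marker entered into an ancestor list when crossing a negation
Entry = Union[str, Tuple[str, object]]  # either NEG or (node name, atom)

def canon(term, names=None):
    """Rename variables (capitalised strings) in order of first occurrence."""
    names = {} if names is None else names
    if isinstance(term, str) and term[:1].isupper():
        return names.setdefault(term, f"_V{len(names)}")
    if isinstance(term, tuple):
        return tuple(canon(t, names) for t in term)
    return term

def variant(a, b) -> bool:
    return canon(a) == canon(b)

def closest_ancestor_variant(atom, ancestors: Sequence[Entry]) -> Optional[Tuple[str, bool]]:
    """Return (ancestor node, loop_is_negative) for the closest ancestor variant."""
    seen_neg = False
    for entry in ancestors:  # most recent ancestor first
        if entry == NEG:
            seen_neg = True  # a ¬ lies ahead of any later entry
        elif variant(entry[1], atom):
            return entry[0], seen_neg
    return None

def select_clause(atom, ancestors: Sequence[Entry], clause_status: List[int],
                  clause_in_use: Dict[str, int]) -> Tuple[Optional[int], Optional[str]]:
    """Case 2.4.4: a clause index, or None plus the assumption made on failure."""
    hit = closest_ancestor_variant(atom, ancestors)
    # Without a loop start at the first clause, else below the ancestor's clause.
    start = 0 if hit is None else clause_in_use[hit[0]] + 1
    for j in range(start, len(clause_status)):
        if clause_status[j] == 1:
            return j, None
    # No clause left: fail; a negative loop subgoal is assumed u*.
    return None, ("u*" if hit is not None and hit[1] else None)

# Example 1: r at N7 (negative loop with N3) and p(X) at N12 (positive loop with N0).
al_r = [NEG, ("N5", "s"), NEG, ("N3", "r"), NEG, ("N1", ("q", "X")), ("N0", ("p", "X"))]
print(select_clause("r", al_r, [1], {"N3": 0}))  # (None, 'u*')
al_p = [("N1", ("q", "X")), ("N0", ("p", "X"))]
print(select_clause(("p", "X"), al_p, [1, 1], {"N0": 0}))  # (1, None)
```

The second call selects index 1, i.e. Cp2, mirroring how N12 skips Cp1 while N0 is still using it.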
4 TPWF-Resolution
Definition 7. Let TF_G0 be a TPWF-tree of P ∪ {G0}. All leaves of TF_G0 labeled by T, U or FD are success, undefined and flounder leaves, respectively, and all other leaves are failure leaves. A TPWF-derivation is a partial branch in TF_G0 starting at the root, which is successful, floundered, undefined or failed if it ends with a success leaf, a flounder leaf, an undefined leaf or a failure leaf, respectively. The process of constructing TPWF-derivations is called TPWF-resolution. A goal G0 is floundered if it has a floundered TPWF-derivation. Let G0 be a non-floundered goal and I′ be a variant of or more general than I. Then G0 is true with an answer I if there is a successful TPWF-derivation with (I′, t) returned; undefined with I if it is not true with any instance of I but there is an undefined TPWF-derivation with (I′, u) returned; false with I if it is neither true nor undefined with any instance of I.
The following theorem follows from the basic fact that, for any logic program with the bounded-term-size property, (1) the set of answers in any table is finite, (2) every TPWF-derivation is finite, and (3) answer iteration must reach a fixpoint.
⁴ For this case, no further backtracking will be allowed at this node.
Theorem 1 (Termination of TPWF-resolution). Let P be a logic program with the bounded-term-size property and G0 = ← A, return(A, t) a top goal. The TPWF-Algorithm terminates with a finite TPWF-tree.
TPWF-resolution cuts infinite loops and infinite recursions through negation by means of assumption and answer iteration. Positive loops are cut simply by backtracking, whereas negative loop subgoals whose truth values are currently undecided are assumed temporarily undefined before being failed via backtracking. Temporarily undefined values are removed (from tables) after their t or u counterparts are derived. This guarantees the correctness of loop cutting. Meanwhile, before leaving a loop by failing its top loop node, iteration is carried out to derive the complete answers of loop subgoals. For logic programs with the bounded-term-size property, the iteration must terminate with a fixpoint of answers. This leads to the following.
Theorem 2 (Soundness and Completeness of TPWF-resolution). Let P be a logic program with the bounded-term-size property and G0 = ← A, return(A, t) a non-floundered goal. Let WF(P) be the well-founded model of P. Then
1. WF(P) |= ∃(A) iff G0 is true with an instance of A;
2. WF(P) |= ¬∃(A) iff G0 is false with A;
3. WF(P) |= ∀(Aθ) iff G0 is true with Aθ;
4. WF(P) ⊭ ∃(A) and WF(P) ⊭ ¬∃(A) iff G0 is undefined with A.
Acknowledgments
We thank the anonymous referees for their helpful comments. The first author is supported in part by Chinese National Natural Science Foundation and Trans-Century Training Programme Foundation for the Talents by the Chinese Ministry of Education.
References
1. Bol, R. N., Degerstedt, L.: Tabulated Resolution for the Well-Founded Semantics. Journal of Logic Programming 34:2 (1998) 67-109
2. Chen, W. D., Swift, T., Warren, D. S.: Efficient Top-Down Computation of Queries under the Well-Founded Semantics. Journal of Logic Programming 24:3 (1995) 161-199
3. Chen, W. D., Warren, D. S.: Tabled Evaluation with Delaying for General Logic Programs. J. ACM 43:1 (1996) 20-74
4. Lloyd, J. W.: Foundations of Logic Programming. 2nd edn. Springer-Verlag, Berlin (1987)
5. Przymusinski, T.: Every Logic Program Has a Natural Stratification and an Iterated Fixed Point Model. In: Proc. of the 8th ACM Symposium on Principles of Database Systems (1989) 11-21
6. Ross, K.: A Procedural Semantics for Well-Founded Negation in Logic Programs. Journal of Logic Programming 13:1 (1992) 1-22
7. Sagonas, K., Swift, T., Warren, D. S.: XSB as an Efficient Deductive Database Engine. In: Proc. of the ACM SIGMOD Conference on Management of Data. Minneapolis (1994) 442-453
8. Sagonas, K., Swift, T., Warren, D. S.: An Abstract Machine for Tabled Execution of Fixed-Order Stratified Logic Programs. ACM Transactions on Programming Languages and Systems 20:3 (1998)
9. Sagonas, K., Swift, T., Warren, D. S., Freire, J., Rao, P.: The XSB Programmer’s Manual (Version 1.8) (1998)
10. Shen, Y. D.: An Extended Variant of Atoms Loop Check for Positive Logic Programs. New Generation Computing 15:2 (1997) 317-341
11. Shen, Y. D., Yuan, L. Y., You, J. H., Zhou, N. F.: Linear Tabulated Resolution Based on Prolog Control Strategy. Submitted for publication (1999)
12. Van Gelder, A., Ross, K., Schlipf, J.: The Well-Founded Semantics for General Logic Programs. J. ACM 38:3 (1991) 620-650
13. Van Gelder, A.: Negation as Failure Using Tight Derivations for General Logic Programs. Journal of Logic Programming 6:1&2 (1989) 109-133
14. Warren, D. H. D.: An Abstract Prolog Instruction Set. Technical Report 309, SRI International (1983)
15. Zhou, N. F.: Parameter Passing and Control Stack Management in Prolog Implementation Revisited. ACM Transactions on Programming Languages and Systems 18:6 (1996) 752-779
A Case Study in Using Preference Logic Grammars for Knowledge Representation
Baoqiu Cui, Terrance Swift, and David S. Warren
Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY 11794-4400, U.S.A.
{cbaoqiu,tswift,warren}@cs.sunysb.edu
Abstract. Data standardization is the commercially important process of extracting useful information from poorly structured textual data. This process includes correcting misspellings and truncations, extracting data via parsing, and correcting inconsistencies in extracted data. Prolog programming offers natural advantages for standardizing: definite clause grammars can be used to parse data; Prolog rules can be used to correct inconsistencies; and Prolog’s simple syntax allows rules to be generated to correct misspellings and truncations of keywords. These advantages can be seen as rudimentary mechanisms for knowledge representation, and at least one commercial standardizer has exploited them. However, advances in implementation and in knowledge representation (in particular, the addition of preferences to logical formalisms) allow even more powerful and declarative standardizers to be constructed. In this paper a simple preference logic, that of [7], is considered. A fixed point semantics is defined for this logic and its tabled implementation within XSB is described. Development of a commercial standardizer using the preference logic of [7] is then documented. Finally, detailed comparisons are made between the preference logic standardizer and the previous Prolog standardizer, illustrating how an advance in knowledge representation can lead to improved commercial software.
1 Introduction
Horn clauses have proven remarkably useful for parsing when their syntactic variant, definite clause grammars (DCGs), is employed. DCGs are commonly used to construct LL parsers in Prolog, but DCGs can also implement the more powerful class of LR parsers in Prologs that include tabling, such as XSB or YAP. Even LR parsers, however, can prove cumbersome for implementing grammars that contain potential ambiguities such as the “dangling else” problem, which can arise with nested if-then-else statements in imperative programming languages. While LR grammars can be written to parse such potential ambiguities deterministically, the determinism comes at the cost of the conciseness of the grammar. This problem is especially important for natural language applications, where ambiguities often occur and which may require a high degree of maintenance when a grammar written for one corpus of text is re-applied to a new corpus.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 206–220, 1999.
© Springer-Verlag Berlin Heidelberg 1999
As proposed in [7], a natural way to resolve the dangling else ambiguity is to declare preferences for one parse over another by adding preference clauses to Horn clauses. The resulting framework is called Preference Logic Programs (PLPs) or, in grammar form, Preference Logic Grammars (PLGs) [3]. PLPs contain a syntactic restriction that ensures that the preferences can be precomputed “statically”. The semantics of PLPs is also oriented to the subclass of programs that have the optimal subproblem property, which intuitively means that a preferred goal depends only on preferred subgoals. The PLPs of [7] are in this sense weaker than other formalisms such as that of Brewka [1]. On the other hand, PLPs are relatively easy to implement efficiently and are arguably easier for programmers to understand than more general frameworks. Despite the above restrictions, the usefulness of PLGs over DCGs for practical natural language analysis can be striking, particularly for the important commercial application of data standardization [12]. The problem of data standardization is to extract meaningful, standardized information from formatted textual strings. For instance, data standardization might seek to extract street address or telephone information from address strings contained in a relational database or XML page. To take a simple but concrete example, a relational database may contain the following (misspelled) textual string¹:
TO THE ORDR OF ZZZ AUTOPARTS INC 129 WASHING TON STREET EL SEGUNDO CA
A name and address standardizer might extract the company name, address, city and postal zip code, all in a standard format, as the following record indicates:
Organization: ZZZ AUTOPARTS
Street: 129 WASHINGTON ST
PO BOX:
City: EL SEGUNDO
State: CA
Zip: 90245
¹ All examples are from commercial data, with minor changes to protect the privacy of the entities involved.
Data standardization thus relies on parsing to extract the company name, street number and so on from the string; on techniques to infer missing information, such as the proper zip code for the string; on facilities for correcting badly entered information, such as the street name; and on a detailed knowledge of a narrow domain to understand that the phrase TO THE ORDR OF is a preamble and not part of a company name. At the same time, data standardization does not require techniques for understanding deep linguistic structures, and is performed over a relatively narrow semantic domain. Since nearly any large organization must maintain data about names and addresses (of suppliers, customers, etc.), name and address standardization is of great commercial significance, and various tools exist that extract and standardize textual names and addresses. For instance, a commercial Prolog standardizer has been developed by the second author [12] and used to standardize data for several commercial and government organizations. This standardizer required a significant implementation effort and consists of about 100,000 lines of Prolog code. The most widely used address standardizer, however, is written by the U.S. Postal Service and does not use logic programming techniques. Comparisons between the Postal standardizer and the standardizer of [12] indicate complementary strengths: the Prolog standardizer is much better at extracting addresses from free text, at parsing the various address components, and at handling foreign addresses. Because it draws on a better knowledge base, the Postal standardizer is better at correcting address data once the address components (e.g. the street, post-office box and city) have been identified. Indeed, these two standardizers have worked together to commercial advantage. Name and address standardization thus provides an example of an important commercial problem for which logic programming techniques offer significant advantages over other existing methods. It should also be noted that the domain of names and addresses is only one domain for which data standardization has commercial importance, and standardizers have been written for other domains, including aircraft part information, transportation records, and cargo descriptions. There are no other known standardizers for these domains.
This paper studies in detail how the addition of simple preferences to grammars has been used to improve the Prolog standardizer mentioned above. The structure of the paper is as follows. We first briefly describe the PLPs of [7] with a view to developing a fixed point semantics for a large class of PLPs. Based on this semantics, we outline how PLPs can be evaluated using tabling, and sketch their implementation in XSB. The rest of the paper is specific to the application of name and address standardization.
The domain of names and addresses is described in detail, followed by a brief description of the Prolog name and address standardizer. We then examine how PLPs (and tabling) can be used to improve the Prolog standardizer, and show that when rewritten using PLPs, the resulting standardizer is much more concise, particularly for the portions of the code that parse and resolve ambiguities, with only a moderate loss in performance.
2 Preference Logic Programs
For our purposes, a preference logic program (PLP) P can be seen as a set H(P) of Horn clauses formed over a language LP containing a finite number of predicate and function symbols, augmented with a set of preference clauses prefer(p(t1, ..., tn), p(u1, ..., un)) ← A1, ..., Am, where the predicate symbol prefer/2 does not occur in LP, but all other atoms are formed over LP. Predicates that appear as arguments of the heads of preference rules are called optimization predicates. The set of derived predicates is the smallest set of predicates in H(P) for which at least one clause of each predicate contains either a derived predicate or an optimization predicate in its body. All other predicates are termed base predicates and are defined by base clauses. Atoms in the body of preference clauses are restricted to base predicates. This
A Case Study in Using Preference Logic Grammars
syntactic restriction in the bodies of preference clauses guarantees that the computation of preferences does not depend on the application of other preferences, which greatly simplifies computation.

Example 1. Let P1 be the preference logic program

    prefer(p(a),p(a)).
    p(a).
    d(X) :- p(X).
    prefer(p(c),p(d)) :- b(1).
    b(1).
    p(c) :- p(d).
    p(d).
p/1 is an optimization predicate, b/1 a base predicate, and d/1 a derived predicate.

A Fixed-Point Semantics for PLPs

In order to define a fixed-point semantics for PLPs, we consider a program to be represented by its ground instantiation, and we call ground atoms of the form prefer(p(t1, ..., tn), p(u1, ..., un)) preference atoms. Similarly, we refer to the optimization atoms, derived atoms, and base atoms of a (ground) PLP P. Neither preference atoms nor base atoms depend on an application of preferences, so they can be constructed via the usual least fixed-point construction for Horn clauses. We thus define the canonical pre-interpretation of a PLP P as the least model of the preference and base clauses of P considered as Horn clauses. Thus, for any PLP P, its canonical pre-interpretation is uniquely determined; it is denoted by C_P. The preference atoms in C_P, when augmented by a transitivity axiom, induce a relation, which we denote by
    street(Str).
    addr_element(pobox(PO)) --> pobox(PO).
    addr_element(csz(Csz)) --> city_state_zip(Csz).
    addr_element(country(Ctry)) --> country(Ctry).
Fig. 2. Parsing an Address Using Tabling
3 In XSB version 2.0, the call tnot/1 in Figure 2 must be replaced by a call to 't not'/1 in order to execute non-ground tabled negation.
B. Cui, T. Swift, and D.S. Warren
As will be substantiated in Section 4, the address parsing of Figure 2 is much simpler than the LL-parser of Section 3.1. Rather than duplicating productions within the default productions for company names, street names, and so on, it simply constructs all possible parses and selects the preferred ones through the predicate preferred_address/1. As seen in Figure 2, preferred_address/1 calls the predicate filterPO/2, which performs preferred-answer filtering. One clause for the preference rule prefer_address/2 is:

    prefer_address(Address1,Address2) :-
        Address1 \== Address2,
        get_csz(Address1,CSZ1),
        get_csz(Address2,CSZ2),
        weigh_csz(CSZ1,W1),
        weigh_csz(CSZ2,W2),
        W1 >= W2.
This rule calls a routine that assigns a weight to a triple of city, state, and zip elements, where the weight depends on whether the city is a valid city name, whether the city is actually located in the state, and whether the zip code is correct for the city. Other rules of the PLP standardizer, which are not shown, weigh an address depending on the validity of a street address, on whether a valid room number is present, and so on.

Pruning Using Preference Logic

Compared to the approach of Section 3.1, the standardizer just described is simple but inefficient. In particular, all addresses are generated before preferences are applied, so no advantage is taken of pruning. However, for addresses with valid city, state, zip triples, pruning can be programmed in a simple manner. To begin with, the fifth clause of addr_element/3 in Figure 2 can be modified to use a derived predicate

    addr_element(csz(Csz)) --> preferred_city_state_zip(Csz).

so that only preferred city, state, zip triples are propagated into addresses. The new predicates needed to implement preferred_city_state_zip/1 are analogous to those needed for preferred_address/1. Note that the use of pruning relies on the optimal sub-problem property.
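The generate-then-filter strategy of preferred_address/1 and the pruning variant just described can be mimicked outside Prolog. The following Python sketch is an illustration, not the XSB implementation: the lookup tables CITY_STATE and ZIP_CITY, the three-point weighting inside weigh_csz, and the helpers filter_preferred, csz_better and address_better are all hypothetical, and a candidate is dropped only when some other candidate has a strictly higher weight (a simplification of the W1 >= W2 test in prefer_address/2). Because addresses are compared here only on their city, state, zip part, filtering the sub-answers first yields the same preferred addresses as filtering complete addresses, which is the optimal sub-problem property that pruning relies on.

```python
from itertools import product
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
CSZ = Tuple[str, str, str]  # (city, state, zip)

# Hypothetical reference tables standing in for the standardizer's domain information.
CITY_STATE = {"EL PASO": "TX"}
ZIP_CITY = {"79916": "EL PASO"}

def weigh_csz(csz: CSZ) -> int:
    """Hypothetical weight: one point each for a known city, city in state, zip in city."""
    city, state, zipcode = csz
    return (int(city in CITY_STATE)
            + int(CITY_STATE.get(city) == state)
            + int(ZIP_CITY.get(zipcode) == city))

def filter_preferred(answers: List[T], better: Callable[[T, T], bool]) -> List[T]:
    """Keep every answer that no other answer is strictly better than."""
    return [x for x in answers if not any(better(y, x) for y in answers if y != x)]

def csz_better(a: CSZ, b: CSZ) -> bool:
    return weigh_csz(a) > weigh_csz(b)

def address_better(a: Tuple[str, CSZ], b: Tuple[str, CSZ]) -> bool:
    return csz_better(a[1], b[1])  # addresses compared on city/state/zip only

# Hypothetical sub-parses of one input record.
streets = ["100 MAIN ST", "100 MAIN"]
cszs = [("EL PASO", "TX", "79916"), ("PASO", "TX", "79916"), ("EL PASO", "NM", "79916")]

# Without pruning: build all 6 addresses, then filter.
late = filter_preferred(list(product(streets, cszs)), address_better)
# With pruning: filter the city/state/zip sub-answers first, then build addresses.
early = list(product(streets, filter_preferred(cszs, csz_better)))
print(late == early, late)
```

With pruning only two addresses are ever built instead of six; if the address preference also depended on parts other than the city, state, zip triple, this equivalence would have to be re-established.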
4 Comparison of the Two Standardizers
Table 1 provides insight into the amount of code in each standardizer. Clearly, most of the code consists of domain information (mostly tables of cities, states, zip codes, countries, and so on) along with rules for the bottom-up parse, which, as mentioned in Section 3.1, are largely generated automatically from declarations of keywords. The most elaborate code is in the top-down parse and in the post-processing, and each of these sections of code is reduced. Indeed, the post-processing step is almost eliminated: some of it is moved into preference rules and reclassified
under the parsing phase, but much of it is avoided altogether. Thus, while the new standardizer architecture does not lead to a large reduction in overall standardizer code, it greatly reduces the amount of code needed by the later phases of standardization, which is the code that requires the most programmer maintenance.

Function                    Clauses    Lines
Tokenization                     94      412
Bottom-up Parse               26205    26205
Domain Information            59150    59150
Control and Utilities           727     1345
(Prolog) Top-Down               724     2082
(Prolog) Post-processing        604     2838
(PLP) Top-Down                  198      686
(PLP) Post-processing             7      106

Table 1. Code sizes for Prolog and XSB Standardizers
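The size reductions in Table 1 can be checked with a few lines of arithmetic. This sketch assumes (our reading of the table, not stated explicitly) that the untagged rows are shared by both standardizers and that only the rows tagged (Prolog) or (PLP) differ; the variable names are ours.

```python
# Lines of code from Table 1.
shared_lines = 412 + 26205 + 59150 + 1345  # untagged rows, assumed common to both
prolog_lines = {"Top-Down": 2082, "Post-processing": 2838}
plp_lines = {"Top-Down": 686, "Post-processing": 106}

prolog_total = shared_lines + sum(prolog_lines.values())
plp_total = shared_lines + sum(plp_lines.values())
later_prolog = sum(prolog_lines.values())
later_plp = sum(plp_lines.values())

print(f"whole standardizer: {prolog_total} -> {plp_total} lines "
      f"({1 - plp_total / prolog_total:.1%} fewer)")
print(f"top-down parse + post-processing: {later_prolog} -> {later_plp} lines "
      f"({1 - later_plp / later_prolog:.1%} fewer)")
# whole standardizer: 92032 -> 87904 lines (4.5% fewer)
# top-down parse + post-processing: 4920 -> 792 lines (83.9% fewer)
```

Under this reading, the overall code shrinks by under 5%, while the maintenance-heavy later phases shrink by more than 80%, which matches the qualitative claim above.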
Testing on Defense Department data indicates that the PLP standardizer works correctly about 96-97% of the time, a rate that is virtually identical to that of the Prolog standardizer⁴. Table 2 indicates the performance of the various standardizers in terms of records standardized per second on a PC. We note that the two standardizers differ slightly in their functionality, so the numbers in each table should be taken as approximate comparisons. Even with this disclaimer, it can be seen that the PLP standardizer drastically reduces code in the top-down parsing and post-processing stages. This is due both to the simpler grammatical forms that tabling allows and to the declarative use of preference rules, which are combined with the grammar rather than applied after the entire string has been parsed. While the PLP standardizer is about 3 times slower than the Prolog standardizer, trading speed for declarativity is beneficial for this application, since the costs of maintenance outweigh the performance costs as long as the latter remain reasonable. In any case, low-level optimizations to filterPO/3 have been identified and are scheduled to be implemented in XSB, so that the performance loss of the PLP standardizer may be reduced. In addition, the standardizer was recoded by manually coding the application of preferences to DCGs. A library that implements the full PLG transformation, consisting of the DCG transformation together with the PLP transformation sketched in Section 2, is planned for XSB.
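The slowdown quoted above can be read off the records-per-second figures of Table 2. The following sketch (the dictionary and function names are ours) computes the slowdown of each configuration relative to the Prolog standardizer, together with the time needed for one million records at the measured rates. The pruning version comes out about 2.8 times slower, and the version without pruning about 3.9 times slower.

```python
# Records standardized per second on a PC, from Table 2.
RATES = {"Prolog": 54, "PLP (no pruning)": 14, "PLP (pruning)": 19}

def slowdown(name: str) -> float:
    """How many times slower than the Prolog standardizer."""
    return RATES["Prolog"] / RATES[name]

def hours_per_million(name: str) -> float:
    """Wall-clock hours to standardize one million records at the measured rate."""
    return 1_000_000 / RATES[name] / 3600

for name in RATES:
    print(f"{name:18} {slowdown(name):.2f}x slower, "
          f"{hours_per_million(name):.1f} h per million records")
```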
5 Discussion
Commercial entities are often reluctant to use Prolog for program development, let alone extensions of Prolog that include preferences or other uncommon techniques for knowledge representation. We believe that it is only by developing efficient implementations of these techniques that their research and commercial applications can be discovered and tested, and that it is through such applications that the significance of the knowledge representation techniques will be judged. We have shown here how a simple logic for preferences can be implemented and applied to a commercial problem. Efficient implementation and large-scale application of more powerful logics for preferences, such as that of [1], which includes dynamic preferences, remains open.

4 Verification is performed by human analysis of a random sample of data.

                      Prolog Stdzr   PLP Stdzr (no pruning)   PLP Stdzr (pruning)
Records per second         54                 14                      19

Table 2. Performance of Various Standardizers

Acknowledgements

The authors would like to thank Bharat Jayaraman and Kannan Govindarajan for their comments on a preliminary version of this paper. This work was partially supported by NSF grants CCR-9702581, EIA-97-5998, and INT-96-00598.
References

1. G. Brewka. Well-founded semantics for extended logic programs with dynamic preferences. Journal of Artificial Intelligence Research, 4:19–36, 1996.
2. W. Chen and D. S. Warren. Tabled Evaluation with Delaying for General Logic Programs. JACM, 43(1):20–74, January 1996.
3. C. Crowner, K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic grammars. Computer Languages, 1999. To appear.
4. B. Cui, T. Swift, and D. S. Warren. Using tabled logic programs and preference logic for data standardization. Available at http://www.cs.sunysb.edu/~tswift, 1998.
5. J. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, 1970.
6. J. Freire, T. Swift, and D. S. Warren. Beyond depth-first: Improving tabled logic programs through alternative scheduling strategies. Journal of Functional and Logic Programming, 1998(3), 1998.
7. K. Govindarajan, B. Jayaraman, and S. Mantha. Preference logic programming. In ICLP, pages 731–746. MIT Press, 1995.
8. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. Logic Programming, 12(4):335–368, 1992.
9. J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, Berlin, Germany, 1984.
10. I. V. Ramakrishnan, A. Roychoudhury, and T. Swift. A standardization tool for data warehousing. In Practical Applications of Prolog, 1997.
11. T. Swift. Tabling for non-monotonic programming. Annals of Mathematics and Artificial Intelligence, 1999. To appear.
12. T. Swift, C. Henderson, R. Holberger, J. Murphey, and E. Neham. CCTIS: an expert transaction processing system. In Sixth Conference on Industrial Applications of Artificial Intelligence, pages 131–140, 1994.
Minimal Founded Semantics for Disjunctive Logic Programming*

Sergio Greco
DEIS, Univ. della Calabria, 87030 Rende, Italy
Abstract. In this paper, we propose a new semantics for disjunctive logic programming and deductive databases. The semantics, called minimal founded, generalizes stable model semantics for normal (i.e. non-disjunctive) programs but differs from disjunctive stable model semantics (the extension of stable model semantics to disjunctive programs). Compared with disjunctive stable model semantics, the minimal founded semantics seems, in some cases, to be more intuitive; it gives meaning to programs which are meaningless under stable model semantics, and it is not harder to compute. We study the expressive power of the semantics and show that for general disjunctive datalog programs it has the same power as disjunctive stable model semantics. We also present a variation of the minimal founded semantics, called strongly founded, which coincides with the perfect model semantics on stratified programs.
1 Introduction
Several different semantics have been proposed for normal and disjunctive logic programs. Stable model semantics, first proposed for normal (i.e. disjunction-free) programs, has subsequently been extended to disjunctive programs. For normal programs, stable model semantics has been widely accepted since it captures the intuitive meaning of programs and, for stratified programs, it coincides with the perfect model semantics, which is the standard semantics for this class of programs. For positive programs, stable model semantics coincides with the minimal model semantics, which is the standard semantics for positive disjunctive programs. However, the introduction of disjunction in the head of rules does not guarantee the uniqueness of the minimal model, even for negation-free programs. For general disjunctive programs several semantics have been proposed. We mention here the (extended) generalized closed world assumption [17], the perfect model semantics [18], particularly suited for stratified programs, the disjunctive stable model semantics [12,19], and the partial stable model semantics [19,7]. Disjunctive stable model semantics is widely accepted since i) it gives a good intuition of the meaning of programs, and ii) for normal programs it coincides with stable model semantics and for positive programs it coincides with the minimal model semantics. However, disjunctive stable model semantics has some drawbacks. It is defined for a restricted class of programs, and there are several reasonable programs which are meaningless, i.e. they do not have stable models.

Motivating Examples

The following examples present some programs whose intuitive meaning is not captured by disjunctive stable model semantics.

Example 1. Consider the following simple disjunctive program P1:

    a ∨ b ∨ c ←.
    ← ¬a.
    ← ¬b.

where the second and third rules are constraints, i.e. rules which are satisfied only if the body is false. These rules can be rewritten into equivalent standard rules¹. P1 has a unique minimal model M1 = {a, b}, but M1 is not stable. □

Thus, under stable model semantics the above program is meaningless. However, its intuitive meaning is captured by the unique minimal model, since the constraints force more than one atom to be inferred from the disjunctive rule. The next example presents another program which has no stable models.

Example 2. Consider the following disjunctive program P2:

    a ∨ b ∨ c ←.
    a ← ¬b.
    b ← ¬c.
    c ← ¬a.

P2 has three minimal models M21 = {a, b}, M22 = {a, c} and M23 = {b, c}, but none of them is stable. □

The intuitive meaning of the above program is captured by the three alternative minimal models. Indeed, the normal rules state that at least two of the atoms a, b and c must be inferred. Intuitively, the problem with stable model semantics is that in some cases inclusive disjunction is interpreted as exclusive disjunction. Thus, in order to overcome some drawbacks of stable model semantics and to give semantics to a larger class of programs, we propose a different extension of stable model semantics for normal programs, called minimal founded semantics.

* This work has been partially supported by ISI-CNR, by an EC grant under the project "Contact" and by MURST grants under the projects "Interdata" and "Telcal".

1 A constraint rule of the form ← b1, ..., bk can be rewritten as p(X) ← b1, ..., bk, ¬p(X), where p is a new predicate symbol and X is the list of all distinct variables appearing in the source rule.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 221–235, 1999. © Springer-Verlag Berlin Heidelberg 1999
Contributions

The main contributions of the paper are the following:
– We introduce a new semantics for disjunctive programs. The proposed semantics seems to be more intuitive than stable model semantics, and it gives meaning to programs which are meaningless under disjunctive stable model semantics.
– We show that the new semantics coincides with stable model semantics for normal (i.e. disjunction-free) and positive programs. Therefore, the proposed semantics differs from stable model semantics only for programs containing both disjunctive rules and negation.
– We formally define the expressive power and complexity of the new semantics for datalog programs, and we show that the proposed semantics has the same expressive power and complexity as disjunctive stable model semantics.

Organization of the Paper

The rest of the paper is organized as follows. Section 2 presents preliminaries on disjunctive datalog and on minimal and stable model semantics. Section 3 introduces the minimal founded semantics and investigates its relationship with stable model semantics. Section 4 presents results on the expressive power of minimal founded semantics, whereas Section 5 presents the data complexity results. Finally, Section 6 presents our conclusions.
2 Preliminaries
A (disjunctive datalog) rule r is a clause of the form

$$a_1 \lor \cdots \lor a_n \leftarrow b_1, \ldots, b_k, \neg b_{k+1}, \ldots, \neg b_m, \qquad n \ge 1,\; m \ge 0.$$
a1, ..., an, b1, ..., bm are atoms of the form p(t1, ..., tn), where p is a predicate of arity n and the terms t1, ..., tn are constants or variables. The disjunction a1 ∨ ··· ∨ an is the head of r, while the conjunction b1, ..., bk, ¬bk+1, ..., ¬bm is the body of r. Moreover, if n = 1 we say that the rule is normal, i.e. not disjunctive. We denote by H(r) the set {a1, ..., an} of the head atoms, and by B(r) the set {b1, ..., bk, ¬bk+1, ..., ¬bm} of the body literals. We often use upper-case letters, say L, to denote literals. As usual, a literal is an atom A or a negated atom ¬A; in the former case it is positive, and in the latter negative. Two literals are complementary if they are of the form A and ¬A, for some atom A. For a literal L, ¬L denotes its complementary literal, and for a set S of literals, ¬S = {¬L | L ∈ S}. Moreover, B⁺(r) and B⁻(r) denote the sets of positive and negative literals occurring in B(r), respectively. A (disjunctive) logic program is a finite set of rules. A ¬-free (resp. ∨-free) program is called positive (resp. normal). A term (resp. an atom, a literal, a rule or a program) is ground if no variables occur in it. In the following we also
assume the existence of rules with empty head which define constraints², i.e. rules which are satisfied only if the body is false. Moreover, a rule defining a constraint of the form ← B(X), where B(X) denotes the body conjunction and X denotes the list of variables appearing in the body of the rule, can be rewritten as a normal rule of the form p(X) ← B(X), ¬p(X), where p is a new predicate symbol. The Herbrand Universe U_P of a program P is the set of all constants appearing in P, and its Herbrand Base B_P is the set of all ground atoms constructed from the predicates appearing in P and the constants from U_P. A rule r′ is a ground instance of a rule r if r′ is obtained from r by replacing every variable in r with some constant in U_P. We denote by ground(P) the set of all ground instances of the rules in P. An interpretation of P is any subset of B_P. The value of a ground atom L w.r.t. an interpretation I, value_I(L), is true if L ∈ I and false otherwise. The value of a ground negated literal ¬L is ¬value_I(L). The truth value of a conjunction of ground literals C = L1, ..., Ln is the minimum over the values of the Li, i.e. value_I(C) = min({value_I(Li) | 1 ≤ i ≤ n}), while the value value_I(D) of a disjunction D = L1 ∨ ... ∨ Ln is their maximum, i.e. value_I(D) = max({value_I(Li) | 1 ≤ i ≤ n}); if n = 0, then value_I(C) = T and value_I(D) = F. Finally, a ground rule r is satisfied by I if value_I(H(r)) ≥ value_I(B(r)). Thus, a rule r with empty body is satisfied by I if value_I(H(r)) = T, whereas a rule r′ with empty head is satisfied by I if value_I(B(r′)) = F. An interpretation M for P is a model of P if M satisfies each rule in ground(P). Minker proposed in [17] a model-theoretic semantics for positive P, which assigns to P the set MM(P) of its minimal models, where a model M for P is minimal if no proper subset of M is a model for P. Accordingly, the program P = {a ∨ b ←} has the two minimal models {a} and {b}, i.e.
MM(P) = { {a}, {b} }. The more general disjunctive stable model semantics also applies to programs with (unstratified) negation [12,19]. Disjunctive stable model semantics generalizes stable model semantics [11], previously defined for normal programs.

Definition 1. For any interpretation I, denote with P^I the ground positive program derived from ground(P)
1. by removing all rules that contain a negative literal ¬a in the body with a ∈ I, and
2. by removing all negative literals from the remaining rules.
An interpretation M is a (disjunctive) stable model of P if and only if M ∈ MM(P^M). □

For general P, the stable model semantics assigns to P the set SM(P) of its stable models. It is well known that stable models are minimal models (i.e. SM(P) ⊆ MM(P)) and that for negation-free programs minimal and stable model semantics coincide (i.e. SM(P) = MM(P)).

2 Under total semantics.
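Definition 1 can be checked mechanically on small ground programs. The Python sketch below is a brute-force illustration (exponential in the number of atoms, so only suitable for toy examples); the encoding of a rule as a (head, positive body, negative body) triple of atom sets and all function names are our own. It confirms that {a ∨ b ←} has the stable models {a} and {b}, and that the programs P1 and P2 of Examples 1 and 2 have no stable models.

```python
from itertools import combinations

def rule(head=(), pos=(), neg=()):
    """Ground rule: head_1 v ... v head_n <- pos_1, ..., not neg_1, ..."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def atoms_of(program):
    return sorted({a for h, p, n in program for a in h | p | n})

def is_model(program, m):
    """M violates a rule only if the body is true in M and no head atom is in M."""
    for h, p, n in program:
        if p <= m and not (n & m) and not (h & m):
            return False
    return True

def minimal_models(program, universe):
    models = [frozenset(c)
              for k in range(len(universe) + 1)
              for c in combinations(universe, k)
              if is_model(program, frozenset(c))]
    return [m for m in models if not any(o < m for o in models)]

def reduct(program, m):
    """P^M: drop rules with some 'not a', a in M; drop remaining negative literals."""
    return [(h, p, frozenset()) for h, p, n in program if not (n & m)]

def stable_models(program):
    universe = atoms_of(program)
    return [m for m in minimal_models(program, universe)
            if m in minimal_models(reduct(program, m), universe)]

P = [rule({"a", "b"})]                                         # a v b <-
P1 = [rule({"a", "b", "c"}), rule(neg={"a"}), rule(neg={"b"})]  # Example 1
P2 = [rule({"a", "b", "c"}), rule({"a"}, neg={"b"}),            # Example 2
      rule({"b"}, neg={"c"}), rule({"c"}, neg={"a"})]

print([sorted(m) for m in stable_models(P)])   # [['a'], ['b']]
print(stable_models(P1), stable_models(P2))    # [] []
```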
An extension of the perfect model semantics for stratified datalog programs to disjunctive programs has been proposed in [19]. A disjunctive datalog program P is said to be locally stratified if there exists a decomposition S1, ..., Sω of the Herbrand base such that for every (ground instance of a) clause

    A1 ∨ ... ∨ Ak ← B1, ..., Bm, ¬C1, ..., ¬Cn

in P, there exists an l, called the level of the clause, such that:
1. ∀i ≤ k: stratum(Ai) = l,
2. ∀i ≤ m: stratum(Bi) ≤ l, and
3. ∀i ≤ n: stratum(Ci) < l,
where stratum(A) = i iff A ∈ Si. The set of clauses in ground(P) having level i (resp. ≤ i) is denoted by Pi (resp. Pi*). Any such decomposition of the ground instantiation of a program P is called a local stratification of P. The preference order on the models of P is defined as follows: M ≪ N iff M ≠ N and for each a ∈ M − N there exists a b ∈ N − M such that stratum(a) > stratum(b). Intuitively, stratum(a) > stratum(b) means that b is minimized with higher priority than a.

Definition 2. Let P be a (locally) stratified disjunctive datalog program. A model M for P is perfect if there is no model N such that N ≪ M. The collection of all perfect models of P is denoted by PM(P). □

Consider for instance the program consisting of the clause a ∨ b ← ¬c. The minimal models are M1 = {a}, M2 = {b} and M3 = {c}. Since stratum(a) > stratum(c) and stratum(b) > stratum(c), we have that M1 ≪ M3 and M2 ≪ M3. Therefore, only M1 and M2 are perfect models. Notice that M ⊂ N implies M ≪ N; thus, for stratified P, PM(P) ⊆ MM(P). Moreover, for positive P, MM(P) = PM(P), and for stratified P, PM(P) = SM(P) ⊆ MM(P). The computation of the perfect model semantics of a program P can be done by considering a decomposition (P1, ..., Pω) of P and computing the minimal models of all subprograms, one at a time, following the linear order [10].
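The preference order ≪ and Definition 2 can be prototyped by enumerating all models of a small ground program. In the sketch below, the rule encoding, the function names, and the concrete stratum numbers (c in stratum 1, a and b in stratum 2, one local stratification consistent with the example above) are our own assumptions. It reproduces the two perfect models {a} and {b} of a ∨ b ← ¬c.

```python
from itertools import combinations

def rule(head=(), pos=(), neg=()):
    """Ground rule: head_1 v ... v head_k <- pos_1, ..., not neg_1, ..."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def is_model(program, m):
    for h, p, n in program:
        if p <= m and not (n & m) and not (h & m):
            return False
    return True

def preferable(n, m, stratum):
    """N << M: N != M and every atom of N - M is offset by an atom of M - N in a lower stratum."""
    return n != m and all(any(stratum[x] > stratum[y] for y in m - n) for x in n - m)

def perfect_models(program, stratum):
    universe = sorted(stratum)
    models = [frozenset(c) for k in range(len(universe) + 1)
              for c in combinations(universe, k) if is_model(program, frozenset(c))]
    return [m for m in models if not any(preferable(n, m, stratum) for n in models)]

P = [rule({"a", "b"}, neg={"c"})]    # a v b <- not c
stratum = {"c": 1, "a": 2, "b": 2}   # assumed local stratification
print([sorted(m) for m in perfect_models(P, stratum)])  # [['a'], ['b']]
```

Note that Definition 2 quantifies over all models, not only minimal ones, which is why the sketch enumerates every model before applying ≪.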
3 Minimal Founded Semantics
In this section we introduce a new semantics for disjunctive programs.

Definition 3. Let P be a positive disjunctive program and let M be an interpretation. Then,

$$S_P(M) = \{\, a \in B_P \mid \exists r \in \mathit{ground}(P) : a \in H(r) \wedge B(r) \subseteq M \,\}$$

and S_P^ω(∅) denotes the least fixpoint of the operator S_P. □
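A small Python sketch of S_P and its least fixpoint, assuming ground positive programs whose rules are (head, body) pairs of atom sets; the function names are ours. Unlike a choice of one disjunct, S_P adds every head atom of a rule whose body holds, so for a ∨ b ←, c ← a the fixpoint contains a, b and c.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (head atoms, body atoms)

def rule(head, body=()) -> Rule:
    return (frozenset(head), frozenset(body))

def s_p(program: List[Rule], m: FrozenSet[str]) -> FrozenSet[str]:
    """S_P(M): every head atom of every rule whose body is contained in M."""
    return frozenset(a for head, body in program if body <= m for a in head)

def s_p_lfp(program: List[Rule]) -> FrozenSet[str]:
    """S_P^omega(empty set): iterate S_P from the empty set until nothing changes."""
    current: FrozenSet[str] = frozenset()
    while True:
        nxt = s_p(program, current)
        if nxt == current:
            return current
        current = nxt

P = [rule({"a", "b"}), rule({"c"}, {"a"})]  # a v b <-.   c <- a.
print(sorted(s_p(P, frozenset())))  # ['a', 'b']
print(sorted(s_p_lfp(P)))           # ['a', 'b', 'c']
```

Since S_P is monotonic on positive programs and the Herbrand base of a ground program is finite, the iteration terminates after at most |B_P| + 1 steps.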
The operator S_P extends the classical immediate consequence operator T_P to disjunctive programs. It is obvious that the operator S_P, for positive P, is monotonic and continuous and, therefore, it admits a least fixpoint.

Definition 4. Let P be a disjunctive program and let M be an interpretation. Then, P(M) denotes the positive program derived from ground(P) as follows: for each rule r: a1 ∨ ... ∨ ak ← b1, ..., bm, ¬c1, ..., ¬cn,
1. delete r if there is some ci ∈ M;
2. delete all remaining negated literals ¬ci such that ci ∉ M;
3. delete all head atoms ai ∉ M if there is some aj ∈ M. □
The difference between P(M) and P^M is in Item 3. Thus, in the generation of P(M) we also delete the atoms appearing in the head of a rule that are false in the interpretation M, provided the head of the rule contains some other atom true in M. Clearly, for normal programs P(M) = P^M.
Example 3. Consider for instance the program P1 of Example 1 and the interpretation M1 = {a, b}. P1(M1) consists of the unique rule

    a ∨ b ←.

Consider now the program P2 of Example 2 and the interpretation M21 = {a, b}. The program P2(M21) consists of the rules

    a ∨ b ←.
    b ←.

□
Definition 5. Let P be a disjunctive program and let M be a model for P. Then, M is a founded model if it is contained in S^ω_{P(M)}(∅). Moreover, M is said to be minimal founded if it is a minimal model of P and it is also founded. □

Example 4. The program P1 of Example 1 has a unique minimal model M1 = {a, b}, which is also founded since it coincides with S^ω_{P1(M1)}(∅). The program P2 of Example 2 has three minimal models M21 = {a, b}, M22 = {a, c} and M23 = {b, c}, which are all minimal founded since M21, M22 and M23 coincide with S^ω_{P2(M21)}(∅), S^ω_{P2(M22)}(∅) and S^ω_{P2(M23)}(∅), respectively. □
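Examples 3 and 4 can be reproduced by combining the transformation P(M) of Definition 4 with the least fixpoint of S_P. The following brute-force Python sketch uses our own rule encoding ((head, positive body, negative body) triples of atoms) and function names, and is only meant for toy programs. It prints P2(M21) and the minimal founded models of P1 and P2.

```python
from itertools import combinations

def rule(head=(), pos=(), neg=()):
    """Ground rule: head_1 v ... v head_n <- pos_1, ..., not neg_1, ..."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def atoms_of(program):
    return sorted({a for h, p, n in program for a in h | p | n})

def is_model(program, m):
    for h, p, n in program:
        if p <= m and not (n & m) and not (h & m):
            return False
    return True

def minimal_models(program):
    universe = atoms_of(program)
    models = [frozenset(c) for k in range(len(universe) + 1)
              for c in combinations(universe, k) if is_model(program, frozenset(c))]
    return [m for m in models if not any(o < m for o in models)]

def p_of_m(program, m):
    """P(M) of Definition 4, as a list of positive (head, body) rules."""
    result = []
    for h, p, n in program:
        if n & m:         # item 1: delete r if some negated atom c_i is in M
            continue
        kept = h & m      # item 3: drop false head atoms if some head atom is true
        # item 2: the remaining negated literals are simply not copied
        result.append((kept if kept else h, p))
    return result

def s_lfp(positive):
    """Least fixpoint of S_P (Definition 3)."""
    current = frozenset()
    while True:
        nxt = frozenset(a for h, p in positive if p <= current for a in h)
        if nxt == current:
            return current
        current = nxt

def minimal_founded_models(program):
    return [m for m in minimal_models(program) if m <= s_lfp(p_of_m(program, m))]

P1 = [rule({"a", "b", "c"}), rule(neg={"a"}), rule(neg={"b"})]
P2 = [rule({"a", "b", "c"}), rule({"a"}, neg={"b"}),
      rule({"b"}, neg={"c"}), rule({"c"}, neg={"a"})]

print(p_of_m(P2, frozenset({"a", "b"})))                 # Example 3
print([sorted(m) for m in minimal_founded_models(P1)])   # Example 4
print([sorted(m) for m in minimal_founded_models(P2)])
```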
In the following we denote the set of minimal founded models of P by MF(P). The following result states that for disjunction-free programs, stable model semantics and minimal founded semantics coincide.

Proposition 1. Let P be a normal program. Then, SM(P) = MF(P).

Proof. Clearly, for any normal program P and any interpretation M of P, P^M = P(M). If M is a minimal model and the unique minimal model of P(M) is M, then M is also a stable model. Moreover, if M is a stable model of P, then M is a minimal model and, since P(M) = P^M, it is also minimal founded. □

The following example presents a disjunctive program where stable and minimal founded semantics coincide.

Example 5. Consider the following simple disjunctive program P5:

    a ∨ b ∨ c ←.
    a ← ¬b, ¬c.
    b ← ¬a.
    c ← ¬a.

This program has two stable models M51 = {a} and M52 = {b, c}, which are also minimal founded. □

However, for general programs containing both disjunction and negation, stable and minimal founded semantics do not coincide. The relation between the two semantics is given by the following result.

Theorem 1. Let P be a disjunctive program. Then, SM(P) ⊆ MF(P).

Proof (Sketch). It is well known that stable models are also minimal models. It is sufficient to show that every stable model M is founded, i.e. M ⊆ S^ω_{P(M)}(∅). Clearly, every minimal model of P^M is contained in S^ω_{P(M)}(∅) and, therefore, since M ∈ MM(P^M), M ⊆ S^ω_{P(M)}(∅), i.e. M is founded. Therefore, SM(P) ⊆ MF(P). □

As shown by the previous examples, the containment can be strict: there are programs having minimal founded models which are not stable.

Corollary 1. Let P be a positive program. Then, MM(P) = MF(P).

Proof. From Theorem 1, SM(P) ⊆ MF(P). Moreover, by definition MF(P) ⊆ MM(P). Since for positive programs SM(P) = MM(P), we conclude that MF(P) = MM(P). □

Therefore, for positive programs, minimal model semantics, stable model semantics and minimal founded semantics coincide.

Proposition 2.
Let P be a stratified program. Then, MF(P) ≠ ∅. □
The above result states that, under minimal founded semantics, stratified programs have a well-defined meaning. However, even for stratified programs the sets of stable and minimal founded models may be different.

Example 6. Consider the program P6:

    a ∨ b ←.
    a ←.
    c ← ¬b.

This program has two minimal founded models M61 = {a, c} and M62 = {a, b}, but only M61 is stable. □

The previous results state that all programs having stable model semantics also have minimal founded semantics although, as shown by our examples, there are programs which have a well-defined meaning under minimal founded semantics but are meaningless under stable model semantics. It is worth noting that both stable and minimal founded semantics consider minimal models whose atoms can be 'derived' from the program. Stable model semantics is more restrictive since, for a given program P, it considers only minimal models M which belong to MM(P^M), whereas the minimal founded semantics considers minimal models whose atoms can be derived from the program, i.e. all minimal models M contained in S^ω_{P(M)}(∅). It would be interesting to compare the two semantics on the basis of abstract properties [2].
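The relationship SM(P) ⊆ MF(P) of Theorem 1, the coincidence of the two semantics on P5, and the strict containment on the stratified program P6 can be reproduced with a brute-force checker. In this Python sketch the rule encoding and function names are ours, and every function enumerates all interpretations, so it is only meant for toy programs.

```python
from itertools import combinations

def rule(head=(), pos=(), neg=()):
    """Ground rule: head_1 v ... v head_k <- pos_1, ..., not neg_1, ..."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def atoms_of(program):
    return sorted({a for h, p, n in program for a in h | p | n})

def is_model(program, m):
    # a rule is violated only if its body is true in m and no head atom is in m
    return all(bool(h & m) or not (p <= m and not (n & m)) for h, p, n in program)

def minimal_models(program, universe):
    models = [frozenset(c) for k in range(len(universe) + 1)
              for c in combinations(universe, k) if is_model(program, frozenset(c))]
    return {m for m in models if not any(o < m for o in models)}

def lfp_s(positive):
    """Least fixpoint of S_P for positive (head, body) rules (Definition 3)."""
    current = frozenset()
    while True:
        nxt = frozenset(a for h, p in positive if p <= current for a in h)
        if nxt == current:
            return current
        current = nxt

def stable_models(program):
    u = atoms_of(program)
    def reduct(m):  # P^M (Definition 1)
        return [(h, p, frozenset()) for h, p, n in program if not (n & m)]
    return {m for m in minimal_models(program, u) if m in minimal_models(reduct(m), u)}

def minimal_founded_models(program):
    u = atoms_of(program)
    def p_of_m(m):  # P(M) (Definition 4): keep only true head atoms when there are some
        return [((h & m) or h, p) for h, p, n in program if not (n & m)]
    return {m for m in minimal_models(program, u) if m <= lfp_s(p_of_m(m))}

P5 = [rule({"a", "b", "c"}), rule({"a"}, neg={"b", "c"}),
      rule({"b"}, neg={"a"}), rule({"c"}, neg={"a"})]
P6 = [rule({"a", "b"}), rule({"a"}), rule({"c"}, neg={"b"})]

for name, prog in [("P5", P5), ("P6", P6)]:
    sm, mf = stable_models(prog), minimal_founded_models(prog)
    print(name, "SM:", sorted(map(sorted, sm)), "MF:", sorted(map(sorted, mf)),
          "SM <= MF:", sm <= mf)
```

For P6 the model {a, b} is founded because P6({a, b}) keeps the rule a ∨ b ← unchanged, and S_P adds both head atoms; the reduct P6^{a,b}, by contrast, has {a} as its only minimal model.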
4 Expressive Power and Complexity
In this section we present some results on the expressive power and the data complexity of minimal founded semantics for disjunctive datalog programs. We first introduce some preliminary definitions and notation, and then present our results. Predicate symbols are partitioned into the two sets of base (EDB) and derived (IDB) predicates. Base predicate symbols correspond to database relations on a countable domain U and do not occur in rule heads. Derived predicate symbols appear in the heads of rules. Possible constants in a program are taken from the domain U. A program P has an associated relational database scheme DB_P = {r | r is an EDB predicate symbol of P}; thus EDB predicate symbols are seen as relation symbols. A database D on DB_P is a set of finite relations, one for each r in DB_P, denoted by D(r). The set of all databases on DB_P is denoted by D_P. Given a database D ∈ D_P, P_D denotes the following logic program: P_D = P ∪ {r(t) ← | r ∈ DB_P ∧ t ∈ D(r)}. The Herbrand universe U_{P_D} is a finite subset of U and consists of all constants occurring in P or in D (active domain). If D is empty and no constant occurs in P, then U_{P_D} is assumed to be equal to {a}, where a is any constant in U.
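The construction of P_D and of its active domain can be sketched as follows. The representation of a database as a dictionary from relation names to sets of tuples, the function names, and the fallback constant "a" (standing for "any constant in U") are our own illustrative choices; the example relation pb is borrowed from the strategic companies program discussed later.

```python
from typing import Dict, Iterable, List, Set, Tuple

Database = Dict[str, Set[Tuple[str, ...]]]  # relation name -> set of tuples

def database_facts(db: Database) -> List[Tuple[str, Tuple[str, ...]]]:
    """The facts r(t) <- that P_D adds to P, one per tuple t in D(r)."""
    return sorted((r, t) for r, tuples in db.items() for t in tuples)

def active_domain(program_constants: Iterable[str], db: Database,
                  fallback: str = "a") -> Set[str]:
    """Herbrand universe of P_D: constants of P and of D, or {fallback} if there are none."""
    consts = set(program_constants)
    consts |= {c for tuples in db.values() for t in tuples for c in t}
    return consts or {fallback}

db = {"pb": {("p1", "a", "b"), ("p2", "b", "c")}}
print(database_facts(db))
print(sorted(active_domain([], db)))
```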
Definition 6. A (bound) query Q is a pair ⟨P, G⟩, where P is a disjunctive program and G is a ground literal (the query goal). □

The result of a query Q = ⟨P, G⟩ on an input database D is defined in terms of the minimal founded models of P_D, by taking either the union of all models (possible inference, ∃MF) or their intersection (certain inference, ∀MF).

Definition 7. Given a program P and a database D, a ground atom G is true under possible semantics if there exists a minimal founded model M for P_D such that G ∈ M. Analogously, G is true under certain semantics if G is true in every minimal founded model. The set of all queries is denoted by Q. □

Definition 8. Let Q = ⟨P, G⟩ be a query. Then the database collection of Q w.r.t. the set of minimal founded models MF is:
(a) under the possible version of minimal founded semantics, the set of all databases D in D_P such that G is true in P_D under the possible version of minimal founded semantics; this set is denoted by EXP^∃_MF(Q);
(b) under the certain version of minimal founded semantics, the set of all databases D in D_P such that G is true in P_D under the certain version of minimal founded semantics; this set is denoted by EXP^∀_MF(Q).
The expressive power of a given version (either possible or certain) of minimal founded semantics is given by the family of the database collections of all possible queries, i.e. EXP^∃_MF[Q] = {EXP^∃_MF(Q) | Q ∈ Q} and EXP^∀_MF[Q] = {EXP^∀_MF(Q) | Q ∈ Q}. □

It is well known that the database collection of every query is indeed a generic set of databases [1]. Recall that a set 𝒟 of databases on a database scheme DB with domain U is (K-)generic [4,1] if there exists a finite subset K of U such that for any D in 𝒟 and for any isomorphism θ on relations extending a permutation on U − K, θ(D) is in 𝒟 as well; informally, constants not in K are not interpreted, and the only relationships among them are those explicitly provided by the databases. Note that for a query Q = ⟨P, G⟩, K consists of all constants occurring in P and in G. From now on, any generic set of databases will be called a database collection. Following the data complexity approach of [4,22], in which the query is assumed to be fixed while the database is the input, the expressive power coincides with the complexity class of the problems of recognizing the database collection of each query. The expressive power of each semantics will be compared with database complexity classes, defined as follows. Given a Turing machine complexity class C (for instance P or NP), a relational database scheme DB, and a database collection 𝒟 on DB, 𝒟 is C-recognizable if the problem of deciding whether D is in 𝒟 is in C. The database complexity class DB-C is the family of all C-recognizable database collections (for instance, DB-P is the family of all database collections that are recognizable in polynomial time). If the expressive
power of a given semantics coincides with some complexity class DB-C, we say that the given semantics captures (or expresses all queries in) DB-C. Recall that the classes Σ^P_k, Π^P_k of the polynomial hierarchy [21] are defined by

$$\Sigma^P_0 = \Pi^P_0 = P, \qquad \Sigma^P_{i+1} = NP^{\Sigma^P_i}, \qquad \Pi^P_{i+1} = \text{co-}\Sigma^P_{i+1}, \quad \text{for all } i \ge 0.$$

In particular, Σ^P_1 = NP and Π^P_1 = co-NP. By Fagin’s Theorem [9] and its generalization in [21], complexity and second-order definability are linked as follows.

Fact 1 ([9,21]). A database collection 𝒟 over a scheme R is in DB-Σ^P_k, k ≥ 1, iff it is definable by a second-order formula $(\exists \mathring{A}_1)(\forall \mathring{A}_2)\cdots(Q_k \mathring{A}_k)\,\varphi$ on R, where the $\mathring{A}_i$ are lists of predicate variables preceded by alternating quantifiers and φ is first-order. □

4.1 Expressive Power
It is well known that disjunctive datalog under total stable model semantics captures the complexity classes Σ^P_2 and Π^P_2 under possible and certain semantics, respectively. The following example presents a program which defines a Σ^P_2-complete problem [3]. The definition of the problem by means of a disjunctive program has been taken from [8].

Example 7. A holding owns companies, each of which produces some goods. Moreover, several companies may have joint control over another company. Now, some companies should be sold, under the constraint that all goods can still be produced, and that no company is sold which would still be controlled by the holding after the transaction. A company is strategic if it belongs to a strategic set, which is a minimal set of companies satisfying these constraints. The query consists in checking whether a given company "a" is strategic. This query can be expressed as ⟨SC, st(a)⟩, where SC is defined as follows:

    st(C1) ∨ st(C2) ← pb(P, C1, C2).
    st(C) ← cb(C, C1, C2, C3), st(C1), st(C2), st(C3).

Here st(C) means that C is strategic, pb(P, C1, C2) that product P is produced by companies C1 and C2, and cb(C, C1, C2, C3) that C is jointly controlled by C1, C2 and C3; following [3], we assume here that each product is produced by at most two companies and each company is jointly controlled by at most three other companies. The problem consists in checking whether the company a is strategic, i.e. whether there is a stable model containing st(a). □

Thus, the strategic companies problem can be defined by means of the disjunctive program reported above under the possible version of disjunctive stable model semantics (see [8]).

Theorem 2. EXP^∃_MF[Q] = DB-Σ^P_2.
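For a small database, the query ⟨SC, st(a)⟩ of Example 7 can be answered by brute force. Since SC is positive, its minimal founded models are just its minimal models (Corollary 1), so the Python sketch below grounds SC, enumerates minimal models, and applies possible and certain inference. The grounding function, the choice to drop EDB atoms once they have been used to instantiate rules, and the example database are our own illustrative assumptions; the enumeration is exponential and unsuitable for real instances.

```python
from itertools import combinations

def ground_sc(pb, cb):
    """Ground SC for pb(P, C1, C2) and cb(C, C1, C2, C3) facts (EDB atoms are true, so omitted)."""
    rules = [(frozenset({f"st({c1})", f"st({c2})"}), frozenset()) for _, c1, c2 in pb]
    rules += [(frozenset({f"st({c})"}), frozenset({f"st({c1})", f"st({c2})", f"st({c3})"}))
              for c, c1, c2, c3 in cb]
    return rules

def minimal_models(rules):
    atoms = sorted({a for h, b in rules for a in h | b})
    def is_model(m):
        return all(h & m or not b <= m for h, b in rules)
    models = [frozenset(c) for k in range(len(atoms) + 1)
              for c in combinations(atoms, k) if is_model(frozenset(c))]
    return [m for m in models if not any(o < m for o in models)]

def possibly(goal, models):   # possible inference: true in some model
    return any(goal in m for m in models)

def certainly(goal, models):  # certain inference: true in every model
    return all(goal in m for m in models)

# Hypothetical database: p1 is made by a or b, p2 by b or c.
pb = [("p1", "a", "b"), ("p2", "b", "c")]
no_control = minimal_models(ground_sc(pb, []))
# Additionally, b is jointly controlled by a and c (c repeated to fill the third slot).
with_control = minimal_models(ground_sc(pb, [("b", "a", "c", "c")]))
print(possibly("st(a)", no_control), certainly("st(b)", no_control))
print(possibly("st(a)", with_control))
```

Without the control fact, {st(b)} and {st(a), st(c)} are both minimal, so a is strategic; once keeping a and c would force keeping b, the only strategic set is {b} and a is no longer strategic.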
Minimal Founded Semantics for Disjunctive Logic Programming
Proof. We first prove that for any query $Q = \langle P, G\rangle$ in $\mathcal{Q}$, recognizing whether a database D is in $EXP_\exists^{MF}(Q)$ is in $\Sigma_2^P$. D is in $EXP_\exists^{MF}(Q)$ iff there exists a minimal founded model M of $P_D$ such that $G \in M$. To check this, we may guess an interpretation M of $P_D$ and verify that: 1) M is a minimal model of $P_D$, 2) M is founded and 3) $G \in M$. For Step 1 we can verify in polynomial time that M is a model of $P_D$ and use an NP oracle to ask whether M is not minimal (the oracle guesses an interpretation $N \subset M$ and checks that N is a model of $P_D$). If the answer of the oracle is "no" (i.e. M is a minimal model), we check Steps 2 and 3 in polynomial time. Therefore, recognizing whether a database D is in $EXP_\exists^{MF}(Q)$ is in $\Sigma_2^P$. To prove completeness it is sufficient to show that there is some $\Sigma_2^P$-complete problem which can be expressed by disjunctive datalog under the possible version of minimal founded semantics. The strategic companies problem of Example 7 is $\Sigma_2^P$-complete and is expressed by means of a positive disjunctive datalog program under the possible version of stable model semantics [8]. Since for positive disjunctive programs the sets of stable and minimal founded models coincide, we conclude that this program defines the strategic companies problem also under the possible version of minimal founded semantics. □
Theorem 3. $EXP_\forall^{MF}[\mathcal{Q}] = \text{DB-}\Pi_2^P$.
Proof (Sketch). We first prove that for any query $Q = \langle P, G\rangle$ in $\mathcal{Q}$, recognizing whether a database D is in $EXP_\forall^{MF}(Q)$ is in $\Pi_2^P$. To this end, let us consider the complementary problem: is it true that D is not in $EXP_\forall^{MF}(Q)$? Now, D is not in $EXP_\forall^{MF}(Q)$ iff there exists a minimal founded model M of $P_D$ such that $G \notin M$. Following the line of the proof of Theorem 2, we can easily see that the latter problem is in $\Sigma_2^P$. Hence, recognizing whether a database D is in $EXP_\forall^{MF}(Q)$ is in $\Pi_2^P$. Let us now prove that every $\Pi_2^P$-recognizable database collection D on a database scheme DB is in $EXP_\forall^{MF}[\mathcal{Q}]$.
By Fact 1, D is defined by a second-order formula of the form $\forall R_1 \exists R_2\, \Phi(R_1, R_2)$. Using the usual transformation technique, the above formula is equivalent to a second-order Skolem form formula $(\forall S_1)(\exists S_2)\Gamma(S_1, S_2)$, where $\Gamma(S_1, S_2) = (\forall X)(\exists Y)(\Theta_1(S_1, S_2, X, Y) \lor \ldots \lor \Theta_k(S_1, S_2, X, Y))$, and $S_1$ and $S_2$ are lists of $m_1$ and $m_2$ predicate symbols, containing all symbols in $R_1$ and $R_2$, respectively. Consider the following program P:
r1: $s^1_j(W^1_j) \lor \hat{s}^1_j(W^1_j) \leftarrow$   $(1 \le j \le m_1)$
r2: $s^2_j(W^2_j) \lor \hat{s}^2_j(W^2_j) \leftarrow$   $(1 \le j \le m_2)$
r3: $q(X) \leftarrow \Theta_i(S_1, S_2, X, Y)$   $(1 \le i \le k)$
r4: $g \leftarrow \neg q(X)$
r5: $g \leftarrow s^2_j(W^2_j), \hat{s}^2_j(W^2_j)$   $(1 \le j \le m_2)$
r6: $\hat{s}^2_j(W^2_j) \leftarrow g$   $(1 \le j \le m_2)$
r7: $s^2_j(W^2_j) \leftarrow g$   $(1 \le j \le m_2)$
S. Greco
where, intuitively, $\hat{s}^1_j(W^1_j)$ corresponds to $\neg s^1_j(W^1_j)$ and $\hat{s}^2_j(W^2_j)$ corresponds to $\neg s^2_j(W^2_j)$. Now it is easy to show that the formula $(\forall S_1)(\exists S_2)\Gamma(S_1, S_2)$ is valid if g is false in all minimal founded models of P. □
Therefore, the expressive power of disjunctive datalog under minimal founded semantics is the same as under stable model semantics.
4.2 Data Complexity
Data complexity is usually closely tied to the expressive power and, in particular, it provides an upper bound for the expressive power.
Theorem 4. Given a disjunctive program P, a database D on $DB_P$, and an interpretation M for $P_D$, deciding whether M is a minimal founded model for $P_D$ is coNP-complete.
Proof (Sketch). Let M be an interpretation and consider the complementary problem Π: is it true that M is not a minimal founded model? Π is in NP since we can guess an interpretation N and verify in polynomial time that (i) N is a model for $P_D$ and (ii) either M is not a model for $P_D$, or M is not founded (which, as in the proof of Theorem 2, can be checked in polynomial time), or N is a proper subset of M. Hence the complement of Π, i.e. deciding whether M is a minimal founded model, is in coNP. Hardness can be proved in a similar way as for stable models, whose recognition is also coNP-complete (cf. [5]). □
The results on the data complexity of queries under minimal founded semantics are immediate consequences of the expressiveness results.
Theorem 5. Let $Q = \langle P, G\rangle$ be a query and D a database. Deciding whether $P_D$ has a minimal founded model is $\Sigma_2^P$-complete. □
Theorem 6 (Possibility inference). Let Q be a query and D a database. Deciding whether Q is true under the possible version of the minimal founded semantics is $\Sigma_2^P$-complete, whereas under the certain version it is $\Pi_2^P$-complete. □
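The membership argument in the proof of Theorem 4 rests on a polynomial-time check of a guessed interpretation N. The Python sketch below illustrates only the minimality part of that check for a ground disjunctive program. The rule encoding (head, positive body, negative body) is an assumption made for this illustration, and foundedness is deliberately left out. The exhaustive loop in `is_minimal_model` is a deterministic stand-in for the nondeterministic guess, so it is exponential in the number of atoms.

```python
from itertools import combinations

# Assumed encoding: a ground rule is (head, pos, neg), three frozensets of atoms,
# read as  h1 v ... v hk <- p1, ..., pm, not n1, ..., not nn.
def is_model(interp, rules):
    """A rule is violated only if its body holds and no head atom is true."""
    return all(not (pos <= interp and not neg & interp) or bool(head & interp)
               for head, pos, neg in rules)

def refutes_minimality(m, n, rules):
    """Polynomial check of a certificate n, following the proof of Theorem 4:
    n is a model, and either m is not a model or n is a proper subset of m.
    Foundedness of m is not checked in this sketch."""
    return is_model(n, rules) and (not is_model(m, rules) or n < m)

def is_minimal_model(m, rules, atoms):
    """Try every certificate; a nondeterministic machine would guess one instead."""
    if not is_model(m, rules):
        return False
    candidates = (frozenset(c) for r in range(len(atoms) + 1)
                  for c in combinations(sorted(atoms), r))
    return not any(refutes_minimality(m, n, rules) for n in candidates)

# Example program:  a v b <- .   b <- a.
rules = [
    (frozenset({"a", "b"}), frozenset(), frozenset()),
    (frozenset({"b"}), frozenset({"a"}), frozenset()),
]
atoms = {"a", "b"}
print(is_minimal_model(frozenset({"b"}), rules, atoms))       # True
print(is_minimal_model(frozenset({"a", "b"}), rules, atoms))  # False: {b} refutes it
```

For {a, b}, the certificate N = {b} is itself a model and a proper subset, which is exactly the kind of short witness the proof relies on.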
5 Strongly Founded Semantics
As shown by the program of Example 6, the minimal founded semantics admits models which are not intuitive. For stratified programs, the intuitive meaning is captured by the perfect model semantics. Thus, in this section we introduce a refinement of the minimal founded semantics, called strongly founded semantics, which coincides with the perfect model semantics on stratified programs. Let P be a disjunctive datalog program and let $S_1, \ldots, S_\omega$ be a decomposition of the Herbrand base such that for every (ground instance of a) clause $A_1 \lor \ldots \lor A_k \leftarrow B_1, \ldots, B_m, \neg C_1, \ldots, \neg C_n$ in P, there exists an l, called the level of the clause, such that:
1. $\forall i \le k$: $stratum(A_i) = l$;
2. $\forall i \le m$: $stratum(B_i) \le l$;
3. $\forall i \le n$: $stratum(C_i) = l$ if there is some $A_j \rightarrow C_i$ $(1 \le j \le k)$, and $stratum(C_i) < l$ otherwise,
where $stratum(A) = i$ iff $A \in S_i$. The set of clauses in ground(P) having level i (resp. ≤ i) is denoted by $P_i$ (resp. $P_i^*$). Any such decomposition of the ground instantiation of a program P is called an ordered decomposition of P. Observe that the level of ground clauses defined above slightly differs from the one used in the definition of local stratification (the two definitions differ in Item 3 since we also consider unstratified programs; for stratified programs the two definitions coincide). The preference order on the models of P is defined as follows: $M \succ N$ (M is preferable to N) iff $M \ne N$ and for each $a \in M - N$ there exists a $b \in N - M$ such that $stratum(a) > stratum(b)$. Intuitively, $stratum(a) > stratum(b)$ means that a has higher priority than b. A model M is said to be preferred if there is no model $N \succ M$.
Definition 9. Let P be a disjunctive datalog program. A model M for P is said to be strongly founded if it is founded and there is no model N such that $N \succ M$. The collection of all strongly founded models of P is denoted by SF(P). □
Theorem 7. Let P be a disjunctive datalog program. Then, $SM(P) \subseteq SF(P) \subseteq MF(P)$.
Proof. $SF(P) \subseteq MF(P)$ is obvious since every strongly founded model is, in particular, a minimal founded model. Let us now prove that $SM(P) \subseteq SF(P)$, i.e., that for each $M \in SM(P)$, $M \in SF(P)$. Assume that this is not true, i.e. that there is a model $N \in MF(P)$ such that $N \succ M$. Let $N_i = N \cap S_i$ (resp. $M_i = M \cap S_i$) and $N_i^* = N \cap S_i^*$ (resp. $M_i^* = M \cap S_i^*$), where $S_i^* = \bigcup_{j \le i} S_j$. Let k be the first ordinal such that $N_{k+1} \subset M_{k+1}$ and $N_h = M_h$ for $h \le k$ (i.e., $N_{k+1}^* \subset M_{k+1}^*$ and $N_k^* = M_k^*$). Since M is a stable model, we have that $M \in MM(P/M)$ and that $M_{k+1}^* \in MM(P_{k+1}^*/M_k^*)$. But $M_{k+1}^* \in MM(P_{k+1}^*/M_k^*)$ implies that $N_{k+1}^* \not\subseteq M_{k+1}^*$. Therefore there is no k such that $N_{k+1}^* \subset M_{k+1}^*$ and, consequently, there is no minimal founded model $N \succ M$. □
Example 8. The program of Example 6 has two minimal founded models, $M_6^1 = \{a, c\}$ and $M_6^2 = \{a, b\}$, but only $M_6^2$ is strongly founded since $M_6^2 \succ M_6^1$. As observed in Example 6, $M_6^2$ is also stable. The program of Example 1 has a unique minimal founded model, which is strongly founded. The program of Example 2 has three minimal founded models, which are also strongly founded. □
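The preference order ≻ and the "no preferable model" test of Definition 9 can be sketched in Python as follows. The strata in the usage example are hypothetical and were chosen only so that the outcome mirrors Example 8, since the program of Example 6 is not reproduced here. Also note that `preferred` compares a model only against the models passed to it, whereas Definition 9 ranges over all models of P, and foundedness is not checked.

```python
def preferable(m, n, stratum):
    """M ≻ N: M != N and every a in M - N is matched by some b in N - M
    with stratum(a) > stratum(b)."""
    if m == n:
        return False
    return all(any(stratum[a] > stratum[b] for b in n - m) for a in m - n)

def preferred(models, stratum):
    """Keep each model M for which no listed model N satisfies N ≻ M."""
    return [m for m in models if not any(preferable(n, m, stratum) for n in models)]

# Hypothetical strata, picked so that the result mirrors Example 8.
stratum = {"a": 1, "c": 1, "b": 2}
m1, m2 = frozenset({"a", "c"}), frozenset({"a", "b"})
print(preferable(m2, m1, stratum))            # True
print(preferable(m1, m2, stratum))            # False
print(preferred([m1, m2], stratum) == [m2])   # True
```

When M is a proper subset of N, the set M − N is empty and the test holds vacuously. Preferred models are therefore also minimal, which is consistent with the claim $SF(P) \subseteq MF(P)$ in Theorem 7.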
Theorem 8. Let P be a locally stratified program. Then, SF(P) = PM(P).
Proof. For (locally) stratified programs, the definitions of local stratification and of ordered decomposition coincide. The definitions of perfect model and of preferred model coincide too and, therefore, $SF(P) \supseteq PM(P)$. We show that $SF(P) \subseteq SM(P) = PM(P)$, i.e. that for every $M \in SF(P)$, $M \in SM(P)$, or equivalently $M \in MM(P/M)$. Assume the existence of a model $M \in SF(P)$ such that $M \notin MM(P/M)$, and of a model N of P/M with $N \subset M$. Let $N_i = N \cap S_i$ (resp. $M_i = M \cap S_i$) and $N_i^* = N \cap S_i^*$ (resp. $M_i^* = M \cap S_i^*$), where $S_i^* = \bigcup_{j \le i} S_j$. Let k be the first ordinal such that $N_{k+1} \subset M_{k+1}$ (so that $N_h = M_h$ for $h \le k$). We have that $M_{k+1}^* \in MM(P_{k+1}^*/M_k^*)$ and $N_{k+1}^* \in MM(P_{k+1}^*/N_k^*)$. Since $M_k^* = N_k^*$, we get $M_{k+1}^* = N_{k+1}^*$ and, therefore, there is no such k. Thus, $M \subseteq N$, i.e. M is a minimal model of P/M. □
The above theorem states that for (locally) stratified programs the strongly founded semantics coincides with the perfect model semantics, and hence with the stable model semantics.
Corollary 2. Let P be a positive disjunctive datalog program. Then MM(P) = SF(P).
Proof. $SF(P) \subseteq MF(P) \subseteq MM(P)$, by Theorem 7 and since $MF(P) \subseteq MM(P)$ by definition. Moreover, $SM(P) \subseteq SF(P)$ by Theorem 7. For positive disjunctive programs, SM(P) = MM(P) and, therefore, SF(P) = MF(P) = MM(P). □
Corollary 3. Let P be a standard datalog program. Then SM(P) = SF(P).
Proof. $SM(P) \subseteq SF(P) \subseteq MF(P)$ by Theorem 7. For standard datalog programs, SM(P) = MF(P) (by Proposition 1). Therefore, SM(P) = SF(P). □
We conclude this section by mentioning that the strongly founded and minimal founded semantics have the same expressive power and the same data complexity. The formal results on the expressive power and data complexity of the strongly founded semantics can be found in the extended version of the paper [14].
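The program SC of Example 7 is positive. By Corollary 2, its minimal, minimal founded and strongly founded models therefore coincide, so for a toy instance they can be enumerated by brute force. In the Python sketch below, the companies and the pb and cb facts are invented for illustration. Only the st atoms are enumerated, because the pb and cb facts are fixed.

```python
from itertools import combinations

# Hypothetical toy instance of Example 7 (not taken from the paper):
# pb(P, C1, C2) facts and cb(C, C1, C2, C3) facts.
COMPANIES = ["a", "b", "c", "d"]
PRODUCED_BY = {"p1": ("a", "b"), "p2": ("a", "c")}
CONTROLLED_BY = [("d", ("a", "b", "c"))]


def is_model(st):
    """st is the set of companies C such that st(C) is true."""
    # st(C1) v st(C2) <- pb(P, C1, C2): every product keeps a strategic producer.
    if any(c1 not in st and c2 not in st for c1, c2 in PRODUCED_BY.values()):
        return False
    # st(C) <- cb(C, C1, C2, C3), st(C1), st(C2), st(C3).
    return all(c in st or not set(owners) <= st for c, owners in CONTROLLED_BY)


def minimal_models():
    """Enumerate all subsets of the st atoms and keep the subset-minimal models."""
    models = [set(s) for r in range(len(COMPANIES) + 1)
              for s in combinations(COMPANIES, r) if is_model(set(s))]
    return [m for m in models if not any(n < m for n in models)]


def is_strategic(company):
    """Possible (brave) reading of the query <SC, st(company)>."""
    return any(company in m for m in minimal_models())


print(sorted(sorted(m) for m in minimal_models()))  # [['a'], ['b', 'c']]
print([c for c in COMPANIES if is_strategic(c)])    # ['a', 'b', 'c']
```

In this instance company d is not strategic. It is forced into a model only together with a, b and c, and no minimal model contains all three.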
6 Conclusion
The semantics proposed in this paper are essentially an extension of the stable model semantics for normal programs and of the perfect model semantics for disjunctive programs. The aim of our proposal is to overcome some drawbacks of the disjunctive stable model semantics which, in some cases, interprets inclusive disjunction as exclusive disjunction. Several problems which need further research have been left open. For instance, further research could be devoted to i) the characterization of head-cycle-free programs, ii) the identification of fragments of disjunctive datalog for which one minimal founded model can be computed in polynomial time, and iii) the investigation of abstract properties of disjunctive datalog under minimal founded semantics [2].
References
1. Abiteboul, S., Hull, R., Vianu, V. Foundations of Databases. Addison-Wesley, 1994.
2. Brass, S., Dix, J. Classifying semantics of disjunctive logic programs. Proc. JICSLP-92, pp. 798–812, 1993.
3. Cadoli, M., Eiter, T., Gottlob, G. Default Logic as a Query Language. IEEE Transactions on Knowledge and Data Engineering, 9(3), pp. 448–463, 1997.
4. Chandra, A., Harel, D. Structure and Complexity of Relational Queries. Journal of Computer and System Sciences, 25, pp. 99–128, 1982.
5. Eiter, T., Gottlob, G. Complexity aspects of various semantics of disjunctive databases. Proc. Int. Conf. on Principles of Database Systems, pp. 158–166, 1993.
6. Eiter, T., Gottlob, G., Mannila, H. Disjunctive Datalog. ACM Transactions on Database Systems, 22(3), pp. 364–418, 1997.
7. Eiter, T., Leone, N., Saccà, D. Expressive Power and Complexity of Partial Models for Disjunctive Deductive Databases. Theoretical Computer Science, 1997.
8. Eiter, T., Leone, N., Mateis, C., Pfeifer, G., Scarcello, F. The KR System dlv: Progress Report, Comparisons and Benchmarks. Proc. of 6th Int. Conf. on Princ. of Knowledge Representation, pp. 406–417, 1998.
9. Fagin, R. Generalized First-Order Spectra and Polynomial-Time Recognizable Sets. In Complexity of Computation, SIAM-AMS Proc., Vol. 7, pp. 43–73, 1974.
10. Fernandez, J. A., Minker, J. Computing perfect models of disjunctive stratified databases. Proc. ILPS'91 Workshop on Disjunctive Logic Programming, pp. 110–117, 1991.
11. Gelfond, M., Lifschitz, V. The Stable Model Semantics for Logic Programming. Proc. of Fifth Conf. on Logic Programming, pp. 1070–1080, 1988.
12. Gelfond, M., Lifschitz, V. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9, pp. 365–385, 1991.
13. Greco, S. Binding Propagation in Disjunctive Databases. Proc. Int. Conf. on Very Large Data Bases, 1997.
14. Greco, S. Strongly founded semantics for disjunctive logic programming. Technical Report, 1999.
15. Leone, N., Rullo, P., Scarcello, F. Disjunctive Stable Models: Unfounded Sets, Fixpoint Semantics and Computation. Information and Computation, Academic Press, Vol. 135, No. 2, June 15, pp. 69–112, 1997.
16. Marek, W., Truszczyński, M. Autoepistemic Logic. Journal of the ACM, 38(3), pp. 518–619, 1991.
17. Minker, J. On Indefinite Data Bases and the Closed World Assumption. Proc. of the 6th Conference on Automated Deduction (CADE-82), pp. 292–308, 1982.
18. Przymusinski, T. On the Declarative Semantics of Deductive Databases and Logic Programming. In Foundations of Deductive Databases and Logic Programming, Minker, J. (ed.), ch. 5, pp. 193–216, 1988.
19. Przymusinski, T. Stable Semantics for Disjunctive Programs. New Generation Computing, 9, pp. 401–424, 1991.
20. Saccà, D. The Expressive Powers of Stable Models for Bound and Unbound DATALOG Queries. Journal of Computer and System Sciences, Vol. 54, No. 3, June 1997, pp. 441–464.
21. Stockmeyer, L. J. The Polynomial-Time Hierarchy. Theoretical Computer Science, 3, pp. 1–22, 1977.
22. Vardi, M. Y. The Complexity of Relational Query Languages. Proc. ACM Symp. on Theory of Computing, pp. 137–146, 1982.
On the Role of Negation in Choice Logic Programs
Marina De Vos⋆ and Dirk Vermeir
Dept. of Computer Science, Free University of Brussels, VUB
Pleinlaan 2, Brussels 1050, Belgium
Tel: +32 2 6293308, Fax: +32 2 6293525
{marinadv,dvermeir}@tinf.vub.ac.be
http://tinf2.vub.ac.be
Abstract. We introduce choice logic programs as negation-free datalog programs that allow rules to have exclusive-only (possibly empty) disjunctions in the head. Such programs naturally model decision problems where, depending on a context, agents must make a decision, i.e. an exclusive choice out of several alternatives. It is shown that such a choice mechanism is in a sense equivalent to negation as supported in semi-negative ("normal") datalog programs. We also discuss an application where strategic games can be naturally formulated as choice programs: it turns out that the stable models of such programs capture exactly the set of Nash equilibria. We then consider the effect of choice on "negative information" that may be implicitly derived from a program. Based on an intuitive notion of unfounded set for choice programs, we show that several results from (seminegative) disjunctive programs can be strengthened, thus characterizing the position of choice programs as an intermediate between simple positive programs and programs that allow for the explicit use of negation in the body of a rule.
Keywords: Logic programming, choice, unfounded sets, game theory
1 Choice Logic Programs for Modeling Decision Making
When modeling agents using logic programs, one often has to describe a situation where an agent needs to make a decision, based on some context. A decision can be thought of as a single choice between several competing alternatives, thus naturally leading to a notion of nondeterminism. Using seminegative (also called "normal") programs, such a choice can be modeled indirectly by using stable model semantics, as has been argued convincingly before [10,8]. E.g. a program such as
p ← ¬q
q ← ¬p
⋆ Wishes to thank the FWO for their support.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR '99, LNAI 1730, pp. 236–246, 1999. © Springer-Verlag Berlin Heidelberg 1999
has no (unique) total well-founded model but it has two total stable models, namely {p, ¬q} and {¬p, q}, representing a choice between p and q (note that this choice is, however, not exclusive, as e.g. p may very well lead to q in a larger program). In this paper, we simplify matters by providing for explicit choice sets in the head of a rule. Using p ⊕ q to denote an exclusive choice between p and q, the example above can be rewritten as
p ⊕ q ←
Intuitively, ⊕ is interpreted as "exclusive or", i.e. either p or q, but not both, should be accepted in the above program.
Definition 1. A choice logic program is a finite set of rules¹ of the form A ← B, where A, the head, and B, the body, are finite sets of atoms. Intuitively, atoms in A are assumed to be xor'ed together while B is read as a conjunction. In examples, we often use ⊕ to denote exclusive or, while "," is used to denote conjunction. If we want to single out an atom in the head of a rule, we sometimes write A ⊕ a to denote A ∪ {a}.
The semantics of choice logic programs can be defined very simply.
Definition 2. Let P be a choice logic program. The Herbrand base of P, denoted $B_P$, is the set of all atoms occurring in the rules of P. A set of atoms $I \subseteq B_P$ is a model of P if for every rule A ← B, $B \subseteq I$ implies that $I \cap A$ is a singleton, i.e. $|A \cap I| = 1$.² A model of P is called stable iff it is minimal (according to set inclusion).
Note that the above definitions allow for constraints to be expressed as rules where the head is empty: since no set can intersect an empty head in exactly one atom, a constraint ← B simply requires that $B \not\subseteq I$.
Example 1 (Graph 3-colorability). Given a graph, assign each node one of three colors such that no two adjacent nodes have the same color. This problem is known as graph 3-colorability and can be easily transformed into the following choice program:
col(X, r) ⊕ col(X, g) ⊕ col(X, b) ← node(X)
← edge(X, Y), col(X, C), col(Y, C)
The first rule states that every node should take one and only one of the three available colors (r, g or b).
The second demands that two adjacent nodes have different colors. To this program we only need to add the facts (rules with empty body) that encode the graph to make sure that the stable models of this program reflect the possible solutions for this graph's 3-colorability. The facts have the form node(a) ← or edge(a, b) ←.
1 In this paper, we identify a program with its grounded version, i.e. the set of all ground instances of its clauses. This keeps the program finite as we do not allow function symbols (i.e. we stick to datalog).
2 We use |X| to denote the cardinality of a set X.
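Definition 2 lends itself to a direct, if exponential, check. The Python sketch below encodes a ground choice rule A ← B as a pair of frozensets, an encoding chosen for this illustration. It enumerates the subsets of $B_P$ to find the stable (minimal) models. It is applied to p ⊕ q ← and to Example 1 grounded over a triangle. The atom spellings such as col(1,r) are hypothetical.

```python
from itertools import combinations

# A ground choice rule A <- B is a pair (head, body) of frozensets of atoms;
# an empty head encodes a constraint.
def is_model(interp, rules):
    """Definition 2: whenever the body holds, exactly one head atom is true."""
    return all(not body <= interp or len(head & interp) == 1 for head, body in rules)

def stable_models(rules):
    """Brute force: enumerate all subsets of B_P and keep the minimal models."""
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    models = [frozenset(s) for r in range(len(atoms) + 1)
              for s in combinations(atoms, r) if is_model(frozenset(s), rules)]
    return [m for m in models if not any(n < m for n in models)]

def three_coloring(nodes, edges, colors=("r", "g", "b")):
    """Ground the program of Example 1 together with the graph facts."""
    rules = [(frozenset({f"node({x})"}), frozenset()) for x in nodes]
    rules += [(frozenset({f"edge({x},{y})"}), frozenset()) for x, y in edges]
    # col(X,r) ⊕ col(X,g) ⊕ col(X,b) <- node(X)
    rules += [(frozenset(f"col({x},{c})" for c in colors), frozenset({f"node({x})"}))
              for x in nodes]
    # <- edge(X,Y), col(X,C), col(Y,C)
    rules += [(frozenset(), frozenset({f"edge({x},{y})", f"col({x},{c})", f"col({y},{c})"}))
              for x, y in edges for c in colors]
    return rules

print(stable_models([(frozenset({"p", "q"}), frozenset())]))
# [frozenset({'p'}), frozenset({'q'})]
coloring_models = stable_models(three_coloring([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))
print(len(coloring_models))  # 6 proper colorings of a triangle
```

Because the facts fix every node and edge atom and each node takes exactly one color, all models of the triangle program have the same size. Every model is therefore minimal, and the six stable models are exactly the six proper colorings.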
                   Does not confess   Confess
Does not confess        3, 3            0, 4
Confess                 4, 0            1, 1
Fig. 1. The prisoner’s dilemma (Ex. 2)
The following example shows how choice logic programs can be used to represent strategic games [6].
Example 2 (The Prisoner's Dilemma). Two suspects of a crime they jointly committed are arrested and interrogated separately. The maximum sentence for their crime is four years of prison. But if one betrays the other while the latter keeps quiet, the former is released while the silent one receives the maximum penalty. If they both confess, they are both sentenced to three years of prison. In case they both remain silent, they are convicted of a minor felony and sent to prison for only a year. In game theory this problem can be represented as a strategic game with a graphical notation as in Fig. 1. One player's actions are identified with the rows and the other player's with the columns. The two numbers in the box formed by row r and column c are the players' payoffs (here, the years gained with respect to the maximum sentence); when the row player chooses r and the column player chooses c, the first component represents the payoff of the row player. It is easy to see that the best action for both suspects is to confess because otherwise there is a possibility that they obtain the full four years. The resulting profile, in which both suspects confess, is called a Nash equilibrium. This game can be easily transformed into the following choice logic program, where di stands for "suspect i does not confess" and ci means "suspect i confesses":
d1 ⊕ c1 ←
d2 ⊕ c2 ←
c1 ← d2
c1 ← c2
c2 ← d1
c2 ← c1
The first two rules express that both suspects have to decide upon a single action. The last four indicate which action is the most appropriate given the other suspect's action. This program has a single stable model corresponding to the Nash equilibrium of the game, namely {c1, c2}. In [3], it was shown that every finite strategic game can be converted to a choice logic program whose stable models correspond to the game's Nash equilibria.
Definition 3 ([6]). A strategic game is a tuple $\langle N, (A_i)_{i\in N}, (\ge_i)_{i\in N}\rangle$ where
1. N is a finite set of players;
        Head    Tail
Head    1, 0    0, 1
Tail    0, 1    1, 0
Fig. 2. Matching Pennies (Ex. 3)
2. for each player i ∈ N, $A_i$ is a nonempty set of actions that are available to her (we assume that $A_i \cap A_j = \emptyset$ whenever $i \ne j$) and
3. for each player i ∈ N, $\ge_i$ is a preference relation on $A = \times_{j\in N} A_j$.
An element a ∈ A is called a profile. For a profile a we use $a_i$ to denote the component of a in $A_i$. For any player i ∈ N, we define $A_{-i} = \times_{j\in N\setminus\{i\}} A_j$. Similarly, an element of $A_{-i}$ will often be denoted as $a_{-i}$. For $a_{-i} \in A_{-i}$ and $a_i \in A_i$ we will abbreviate as $(a_{-i}, a_i)$ the profile $a' \in A$ such that $a'_i = a_i$ and $a'_j = a_j$ for all $j \ne i$. A Nash equilibrium of a strategic game $\langle N, (A_i)_{i\in N}, (\ge_i)_{i\in N}\rangle$ is a profile $a^*$ satisfying
$\forall a_i \in A_i \cdot (a^*_{-i}, a^*_i) \ge_i (a^*_{-i}, a_i)$
Intuitively, a profile $a^*$ is a Nash equilibrium if no player can unilaterally improve upon her choice. Put another way, given the other players' actions $a^*_{-i}$, $a^*_i$ is the best player i can do.³ Not every strategic game has a Nash equilibrium, as demonstrated by the next example.
Example 3 (Matching Pennies). Two persons are tossing a coin. Each of them has to choose between Head or Tail. If the choices differ, person 1 pays person 2 a Euro; if they are the same, person 2 pays person 1 a Euro. Each person cares only about the amount of money that she receives. The game modeling this situation is depicted in Fig. 2. This game does not have a Nash equilibrium. The corresponding choice logic program would look like:
h1 ⊕ t1 ←
h2 ⊕ t2 ←
h1 ← h2
t1 ← t2
h2 ← t1
t2 ← h1
This program has no stable model, just as the game has no Nash equilibrium. Notice that this would not have been the case had we used inclusive disjunctions instead of exclusive ones.
Theorem 1. For every strategic game $G = \langle N, (A_i)_{i\in N}, (\ge_i)_{i\in N}\rangle$ there exists a choice logic program $P_G$ such that the set of stable models of $P_G$ coincides with the set of Nash equilibria of G.
3 Note that the actions of the other players are not actually known to i.
As one can see from the examples, the choice logic program $P_G$ obtained for a game consists of rules expressing that each player has to make a single choice out of her action set, and rules expressing the best action for a player given the different actions of the other players.
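For two-player games given by a payoff table, the construction of $P_G$ can be sketched in Python. The sketch is an assumption, not the construction of [3]. For every action of the opponent it emits one rule whose head is the ⊕ of all best responses, which is a single atom in Examples 2 and 3. It then compares the brute-force stable models with the pure Nash equilibria of the games in Fig. 1 and Fig. 2.

```python
from itertools import combinations, product

def is_model(interp, rules):
    return all(not body <= interp or len(head & interp) == 1 for head, body in rules)

def stable_models(rules):
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    models = [frozenset(s) for r in range(len(atoms) + 1)
              for s in combinations(atoms, r) if is_model(frozenset(s), rules)]
    return [m for m in models if not any(n < m for n in models)]

def game_program(acts1, acts2, payoff):
    """payoff[x, y] = (u1, u2); action names double as atoms (assumed disjoint)."""
    rules = [(frozenset(acts1), frozenset()), (frozenset(acts2), frozenset())]
    for y in acts2:  # player 1's best responses to y
        best = max(payoff[x, y][0] for x in acts1)
        rules.append((frozenset(x for x in acts1 if payoff[x, y][0] == best), frozenset({y})))
    for x in acts1:  # player 2's best responses to x
        best = max(payoff[x, y][1] for y in acts2)
        rules.append((frozenset(y for y in acts2 if payoff[x, y][1] == best), frozenset({x})))
    return rules

def pure_nash(acts1, acts2, payoff):
    return {frozenset({x, y}) for x, y in product(acts1, acts2)
            if all(payoff[x, y][0] >= payoff[z, y][0] for z in acts1)
            and all(payoff[x, y][1] >= payoff[x, z][1] for z in acts2)}

PD = {("d1", "d2"): (3, 3), ("d1", "c2"): (0, 4), ("c1", "d2"): (4, 0), ("c1", "c2"): (1, 1)}
MP = {("h1", "h2"): (1, 0), ("h1", "t2"): (0, 1), ("t1", "h2"): (0, 1), ("t1", "t2"): (1, 0)}
for a1, a2, pay in [(("d1", "c1"), ("d2", "c2"), PD), (("h1", "t1"), ("h2", "t2"), MP)]:
    print(set(stable_models(game_program(a1, a2, pay))) == pure_nash(a1, a2, pay))  # True
```

For the Prisoner's Dilemma the generated best-response rules are c1 ← d2, c1 ← c2, c2 ← d1 and c2 ← c1, the same rules as in Example 2. For Matching Pennies both sides come out empty.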
2 Negation in Choice Logic Programs
While negation is not explicitly present in choice logic programs, it does appear implicitly. E.g. deciding on a in a rule a ⊕ b ← implicitly excludes b from any model, which can be read as "¬b is true". A similar effect can be observed for constraints: if e.g. a is true, then the presence of a rule ← a, b implies that b must be false. Still, there is a difference with seminegative programs because, although implicitly derived negative information may prevent the further application of certain rules, such information can never be used to enable the inference of further atoms. The latter is possible e.g. in seminegative logic programs or disjunctive logic programs where the body of a rule may contain negated atoms. Hence choice logic programs can be regarded as an interesting intermediate system between purely positive logic programs, where a model can be computed without taking into account any negative information⁴, and systems that allow for explicit negation in (the body of) a rule. In the remainder of this paper we will compare the role of negation in choice logic programs with both seminegative logic programs and seminegative disjunctive logic programs.
2.1 Simulating Seminegative Logic Programs
It turns out that choice logic programs can simulate semi-negative datalog programs, using the following transformation, which resembles the one used in [9] or [7] for the transformation of general disjunctive programs into negation-free disjunctive programs.
Definition 4. Let P be a semi-negative logic program. The corresponding choice logic program $P_\oplus$ can be obtained from P by replacing each rule r : a ← B, ¬C from P with $B \cup C \subseteq B_P$ and $C \ne \emptyset$, by
$a_r \oplus K_C \leftarrow B$   $(r'_1)$
$a \leftarrow a_r$   $(r'_2)$
$\forall c \in C \cdot K_C \leftarrow c$   $(r'_3)$
where $a_r$ and $K_C$ are new atoms that are uniquely associated with the rule r. A model M for $P_\oplus$ is called rational iff
$\forall K_C \in M \cdot M \cap C \ne \emptyset$
4
Of course, as a last step, the complement of the positive interpretation can be declared false as a consequence of the closed world assumption.
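Definition 4 and the rationality test can be tried out on the two-rule program p ← ¬q, q ← ¬p from Section 1. In this Python sketch a semi-negative rule is a triple (a, B, C), and rules with C = ∅ are kept as ordinary single-head rules. The fresh atoms $a_r$ and $K_C$ get hypothetical names of the form p_r0 and K_r0, which are assumed not to clash with atoms of P. The stable models of $P_\oplus$ are found by brute force. Per Proposition 1, the rational ones, projected onto $B_P$, should be the stable models of P.

```python
from itertools import combinations

def is_model(interp, rules):
    return all(not body <= interp or len(head & interp) == 1 for head, body in rules)

def stable_models(rules):
    atoms = sorted(set().union(*(head | body for head, body in rules)))
    models = [frozenset(s) for r in range(len(atoms) + 1)
              for s in combinations(atoms, r) if is_model(frozenset(s), rules)]
    return [m for m in models if not any(n < m for n in models)]

def to_choice(program):
    """Definition 4 for rules (a, B, C) meaning a <- B, not C.
    Returns the rules of P⊕ and a map K_C -> C used by the rationality test."""
    rules, k_of = [], {}
    for i, (a, pos, neg) in enumerate(program):
        if not neg:
            rules.append((frozenset({a}), frozenset(pos)))
            continue
        a_r, k_c = f"{a}_r{i}", f"K_r{i}"  # hypothetical names for the new atoms
        rules.append((frozenset({a_r, k_c}), frozenset(pos)))        # r'1
        rules.append((frozenset({a}), frozenset({a_r})))             # r'2
        rules += [(frozenset({k_c}), frozenset({c})) for c in neg]  # r'3
        k_of[k_c] = frozenset(neg)
    return rules, k_of

def stable_via_choice(program):
    base = set().union(*({a} | set(pos) | set(neg) for a, pos, neg in program))
    rules, k_of = to_choice(program)
    rational = [m for m in stable_models(rules)
                if all(m & c for k, c in k_of.items() if k in m)]
    return {m & base for m in rational}

print(sorted(sorted(m) for m in stable_via_choice([("p", [], ["q"]), ("q", [], ["p"])])))
# [['p'], ['q']]
```

Here $P_\oplus$ also has the stable model {K_r0, K_r1}. It is rejected because neither p nor q justifies the accepted K atoms, which is exactly the situation that the rationality restriction rules out.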
Intuitively, $K_C$ is an "epistemic" atom which stands for "the (non-exclusive) disjunction of atoms from C is believed". If the positive part of a rule in the original program P is true, $P_\oplus$ will choose (rules $r'_1$) between accepting the conclusion and $K_C$, where C is the negative part of the body; the latter prevents rule application. Each conclusion is tagged with the corresponding rule (rules $r'_2$), so that rules for the same conclusion can be processed independently. Finally, the truth of any member of C implies the truth of $K_C$ (rules $r'_3$). Intuitively, a rational model contains a justification for every accepted $K_C$.
Proposition 1. Let P be a semi-negative datalog program. M is a rational stable model of $P_\oplus$ iff $M \cap B_P$ is a (total) stable model of P.
The rationality restriction is necessary to prevent $K_C$ from being accepted without any of the elements of C being true. For positive-acyclic programs, we can get rid of this restriction.
Definition 5. A semi-negative logic program P is called positive-acyclic⁵ iff there is an assignment of positive integers to each element of $B_P$ such that the number of the head of any rule is greater than any of the numbers assigned to any non-negated atom appearing in the body.
Proposition 2. Let P be a semi-negative positive-acyclic logic program. There exists a choice logic program $P_c$ such that M is a stable model of $P_c$ iff $M \cap B_P$ is a stable model of P.
The reverse transformation is far less complicated.
Proposition 3. Let $P_\oplus$ be a choice program. There exists a semi-negative datalog program P (containing constraints) such that M is a stable model of $P_\oplus$ iff M is a stable model of P.
2.2 Unfounded Sets and Seminegative Disjunctive Programs
In this section, we formalize implicit negative information by defining an appropriate notion of "unfounded set" for choice logic programs, and we investigate its properties and usefulness for the computation of stable models. It turns out that many of the results of [5] remain valid or can even be strengthened:
1. For choice logic programs, the greatest unfounded set is defined on any interpretation, which is not the case for disjunctive programs.
2. Contrary to disjunctive programs, the results for choice programs remain valid in the presence of constraints.
3. For choice logic programs, the $R_{P,I}$ operator (see Definition 9), when repeatedly applied to $B_P$, always yields the greatest unfounded set w.r.t. I.
5
In [5] a similar notion is called “head-cycle free”.
4. Because of (1) above, the $W_P$ operator (see Definition 8) can be used in the computation of a stable model. For disjunctive programs, this is not possible because there is no guarantee that an intermediate interpretation has a greatest unfounded set.
Definition 6. Let P be a choice logic program. An interpretation is any consistent⁶ subset of $B_P \cup \neg B_P$. We use $\mathcal{I}_P$ to denote the set of all interpretations of P. An interpretation I is total iff⁷ $I^+ \cup I^- = B_P$. A total interpretation M is called a (stable) model iff $M^+$ is a (stable) model of P. A set $X \subseteq B_P$ is an unfounded set for P w.r.t. an interpretation I iff for each $p \in X$ one of the following three conditions holds:
1. $\exists r : A \oplus p \leftarrow B \in P$ such that $A \cap I \ne \emptyset$ and $B \subseteq I$, or
2. $\exists r : \leftarrow B, p \in P$ such that $B \subseteq I$, or
3. $\forall r : A \oplus p \leftarrow B \in P$ at least one of the following conditions is satisfied:
 a) $B \cap \neg I \ne \emptyset$, or
 b) $B \cap X \ne \emptyset$, or
 c) $A \cap B \ne \emptyset$.
The set of all unfounded sets for P w.r.t. I is denoted $U_P(I)$. The greatest unfounded set w.r.t. I, denoted $GUS_P(I)$, is defined by $GUS_P(I) = \bigcup_{X \in U_P(I)} X$. I is called unfounded-free iff $I \cap GUS_P(I) = \emptyset$.
Condition (1) above expresses the fact that choice is exclusive and thus alternatives to the actual choice are to be considered false. Condition (2) implies that any atom that would cause a constraint to be violated may be considered false. Condition (3) resembles the traditional definition of unfounded set by expressing when a rule cannot be used to infer a new atom: in case (a), the rule is "blocked" by the current interpretation; in case (b), the rule's application depends on an unfounded literal; while case (c) indicates that the rule is useless [2] since the body contains one of the choices in the head.
The next proposition shows that the name "greatest unfounded set" is well chosen for the union of all unfounded sets, $GUS_P(I)$.
Proposition 4. Let I be an interpretation for the choice logic program P. Then, $GUS_P(I) \in U_P(I)$.
Moreover, $GUS_P$ is a monotonic operator, i.e. if $I_1 \subseteq I_2$, then $GUS_P(I_1) \subseteq GUS_P(I_2)$. Note that the above proposition is false for disjunctive logic programs [5]. In fact, for such programs, $GUS_P(I) \in U_P(I)$ is only guaranteed if I is unfounded-free or d-unfounded-free [2].
Proposition 5. Let M be a model for the choice logic program P. Then $M^- \in U_P(M)$.
6 For X a set of literals, we use ¬X to denote {¬p | p ∈ X}, where ¬¬a = a for any atom a. X is consistent iff $X \cap \neg X = \emptyset$.
7 For a subset $X \subseteq B_P \cup \neg B_P$, we define $X^+ = X \cap B_P$ and $X^- = \neg(X \cap \neg B_P)$.
Unfortunately, the converse does not hold, as can be seen from the interpretation {a, b} of the single-rule program a ⊕ b ←, which is not a model, although its complement (the empty set) is trivially unfounded. For seminegative disjunctive logic programs, the converse does hold [5].
Proposition 6. Let P be a choice logic program. A total interpretation is a stable model iff it is unfounded-free.
Combining Propositions 5 and 6 yields a characterization of stable models in terms of unfounded sets which also holds for disjunctive programs.
Corollary 1. Let P be a choice logic program. An interpretation M is a stable model for P iff $GUS_P(M) = M^-$.
Definition 7. Let P be a choice logic program. The immediate consequence operator $T_P : 2^{B_P \cup \neg B_P} \to 2^{B_P}$ is defined by
$T_P(I) = \{a \in B_P \mid \exists A \oplus a \leftarrow B \in P \cdot A \subseteq \neg I \wedge B \subseteq I\}$
This operator adds those atoms that are definitely needed in any model extension of I. It is clearly monotonic. The $W_P$ operator, which uses the same intuition as the one defined in [4], uses $T_P$ to extend $I^+$ and $GUS_P$ to extend $I^-$.
Definition 8. Let P be a choice logic program. The operator $W_P : \mathcal{I}_P \to 2^{B_P \cup \neg B_P}$ is defined by
$W_P(I) = T_P(I) \cup \neg GUS_P(I)$
Note that $W_P$ is monotonic and skeptical, as it only adds literals that must be in any model extension of I. The following result also holds for disjunctive programs (without constraints).
Proposition 7. Let P be a choice logic program and let M be a total interpretation for it. M is a stable model iff M is a fixpoint of $W_P$.
The least fixpoint $W_P^\omega(\emptyset)$ of $W_P$ can, if it exists⁸, be regarded as the "kernel" of any stable model.
Proposition 8. Let P be a choice logic program. If $W_P^\omega(\emptyset)$ exists, then $W_P^\omega(\emptyset) \subseteq M$ for each stable model M. If $W_P^\omega(\emptyset)$ does not exist, then P has no stable models.
Because $W_P$ is deterministic, and contrary to the case of e.g. seminegative (disjunction-free) programs, $W_P^\omega(\emptyset)$ may not be a model, even if it is consistent.
Corollary 2. Let P be a choice logic program.
If $W_P^\omega(\emptyset)$ is a total interpretation, then it is the unique stable model of P.
8 The fixpoint may not exist because $W_P^n(I)$ may not be consistent, i.e. it may lie outside of the domain of $W_P$, for some n > 0.
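For small programs, Definition 6 can be checked literally. The Python sketch below splits an interpretation I into pos = $I^+$ and neg = $I^-$. It uses the toy program a ⊕ b ←, c ← a, d ← d, chosen for illustration, and obtains $GUS_P(I)$ as the union of all unfounded subsets of $B_P$. The usage lines illustrate Proposition 4, and the checks that follow illustrate Corollary 1. The encoding and helper names are assumptions of this sketch.

```python
from itertools import combinations

# Ground choice rules (head, body) of a toy program chosen for illustration.
RULES = [
    (frozenset({"a", "b"}), frozenset()),   # a ⊕ b <-
    (frozenset({"c"}), frozenset({"a"})),   # c <- a
    (frozenset({"d"}), frozenset({"d"})),   # d <- d
]

def is_unfounded(x, rules, pos, neg):
    """Definition 6: every p in x satisfies condition 1, 2 or 3."""
    for p in x:
        cond1 = any(p in h and (h - {p}) & pos and b <= pos for h, b in rules)
        cond2 = any(not h and p in b and (b - {p}) <= pos for h, b in rules)
        cond3 = all(b & neg or b & x or (h - {p}) & b for h, b in rules if p in h)
        if not (cond1 or cond2 or cond3):
            return False
    return True

def greatest_unfounded_set(rules, pos, neg):
    """Union of all unfounded sets, found by enumerating every subset of B_P."""
    atoms = sorted(set().union(*(h | b for h, b in rules)))
    result = set()
    for r in range(len(atoms) + 1):
        for xs in combinations(atoms, r):
            if is_unfounded(set(xs), rules, pos, neg):
                result |= set(xs)
    return result

gus = greatest_unfounded_set(RULES, {"a"}, set())
print(sorted(gus), is_unfounded(gus, RULES, {"a"}, set()))  # ['b', 'd'] True
```

Atom b is unfounded by condition 1, because a was chosen instead. Atom d is unfounded by condition 3(b), because its only rule depends on d itself. For the total interpretation with $I^+ = \{a, c\}$, the result equals $I^- = \{b, d\}$, as Corollary 1 requires of a stable model.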
The following monotonically decreasing operator can be used to check the unfounded-free property of total interpretations.
Definition 9. Let P be a choice logic program and let I be an interpretation for it. The operator $R_{P,I} : 2^{B_P} \to 2^{B_P}$ is defined by
$R_{P,I}(X) = \{a \in X \mid (\exists r : A \oplus a \leftarrow B \in P \cdot A \cap I \ne \emptyset \wedge B \subseteq I)$ or $(\exists\, \leftarrow B, a \in P \cdot B \subseteq I)$ or $(\forall r : A \oplus a \leftarrow B \in P \cdot B \cap (\neg I \cup X) \ne \emptyset \vee (A \cup \{a\}) \cap B \ne \emptyset)\}$
Intuitively, $R_{P,I}(J)$ gathers all atoms that are contained in both J and some unfounded set of I.
Proposition 9. Let I be a total interpretation for a choice logic program P. Then $R^\omega_{P,I}(I^+) = \emptyset$ iff I is unfounded-free.
Moreover, $R_{P,I}$ can be used to compute the greatest unfounded set $GUS_P(I)$.
Proposition 10. Let P be a choice logic program and let I be an interpretation for it. Then $R^\omega_{P,I}(B_P) = GUS_P(I)$.
The above result does not hold for disjunctive logic programs.
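Proposition 10 replaces the exponential enumeration of unfounded sets by iterating $R_{P,I}$ downward from $B_P$. The Python sketch below implements Definition 9 for a toy program (a ⊕ b ←, c ← a, d ← d, chosen for illustration). It also illustrates Proposition 9 on two total interpretations. The helper names `r_step` and `r_omega` are hypothetical.

```python
# Ground choice rules (head, body) of a toy program chosen for illustration.
RULES = [
    (frozenset({"a", "b"}), frozenset()),   # a ⊕ b <-
    (frozenset({"c"}), frozenset({"a"})),   # c <- a
    (frozenset({"d"}), frozenset({"d"})),   # d <- d
]
ATOMS = {"a", "b", "c", "d"}

def r_step(x, rules, pos, neg):
    """One application of R_{P,I} (Definition 9); pos = I+, neg = I-."""
    def keep(a):
        return (any(a in h and (h - {a}) & pos and b <= pos for h, b in rules)
                or any(not h and a in b and (b - {a}) <= pos for h, b in rules)
                or all(b & (neg | x) or h & b for h, b in rules if a in h))
    return {a for a in x if keep(a)}

def r_omega(x, rules, pos, neg):
    """Iterate the decreasing operator R_{P,I} until it stabilises."""
    while True:
        nxt = r_step(x, rules, pos, neg)
        if nxt == x:
            return x
        x = nxt

# Proposition 10: R^ω_{P,I}(B_P) = GUS_P(I) for I+ = {a}, I- = {}.
print(sorted(r_omega(ATOMS, RULES, {"a"}, set())))               # ['b', 'd']
# Proposition 9 on two total interpretations.
print(r_omega({"a", "c"}, RULES, {"a", "c"}, {"b", "d"}))       # set(): unfounded-free
print(r_omega({"a", "c", "d"}, RULES, {"a", "c", "d"}, {"b"}))  # {'d'}
```

In the first call, c survives the first step only because a is still in X. It is dropped in the second step, which shows why the operator must be iterated to a fixpoint. The model {a, c, d} is not unfounded-free because d supports only itself.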
3 Computing Stable Models
With the help of the above results, an intuitive and relatively efficient "backtracking fixpoint" algorithm can be designed to compute the stable models of a choice logic program. Essentially, the algorithm of Fig. 3 keeps a "current interpretation" (initialized to the empty set) and a stack of choice points (initially empty). It consists of a loop with two stages:
1. In the first stage, $W_P$ is applied to the current interpretation until a fixpoint is reached or an inconsistency is detected. In the latter case, the algorithm backtracks to the previous choice point (if any) and tries a different choice.
2. In the second stage, a choice is made among the applicable rules (those whose body is true in the current interpretation) that are not yet applied. If there are no such rules, the current interpretation is a stable model. For the selected rule, a literal from the head is chosen and added to the current interpretation, thus making the rule applied (the choice must be such that the new interpretation remains consistent). The other head literals are immediately assumed false. Such a combination of literals is called a "possibly-true conjunction" [5]. We use $PT_P(I)$ to denote the set of such choices that are available, given the interpretation I.
Given the results of the previous section, it is clear that this algorithm will find all stable models of a given choice logic program. It generalizes a corresponding algorithm in [5] because it also handles constraints. In addition, it can afford to be more skeptical than the algorithm in [5] (checking consistency at each step in stage 1) because of Proposition 4.
On the Role of Negation in Choice Logic Programs
Input: A choice logic program P.
Output: The stable models of P.

procedure Compute-Stable(I_n : SetOfLiterals);
var X, I'_n, I'_{n+1} : SetOfLiterals;
begin
  if PT_P(I_n) = ∅ (* no choices available *)
  then output "I_n is a stable model of P";
  else
    for each X ∈ PT_P(I_n) do
      I'_{n+1} := I_n ∪ X; (* assume the truth of a possibly-true conjunction *)
      repeat
        I'_n := I'_{n+1};
        I'_{n+1} := T_P(I'_n) ∪ ¬R^ω_{P,I'_n}(B_P); (* = W_P(I'_n) *)
      until I'_{n+1} = I'_n or I'_{n+1} ∩ ¬I'_{n+1} ≠ ∅;
      if I'_{n+1} ∩ ¬I'_{n+1} = ∅ (* I'_{n+1} is consistent *)
      then Compute-Stable(I'_{n+1})
      end-if
    end-for
  end-if
end-procedure

var I, J : SetOfLiterals; G : SetOfAtoms;
begin (* Main *)
  I := ∅;
  repeat (* computation of W_P^ω(∅), if it exists *)
    J := I;
    G := GUS_P(J); (* by means of R^ω_{P,J}(B_P) *)
    if G ∩ J ≠ ∅ (* J not unfounded-free *)
      exit
    end-if;
    I := T_P(J) ∪ ¬G; (* = W_P(J) *)
  until I = J;
  if PT_P(I) = ∅
  then output "I is the unique stable model of P";
  else Compute-Stable(I)
  end-if
end.
Fig. 3. Algorithm for the Computation of Stable Models for choice logic programs.
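To make the control flow of Fig. 3 concrete, here is a minimal Python sketch of the backtracking fixpoint search. It assumes the same hypothetical encoding as in the earlier illustrations (rules as `(head, body)` frozenset pairs, constraints with an empty head, interpretations as `(pos, neg)` pairs) and reads "applied" as "exactly one head atom is true", which we take to be the intended reading. Two simplifications are ours: models are collected in a set, because different choice orders can reach the same stable model, and the initial $W_P^\omega(\emptyset)$ phase reuses the inconsistency test of the inner loop instead of the explicit unfounded-freeness exit. Since the $W_P$ iterates from ∅ only grow, an unfounded true atom in J also makes $W_P(J)$ inconsistent, so we expect the same programs to be rejected. All function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: a rule A (+) a <- B is (head, body) with head = A u {a};
# a constraint "<- B" has an empty head. An interpretation is a (pos, neg) pair.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]
Interp = Tuple[Set[str], Set[str]]


def t_p(rules: List[Rule], pos: Set[str], neg: Set[str]) -> Set[str]:
    return {a for h, b in rules for a in h if (h - {a}) <= neg and b <= pos}


def gus(rules: List[Rule], atoms: Set[str], pos: Set[str], neg: Set[str]) -> Set[str]:
    """GUS_P(I), computed as R^omega_{P,I}(B_P)."""
    def unfounded(a: str, x: Set[str]) -> bool:
        own = [(h, b) for h, b in rules if a in h]
        return (any((h - {a}) & pos and b <= pos for h, b in own)
                or any(not h and a in b and (b - {a}) <= pos for h, b in rules)
                or all(b & (neg | x) or h & b for h, b in own))

    x = set(atoms)
    while True:
        nxt = {a for a in x if unfounded(a, x)}
        if nxt == x:
            return x
        x = nxt


def choices(rules: List[Rule], pos: Set[str], neg: Set[str]) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """PT_P(I): for each applicable rule that is not applied (not exactly one head atom
    true), make one head atom true and the others false, if that stays consistent."""
    out = set()
    for h, b in rules:
        if h and b <= pos and len(h & pos) != 1:
            for a in h:
                rest = h - {a}
                if a not in neg and not (rest & pos):
                    out.add((frozenset({a}), frozenset(rest)))
    return out


def stable_models(rules: List[Rule], atoms: Set[str]) -> Set[FrozenSet[str]]:
    """Backtracking fixpoint search after Fig. 3; each model is given by its true atoms."""
    models: Set[FrozenSet[str]] = set()

    def fixpoint(pos: Set[str], neg: Set[str]) -> Optional[Interp]:
        # Stage 1: apply W_P until a fixpoint is reached or an inconsistency appears.
        while True:
            new_pos, new_neg = t_p(rules, pos, neg), gus(rules, atoms, pos, neg)
            if new_pos & new_neg:
                return None  # inconsistent: backtrack
            if new_pos == pos and new_neg == neg:
                return pos, neg
            pos, neg = new_pos, new_neg

    def search(pos: Set[str], neg: Set[str]) -> None:
        # Stage 2: branch on every available possibly-true conjunction.
        pt = choices(rules, pos, neg)
        if not pt:
            models.add(frozenset(pos))  # no choices left: a stable model
            return
        for x_pos, x_neg in pt:
            nxt = fixpoint(pos | x_pos, neg | x_neg)
            if nxt is not None:
                search(*nxt)

    start = fixpoint(set(), set())  # W_P^omega(empty), if it exists
    if start is not None:
        search(*start)
    return models


prog = [(frozenset({"a", "b"}), frozenset()),  # a (+) b <-
        (frozenset({"c"}), frozenset({"a"})),   # c <- a
        (frozenset(), frozenset({"b"}))]        # <- b
print(stable_models(prog, {"a", "b", "c"}))
```

On this program the constraint ← b already makes b unfounded during the computation of $W_P^\omega(\emptyset)$, so the search returns the single stable model {a, c} without branching.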
4 Conclusions and Directions for Further Research
We introduced choice logic programs as a convenient and simple formalism for modeling decision making. Such programs can e.g. be used to model strategic games. We investigated the implicit support for negation that is present in such programs, due to the exclusive nature of the choices and the support for constraints. It turns out that choice programs can reasonably simulate seminegative
logic programs. On the other hand, many results that are known for (seminegative) disjunctive programs (without constraints) can be carried over (or even strengthened) to choice programs (with constraints), resulting in a simple algorithm to compute the stable models of a choice program. It is worth noting that, although [1] introduces constraints for disjunctive logic programs, these are checked only after the usual algorithm (for programs without constraints) finishes, whereas our algorithm uses constraints directly, which should result in more eager pruning of candidate interpretations. Future research will attempt to extend the notion of choice programs to allow for the expression of epistemic restrictions. At present, all the knowledge of decision-making agents is stored in a single program which is visible to each agent (this fact lies at the basis of Theorem 1), an assumption that is often not realistic.
References
1. Francesco Buccafurri, Nicola Leone and Pasquale Rullo. Strong and Weak Constraints in Disjunctive Datalog. In Jürgen Dix, Ulrich Furbach and Anil Nerode, editors, 4th International Conference on Logic Programming and Non-Monotonic Reasoning (LPNMR'97), volume 1265 of Lecture Notes in Computer Science, pages 2–17. Springer, 1997.
2. Marina De Vos and Dirk Vermeir. Forcing in Disjunctive Logic Programs. In Kamal Karlapalem, Amin Y. Noaman and Ken Barker, editors, Proceedings of the Ninth International Conference on Information and Computation, pages 167–174, Winnipeg, Manitoba, Canada, June 1998.
3. Marina De Vos and Dirk Vermeir. Choice Logic Programs and Nash Equilibria in Strategic Games. Accepted at the Annual Conference of the European Association for Computer Science Logic (CSL99), September 20–25, 1999, Madrid, Spain. Published in Lecture Notes in Computer Science, Springer.
4. Allen Van Gelder, Kenneth A. Ross and John S. Schlipf. The Well-Founded Semantics for General Logic Programs. Journal of the Association for Computing Machinery, 38(3) (1991) 620–650.
5. Nicola Leone, Pasquale Rullo and Francesco Scarcello. Disjunctive Stable Models: Unfounded Sets, Fixpoint Semantics, and Computation. Journal of Information and Computation, 135(2) (1997) 69–112.
6. M. J. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, 1994.
7. Carolina Ruiz and Jack Minker. Computing Stable and Partial Stable Models of Extended Disjunctive Logic Programs. Lecture Notes in Computer Science, 927 (1995). Springer.
8. D. Sacca. Deterministic and Non-Deterministic Stable Models. Logic and Computation, 5 (1997) 555–579.
9. Chiaki Sakama and Katsumi Inoue. An Alternative Approach to the Semantics of Disjunctive Logic Programs and Deductive Databases. Journal of Automated Reasoning, 13 (1994) 145–172.
10. D. Sacca and C. Zaniolo. Stable Models and Non-Determinism for Logic Programs with Negation. In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 205–218. Association for Computing Machinery, 1990.
Default Reasoning via Blocking Sets
Thomas Linke and Torsten Schaub
Institut für Informatik, Universität Potsdam, Postfach 60 15 53, D-14415 Potsdam
Abstract. We present a new approach to reasoning with default logic that aims at Reiter’s original approach, whenever there is no source for incoherence. We accomplish this by shifting the emphasis from the application of individual default rules to that of the joint application of a default rule together with rules supporting this application. This allows for reasoning in an incremental yet compositional fashion, without giving up the expressiveness needed for knowledge representation. Technically, our approach differs from others in that it guarantees the existence of extensions without requiring semi-monotonicity.
1 Introduction
Default logic [20] is one of the best known and most widely studied formalizations of default reasoning due to its very expressive and lucid language. In default logic, knowledge is represented as a default theory, which consists of a set of formulas and a set of default rules for representing default information. Possible sets of conclusions from a default theory are given in terms of extensions of that theory. A default theory can possess no, one, or multiple extensions because different ways of resolving conflicts among default rules lead to different alternative extensions. Such extensions are formed in a context-sensitive (yet self-referential) way by requiring that all drawn inferences are already consistent with the final extension. Interestingly, Reiter already anticipated in [20, p. 83] that "providing an appropriate formal definition of this consistency requirement is perhaps the thorniest issue in defining a logic for default reasoning". To some extent, this was foreseeable, since the original approach relied on complex fixed-point constructions that precluded any incremental constructibility and that sometimes had no solutions (i.e. extensions) at all. As a consequence, several variants of default logic were proposed, addressing either purportedly counterintuitive or technical problems of the original approach, beginning with Lukaszewicz' variant [14], over those of Brewka [2], Delgrande et al. [5], Mikitiuk and Truszczyński [17], Przymusinska and Przymusinski [19] and Giordano and Martinelli [9], up to the proposal by Brewka and Gottlob [3]. Many of these variants put forward the formal property of semi-monotonicity because it guarantees the existence of extensions and allows for incremental constructibility, which is advantageous from a computational point of view. On the other hand, Brewka has shown in [2] that semi-monotonicity diminishes the expressive power of default logic.
Intuitively, this is because semi-monotonicity limits the contextual scope of inferences (as made precise in Section 2). Consequently, we have so far been faced with the dilemma of choosing between full expressive power and full incremental constructibility. We address this shortcoming by proposing a compromise approach that allows for (compositional) incremental constructions and that guarantees the existence of extensions without requiring semi-monotonicity. This gives us a useful trade-off between the feasibility of inference and the expressiveness of representation. (The term feasibility should not be conflated with that of complexity; it rather refers to the degree of incrementality.) As a result, we obtain an approach to default logic that aims at deviating from the original approach on incoherent theories only. This is made more precise in Section 4. The intuitive idea is to replace the usual fixed-point constructions by conflict-driven constructions that are delineated by pre-compiled interaction patterns between default rules. For this purpose, we draw on the notions of blocking sets (and block graphs) introduced in [12]. There, these concepts were used for characterizing default theories guaranteeing the existence of extensions and for supporting query-answering. (An explicit reference is made for each contribution due to [12].) Here, however, our interest lies in rather different topics, namely the development of a new conception of extensions of default theories and the elaboration of new structural relationships among related approaches.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 247–261, 1999. © Springer-Verlag Berlin Heidelberg 1999
2 Background
A default rule is an expression of the form $\frac{\alpha : \beta}{\gamma}$¹, where α, β and γ are propositional formulas. We sometimes denote the prerequisite α of a default rule δ by p(δ), its justification β by j(δ) and its consequent γ by c(δ).² A rule is called normal if β is equivalent to γ; it is called semi-normal if β implies γ. A set of default rules D and a set of formulas W form a default theory³ ∆ = (D, W), which may induce one, multiple or no extensions:
Definition 1. [20] Let ∆ = (D, W) be a default theory. For any set of formulas S, let $\Gamma_\Delta(S)$ be the smallest set of formulas S′ such that
DL1 W ⊆ S′,
DL2 Th(S′) = S′,
DL3 for any $\frac{\alpha : \beta}{\gamma} \in D$, if α ∈ S′ and ¬β ∉ S then γ ∈ S′.
A set of formulas E is an R-extension of ∆ iff $\Gamma_\Delta(E) = E$.
Observe that E is a fixed point of $\Gamma_\Delta$. Any such extension represents a possible set of beliefs about the world.
¹ Reiter [20] considers default rules having finite sets of justifications. [16] show that any such default rule can be transformed into a set of default rules having a single justification.
² This generalizes to sets of default rules in the obvious way.
³ If clear from the context, we sometimes refer to (D, W) as ∆ and vice versa.
For simplicity, we assume for the rest of the paper that default theories (D, W) comprise finite sets only. Additionally, we assume that for each default rule δ in D, W ∪ {j(δ)} is consistent. This can be done without loss of generality, because we can clearly eliminate all rules δ from D for which W ∪ {j(δ)} is inconsistent, without altering the set of extensions. Consider the standard example where birds fly, birds have wings, penguins are birds, and penguins don't fly, along with a formalization through default theory (D1, W1) where
$$D_1 = \left\{ \frac{b : \neg ab_b}{f}, \frac{b : w}{w}, \frac{p : b}{b}, \frac{p : \neg ab_p}{\neg f} \right\} \quad (1)$$
and W1 = {¬f → $ab_b$, f → $ab_p$, p}. We let δf, δw, δb, δ¬f abbreviate the previous default rules by appeal to their consequents. Our example yields two extensions, viz. E1 = Th(W1 ∪ {b, w, ¬f}) and E2 = Th(W1 ∪ {b, w, f}), while theory $(D_1 \cup \{\frac{:x}{\neg x}\}, W_1)$ has no extension.
We call a default theory coherent if it has some extension. A default theory (D, W) is semi-monotonic if for any D′ ⊆ D, we have that if E′ is an extension of (D′, W), then there is an extension E of (D, W) where E′ ⊆ E. Note that semi-monotonicity implies coherence but not vice versa. A default logic is said to enjoy coherence or semi-monotonicity if its interpretation of default theories guarantees the respective property for all default theories. It is well-known that semi-monotonicity does not hold for Reiter's default logic. Now we can make precise the dilemma between incremental constructibility and full expressiveness, depending on whether semi-monotonicity holds or not. On the one hand, it should be clear that semi-monotonicity allows for incremental constructions because we can gradually extend a set of default rules without running the risk of invalidating former conclusions. Although this does not affect worst-case complexity (cf. [10]), it makes inferencing more feasible since it allows one to validate the application of a default rule with respect to previously applied rules only (while ignoring all other rules). On the other hand, semi-monotonicity reduces expressiveness. To explain this, let us add $\frac{p : ab_b}{ab_b}$ to (1) in order to eliminate extension E2. While this works in Reiter's default logic, it fails for semi-monotonic default logics. To see this, simply take E′ and (D′, W) above as E2 and (D1, W1). Now semi-monotonicity ensures that either E2 or one of its supersets is an extension of theory $(D_1 \cup \{\frac{p : ab_b}{ab_b}\}, W_1)$. In fact, E2 is an extension of this theory in the semi-monotonic variant of Lukaszewicz (see below). This shows that semi-monotonicity disables the possibility of blocking rules like $\frac{b : \neg ab_b}{f}$ through default conclusions such as $ab_b$. Note the difference to the addition of p → $ab_b$ to W1, which eliminates extension E2 no matter whether we deal with a semi-monotonic system or not. Further, define for a set of formulas S and a set of defaults D the set of generating default rules as GDR(D, S) = {δ ∈ D | S ⊢ p(δ) and S ⊬ ¬j(δ)}. We call a set of default rules D grounded in a set of formulas S iff there exists an enumeration $\langle \delta_i \rangle_{i \in I}$ of D such that for all i ∈ I we have S ∪ c({δ0, ..., δi−1}) ⊢ p(δi). As proposed by [11,8], we call a set of default rules D weakly regular wrt a set of formulas S iff for each δ ∈ D we have S ∪ c(D) ⊬ ¬j(δ). A set of rules D
is called strongly regular wrt S iff S ∪ c(D) ∪ j(D) ⊬ ⊥. A default logic is said to enjoy one of these properties according to its treatment of default rules that generate extensions. While all variants mentioned in the introductory section lead to grounded sets of generating default rules, Reiter's and Lukaszewicz' variants enjoy weak regularity, while those in [2,5,17] are strongly regular. Lukaszewicz gives in [14] the following alternative definition of extensions:
Definition 2. Let (D, W) be a default theory. For any pair of sets of formulas (S, T), let Ψ(S, T) be the pair of smallest sets of formulas (S′, T′) such that
LDL1 W ⊆ S′,
LDL2 S′ = Th(S′),
LDL3 for any $\frac{\alpha : \beta}{\gamma} \in D$, if α ∈ S′ and ¬η ∉ Th(S ∪ {γ}) for all η ∈ T ∪ {β}, then γ ∈ S′ and β ∈ T′.
A set of formulas E is an L-extension of (D, W) wrt a set of formulas J iff Ψ(E, J) = (E, J).
Interestingly, given a theory (D, W), maximal sets D′ ⊆ D of grounded and weakly regular default rules induce L-extensions, as shown in [21]. That is, for each such D′, Th(W ∪ c(D′)) forms an L-extension wrt J = j(D′).⁴ We refer to the set of default rules (here D′) generating an L-extension E wrt J (here j(D′)) as GDL(D, E, J). For capturing the interaction between default rules under weak regularity, [12] introduced the concept of blocking sets:
Definition 3. [12] Let ∆ = (D, W) be a default theory. For δ ∈ D and B ⊆ D, we define
1. B as a potential blocking set of δ, written $B \mapsto_\Delta \delta$, iff (a) W ∪ c(B) ⊢ ¬j(δ) and (b) B is grounded in W;
2. B as an essential blocking set of δ, written $B \overset{\bullet}{\mapsto}_\Delta \delta$, iff (a) $B \mapsto_\Delta \delta$ and (b) $(B \setminus \{\delta'\}) \mapsto_\Delta \delta''$ for no δ′ ∈ B and no δ″ ∈ B ∪ {δ}.
Observe that the justifications of the default rules are ignored when constructing blocking sets. Hence defaults are treated as monotonic inference rules.⁵ Let $B_\Delta(\delta) = \{B \mid B \overset{\bullet}{\mapsto}_\Delta \delta\}$ be the set of all essential blocking sets of δ. These blocking sets provide candidate sets for denying the application of δ.
The second condition on essential blocking sets, namely (2b), ensures that $B_\Delta(\delta)$ contains only ultimately necessary blocking sets: First, members of $B_\Delta(\delta)$ are (set-inclusion) minimal among the blocking sets of δ. Second, no blocking set in $B_\Delta(\delta)$ contains any blocking sets for its constituent rules. We give the sets of blocking sets obtained in our example at the end of this section.
⁴ J is used to distinguish identical L-extensions generated by different sets of defaults D′.
⁵ Monotonic inference rules are also considered in [15].
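On small theories, Definition 3 can be checked by brute force. The Python sketch below is such an illustration, not the algorithm of [12]: formulas are encoded (our assumption) as Python predicates over a truth assignment (a dict from atom names to booleans), a default α:β/γ as the triple (pre, just, cons), entailment is decided by truth tables, and essential blocking sets are found by enumerating all subsets of D. All names (`entails`, `grounded`, `blocking_sets`, `d_f`, `d_notf`, ...) are hypothetical. Applied to theory (1), it reproduces the blocking sets listed at the end of this section.

```python
from itertools import combinations, product


def entails(atoms, premises, phi):
    """premises |- phi by truth tables; formulas are predicates over assignments."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(q(v) for q in premises) and not phi(v):
            return False
    return True


def grounded(atoms, W, D, B):
    """B can be ordered so each prerequisite follows from W and earlier consequents."""
    done, todo = [], set(B)
    while todo:
        ready = [n for n in todo if entails(atoms, W + [D[m][2] for m in done], D[n][0])]
        if not ready:
            return False
        done += ready
        todo -= set(ready)
    return True


def subsets(names):
    names = sorted(names)
    return [frozenset(c) for k in range(len(names) + 1) for c in combinations(names, k)]


def blocking_sets(atoms, W, D):
    """Essential blocking sets B_Delta(d) for every default d (Definition 3)."""
    def blocks(B, d):
        """Potential blocking set: W u c(B) |- not j(d) and B is grounded in W."""
        def not_j(v):
            return not D[d][1](v)
        return entails(atoms, W + [D[n][2] for n in B], not_j) and grounded(atoms, W, D, B)

    return {d: {B for B in subsets(D) if blocks(B, d)
                and not any(blocks(B - {x}, y) for x in B for y in B | {d})}
            for d in D}


# Theory (1): b = bird, w = wings, f = flies, p = penguin.
ATOMS = ["b", "w", "f", "ab_b", "ab_p", "p"]
W1 = [lambda v: v["f"] or v["ab_b"],      # not f -> ab_b
      lambda v: not v["f"] or v["ab_p"],  # f -> ab_p
      lambda v: v["p"]]                   # p
D1 = {  # name: (prerequisite, justification, consequent)
    "d_f": (lambda v: v["b"], lambda v: not v["ab_b"], lambda v: v["f"]),
    "d_w": (lambda v: v["b"], lambda v: v["w"], lambda v: v["w"]),
    "d_b": (lambda v: v["p"], lambda v: v["b"], lambda v: v["b"]),
    "d_notf": (lambda v: v["p"], lambda v: not v["ab_p"], lambda v: not v["f"]),
}
for name, sets in sorted(blocking_sets(ATOMS, W1, D1).items()):
    print(name, [sorted(B) for B in sets])
# d_b []
# d_f [['d_notf']]
# d_notf [['d_b', 'd_f']]
# d_w []
```

The enumeration is exponential in the number of default rules, which is one motivation for the more compact block graph abstraction.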
In what follows, we let the term blocking set refer to essential blocking sets. This is justified by our first result, showing that essential blocking sets are indeed sufficient for characterizing the notion of consistency used in Reiter's default logic:
Theorem 1. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be grounded in W. Then D′ is weakly regular wrt W iff for each δ′ ∈ D′ and each B ⊆ D′ we have $B \notin B_\Delta(\delta')$.
The problem with blocking sets is that there may be exponentially many of them in the worst case. This is why [12] put forward the notion of a block graph as a compact abstraction of actual blocking sets:
Definition 4. [12] Let ∆ = (D, W) be a default theory. The block graph $G(\Delta) = (V_\Delta, A_\Delta)$ of ∆ is a directed graph with vertices $V_\Delta = D$ and arcs $A_\Delta = \{(\delta', \delta) \mid \delta' \in B \text{ for some } B \in B_\Delta(\delta)\}$.
(Recall that a directed graph G is a pair G = (V, A) such that V is a finite, non-empty set of vertices and A ⊆ V × V is a set of arcs.) We observe that the space complexity of block graphs is quadratic in the number of default rules; its construction⁶ faces the same time complexity as the extension-membership problem. Note that the effort put into constructing a block graph is, however, meant to amortize over subsequent tasks; notably, its construction (and reduction, see below) are both incremental. A default theory is said to be non-conflicting, well-ordered or even, depending on whether its block graph has no arcs, no cycles or only even cycles, respectively. [12] show that these three classes guarantee the existence of R-extensions. A default theory is said to be odd if its block graph has some odd cycle. For a default theory ∆ = (D, W) and sets B, B′ ⊆ D, we abuse our notation and write $B' \overset{\bullet}{\mapsto}_\Delta B$ if there is some δ ∈ B such that $B' \in B_\Delta(\delta)$. With this, we define the concept of supporting sets:
Definition 5. [12] Let ∆ = (D, W) be a default theory. We define the set of all supporting sets for δ ∈ D as
$$S_\Delta(\delta) = \{ B'_1 \cup \dots \cup B'_n \mid B'_i \subseteq D \text{ s.t. } B'_i \overset{\bullet}{\mapsto}_\Delta B_i \text{ and } B_\Delta(\delta) = \{B_1, \dots, B_n\} \}$$
provided $B_\Delta(\delta) \neq \emptyset$. Otherwise, we define $S_\Delta(\delta) = \{\emptyset\}$.
Supporting sets are meant to cover the safe application of default rules in focus. We draw on them in the next section as a means for ruling out blocking sets as subsets of the generating default rules, because once a supporting set for some rule has been applied, the rule itself can be applied safely. Default theory (1) yields the following blocking and supporting sets:
$B_\Delta(\delta_f) = \{\{\delta_{\neg f}\}\}$, $B_\Delta(\delta_w) = \emptyset$, $B_\Delta(\delta_b) = \emptyset$, $B_\Delta(\delta_{\neg f}) = \{\{\delta_b, \delta_f\}\}$,
$S_\Delta(\delta_f) = \{\{\delta_b, \delta_f\}\}$, $S_\Delta(\delta_w) = \{\emptyset\}$, $S_\Delta(\delta_b) = \{\emptyset\}$, $S_\Delta(\delta_{\neg f}) = \{\{\delta_{\neg f}\}\}$.
⁶ That is, a corresponding decision problem.
We get a block graph with vertex set D1 (indicated by white nodes) and (solid) arcs (δ¬f, δf), (δf, δ¬f) and (δb, δ¬f):
[Figure: block graph over the nodes δf, δw, δb, δ¬f, δab_b and δ¬x]
The addition of $\delta_{ab_b} = \frac{p : ab_b}{ab_b}$ to (1) augments $B_\Delta(\delta_f)$ as well as $S_\Delta(\delta_{\neg f})$ by $\{\delta_{ab_b}\}$, whereas it reduces $S_\Delta(\delta_f)$ to ∅, indicating that δf has no supporting sets anymore. We additionally get $B_\Delta(\delta_{ab_b}) = \emptyset$ and $S_\Delta(\delta_{ab_b}) = \{\emptyset\}$, reflecting the fact that $\delta_{ab_b}$ is unblockable, that is, applicable without consistency check. Note the crucial difference between an empty set of supporting sets, $S_\Delta(\delta) = \emptyset$, and one containing the empty set, $S_\Delta(\delta) = \{\emptyset\}$. The addition to the block graph is indicated by the (light-gray) node $\delta_{ab_b}$ and the (dashed) arc $(\delta_{ab_b}, \delta_f)$. The further addition of $\delta_{\neg x} = \frac{: x}{\neg x}$ to (1) leaves the above blocking sets unaffected and additionally yields $B_\Delta(\delta_{\neg x}) = \{\{\delta_{\neg x}\}\}$ and $S_\Delta(\delta_{\neg x}) = \{\{\delta_{\neg x}\}\}$, reflecting self-blockage. This leads to an additional (light-gray) node $\delta_{\neg x}$ and a (dotted) odd loop $(\delta_{\neg x}, \delta_{\neg x})$ in the augmented block graph.
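Continuing the brute-force illustration, the following Python sketch (same assumed predicate encoding and hypothetical names as before) derives the arcs of the block graph (Definition 4) and the supporting sets (Definition 5) from the essential blocking sets, and replays the effect of adding $\delta_{ab_b}$. For each blocking set $B_i$ of δ it picks a blocking set of some member of $B_i$ and takes the union of the picks; if some $B_i$ has no such candidate, no pick exists and $S_\Delta(\delta) = \emptyset$, while an empty $B_\Delta(\delta)$ yields the single empty pick, i.e. $\{\emptyset\}$.

```python
from itertools import combinations, product


def entails(atoms, premises, phi):
    """premises |- phi by truth tables; formulas are predicates over assignments."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(q(v) for q in premises) and not phi(v):
            return False
    return True


def grounded(atoms, W, D, B):
    done, todo = [], set(B)
    while todo:
        ready = [n for n in todo if entails(atoms, W + [D[m][2] for m in done], D[n][0])]
        if not ready:
            return False
        done += ready
        todo -= set(ready)
    return True


def subsets(names):
    names = sorted(names)
    return [frozenset(c) for k in range(len(names) + 1) for c in combinations(names, k)]


def blocking_sets(atoms, W, D):
    """Essential blocking sets B_Delta(d) (Definition 3), by brute force."""
    def blocks(B, d):
        def not_j(v):
            return not D[d][1](v)
        return entails(atoms, W + [D[n][2] for n in B], not_j) and grounded(atoms, W, D, B)
    return {d: {B for B in subsets(D) if blocks(B, d)
                and not any(blocks(B - {x}, y) for x in B for y in B | {d})}
            for d in D}


def block_graph_arcs(bs):
    """Arcs (d', d) of G(Delta): d' belongs to some essential blocking set of d."""
    return {(x, d) for d, blocking in bs.items() for B in blocking for x in B}


def supporting_sets(bs):
    """S_Delta(d) (Definition 5); product() over no options yields {frozenset()}."""
    return {d: {frozenset().union(*pick)
                for pick in product(*[[bp for m in B for bp in bs[m]] for B in blocking])}
            for d, blocking in bs.items()}


ATOMS = ["b", "w", "f", "ab_b", "ab_p", "p"]
W1 = [lambda v: v["f"] or v["ab_b"], lambda v: not v["f"] or v["ab_p"], lambda v: v["p"]]
D1 = {
    "d_f": (lambda v: v["b"], lambda v: not v["ab_b"], lambda v: v["f"]),
    "d_w": (lambda v: v["b"], lambda v: v["w"], lambda v: v["w"]),
    "d_b": (lambda v: v["p"], lambda v: v["b"], lambda v: v["b"]),
    "d_notf": (lambda v: v["p"], lambda v: not v["ab_p"], lambda v: not v["f"]),
}
bs = blocking_sets(ATOMS, W1, D1)
print(sorted(block_graph_arcs(bs)))  # [('d_b', 'd_notf'), ('d_f', 'd_notf'), ('d_notf', 'd_f')]

# Adding d_abb = p : ab_b / ab_b leaves d_f without any supporting set.
D1_ab = dict(D1, d_abb=(lambda v: v["p"], lambda v: v["ab_b"], lambda v: v["ab_b"]))
ss = supporting_sets(blocking_sets(ATOMS, W1, D1_ab))
print(ss["d_f"], ss["d_abb"])  # set() {frozenset()}
```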
3 Supported default logic
Our new conception of extensions is defined by appeal to blocking and supporting sets:
Definition 6. Let ∆ = (D, W) be a default theory and E a set of formulas. We define E as an S-extension of ∆ iff E = Th(W ∪ c(D′)) for some maximal set D′ ⊆ D such that
SDL1 D′ is grounded in W,
SDL2 B ⊆ D′ for no $B \in B_\Delta(\delta)$ and every δ ∈ D′,
SDL3 S ⊆ D′ for some $S \in S_\Delta(\delta)$ and every δ ∈ D′.
Observe that SDL2 and SDL3 are actually parameterized by ∆. We refer to the set of default rules (here D′) generating an S-extension E of some theory (D, W) as GDS(D, E). First of all, we observe that S-extensions do not rely on fixed-point definitions. In contrast to R-extensions, where global consistency is guaranteed at once by appeal to all applying default rules (comprised in the extension, which is the fixed point), S-extensions ensure consistency by avoiding conflicts (separately) among the generating default rules. While SDL2 implements weak regularity (see Theorem 1) by eliminating all blocking sets of generating default rules, SDL3
provides reasons for doing so. That is, by requiring the presence of some supporting set for each generating default rule δ, it keeps out all blocking sets of δ. This is actually the salient difference between our approach and the standard way of constructing extensions: While all existing variants focus on the applicability of individual rules, we shift the emphasis to the joint application of a rule together with one of its supporting sets. Hence we call the resulting system supported default logic. Consider our initial example in (1). In fact, both R-extensions E1 and E2 are also S-extensions of (D1, W1). To see this, let us verify that the underlying sets of generating defaults GDR(D, E1) = {δw, δb, δ¬f} and GDR(D, E2) = {δf, δw, δb}, respectively, also fulfill the conditions stipulated for D′ in Definition 6. Clearly, both of them satisfy SDL1 (groundedness) and SDL2 (weak regularity) by virtue of being generating default rules for R-extensions. To see that both also fulfill SDL3, it is sufficient to verify that each of their constituent rules comes with one of its supporting sets. For instance, we have δf ∈ GDR(D, E2) and {δb, δf} ⊆ GDR(D, E2) for $\{\delta_b, \delta_f\} \in S_\Delta(\delta_f)$. Now consider $(D_1 \cup \{\delta_{ab_b}\}, W_1)$. We have seen at the end of the previous section that δf has no supporting set in $(D_1 \cup \{\delta_{ab_b}\}, W_1)$ that would protect it against $\delta_{ab_b}$. This disqualifies GDR(D, E2) as a generator of an S-extension, since it contains a default rule without a supporting set. Hence E2 is no S-extension of the augmented theory, as opposed to E1, which is still an S-extension. This is because the supporting sets of all members of GDR(D, E1) remain intact when adding $\delta_{ab_b}$ (and no new blocking sets for them appear). In both cases, we have obtained in supported default logic the same extensions as in Reiter's default logic. Notably, in both default logics E2 is ruled out by the addition of $\delta_{ab_b}$. This is due to the following fact.
Property 1.
Supported default logic is not semi-monotonic.
Here is another property shared with Reiter's approach:
Theorem 2. Supported default logic is weakly regular.
For further illustration, consider the theory used in [7] to show that semi-normal theories may lack extensions:
$$(D_2, W_2) = \left( \left\{ \frac{: a \wedge \neg b}{a}, \frac{: b \wedge \neg c}{b}, \frac{: c \wedge \neg a}{c} \right\}, \emptyset \right) \quad (2)$$
While this theory has no R-extension, it has the S-extension Th(∅). In fact, the block graph of this theory comprises an odd cycle. This makes it impossible to jointly apply a rule together with its supporting set (given by the singleton containing the predecessor of its predecessor in the block graph). Consequently, none of the rules can contribute to an S-extension, which results in the S-extension Th(∅). This behavior becomes more apparent when examining theories like $(D_1 \cup \{\frac{:x}{\neg x}\}, W_1)$ or (D1 ∪ D2, W1 ∪ W2). In both cases, we obtain no R-extensions, although there is arguably a part of the theory, viz. (D1, W1), that would give rise to reasonable conclusions. However, in each example the respective odd cycle destroys all conclusions, although its rules are unrelated to the rest of the
theory. Supported default logic behaves differently: in both cases it yields the two extensions E1 and E2 already obtained from (D1, W1). So, supported default logic lets the reasonable conclusions go through, whereas rules belonging to (harmful) odd cycles are discarded during extension formation. Notably, the elimination of odd cycles applies to harmful ones only. For instance, theory (D2, W2 ∪ {c → b}) has, despite the odd cycle in its block graph, the identical R- and S-extension Th({c}). The capacity of discarding harmful odd cycles leads to the following result.
Theorem 3. Every default theory has an S-extension.
We complete this section by showing that the extension construction process coincides with that of conventional default logics on normal theories:
Theorem 4. Let ∆ be a normal default theory and E a set of formulas. Then E is an R-extension of ∆ iff E is an S-extension of ∆.
Clearly, this result extends to all variants enjoying the same correspondence with Reiter's default logic. Moreover, it provides us with complexity results: for instance, by using normal default theories, Gottlob shows in [10] that the extension-membership problem (for R-extensions) is $\Sigma_2^P$-complete. Hence, considering normal default theories, Theorem 4 makes this result applicable to supported default logic.
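Definition 6 can likewise be turned into a naive generator for S-extensions. The Python sketch below (same assumed predicate encoding and hypothetical names as the earlier illustrations) enumerates all D′ ⊆ D, keeps those satisfying SDL1, SDL2 and SDL3, and returns the inclusion-maximal ones; each such D′ yields the S-extension Th(W ∪ c(D′)). It is exponential and only meant to replay the two examples (1) and (2) of this section.

```python
from itertools import combinations, product


def entails(atoms, premises, phi):
    """premises |- phi by truth tables; formulas are predicates over assignments."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(q(v) for q in premises) and not phi(v):
            return False
    return True


def grounded(atoms, W, D, B):
    done, todo = [], set(B)
    while todo:
        ready = [n for n in todo if entails(atoms, W + [D[m][2] for m in done], D[n][0])]
        if not ready:
            return False
        done += ready
        todo -= set(ready)
    return True


def subsets(names):
    names = sorted(names)
    return [frozenset(c) for k in range(len(names) + 1) for c in combinations(names, k)]


def blocking_sets(atoms, W, D):
    """Essential blocking sets B_Delta(d) (Definition 3), by brute force."""
    def blocks(B, d):
        def not_j(v):
            return not D[d][1](v)
        return entails(atoms, W + [D[n][2] for n in B], not_j) and grounded(atoms, W, D, B)
    return {d: {B for B in subsets(D) if blocks(B, d)
                and not any(blocks(B - {x}, y) for x in B for y in B | {d})}
            for d in D}


def supporting_sets(bs):
    """S_Delta(d) (Definition 5); product() over no options yields {frozenset()}."""
    return {d: {frozenset().union(*pick)
                for pick in product(*[[bp for m in B for bp in bs[m]] for B in blocking])}
            for d, blocking in bs.items()}


def s_extension_generators(atoms, W, D):
    """Inclusion-maximal D' satisfying SDL1-SDL3 (Definition 6)."""
    bs = blocking_sets(atoms, W, D)
    ss = supporting_sets(bs)
    ok = [Dp for Dp in subsets(D)
          if grounded(atoms, W, D, Dp)                              # SDL1
          and not any(B <= Dp for d in Dp for B in bs[d])           # SDL2
          and all(any(S <= Dp for S in ss[d]) for d in Dp)]         # SDL3
    return [Dp for Dp in ok if not any(Dp < other for other in ok)]


ATOMS = ["b", "w", "f", "ab_b", "ab_p", "p"]
W1 = [lambda v: v["f"] or v["ab_b"], lambda v: not v["f"] or v["ab_p"], lambda v: v["p"]]
D1 = {
    "d_f": (lambda v: v["b"], lambda v: not v["ab_b"], lambda v: v["f"]),
    "d_w": (lambda v: v["b"], lambda v: v["w"], lambda v: v["w"]),
    "d_b": (lambda v: v["p"], lambda v: v["b"], lambda v: v["b"]),
    "d_notf": (lambda v: v["p"], lambda v: not v["ab_p"], lambda v: not v["f"]),
}
print(sorted(sorted(g) for g in s_extension_generators(ATOMS, W1, D1)))
# [['d_b', 'd_f', 'd_w'], ['d_b', 'd_notf', 'd_w']]

D2 = {  # theory (2): the defaults form an odd cycle in the block graph
    "d_a": (lambda v: True, lambda v: v["a"] and not v["b"], lambda v: v["a"]),
    "d_b": (lambda v: True, lambda v: v["b"] and not v["c"], lambda v: v["b"]),
    "d_c": (lambda v: True, lambda v: v["c"] and not v["a"], lambda v: v["c"]),
}
print(s_extension_generators(["a", "b", "c"], [], D2))  # [frozenset()]
```

For theory (1) the sketch returns the generating sets of E1 and E2, and for theory (2) only the empty set, i.e. the S-extension Th(∅).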
4 Elaboration in context
This section continues with the elaboration of supported default logic and its underlying concepts in the context of Reiter's and Lukaszewicz' default logic. We need the following definition. For a default theory ∆ = (D, W) and D′ ⊆ D, define $\Delta|D' = (D \setminus (D' \cup D'_\bot),\; W \cup c(D'))$, where⁷ $D'_\bot = \{\delta \in D \mid W \cup c(D') \vdash \neg j(\delta)\}$. The next result shows that the operator | allows for filtering out extensions that are generated by a given rule set:
Theorem 5. Let ∆ = (D, W) be a default theory and let E be a set of formulas. Further, let D′ ⊆ GDR(D, E) be grounded in W. Then E is an R-extension of ∆ iff E is an R-extension of ∆|D′.
R- and S-extensions. To begin with, we show that Reiter's conception of default logic coincides with ours whenever there are no odd cycles in the block graph:
Theorem 6. Let ∆ be an even default theory and let E be a set of formulas. Then E is an R-extension of ∆ iff E is an S-extension of ∆.
⁷ $D'_\bot$ eliminates defaults whose justification is inconsistent with the facts of ∆|D′.
This further explains why we obtain the same R- and S-extensions from (D1, W1). In the general case, both approaches coincide whenever the generating defaults induce an arcless block graph:
Theorem 7. Let ∆ = (D, W) be a default theory and let E be a set of formulas. Then E is an R-extension of ∆ iff E is an S-extension of ∆ and G(∆|GDS(D, E)) is arcless.
In fact, one can show that if E is an S-extension but not an R-extension, then there is an odd cycle in G(∆|GDS(D, E)) and hence also in G(∆). In other words, R- and S-extensions coincide whenever there is no source of incoherence. In less technical terms, we have in general the following corollary:
Corollary 1. Every R-extension is an S-extension, but not vice versa.
In fact, the generating default rules of R-extensions always induce arcless block graphs:
Theorem 8. Let ∆ = (D, W) be a default theory. If E is an R-extension of ∆, then G(∆|GDR(D, E)) is arcless.
In contrast, S-extensions may leave harmful odd cycles in the block graph. For instance, the generating default rules of both S-extensions E1 and E2 of (D1 ∪ {δ¬x}, W1) induce block graphs containing the odd cycle (δ¬x, δ¬x).
L- and S-extensions. Let us now turn to the relationship between our approach and that of Lukaszewicz. First of all, we note that we obtain identical L- and S-extensions from normal default theories. In analogy to Theorem 7, we have the following result:
Theorem 9. Let ∆ = (D, W) be a default theory and let E be a set of formulas. If D′ = GDS(D, E) = GDL(D, E, J) for some J ⊆ j(D) and G(∆|D′) is arcless, then E is an S-extension of ∆ iff E is an L-extension of ∆ and D′ satisfies SDL3.
That is, whenever the generating default rules induce an arcless block graph, an L-extension is an S-extension if it satisfies SDL3. More precisely, we have the following relationship.
Theorem 10. Let ∆ = (D, W) be a default theory and let E be a set of formulas.
We have that
– if E is an L-extension of ∆ and GDL(D, E, J) satisfies SDL3, then E is an S-extension of ∆, and
– if E is an S-extension of ∆ and GDS(D, E) is maximal in SDL1 and SDL2, then E is an L-extension of ∆.
Both types of extensions are induced by grounded (SDL1) and weakly regular (SDL2) sets of default rules, so that their difference boils down to SDL3. This condition enforces that the application of each default is inseparably
connected with that of one of its supporting sets. The absence of SDL3 leads to semi-monotonicity, which allows defaults to support themselves when forming L-extensions. To see this, recall that E2 = Th(W1 ∪ {b, w, f}) is an L-extension of $(D_1 \cup \{\frac{p : ab_b}{ab_b}\}, W_1)$, although it is no S-extension (and no R-extension). This is because the contribution of δf to the L-extension E2 is ensured by semi-monotonicity, while it is ruled out by SDL3 in supported default logic (and in R-default logic, see below). Since the existence of both L- and S-extensions is guaranteed, the question arises how the underlying approaches handle odd cycles destroying R-extensions. In fact, we obtain the S-extension Th(∅) from theory (D2, W2), whereas there are three L-extensions, viz. Th({a}), Th({b}), and Th({c}). This shows how semi-monotonicity unfolds the odd cycle in Lukaszewicz' variant, whereas our approach simply ignores the rules belonging to the harmful cycle. This is advantageous whenever there are multiple odd cycles, because they induce an exponential number of L-extensions in the worst case.
R- and L-extensions. Let us finally exploit our instruments even further to make the relationship between Reiter's and Lukaszewicz' conceptions of default logic more precise. Lukaszewicz already showed in [14] that every R-extension is an L-extension, but not vice versa. Also, it is well-known that both approaches coincide on normal default theories. To begin with, we show that default theories with arcless block graphs yield the same R- and L-extensions:
Theorem 11. Let ∆ be a non-conflicting default theory and let E be a set of formulas. Then E is an R-extension of ∆ iff E is an L-extension of ∆.
Note that weakly regular and strongly regular default logics differ on non-conflicting theories, like $(\{\frac{:a}{b}, \frac{:\neg b}{c}\}, \emptyset)$. Our result is therefore orthogonal to the general equivalence of these default logics on normal default theories.
Theorem 11 already fails to hold for well-ordered theories, such as $(\{\frac{:a}{a}, \frac{:b \wedge \neg a}{b}\}, \emptyset)$. This theory has one extension containing a under Reiter's interpretation, while a second one containing b emerges in Lukaszewicz' default logic. We have the following result for the general case, which provides (to the best of our knowledge) the first “iff” result between R- and L-extensions.
Theorem 12. Let ∆ = (D, W) be a default theory and let E be a set of formulas. We have that E is an R-extension of ∆ iff E is an L-extension of ∆, GDL(D, E, J) satisfies SDL3, and G(∆|GDL(D, E, J)) is arcless.
In addition to SDL3, the difference between L- and R-extensions boils down to the induction of an arcless block graph (as already observed between R- and S-extensions). In fact, the last result is not only of theoretical importance but also of practical relevance, since it furnishes an easy procedure for constructing R-extensions from L-extensions. For this, we first construct (incrementally) an
Default Reasoning via Blocking Sets
L-extension and then verify by recourse to the block graph whether the corresponding generating default rules satisfy the two additional conditions. This is detailed next.
Constructing extensions. The last series of results has not only shed light on the relationships between the considered variants, but has moreover provided a new view on the respective extension construction processes. In fact, we can directly read off from Theorem 12 the following recipe for constructing R-extensions:
Procedure R-extension(∆ = (D, W) : Default theory)
0. Construct the block graph G(∆) of ∆.
1. Construct a maximal set D′ ⊆ D of default rules satisfying SDL1 and SDL2.
2. If D′ satisfies SDL3 and G(∆|D′) is arcless, then return Th(W ∪ c(D′)).
Interestingly, our above results show that one could integrate the verification of the two conditions of Step 2 into the maximization of Step 1. Then, however, Step 1 would go beyond the construction of L-extensions. For constructing L-extensions, it is clearly sufficient to replace Step 2 by:
2. Return Th(W ∪ c(D′)).
For constructing S-extensions, we must integrate the verification of SDL3 into Step 1, while the condition on G(∆|D′) is deleted:
Procedure S-extension(∆ = (D, W) : Default theory)
0. Construct the block graph G(∆) of ∆.
1. Construct a maximal set D′ ⊆ D of default rules satisfying SDL1, SDL2 and SDL3.
2. Return Th(W ∪ c(D′)).
As opposed to L-extensions, we must account for SDL3 when computing R- and S-extensions. In fact, the plain condition imposed by SDL3 comprises a “don’t know”-choice for the supporting set S ∈ S∆(δ) accompanying rule δ. Interestingly, this turns out to be a “don’t care”-choice whenever {δ} ∪ S satisfies SDL1, SDL2 and SDL3. We make this precise below in Theorem 15. An issue common to all three procedures is the construction of the block graph at Step 0. Apart from its explicit inspection when constructing R-extensions, the block graph plays an important pragmatic role for verifying SDL2 and SDL3.
This is because it delineates the respective search space: given a default rule, its blocking sets are necessarily found among its predecessors in the block graph, while its supporting sets are among its pre-predecessors. [13] contains case studies showing, for instance, that the encoding of the Hamiltonian cycle problem given in [4] yields a rather dense graph, while the encodings of graph coloring [4] and of taxonomic knowledge result in rather sparse graphs. The block graph’s role as an instrument indicating rules relevant to the application of other rules is further elaborated upon next.
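For the L-extension side of the comparison, a companion brute-force sketch under the same literal-only assumption and the same hypothetical encoding as before. Rather than the incremental construction described above, it enumerates candidate pairs (E, J) and tests them against Lukaszewicz' fixpoint characterization as we understand it from [14] and [5]: a default α : β / γ contributes γ to E and β to J when α is derivable and every justification in J ∪ {β} is individually consistent with E ∪ {γ}. For the well-ordered theory ({:a/a, :b∧¬a/b}, ∅) discussed after Theorem 11 it reports the two L-extensions Th({a}) and Th({b}), where Reiter's interpretation gives only the first. It does not check SDL1 to SDL3.

```python
from itertools import combinations

# Hypothetical encoding: literals are strings ("a", "-a" for ¬a), conjunctions are
# frozensets of literals, and defaults are (pre, just, concl) triples of conjunctions.


def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def consistent(lits):
    return not any(neg(lit) in lits for lit in lits)


def entails(lits, conj):
    """Th(lits) contains conj; an inconsistent set entails everything."""
    return not consistent(lits) or conj <= lits


def default(pre, just, concl):
    return frozenset(pre), frozenset(just), frozenset(concl)


def lukaszewicz_gamma(defaults, w, e, j):
    """Apply the pair operator to a candidate (E, J); returns (conclusions, justifications)."""
    s, t, applied = set(w), set(), set()
    changed = True
    while changed:
        changed = False
        for i, (pre, just, concl) in enumerate(defaults):
            if i in applied or not entails(s, pre):
                continue
            # Each justification in J ∪ {just} must be consistent with E ∪ concl.
            if all(consistent(e | concl | eta) for eta in j | {just}):
                s |= concl
                t.add(just)
                applied.add(i)
                changed = True
    return frozenset(s), frozenset(t)


def l_extensions(defaults, w):
    """Brute force over subsets of D, each generating a candidate pair (E, J)."""
    w = frozenset(w)
    found = []
    for k in range(len(defaults) + 1):
        for subset in combinations(defaults, k):
            e = w.union(*(concl for _, _, concl in subset))
            j = frozenset(just for _, just, _ in subset)
            if lukaszewicz_gamma(defaults, w, e, j) == (e, j) and e not in found:
                found.append(e)
    return found


# The well-ordered theory ({:a/a, :b∧¬a/b}, ∅).
theory = [default([], ["a"], ["a"]), default([], ["b", "-a"], ["b"])]
print(l_extensions(theory, []))  # [frozenset({'a'}), frozenset({'b'})]
```

The justifications are checked one at a time rather than jointly, which is what lets the second default apply on its own and block the first.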
Restricted semi-monotonicity. Apart from the plain fact that Reiter’s default logic does not enjoy semi-monotonicity (except for restricted theories), there has so far been no further elaboration of semi-monotonicity under Reiter’s interpretation. We address this shortcoming by providing a conditioned semi-monotonicity property for Reiter’s default logic. For this, we need the following definitions. For a block graph G(∆) = (D, A) and a vertex v ∈ D, define the reachable predecessors of v as
$\gamma_\Delta(v) = \bigcup_{i \ge 0} \gamma^i_\Delta(v)$, where $\gamma^0_\Delta(v) = \{v\}$ and $\gamma^i_\Delta(v) = \{u \mid (u, w) \in A \text{ and } w \in \gamma^{i-1}_\Delta(v)\}$ for $i \ge 1$.
Finally, for a set D′ ⊆ D, define $\gamma_\Delta(D') = \bigcup_{v \in D'} \gamma_\Delta(v)$. Then, we can show the following property of restricted semi-monotonicity for Reiter’s default logic:
Theorem 13. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If (γ∆(D′), W) has an R-extension E′ and ∆|GDR(γ∆(D′), E′) is coherent, then ∆ has an R-extension E with E′ ⊆ E.
If D′ is the set of generating defaults of E′, then there is an R-extension E of ∆ with E′ ⊆ E, provided that ∆|D′ is coherent (which is verifiable by appeal to block graph G(∆|D′)). Since odd loops in G(∆|D′) cannot harm S-extensions, we may drop the coherence condition in supported default logic:
Theorem 14. Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If (γ∆(D′), W) has an S-extension E′, then ∆ has an S-extension E with E′ ⊆ E.
The last two theorems exploit the structure of block graphs for capturing the nature of semi-monotonicity in Reiter’s and supported default logic. While full semi-monotonicity starts out from an arbitrary subset D′ ⊆ D, we must additionally account for the reachable predecessors of D′ in block graph G(∆) in order to guarantee the continued existence of the partial (R- and) S-extension E′. The lack of coherence in Reiter’s approach moreover necessitates the inspection of the remaining rules in D \ γ∆(D′) by examining ∆|GDR(γ∆(D′), E′).
Although the coherence of this theory is often verifiable by appeal to its block graph (cf. Section 2), the mere possibility of a hidden incoherence in D \ γ∆(D′) causes the computational inconvenience that all default rules must in one way or another be inspected for reasoning under Reiter’s interpretation (to ensure an encompassing extension). In contrast to this, full semi-monotonicity allows for constructing L-extensions by gradually adding one default after another as long as SDL1 and SDL2 are satisfied. In fact, a similar procedure is possible for constructing S-extensions, yet at another level of granularity:
Theorem 15 (Compositional incrementality). Let ∆ = (D, W) be a default theory and let D′ ⊆ D be a set of defaults. If D′ satisfies SDL1, SDL2 and SDL3 (with respect to ∆), then ∆ has an S-extension E with D′ ⊆ GDS(D, E).
The important consequence of this result is that S-extensions are constructible by progressively adding grounded and weakly regular sets of defaults that contain a supporting set for each constituent rule. We refer to such sets, like D′, as supported sets. One strategy would be to start with a rule δ and one of its supporting sets S ∈ S∆(δ). While conditions SDL1 and SDL2 depend merely on the rule set in focus, one may have to add further rules, say S′, for SDL3. Once a supported set like {δ} ∪ S ∪ S′, satisfying all three criteria, has been applied, it has the same incontestable status as an applied individual rule δ under full semi-monotonicity. Hence, for constructing S-extensions of (D1, W1), we may rely on the supported sets {δf, δb}, {δw}, {δb}, {δ¬f}, while (D1 ∪ {δabb}, W1) gives {δw}, {δb}, {δ¬f}, {δ¬f, δabb}, {δabb}. All of them are freely combinable unless their union violates SDL2 or SDL3. This finally leads to the respective sets of generating defaults.
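The reachable predecessors γ∆(D′) used in Theorems 13 and 14 are exactly the vertices having a directed path (possibly empty) into D′. A minimal Python sketch of that computation by breadth-first search over reversed arcs, assuming the block graph is given as a list of (u, w) arcs; the vertex names below are hypothetical and do not correspond to any theory in the text:

```python
from collections import deque


def reachable_predecessors(arcs, targets):
    """γ∆(targets): every vertex u with a directed path (possibly empty) from u into targets."""
    preds = {}
    for u, w in arcs:
        preds.setdefault(w, set()).add(u)
    seen = set(targets)          # γ^0 contributes the targets themselves
    queue = deque(targets)
    while queue:
        w = queue.popleft()
        for u in preds.get(w, ()):   # one step back: γ^i is obtained from γ^(i-1)
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


# Hypothetical block graph with a cycle between d2 and d3.
arcs = [("d1", "d2"), ("d2", "d3"), ("d3", "d2"), ("d4", "d5")]
print(sorted(reachable_predecessors(arcs, {"d3"})))  # ['d1', 'd2', 'd3']
```

Because each vertex is visited at most once, cycles in the block graph do not cause non-termination, and the cost is linear in the size of the graph.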
5 Conclusion
We presented an approach to default logic that aims at deviating from the original approach merely on odd default theories (cf. Theorems 6 and 7). Our approach aims at balancing the expressiveness of Reiter’s default logic with the notion of feasibility found in semi-monotonic variants. While it complies with Reiter’s approach in enabling blockage via default conclusions, it provides incremental constructions using (supported) sets of default rules rather than the individual rules used in Lukaszewicz’ variant. We thus shift the emphasis from the application of individual defaults to the joint application of a default together with one of its supporting sets. We observe that violating this may lead either to the destruction of (R-)extensions or to a tremendous increase in the number of (L-)extensions.
A rather different approach to feasibility is pursued in [1,19,3] by using ideas borrowed from well-founded semantics. These approaches differ from ours in several respects. First, they are interested in conclusions belonging to all extensions rather than in extensions themselves. Second, the two former approaches are rather weak approximations, as shown in [3]. Finally, the latter approach is only defined for coherent theories, which takes it out of the focus of our approach. On the other hand, these approaches are well studied as regards computational complexity. For semi-monotonic variants, one may draw on their usual equivalence to Reiter’s approach on normal default theories, since the central complexity proofs in [10] rely on prerequisite-free normal default theories. What is definitely needed here is a more fine-grained complexity analysis, addressing constructive issues and distinguishing different treatments of general default rules, as done for instance in [6]. This shortcoming applies also to our work and makes it an issue of future research.
In [18], finite sets of justifications, so-called full sets, are used to characterize R-extensions.
Full sets contain those justifications that are consistent with the set obtained by closing the initial set of facts under classical inferences and the defaults (used as monotonic inference rules) whose justifications belong to the full set. Blocking sets also use default rules as monotonic inference rules, but
here negated justifications of other defaults are derived. In this sense, blocking sets and full sets can be considered dual. However, there is another important difference between full and blocking sets: whereas the former characterize entire R-extensions, the latter are just potential parts of some R-extensions.
The distinction between coherence and semi-monotonicity has so far been neglected in the literature. Our approach is thus unique in that it guarantees the existence of extensions without requiring semi-monotonicity. In fact, so far the major distinguishing properties of default logics were given by cumulativity, regularity and semi-monotonicity [8]; coherence was always subsumed by semi-monotonicity, as one of its consequences. This was appropriate insofar as the variants existing up to now enjoyed either both semi-monotonicity and coherence or neither of them. So how does supported default logic fit into the picture? Actually, as regards formal properties, it is indistinguishable from Reiter’s approach when odd theories are not at issue (cf. Theorem 6). That is, it enjoys weak regularity, whereas it satisfies neither semi-monotonicity nor cumulativity (as verifiable by the standard example). Our elaboration has also revealed structural dependencies that shed light on existing approaches. In particular, we have clarified the relationship between R- and L-extensions, and we have given a non-fixed-point definition of R-extensions along with a recipe for constructing R-extensions from L-extensions.
Acknowledgements We would like to thank the anonymous referees and Hans Tompits for commenting on a previous version of this paper.
References
1. C. Baral and V. Subrahmanian. Duality between alternative semantics of logic programs and nonmonotonic formalisms. In First International Workshop on Logic Programming and Nonmonotonic Reasoning, pages 69–86. MIT Press, 1991.
2. G. Brewka. Cumulative default logic: In defense of nonmonotonic inference rules. Artificial Intelligence, 50(2):183–205, 1991.
3. G. Brewka and G. Gottlob. Well-founded semantics for default logic. Fundamenta Informaticae, 31(3-4):221–236, 1997.
4. P. Cholewiński, V. Marek, A. Mikitiuk, and M. Truszczyński. Experimenting with nonmonotonic reasoning. In Proceedings of the International Conference on Logic Programming. MIT Press, 1995.
5. J. Delgrande, T. Schaub, and W. Jackson. Alternative approaches to default logic. Artificial Intelligence, 70(1-2):167–237, 1994.
6. Y. Dimopoulos. The computational value of joint consistency. In L. Pereira and D. Pearce, editors, European Workshop on Logics in Artificial Intelligence (JELIA’94), volume 838 of Lecture Notes in Artificial Intelligence, pages 50–65. Springer Verlag, 1994.
7. D. Etherington. Reasoning with Incomplete Information: Investigations of Non-Monotonic Reasoning. PhD thesis, Department of Computer Science, University of British Columbia, Vancouver, BC, 1986. Revised version appeared as: Research Notes in AI, Pitman.
8. C. Froidevaux and J. Mengin. Default logic: A unified view. Computational Intelligence, 10(3):331–369, 1994.
9. L. Giordano and A. Martinelli. On cumulative default logics. Artificial Intelligence, 66(1):161–179, 1994.
10. G. Gottlob. Complexity results for nonmonotonic logics. Journal of Logic and Computation, 2(3):397–425, June 1992.
11. F. Lévy. Computing extensions of default theories. In R. Kruse and P. Siegel, editors, Proceedings of the European Conference on Symbolic and Quantitative Approaches for Uncertainty, volume 548 of Lecture Notes in Computer Science, pages 219–226. Springer Verlag, 1991.
12. T. Linke and T. Schaub. An approach to query-answering in Reiter’s default logic and the underlying existence of extensions problem. In J. Dix, L. Fariñas del Cerro, and U. Furbach, editors, Logics in Artificial Intelligence, Proceedings of the Sixth European Workshop on Logics in Artificial Intelligence, volume 1489 of Lecture Notes in Artificial Intelligence, pages 233–247. Springer Verlag, 1998.
13. Th. Linke. New Foundations for Automation of Default Reasoning. Dissertation, University of Bielefeld, 1999.
14. W. Lukaszewicz. Considerations on default logic: an alternative approach. Computational Intelligence, 4:1–16, 1988.
15. W. Marek and M. Truszczyński. Nonmonotonic Logic: Context-Dependent Reasoning. Artificial Intelligence. Springer Verlag, 1993.
16. W. Marek and M. Truszczyński. Normal form results for default logics. In G. Brewka, K. Jantke, and P. Schmitt, editors, Nonmonotonic and Inductive Logic, volume 659 of Lecture Notes in Artificial Intelligence, pages 153–174. Springer Verlag, 1993.
17. A. Mikitiuk and M. Truszczyński. Rational default logic and disjunctive logic programming. In A. Nerode and L. Pereira, editors, Proceedings of the Second International Workshop on Logic Programming and Non-monotonic Reasoning, pages 283–299. MIT Press, 1993.
18. I. Niemelä. Towards efficient default reasoning. In C. Mellish, editor, Proceedings of the International Joint Conference on Artificial Intelligence, pages 312–318. Morgan Kaufmann Publishers, 1995.
19. H. Przymusinska and T. Przymusinski. Stationary default extensions. Fundamenta Informaticae, 21(1-2):76–87, 1994.
20. R. Reiter. A logic for default reasoning. Artificial Intelligence, 13(1-2):81–132, 1980.
21. V. Risch. Analytic tableaux for default logics. Journal of Applied Non-Classical Logics, 6(1):71–88, 1996.
Coherent Well-founded Annotated Logic Programs
Carlos Viegas Damásio¹, Luís Moniz Pereira², and Terrance Swift³
¹ A.I. Centre, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2825-114 Caparica, Portugal.
² A.I. Centre, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2825-114 Caparica, Portugal.
³ Department of Computer Science, University of Maryland, College Park, MD, USA.
Abstract. Extended logic programs and annotated logic programs are two important extensions of normal logic programs that allow for a more concise and declarative representation of knowledge. Extended logic programs add explicit negation to the default negation of normal programs in order to distinguish what can be shown to be false from what cannot be proven true. Annotated logic programs generalize the set of truth values over which a program is interpreted by explicitly annotating atoms with elements of a new domain of truth values. In this paper coherent well-founded annotated programs are defined, and shown to generalize both consistent and paraconsistent extended programs, along with several classes of annotated programs.
1 Introduction
The ability to concisely represent knowledge by a logic program, along with the ability to efficiently evaluate that program, can lead to important applications of logic programming. This has been seen to be the case in diagnosis, model checking, grammar processing, and many other applications. Indeed, a stream of research has focussed on how logic programming can be employed to better represent knowledge. For instance, extended logic programs add explicit negation to normal programs, and gain the fundamental ability to distinguish what can be shown to be false from what is false by default because it cannot be proven true. This distinction can be useful in representing knowledge that derives from separate, possibly contradictory, sources. These two negations are conveniently related through the principle of coherence, which states that an atom that is explicitly proven false must be considered default false as well. In fact, the coherence principle underlies two main semantics for extended programs: the answer set semantics [9] and the well-founded semantics with explicit negation [2]. A separate line of research, into annotated logic programs, has extended the domain of truth values over which logic programs are interpreted. Rather than mapping atoms into true, false, or undefined, they are mapped into domains that allow paraconsistent or quantitative information to be represented. This research direction is represented by formalisms such as GAPs [11] and Amalgamation Logic [13].
Each of these extensions is powerful in itself, but suffers from some deficiencies regarding knowledge representation. Extended logic programs, per se, cannot easily represent quantitative information such as probabilities or degrees of belief; annotated logic programs, per se, cannot easily relate what is explicitly known to be false to what is not known to be true. In this paper we propose a framework for coherent well-founded annotated programs that combines the expressivity of both annotated and extended logic programs. We show that several classes of annotated programs can be embedded into coherent well-founded annotated programs, as can both consistent and paraconsistent extended programs.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 262–276, 1999.
© Springer-Verlag Berlin Heidelberg 1999
2 Generalized Annotated Logic Programs
Generalized annotated logic programs (GAPs) are an extension of ordinary definite logic programs. The language, semantics, and query answering procedures are covered in the joint article by Kifer and Subrahmanian [11]. In this section we recall their fundamental results required by our study.
A GAP is defined with respect to an underlying upper semi-lattice of truth-values (T, ⪯), representing a partial ordering among truth-values. This lattice can be used to represent fuzzy truth-values, time intervals, paraconsistent logics, qualitative degrees of truth, and the like [11,13,1]. In our work we assume this lattice is complete, and therefore the existence of the minimum and maximum elements is always guaranteed; they are represented by ⊥ and ⊤, respectively. Truth-values are referred to in the programs by means of annotation terms, while the basic syntactic elements of generalized annotated logic programs are annotated atoms. Given a set of atoms A, an annotated atom has the form A : µ, where A is an atom in A and µ is an annotation term. For instance, low_temp : 0.82 may mean that the temperature is low with a certainty of at least 82%, or box(a) : 4 could signify that object a is a box with a confidence level of at least 4. Formally:
Definition 1 (Annotation terms). Let (T, ⪯) be a complete lattice, F a set of function symbols of arity n > 1, and V a set of variables ranging over the truth-values T. Then,
1. Every element of T is a (simple) annotation term;
2. Every annotation variable of V is a (simple) annotation term;
3. If f is an n-ary annotation function symbol of F and t1, ..., tn are annotation terms, then f(t1, ..., tn) is an annotation term;
4. Nothing else is an annotation term.
An annotation function symbol is assumed to be computable and continuous, and hence monotonic [11].
C.V. Damásio, L.M. Pereira, and T. Swift
Definition 2 (Generalized Annotated Logic Program). A generalized annotated logic program is a set of annotated clauses of the form
A0 : µ0 ← A1 : µ1 & ... & An : µn
where A0 : µ0 is an annotated atom, and the Ai : µi (1 ≤ i ≤ n) are atoms annotated with simple annotation terms. In a ground annotated clause all annotations are truth-values of T.
For our purposes the above syntax suffices. However, in their original paper a full-blown first-order-logic-like syntax is introduced, with the usual connectives and quantification symbols. The details can be found in [11]. The reading of an annotated clause of the form A0 : µ0 ← A1 : µ1 & ... & An : µn is “if A1 is at least µ1 and ... and An is at least µn, then A0 is at least µ0.” Note that function symbols may only appear in the heads of annotated clauses. Furthermore, one can instantiate all the annotation variables and evaluate all the function symbol annotations in the heads of the resulting ground program, i.e. the program in which all annotated clauses are replaced by all their ground instances. For simplicity, we assume from now on that this grounding operation has been performed on every program, which may result in an infinite program. This program is dubbed a “strictly ground instance” in [11].
An interpretation is a mapping from the set of atoms to the set of truth-values T. This corresponds to the restricted interpretations of [11]. Given an interpretation, it is straightforward to define a satisfaction relation:
Definition 3 (Ground satisfaction). Let I be an interpretation on (T, ⪯). We define the ground satisfaction relation, denoted by |=, as follows, where all annotations are ground:
– I |= A : µ iff I(A) ⪰ µ;
– I |= A1 : µ1 & ... & An : µn iff I |= A1 : µ1 and ... and I |= An : µn;
– I |= A0 : µ0 ← A1 : µ1 & ... & An : µn iff I ⊭ A1 : µ1 & ... & An : µn or I |= A0 : µ0.
An interpretation I is a model of a ground GAP iff it satisfies all the annotated clauses in the program.
The ordering of the underlying lattice of truth-values is easily extended to the point-wise ordering between interpretations. As usual, we are interested in the minimal model of the program. It can be obtained by extending the TP operator of van Emden and Kowalski [6] to this more general setting:
Definition 4 (Immediate consequences operator). Let P be a generalized annotated logic program on the complete lattice (T, ⪯). The immediate consequences operator, a function mapping interpretations into interpretations, is defined by:
TP(I)(A) = lub { µ | I |= A1 : µ1 & ... & An : µn, where A : µ ← A1 : µ1 & ... & An : µn belongs to P }
Because this operator is monotonic, i.e. if I ⪯ J then TP(I) ⪯ TP(J), we can conclude by the Knaster–Tarski fixpoint theorem that TP has a least fixpoint, which corresponds to the least model of P. It can be found by iterating from the least interpretation, where all atoms are initially assigned the truth-value ⊥. However, this operator is not continuous, so the iteration may have to be continued beyond ω steps. See [11] for more details.
We now provide some examples to illustrate the above concepts.
Example 1. Definite logic programs are easily captured by generalized annotated logic programs. Let L2 = {⊥, ⊤} with ⊥ ≺ ⊤. For instance, the classical member/2 predicate, written as a GAP over L2, is:
member(X, [X|_]) : ⊤.
member(X, [_|Y]) : ⊤ ← member(X, Y) : ⊤.
In general, we obtain an equivalence between GAPs over L2 and definite logic programs by adding the annotation “: ⊤” to every atom of the latter.
Example 2. [10] Consider Belnap’s logic FOUR = ({⊥, f, t, ⊤}, {⊥ ≺ f, ⊥ ≺ t, f ≺ ⊤, t ≺ ⊤}). The tweety example can be encoded as:
flies(X) : t ← bird(X) : t.
flies(X) : f ← penguin(X) : t.
bird(X) : t ← penguin(X) : t.
penguin(fred) : t.
bird(tweety) : t.
In this example we conclude that flies(tweety) : t and flies(fred) : ⊤. Note that the corresponding first-order theory does not have any model.
Example 3. By defining C to be a lattice of probability intervals, GAPs can be used to implement probabilistic reasoning. Specifically, if probabilities associated with atoms are assumed to be independent, the join operation of C can be defined as the intersection ⟨max(Low1, Low2), min(High1, High2)⟩ of two intervals [Low1, High1] and [Low2, High2]. Expanding on this idea, GAPs can be used to implement a significant subset of Hybrid Probabilistic Programs [5]. Under the lattice C, GAPs have been used to model probabilistic association rules in a deductive database about aircraft spare parts implemented in XSB [7]. An instance of such a rule is:
process(Part, 'CADMIUM_PLATING', Source) : [94.7, 100] ← nomenclature(Part, 'BELL_CRANK', Source) : [100, 100] & federal_supply_class(Part, 'A0500', Source) : [100, 100].
This rule allows one to infer the finishing process of a part given other definitely true facts about the part that may be present in a database.
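A minimal Python sketch of Definitions 3 and 4 for finite ground programs over FOUR. The lattice is encoded (our own choice) by listing, for each truth-value, the values above it; TP takes the lub of the head annotations of rules whose bodies are satisfied; and the least model is reached by iterating from the all-⊥ interpretation, which terminates here because both the program and the lattice are finite. The string names for atoms and truth-values are hypothetical. Grounding the tweety program of Example 2 over {tweety, fred} reproduces flies(tweety) : t and flies(fred) : ⊤.

```python
# Belnap's FOUR: for each truth-value, the set of values above or equal to it.
FOUR = ["bot", "f", "t", "top"]
ABOVE = {"bot": set(FOUR), "f": {"f", "top"}, "t": {"t", "top"}, "top": {"top"}}


def leq(x, y):
    return y in ABOVE[x]


def lub(values):
    """Least upper bound in FOUR (the lub of the empty set is bot)."""
    uppers = [u for u in FOUR if all(leq(v, u) for v in values)]
    return next(u for u in uppers if all(leq(u, other) for other in uppers))


def satisfies(interp, atom, mu):
    """Definition 3: I |= A : mu iff I(A) ⪰ mu (unlisted atoms are bot)."""
    return leq(mu, interp.get(atom, "bot"))


def tp(program, interp):
    """Definition 4: lub of the annotations of rules whose bodies are satisfied."""
    heads = {}
    for head, mu, body in program:
        if all(satisfies(interp, a, m) for a, m in body):
            heads.setdefault(head, []).append(mu)
    return {atom: lub(mus) for atom, mus in heads.items()}


def least_model(program):
    """Iterate TP from the all-bot interpretation until a fixpoint is reached."""
    interp = {}
    while True:
        nxt = tp(program, interp)
        if nxt == interp:
            return interp
        interp = nxt


# Ground instances of the tweety program of Example 2 over {tweety, fred}.
program = [("penguin(fred)", "t", []), ("bird(tweety)", "t", [])]
for x in ["tweety", "fred"]:
    program += [
        (f"flies({x})", "t", [(f"bird({x})", "t")]),
        (f"flies({x})", "f", [(f"penguin({x})", "t")]),
        (f"bird({x})", "t", [(f"penguin({x})", "t")]),
    ]
model = least_model(program)
print(model["flies(tweety)"], model["flies(fred)"])  # t top
```

The value ⊤ for flies(fred) arises in the third iteration, when the lub of t (fred is a bird) and f (fred is a penguin) is taken.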
3 Coherent Well-founded Annotated Programs
Both common sense and expert knowledge may be positive (stating the veracity of facts and conclusions) or negative (expressing their falsity). It is also important to have the ability to assume truth or falsity of facts non-monotonically.
Generalized Annotated Logic Programs provide ease of expression of (monotonic) negative knowledge by means of epistemic negation. Epistemic negation, represented by the symbol “¬”, is a unary operator on the truth-value lattice, subject to no further constraints. This negation corresponds to the notion of explicit negation in several semantics for extended logic programs.
Example 4. Continuing from Ex. 3, it is known that the primary material of certain types of parts must be either all Steel, all Aluminum, or all Magnesium. This gives us a rule, using explicit negation, for instance of the form
¬material(Part, 'STEEL', Source) : Prob ← material(Part, 'ALUMINUM', Source) : Prob & nomenclature(Part, 'STRUT', Source) : [100, 100].
The definition of the negation operator is ¬[Low, High] = [1 − High, 1 − Low].
As originally presented [11], the GAP framework lacks a form of (nonmonotonic) default negation (called ontological negation in [10]), i.e. a nonmonotonic closed world assumption. This has been remedied in the more recent work [13], where well-founded-like [8] and answer-set-like [9] semantics extend GAPs with a default negation operator. However, the semantics of [13] ignores a fundamental relationship that default and explicit negation should obey, namely that if something is stated false then it should be assumed false: the coherence principle¹. This principle has been extensively advocated in [2,3,4]. We will adopt coherence from now on and examine its consequences within the setting of annotated programs.
Example 5. Continuing from Ex. 4, it is also known that cadmium plating is only used on steel parts, so that, in the absence of more specific information about a part’s material, the part may be inferred to be of generic steel. This requires default negation to be added to GAPs.
material(Part, 'STEEL', Source) : Prob ← process(Part, 'CADMIUM_PLATING', Source) : Prob & not more_specific_material(Part, 'STEEL', Source) : Prob.
more_specific_material(Part, Mat, Source) : Prob ← material(Part, Mat1, Source) : Prob & subclass(Mat1, Mat, Source) : [100, 100].
In these rules about parts, GAPs are used to reason with the probabilistic data mining rules of Ex. 3, default negation is used to allow default inferences, while explicit negation is used to represent contrary information in the rule of Ex. 4. Moreover, coherence ensures that default literals of the form not material(Part, 'STEEL', Source) : Prob are true in the deductive database by virtue of explicit negative information.
¹ Even though answer sets are coherent, their paraconsistent [12] and annotated [13] extensions are not. For details consult [4].
Let us start by clarifying the notion of explicit negation.
Definition 5 (Explicit negation). Let (T, ⪯) be a complete lattice. An explicit negation operator “¬” is a total mapping from T into T such that the following two conditions are satisfied:
1. for every µ ∈ T we have ¬¬µ = µ;
2. if µ ⪯ ϑ then ¬µ ⪯ ¬ϑ, for every µ, ϑ ∈ T.
An explicit negation operator enforces a symmetry transformation on the truth-value lattice. The use of negation is already covered by the original syntax of [11]. We extend the syntax with a default negation operator.
Definition 6 (Annotated objective and default literals). Let A : µ be an annotated atom constructed from the complete lattice (T, ⪯) with an explicit negation operator “¬”. For simplicity, assume that µ ∈ T. Then
– A : µ and ¬A : µ = A : ¬µ are annotated objective literals. We use the notation L : µ to refer to this type of literal;
– not(A : µ) and not(¬A : µ) = not(A : ¬µ) are annotated default literals. Similarly, we use not(L : µ) to denote annotated default literals.
The extension of the satisfaction relation to objective literals is straightforward. By definition, ¬A : µ equals A : ¬µ. Therefore, I |= ¬A : µ iff I |= A : ¬µ iff I(A) ⪰ ¬µ. Notice that ¬µ is an element of T. We conclude that in annotated programs without default negation, the explicit negation operator is just syntactic sugar. The syntax of Generalized Annotated Programs is appropriately extended:
Definition 7 (Normal Annotated Logic Programs). A normal annotated logic program is a set of annotated clauses of the form
L0 : µ0 ← L1 : µ1 & ... & Lm : µm & not(H1 : ϑ1) & ... & not(Hn : ϑn)   (m, n ≥ 0)
where the Li : µi are annotated objective literals and the not(Hj : ϑj) are annotated default literals.
For default negation, the definition of the satisfaction relation is more intricate. One cannot simply define I |= not(A : µ) via I ⊭ A : µ, as can be seen from the next example.
Example 6. Consider the normal annotated logic program on the lattice L2:
a : ⊤ ← not(a : ⊤)
Under this reading, I(a) = ⊥ is not a model, since the body holds while the head does not; hence the single model of this program is I(a) = ⊤. However, this is contrary to the usual requirement of a logic program that every true literal should be supported, roughly meaning that it should be implied only by rules for it whose bodies are true and whose conjuncts are each supported. One might conclude that under these conditions this program has no computationally relevant model: the single rule becomes an equivalence, but its body is not supported.
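The two conditions of Definition 5 can be checked mechanically on a finite lattice. A minimal Python sketch, encoding a lattice by its covering pairs (the ≺ pairs listed for FOUR in Example 2 and for the six-element lattice of Example 8) and a candidate negation as a dictionary; the helper names are hypothetical. It confirms that the natural negation of FOUR (¬f = t, ¬t = f, with ⊥ and ⊤ fixed) and the negation of Example 8 qualify, while also swapping ⊥ with ⊤ is involutive but breaks monotonicity.

```python
from itertools import product


def order_from_covers(elements, covers):
    """Reflexive-transitive closure of the pairs (x, y) with x ≺ y."""
    leq = {(x, x) for x in elements} | set(covers)
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(leq), repeat=2):
            if y == y2 and (x, z) not in leq:
                leq.add((x, z))
                changed = True
    return leq


def is_explicit_negation(elements, leq, neg):
    """Definition 5: neg is total on the lattice, involutive and monotonic."""
    if any(neg.get(x) not in elements for x in elements):
        return False
    involutive = all(neg[neg[x]] == x for x in elements)
    monotonic = all((neg[x], neg[y]) in leq for (x, y) in leq)
    return involutive and monotonic


FOUR = ["bot", "f", "t", "top"]
FOUR_LEQ = order_from_covers(
    FOUR, [("bot", "f"), ("bot", "t"), ("f", "top"), ("t", "top")])
natural = {"bot": "bot", "f": "t", "t": "f", "top": "top"}
swapped = {"bot": "top", "f": "t", "t": "f", "top": "bot"}

# The six-element lattice and negation of Example 8.
EX8 = ["bot", "t1", "t2", "f1", "f2", "top"]
EX8_LEQ = order_from_covers(EX8, [("bot", "t1"), ("bot", "f1"), ("t1", "t2"),
                                  ("f1", "f2"), ("t2", "top"), ("f2", "top")])
ex8_neg = {"bot": "bot", "t1": "f1", "f1": "t1",
           "t2": "f2", "f2": "t2", "top": "top"}

print(is_explicit_negation(FOUR, FOUR_LEQ, natural))   # True
print(is_explicit_negation(FOUR, FOUR_LEQ, swapped))   # False
print(is_explicit_negation(EX8, EX8_LEQ, ex8_neg))     # True
```

The swapped mapping fails because ⊥ ⪯ f would require ¬⊥ = ⊤ ⪯ ¬f = t.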
One of the competing solutions to this problem is given by the well-founded semantics, for which the literal becomes undefined in the above situation. We adhere to this stream, and proceed by defining the meaning of the default negation operator via an alternating fixpoint construction, very similar to the original one of well-founded semantics [8] and amalgamation logic [13]. The crux of this technique is the notion of a Gelfond-Lifschitz like operator: Definition 8 (ΓPT operator). Let P be a normal annotated logic program, over the complete lattice (T , 4). Let I be an interpretation for P . The division of P by I over T is the generalized logic program P T = { L0 : µ0 ← L1 : µ1 & . . . &Lm : µm | I L0 : µ0 ← L1 : µ1 & . . . &Lm : µm ¬(H1 : ϑ1 )& . . . ¬(Hn : ϑn ) ∈ P and I 6|= H1 : ϑ1 and . . . and I 6|= Hn : ϑn } Then the operator ΓPT , maps interpretations to interpretations as follows: ΓPT (I) = lfp T P T I
That is, Γ_P^T(I) is the least fixpoint of the immediate consequences operator T applied to the division of P by I with respect to T.

Proposition 1 (Anti-monotonicity of Γ_P^T). [13] Let P be a normal annotated logic program over the complete lattice (T, ⪯). Let I and J be interpretations for P. Then I ⪯ J implies that Γ_P^T(J) ⪯ Γ_P^T(I).

With this operator, the well-founded semantics can be extended to the more general setting of normal annotated logic programs; this result is provided in [13]. Basically, the true atoms in the well-founded annotated semantics are given by the least fixpoint T = Γ_P^T(Γ_P^T(T)), and the default ones are obtained from F = H_P^T − Γ_P^T(T), where H_P^T is the set of all annotated objective literals (the annotated Herbrand base). We refer to this fixpoint semantics as the well-founded annotated semantics.

Example 7. Consider the normal annotated logic program over FOUR:
a : t ← not(b : t).
b : t ← not(a : t).
b : f.
According to the well-founded annotated semantics, the program entails these literals:
{a : ⊥, b : f, b : ⊥} ∪ not {a : f, a : ⊤}
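To make the alternating fixpoint concrete, here is a minimal Python sketch of Γ_P^T for ground programs with constant annotations over FOUR, applied to Example 7. It assumes a hypothetical encoding of each rule as (head atom, head annotation, positive body, default body), with objective literals ¬A : µ already rewritten as A : ¬µ, and computes each least fixpoint by naive iteration from ∆, which terminates here because both the lattice and the program are finite. It is an illustration, not the authors' implementation.

```python
# FOUR; "bot" and "top" stand for ⊥ and ⊤ (hypothetical names).
ELEMENTS = ["bot", "t", "f", "top"]
LEQ = {(x, x) for x in ELEMENTS} | {
    ("bot", "t"), ("bot", "f"), ("bot", "top"), ("t", "top"), ("f", "top")}


def leq(x, y):
    return (x, y) in LEQ


def join(x, y):
    """Least upper bound: the upper bound lying below all other upper bounds."""
    ubs = [z for z in ELEMENTS if leq(x, z) and leq(y, z)]
    return next(z for z in ubs if all(leq(z, u) for u in ubs))


def lfp_tp(rules, atoms):
    """Least fixpoint of the immediate consequences operator of a ground program."""
    interp = {a: "bot" for a in atoms}
    while True:
        nxt = {a: "bot" for a in atoms}
        for head, mu, pos, _neg in rules:
            if all(leq(m, interp[b]) for b, m in pos):
                nxt[head] = join(nxt[head], mu)
        if nxt == interp:
            return interp
        interp = nxt


def gamma(program, atoms, interp):
    """Γ_P^T(I): drop every rule with a default literal not(H : v) such that
    I |= H : v, delete the remaining default literals, take the least fixpoint."""
    division = [(h, mu, pos, []) for h, mu, pos, neg in program
                if not any(leq(m, interp[b]) for b, m in neg)]
    return lfp_tp(division, atoms)


def well_founded_annotated(program, atoms):
    t = {a: "bot" for a in atoms}  # start from ∆
    while True:
        nxt = gamma(program, atoms, gamma(program, atoms, t))
        if nxt == t:
            u = gamma(program, atoms, t)
            true_lits = [f"{a}:{m}" for a in atoms for m in ELEMENTS if leq(m, t[a])]
            default_lits = [f"not {a}:{m}" for a in atoms for m in ELEMENTS
                            if not leq(m, u[a])]
            return true_lits, default_lits
        t = nxt


# Example 7: (head atom, head annotation, positive body, default body).
P = [("a", "t", [], [("b", "t")]),
     ("b", "t", [], [("a", "t")]),
     ("b", "f", [], [])]
true_lits, default_lits = well_founded_annotated(P, ["a", "b"])
print(true_lits)     # ['a:bot', 'b:bot', 'b:f']
print(default_lits)  # ['not a:f', 'not a:top']
```

The output matches the model stated for Example 7; note that b : f is true while not(b : t) is absent, which is exactly the lack of coherence discussed next.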
Coherent Well-founded Annotated Logic Programs
For lattice FOUR, there is a natural explicit negation operator, where ¬⊥ = ⊥, ¬f = t, ¬t = f, and ¬⊤ = ⊤. What is odd about the above result is that although we have b : f, we do not have not(¬b : f), which is the same as not(b : t), and therefore we cannot conclude a : t! This example shows that the well-founded annotated semantics does not comply with the coherence principle, which would entail not(¬b : f) from b : f. In our opinion, this is unsatisfactory. Nevertheless, the coherence property cannot easily be enforced on the well-founded annotated semantics. A naïve approach would resort to the semi-normal program, an approach used by the well-founded semantics with explicit negation (WFSX) [2]. However, in some situations the resulting semantics might not be coherent, in particular when the model contains undefined literals. The next example illustrates this.

Example 8. Consider the lattice over the set of elements {⊥, t1, t2, f1, f2, ⊤} with ordering relation ⊥ ≺ t1, ⊥ ≺ f1, t1 ≺ t2, f1 ≺ f2, t2 ≺ ⊤, f2 ≺ ⊤. The explicit negation operator ¬ is given by ¬⊥ = ⊥, ¬t1 = f1, ¬f1 = t1, ¬t2 = f2, ¬f2 = t2, ¬⊤ = ⊤. Let P be the program:
a : t1.
a : f2 ← b : t1.
b : t1 ← not(b : t1).
Its semi-normal version P_s is:
a : t1 ← not(a : f1).
b : t1 ← not(b : t1) & not(b : f1).
a : f2 ← b : t1 & not(a : t2).
For extended programs without annotations, the well-founded semantics with explicit negation can be derived as the ⪯-least fixpoint of Γ_P Γ_{P_s}, where P_s denotes the semi-normal rewrite of P. The computation of the least fixpoint of Γ_P Γ_{P_s} proceeds as follows:

$$\begin{aligned}
I_0 &= \{a = \bot,\ b = \bot\}\\
\Gamma_{P_s} I_0 &= \{a = \top,\ b = t_1\}\\
I_1 = \Gamma_P \Gamma_{P_s} I_0 &= \{a = t_1,\ b = \bot\}\\
\Gamma_{P_s} I_1 &= \{a = \top,\ b = t_1\}\\
I_2 = \Gamma_P \Gamma_{P_s} I_1 &= I_1
\end{aligned}$$
The annotated literals true in the model are {a : ⊥, a : t1, b : ⊥} ∪ not {b : f1, b : t2, b : f2, b : ⊤}. Thus we have a : t1, but not(a : f1) is not entailed! Coherence is not satisfied. As the example shows, coherence is impeded because a : f2 is undefined: a : f2 is not falsified via its semi-normal rule because a : t1 is not strong enough; a : t2 would be required to do so. One way of guaranteeing coherence, and the one we follow in this paper, is to avoid these situations. We achieve this by removing from the program the rules that can destroy coherence. This is accomplished by an extension of the semi-normal program called the down semi-normal program, whose construction requires the down-set² of every element in the lattice to be finite. Another solution, to be expounded elsewhere, is to introduce a tuneable coherence in complete lattices.

² In an ordered set P, the down-set of x, denoted by ↓x, is {y ∈ P | y ≤ x}.
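The failure of coherence in Example 8 can be reproduced mechanically. The sketch below is our own illustration under stated assumptions: ground programs with constant annotations only, a hypothetical `Lattice` helper that builds the order from the covering pairs of the example, and a `semi_normal` function that adds not(L0 : ¬µ0) to every rule body, giving the P_s listed in Example 8.

```python
class Lattice:
    """Hypothetical helper: a finite lattice from covering pairs, plus a negation map."""

    def __init__(self, elements, covers, neg):
        self.elements, self.neg = elements, neg
        self.order = {(x, x) for x in elements} | set(covers)
        while True:  # close the covering relation under transitivity
            extra = {(x, z) for (x, y) in self.order
                     for (y2, z) in self.order if y == y2} - self.order
            if not extra:
                break
            self.order |= extra

    def leq(self, x, y):
        return (x, y) in self.order

    def join(self, x, y):
        ubs = [z for z in self.elements if self.leq(x, z) and self.leq(y, z)]
        return next(z for z in ubs if all(self.leq(z, u) for u in ubs))


def lfp_tp(lat, rules, atoms):
    interp = {a: "bot" for a in atoms}
    while True:
        nxt = {a: "bot" for a in atoms}
        for head, mu, pos, _neg in rules:
            if all(lat.leq(m, interp[b]) for b, m in pos):
                nxt[head] = lat.join(nxt[head], mu)
        if nxt == interp:
            return interp
        interp = nxt


def gamma(lat, program, atoms, interp):
    kept = [(h, mu, pos, []) for h, mu, pos, neg in program
            if not any(lat.leq(m, interp[b]) for b, m in neg)]
    return lfp_tp(lat, kept, atoms)


def semi_normal(lat, program):
    """WFSX-style rewrite: add not(L0 : ¬mu0) to the body of every rule."""
    return [(h, mu, pos, neg + [(h, lat.neg[mu])]) for h, mu, pos, neg in program]


def alternate(lat, program, atoms):
    """Least fixpoint T of Γ_P Γ_Ps (from ∆), returned together with Γ_Ps(T)."""
    ps = semi_normal(lat, program)
    t = {a: "bot" for a in atoms}
    while True:
        nxt = gamma(lat, program, atoms, gamma(lat, ps, atoms, t))
        if nxt == t:
            return t, gamma(lat, ps, atoms, t)
        t = nxt


# The six-element lattice and the program P of Example 8.
L = Lattice(["bot", "t1", "f1", "t2", "f2", "top"],
            [("bot", "t1"), ("bot", "f1"), ("t1", "t2"), ("f1", "f2"),
             ("t2", "top"), ("f2", "top")],
            {"bot": "bot", "t1": "f1", "f1": "t1",
             "t2": "f2", "f2": "t2", "top": "top"})
P = [("a", "t1", [], []),
     ("a", "f2", [("b", "t1")], []),
     ("b", "t1", [], [("b", "t1")])]
T, U = alternate(L, P, ["a", "b"])
print(T, U)  # {'a': 't1', 'b': 'bot'} {'a': 'top', 'b': 't1'}
# a : t1 is true, yet not(a : f1) is not derived because U(a) = top ⪰ f1.
print(L.leq("t1", T["a"]), not L.leq("f1", U["a"]))  # True False
```

The two interpretations are I1 and Γ_{P_s} I1 from the computation above; since a : f2 stays undefined, Γ_{P_s} keeps a at ⊤ and blocks not(a : f1).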
Definition 9 (Down semi-normal program). Let P be a normal annotated logic program over the finite lattice (T, ⪯) with explicit negation operator "¬". The down semi-normalized version of P, denoted by P_ds, is the normal annotated logic program obtained as follows. If

$$L_0 : \mu_0 \leftarrow L_1 : \mu_1 \,\&\, \dots \,\&\, L_m : \mu_m \,\&\, \mathit{not}(H_1 : \vartheta_1) \,\&\, \dots \,\&\, \mathit{not}(H_n : \vartheta_n) \in P,$$

then let {τ1, ..., τo} = (↓¬µ0) − {⊥}. The following rule is in P_ds:

$$L_0 : \mu_0 \leftarrow L_1 : \mu_1 \,\&\, \dots \,\&\, L_m : \mu_m \,\&\, \mathit{not}(H_1 : \vartheta_1) \,\&\, \dots \,\&\, \mathit{not}(H_n : \vartheta_n) \,\&\, \mathit{not}(L_0 : \tau_1) \,\&\, \dots \,\&\, \mathit{not}(L_0 : \tau_o)$$

Note that the finiteness condition is necessary to guarantee that each body in the down semi-normal program is finite. This simplifies the presentation in the finite case. The down semi-normal program can then be used to define the new operator Γ_{P_ds}^T on programs. However, a more general operator can easily be defined to work on arbitrary complete lattices, by including the down semi-normalization condition directly in the program division operation:

Definition 10 (z_P^T operator). Let P be a normal annotated logic program over the complete lattice (T, ⪯). Let I be an interpretation for P. The down division of P by I over T is the generalized logic program

$$P/_{T}\, I = \{\, L_0 : \mu_0 \leftarrow L_1 : \mu_1 \,\&\, \dots \,\&\, L_m : \mu_m \;\mid\; L_0 : \mu_0 \leftarrow L_1 : \mu_1 \,\&\, \dots \,\&\, L_m : \mu_m \,\&\, \mathit{not}(H_1 : \vartheta_1) \,\&\, \dots \,\&\, \mathit{not}(H_n : \vartheta_n) \in P \text{ and } I \not\models H_1 : \vartheta_1 \text{ and } \dots \text{ and } I \not\models H_n : \vartheta_n \text{ and } I \not\models L_0 : \tau \text{ for all } \tau \in (\downarrow \neg\mu_0) - \{\bot\} \,\}$$

The operator z_P^T, mapping interpretations to interpretations, is defined by

$$z_P^T(I) = \mathrm{lfp}\left(T_{P/_{T} I}\right)$$

The proof of anti-monotonicity of z_P^T is straightforward.

Proposition 2 (Anti-monotonicity of z_P^T). Let P be a normal annotated logic program over the complete lattice (T, ⪯). Let I and J be interpretations for P. Then I ⪯ J implies that z_P^T(J) ⪯ z_P^T(I).

Proof. Our proof relies on the fact that when I ⪯ J, every rule of the program P/T J also belongs to P/T I, i.e. P/T J ⊆ P/T I.
By monotonicity of the least fixpoint of the immediate consequences operator with respect to the set of rules, the result follows immediately (with more rules one can derive more truths). Assume that a rule of P with head L : µ is removed in program P/T I. We show that this rule is also removed in P/T J. This is due to at least one of the following cases:
1. There is a default annotated literal not(H : ϑ) in the body of the rule such that I |= H : ϑ. But since I ⪯ J, also J |= H : ϑ. Therefore the rule does not belong to P/T J either.
2. There is a τ ∈ (↓¬µ) − {⊥} such that I |= L : τ. But then J |= L : τ. Therefore the rule does not appear in P/T J.

We finally obtain the intended alternating fixpoint operator construction:

Proposition 3 (Monotonicity of Γ_P^T z_P^T). Let P be a normal annotated logic program over the complete lattice (T, ⪯) with explicit negation operator "¬". Let I and J be two interpretations for P. Then I ⪯ J implies Γ_P^T z_P^T(I) ⪯ Γ_P^T z_P^T(J).

When the program and associated truth-value lattice are clear from context, we omit them from the operators. Also, it should be clear to the reader that z_P^T coincides with Γ_{P_ds}^T when T is finite. To further simplify notation, we denote the combination of operators Γ_P^T z_P^T by Γ Γ_ds whenever no confusion arises. Since the alternating fixpoint construction Γ Γ_ds is monotonic, it always has a least fixpoint, which can be "obtained" by iterating from the least interpretation ∆, where ∆(A) = ⊥ for every atom A in the language. The semantics of normal annotated logic programs follows.

Definition 11 (Down coherent well-founded semantics). Let P be a normal annotated logic program over the complete lattice (T, ⪯) with explicit negation operator "¬". Let M be the least fixpoint of Γ Γ_ds. The down coherent well-founded semantics of P is given by

$$\{A : \mu \mid M(A) = \vartheta \text{ and } \mu \preceq \vartheta\} \cup \{\mathit{not}(A : \mu) \mid (\Gamma_{ds} M)(A) = \vartheta \text{ and } \mu \not\preceq \vartheta\}$$

The least fixpoint M of Γ Γ_ds determines the true annotated literals, while the default ones are those not belonging to Γ_ds M.

Example 9. Consider the program and lattice of Ex. 7. First, note that the down semi-normal version of P is:
a : t ← not(b : t) & not(a : f).
b : t ← not(a : t) & not(b : f).
b : f ← not(b : t).
The semantics of the program is iteratively obtained as follows:

$$\begin{aligned}
I_0 &= \Delta = \{a = \bot,\ b = \bot\}\\
\Gamma_{ds} I_0 &= \mathrm{lfp}\,\{a : t \leftarrow;\ b : t \leftarrow;\ b : f \leftarrow\} = \{a = t,\ b = \top\}\\
I_1 = \Gamma\Gamma_{ds} I_0 &= \mathrm{lfp}\,\{b : f \leftarrow\} = \{a = \bot,\ b = f\}\\
\Gamma_{ds} I_1 &= \mathrm{lfp}\,\{a : t \leftarrow;\ b : f \leftarrow\} = \{a = t,\ b = f\}\\
I_2 = \Gamma\Gamma_{ds} I_1 &= \mathrm{lfp}\,\{a : t \leftarrow;\ b : f \leftarrow\} = \{a = t,\ b = f\}\\
\Gamma_{ds} I_2 &= \mathrm{lfp}\,\{a : t \leftarrow;\ b : f \leftarrow\} = \{a = t,\ b = f\}\\
I_3 = \Gamma\Gamma_{ds} I_2 &= I_2
\end{aligned}$$
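Definition 10 changes only the filtering step of the division. The Python sketch below (FOUR encoded with hypothetical string names, ground programs with constant annotations, and a hypothetical rule encoding as (head atom, head annotation, positive body, default body)) adds the down test "drop the rule if I |= L0 : τ for some τ ∈ (↓¬µ0) − {⊥}" and iterates Γ Γ_ds from ∆ on the original program of Example 7, so the extra semi-normal literals never have to be written out. It is a sketch under these assumptions, not the authors' code.

```python
# FOUR; "bot" and "top" stand for ⊥ and ⊤ (hypothetical names).
ELEMENTS = ["bot", "t", "f", "top"]
LEQ = {(x, x) for x in ELEMENTS} | {
    ("bot", "t"), ("bot", "f"), ("bot", "top"), ("t", "top"), ("f", "top")}
NEG = {"bot": "bot", "t": "f", "f": "t", "top": "top"}


def leq(x, y):
    return (x, y) in LEQ


def join(x, y):
    ubs = [z for z in ELEMENTS if leq(x, z) and leq(y, z)]
    return next(z for z in ubs if all(leq(z, u) for u in ubs))


def lfp_tp(rules, atoms):
    interp = {a: "bot" for a in atoms}
    while True:
        nxt = {a: "bot" for a in atoms}
        for head, mu, pos, _neg in rules:
            if all(leq(m, interp[b]) for b, m in pos):
                nxt[head] = join(nxt[head], mu)
        if nxt == interp:
            return interp
        interp = nxt


def divide(program, atoms, interp, down):
    """Γ (down=False) or the down-division operator of Definition 10 (down=True)."""
    kept = []
    for head, mu, pos, neg in program:
        if any(leq(m, interp[b]) for b, m in neg):
            continue  # some not(B : m) in the body is false in I
        if down and any(leq(tau, interp[head]) for tau in ELEMENTS
                        if tau != "bot" and leq(tau, NEG[mu])):
            continue  # I |= L0 : tau for some tau in (down-set of ¬mu0) − {⊥}
        kept.append((head, mu, pos, []))
    return lfp_tp(kept, atoms)


def down_coherent_wfs(program, atoms):
    t = {a: "bot" for a in atoms}  # the least interpretation ∆
    while True:
        nxt = divide(program, atoms, divide(program, atoms, t, True), False)
        if nxt == t:  # least fixpoint M of Γ Γ_ds
            u = divide(program, atoms, t, True)  # Γ_ds M
            true_lits = [f"{a}:{m}" for a in atoms for m in ELEMENTS if leq(m, t[a])]
            default_lits = [f"not {a}:{m}" for a in atoms for m in ELEMENTS
                            if not leq(m, u[a])]
            return true_lits, default_lits
        t = nxt


# The program of Example 7.
P = [("a", "t", [], [("b", "t")]),
     ("b", "t", [], [("a", "t")]),
     ("b", "f", [], [])]
true_lits, default_lits = down_coherent_wfs(P, ["a", "b"])
print(true_lits)     # ['a:bot', 'a:t', 'b:bot', 'b:f']
print(default_lits)  # ['not a:f', 'not a:top', 'not b:t', 'not b:top']
```

The iteration follows I0, I1, I2 above, and the printed literals are those obtained by applying Def. 11 to M = I2.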
Applying now Def. 11, we get the model:
M = {a : ⊥, a : t, b : ⊥, b : f} ∪ not {a : f, a : ⊤, b : t, b : ⊤}
One can easily check that this is the expected model, and that coherence is verified.

Example 10. Let us return to the program and lattice of Ex. 8. The down semi-normal version P_ds is:
b : t1 ← not(b : t1) & not(b : f1).
a : t1 ← not(a : f1).
a : f2 ← b : t1 & not(a : t2) & not(a : t1).
Note that in the last rule we have added the default literals not(a : t2) and not(a : t1) to the body of the rule. We now get the expected results:

$$\begin{aligned}
I_0 &= \Delta = \{a = \bot,\ b = \bot\}\\
\Gamma_{ds} I_0 &= \mathrm{lfp}\,\{a : t_1 \leftarrow;\ a : f_2 \leftarrow b : t_1;\ b : t_1 \leftarrow\} = \{a = \top,\ b = t_1\}\\
I_1 = \Gamma\Gamma_{ds} I_0 &= \mathrm{lfp}\,\{a : t_1 \leftarrow;\ a : f_2 \leftarrow b : t_1\} = \{a = t_1,\ b = \bot\}\\
\Gamma_{ds} I_1 &= \mathrm{lfp}\,\{a : t_1 \leftarrow;\ b : t_1 \leftarrow\} = \{a = t_1,\ b = t_1\}\\
I_2 = \Gamma\Gamma_{ds} I_1 &= I_1
\end{aligned}$$
The literals true in the model are:
{a : ⊥, a : t1, b : ⊥} ∪ not {a : f1, a : t2, a : f2, a : ⊤, b : f1, b : t2, b : f2, b : ⊤}
Clearly, coherence is obeyed. However, in some cases the semantics is "overly" coherent, as the following example illustrates.

Example 11. Consider the lattice over the set of elements {⊥, t1, t2, ⊤1, f1, f2, ⊤2} with ordering relation ⊥ ≺ t1, ⊥ ≺ f1, t1 ≺ ⊤1, f1 ≺ ⊤1, ⊤1 ≺ t2, ⊤1 ≺ f2, t2 ≺ ⊤2, f2 ≺ ⊤2. The explicit negation operator ¬ is given by ¬⊥ = ⊥, ¬t1 = f1, ¬f1 = t1, ¬⊤1 = ⊤1, ¬t2 = f2, ¬f2 = t2, ¬⊤2 = ⊤2. The program consisting of the single fact a : t2 has the model
M = {a : ⊥, a : t1, a : f1, a : ⊤1, a : t2} ∪ not {a : t1, a : f1, a : ⊤1, a : t2, a : f2, a : ⊤2}
in which both a : t2 and not(a : t2) are present. This requires some explanation. Consider the annotated atom a : ⊤1. Since M |= a : t2, by Definition 3 M |= a : ⊤1. But ¬a : ⊤1 = a : ⊤1, so that M |= ¬a : ⊤1. By coherence, not(a : ⊤1) must then hold as well, and since ⊤1 ⪯ t2, this default falsity propagates upwards to not(a : t2), which accounts for the paraconsistency.
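The down division can be run on the finite lattices of Examples 10 and 11 as well. This sketch is again our own illustration under stated assumptions: ground programs with constant annotations, a hypothetical rule encoding as (head atom, head annotation, positive body, default body), and a hypothetical `Lattice` helper built from the covering pairs given in each example. It checks that coherence holds in Example 10, and that in Example 11 both a : t2 and not(a : t2) are derived (while not(a : ⊥) is not).

```python
class Lattice:
    """Hypothetical helper: a finite lattice from covering pairs, plus a negation map."""

    def __init__(self, elements, covers, neg):
        self.elements, self.neg = elements, neg
        self.order = {(x, x) for x in elements} | set(covers)
        while True:  # close the covering relation under transitivity
            extra = {(x, z) for (x, y) in self.order
                     for (y2, z) in self.order if y == y2} - self.order
            if not extra:
                break
            self.order |= extra

    def leq(self, x, y):
        return (x, y) in self.order

    def join(self, x, y):
        ubs = [z for z in self.elements if self.leq(x, z) and self.leq(y, z)]
        return next(z for z in ubs if all(self.leq(z, u) for u in ubs))


def lfp_tp(lat, rules, atoms):
    interp = {a: "bot" for a in atoms}
    while True:
        nxt = {a: "bot" for a in atoms}
        for head, mu, pos, _neg in rules:
            if all(lat.leq(m, interp[b]) for b, m in pos):
                nxt[head] = lat.join(nxt[head], mu)
        if nxt == interp:
            return interp
        interp = nxt


def divide(lat, program, atoms, interp, down):
    """Γ (down=False) or the down-division operator of Definition 10 (down=True)."""
    kept = []
    for head, mu, pos, neg in program:
        if any(lat.leq(m, interp[b]) for b, m in neg):
            continue
        if down and any(lat.leq(tau, interp[head]) for tau in lat.elements
                        if tau != "bot" and lat.leq(tau, lat.neg[mu])):
            continue
        kept.append((head, mu, pos, []))
    return lfp_tp(lat, kept, atoms)


def down_coherent_wfs(lat, program, atoms):
    t = {a: "bot" for a in atoms}
    while True:
        nxt = divide(lat, program, atoms, divide(lat, program, atoms, t, True), False)
        if nxt == t:
            u = divide(lat, program, atoms, t, True)
            true_lits = [f"{a}:{m}" for a in atoms for m in lat.elements
                         if lat.leq(m, t[a])]
            default_lits = [f"not {a}:{m}" for a in atoms for m in lat.elements
                            if not lat.leq(m, u[a])]
            return true_lits, default_lits
        t = nxt


# Example 10: the six-element lattice of Example 8 and its program P.
L6 = Lattice(["bot", "t1", "f1", "t2", "f2", "top"],
             [("bot", "t1"), ("bot", "f1"), ("t1", "t2"), ("f1", "f2"),
              ("t2", "top"), ("f2", "top")],
             {"bot": "bot", "t1": "f1", "f1": "t1",
              "t2": "f2", "f2": "t2", "top": "top"})
P10 = [("a", "t1", [], []),
       ("a", "f2", [("b", "t1")], []),
       ("b", "t1", [], [("b", "t1")])]
true10, default10 = down_coherent_wfs(L6, P10, ["a", "b"])
print(true10)  # ['a:bot', 'a:t1', 'b:bot']

# Example 11: the seven-element lattice and the single fact a : t2.
L7 = Lattice(["bot", "t1", "f1", "top1", "t2", "f2", "top2"],
             [("bot", "t1"), ("bot", "f1"), ("t1", "top1"), ("f1", "top1"),
              ("top1", "t2"), ("top1", "f2"), ("t2", "top2"), ("f2", "top2")],
             {"bot": "bot", "t1": "f1", "f1": "t1", "top1": "top1",
              "t2": "f2", "f2": "t2", "top2": "top2"})
true11, default11 = down_coherent_wfs(L7, [("a", "t2", [], [])], ["a"])
print("a:t2" in true11, "not a:t2" in default11)  # True True
```

In Example 11 the down test fires as soon as a reaches t2, because t1 lies below both t2 and ¬t2 = f2; Γ_ds M therefore collapses to ⊥ and every annotation above ⊥ becomes default-false.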
This approach to coherency may be termed strong, in that it dictates that if a literal is false to some degree, then it must be false for all higher degrees. In some cases strong coherency may be desirable, but not in others. We are currently working on a spectrum of annotated semantics where the intended degree of coherence can be tuned, e.g. by not requiring the propagation of falsity to all higher degrees. In particular, one may want undefinedness at a higher truth value not to be overridden by falsity at a weaker truth value. Thus, the introduction of coherence into annotated programs raises some non-trivial issues in the propagation of paraconsistency.
4 Embeddings
We next show how down coherent well-founded semantics extends several well-known semantics of logic and annotated programs. We assume the reader is acquainted with the syntax and definitions of the following semantics. An embedding of the well-founded annotated semantics into the down coherent well-founded semantics, over a complete truth-value lattice having an explicit negation operator, is given below. The rationale is to put two copies of the truth-value lattice side by side, merging their two bottom elements, and putting a new top element over both sub-lattices. The negation operator maps an element onto its corresponding element in the other lattice copy, and so provides the desired symmetry along the vertical axis.

Proposition 4 (Well-founded annotated semantics). Let P be a normal annotated logic program over the complete lattice (T, ⪯). Construct the new lattice (2T, ⪯₂) as follows. Let

$$2T = \{(\bot, \bot)\} \cup \{(f, \mu) \mid \mu \in T - \{\bot\}\} \cup \{(t, \mu) \mid \mu \in T - \{\bot\}\} \cup \{(\top, \top)\}$$

and let the ordering on 2T and the explicit negation operator ¬ be defined by:
– For every (µ, ϑ) ∈ 2T we have (⊥, ⊥) ⪯₂ (µ, ϑ), and ¬(⊥, ⊥) = (⊥, ⊥);
– For every (f, µ), (f, ϑ) ∈ 2T we have (f, µ) ⪯₂ (f, ϑ) iff µ ⪯ ϑ. Furthermore, ¬(f, µ) = (t, µ);
– For every (t, µ), (t, ϑ) ∈ 2T we have (t, µ) ⪯₂ (t, ϑ) iff µ ⪯ ϑ. Furthermore, ¬(t, µ) = (f, µ);
– For every (µ, ϑ) ∈ 2T we have (µ, ϑ) ⪯₂ (⊤, ⊤), and ¬(⊤, ⊤) = (⊤, ⊤).
Construct program P² over the new lattice from P by substituting every occurrence of the annotation ⊥ by (⊥, ⊥) and every other annotation µ by (t, µ). Then a literal is derived from program P over lattice T under the well-founded annotated semantics iff its corresponding literal, obtained by replacing its annotation by (⊥, ⊥) or (t, µ) respectively, is derived from program P² over lattice 2T under the down coherent well-founded semantics.

The technique of Prop. 4 should now be clear. Note that literals annotated with (⊤, ⊤) or (f, µ) never appear in P², and therefore objective literals annotated with those truth values are never derived from P².
Thus the extra default
annotated literals introduced in the down semi-normal program are never false. It is this fact that ensures that the fixpoint of Γ_P^T z_P^T over the lattice 2T coincides with that of Γ_P^T Γ_P^T over T, and so guarantees the validity of the embedding. The embedding requires the smallest addition of new truth values to the original lattice for which the embedding into down coherent well-founded semantics is valid. This is important, since it is desirable to keep the lattice as simple and as close as possible to the original one. Obviously, the lattice L2 of Ex. 1 provides an embedding of the Well-founded Semantics into the Well-founded Annotated Semantics. By resorting to the above result and letting T = L2, we obtain an embedding of the well-founded semantics into down coherent well-founded semantics. Notice that 2L2 = {(⊥, ⊥), (f, ⊤), (t, ⊤), (⊤, ⊤)}. Accordingly, the ordering relation is:
(⊥, ⊥) ⪯₂ (f, ⊤)
(⊥, ⊥) ⪯₂ (t, ⊤)
(f, ⊤) ⪯₂ (⊤, ⊤)
(t, ⊤) ⪯₂ (⊤, ⊤)
and the explicit negation operator behaves as follows:
¬(⊥, ⊥) = (⊥, ⊥)   ¬(f, ⊤) = (t, ⊤)   ¬(t, ⊤) = (f, ⊤)   ¬(⊤, ⊤) = (⊤, ⊤)
The reader may verify that (2L2, ⪯₂) is isomorphic to Belnap's logic FOUR. This justifies the following corollary:

Corollary 1 (Well-founded semantics). Let P be a normal logic program, and P^WFS the following normal annotated logic program over the lattice FOUR with the usual explicit negation operator:

$$P^{WFS} = \{\, A_0 : t \leftarrow A_1 : t \,\&\, \dots \,\&\, A_n : t \,\&\, \mathit{not}(B_1 : t) \,\&\, \dots \,\&\, \mathit{not}(B_m : t) \;\mid\; A_0 \leftarrow A_1, \dots, A_n, \mathit{not}\ B_1, \dots, \mathit{not}\ B_m \in P \,\}$$

Then A, respectively not A, belongs to the well-founded model of P iff A : t, respectively not(A : t), belongs to the down coherent well-founded model of P^WFS.

The transformation guarantees that all semi-normalization literals introduced in the semi-normal program transformation are of the form not(A : f). All rules in P^WFS have a head annotated with t, so neither ⊤ nor f is derivable, and therefore not(A : f) is always true. Thus, the Γ^FOUR Γ_s^FOUR alternating fixpoint construction coincides with the original Γ Γ construction of [8]. More importantly, the same construction can be used to extend an arbitrary lattice with an explicit negation operator, where it is possible to categorically state the truth or falsity of literals and have coherence enforced. A similar effect can be obtained with the lattice operator |, where T|T is the lattice 2T without the elements (f, ⊤) and (t, ⊤). This corresponds to merging the two top elements of the two lattice instances, as explained before. The lattice T|T is normally used when there is no need to distinguish between the veracity and falsity of the top
element of T; in most situations this corresponds to interpreting ⊤ in T already as "contradiction". An application of the previous techniques and results gives a natural embedding of the Paraconsistent Extended Well-founded Semantics [3,4] into down coherent well-founded annotated programs. The Paraconsistent Extended Well-founded Semantics (denoted by WFMp) is obtained from the Well-founded Annotated Semantics by using the complete set of truth values of lattice FOUR.

Proposition 5 (Paraconsistent extended well-founded semantics). [3] Consider the extended logic program P. Let P¬ be the normal annotated logic program over FOUR obtained from P by substituting every occurrence of ¬A by A : f, and of A by A : t. Let M be its coherent well-founded model. Then
– A belongs to WFMp(P) iff A : t belongs to M;
– ¬A belongs to WFMp(P) iff A : f belongs to M;
– not A belongs to WFMp(P) iff not(A : t) belongs to M;
– not ¬A belongs to WFMp(P) iff not(A : f) belongs to M.
If programs contain only literals annotated with ⊥ or t, we obtain the well-founded semantics (as expected from Prop. 4). Moreover, if programs contain no default negated literals, then the objective literals labeled with t in the down coherent well-founded model coincide with the ones true in the minimal Herbrand model of the corresponding definite program. We have thus shown how to move from generalized annotated logic programs or from extended logic programs to their natural paraconsistent and coherent well-founded based annotated semantics.
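The 2T construction of Prop. 4 is mechanical enough to script. The sketch below is our own illustration: elements of T are strings, with "bot" and "top" as hypothetical names for ⊥ and ⊤ of T, and elements of 2T are tagged pairs. It builds 2T with its order ⪯₂ and negation, and, for T = L2, checks that the negation satisfies Definition 5 and that (2L2, ⪯₂) is isomorphic to FOUR under the renaming (⊥,⊥) ↦ ⊥, (f,⊤) ↦ f, (t,⊤) ↦ t, (⊤,⊤) ↦ ⊤.

```python
from itertools import product


def double_lattice(elements, leq):
    """Return (2T, ⪯2, ¬) for a lattice T whose bottom is named "bot"."""
    inner = [m for m in elements if m != "bot"]
    bottom, top = ("bot", "bot"), ("top", "top")
    elems = [bottom] + [("f", m) for m in inner] + [("t", m) for m in inner] + [top]

    def leq2(x, y):
        if x == bottom or y == top:
            return True
        if x == top or y == bottom:
            return False
        # Both elements lie in a copy of T: comparable only within the same copy.
        return x[0] == y[0] and leq(x[1], y[1])

    def neg2(x):
        # Swap the two copies; (⊥, ⊥) and (⊤, ⊤) are fixed points.
        side, m = x
        return {"f": ("t", m), "t": ("f", m)}.get(side, x)

    return elems, leq2, neg2


# T = L2 = {⊥, ⊤}, ordered by ⊥ ⪯ ⊤.
elems, leq2, neg2 = double_lattice(["bot", "top"], lambda x, y: x == y or x == "bot")
print(elems)  # [('bot', 'bot'), ('f', 'top'), ('t', 'top'), ('top', 'top')]

# Definition 5 for the constructed negation.
involutive = all(neg2(neg2(x)) == x for x in elems)
monotone = all(leq2(neg2(x), neg2(y))
               for x, y in product(elems, repeat=2) if leq2(x, y))

# Same order and same negation as FOUR under the renaming phi.
FOUR_LEQ = {(x, x) for x in ["bot", "t", "f", "top"]} | {
    ("bot", "t"), ("bot", "f"), ("bot", "top"), ("t", "top"), ("f", "top")}
FOUR_NEG = {"bot": "bot", "t": "f", "f": "t", "top": "top"}
phi = {("bot", "bot"): "bot", ("f", "top"): "f",
       ("t", "top"): "t", ("top", "top"): "top"}
iso = all(leq2(x, y) == ((phi[x], phi[y]) in FOUR_LEQ)
          for x, y in product(elems, repeat=2)) and all(
    phi[neg2(x)] == FOUR_NEG[phi[x]] for x in elems)
print(involutive, monotone, iso)  # True True True
```

The same function applied to FOUR itself yields a lattice with 1 + 3 + 3 + 1 = 8 elements.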
5 Conclusions
The theorems in the previous section show that coherent well-founded annotated programs incorporate both extended and annotated logic programs. The practical importance of such a combination has been indicated by several of the examples. Example 3 uses annotations to formulate probabilistic data mining rules; Example 4 uses explicit negation to represent a contradiction in information arising from different sources; and Example 5 uses default negation as an instance of default reasoning based on probabilistic rules. Thus, annotations, explicit negation, and default negation are all required together for this deductive database example. The generality and practical applicability of coherent well-founded annotated programs with strong coherency as described so far indicate that their efficient implementation is a worthwhile task; the simplicity of their fixpoint definition suggests that they can be implemented by extending a system that computes the well-founded semantics, such as XSB [7], a task which is now underway.
Acknowledgements
We thank PRAXIS XXI project MENTAL (Mental Agents Architecture in Logic) and FLAD-NSF project REAP for their support. This work was also partially supported by NSF grants CCR-9702581, EIA-97-5998, and INT-96-00598. We also thank José Alferes for his helpful comments.
References
1. S. Adali and V. S. Subrahmanian. Amalgamating knowledge bases, III: Algorithms, data structures, and query processing. J. of Logic Programming, 28(1):45–88, 1996.
2. J. J. Alferes and L. M. Pereira. Reasoning with Logic Programming. LNAI volume 1111, Springer-Verlag, 1996.
3. Carlos Viegas Damásio. Paraconsistent Extended Logic Programming with Constraints. PhD thesis, Universidade Nova de Lisboa, October 1996.
4. Carlos Viegas Damásio and Luís Moniz Pereira. A survey of paraconsistent semantics for logic programs. In D. Gabbay and P. Smets, editors, Handbook of Defeasible Reasoning and Uncertainty Management Systems, volume 2, pages 241–320. Kluwer, 1998.
5. A. Dekhtyar and V. S. Subrahmanian. Hybrid probabilistic programs. In International Conference on Logic Programming 1997, pages 391–495, 1997.
6. M. Van Emden and R. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976.
7. J. Freire, P. Rao, K. Sagonas, T. Swift, and D. S. Warren. XSB: A system for efficiently computing the well-founded semantics. In Fourth LPNMR, pages 430–440, 1997.
8. A. Van Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. Journal of the ACM, 38(3):620–650, 1991.
9. M. Gelfond and V. Lifschitz. Logic programs with classical negation. In Warren and Szeredi, editors, 7th International Conference on Logic Programming, pages 579–597. MIT Press, 1990.
10. M. Kifer and E. Lozinskii. A logic for reasoning with inconsistency. J. of Automated Reasoning, 8:179–215, 1992.
11. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. of Logic Programming, 12:335–367, 1992.
12. C. Sakama and K. Inoue. Paraconsistent Stable Semantics for Extended Disjunctive Programs. J. of Logic and Computation, 5(3):265–285, 1995.
13. V. S. Subrahmanian. Amalgamating knowledge bases. ACM Transactions on Database Systems, 19(2):291–331, 1994.
Many-Valued Disjunctive Logic Programs with Probabilistic Semantics
Thomas Lukasiewicz
Institut für Informationssysteme, Technische Universität Wien
Treitlstraße 3, A-1040 Wien, Austria
Abstract. We present many-valued disjunctive logic programs in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. Interestingly, these many-valued disjunctive logic programs have both a probabilistic semantics in probabilities over possible worlds and a truth-functional semantics. We then define minimal, perfect, and stable models and show that they have the same properties as their classical counterparts. In particular, perfect and stable models are always minimal models. Under local stratification, the perfect model semantics coincides with the stable model semantics. Finally, we show that some special cases of propositional many-valued disjunctive logic programming under minimal, perfect, and stable model semantics have the same complexity as their classical counterparts.
1 Introduction
In the logic programming framework, there exist at least two main streams in handling uncertain knowledge. Many-valued and probabilistic logic programming aims to handle numerical uncertainty, whereas disjunctive logic programming deals with disjunctive knowledge and nonmonotonic negation. In this paper, we propose a combination of both of them in a uniform framework. This paper relies on probability theory as a commonly accepted formalism for handling numerical uncertainty. Probabilistic propositional logics and related languages are thoroughly studied in the literature (see especially [26] and [7]). Their extensions to probabilistic first-order logics can be classified into first-order logics in which probabilities are defined over a set of possible worlds and those in which probabilities are given over the domain (see especially [2] and [9]). The first ones are suitable for representing degrees of belief, while the latter are appropriate for describing statistical knowledge. In the present paper, we assume that probabilities are defined over a set of possible worlds. Probabilistic reasoning in its full generality is quite a tricky task and very different from classical reasoning (see especially [19], [15], and [14]). It should generally be performed by global linear programming methods, rather than by local inference techniques. For this reason, it is generally also computationally more complex than classical reasoning.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 277–289, 1999. © Springer-Verlag Berlin Heidelberg 1999
T. Lukasiewicz
In particular, the model and fixpoint characterization and the proof theory of classical definite logic programming generally do not carry over to probabilistic definite logic programming (as presented in [14]). Moreover, the tractability of special cases of classical logic programming generally does not carry over to the corresponding special cases of probabilistic logic programming. However, we would like an approach to many-valued disjunctive logic programming that does not ignore the years of work in classical disjunctive logic programming. Furthermore, it would be nice if query processing in many-valued disjunctive logic programs were not computationally more complex than query processing in classical disjunctive logic programs. The key to achieving all this is to extend the axioms of probability by an axiom that brings probabilistic logics closer to truth-functional logics [17]. In detail, our many-valued disjunctive logic programs have a probabilistic semantics in probabilities over possible worlds. Furthermore, the truth values of all clauses are truth-functionally defined on the truth values of atoms. We showed in [17] and [18] that many-valued definite logic programming with this probabilistic semantics has a model and fixpoint characterization and a proof theory similar to classical definite logic programming. Moreover, special cases of many-valued logic programming with this semantics were shown to have the same computational complexity as their classical counterparts. Many-valued definite logic programming with this probabilistic semantics has an important companion in the literature. More precisely, van Emden’s quantitative deduction [31] can be given a probabilistic semantics by probabilities over possible worlds under the additional axiom. However, van Emden’s quantitative deduction is based on a conditional probability semantics of the implication connective, while [17], [18], and the present paper use the material implication semantics.
Interestingly, it turns out that the material implication is much closer to classical logic programming. In particular, the material implication is more suitable for additionally handling disjunction and nonmonotonic negation. It is also important to point out that both many-valued definite logic programming with probabilistic semantics and van Emden’s quantitative deduction are approximations of probabilistic logic programming. More precisely, our approach is an approximation of probabilistic logic programming under the material implication [18], while van Emden’s quantitative deduction can be understood as an approximation of probabilistic logic programming under the conditional probability implication (as defined in [14]). The literature contains many other approaches to many-valued logic programming (see, for example, [11], [31], [3], [8], and [20]) and probabilistic logic programming (see, for example, [23], [27], [24], [25], [4], [14], and [22]). To our knowledge, this paper is the first to integrate numerical uncertainty in the form of probabilities over possible worlds, disjunction, and nonmonotonic negation in a uniform framework close to classical disjunctive logic programming. The work closest in spirit to this paper is perhaps the one by Mateis [20]. It also combines numerical uncertainty, disjunctive knowledge, and nonmonotonic negation. Its uncertainty formalism, however, is based on t-norms and not on probabilities over possible worlds. Ngo [25] also combines numerical uncertainty and disjunction. However, he does not consider nonmonotonic negation. Moreover, he does not allow numerical uncertainty on the rule level. Furthermore, his approach is closer to Bayesian networks than to classical disjunctive logic programming. Finally, Ng and Subrahmanian [23] also deal with the combination of numerical uncertainty, disjunctive knowledge, and nonmonotonic negation. However, they also do not allow numerical uncertainty on the rule level. Moreover, their work is perhaps best described as logic programming under nonmonotonic negation about probabilistic disjunctions and conjunctions of atoms. The main contributions of this paper can be summarized as follows:
• We present many-valued disjunctive logic programs in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. These programs have both a probabilistic semantics in probabilities over possible worlds and a truth-functional semantics.
• We define minimal, perfect, and stable models and show that they have the same properties as their classical counterparts. In particular, perfect and stable models are always minimal. Furthermore, under local stratification, the perfect model semantics coincides with the stable model semantics.
• We show that the problems of deciding whether a ground program has a minimal, perfect, or stable model have the same complexity as their classical counterparts. Moreover, we show that some special cases of propositional query processing under minimal, perfect, and stable model semantics have the same complexity as their classical counterparts.
The rest of this paper is organized as follows. In Section 2, we describe the technical background in probabilistic first-order logics over possible worlds.
Sections 3 and 4 introduce many-valued disjunctive logic programs. In Section 5, we focus on their minimal, perfect, and stable models. Section 6 concentrates on the complexity of many-valued disjunctive logic programming. In Section 7, we summarize the main results and give an outlook on future research. Note that all proofs are given in full detail in [16].
2 Technical Preliminaries
In this section, we focus on the technical background. We briefly describe first-order logics of probability and their semantics in Pr- and Pr⋆-interpretations.

2.1 Pr-Interpretations
We now briefly summarize how (a quantifier-free fragment of) classical first-order logics can be given a probabilistic semantics in which probabilities are defined
over a set of possible worlds. We basically follow the work by Halpern [9], which we adapt to our needs in the logic programming framework. Let Φ be a first-order vocabulary that contains a set of function symbols and a set of predicate symbols (as usual, constant symbols are function symbols of arity zero). Let X be a set of variables. We define terms by induction as follows. A term is a variable from X or an expression f(t1, ..., tk), where f is a function symbol of arity k ≥ 0 from Φ and t1, ..., tk are terms. We define classical formulas by induction as follows. If p is a predicate symbol of arity k ≥ 0 from Φ and t1, ..., tk are terms, then p(t1, ..., tk) is a classical formula (called atom). If F and G are classical formulas, then ¬F and (F ∧ G) are classical formulas. A probabilistic formula is an expression prob(F) ≥ c, where F is a classical formula and c is a real number from [0, 1]. We abbreviate (F ∨ G) and (F ← G) by ¬(¬F ∧ ¬G) and ¬(¬F ∧ G), respectively. We adopt the usual conventions to eliminate parentheses in combination with these abbreviations. Literals, positive literals, and negative literals are defined as usual. Terms, classical formulas, and probabilistic formulas are ground iff they do not contain any variables. The notions of substitutions, ground substitutions, and ground instances of classical formulas are defined as usual. The latter is assumed to be canonically extended to probabilistic formulas. An interpretation I is a subset of the Herbrand base HB_Φ over Φ. A variable assignment σ is a mapping that assigns to each variable from X an element from the Herbrand universe HU_Φ over Φ. It is by induction extended to terms by σ(f(t1, ..., tk)) = f(σ(t1), ..., σ(tk)) for all terms f(t1, ..., tk). The truth of classical formulas F in I under σ, denoted I |=σ F, is inductively defined as follows (we write I |= F if F is ground):
• I |=σ p(t1, ..., tk) iff p(σ(t1), ..., σ(tk)) ∈ I.
• I |=σ ¬F iff not I |=σ F, and I |=σ (F ∧ G) iff I |=σ F and I |=σ G.
A probabilistic interpretation (Pr-interpretation) p = (ℐ, µ) consists of a set ℐ of classical interpretations (called possible worlds) and a discrete probability function µ on ℐ (that is, a mapping µ from ℐ to the real interval [0, 1] such that all µ(I) with I ∈ ℐ sum up to 1 and the number of all I ∈ ℐ with µ(I) > 0 is countable). The truth value p_σ(F) of a formula F in the Pr-interpretation p under a variable assignment σ is defined by (we write p(F) if F is ground):

$$p_\sigma(F) = \sum_{I \in \mathcal{I},\ I \models_\sigma F} \mu(I). \qquad (1)$$
A probabilistic formula prob(F) ≥ c is true in p under σ iff p_σ(F) ≥ c. The formula prob(F) ≥ c is true in p, or p is a model of prob(F) ≥ c, denoted p ⊨ prob(F) ≥ c, iff prob(F) ≥ c is true in p under all variable assignments σ. The Pr-interpretation p is a model of a set of probabilistic formulas P, denoted p ⊨ P, iff p is a model of all probabilistic formulas in P. The set of probabilistic formulas P is satisfiable iff a model of P exists. The formula prob(F) ≥ c is a tight logical consequence of P, denoted P ⊨_tight prob(F) ≥ c, iff c is the infimum of p_σ(F) subject to all models p of P and all variable assignments σ.
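To make equation (1) concrete, here is a minimal Python sketch for the ground case over a finite set of possible worlds. The encoding is our own assumption, not the paper's: atoms are strings, compound formulas are nested tuples ("not", F) and ("and", F, G), a world is a frozenset of the atoms true in it, and µ is a dict from worlds to exact fractions. The helper names lor and implied_by are hypothetical; they expand the two abbreviations defined above.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: an atom is a string; compound formulas are
# ("not", F) or ("and", F, G), mirroring the inductive definition.
Formula = Union[str, Tuple]
World = FrozenSet[str]  # a possible world: the set of ground atoms true in it


def holds(world: World, f: Formula) -> bool:
    """Classical truth I |= F of a ground formula F in one possible world I."""
    if isinstance(f, str):
        return f in world
    if f[0] == "not":
        return not holds(world, f[1])
    if f[0] == "and":
        return holds(world, f[1]) and holds(world, f[2])
    raise ValueError(f"unknown connective: {f[0]!r}")


def lor(f: Formula, g: Formula) -> Formula:
    """(F v G), abbreviated as not(not F and not G)."""
    return ("not", ("and", ("not", f), ("not", g)))


def implied_by(f: Formula, g: Formula) -> Formula:
    """(F <- G), abbreviated as not(not F and G)."""
    return ("not", ("and", ("not", f), g))


def prob(mu: Dict[World, Fraction], f: Formula) -> Fraction:
    """Equation (1): p(F) is the total mass of the worlds I with I |= F."""
    return sum((w for world, w in mu.items() if holds(world, f)), Fraction(0))
```

For instance, if µ gives mass 1/4 to ∅, 1/4 to {a}, and 1/2 to {a, b}, then prob returns 3/4 for a and 1/2 for a ∧ b.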
Many-Valued Disjunctive Logic Programs with Probabilistic Semantics
For Pr-interpretations p = (𝓘, µ) with µ(I) = 1 for some possible world I ∈ 𝓘, we use classical(p) to denote this I. For a set of probabilistic formulas P, we use classical(P) to denote the set of all F with prob(F) ≥ 1 ∈ P.

2.2
Pr⋆-Interpretations
We now define Pr⋆-interpretations by restricting Pr-interpretations (that is, by assuming another axiom besides the axioms of probability): A Pr⋆-interpretation is a Pr-interpretation p with

$$p(A \wedge B) = \min(p(A), p(B)) \quad \text{for all } A, B \in HB_\Phi \qquad (2)$$
Note that the condition p(A ∧ B) = min(p(A), p(B)) is assumed only for ground atoms A and B. This condition brings probabilistic logics over possible worlds closer to truth-functional logics. It is important to point out that we do not assume that (2) always holds in the part of the real world that we want to model. The axiom (2) is simply a technical assumption that carries us to a form of many-valued logic programming that approximates probabilistic logic programming (see Section 5.1). It makes a global probabilistic semantics over possible worlds match the truth-functionality that stands behind logic programming techniques. Interestingly, the axiom (2) is equivalent to the assumption of a subset relationship between possible worlds, as follows.

Theorem 1. Let p = (𝓘, µ) be a Pr-interpretation. Let 𝓘⁺ = {I ∈ 𝓘 | µ(I) > 0} and, for all ground atoms A, let 𝓘⁺(A) = {I ∈ 𝓘⁺ | I ⊨ A}. Then the condition (2) is equivalent to each of the following conditions (3) and (4):

$$\mathcal{I}^+(A) \subseteq \mathcal{I}^+(B) \ \text{ or } \ \mathcal{I}^+(A) \supseteq \mathcal{I}^+(B) \quad \text{for all } A, B \in HB_\Phi \qquad (3)$$

$$I_1 \subseteq I_2 \ \text{ or } \ I_1 \supseteq I_2 \quad \text{for all } I_1, I_2 \in \mathcal{I}^+ \qquad (4)$$
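Theorem 1 can be checked mechanically on a finite example. The sketch below is our own illustration, assuming finitely many worlds and atoms, with worlds encoded as frozensets of atom names and µ as a dict of exact fractions; the function names are hypothetical. It tests condition (2) on all pairs of distinct atoms and condition (4) on the worlds of positive probability, so by the theorem the two tests should agree on every input.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

World = FrozenSet[str]      # a possible world: the ground atoms true in it
Mu = Dict[World, Fraction]  # discrete probability function on finitely many worlds


def prob_of(mu: Mu, *atoms: str) -> Fraction:
    """p(A1 and ... and Ak): total mass of the worlds containing all given atoms."""
    need = set(atoms)
    return sum((w for world, w in mu.items() if need <= world), Fraction(0))


def satisfies_axiom_2(mu: Mu, atoms: Iterable[str]) -> bool:
    """Condition (2): p(A and B) = min(p(A), p(B)) for every pair of atoms."""
    return all(prob_of(mu, a, b) == min(prob_of(mu, a), prob_of(mu, b))
               for a, b in combinations(atoms, 2))


def worlds_form_chain(mu: Mu) -> bool:
    """Condition (4): any two worlds of positive probability are comparable by inclusion."""
    positive = [world for world, w in mu.items() if w > 0]
    return all(i1 <= i2 or i2 <= i1 for i1, i2 in combinations(positive, 2))
```

A nested family such as ∅ ⊂ {a} ⊂ {a, b} passes both tests, while two incomparable worlds {a} and {b} with mass 1/2 each fail both, since then p(a ∧ b) = 0 < 1/2 = min(p(a), p(b)). Worlds of probability 0 are ignored by (4), exactly as in the definition of 𝓘⁺.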
The next theorem shows that the truth value of certain ground formulas under Pr⋆-interpretations is truth-functionally defined on the truth values of their components. In particular, the truth value of all ground classical clauses is truth-functionally defined on the truth values of their ground atoms. Note that the truth functions are the same as in the nondenumerable infinite-valued Lukasiewicz logic L_ℵ1 (see [30] for a survey).

Theorem 2. For all Pr⋆-interpretations p = (𝓘, µ), all ground classical formulas F, and all ground classical formulas G and H that are built without the logical connectives ¬ and ←:

$$p(\neg F) = 1 - p(F) \qquad (5)$$

$$p(G \wedge H) = \min(p(G), p(H)) \qquad (6)$$

$$p(G \vee H) = \max(p(G), p(H)) \qquad (7)$$

$$p(G \leftarrow H) = \min(1,\; p(G) - p(H) + 1) \qquad (8)$$
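The truth functions (5)–(8) can be compared directly with the possible-worlds definition (1). In this sketch (our own encoding; ("impl", G, H) stands for G ← H, and all names are hypothetical), by_worlds sums µ over the satisfying worlds, while by_truth_functions uses only the atom values. For a Pr⋆-interpretation, i.e. nested worlds, the two should agree on formulas of the shapes covered by Theorem 2.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: atoms are strings; ("not", F), ("and", G, H),
# ("or", G, H) and ("impl", G, H), where the last one stands for G <- H.
Formula = Union[str, Tuple]
Mu = Dict[FrozenSet[str], Fraction]


def holds(world: FrozenSet[str], f: Formula) -> bool:
    """Classical truth of a ground formula in one possible world."""
    if isinstance(f, str):
        return f in world
    op = f[0]
    if op == "not":
        return not holds(world, f[1])
    g, h = holds(world, f[1]), holds(world, f[2])
    if op == "and":
        return g and h
    if op == "or":
        return g or h
    if op == "impl":  # G <- H fails only when H holds and G does not
        return g or not h
    raise ValueError(f"unknown connective: {op!r}")


def by_worlds(mu: Mu, f: Formula) -> Fraction:
    """Equation (1): the mass of the worlds in which f holds."""
    return sum((w for world, w in mu.items() if holds(world, f)), Fraction(0))


def by_truth_functions(val: Dict[str, Fraction], f: Formula) -> Fraction:
    """Equations (5)-(8): evaluation from the atom values alone."""
    if isinstance(f, str):
        return val[f]
    op = f[0]
    if op == "not":
        return 1 - by_truth_functions(val, f[1])            # (5)
    g, h = by_truth_functions(val, f[1]), by_truth_functions(val, f[2])
    if op == "and":
        return min(g, h)                                    # (6)
    if op == "or":
        return max(g, h)                                    # (7)
    if op == "impl":
        return min(Fraction(1), g - h + 1)                  # (8)
    raise ValueError(f"unknown connective: {op!r}")
```

With incomparable worlds, which violate (2), the two evaluations can differ: for µ({a}) = µ({b}) = 1/2 the world sum gives p(a ∧ b) = 0, while the minimum gives 1/2.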
T. Lukasiewicz
The following theorem shows that Pr⋆-interpretations give a natural probabilistic semantics to van Emden's quantitative deduction [31], in which the implication connective is interpreted as conditional probability (note that this result implies that van Emden's quantitative deduction is an approximation of probabilistic logic programming under the conditional probability implication).

Theorem 3. For all Pr⋆-interpretations p, all real numbers c ∈ [0, 1], and all ground atoms H, B1, ..., Bk with k ≥ 0: p(H) ≥ c · min(p(B1), ..., p(Bk)) iff p(B1 ∧ ··· ∧ Bk) = 0 or p(H | B1 ∧ ··· ∧ Bk) ≥ c.

Note that for p(B1 ∧ ··· ∧ Bk) > 0, the expression p(H | B1 ∧ ··· ∧ Bk) is defined as p(H ∧ B1 ∧ ··· ∧ Bk) / p(B1 ∧ ··· ∧ Bk). Note also that for k = 0, we naturally define both min(p(B1), ..., p(Bk)) and p(B1 ∧ ··· ∧ Bk) as 1. Finally, we show that Pr⋆-interpretations are already uniquely determined by the truth values they give to all ground atoms:

Theorem 4. Let p = (𝓘, µ) be a Pr⋆-interpretation with µ(I) > 0 for all I ∈ 𝓘. Then p is uniquely determined by all pairs (A, p(A)) with A ∈ HB_Φ.

Hence, Pr⋆-interpretations can be identified with mappings from HB_Φ to the real interval [0, 1]. Since such mappings can also be viewed as fuzzy sets, we get the following natural subset relation on Pr⋆-interpretations. For Pr⋆-interpretations p and q, we say p is a subset of q, denoted p ⊆ q, iff p(A) ≤ q(A) for all A ∈ HB_Φ. We use p ⊂ q as an abbreviation for p ⊆ q and p ≠ q. For sets of probabilistic formulas P and probabilistic formulas prob(F) ≥ c, we write P ⊨⋆_tight prob(F) ≥ c iff c is the infimum of p_σ(F) subject to all Pr⋆-interpretations p that are models of P and all variable assignments σ.
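Theorem 4 suggests identifying a Pr⋆-interpretation with its atom values. The following sketch builds one such interpretation from given values by stacking nested worlds (the level sets {A | p(A) ≥ t}), so that the chain condition (4) holds by construction; this construction is our own illustration, not taken from the paper, and all names are hypothetical. It also evaluates both sides of Theorem 3 so they can be compared for a given threshold c.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable

World = FrozenSet[str]


def nested_worlds(val: Dict[str, Fraction]) -> Dict[World, Fraction]:
    """A Pr*-interpretation whose atom probabilities are `val` (cf. Theorem 4).

    Thresholds 1 = t0 > t1 > ... > tk = 0 are the distinct atom values plus 0 and 1;
    the world {A | val[A] >= tj} receives the mass tj - t(j+1), so the worlds form
    a chain and the masses telescope to p(A) = val[A]."""
    levels = sorted(set(val.values()) | {Fraction(0), Fraction(1)}, reverse=True)
    mu: Dict[World, Fraction] = {}
    for hi, lo in zip(levels, levels[1:]):
        world = frozenset(a for a, v in val.items() if v >= hi)
        mu[world] = mu.get(world, Fraction(0)) + (hi - lo)
    return mu


def prob_all(mu: Dict[World, Fraction], atoms: Iterable[str]) -> Fraction:
    """p(A1 and ... and Ak); the empty conjunction gets probability 1."""
    need = set(atoms)
    return sum((w for world, w in mu.items() if need <= world), Fraction(0))


def theorem3_sides(mu, head, body, c):
    """(left, right) of Theorem 3 for ground atoms head, body[0], ..., body[k-1]."""
    m = min((prob_all(mu, [b]) for b in body), default=Fraction(1))
    left = prob_all(mu, [head]) >= c * m
    p_body = prob_all(mu, body)
    right = p_body == 0 or prob_all(mu, [head, *body]) / p_body >= c
    return left, right
```

For instance, with p(h) = 3/10, p(b1) = 3/5, and p(b2) = 1/2, both sides of Theorem 3 hold exactly for c ≤ 3/5, since p(h | b1 ∧ b2) = (3/10)/(1/2) = 3/5.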
3
Many-Valued Disjunctive Logic Programs
We are now ready to define many-valued disjunctive logic programs. We start by defining many-valued disjunctive logic program clauses, which are special probabilistic formulas that are interpreted under Pr⋆-interpretations: A many-valued disjunctive logic program clause (or simply clause) is a probabilistic formula of the following kind:

prob(A1 ∨ ··· ∨ Al ← B1 ∧ ··· ∧ Bm ∧ ¬C1 ∧ ··· ∧ ¬Cn) ≥ c,

where A1, ..., Al, B1, ..., Bm, C1, ..., Cn are atoms, l, m, n ≥ 0, and c ∈ [0, 1] is rational. It is abbreviated by (A1 ∨ ··· ∨ Al ← B1, ..., Bm, ¬C1, ..., ¬Cn)[c, 1]. We call A1 ∨ ··· ∨ Al its head, B1, ..., Bm, ¬C1, ..., ¬Cn its body, and c its truth value. Such a clause is called an integrity clause iff l = 0, a fact iff l > 0 and m + n = 0, and a rule iff l > 0 and m + n > 0. A many-valued disjunctive logic program (or simply program) is a finite set of clauses.
Given a program P, we identify Φ with the vocabulary Φ(P) that consists of all the function and predicate symbols in P. We use HB_P to denote the Herbrand base over Φ(P). We use ground(P) to denote the set of all ground instances of clauses from P with respect to Φ(P).

Given a program P, we do not need all the real numbers in [0, 1] to characterize the semantics of P. More precisely, the least set of equidistant rational numbers from [0, 1] that contains 0, 1, and all the rational numbers occurring in P is sufficient (see Theorem 7). Hence, we define the set of truth values of P, denoted TV(P), as the least set of rational numbers {0/(n−1), 1/(n−1), ..., (n−1)/(n−1)}, where n ≥ 2 is a natural number, that contains all the rational numbers occurring in P. The program P is n-valued iff |TV(P)| = n.

Crucially, the truth value of all ground clauses under Pr⋆-interpretations is truth-functionally defined on the truth values of their ground atoms:

Theorem 5. A ground clause (A1 ∨ ··· ∨ Al ← B1, ..., Bm, ¬C1, ..., ¬Cn)[c, 1] is true in a Pr⋆-interpretation p iff the following condition holds:

max(p(A1), ..., p(Al), p(C1), ..., p(Cn)) ≥ c − 1 + min(p(B1), ..., p(Bm)).

Note that the maximum and the minimum of an empty list of arguments are canonically defined as 0 and 1, respectively.

We finally define queries and their correct and tight answers: A many-valued query (or simply query) is an expression ∃(F)[t, 1], where F is a ground classical formula and t is a variable or a rational number from [0, 1]. We call the query ∃(G)[t, 1] a positive query and the query ∃(¬G)[t, 1] a negative query if G is built without the logical connectives ¬ and ←. Given the queries ∃(F)[c, 1] and ∃(F)[x, 1] to a program P, where c ∈ [0, 1] and x ∈ X, we define their desired semantics in terms of correct and tight answers with respect to a set M(P) of models of P as follows.
The correct answer for ∃(F)[c, 1] to P under M(P) is Yes if c ≤ inf{p(F) | p ∈ M(P)} and No otherwise. The tight answer for ∃(F)[x, 1] to P under M(P) is the substitution θ = {x/d}, where d = inf{p(F) | p ∈ M(P)}. Many-valued query processing generalizes classical cautious inference:

Theorem 6. Let P be a 2-valued program and let M(P) be a set of models (𝓘, µ) of P with µ(𝓘) ⊆ {0, 1}. The correct answer for the query ∃(F)[1, 1] to P under M(P) is Yes iff F is true in all models from classical(M(P)).
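For a finite, non-empty set M(P) of models, the infimum in the definitions above is a minimum, so correct and tight answers are straightforward to compute. A minimal sketch, assuming each Pr⋆-interpretation is given by its atom values (as Theorem 4 permits) and evaluating positive and negative queries with the truth functions (5)–(7); the tuple encoding and function names are hypothetical, and inner negations are rejected because (6) and (7) only cover formulas without ¬ and ←.

```python
from fractions import Fraction
from typing import Dict, Iterable, Tuple, Union

Interp = Dict[str, Fraction]  # a Pr*-interpretation, given by its atom values
Query = Union[str, Tuple]     # atom, ("and", G, H), ("or", G, H), or top-level ("not", G)


def positive_value(p: Interp, g: Query) -> Fraction:
    """p(G) for G built from atoms with "and"/"or" only, via (6) and (7)."""
    if isinstance(g, str):
        return p.get(g, Fraction(0))  # unmentioned atoms have value 0
    if g[0] in ("and", "or") and len(g) == 3:
        values = (positive_value(p, g[1]), positive_value(p, g[2]))
        return min(values) if g[0] == "and" else max(values)
    raise ValueError(f"not a positive query: {g!r}")


def query_value(p: Interp, f: Query) -> Fraction:
    """Positive query G, or negative query ("not", G) evaluated with (5)."""
    if isinstance(f, tuple) and f[0] == "not":
        return 1 - positive_value(p, f[1])
    return positive_value(p, f)


def tight_answer(models: Iterable[Interp], f: Query) -> Fraction:
    """The d in {x/d}: inf{p(F) | p in M(P)}, a minimum for finite non-empty M(P)."""
    return min(query_value(p, f) for p in models)


def correct_answer(models: Iterable[Interp], f: Query, c: Fraction) -> str:
    """Yes iff c <= inf{p(F) | p in M(P)}."""
    return "Yes" if c <= tight_answer(models, f) else "No"
```

With M(P) = {a ↦ 1/2, b ↦ 3/10} and {a ↦ 1/5, b ↦ 9/10}, the tight answer for ∃(a ∨ b)[x, 1] is {x/1/2}, so the correct answer for ∃(a ∨ b)[0.5, 1] is Yes and for ∃(a ∨ b)[0.6, 1] it is No.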
4
Example
Assume that we have the following knowledge about roads and the reachability of places through roads: The probability that the road r is closed or that the road s is closed is at least 0.5. The probability that r connects the place a with the place b is at least 0.8. The probability that s connects b with c is at least 0.7. The probability that we can reach Y through X if there is a
road from X to Y that is not closed is at least 0.9. The probability that we can reach Z through X if we can reach Z through Y and Y through X is at least 0.9. This knowledge can be expressed by the following program P (r, s, a, b, and c are constant symbols and R, X, Y, and Z are variables):

P = {(closed(r) ∨ closed(s) ←)[0.5, 1], (road(r, a, b) ←)[0.8, 1], (road(s, b, c) ←)[0.7, 1], (reach(X, Y) ← road(R, X, Y), ¬closed(R))[0.9, 1], (reach(X, Z) ← reach(X, Y), reach(Y, Z))[0.9, 1]}.

We may ask for the tight lower bound of the probability that we can reach c through a. This can be expressed by the query ∃(reach(a, c))[U, 1], where U is a variable. To give its tight answer, we must specify a set of models of P. Some models p1, p2, p3, and p4 of P are shown in Table 1 (we assume that pi(A) = 0 for all ground atoms A that are not mentioned). The tight answer for ∃(reach(a, c))[U, 1] to P under {p1, p2, p3, p4} is given by {U/0}, whereas the tight answer for ∃(reach(a, c))[U, 1] to P under {p1, p2} is given by {U/0.5}. Hence, as far as the query ∃(reach(a, c))[U, 1] is concerned, {p1, p2} seems to describe the intended meaning of P better than {p1, p2, p3, p4}.

Table 1. Some models of the program P

| | closed(r) | closed(s) | road(r, a, b) | road(s, b, c) | reach(a, b) | reach(b, c) | reach(a, c) |
|---|---|---|---|---|---|---|---|
| p1 | 0.5 | 0 | 0.8 | 0.7 | 0.7 | 0.6 | 0.5 |
| p2 | 0 | 0.5 | 0.8 | 0.7 | 0.7 | 0.6 | 0.5 |
| p3 | 0 | 0.6 | 0.8 | 0.7 | 0.7 | 0 | 0 |
| p4 | 0 | 0.7 | 0.8 | 0.7 | 0 | 0 | 0 |
The models p1, p2, p3, and p4 are some minimal models of P (with respect to the subset relation defined in Section 2.2), whereas the models p1 and p2 are the only perfect and stable models of the locally stratified program P. We will introduce these notions in the following section.
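The claims about Table 1 can be replayed with Theorem 5. This sketch is our own: the string encoding of ground atoms and the helper names are assumptions. It grounds P over the constants r, s, a, b, c, checks the rows p1, p2, and p3 against every ground clause, and computes tight answers for ∃(reach(a, c))[U, 1]. Exact fractions are used because floating point could misjudge boundary cases such as 0.9 − 1 + 0.8 ≥ 0.7.

```python
from fractions import Fraction
from itertools import product

CONSTANTS = ["r", "s", "a", "b", "c"]


def ground_program():
    """ground(P) for the road program; a clause is (heads, pos_body, neg_body, c)."""
    clauses = [
        (["closed(r)", "closed(s)"], [], [], Fraction("0.5")),
        (["road(r,a,b)"], [], [], Fraction("0.8")),
        (["road(s,b,c)"], [], [], Fraction("0.7")),
    ]
    for r, x, y in product(CONSTANTS, repeat=3):  # reach(X,Y) <- road(R,X,Y), not closed(R)
        clauses.append(([f"reach({x},{y})"], [f"road({r},{x},{y})"], [f"closed({r})"], Fraction("0.9")))
    for x, y, z in product(CONSTANTS, repeat=3):  # reach(X,Z) <- reach(X,Y), reach(Y,Z)
        clauses.append(([f"reach({x},{z})"], [f"reach({x},{y})", f"reach({y},{z})"], [], Fraction("0.9")))
    return clauses


def clause_true(p, clause):
    """Theorem 5; ground atoms not listed in p have value 0."""
    heads, pos, neg, c = clause
    lhs = max((p.get(x, Fraction(0)) for x in heads + neg), default=Fraction(0))
    rhs = c - 1 + min((p.get(x, Fraction(0)) for x in pos), default=Fraction(1))
    return lhs >= rhs


def is_model(p):
    return all(clause_true(p, cl) for cl in ground_program())


def interp(values):
    return {atom: Fraction(v) for atom, v in values.items()}


def tight_answer(models, atom):
    """The d in {U/d}: the minimum of p(atom) over a finite set of models."""
    return min(p.get(atom, Fraction(0)) for p in models)


# Rows p1, p2, p3 of Table 1 (atoms with value 0 are omitted).
p1 = interp({"closed(r)": "0.5", "road(r,a,b)": "0.8", "road(s,b,c)": "0.7",
             "reach(a,b)": "0.7", "reach(b,c)": "0.6", "reach(a,c)": "0.5"})
p2 = interp({"closed(s)": "0.5", "road(r,a,b)": "0.8", "road(s,b,c)": "0.7",
             "reach(a,b)": "0.7", "reach(b,c)": "0.6", "reach(a,c)": "0.5"})
p3 = interp({"closed(s)": "0.6", "road(r,a,b)": "0.8", "road(s,b,c)": "0.7",
             "reach(a,b)": "0.7"})
```

Lowering reach(a, c) in p1 to 0.4 violates the ground transitivity clause for X = a, Y = b, Z = c, which requires reach(a, c) ≥ 0.9 − 1 + min(0.7, 0.6) = 0.5. Already p3 brings the tight answer for ∃(reach(a, c))[U, 1] down to {U/0}.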
5
Model Semantics
In this section, we define minimal, perfect, and stable models of many-valued disjunctive logic programs, and we discuss some of their properties.

5.1
Minimal Models
We now define minimal models of many-valued disjunctive logic programs. A model p of a program P is a minimal model of P iff no model of P is a proper subset of p. MM(P) denotes the set of all minimal models of P.
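For a small ground program, minimal models can be found by brute force. The sketch below relies on Theorem 7 (stated next), which lets us search only the grid TV(P); for a finite ground program we believe this also suffices to test minimality, because every model lies above some minimal model, which maps into TV(P). The clause encoding and helper names are our own assumptions, and the search is exponential in the number of atoms, so this is for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # Python 3.9+

# Hypothetical encoding of a ground clause (A1 v ... v Al <- B1, ..., Bm, not C1, ..., not Cn)[c, 1]:
# the tuple (heads, pos_body, neg_body, c) with atoms as strings and c a Fraction.


def truth_values(clauses):
    """TV(P): the coarsest grid {0/(n-1), ..., (n-1)/(n-1)} containing every clause value."""
    step = 1  # n - 1
    for *_, c in clauses:
        step = lcm(step, Fraction(c).denominator)
    return [Fraction(k, step) for k in range(step + 1)]


def clause_true(p, clause):
    """Theorem 5: max(p(A_i), p(C_j)) >= c - 1 + min(p(B_k)), with max() = 0 and min() = 1."""
    heads, pos, neg, c = clause
    lhs = max((p[x] for x in heads + neg), default=Fraction(0))
    rhs = c - 1 + min((p[x] for x in pos), default=Fraction(1))
    return lhs >= rhs


def minimal_models(atoms, clauses):
    """Brute-force MM(P) over TV(P)^atoms (exponential; for tiny examples only)."""
    grid = truth_values(clauses)
    candidates = (dict(zip(atoms, vs)) for vs in product(grid, repeat=len(atoms)))
    models = [p for p in candidates if all(clause_true(p, cl) for cl in clauses)]

    def strictly_below(q, p):  # q is a proper subset of p in the sense of Section 2.2
        return q != p and all(q[a] <= p[a] for a in atoms)

    return [p for p in models if not any(strictly_below(q, p) for q in models)]
```

For P = {(a ∨ b ←)[0.5, 1]}, TV(P) = {0, 0.5, 1} and the minimal models are a ↦ 0, b ↦ 0.5 and a ↦ 0.5, b ↦ 0. For the clause values 0.5, 0.7, 0.8, and 0.9 of the example program in Section 4, TV(P) would be the 11-point grid {0, 0.1, ..., 1}.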
Crucially, as far as minimal models of a program P are concerned, we can restrict our attention to the finitely many truth values in TV(P):

Theorem 7. All minimal models of a program P map into TV(P).

Given a positive query to a program P, the tight answer under MM(P) describes a tight logical consequence under Pr⋆-interpretations. Moreover, it approximates a tight logical consequence under Pr-interpretations. That is, inference in Pr⋆-interpretations is an approximation of inference in Pr-interpretations.

Theorem 8. Let P be a program.
a) The tight answer for a positive query ∃(G)[x, 1] to P under MM(P) is given by {x/d}, where d is such that P ⊨⋆_tight prob(G) ≥ d.
b) If the tight answer for a positive query ∃(G)[x, 1] to P under MM(P) is given by {x/d}, then [0, d] contains the unique c with P ⊨_tight prob(G) ≥ c.

Finally, many-valued minimal models generalize classical minimal models:

Theorem 9. Let P be a 2-valued program. The set classical(MM(P)) coincides with the set of all minimal models of classical(P).

5.2
Perfect Models
We now extend the notion of perfect models [28] to many-valued disjunctive logic programs. For this purpose, we must first define a priority relation on ground atoms and a preference relation on Pr⋆-interpretations. The priority relation on ground atoms is defined as in [28]: For a program P, the priority relation ≺ and the auxiliary relation ⪯ are the least binary relations on HB_P with the following properties:
• If ground(P) contains a clause with the atom A in the head and the negative literal ¬C in the body, then A ≺ C.
• If ground(P) contains a clause with the atom A in the head and the positive literal B in the body, then A ⪯ B.
• If ground(P) contains a clause with the atoms A and A′ in the head, then A ⪯ A′.
• If A ≺ B, then A ⪯ B.
• If A ⪯ B and B ⪯ C, then A ⪯ C.
• If A ⪯ B and B ≺ C, then A ≺ C.
• If A ≺ B and B ⪯ C, then A ≺ C.
We say that the ground atom B has higher priority than the ground atom A iff A ≺ B.

The preference relation on Pr⋆-interpretations is defined as follows. For Pr⋆-interpretations p and q, we say p is preferable to q, denoted p ≪ q, iff p ≠ q and for each A ∈ HB_P with p(A) > q(A) there is some B ∈ HB_P with q(B) > p(B) and A ≺ B. We write p ≤≤ q iff p ≪ q or p = q. We are now ready to define perfect models. A model q of a program P is a perfect model of P iff no model of P is preferable to q. We use PM(P) to denote the set of all perfect models of P. Every many-valued perfect model is a minimal model:

Theorem 10. Every perfect model of a program P is a minimal model of P.

Many-valued perfect models generalize classical perfect models:
Theorem 11. Let P be a 2-valued program. The set classical(PM(P)) coincides with the set of all perfect models of classical(P).

5.3
Perfect Models under Local Stratification
We now concentrate on perfect models of locally stratified programs. Locally stratified classical disjunctive logic programs without integrity clauses always have a perfect model [28]. We now show that the same holds for locally stratified many-valued disjunctive logic programs without integrity clauses. A program P without integrity clauses is locally stratified iff HB_P can be partitioned into sets H1, H2, ... (called strata) such that for each clause

(A1 ∨ ··· ∨ Al ← B1, ..., Bm, ¬C1, ..., ¬Cn)[c, 1] ∈ ground(P),

there exists an i ≥ 1 such that all A1, ..., Al belong to Hi, all B1, ..., Bm belong to H1 ∪ ··· ∪ Hi, and all C1, ..., Cn belong to H1 ∪ ··· ∪ Hi−1. Given such a partition H1, H2, ... of HB_P (which is called a local stratification of P) and i ≥ 1, we use Pi to denote the set of all clauses from ground(P) whose heads belong to Hi. Moreover, we define H⋆_i = H1 ∪ ··· ∪ Hi, P⋆_i = P1 ∪ ··· ∪ Pi, and h⋆_i = ĤB_P|H⋆_i, where ĤB_P = {(A, 1) | A ∈ HB_P}.

Every model of a locally stratified program is subsumed by a perfect model:

Theorem 12. For every model q of a locally stratified program P, there exists a perfect model p of P such that p ≤≤ q.

The next theorem shows that each perfect model of a locally stratified program has a natural characterization by iterative minimal models.

Theorem 13. Let P be a program and let H1, H2, ... be a local stratification of P. The Pr⋆-interpretation q is a perfect model of P iff
1. the Pr⋆-interpretation q|H1 is a minimal model of P1, and
2. for all i ≥ 2, the Pr⋆-interpretation q|H⋆_i is a minimal element in the set of all models o ⊆ h⋆_i of Pi with o|H⋆_{i−1} = q|H⋆_{i−1}.

Finally, the following theorem shows that locally stratified programs without disjunction always have a unique perfect model.

Theorem 14. Every disjunction-free locally stratified program P has a unique perfect model p such that p ≤≤ q for all models q of P.

5.4
Stable Models
We now extend the notion of stable models [29] to many-valued disjunctive logic programs. For this purpose, we must slightly generalize clauses as follows. An extended many-valued disjunctive logic program clause (or simply extended clause) is an expression of the following kind:

(A1 ∨ ··· ∨ Al; d ← B1, ..., Bm, ¬C1, ..., ¬Cn)[c, 1],
where A1, ..., Al, B1, ..., Bm, C1, ..., Cn are atoms, l, m, n ≥ 0, c is a rational number from [0, 1], and d is a real number from [0, 1]. It is true in a Pr⋆-interpretation p under a variable assignment σ iff max(p_σ(A1), ..., p_σ(Al), p_σ(C1), ..., p_σ(Cn), d) ≥ c − 1 + min(p_σ(B1), ..., p_σ(Bm)).

We next generalize the classical Gelfond-Lifschitz transformation: For a program P and a Pr⋆-interpretation q, the expression P/q denotes the set of extended clauses that is obtained from ground(P) by replacing every clause (A1 ∨ ··· ∨ Al ← B1, ..., Bm, ¬C1, ..., ¬Cn)[c, 1] by the extended clause

(A1 ∨ ··· ∨ Al; max(q(C1), ..., q(Cn)) ← B1, ..., Bm)[c, 1].

We are now ready to define stable models as follows. A Pr⋆-interpretation q is a stable model of a program P iff q is a minimal model of P/q. We use SM(P) to denote the set of all stable models of P. Every stable model is also a minimal model:

Theorem 15. Every stable model of a program P is a minimal model of P.

The next theorem shows that for locally stratified programs, the notion of stable models coincides with the notion of perfect models.

Theorem 16. The set of stable models of a locally stratified program P coincides with the set of perfect models of P.

Many-valued stable models generalize classical stable models:

Theorem 17. Let P be a 2-valued program. The set classical(SM(P)) coincides with the set of all stable models of classical(P).
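The reduct P/q and the stability test can likewise be sketched by brute force over a finite grid of truth values. Two assumptions are ours: candidate interpretations q are drawn from a grid such as TV(P) (justified for stable models by Theorems 15 and 7), and minimality of q among the models of P/q is also tested only against grid interpretations, which we assume suffices because all constants c and d then lie in the grid. The encodings and names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical encodings: a ground clause is (heads, pos_body, neg_body, c);
# an extended clause (A1 v ... v Al; d <- B1, ..., Bm)[c, 1] is (heads, d, pos_body, c).


def reduct(clauses, q):
    """P/q: drop each literal not C_j and add d = max(q(C1), ..., q(Cn)) (0 if n = 0)."""
    return [(heads, max((q[x] for x in neg), default=Fraction(0)), pos, c)
            for heads, pos, neg, c in clauses]


def ext_true(p, ext):
    """Truth of an extended clause: max(p(A1), ..., p(Al), d) >= c - 1 + min(p(B_i))."""
    heads, d, pos, c = ext
    lhs = max([p[x] for x in heads] + [d])
    rhs = c - 1 + min((p[x] for x in pos), default=Fraction(1))
    return lhs >= rhs


def interpretations(atoms, grid):
    for values in product(grid, repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_minimal_model(q, atoms, ext_clauses, grid):
    """q satisfies ext_clauses and no grid model lies strictly below q."""
    if not all(ext_true(q, e) for e in ext_clauses):
        return False
    return not any(
        p != q and all(p[a] <= q[a] for a in atoms)
        and all(ext_true(p, e) for e in ext_clauses)
        for p in interpretations(atoms, grid))


def stable_models(atoms, clauses, grid):
    """q is stable iff q is a minimal model of P/q (searched over the grid only)."""
    return [q for q in interpretations(atoms, grid)
            if is_minimal_model(q, atoms, reduct(clauses, q), grid)]
```

For the single clause (a ← ¬b)[0.5, 1], the only stable model over {0, 0.5, 1} maps a to 0.5 and b to 0; since this program is locally stratified, Theorem 16 says that model should also be its unique perfect model. For the 2-valued program {(a ← ¬b)[1, 1], (b ← ¬a)[1, 1]}, the sketch returns the two classical stable models, in line with Theorem 17.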
6
Computational Complexity
We now show that some decision problems related to many-valued disjunctive logic programs have the same complexity as their classical counterparts [6]. We first concentrate on the problems of deciding whether a ground program has a minimal, perfect, or stable model.

Theorem 18.
a) The problem of deciding whether a ground program P has a minimal model is NP-complete.
b) The problem of deciding whether a ground program P has a perfect model is Σ₂ᴾ-complete.
c) The problem of deciding whether a ground program P has a stable model is Σ₂ᴾ-complete.

We next focus on some decision problems related to propositional query processing under minimal, perfect, and stable model semantics.

Theorem 19. The problem of deciding whether Yes is the correct answer for a ground positive or negative query ∃(F)[c, 1] to a ground program P under M(P) is Π₂ᴾ-complete for every M(P) among MM(P), PM(P), and SM(P).
7
Summary and Outlook
We presented many-valued disjunctive logic programs with probabilistic semantics in which classical disjunctive logic program clauses are extended by a truth value that respects the material implication. We showed that they have a natural minimal, perfect, and stable model semantics, which generalize the minimal, perfect, and stable model semantics of classical disjunctive logic programs. We also showed that some decision problems related to ground many-valued disjunctive logic programs under minimal, perfect, and stable model semantics have the same computational complexity as their classical counterparts. An interesting topic of future research is to explore other semantics of nonmonotonic negation in many-valued disjunctive logic programs. Moreover, it would be very interesting to elaborate a fixpoint semantics and a proof theory for many-valued disjunctive logic programs.
Acknowledgments

I am very grateful to Thomas Eiter, Georg Gottlob, Nicola Leone, and Cristinel Mateis for useful discussions. Some of this work was done while I was supported by a DFG grant.
References

1. K. R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, chapter 10, pages 493–574. MIT Press, 1990.
2. F. Bacchus, A. Grove, J. Y. Halpern, and D. Koller. From statistical knowledge bases to degrees of beliefs. Artif. Intell., 87:75–143, 1996.
3. J. F. Baldwin. Evidential support logic programming. Fuzzy Sets Syst., 24:1–26, 1987.
4. A. Dekhtyar and V. S. Subrahmanian. Hybrid probabilistic programs. In Proc. of the 14th International Conference on Logic Programming, pages 391–405, 1997.
5. T. Eiter and G. Gottlob. Complexity aspects of various semantics for disjunctive databases. In Proc. of the 12th ACM Symposium on Principles of Database Systems, pages 158–167. ACM Press, 1993.
6. T. Eiter and G. Gottlob. On the computational cost of disjunctive logic programming: Propositional case. Ann. Math. Artif. Intell., 15:289–323, 1995.
7. R. Fagin, J. Y. Halpern, and N. Megiddo. A logic for reasoning about probabilities. Inf. Comput., 87:78–128, 1990.
8. M. Fitting. Bilattices and the semantics of logic programming. J. Logic Program., 11(1–2):91–116, 1991.
9. J. Y. Halpern. An analysis of first-order logics of probability. Artif. Intell., 46:311–350, 1990.
10. M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programming and its applications. J. Logic Program., 12(3–4):335–367, 1992.
11. J.-L. Lassez and M. J. Maher. Optimal fixedpoints of logic programs. Theor. Comput. Sci., 39:15–25, 1985.
12. J. W. Lloyd. Foundations of Logic Programming. Springer, Berlin, 2nd ed., 1987.
13. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, MA, 1992.
14. T. Lukasiewicz. Probabilistic logic programming. In Proc. of the 13th Biennial European Conf. on Artificial Intelligence, pages 388–392. J. Wiley & Sons, 1998.
15. T. Lukasiewicz. Local probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events. Int. J. Approx. Reas., 21(1):23–61, 1999.
16. T. Lukasiewicz. Many-valued disjunctive logic programs with probabilistic semantics. Technical Report 1843-99-09, Institut für Informationssysteme, Technische Universität Wien, 1999. ftp://ftp.kr.tuwien.ac.at/pub/tr/rr9909.ps.gz.
17. T. Lukasiewicz. Many-valued first-order logics with probabilistic semantics. In Proc. of the Annual Conference of the European Association for Computer Science Logic, 1998, volume 1584 of LNCS, pages 415–429. Springer, 1999.
18. T. Lukasiewicz. Probabilistic and truth-functional many-valued logic programming. In Proc. of the 29th IEEE International Symposium on Multiple-Valued Logic, pages 236–241, 1999.
19. T. Lukasiewicz. Probabilistic deduction with conditional constraints over basic events. J. Artif. Intell. Res., 10:199–241, 1999.
20. C. Mateis. A Quantitative Extension of Disjunctive Logic Programming. Doctoral Dissertation, Technische Universität Wien, 1998.
21. J. Minker. Overview of disjunctive logic programming. Ann. Math. Artif. Intell., 12:1–24, 1994.
22. R. T. Ng. Semantics, consistency, and query processing of empirical deductive databases. IEEE Trans. Knowl. Data Eng., 9(1):32–49, 1997.
23. R. T. Ng and V. S. Subrahmanian. A semantical framework for supporting subjective and conditional probabilities in deductive databases. J. Autom. Reasoning, 10(2):191–235, 1993.
24. R. T. Ng and V. S. Subrahmanian. Stable semantics for probabilistic deductive databases. Inf. Comput., 110:42–83, 1994.
25. L. Ngo. Probabilistic disjunctive logic programming. In Proc. of the 12th Conf. on Uncertainty in Artificial Intelligence, pages 397–404. Morgan Kaufmann, 1996.
26. N. J. Nilsson. Probabilistic logic. Artif. Intell., 28:71–88, 1986.
27. D. Poole. Probabilistic Horn abduction and Bayesian networks. Artif. Intell., 64:81–129, 1993.
28. T. C. Przymusinski. On the declarative semantics of stratified deductive databases and logic programs. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 193–216. Morgan Kaufmann, 1988.
29. T. C. Przymusinski. Stable semantics for disjunctive programs. New Generation Comput., 9:401–424, 1991.
30. N. Rescher. Many-valued Logic. McGraw-Hill, New York, 1969.
31. M. H. van Emden. Quantitative deduction and its fixpoint theory. J. Logic Program., 3(1):37–53, 1986.
Extending Disjunctive Logic Programming by T-norms⋆

Cristinel Mateis
Information Systems Department, TU Vienna
A-1040 Vienna, Austria
[email protected]
Abstract. This paper proposes a new knowledge representation language, called QDLP, which extends DLP to deal with uncertain values. A certainty degree interval (a subinterval of [0, 1]) is assigned to each (quantitative) rule. Triangular norms (T-norms) are employed to define calculi for propagating uncertainty information from the premises to the conclusion of a quantitative rule. Negation is considered and the concept of stable model is extended to QDLP. Different T-norms induce different semantics for one given quantitative program. In this sense, QDLP is parameterized and each choice of a T-norm induces a different QDLP language. Each T-norm is suited to events with specific relationships (e.g., independence, exclusiveness) between them. Since there are infinitely many T-norms, it turns out that there is a family of infinitely many QDLP languages. This family is carefully studied and the set of QDLP languages which generalize traditional DLP is precisely singled out. Finally, the complexity of the main decision problems arising in the context of QDLP (i.e., Model Checking, Stable Model Checking, Consistency, and Brave Reasoning) is analyzed. It is shown that the complexity of the relevant fragments of QDLP coincides exactly with the complexity of DLP. That is, reasoning with uncertain values is more general than, but not harder than, reasoning with boolean values.
1
Introduction
Disjunctive logic programs are logic programs where disjunction is allowed in the heads of the rules and negation may occur in the bodies of the rules. Such programs are nowadays widely recognized as a valuable tool for knowledge representation and commonsense reasoning [3,16,22]. An important merit of disjunctive logic programming (DLP) is its capability to model incomplete knowledge [3,22]. DLP has a very high expressive power. In [14] it is proved that, under stable model semantics, disjunctive programs capture the complexity class Σ₂ᴾ, that is, they allow us to express every property which is decidable in nondeterministic polynomial time with an oracle in NP. Thus, DLP can express real-world situations that cannot be represented by disjunction-free programs.

⋆ Work partially supported by the Austrian Science Fund (FWF) under grants N Z29-INF and P12344-INF.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 290–304, 1999.
© Springer-Verlag Berlin Heidelberg 1999
However, real-life applications often need to deal with uncertain information and quantitative data which cannot be represented in DLP. Logical reasoning in terms of the truth values true and false alone is insufficient for the purposes of several real-life applications. Image databases, sensor data, temporal indeterminacy, and information retrieval are only a few of the domains where uncertainty occurs [20]. Consider, for instance, a robot which moves and changes direction according to a prefixed route and to the coordinates received from a sensor. Since sensor data may be subject to error and sensors may have different reliability, a formalism able to deal with uncertain information is needed to encode the control mechanism of the robot. (See Section 4 for the example on this subject.)

Many frameworks for multivalued logic programming have been proposed to handle uncertain information. There is a split in the AI community between (i) those who attempt to deal with uncertainty using non-numerical techniques [8,9,12], (ii) those who use numerical representations of uncertainty but, believing that probability calculus is inadequate for the task, invent entirely new calculi, such as Dempster-Shafer calculus [10,17,32] and fuzzy logic [6,7,15,18,19,33,34], and (iii) those who remain within the traditional framework of probability theory, while attempting to equip the theory with the computational facilities needed to perform AI tasks [2,24,25,26,27,29,30]. We propose an approach to define the representation, inference, and control of uncertain information in the framework of DLP which is closely related to the second of the above categories.

The main contributions of the paper are the following.
– We define a new knowledge representation language, called Quantitative Disjunctive Logic Programming (QDLP), extending DLP to deal with uncertain values.
– We define a mechanism of reasoning with uncertainty through rule chaining by using the well-studied and mathematically clean notion of T-norm. In particular, we consider a p-parameterized family of T-norms. Each T-norm is suited to events with specific relationships (e.g., independence, exclusiveness) between them. Different T-norms induce different semantics for one given quantitative program. Thus, QDLP is parameterized and each choice of a T-norm induces a different QDLP language. There are infinitely many T-norms, hence there are infinitely many QDLP languages. Importantly, the T-norm may be chosen according to the level of knowledge of the relationships between the atoms (events) of the program.
– We single out precisely the fragments of the QDLP family which are generalizations of DLP. Basically, a fragment QF of QDLP induced by a T-norm T^(p), p ∈ [−∞, +∞], is a generalization of DLP iff for each program P in DLP there is a program QP in QF such that the set of all stable models of P is exactly the set of all stable models of QP under the semantics induced by T^(p).
– We show that the Quantitative Logic Programming Language proposed by van Emden in [34] coincides with the disjunction-free fragment of QDLP induced by the T-norm T3.
– We analyze the complexity of the main decision problems arising in QDLP. We classify precisely (i.e., by completeness results) the complexity of all relevant fragments of QDLP (i.e., of the QDLP languages which truly generalize DLP) for the T-norm T3. Importantly, the addition of uncertainty does not cause any computational overhead, as the complexity of QDLP is exactly the same as the complexity of DLP. In other words, uncertainty comes for free!

For reasons of space, we omit the proofs of the results reported in Sections 6.2 and 7. The proofs of all results, along with further material and details, are reported in the long version of the paper [23], which can be retrieved from the mentioned web address.
2
Preliminaries: Triangular Norms and Conorms
The triangular norms (T -norms) and conorms (T -conorms) form the basis for the various uncertainty calculi discussed in this paper. We will denote a T -norm by T and a T -conorm by S. One of the advantages of these operators is their low computational complexity. The T -norms and T -conorms are functions T, S : [0, 1] × [0, 1] → [0, 1] which satisfy the following properties: T (a, 0) = T (0, a) = 0 S(1, a) = S(a, 1) = 1 [boundary] T (a, 1) = T (1, a) = a S(0, a) = S(a, 0) = a [boundary] T (a, b) ≤ T (c, d) S(a, b) ≤ S(c, d) if a ≤ c, b ≤ d [monotonicity] T (a, b) = T (b, a) S(a, b) = S(b, a) [commutativity] T (a, T (b, c)) = T (T (a, b), c) S(a, S(b, c)) = S(S(a, b), c) [associativity] Intuitively, T (a, b) (resp., S(a, b)) assigns a certainty value to the composition of two events e1 and e2 whose certainty values are a and b. Usually, the composition of e1 and e2 is the conjunction (resp., disjunction) under certain conditions (e.g., independence, mutual exclusiveness). Although defined as two-place functions, the T -norms and T -conorms can be used to represent the composition of a larger number of events. Because of the associativity property, it is possible to define recursively T (x1 , . . . , xn , xn+1 ) and S(x1 , . . . , xn , xn+1 ) for x1 , . . . , xn+1 ∈ [0, 1] as: T (x1 , . . . , xn , xn+1 ) = T (T (x1 , . . . , xn ), xn+1 ) S(x1 , . . . , xn , xn+1 ) = S(S(x1 , . . . , xn ), xn+1 ) Some typical T -norms and T -conorms are the following:
$$T_0(a,b) = \begin{cases} \min(a,b) & \text{if } \max(a,b) = 1 \\ 0 & \text{otherwise} \end{cases} \qquad S_0(a,b) = \begin{cases} \max(a,b) & \text{if } \min(a,b) = 0 \\ 1 & \text{otherwise} \end{cases}$$
$$T_1(a,b) = \max(0,\ a+b-1) \qquad S_1(a,b) = \min(1,\ a+b)$$
$$T_{1.5}(a,b) = \max(0,\ \sqrt{a}+\sqrt{b}-1)^2 \qquad S_{1.5}(a,b) = 1 - \max(0,\ \sqrt{1-a}+\sqrt{1-b}-1)^2$$
$$T_2(a,b) = ab \qquad S_2(a,b) = a+b-ab$$
$$T_{2.5}(a,b) = \frac{ab}{a+b-ab} \qquad S_{2.5}(a,b) = \frac{a+b-2ab}{1-ab}$$
$$T_3(a,b) = \min(a,b) \qquad S_3(a,b) = \max(a,b)$$
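To make the list above concrete, here is a minimal Python sketch of the six T-norm/T-conorm pairs. It is our illustration, not code from the paper, and the function names are hypothetical. At the two points where T2.5 and S2.5 are 0/0 (a = b = 0 and a = b = 1, respectively), the sketch returns the limit values 0 and 1; closing the formulas this way is our assumption. The helper nary applies the recursive n-ary definition via functools.reduce.

```python
import math
from functools import reduce

# T-norms from the list above; arguments are certainty values in [0, 1].
def t0(a, b):   # drastic product
    return min(a, b) if max(a, b) == 1 else 0.0

def t1(a, b):
    return max(0.0, a + b - 1)

def t15(a, b):
    return max(0.0, math.sqrt(a) + math.sqrt(b) - 1) ** 2

def t2(a, b):   # probabilistic product (independent arguments)
    return a * b

def t25(a, b):  # 0/0 at a = b = 0; the limit value 0 is used there
    return 0.0 if a == b == 0 else a * b / (a + b - a * b)

def t3(a, b):   # minimum
    return min(a, b)

# The corresponding T-conorms.
def s0(a, b):
    return max(a, b) if min(a, b) == 0 else 1.0

def s1(a, b):
    return min(1.0, a + b)

def s15(a, b):
    return 1 - max(0.0, math.sqrt(1 - a) + math.sqrt(1 - b) - 1) ** 2

def s2(a, b):
    return a + b - a * b

def s25(a, b):  # 0/0 at a = b = 1; the limit value 1 is used there
    return 1.0 if a == b == 1 else (a + b - 2 * a * b) / (1 - a * b)

def s3(a, b):   # maximum
    return max(a, b)

TNORMS = [t0, t1, t15, t2, t25, t3]      # expected: T0 <= T1 <= ... <= T3
TCONORMS = [s3, s25, s2, s15, s1, s0]    # expected: S3 <= S2.5 <= ... <= S0

def nary(op, values):
    """n-ary form via associativity: op(x1, ..., xn, xn+1) = op(op(x1, ..., xn), xn+1)."""
    return reduce(op, values)

if __name__ == "__main__":
    print(nary(t2, [0.5, 0.5, 0.5]))   # 0.125
    print(nary(t3, [0.9, 0.5, 0.7]))   # 0.5
```

The product T2 multiplies the three values, while T3 keeps only the weakest one, which is the "best case" reading of T3 discussed below.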
Extending Disjunctive Logic Programming by T-norms
It is important to note that
$$T_0 \le T_1 \le T_{1.5} \le T_2 \le T_{2.5} \le T_3 \qquad S_3 \le S_{2.5} \le S_2 \le S_{1.5} \le S_1 \le S_0$$
T1 is appropriate for the intersection of lower probability bounds (uncertainty values) and captures the notion of the worst case, where the arguments are considered as mutually exclusive as possible. T3 is appropriate for the intersection of upper probability bounds and captures the notion of the best case, where one argument attempts to subsume the others. T2 is the classical probabilistic operator that assumes independence of the arguments, and its dual T-conorm S2 is the usual additive measure for the union. Schweizer and Sklar [31] proposed a parameterized family, denoted by T(a, b, p), where a and b are the T-norm's arguments and p is the parameter that spans the space of T-norms from T0 to T3:
$$T(a,b,p) = \begin{cases} (a^{-p} + b^{-p} - 1)^{-1/p} & \text{if } a^{-p} + b^{-p} \ge 1, \text{ when } p < 0 \\ 0 & \text{if } a^{-p} + b^{-p} \le 1, \text{ when } p < 0 \\ \lim_{p \to 0} T(a,b,p) = ab & \text{when } p \to 0 \\ (a^{-p} + b^{-p} - 1)^{-1/p} & \text{when } p > 0 \end{cases}$$
Let R = [−∞, +∞] and R+ = [0, +∞]. Given a real number p ∈ R, we denote by $T^{(p)}$ the member of the family of T-norms induced by p. Note that we allow p to take the infinite values −∞ and +∞. Figure 1 illustrates how $T^{(p)}$ spans the real numbers; for example, $T^{(-\infty)} = T_0$, $T^{(-1)} = T_1$, $T^{(0)} = T_2$, and $T^{(+\infty)} = T_3$.
Fig. 1. Spanning of the T-norms over the real numbers: p = −∞, −1, −0.5, 0, 1, +∞ yields T(a, b, p) = T0, T1, T1.5, T2, T2.5, T3, respectively.
For suitable negation operators N(a), such as N(a) = 1 − a, T-norms and T-conorms are duals in the sense of the following generalization of De Morgan's law:
$$S(a,b) = N(T(N(a),N(b))) \qquad T(a,b) = N(S(N(a),N(b)))$$
This duality implies that, given the negation operator N(a) = 1 − a, the choice of a T-norm uniquely determines the T-conorm. The dual parameterized family of T-conorms, denoted by S(a, b, p), is defined as S(a, b, p) = 1 − T(1 − a, 1 − b, p). Given a real number p ∈ R, we denote by $S^{(p)}$ the member of the family of T-conorms induced by p. For example, $S^{(-\infty)} = S_0$, $S^{(-1)} = S_1$, $S^{(0)} = S_2$, and $S^{(+\infty)} = S_3$.
Theorem 1. At the extremes of the unit interval [0, 1], the T-norms and T-conorms satisfy the truth tables of the logical operators AND and OR, respectively. □
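The parameterized family and its dual can be evaluated directly. In the sketch below (our illustration; the names ss_tnorm and ss_tconorm are hypothetical), p = −∞, 0 and +∞ are handled by their limits T0, T2 and T3. For p > 0 a zero argument returns 0, the limit of the formula there, which also avoids Python's ZeroDivisionError for 0.0 raised to a negative power. The final loop checks Theorem 1 on boolean arguments for the sample values of p from Fig. 1.

```python
import math

def ss_tnorm(a: float, b: float, p: float) -> float:
    """Schweizer-Sklar T-norm T(a, b, p) for a, b in [0, 1] and p in [-inf, +inf]."""
    if p == -math.inf:                       # T(-inf) = T0 (drastic product)
        return min(a, b) if max(a, b) == 1 else 0.0
    if p == math.inf:                        # T(+inf) = T3 (minimum)
        return min(a, b)
    if p == 0:                               # limit p -> 0 gives T2 (product)
        return a * b
    if p < 0:
        s = a ** (-p) + b ** (-p)
        return (s - 1) ** (-1 / p) if s >= 1 else 0.0
    if a == 0 or b == 0:                     # p > 0: a ** (-p) diverges, T tends to 0
        return 0.0
    return (a ** (-p) + b ** (-p) - 1) ** (-1 / p)

def ss_tconorm(a: float, b: float, p: float) -> float:
    """Dual T-conorm S(a, b, p) = 1 - T(1 - a, 1 - b, p), i.e. negation N(a) = 1 - a."""
    return 1.0 - ss_tnorm(1.0 - a, 1.0 - b, p)

SAMPLE_PS = (-math.inf, -1, -0.5, 0, 1, math.inf)   # T0, T1, T1.5, T2, T2.5, T3

if __name__ == "__main__":
    for p in SAMPLE_PS:
        print(p, round(ss_tnorm(0.6, 0.7, p), 4), round(ss_tconorm(0.6, 0.7, p), 4))
    # Theorem 1: on {0, 1} every member behaves like AND / OR.
    for p in SAMPLE_PS:
        for x in (0.0, 1.0):
            for y in (0.0, 1.0):
                assert ss_tnorm(x, y, p) == float(x and y)
                assert ss_tconorm(x, y, p) == float(x or y)
```

For 0 < a, b < 1 the printed T-norm values increase with p, mirroring the ordering T0 ≤ T1 ≤ ... ≤ T3 stated above.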
C. Mateis
3 Syntax of QDLP
A term is either a constant or a variable¹. An atom is a(t1, ..., tn), where a is a predicate of arity n and t1, ..., tn are terms. A literal is either a positive literal p or a negative literal ¬p, where p is an atom. A positive (disjunctive) quantitative rule r is a clause of the form:
$$h_1 \vee \dots \vee h_n \xleftarrow{[x,y]} b_1, \dots, b_k, \qquad n \ge 1,\ k \ge 0$$
where h1, ..., hn, b1, ..., bk are atoms and 0 < x ≤ y ≤ 1. The interval [x, y] is the certainty degree interval of the rule (i.e., the strength of the rule implication); it measures the reliability of the rule. h1 ∨ ... ∨ hn is the head of the quantitative rule, a non-empty disjunction of atoms. b1, ..., bk is the body of the quantitative rule, a (possibly empty) conjunction of atoms. If the body is empty (i.e., k = 0) and the head contains exactly one atom (i.e., n = 1), the rule is a fact whose certainty degree interval coincides with the strength of the implication. A positive (disjunctive) quantitative program is a finite set of positive (disjunctive) quantitative rules.
4 Semantics of QDLP
Let P be a positive disjunctive quantitative program. The Herbrand universe $U_P$, the Herbrand base $B_P$, and ground(P) of P are defined as in DLP. Having defined the syntax of quantitative rules, we need to evaluate the satisfiability of premises, to propagate uncertainty through rule chaining, and to consolidate the same conclusion derived from different rules. A quantitative interpretation I of P is a mapping that assigns to each atom A ∈ $B_P$ a certainty degree interval $[x_A, y_A] \subseteq [0, 1]$. We write $I(A) = [x_A, y_A]$ for all A ∈ $B_P$. It is worth noting that a quantitative program P has infinitely many quantitative interpretations, because each atom A ∈ $B_P$ can be assigned infinitely many intervals $[x_A, y_A] \subseteq [0, 1]$. This is an important difference w.r.t. (function-free) DLP, where every program has a finite number of Herbrand interpretations. Let p be any real number inducing $T^{(p)}$ from the family of T-norms. We extend the T-norm $T^{(p)}$ (resp., the T-conorm $S^{(p)}$) to arguments that are intervals instead of single values, componentwise, and denote the extension by the same symbol, e.g., $T^{(p)}([a,b],[c,d]) = [T^{(p)}(a,c), T^{(p)}(b,d)]$. Now that we know what a quantitative interpretation I is, the first thing to settle is when a rule r is true w.r.t. I, and what role p plays. To this end, we first define how the certainty degree intervals of the atoms of a conjunction or disjunction are combined. In particular, we define:
¹ Note that function symbols are not considered in this work.
1. The certainty degree interval of a (possibly empty) conjunction C = b1 ∧ ... ∧ bm of atoms from $B_P$, w.r.t. I and p:
$$I^{(p)}(C) = \begin{cases} [1,1] & \text{if } m = 0 \ (\text{i.e., } C = \emptyset) \\ T^{(p)}(I(b_1), \dots, I(b_m)) & \text{if } m > 0 \end{cases}$$
2. The certainty degree interval of a non-empty disjunction D = h1 ∨ ... ∨ hn of atoms from $B_P$, w.r.t. I and p:
$$I^{(p)}(D) = S^{(p)}(I(h_1), \dots, I(h_n))$$
Given two certainty degree intervals [a, b] and [c, d], [a, b] ≤ [c, d] iff a ≤ c and b ≤ d. Moreover, [a, b] < [c, d] iff (i) [a, b] ≤ [c, d], and (ii) a < c or b < d.
We say that a rule r ∈ ground(P), $H(r) \xleftarrow{[x,y]} B(r)$, is p-satisfied w.r.t. I iff the following inequality holds:
$$I^{(p)}(H(r)) \ge T_2(I^{(p)}(B(r)), [x,y]) \qquad (1)$$
The right-hand side of inequality (1) is the certainty degree interval propagated through the rule w.r.t. I and p. The head event H(r) depends on two events: (i) the rule reliability event, expressed through [x, y], and (ii) the reliability event of the body of r w.r.t. I and p, given by $I^{(p)}(B(r))$. Intuitively, the rule reliability can be assumed to be independent of the certainty degree intervals of the body literals; the two events are therefore treated as independent, which is why T2 is used in (1). A quantitative p-model of P is a quantitative interpretation M of P such that each rule r ∈ ground(P) is p-satisfied w.r.t. M. Since the definition of quantitative p-model relies entirely on the instantiation ground(P) of P, for simplicity we assume throughout the rest of this paper that P is a ground program (either ground originally, or the instantiation ground(P′) of a program P′). The set of all p-models of P is denoted by $\mathcal{M}^{(p)}(P)$. As noted above, a quantitative program P has infinitely many quantitative interpretations; thus, P may have (infinitely) many p-models. It is therefore useful to define an order relation between the p-models of P that makes it possible to prefer some p-models to others. Since a p-model assigns certainty degree intervals to all atoms in $B_P$, this order is defined in terms of the order between intervals. Given M1, M2 ∈ $\mathcal{M}^{(p)}(P)$, M1 ≤ M2 iff M1(A) ≤ M2(A) for each A ∈ $B_P$. Moreover, M1 < M2 iff (i) M1 ≤ M2, and (ii) there exists A ∈ $B_P$ such that M1(A) < M2(A). We can now define minimal p-models. A p-model M ∈ $\mathcal{M}^{(p)}(P)$ is minimal iff there is no N ∈ $\mathcal{M}^{(p)}(P)$ such that N < M. The minimal p-model semantics of P is the set of all minimal p-models of P and is denoted by $\mathcal{MM}^{(p)}(P)$. Once p is fixed, we uniquely select a T-norm and its dual T-conorm, which together completely describe an uncertainty calculus.
That is, according to the previous definitions, fixing p defines a semantics for P, called the p-semantics. In this sense, the semantics of quantitative programs is parameterized: the choice of a T-norm induces the semantics of a quantitative program, and different T-norms in general induce different semantics. Since p can be fixed in infinitely many ways, infinitely many semantics can be defined for P. The T-norm may be chosen according to how much is known about the relationships between the atoms of P.
Example 1. Consider the ground program P consisting of the following rules:
$$a \vee c \xleftarrow{[0.9,1]} . \qquad b \xleftarrow{[0.5,0.5]} .$$
$$u \xleftarrow{[0.8,0.8]} a, b. \qquad v \xleftarrow{[0.4,0.8]} b.$$
$$w \xleftarrow{[0.5,0.6]} u. \qquad w \xleftarrow{[1,1]} v.$$
and the interpretations I1, I2 and I3:
I1 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.4, 0.4], v : [0.2, 0.4], w : [0.2, 0.4]}
I2 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.4, 0.6], v : [0.2, 0.4], w : [0.2, 0.4]}
I3 = {a : [0.9, 1], b : [0.5, 0.5], c : [0, 0], u : [0.2, 0.5], v : [0.2, 0.4], w : [0.2, 0.4]}
If p = +∞ (i.e., $T^{(p)} = T_3$), then I1, I2 ∈ $\mathcal{M}^{(p)}(P)$. I3 ∉ $\mathcal{M}^{(p)}(P)$ because the rule $u \xleftarrow{[0.8,0.8]} a, b$ is not p-satisfied w.r.t. I3: the interval propagated to u is $T_2(T_3([0.9,1],[0.5,0.5]), [0.8,0.8]) = T_2([0.5,0.5],[0.8,0.8]) = [0.4,0.4]$, which is not ≤ [0.2, 0.5]. Moreover, I1 < I2 and I1 is minimal. □
Example 2. Consider a robot that moves and changes direction according to a predefined route and to the coordinates received from sensors. Sensor data is subject to error, and different sensors may have different reliabilities. The control mechanism of the robot can be encoded in QDLP as follows. Consider the atoms moveToRight, moveToLeft, moveUp, moveDown, xCoord(X), yCoord(Y), sensorX(X), and sensorY(Y). At regular time intervals, the sensors return instances of the atoms sensorX(X) and sensorY(Y), which are used to derive the actual coordinates according to the following quantitative rules:
$$\mathit{xCoord}(X) \xleftarrow{[0.9,1]} \mathit{sensorX}(Z),\ |X - Z| \le 0.5$$
$$\mathit{yCoord}(Y) \xleftarrow{[0.8,1]} \mathit{sensorY}(Z),\ |Y - Z| \le 0.5$$
where the strength of the implication of each rule represents the reliability of the corresponding sensor under normal environment conditions (e.g., good visibility, low level of usage, etc.). Built-in predicates always have maximal reliability (i.e., [1, 1]). The atoms sensorX(X) and sensorY(Y) are assigned reliabilities according to the current environment conditions. For each turning point (x, y) of the assigned route, we define a rule like
$$\mathit{atom} \xleftarrow{[1,1]} \mathit{xCoord}(x), \mathit{yCoord}(y)$$
where atom ∈ {moveToRight, moveToLeft, moveUp, moveDown}. The robot turns to the right when the certainty degree interval of moveToRight is at least [0.75, 1], and so on. □
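Inequality (1) is easy to mechanize for ground programs, which illustrates why p-Model Checking is cheap. The sketch below is ours, not the author's: it represents an interval as a pair of floats and a ground rule as a hypothetical (head atoms, body atoms, strength) triple, fixes p = +∞ (so interval T3 is the componentwise min and S3 the componentwise max), and uses the componentwise product for T2 as in (1). It replays the claims of Example 1.

```python
from functools import reduce

Interval = tuple[float, float]

def t3(x: Interval, y: Interval) -> Interval:   # interval T3: componentwise min
    return (min(x[0], y[0]), min(x[1], y[1]))

def s3(x: Interval, y: Interval) -> Interval:   # interval S3: componentwise max
    return (max(x[0], y[0]), max(x[1], y[1]))

def t2(x: Interval, y: Interval) -> Interval:   # interval T2: componentwise product
    return (x[0] * y[0], x[1] * y[1])

def leq(x: Interval, y: Interval) -> bool:      # [a, b] <= [c, d] iff a <= c and b <= d
    return x[0] <= y[0] and x[1] <= y[1]

def p_satisfied(rule, interp, tnorm=t3, tconorm=s3) -> bool:
    head, body, strength = rule
    # Folding from [1, 1] (resp. [0, 0]) is sound by the boundary properties
    # and yields [1, 1] for an empty body, as in the definition of I^(p)(C).
    body_val = reduce(tnorm, (interp[b] for b in body), (1.0, 1.0))
    head_val = reduce(tconorm, (interp[h] for h in head), (0.0, 0.0))
    return leq(t2(body_val, strength), head_val)

def is_p_model(program, interp) -> bool:
    return all(p_satisfied(r, interp) for r in program)

def strictly_less(m1, m2) -> bool:              # M1 < M2 over the whole Herbrand base
    return all(leq(m1[a], m2[a]) for a in m1) and any(m1[a] != m2[a] for a in m1)

# Example 1, rules as (head, body, strength).
P = [
    (["a", "c"], [], (0.9, 1.0)),
    (["b"], [], (0.5, 0.5)),
    (["u"], ["a", "b"], (0.8, 0.8)),
    (["v"], ["b"], (0.4, 0.8)),
    (["w"], ["u"], (0.5, 0.6)),
    (["w"], ["v"], (1.0, 1.0)),
]
base = {"a": (0.9, 1.0), "b": (0.5, 0.5), "c": (0.0, 0.0), "v": (0.2, 0.4), "w": (0.2, 0.4)}
I1 = {**base, "u": (0.4, 0.4)}
I2 = {**base, "u": (0.4, 0.6)}
I3 = {**base, "u": (0.2, 0.5)}

print(is_p_model(P, I1), is_p_model(P, I2), is_p_model(P, I3))  # True True False
print(strictly_less(I1, I2))                                    # True
```

Each rule is checked once, so the cost is linear in the size of the ground program; swapping t3/s3 for other interval norms gives the check for other values of p.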
5 QDLP with Negation
Many real-world situations can be represented much more naturally if negation is allowed [21]. It is therefore necessary to define a general (disjunctive) quantitative rule r, which allows negative literals in its body:
$$h_1 \vee \dots \vee h_n \xleftarrow{[x,y]} b_1, \dots, b_k, \neg b_{k+1}, \dots, \neg b_{k+m}, \qquad n \ge 1,\ k, m \ge 0$$
where h1, ..., hn, b1, ..., bk+m are atoms and 0 < x ≤ y ≤ 1. We show next how the definitions of p-satisfiability and (minimal) p-model change when negative literals are allowed in rule bodies. Moreover, we will see that the quantitative minimal model semantics is not the natural meaning to assign to a negative quantitative program, and we define the quantitative stable model semantics. We only have to revisit relation (1), on which the p-satisfiability of a positive rule depends, to cover bodies that also contain negative literals; all other definitions remain unchanged. A natural question is: given I(A) = [x, y], how is the certainty degree of the negative literal ¬A evaluated, i.e., what is I(¬A)? The answer is I(¬A) = [N(y), N(x)] = [1 − y, 1 − x], where N : [0, 1] → [0, 1], N(x) = 1 − x, is the negation operator. Thus, the certainty degree interval of the body of r w.r.t. I and p is given by
$$I^{(p)}(B(r)) = \begin{cases} [1,1] & \text{if } k + m = 0 \\ T^{(p)}(I(b_1), \dots, I(b_k), I(\neg b_{k+1}), \dots, I(\neg b_{k+m})) & \text{if } k + m > 0 \end{cases}$$
As in DLP, the quantitative minimal model semantics is also applicable to negative quantitative programs, but it does not capture the meaning of negation as failure (i.e., the CWA). We therefore define a new semantics, called the quantitative stable model semantics, which involves the notion of a stable p-model. Before defining this notion, we define the extended quantitative program and the quantitative version (qGL) of the Gelfond-Lifschitz transformation (GL). An extended quantitative program is a quantitative program Pe in which subintervals of the unit interval [0, 1] may occur as body atoms in the rules of Pe and are treated like ordinary atoms. Note that such atoms are not in $B_{Pe}$. We assume that every quantitative interpretation I of Pe assigns to each atom [x, y] occurring in the body of a rule the certainty degree interval [x, y], that is, I([x, y]) = [x, y]. Given a quantitative interpretation I for P, the qGL-transformation $P^I$ of P w.r.t. I is the positive extended quantitative program obtained from P by replacing, in the body of every rule, each negative literal ¬Bi by the constant interval I(¬Bi). Let M be a p-model of P, for some p ∈ R. M is a stable p-model of P iff M is a minimal p-model of $P^M$. The stable p-model semantics of P is the set of all stable p-models of P and is denoted by $\mathcal{SM}^{(p)}(P)$. Note that if P is positive, then $\mathcal{MM}^{(p)}(P) = \mathcal{SM}^{(p)}(P)$ for each p ∈ R.
Example 3. Let $P = \{a \xleftarrow{[0.5,0.6]} \neg b\}$ and p ∈ R (the value of p is irrelevant, since the body of the single rule of P contains only one literal). Consider the minimal p-model M = {a : [0.5, 0.6], b : [0, 0]}. Note that $P^M = \{a \xleftarrow{[0.5,0.6]} [1,1]\}$ and that M is a minimal p-model of $P^M$; hence M is a stable p-model of P. Consider now the minimal p-model N = {a : [0.25, 0.36], b : [0.4, 0.5]}. Then $P^N = \{a \xleftarrow{[0.5,0.6]} [0.5,0.6]\}$, and N′ = {a : [0.25, 0.36], b : [0, 0]} is a p-model of $P^N$. Since N′ < N, N is not minimal for $P^N$; hence N is not a stable p-model of P. □
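Example 3 can be replayed mechanically for programs without disjunction. The sketch below is ours and rests on two assumptions that the text does not state: (i) it fixes p = +∞ (T3) when combining body literals, which is immaterial for Example 3 because the body has a single literal; and (ii) it assumes that a positive, non-disjunctive extended program has a unique minimal p-model (the least one), reached by naive fixpoint iteration from the all-[0, 0] interpretation. Under these assumptions, M is stable exactly when the least p-model of the qGL-transform $P^M$ equals M. Rules are encoded as hypothetical (head, positive body, negated atoms, strength) tuples.

```python
Interval = tuple[float, float]

def neg(x: Interval) -> Interval:
    """I(not A) = [1 - y, 1 - x] for I(A) = [x, y]."""
    return (1.0 - x[1], 1.0 - x[0])

def qgl(program, interp):
    """qGL-transform: replace each negated body atom B by the constant interval I(not B)."""
    return [(head, pos, [neg(interp[b]) for b in negs], strength)
            for head, pos, negs, strength in program]

def least_model(reduct, atoms):
    """Least p-model (p = +inf, i.e. T3) of a positive, non-disjunctive extended program,
    by naive fixpoint iteration (assumed to converge, as it does here)."""
    m = {a: (0.0, 0.0) for a in atoms}
    changed = True
    while changed:
        changed = False
        for head, pos, consts, (x, y) in reduct:
            body = [m[b] for b in pos] + consts
            lo = min((v[0] for v in body), default=1.0)   # T3 on lower bounds
            hi = min((v[1] for v in body), default=1.0)   # T3 on upper bounds
            new = (max(m[head][0], lo * x), max(m[head][1], hi * y))  # T2 with [x, y]
            if new != m[head]:
                m[head], changed = new, True
    return m

def is_stable(program, interp, atoms) -> bool:
    return least_model(qgl(program, interp), atoms) == interp

# Example 3: P = { a <-[0.5,0.6]- not b }
P = [("a", [], ["b"], (0.5, 0.6))]
atoms = ["a", "b"]
M = {"a": (0.5, 0.6), "b": (0.0, 0.0)}
N = {"a": (0.25, 0.36), "b": (0.4, 0.5)}

print(is_stable(P, M, atoms))          # True
print(least_model(qgl(P, N), atoms))   # N': a about [0.25, 0.36], b = [0, 0]
print(is_stable(P, N, atoms))          # False
```

For N the reduct's least model sets b to [0, 0], so N′ < N and N is rejected, matching the argument in Example 3.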
6 Generalization Results
6.1 Van Emden's Approach
One of the most relevant earlier works in this field is that of van Emden [34]. There, a quantitative rule r has the form $A \xleftarrow{f} B_1, \dots, B_n$, where n ≥ 0, A, B1, ..., Bn are positive atoms, and f is a real number in the interval (0, 1]. r is true in a quantitative interpretation I iff I(A) ≥ f × min{ I(Bi) | i ∈ {1, ..., n} }.
Theorem 2. The language proposed by van Emden is a particular case of the p-model semantics, where p = +∞ (i.e., $T^{(p)} = T_3$). □
There are important differences between our approach and that of van Emden. First, the programs considered in [34] are positive and disjunction-free. Moreover, unlike in our approach, each clause implication receives a scalar rather than an interval. Finally, van Emden defines a single uncertainty calculus, based on the T-norm T3.
6.2 Traditional Disjunctive Logic Programming
From the syntactic point of view, QDLP is an extension of DLP. Each program P in DLP can be transformed into a program P′ in QDLP, called the quantitative version of P, by assigning [1, 1] to the strength of the implication of each rule (fact) of P. Recall that in DLP implications are strictly (logically) true, and the logical value true corresponds to [1, 1] in QDLP. Thus, P is syntactically equivalent to P′.
Example 4. Consider the logic program P = {a ← ; b ← ; c ∨ d ← a, b}. The quantitative version of P is $P' = \{a \xleftarrow{[1,1]} ;\ b \xleftarrow{[1,1]} ;\ c \vee d \xleftarrow{[1,1]} a, b\}$. □
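A small sketch of the translation P ↦ P′ and of why it is harmless at the level of models: with p = +∞ and every atom mapped to [0, 0] or [1, 1], inequality (1) reduces to the classical truth condition of a rule, in line with Theorem 1. The function names and the encoding of DLP rules as (head atoms, body atoms) pairs are our own illustration. The check covers models only; it says nothing about stable models, which is the question taken up next.

```python
from itertools import product

def quantitative_version(dlp_program):
    """P -> P': attach the strength [1, 1] to every rule (head atoms, body atoms)."""
    return [(head, body, (1.0, 1.0)) for head, body in dlp_program]

def classical_model(program, true_atoms) -> bool:
    # A rule is true iff some body atom is false or some head atom is true.
    return all(not set(body) <= true_atoms or bool(set(head) & true_atoms)
               for head, body in program)

def p_model_t3(qprogram, interp) -> bool:
    """Inequality (1) with p = +inf: T3/S3 are componentwise min/max, T2 is a product."""
    for head, body, (x, y) in qprogram:
        lo = min((interp[b][0] for b in body), default=1.0)
        hi = min((interp[b][1] for b in body), default=1.0)
        h_lo = max(interp[h][0] for h in head)
        h_hi = max(interp[h][1] for h in head)
        if not (lo * x <= h_lo and hi * y <= h_hi):
            return False
    return True

# Example 4: P = { a <- ; b <- ; c v d <- a, b }
P = [(["a"], []), (["b"], []), (["c", "d"], ["a", "b"])]
P_prime = quantitative_version(P)
atoms = ["a", "b", "c", "d"]

# On boolean interpretations ([0, 0] = false, [1, 1] = true) the two notions agree.
for bits in product([0, 1], repeat=len(atoms)):
    true_atoms = {a for a, bit in zip(atoms, bits) if bit}
    interp = {a: (float(bit), float(bit)) for a, bit in zip(atoms, bits)}
    assert classical_model(P, true_atoms) == p_model_t3(P_prime, interp)
```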
We now wish to see whether QDLP is an extension of DLP also from the semantic point of view. We say that a stable p-semantics of QDLP is a generalization of the stable model semantics of DLP iff $\mathcal{SM}^{(p)}(P') = \mathcal{SM}(P)$ for each P in DLP, where P′ is the quantitative version of P. For a given p, it is not guaranteed a priori that the p-semantics of QDLP generalizes the DLP semantics. It is highly desirable that the QDLP semantics coincide with the DLP semantics on boolean quantitative programs. Whether the p-semantics of a given class
of boolean quantitative programs coincides with the DLP semantics depends strongly on the value of p and on the features (e.g., positive, stratified negative, disjunctive, etc.) of the QDLP class. We single out the classes of QDLP and the values of p for which the p-semantics on the boolean quantitative programs of these classes coincides with the DLP semantics.
Table 1. QDLP fragments generalizing DLP
| p | { } | { ¬s } | { ¬ } | { ∨h } | { ∨ } | { ∨h, ¬s } | { ∨, ¬s } | { ∨, ¬ } |
|---|---|---|---|---|---|---|---|---|
| p = −∞ | YES | YES | NO | YES | NO | YES | NO | NO |
| p ∈ (−∞, 0) | YES | YES | NO | NO | NO | NO | NO | NO |
| p ∈ [0, +∞] | YES | YES | NO | YES | YES | YES | YES | NO |
The results on generalizations are summarized in Table 1. Each column of the table collects the results for a specific class of programs, for the T-norms induced by the values of p on the rows. The symbol ¬s refers to stratified negation, while ∨h refers to head-cycle-free (HCF) disjunction². For instance, the last column of the table refers to (unstratified) negative (non-HCF) disjunctive programs. A cell of the table contains YES if the class of quantitative programs given by its column header is a generalization for the values of p given by its row header, and NO otherwise. For non-disjunctive programs, in the positive and stratified cases, every p ∈ R induces a quantitative extension of DLP. For disjunctive programs, as in the non-disjunctive case, there are values of p that induce quantitative extensions of DLP for positive and stratified programs; but, unlike the non-disjunctive case, where p ranges over all of R, p is restricted to {−∞} ∪ [0, +∞] in the HCF case and to [0, +∞] in the non-HCF case. No other values of p yield generalizations for HCF and non-HCF programs. Thus, generalization is guaranteed in most cases where recursion through negation and disjunction is forbidden (stratified and HCF programs). This is a nice result, because stratified HCF programs have a very clear and intuitive declarative meaning (while unstratified negation and recursion through disjunction can be confusing). Intuitively, a fragment QF of QDLP fails to generalize the corresponding fragment F of DLP because of (i) the disjunctive rule heads, and (ii) the fact that some values of p induce T-conorms for which, when applied to a disjunction of atoms, the certainty degree intervals of the atoms need not all be [1, 1] or [0, 0] in order to derive [1, 1] as the certainty degree interval of the disjunction. For these values of p, the quantitative version P′ in QF of a program P in F has purely quantitative stable p-models, which clearly cannot be accepted as stable models of P in DLP. Only the T-conorms, and not the T-norms, corresponding to these values of p prevent the generalization of DLP.
² The notions of stratified negation [1] and head-cycle-free disjunction [4,5] extend from traditional DLP to QDLP in a straightforward manner. Their formal definitions are given in Appendix A.
7 Complexity Results
As for traditional DLP, four main decision problems arise in the context of QDLP. In particular, given a quantitative program P and p ∈ R:
1. Is a given quantitative interpretation I of P a p-model of P? (p-Model Checking)
2. Is a given p-model M of P a stable p-model of P? (Stable p-Model Checking)
3. Does there exist a stable p-model of P? (p-Consistency)
4. Given an atom A ∈ $B_P$ and a certainty interval [x, y], does there exist a stable p-model M of P such that M(A) ≥ [x, y]? (Brave p-Reasoning)
We have analyzed the complexity of these decision problems for the classes of QDLP that generalize the corresponding DLP classes; the other fragments are of little practical interest. The results for non-disjunctive and disjunctive quantitative programs are summarized in Tables 2 and 3, respectively. Each cell of a table gives the complexity of the decision problem named in its column header for the fragment of QDLP named in its row header.
Table 2. Complexity of non-disjunctive QDLP fragments for p ∈ R
| Fragment | p-Model Checking | Stable p-Model Checking | p-Consistency | Brave p-Reasoning |
|---|---|---|---|---|
| { } | P | P | Ensured | P |
| { ¬s } | P | P | Ensured | P |
The results in Table 2 for the non-disjunctive fragments hold for every p ∈ R. For both the positive and the stratified classes, all QDLP decision problems are polynomial, apart from p-Consistency, which is O(1). Determining the precise complexity of disjunctive QDLP is much more difficult. In this paper, we have concentrated on the QDLP fragments for the T-norm T3 (p = +∞). This T-norm is of particular interest, as it is the one for which QDLP also generalizes the quantitative language of van Emden (see Section 6.1). The results for disjunctive QDLP are shown in Table 3. The first column gives the complexity of p-Model Checking for the various disjunctive fragments of QDLP; in all cases it is polynomial.
Table 3. Complexity of disjunctive QDLP fragments for p = +∞ (T-norm T3)
| Fragment | p-Model Checking | Stable p-Model Checking | p-Consistency | Brave p-Reasoning |
|---|---|---|---|---|
| { ∨h } | P | P | Ensured | NP-complete |
| { ∨ } | P | coNP-complete | Ensured | $\Sigma_2^P$-complete |
| { ∨h, ¬s } | P | P | Ensured | NP-complete |
| { ∨, ¬s } | P | coNP-complete | Ensured | $\Sigma_2^P$-complete |
The second column gives the complexity of Stable p-Model Checking. The “hardest” QDLP fragments for this problem are the classes of positive and of stratified negative (non-HCF) disjunctive programs, for which the problem is coNP-complete. In the other two cases considered, the complexity is polynomial. The third column gives the complexity of p-Consistency: in all cases considered it is O(1), because the existence of a stable p-model is guaranteed. Finally, the fourth column gives the complexity of Brave p-Reasoning, which increases from NP-complete in the HCF case to $\Sigma_2^P$-complete in the non-HCF case. Note that the classes of QDLP with stratified negation considered here do not increase the complexity of any of the four decision problems w.r.t. the corresponding positive classes. In Table 3, this is shown by the row pairs (1, 3) and (2, 4), which contain the same complexity results in all columns. Note also that our results for QDLP coincide precisely with the results for DLP obtained by Eiter et al. [13,14]; i.e., reasoning under multiple-valued logics is more general but not harder than reasoning under boolean logics. That is, uncertainty comes for free!
Acknowledgments
I am very grateful to Georg Gottlob and Nicola Leone for their useful criticism and numerous fruitful discussions on the manuscript.
References
1. K.R. Apt, H.A. Blair, and A. Walker. Towards a Theory of Declarative Knowledge. In J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, Morgan Kaufmann, Los Altos, 1987.
2. F. Bacchus. Representing and Reasoning with Probabilistic Knowledge. Research Report CS-88-31, University of Waterloo, 1988.
3. C. Baral and M. Gelfond. Logic Programming and Knowledge Representation. Journal of Logic Programming, Vol. 19/20, May/July, pp. 73–148, 1994.
4. R. Ben-Eliyahu and R. Dechter. Propositional Semantics for Disjunctive Logic Programs. Annals of Mathematics and Artificial Intelligence, 12:53–87, 1994.
5. R. Ben-Eliyahu and L. Palopoli. Reasoning with Minimal Models: Efficient Algorithms and Applications. In Proc. KR-94, pp. 39–50, 1994.
6. H.A. Blair and V.S. Subrahmanian. Paraconsistent Logic Programming. Theoretical Computer Science, 68, pp. 35–54, 1987.
7. P. Bonissone. Summarizing and Propagating Uncertain Information with Triangular Norms. International Journal of Approximate Reasoning, 1:71–101, 1987.
8. P.R. Cohen and M.R. Grinberg. A Framework for Heuristic Reasoning about Uncertainty. In Proc. IJCAI ’83, pp. 355–357, Karlsruhe, Germany, 1983.
9. P.R. Cohen and M.R. Grinberg. A Theory of Heuristic Reasoning about Uncertainty. AI Magazine, 4(2):17–23, 1983.
10. A.P. Dempster. A Generalization of Bayesian Inference. J. of the Royal Statistical Society, Series B, 30, pp. 205–247, 1968.
11. J. Dix. Semantics of Logic Programs: Their Intuitions and Formal Properties. An Overview. In Logic, Action and Information, pp. 241–329. DeGruyter, 1995.
12. J. Doyle. Methodological Simplicity in Expert System Construction: the Case of Judgements and Reasoned Assumptions. AI Magazine, 4(2):39–43, 1983.
13. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive Datalog. ACM Transactions on Database Systems, 22(3):364–417, September 1997.
14. T. Eiter and G. Gottlob. On the Computational Cost of Disjunctive Logic Programming: Propositional Case. Annals of Mathematics and Artificial Intelligence, 15(3/4):289–323, 1995.
15. M.C. Fitting. Bilattices and the Semantics of Logic Programming. J. Logic Programming, 11, pp. 91–116, 1991.
16. M. Gelfond and V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365–385, 1991.
17. M. Ishizuka. Inference Methods Based on Extended Dempster-Shafer Theory for Problems with Uncertainty/Fuzziness. New Generation Computing, 1, 2, pp. 159–168, 1983.
18. M. Kifer and A. Li. On the Semantics of Rule-Based Expert Systems with Uncertainty. In 2nd International Conference on Database Theory, Springer-Verlag LNCS 326, pp. 102–117, 1988.
19. M. Kifer and V.S. Subrahmanian. Theory of the Generalized Annotated Logic Programming and its Applications. J. Logic Programming, 12, pp. 335–367, 1992.
20. L.V.S. Lakshmanan, N. Leone, R. Ross, and V.S. Subrahmanian. ProbView: A Flexible Probabilistic Database System. ACM Transactions on Database Systems, 22, 3, pp. 419–469, 1997.
21. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
22. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, MA, 1992.
23. C. Mateis. A Quantitative Extension of Disjunctive Logic Programming. Technical Report, available on the web as: http://www.dbai.tuwien.ac.at/staff/mateis/gz/qdlp.ps.
24. R.T. Ng and V.S. Subrahmanian. Probabilistic Logic Programming. Information and Computation, 101:150–201, 1992.
25. R.T. Ng and V.S. Subrahmanian. Empirical Probabilities in Monadic Deductive Databases. In Proc. Eighth Conf. Uncertainty in AI, pp. 215–222, Stanford, 1992.
26. R.T. Ng and V.S. Subrahmanian. A Semantical Framework for Supporting Subjective and Conditional Probabilities in Deductive Databases. J. of Automated Reasoning, 10, 2, pp. 191–235, 1993.
27. R.T. Ng and V.S. Subrahmanian. Stable Semantics for Probabilistic Deductive Databases. Information and Computation, 110:42–83, 1994.
28. R.T. Ng and V.S. Subrahmanian. Non-monotonic Negation in Probabilistic Deductive Databases. In Proc. 7th Conf. Uncertainty in AI, pp. 249–256, Los Angeles, 1991.
29. N.J. Nilsson. Probabilistic Logic. Artificial Intelligence, vol. 28, pp. 71–87, 1986.
30. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
31. B. Schweizer and A. Sklar. Associative Functions and Abstract Semi-Groups. Publicationes Mathematicae Debrecen, 10:69–81, 1963.
32. G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
33. E. Shapiro. Logic Programs with Uncertainties: A Tool for Implementing Expert Systems. In Proc. IJCAI ’83, pp. 529–532, 1983.
34. M.H. van Emden. Quantitative Deduction and its Fixpoint Theory. The Journal of Logic Programming, 1:37–53, 1986.
35. L.A. Zadeh. Fuzzy Sets. Inform. and Control, 8:338–353, 1965.
A Stratified and Head Cycle Free QDLP
The stratified and head-cycle-free (HCF) quantitative programs are important classes of quantitative programs and, as we will see, they have nice properties. Stratified quantitative programs are defined in the classical way, as introduced by Apt et al. [1]. A quantitative program P is (locally) stratified iff it is possible to partition the set of its atoms into strata ⟨S1, ..., Sr⟩ such that, for every rule
$$h_1 \vee \dots \vee h_k \xleftarrow{[x,y]} b_1, \dots, b_l, \neg b_{l+1}, \dots, \neg b_{l+m}$$
in P, the following holds: (i) Strat(a) = i iff a ∈ Si, (ii) Strat(bs) ≤ Strat(ht) for all 1 ≤ s ≤ l, 1 ≤ t ≤ k, and (iii) Strat(bs) < Strat(ht) for all l + 1 ≤ s ≤ l + m, 1 ≤ t ≤ k. Note that if P is stratified, then there is a partition P = P1 ∪ ... ∪ Pr, where r is the number of strata and Pi contains the rules of P defining the atoms of Si, 1 ≤ i ≤ r. In the sequel, if a negative program is not explicitly said to be stratified, it is assumed to be unstratified.
Example 5. Consider the program P consisting of the following rules:
$$a \vee b \xleftarrow{[0.6,0.6]} \neg c. \qquad a \xleftarrow{[0.4,0.4]} .$$
$$e \xleftarrow{[0.8,0.8]} b, \neg d. \qquad c \vee d \xleftarrow{[0.5,0.5]} .$$
P is stratified. A partition of $B_P$ into strata is ⟨S1, S2⟩ with S1 = {c, d} and S2 = {a, b, e}. The partition of P corresponding to this partition of $B_P$ is P = P1 ∪ P2 with $P_1 = \{c \vee d \xleftarrow{[0.5,0.5]}\}$ and $P_2 = \{a \xleftarrow{[0.4,0.4]} ;\ a \vee b \xleftarrow{[0.6,0.6]} \neg c;\ e \xleftarrow{[0.8,0.8]} b, \neg d\}$. □
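Conditions (ii) and (iii) can be checked rule by rule once a candidate assignment of strata is given. The sketch below is ours: rules are hypothetical (head atoms, positive body atoms, negated body atoms) triples, and rule strengths are dropped because they play no role in stratification. When splitting P into P1, ..., Pr, it files a rule under the highest stratum among its head atoms; that tie-breaking is our assumption, since in Example 5 every head lies within a single stratum.

```python
def is_stratification(program, strat) -> bool:
    """Conditions (ii) and (iii): positive body atoms lie in a stratum <= that of every
    head atom; negated body atoms lie in a strictly lower stratum."""
    for heads, pos, negs in program:
        for h in heads:
            if any(strat[b] > strat[h] for b in pos):
                return False
            if any(strat[b] >= strat[h] for b in negs):
                return False
    return True

def split_by_stratum(program, strat):
    """P_i = rules defining atoms of S_i (rule filed under its highest head stratum)."""
    parts = {}
    for rule in program:
        parts.setdefault(max(strat[h] for h in rule[0]), []).append(rule)
    return parts

# Example 5 without strengths.
P = [(["a", "b"], [], ["c"]), (["a"], [], []), (["e"], ["b"], ["d"]), (["c", "d"], [], [])]
strat = {"c": 1, "d": 1, "a": 2, "b": 2, "e": 2}

print(is_stratification(P, strat))                     # True
print(is_stratification(P, {x: 1 for x in "abcde"}))   # False: c would not be below a, b
print(split_by_stratum(P, strat))
```

Finding a stratification (rather than checking a given one) can be done with a dependency-graph analysis; the check above only verifies the assignment shown in Example 5.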
With every program P we associate a directed graph DG_P = (N, E), called the dependency graph of P, in which (i) each predicate of P is a node in N, and (ii) there is an arc in E directed from a node a to a node b iff there is a rule r in P such that b and a are the predicates of positive literals appearing in H(r) and B(r), respectively. DG_P singles out the dependencies of the head predicates of a rule r on the positive predicates in its body³.
Example 6. Consider the program P1 consisting of the following rules⁴:
$$a \vee b \xleftarrow{[0.6,0.6]} . \qquad c \xleftarrow{[0.6,0.6]} a. \qquad c \xleftarrow{[0.6,0.6]} b.$$
DG_P1 is depicted in Figure 2a. (Note that, since the sample program is propositional, the nodes of the graph are atoms, as atoms coincide with predicates in this case.) Consider now the program P2, obtained by adding to P1 the rules
$$d \vee e \xleftarrow{[0.8,0.8]} a. \qquad d \xleftarrow{[0.4,0.4]} e. \qquad e \xleftarrow{[0.5,0.5]} d, \neg b.$$
The dependency graph DG_P2 is shown in Figure 2b.
Fig. 2. Dependency Graph (DG_P): (a) DG_P1; (b) DG_P2
HCF quantitative programs are an important class of quantitative programs with disjunction in the head and are defined in the classical way [4,5]. A program P is HCF iff there is no clause r in P such that two predicates occurring in the head of r are in the same cycle of DG_P. In the sequel, if a disjunctive program is not explicitly said to be HCF, it is assumed to be non-HCF.
Example 7. The dependency graphs given in Figure 2 reveal that program P1 of Example 6 is HCF and that P2 is not HCF, as the rule $d \vee e \xleftarrow{[0.8,0.8]} a$ contains in its head two predicates belonging to the same cycle of DG_P2. □
³ Note that negative literals cause no arc in DG_P.
⁴ We point out again that we use propositional programs for simplicity, but the results are valid for the general case of (function-free) programs with variables.
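The dependency graph and the HCF test of Example 7 can be sketched in a few lines. The code is our illustration, with rules encoded as hypothetical (head atoms, positive body atoms, negated body atoms) triples. We read "two predicates in the same cycle" as "two distinct head predicates that are mutually reachable in DG_P" (i.e., in the same strongly connected component); that reading is an assumption of this sketch.

```python
def dependency_graph(program):
    """Arc b -> h for every positive body atom b and head atom h of the same rule.
    Negated body atoms create no arcs (footnote 3)."""
    arcs = {}
    for heads, pos, _negs in program:
        for b in pos:
            for h in heads:
                arcs.setdefault(b, set()).add(h)
    return arcs

def reachable(arcs, src):
    """Nodes reachable from src by a non-empty path (iterative DFS)."""
    seen, stack = set(), [src]
    while stack:
        for nxt in arcs.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_hcf(program) -> bool:
    """HCF iff no rule has two distinct head atoms that are mutually reachable in DG_P."""
    arcs = dependency_graph(program)
    reach = {n: reachable(arcs, n) for rule in program for n in rule[0]}
    for heads, _pos, _negs in program:
        for i, h1 in enumerate(heads):
            for h2 in heads[i + 1:]:
                if h2 in reach[h1] and h1 in reach[h2]:
                    return False
    return True

# Example 6 (strengths omitted: they do not affect DG_P).
P1 = [(["a", "b"], [], []), (["c"], ["a"], []), (["c"], ["b"], [])]
P2 = P1 + [(["d", "e"], ["a"], []), (["d"], ["e"], []), (["e"], ["d"], ["b"])]

print(is_hcf(P1), is_hcf(P2))   # True False
```

In P2 the rules d ← e and e ← d, ¬b put d and e on a common cycle, so the head of d ∨ e ← a violates the HCF condition, as stated in Example 7.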
Extending the Stable Model Semantics with More Expressive Rules
Patrik Simons*
Department of Computer Science and Engineering
Helsinki University of Technology, FIN-02015 HUT, Finland
[email protected], http://www.tcs.hut.fi/~psimons
Abstract. The rules associated with propositional logic programs and the stable model semantics are not expressive enough to let one write concise programs. This problem is alleviated by introducing some new types of propositional rules. Together with a decision procedure that has been used as a base for an efficient implementation, the new rules supplant the standard ones in practical applications of the stable model semantics.
1 Introduction
Logic programming with the stable model semantics has emerged as a viable method for solving constraint satisfaction problems [4,5]. The state-of-the-art system smodels [6] can often handle non-stratified programs with tens of thousands of rules. However, propositional logic programs cannot compactly encode several types of constraints. For example, expressing the subsets of size k of an n-sized set as stable models requires on the order of $\binom{n}{k}$ rules. In order to remedy this problem, we improve upon the techniques of smodels by extending the semantics with some new types of propositional rules:
– choice rules for encoding subsets of a set,
– constraint rules for enforcing cardinality limits on the subsets, and
– weight rules for writing inequalities over weighted linear sums.
The extended semantics is not based on subset-minimal models, as is the case for disjunctive logic programs. For instance, the choice rule is more of a generalization of the disjunctive rule of the possible model semantics [7].

A system that computes the stable models of programs containing the new rules has been implemented [9], and it has successfully been applied to deadlock and reachability problems in a class of Petri nets [3]. Other problem domains, such as planning and configuration, will benefit from the improved rules as well. The system is based on smodels 1.10, from which it evolved.

The new rules and the stable model semantics are introduced in Section 2. A decision procedure for the extended syntax is presented in Section 3, and some important implementation details are described in Section 4. Experimental results are found in Section 5.
The financial support of the Academy of Finland (project nr 43963) and the Helsinki Graduate School in Computer Science and Engineering is gratefully acknowledged.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 305–316, 1999. © Springer-Verlag Berlin Heidelberg 1999
2 The Stable Model Semantics
Let Atoms be a set of primitive propositions, or atoms, and consider logic programs consisting of rules of the form

$$h \leftarrow a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m,$$

where the head h and the atoms a1, ..., an, b1, ..., bm in the body are members of Atoms. Call the expression not b a not-atom; atoms and not-atoms are referred to as literals.

The stable model semantics for a logic program P is defined as follows [2]. The reduct $P^A$ of P with respect to the set of atoms A is obtained by
1. deleting each rule in P that has a not-atom not x in its body such that x ∈ A, and by
2. deleting all not-atoms in the remaining rules.

Definition 1. A set of atoms S is a stable model of P if and only if S is the deductive closure of $P^S$ when the rules in $P^S$ are seen as inference rules.

In order to facilitate the definition of more general forms of rules, we introduce an equivalent characterization of the stable model semantics.

Proposition 1. We say that $f_P : 2^{\mathit{Atoms}} \to 2^{\mathit{Atoms}}$ is a closure if

$$f_P(S) = \{\, h \mid h \leftarrow a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m \in P,\ a_1, \ldots, a_n \in f_P(S),\ b_1, \ldots, b_m \notin S \,\}.$$

Let

$$g_P(S) = \bigcap \{\, f_P(S) \mid f_P : 2^{\mathit{Atoms}} \to 2^{\mathit{Atoms}} \text{ is a closure} \,\}.$$

Then, S is a stable model of the program P if and only if S = g_P(S).

Proof. Note that the deductive closure of the reduct $P^S$ is a closure, and note that for every f_P that is a closure, the deductive closure of $P^S$ is a subset of f_P(S).

A stable model is therefore a model that follows from itself by means of the smallest possible closure. In other words, a stable model is a supported model, and this is the essence of the semantics.

Definition 2. A basic rule r is of the form h ← a1, ..., an, not b1, ..., not bm and is interpreted by the function $f_r : 2^{\mathit{Atoms}} \times 2^{\mathit{Atoms}} \to 2^{\mathit{Atoms}}$ as follows:

$$f_r(S, C) = \{\, h \mid a_1, \ldots, a_n \in C,\ b_1, \ldots, b_m \notin S \,\}.$$
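Proposition 1 can be checked by brute force on a small program. The sketch below assumes a hypothetical encoding of a basic rule as (h, pos, neg) with frozenset bodies; for every candidate S it compares Definition 1 (reduct plus deductive closure) with the least fixed point of C ↦ ⋃ f_r(S, C) built from Definition 2, which is the smallest closure at S.

```python
from itertools import combinations

# Hypothetical encoding of a basic rule h <- a1..an, not b1..bm:
# (h, pos, neg) with pos = frozenset({a1..an}) and neg = frozenset({b1..bm}).

def reduct(program, s):
    """P^S: drop rules with a not-atom 'not x' such that x in S, then drop not-atoms."""
    return [(h, pos) for h, pos, neg in program if not (neg & s)]

def deductive_closure(definite_rules):
    """Smallest set of atoms closed under the rules h <- a1..an."""
    closure, changed = set(), True
    while changed:
        changed = False
        for h, pos in definite_rules:
            if pos <= closure and h not in closure:
                closure.add(h)
                changed = True
    return closure

def f_r(rule, s, c):
    """Definition 2: one deductive step of a basic rule."""
    h, pos, neg = rule
    return {h} if pos <= c and not (neg & s) else set()

def g(program, s):
    """Least fixed point of C -> union of f_r(S, C): the smallest closure at S."""
    c = set()
    while True:
        nxt = set().union(*(f_r(r, s, c) for r in program))
        if nxt == c:
            return c
        c = nxt

def subsets(atoms):
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield set(combo)

E = frozenset()
P = [("a", E, frozenset({"b"})),   # a <- not b
     ("b", E, frozenset({"a"})),   # b <- not a
     ("c", frozenset({"a"}), E)]   # c <- a
for s in subsets(["a", "b", "c"]):
    # Definition 1 and Proposition 1 agree on every candidate S
    assert (deductive_closure(reduct(P, s)) == s) == (g(P, s) == s)
print([sorted(s) for s in subsets(["a", "b", "c"]) if g(P, s) == s])  # [['b'], ['a', 'c']]
```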
The function f_r produces the result of a deductive step when applied to a candidate stable model S and its consequences C.

Definition 3. A constraint rule r is of the form

$$h \leftarrow k\, \{a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m\}$$

and is interpreted by

$$f_r(S, C) = \{\, h \mid |\{a_1, \ldots, a_n\} \cap C| + |\{b_1, \ldots, b_m\} - S| \ge k \,\}.$$

The constraint rule can be used for testing the cardinality of a set of atoms. The rule h1 ← 2 {a, b, c, d} states that h1 is true if at least 2 atoms in the set {a, b, c, d} are true. The rule h2 ← 1 {not a, not b, not c, not d}, on the other hand, states that h2 is true if at most 3 atoms in the set are true.

Definition 4. A choice rule r is of the form

$$\{h_1, \ldots, h_k\} \leftarrow a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m$$

and is interpreted by

$$f_r(S, C) = \{\, h \mid h \in \{h_1, \ldots, h_k\} \cap S,\ a_1, \ldots, a_n \in C,\ b_1, \ldots, b_m \notin S \,\}.$$

The choice rule is typically used when one wants to implement optional choices. The rule {a} ← b, not c declares that if b is true and c is false, then a may be either true or false.

Definition 5. Finally, a weight rule r is of the form

$$h \leftarrow \{a_1 = w_{a_1}, \ldots, a_n = w_{a_n}, \mathit{not}\ b_1 = w_{b_1}, \ldots, \mathit{not}\ b_m = w_{b_m}\} \ge w,$$

for $w_{a_i}, w_{b_i} \ge 0$, and is interpreted by

$$f_r(S, C) = \Big\{\, h \;\Big|\; \sum_{a_i \in C} w_{a_i} + \sum_{b_i \notin S} w_{b_i} \ge w \,\Big\}.$$
The weight rule is a generalization of the constraint rule. If every literal in the body of a weight rule has weight 1, then the rule behaves precisely as a constraint rule.

Definition 6. Let P be a set of rules. As before, we say that $f_P : 2^{\mathit{Atoms}} \to 2^{\mathit{Atoms}}$ is a closure if

$$f_P(S) = \bigcup_{r \in P} f_r\big(S, f_P(S)\big),$$

and we define

$$g_P(S) = \bigcap \{\, f_P(S) \mid f_P : 2^{\mathit{Atoms}} \to 2^{\mathit{Atoms}} \text{ is a closure} \,\}.$$

Then, S is a stable model of the program P if and only if S = g_P(S).
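Definition 6 suggests a direct, if exponential, stability test for all four rule types. Since every f_r(S, C) is monotone in C, g_P(S) is the least fixed point of C ↦ ⋃_{r∈P} f_r(S, C), reachable by iterating from the empty set. The tuple encodings of the rules below are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

# Hypothetical encodings of the rule types of Definitions 2-5:
#   ("basic", h, pos, neg)          ("constraint", h, k, pos, neg)
#   ("choice", heads, pos, neg)     ("weight", h, wpos, wneg, w)
# pos/neg/heads are frozensets; wpos/wneg map atoms to weights >= 0.
def f_r(rule, s, c):
    kind = rule[0]
    if kind == "basic":
        _, h, pos, neg = rule
        return {h} if pos <= c and not (neg & s) else set()
    if kind == "constraint":
        _, h, k, pos, neg = rule
        return {h} if len(pos & c) + len(neg - s) >= k else set()
    if kind == "choice":
        _, heads, pos, neg = rule
        return set(heads & s) if pos <= c and not (neg & s) else set()
    if kind == "weight":
        _, h, wpos, wneg, w = rule
        total = (sum(x for a, x in wpos.items() if a in c)
                 + sum(x for b, x in wneg.items() if b not in s))
        return {h} if total >= w else set()
    raise ValueError(f"unknown rule type {kind!r}")

def g(program, s):
    """g_P(S): least fixed point of C -> union of f_r(S, C), iterated from the empty set."""
    c = set()
    while True:
        nxt = set().union(*(f_r(r, s, c) for r in program))
        if nxt == c:
            return c
        c = nxt

def stable_models(program, atoms):
    """All S over the given atoms with S = g_P(S), as sorted lists."""
    atoms = sorted(atoms)
    candidates = (set(x) for k in range(len(atoms) + 1) for x in combinations(atoms, k))
    return [sorted(s) for s in candidates if g(program, s) == s]

F = frozenset
# {a} <- b, not c.    b <- .
P = [("choice", F({"a"}), F({"b"}), F({"c"})), ("basic", "b", F(), F())]
print(stable_models(P, {"a", "b", "c"}))  # [['b'], ['a', 'b']]
```

As the text says, a is optional: both {b} and {a, b} are stable.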
The motivation for defining constraint, choice, and weight rules is that they can be easily and efficiently implemented and that they are quite expressive. For example, the constraint rule h ← k {a1, ..., an, not b1, ..., not bm} replaces the program

$$\{\, h \leftarrow a_{i_1}, \ldots, a_{i_{k_1}}, \mathit{not}\ b_{j_1}, \ldots, \mathit{not}\ b_{j_{k_2}} \mid k_1 + k_2 = k,\ 1 \le i_1 < \cdots < i_{k_1} \le n,\ 1 \le j_1 < \cdots < j_{k_2} \le m \,\},$$

which contains $\binom{n+m}{k}$ rules.

In other words, a constraint rule guarantees that if the number of atoms in its body that are in a stable model, plus the number of not-atoms not b in its body whose atom b is not in the model, is at least k, then the head is in the model. Similarly, if the body of a choice rule agrees with a stable model, then the rule motivates the inclusion of any number of atoms from its head. A weight rule

$$h \leftarrow \{a_1 = w_{a_1}, \ldots, a_n = w_{a_n}, \mathit{not}\ b_1 = w_{b_1}, \ldots, \mathit{not}\ b_m = w_{b_m}\} \ge w,$$

in turn, will force the head to be a member of a stable model S if

$$\sum_{a_i \in S} w_{a_i} + \sum_{b_i \notin S} w_{b_i} \ge w.$$
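The count of $\binom{n+m}{k}$ rules can be checked mechanically. The sketch below (all helper names are hypothetical) expands a constraint rule into one basic rule per choice of k body literals, then confirms, for every pair S, C over a five-atom example, that the expansion fires exactly when the constraint rule does.

```python
from itertools import combinations
from math import comb

def expand_constraint_rule(h, k, pos, neg):
    """Basic rules h <- (k1 atoms of pos), not (k2 atoms of neg) with k1 + k2 = k."""
    literals = [(True, a) for a in pos] + [(False, b) for b in neg]
    rules = []
    for chosen in combinations(literals, k):
        body_pos = frozenset(x for positive, x in chosen if positive)
        body_neg = frozenset(x for positive, x in chosen if not positive)
        rules.append((h, body_pos, body_neg))
    return rules

def constraint_fires(k, pos, neg, s, c):
    """Definition 3: |{a_i} & C| + |{b_i} - S| >= k."""
    return len(set(pos) & c) + len(set(neg) - s) >= k

def expansion_fires(rules, s, c):
    """Some basic rule of the expansion fires (Definition 2)."""
    return any(p <= c and not (n & s) for _h, p, n in rules)

def all_subsets(atoms):
    return [set(x) for k in range(len(atoms) + 1) for x in combinations(atoms, k)]

pos, neg, k = ["a", "b", "c"], ["d", "e"], 2
rules = expand_constraint_rule("h", k, pos, neg)
print(len(rules), comb(len(pos) + len(neg), k))  # 10 10
universe = pos + neg
assert all(constraint_fires(k, pos, neg, s, c) == expansion_fires(rules, s, c)
           for s in all_subsets(universe) for c in all_subsets(universe))
```

Choosing k of the n + m literals at once is the same as summing over k1 + k2 = k, which is why the count is a single binomial coefficient.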
Example 1. The stable models of the program

{a1, ..., an} ←
false ← {a1 = w1, ..., an = wn} ≥ w
true ← {a1 = v1, ..., an = vn} ≥ v

containing the atom true but not the atom false correspond to the ways one can pack a subset of a1, ..., an in a bin such that the total weight is less than w and the total value is at least v. The individual weights and values of the items are given by w1, ..., wn and v1, ..., vn, respectively.

Example 2. The satisfying assignments of the formula

(a ∨ b ∨ ¬c) ∧ (¬a ∨ b ∨ ¬d) ∧ (¬b ∨ c ∨ d)

correspond to the stable models of the program

{a, b, c, d} ←
false ← not a, not b, c
false ← a, not b, d
false ← b, not c, not d

that do not contain false.
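Example 2 is small enough to verify exhaustively. This minimal sketch is specialized to that one program (its encoding is hypothetical): it computes g_P(S) for every candidate S over {a, b, c, d, false} and compares the stable models without false with the satisfying assignments of the formula.

```python
from itertools import product

ATOMS = ["a", "b", "c", "d"]
# One rule false <- pos, not neg per clause of the formula
FALSE_RULES = [({"c"}, {"a", "b"}),   # false <- not a, not b, c
               ({"a", "d"}, {"b"}),   # false <- a, not b, d
               ({"b"}, {"c", "d"})]   # false <- b, not c, not d

def g(s):
    """Least fixed point of C -> f_choice(S, C) | f_false_rules(S, C)."""
    c = set()
    while True:
        nxt = set(ATOMS) & s   # {a, b, c, d} <- contributes the heads already in S
        if any(pos <= c and not (neg & s) for pos, neg in FALSE_RULES):
            nxt.add("false")
        if nxt == c:
            return c
        c = nxt

def satisfies(t):
    a, b, c, d = (x in t for x in ATOMS)
    return (a or b or not c) and (not a or b or not d) and (not b or c or d)

def subsets(xs):
    for bits in product([False, True], repeat=len(xs)):
        yield {x for x, on in zip(xs, bits) if on}

stable = [s for s in subsets(ATOMS + ["false"]) if g(s) == s]
models = {frozenset(s) for s in stable if "false" not in s}
assignments = {frozenset(t) for t in subsets(ATOMS) if satisfies(t)}
print(models == assignments, len(models))  # True 10
```

Assignments that violate a clause still yield stable models, but those contain false and are filtered out.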
3 The Decision Procedure
For an atom a, let not (a) = not a, and for a not-atom not a, let not (not a) = a. For a set of literals A, define not (A) = {not (a) | a ∈ A}. Let A+ = {a ∈ Atoms | a ∈ A} and let A− = {a ∈ Atoms | not a ∈ A}. Define Atoms(A) = A+ ∪ A− , and for a program P , define Atoms(P ) = Atoms(L), where L is the set of literals that appear in the program. A set of literals A is said to cover a set of atoms B if B ⊆ Atoms(A), and B is said to agree with A if A+ ⊆ B
and A− ⊆ Atoms − B.
Algorithm 1 displays a decision procedure for the stable model semantics. The function smodels(P, A) returns true whenever there is a stable model of P agreeing with A, and it relies on the three functions expand(P, A), conflict(P, A), and lookahead(P, A). Let A′ = expand(P, A). We assume that
E1 A ⊆ A′, and that
E2 every stable model of P that agrees with A also agrees with A′.
Moreover, we assume that the function conflict(P, A) satisfies the two conditions
C1 if A covers Atoms(P) and there is no stable model that agrees with A, then conflict(P, A) returns true, and
C2 if conflict(P, A) returns true, then there is no stable model of P that agrees with A.
In addition, lookahead(P, A) is expected to return literals not covered by A.

Theorem 1. Let P be a set of rules and let A be a set of literals. Then, there is a stable model of P agreeing with A if and only if smodels(P, A) returns true.

Let S be a stable model of P agreeing with the set of literals A. Then, f_r(S, S) ⊆ S for r ∈ P, and we make the following observations. Let

$$\mathrm{min}_r(A) = \bigcap_{\substack{A^+ \subseteq C \\ A^- \cap C = \emptyset}} f_r(C, C)$$

be the inevitable consequences of A, and let

$$\mathrm{max}_r(A) = \bigcup_{\substack{A^+ \subseteq C \\ A^- \cap C = \emptyset}} f_r(C, C)$$

be the possible consequences of A. Then,
Algorithm 1. A decision procedure for the stable model semantics

function smodels(P, A)
    A′ := expand(P, A)
    if conflict(P, A′) then
        return false
    else if A′ covers Atoms(P) then
        return true    {A′+ is a stable model}
    else
        x := lookahead(P, A′)
        if smodels(P, A′ ∪ {x}) then
            return true
        else
            return smodels(P, A′ ∪ {not(x)})
        end if
    end if.

function expand(P, A)
    repeat
        A′ := A
        A := Atleast(P, A)
        A := A ∪ {not x | x ∈ Atoms(P) and x ∉ Atmost(P, A)}
    until A = A′
    return A.

function conflict(P, A)
    {Precondition: A = expand(P, A)}
    if A+ ∩ A− ≠ ∅ then
        return true
    else
        return false
    end if.

function lookahead(P, A)
    B := Atoms(P) − Atoms(A)
    B := B ∪ not(B)
    while B ≠ ∅ do
        Take any literal x ∈ B
        A′ := expand(P, A ∪ {x})
        if conflict(P, A′) then
            return x
        else
            B := B − A′
        end if
    end while
    return heuristic(P, A).
1. for all r ∈ P, S agrees with min_r(A),
2. if there is an atom a such that for all r ∈ P, a ∉ max_r(A), then S agrees with {not a},
3. if the atom a ∈ A, if there is only one r ∈ P for which a ∈ max_r(A), and if there exists a literal x such that a ∉ max_r(A ∪ {x}), then S agrees with {not(x)}, and
4. if not a ∈ A and if there exists a literal x such that for some r ∈ P, a ∈ min_r(A ∪ {x}), then S agrees with {not(x)}.

The four statements help us deduce additional literals that are in agreement with S. Define Atleast(P, A) as the smallest set of literals containing A that cannot be enlarged using 1–4 above, i.e., let Atleast(P, A) be the least fixed point of the operator

$$\begin{aligned}
f(B) = {} & A \cup B \cup \{\, a \in \mathrm{min}_r(B) \mid r \in P \,\} \\
& \cup \{\, \mathit{not}\ a \mid a \in \mathit{Atoms}(P) \text{ and for all } r \in P,\ a \notin \mathrm{max}_r(B) \,\} \\
& \cup \{\, \mathit{not}(x) \mid \text{there exists } a \in B \text{ such that } a \in \mathrm{max}_r(B) \text{ for only one } r \in P \text{ and } a \notin \mathrm{max}_r(B \cup \{x\}) \,\} \\
& \cup \{\, \mathit{not}(x) \mid \text{there exists } \mathit{not}\ a \in B \text{ and } r \in P \text{ such that } a \in \mathrm{min}_r(B \cup \{x\}) \,\}.
\end{aligned}$$

We conclude:

Proposition 2. If the stable model S of P agrees with A, then S agrees with Atleast(P, A).

Furthermore, we can bound the stable models from above.

Proposition 3. For a choice rule r of the form {h1, ..., hk} ← a1, ..., an, not b1, ..., not bm, let

$$f'_r(S, C) = \{\, h \in \{h_1, \ldots, h_k\} \mid a_1, \ldots, a_n \in C,\ b_1, \ldots, b_m \notin S \,\},$$

and for any other type of rule, let f′_r(S, C) = f_r(S, C). Let S be a stable model of P that agrees with A. Define Atmost(P, A) as the least fixed point of

$$f'(B) = \bigcup_{r \in P} f'_r\big(A^+,\, B - A^-\big) - A^-.$$

Then, S ⊆ Atmost(P, A).

It follows that expand(P, A) satisfies the conditions E1 and E2. The function conflict(P, A) obviously fulfills C2, and the next proposition shows that C1 holds as well.

Proposition 4. If A = expand(P, A) covers the set Atoms(P) and A+ ∩ A− = ∅, then A+ is a stable model of P.
3.1 Looking Ahead and the Heuristic
Besides Atleast(P, A) and Atmost(P, A), there is a third way to prune the search space. If the stable model S agrees with A but not with A ∪ {x} for some literal x, then S agrees with A ∪ {not(x)}. One can therefore avoid futile choices if one looks ahead and tests whether A ∪ {x} gives rise to a conflict for some literal x. Since x′ ∈ expand(P, A ∪ {x}) implies expand(P, A ∪ {x′}) ⊆ expand(P, A ∪ {x}), due to the monotonicity of Atleast(P, A) and Atmost(P, A), it is not even necessary to examine all literals not covered by A. That is, if we have tested x, then we do not have to test the literals in expand(P, A ∪ {x}).

When looking ahead fails to find a literal that causes a conflict, one falls back on a heuristic. For a literal x, let A_p = expand(P, A ∪ {x}) and A_n = expand(P, A ∪ {not(x)}). Assume that the search space is a full binary tree of height H, and let p = |A_p − A| and n = |A_n − A|. Then,

$$2^{H-p} + 2^{H-n} = 2^H \, \frac{2^n + 2^p}{2^{p+n}}$$

is an upper bound on the size of the remaining search space. Minimizing this number is equivalent to minimizing

$$\log \frac{2^n + 2^p}{2^{p+n}} = \log(2^n + 2^p) - (p + n).$$

Since

$$2^{\max(n,p)} < 2^n + 2^p \le 2^{\max(n,p)+1}$$

is equivalent to

$$\max(n,p) < \log(2^n + 2^p) \le \max(n,p) + 1$$

and hence to

$$-\min(n,p) < \log(2^n + 2^p) - (p + n) \le 1 - \min(n,p),$$

it suffices to maximize min(n, p). If two different literals have equal minimums, then one chooses the one with the greater maximum, max(n, p).
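The bound and the selection rule are easy to state in code. The sketch below (function names are hypothetical, and the (p, n) statistics would in practice come from the two expand calls) evaluates 2^{H−p} + 2^{H−n}, the log cost, and picks the literal that maximizes min(n, p), breaking ties by max(n, p).

```python
import math

def remaining_bound(H, p, n):
    """2^(H-p) + 2^(H-n): the two subtrees left after branching on x and on not(x)."""
    return 2 ** (H - p) + 2 ** (H - n)

def log_cost(p, n):
    """log2((2^n + 2^p) / 2^(p+n)), whose minimization is equivalent to that of the bound."""
    return math.log2(2 ** n + 2 ** p) - (p + n)

def choose_literal(stats):
    """stats maps a literal to (p, n); maximize min(n, p), break ties by max(n, p)."""
    return max(stats, key=lambda lit: (min(stats[lit]), max(stats[lit])))

stats = {"x": (3, 1), "y": (2, 2), "z": (2, 5)}
print(choose_literal(stats))                                   # z
print({lit: remaining_bound(10, *pn) for lit, pn in stats.items()})
# {'x': 640, 'y': 512, 'z': 288}
```

In this small example the literal chosen by the min/max rule also has the smallest bound, as the inequality chain suggests; the rule only guarantees this up to a factor of two in general.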
4 Implementation Details
The deductive closures Atleast(P, A) and Atmost(P, A) can both be implemented using two versions of a linear-time algorithm of Dowling and Gallier [1]. The basic algorithm associates with each rule a counter that keeps track of how many literals in the body of the rule are not included in a partially computed closure. If a counter reaches zero, then the head of the corresponding rule is included in the closure. The inclusion changes other counters, and in this manner membership in the closure is propagated.

We begin with basic rules of the form h ← a1, ..., an, not b1, ..., not bm. For every rule r we create a literal counter r.literal, which is used as above, and an inactivity counter r.inactive. If the set A is a partial closure, then the inactivity counter records the number of literals in the body of r that are in not(A). The counter r.inactive is therefore positive, and the rule r is inactive, if r can be used neither now nor later to deduce its head. For every atom a we create a head counter a.head that holds the number of active rules with head a.

Recall that a literal can be brought into Atleast(P, A) in four different ways. We handle the four cases with the help of the three counters.
1. If r.literal reaches zero, then the head of r is added to the closure.
2. If a.head reaches zero, then not a is added to the closure.
3. If a.head is equal to one and a is in the closure, then every literal in the body of the only active rule with head a is added to the closure.
4. Finally, if a is the head of r, if not a is in the closure, and if r.literal = 1 and r.inactive = 0, then there is precisely one literal x in the body of r that is not in the closure, and not(x) is added to the closure.

Constraint rules and choice rules are easily incorporated into the same framework.
Specifically, neither the first nor the fourth case is used together with choice rules, and the literal and inactivity counters of a constraint rule h ← k {a1, ..., an, not b1, ..., not bm} are compared not with zero but with m + n − k. A weight rule h ← {a1 = w_{a1}, ..., an = w_{an}, not b1 = w_{b1}, ..., not bm = w_{bm}} ≥ w is managed using the upper and lower bounds of the sum of the weights in its body. Given a set of literals A, the lower bound is

$$\sum_{a_i \in A^+} w_{a_i} + \sum_{b_i \in A^-} w_{b_i}$$

and the upper bound is

$$\sum_{a_i \notin A^-} w_{a_i} + \sum_{b_i \notin A^+} w_{b_i}.$$
If the upper bound is less than w, then the rule is inactive, and if the lower bound is at least w, then the head is in the closure.

Notice that the implementation provides for incremental updates to the closure Atleast(P, A) as A changes. This is crucial for achieving high performance. Since the function Atmost(P, A) is anti-monotonic, it will shrink as A grows. There is little point in computing Atmost(P, A) anew each time A is modified. Instead, all atoms that might not be in the newer and smaller closure are found using a variant of the basic algorithm. By inspecting these atoms it is possible to decide which ones must be in the closure, and then the basic algorithm can again be used to compute the final closure. A small example will make the method clear.

Example 3. Suppose P is the program

a ← b
b ← a
a ← not c
a ← not d,
and suppose A has changed from the empty set to {d}. Then, we have already computed Atmost(P, ∅) = {a, b}, and we want to find Atmost(P, A). If r is the rule a ← not d, then the counter of r is at first zero and then changes to one as d becomes a member of A. Therefore, we deduce that a is possibly not a part of the new closure. The basic algorithm proceeds to increment the counters of b ← a, removing b, and of a ← b, where it stops. At this point the counter of the rule a ← not c is still zero, and we note that a must be part of the closure. Including a causes the counter of b ← a to decrease to zero. Consequently, b is added to the closure and the counter of a ← b is decremented. Since nothing more remains to be done, the final closure is {a, b}.

One can argue, in this particular example, that a follows from the rule a ← not c and need not be removed in the first stage of the procedure. However, in general it is not possible to decide whether an atom is in the final closure by inspecting the rules of which it is a head. Notwithstanding, we can make improvements based upon this observation. For every atom a, create a source pointer whose mission is to point to the first rule that causes a to be included in the closure. During the portion of the computation when atoms are removed from the closure, we only remove those atoms whose removal is caused by the rule in their source pointer. If the rule in the source pointer of an atom does not justify its removal, then the atom would be re-entered into the closure in the second phase of the computation anyway. In practice, this simple trick yields a substantial speedup of the computation of Atmost(P, A).
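The counter idea and the weight-rule bounds can be sketched in a few lines of Python. This is a from-scratch closure computation in the spirit of the counter-based algorithm described above, not the incremental two-phase update with source pointers; the function names and the (head, body) encoding are hypothetical. Applied to the rules of Example 3 that remain usable once d is in A (a ← b, b ← a, and a ← not c, which then acts as a fact), it reproduces the final closure {a, b}.

```python
from collections import deque

def horn_closure(rules):
    """Closure of definite rules (head, body_atoms) in time linear in the program size.

    Each rule keeps a counter of distinct body atoms not yet in the closure;
    when a counter reaches zero, the rule's head is added and propagated.
    """
    counter = [len(set(body)) for _head, body in rules]
    watchers = {}
    for i, (_head, body) in enumerate(rules):
        for atom in set(body):
            watchers.setdefault(atom, []).append(i)
    closure, queue = set(), deque()

    def derive(atom):
        if atom not in closure:
            closure.add(atom)
            queue.append(atom)

    for i, (head, _body) in enumerate(rules):
        if counter[i] == 0:
            derive(head)
    while queue:
        atom = queue.popleft()
        for i in watchers.get(atom, []):
            counter[i] -= 1
            if counter[i] == 0:
                derive(rules[i][0])
    return closure

def weight_bounds(wpos, wneg, a_plus, a_minus):
    """Lower and upper bound of the body weight of a weight rule, given A+ and A-."""
    lower = (sum(w for a, w in wpos.items() if a in a_plus)
             + sum(w for b, w in wneg.items() if b in a_minus))
    upper = (sum(w for a, w in wpos.items() if a not in a_minus)
             + sum(w for b, w in wneg.items() if b not in a_plus))
    return lower, upper

example3 = [("a", ["b"]), ("b", ["a"]), ("a", [])]
print(sorted(horn_closure(example3)))                        # ['a', 'b']
print(weight_bounds({"a": 2, "b": 1}, {"c": 4}, {"a"}, {"b"}))  # (2, 6)
```

With w = 5, the bounds (2, 6) mean the weight rule is neither inactive nor yet forcing its head.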
5 Experiments
We will search for sets of binary words of length n such that the Hamming distance between any two words is at least d. The size of the largest of these sets is denoted by A(n, d). For example, A(5, 3) = 4 and any 5-bit one-error-correcting code contains at most 4 words. One such code is {00000, 00111, 11001, 11110} =
{0, 7, 25, 30}. Finding codes quickly becomes very hard. For instance, it was only recently proved that A(10, 3) = 72 [10].

Construct a program that includes a rule $w_i \leftarrow \mathit{not}\ w_{j_1}, \ldots, \mathit{not}\ w_{j_k}$ for every word $i = 0, \ldots, 2^n - 1$ such that j1, ..., jk are the words whose distance to i is positive and less than d. Then, the stable models of the program are the maximal codes in which any two words are at Hamming distance at least d. Add the rule $\mathit{true} \leftarrow m\, \{w_0, \ldots, w_{2^n - 1}\}$, and every model containing true is a code of size at least m. For the purpose of making the problem a bit more tractable, we only consider codes that include the zero word.

The test results are tabulated below. The minimum, maximum, and average times are given in seconds and are calculated from ten runs on randomly shuffled instances of the program. All tests were run under Linux 2.2.6 on a 233 MHz Pentium II with 128 MB of memory.

Problem            Min        Max      Average
A(5, 3) ≥ 4        0.01       0.02       0.02
A(5, 3) < 5        0.00       0.02       0.02
A(6, 3) ≥ 8        0.02       0.04       0.03
A(6, 3) < 9        0.16       0.18       0.17
A(7, 3) ≥ 16       0.14      14.19       6.77
A(7, 3) < 17      69.08      72.29      70.55
A(8, 3) ≥ 20       6.39     202.41      55.98
A(8, 3) < 21      > 1 week

Problem            Min        Max      Average
A(6, 5) ≥ 2        0.02       0.03       0.03
A(6, 5) < 3        0.02       0.03       0.02
A(7, 5) ≥ 2        0.05       0.07       0.06
A(7, 5) < 3        0.04       0.07       0.06
A(8, 5) ≥ 4        0.29       0.36       0.34
A(8, 5) < 5        2.64       2.75       2.71
A(9, 5) ≥ 6        3.18       8.71       4.81
A(9, 5) < 7     1127.03    1162.10    1145.85
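The program construction and the small values quoted above can be cross-checked with a short Python sketch. The rule text uses a hypothetical ASCII syntax, and the exact search is a plain backtracking search swapped in for illustration, not the stable-model approach measured in the tables. Fixing the zero word loses no generality, because XOR-ing every word of a code with one of its codewords preserves all pairwise distances.

```python
from itertools import combinations

def hamming(x, y):
    """Number of bit positions in which the words x and y differ."""
    return bin(x ^ y).count("1")

def code_program(n, d):
    """Rules w_i <- not w_j1, ..., not w_jk (text form) for words at distance 1..d-1."""
    rules = []
    for i in range(2 ** n):
        body = ", ".join(f"not w{j}" for j in range(2 ** n) if 0 < hamming(i, j) < d)
        rules.append(f"w{i} <- {body}.")
    return rules

def is_code(words, d):
    return all(hamming(x, y) >= d for x, y in combinations(words, 2))

def largest_code(n, d):
    """A(n, d) by plain backtracking over codes that contain the zero word."""
    best = 0

    def extend(code, start):
        nonlocal best
        best = max(best, len(code))
        for w in range(start, 2 ** n):
            if all(hamming(w, c) >= d for c in code):
                extend(code + [w], w + 1)

    extend([0], 1)
    return best

print(is_code([0, 7, 25, 30], 3))   # True
print(largest_code(5, 3))           # 4
print(code_program(3, 2)[0])        # w0 <- not w1, not w2, not w4.
```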
6 Conclusion
We have presented some new and more expressive propositional rules for the stable model semantics. A decision procedure, which has been used as a base for an efficient implementation, has also been described. We note that the decision problem for the extended semantics is NP-complete; in particular, it is in NP, as a proposed stable model can be tested in polynomial time. Accordingly, the exponential worst-case time complexity of the decision procedure comes as no surprise. The literals that smodels(P, A) can branch on are, in this paper, the literals not covered by A, i.e., the literals over the atoms in Atoms(P) − Atoms(A). In previous work, for instance in Niemelä and Simons [6,8], the eligible literals have also been required to appear in the form of not-atoms in the program. This additional restriction can reduce the search space, and a similar requirement is, of course, also possible here. The question of which literals one necessarily must consider as branch points is left to future research.
References
1. W. F. Dowling and J. H. Gallier. Linear-time algorithms for testing the satisfiability of propositional Horn formulae. Journal of Logic Programming, 3:267–284, 1984.
2. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proceedings of the 5th International Conference on Logic Programming, pages 1070–1080, Seattle, USA, August 1988. The MIT Press.
3. K. Heljanko. Using logic programs with stable model semantics to solve deadlock and reachability problems for 1-safe Petri nets. In Tools and Algorithms for the Construction and Analysis of Systems, volume 1579 of Lecture Notes in Computer Science, pages 240–254, Amsterdam, The Netherlands, March 1999. Springer-Verlag.
4. V. W. Marek and M. Truszczyński. Stable models and an alternative logic programming paradigm. The Computing Research Repository, http://xxx.lanl.gov/archive/cs/, September 1998. cs.LO/9809032.
5. I. Niemelä. Logic programs with stable model semantics as a constraint programming paradigm. In Proceedings of the Workshop on Computational Aspects of Nonmonotonic Reasoning, pages 72–79. Research Report A52, Helsinki University of Technology, May 1998.
6. I. Niemelä and P. Simons. Efficient implementation of the well-founded and stable model semantics. In Proceedings of the 1996 Joint International Conference and Symposium on Logic Programming, pages 289–303, Bonn, Germany, September 1996. The MIT Press.
7. C. Sakama and K. Inoue. An alternative approach to the semantics of disjunctive logic programs and deductive databases. Journal of Automated Reasoning, 13:145–172, 1994.
8. P. Simons. Towards constraint satisfaction through logic programs and the stable model semantics. Research Report A47, Helsinki University of Technology, August 1997.
9. P. Simons. Smodels 2.10. http://www.tcs.hut.fi/pub/smodels/, 1999. A system for computing the stable models of logic programs.
10. P. Östergård, T. Baicheva, and E. Kolev. Optimal binary one-error-correcting codes of length 10 have 72 codewords. IEEE Transactions on Information Theory, 45(4):1229–1231, May 1999.
Stable Model Semantics of Weight Constraint Rules

Ilkka Niemelä¹, Patrik Simons¹, and Timo Soininen²

¹ Helsinki University of Technology, Dept. of Computer Science and Eng., Laboratory for Theoretical Computer Science, P.O.Box 5400, FIN-02015 HUT, Finland
{Patrik.Simons,Ilkka.Niemela}@hut.fi
² Helsinki University of Technology, TAI Research Center and Lab. of Information Processing Science, P.O.Box 9555, FIN-02015 HUT, Finland
[email protected]
Abstract. A generalization of logic program rules is proposed where rules are built from weight constraints with type information for each predicate instead of simple literals. These kinds of constraints are useful for concisely representing different kinds of choices as well as cardinality, cost and resource constraints in combinatorial problems such as product configuration. A declarative semantics for the rules is presented which generalizes the stable model semantics of normal logic programs. It is shown that for ground rules the complexity of the relevant decision problems stays in NP. The first implementation of the language handles a decidable subset where function symbols are not allowed. It is based on a new procedure for computing stable models for ground rules extending normal programs with choice and weight constructs and a compilation technique where a weight rule with variables is transformed to a set of such simpler ground rules.
1 Introduction
The implementation techniques for normal logic programs with the stable model semantics have advanced considerably in recent years. The performance of state-of-the-art implementations, e.g., the smodels system [12,13], is approaching the level needed in realistic applications. Recently, logic program rules with the stable model semantics have also been proposed as a methodology for expressing constraints capturing, for example, combinatorial, graph and planning problems; see, e.g., [9,11]. This indicates that interesting applications can be handled using normal programs and stable models. However, there are important aspects of combinatorial problems which do not seem to have a compact representation using normal rules. We explain these difficulties by first introducing the basic ideas behind the methodology of using rules for problem solving [9,11]. Then we examine a number of examples involving cardinality, cost and resource constraints which are difficult to express using normal programs, i.e., programs consisting of rules without disjunction but with default negation in the body. On the basis of the examples we present an extension of normal rules where a generalized notion of cardinality constraints is used and which is suitable for handling choices with cardinality, cost and resource constraints in the examples.

M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 317–331, 1999. © Springer-Verlag Berlin Heidelberg 1999

When solving, e.g., a combinatorial problem using the stable model semantics, the idea is to write a program such that the stable models of the program correspond to the solutions to the problem [9,11]. As an example consider the 3-coloring problem where, given a graph, we can build a program where for each vertex v in the graph we take the three rules on the left and for each edge (v, u) the three rules on the right:

v(1) ← not v(2), not v(3)        ← v(1), u(1)
v(2) ← not v(1), not v(3)        ← v(2), u(2)
v(3) ← not v(1), not v(2)        ← v(3), u(3)
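The 3-coloring encoding can be checked by brute force on a small graph. In this minimal sketch (the atom encoding (v, c) for v(c) is hypothetical), every rule body consists of not-atoms only, so the reduct is a set of facts and stability reduces to comparing S with the heads of the surviving rules, after which the integrity constraints are checked.

```python
from itertools import product

def coloring_program(vertices, edges, colors=(1, 2, 3)):
    """Rules v(c) <- not v(c'), not v(c'') per vertex; constraints <- v(c), u(c) per edge."""
    rules = [((v, c), frozenset((v, o) for o in colors if o != c))
             for v in vertices for c in colors]
    constraints = [frozenset({(v, c), (u, c)}) for v, u in edges for c in colors]
    return rules, constraints

def is_stable(rules, constraints, s):
    # All rule bodies consist of not-atoms, so the reduct P^S is a set of facts
    # and its deductive closure is the set of heads of the rules that survive.
    closure = {head for head, negs in rules if not (negs & s)}
    return closure == s and not any(body <= s for body in constraints)

def stable_models(vertices, edges):
    rules, constraints = coloring_program(vertices, edges)
    atoms = [head for head, _ in rules]
    for bits in product([False, True], repeat=len(atoms)):
        s = {a for a, on in zip(atoms, bits) if on}
        if is_stable(rules, constraints, s):
            yield s

triangle = stable_models(["x", "y", "z"], [("x", "y"), ("y", "z"), ("x", "z")])
print(sum(1 for _ in triangle))  # 6, the proper 3-colorings of a triangle
```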
Now a stable model of the program, which is a set of atoms of the form v(n), gives a legal coloring of the graph where a node v is colored with the color n iff v(n) is included in the stable model. Such logic programming encodings of different kinds of combinatorial, constraint satisfaction and planning problems can be found, e.g., in [9,11]. The encodings nicely demonstrate the expressivity of normal programs. However, there are a number of conditions which are hard to capture using normal programs. For example, in the product configuration domain [14], choices with cardinality, cost and resource constraints need to be handled. Next we consider some motivating examples demonstrating the difficulties and show that extending normal rules by a suitable notion of cardinality constraints is an interesting approach to handling the problems. By a cardinality constraint we mean an expression written in the form

$$L \le \{a_1, \ldots, a_n, \mathit{not}\ b_1, \ldots, \mathit{not}\ b_m\} \le U. \qquad (1)$$
The intuitive idea is that such a constraint is satisfied by any model (a set of atoms) where the cardinality of the subset of the literals satisfied by the model is between the integers L and U . For example, the cardinality constraint 1 ≤ {a, not b, not c} ≤ 2 is satisfied by the model {a, b} but not by {a}. These kinds of cardinality constraints are useful in a number of settings and rules extended with such constraints can be used to express different kinds of choices and cardinality restrictions. For example, vertex covers of size less than K could be captured in the following way. For a given graph, we build a program by including for each edge (v, u) a rule 1 ≤ {v, u} ← and then adding an integrity constraint ← K ≤ {v1 , . . . , vn } where {v1 , . . . , vn } is the set of vertices in the graph. The first rule expresses a choice saying that at least one end point for each edge should be selected and
the second rule states a cardinality restriction saying that the cover must have size less than K. Now stable models of the program directly represent vertex covers of the graph. It seems that the choice rule cannot be expressed by normal rules without introducing additional atoms in the program and that there are no compact encodings of the cardinality restriction using normal rules.

For applications it is important to be able to work with first-order rules having variables. Hence, this kind of a cardinality constraint needs to be generalized to the first-order case where the set on which the constraint is imposed could be given compactly using expressions with variables. Consider, e.g., the problem of capturing cliques in a graph which is given by two relations vertex and edge, i.e., two sets of ground facts vertex(v) and edge(v, u) specifying the vertices and edges of the graph, respectively. The idea is to define the set of ground atoms in the constraint by attaching conditions to non-ground literals which are local to each constraint, i.e., using conditional literals, for example, in the following way:

0 ≤ {clique(X) : vertex(X)} ←     (2)
where the set of atoms in the constraint consists of those instances of clique(v) for which vertex(v) holds. Such a rule chooses a subset of the vertices as a candidate clique. Cliques, i.e., subsets of vertices where each pair of vertices is connected by an edge, can be captured by including the rule

← clique(X), clique(Y), not (X = Y), not edge(X, Y).

It is also useful to allow both local and global variables in a rule. The scope of a local variable is one constraint, as for the variable X in (2), but the scope of a global variable is the whole rule. The first of the following rules capturing the colorings of a graph demonstrates the usefulness of this distinction.

1 ≤ {colored(V, C) : color(C)} ≤ 1 ← vertex(V)     (3)
← edge(V, U), colored(V, C), colored(U, C)     (4)
Here V is a global variable in the first rule stating the requirement that for each vertex v exactly one instance of colored(v, c) should be chosen such that color(c) holds for the term c. The set of facts color(c) provides the available colors. As the examples show, cardinality constraints are quite expressive and useful in practice. However, in applications such as product configuration [14] there are conditions which are hard to capture even using cardinality constraints. One important class is resource or cost constraints. A typical example of these is the knapsack problem, where the task is to choose a set of items i_j, each having a weight w_j and value v_j, such that the sum of the weights of the chosen items does not exceed a given limit W but the sum of the values exceeds a given limit V. It turns out that these kinds of constraints can be captured by generalizing cardinality constraints in a suitable way, which becomes obvious by noticing that a cardinality constraint of the form (1) can be seen as a linear inequality

$$L \le a_1 + \cdots + a_n + \bar{b}_1 + \cdots + \bar{b}_m \le U$$
where ai , bj are variables with values 0 or 1 such that x+x = 1 for all variables x. We can generalize this by allowing a real-valued coefficient for each variable, i.e., a weight for each atom in the cardinality constraint. Hence we are considering constraints of the form L ≤ {a1 = wa1 , . . . , an = wan , not b1 = wb1 , . . . , not bm = wbm } ≤ U
(5)
where, e.g., $w_{a_1}$ is a real-valued weight for the atom $a_1$. The idea is that a stable model satisfies the constraint if the sum of the weights of the literals satisfied by the model is between L and U. For example, 1.02 ≤ {a = 1.0, b = 0.02, not c = 0.04} ≤ 1.03 is satisfied by {a, b, c} (weight 1.0 + 0.02 = 1.02) but not by {a} (weight 1.0 + 0.04 = 1.04). Hence, a weight constraint of the form (5) corresponds to a linear inequality
$$L \le w_{a_1} a_1 + \cdots + w_{a_n} a_n + w_{b_1} \bar{b}_1 + \cdots + w_{b_m} \bar{b}_m \le U \tag{6}$$
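Checking whether a set of atoms satisfies a ground weight constraint of the form (5) amounts to evaluating the linear inequality (6). The Python sketch below is illustrative only: it assumes a constraint is given by its two bounds and two dictionaries mapping atoms to weights (the names `weight` and `satisfies` are hypothetical), and it uses `fractions.Fraction` so that decimal weights such as 0.02 are compared exactly rather than with floating-point rounding.

```python
from fractions import Fraction


def weight(pos, neg, model):
    """Sum of the weights of the literals of a ground constraint satisfied by model.

    pos maps each atom a_i to w_{a_i}; neg maps each atom b_j (occurring as
    `not b_j`) to w_{b_j}.  `not b` is satisfied when b is not in the model.
    """
    total = sum((w for a, w in pos.items() if a in model), Fraction(0))
    return total + sum((w for b, w in neg.items() if b not in model), Fraction(0))


def satisfies(lower, pos, neg, upper, model):
    """Check lower <= weight <= upper, i.e. inequality (6); None = missing bound."""
    w = weight(pos, neg, model)
    return (lower is None or lower <= w) and (upper is None or w <= upper)


# 1.02 <= {a = 1.0, b = 0.02, not c = 0.04} <= 1.03
lower, upper = Fraction("1.02"), Fraction("1.03")
pos = {"a": Fraction("1.0"), "b": Fraction("0.02")}
neg = {"c": Fraction("0.04")}

print(satisfies(lower, pos, neg, upper, {"a", "b", "c"}))  # True  (weight 1.02)
print(satisfies(lower, pos, neg, upper, {"a"}))            # False (weight 1.04)
```

Exact rationals sidestep the finite-precision issue mentioned later in this introduction; scaling all weights to integers would be an alternative.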
Using weight constraints, the knapsack problem can be captured by the following rules:
$$0 \le \{i_1 = w_1, \ldots, i_n = w_n\} \le W \leftarrow$$
$$\leftarrow \{i_1 = v_1, \ldots, i_n = v_n\} \le V$$
In the light of the examples, it seems that weight constraints provide an expressive and uniform framework for handling large classes of combinatorial problems. In this paper we present a novel rule language which extends normal rules by taking weight constraints as the basic building blocks of the rules. Hence, the extended rules, which we call weight rules, are of the form
$$C_0 \leftarrow C_1, \ldots, C_n \tag{7}$$
Here each $C_i$ is a weight constraint
$$L \le \{a_1 : c_1 = w_1, \ldots, a_n : c_n = w_n, \mathit{not}\ a_{n+1} : c_{n+1} = w_{n+1}, \ldots, \mathit{not}\ a_m : c_m = w_m\} \le U \tag{8}$$
where $a_i, c_i$ are atomic formulae possibly containing variables. These kinds of constraints are a first-order generalization of weight constraints of the form (5). The weight rules are given a declarative nonmonotonic semantics that extends the stable model semantics of normal logic programs [4] and generalizes the propositional choice rules presented in [16] to the first-order case, where type information and weight constraints can be used. Unlike the approaches based on associating priorities, preferences, costs, probabilities or certainty factors with rules (see e.g. [1,8,10,6] and the references there), our aim is to provide a relatively simple way of associating weights or costs with atoms and representing constraints using the weights. Approaches such as NP-SPEC [3], constraint logic programs (CLP) and constraint satisfaction problems are not based on stable model semantics like ours and thus do not include default negation. In addition, our semantics treats the constraints, rules and choices uniformly, unlike
Stable Model Semantics of Weight Constraint Rules
the CLP and NP-SPEC approaches. There is also some related work based on stable models. For example, in [2] priorities are added to integrity constraints. However, this is done to express weak constraints, of which as many as possible should be satisfied, and not weight constraints, which must all be satisfied. In [5] several types of aggregates are integrated into Datalog in a framework based on stable models in order to express dynamic programming optimization problems. This contrasts with our approach, which is not primarily intended to capture optimization. In addition, their approach covers only the subclass of programs with stratified negation and choice constructs. Our approach also differs from the main semantics of disjunctive logic programs in that they are based on subset-minimal choices through disjunction, while we support a general notion of cardinality constraints. The computational complexity of the decision problem for the language is analyzed and found to remain in NP for ground rules. The first implementation of the language handles a decidable subset of weight rules where function symbols are not allowed. Although the semantics of the language is based on real-valued weights, the implementation handles only integer weights in order to avoid problems arising from the finite precision of real number arithmetic. The implementation is based on the smodels-2 procedure [15], which is a new extended version of the smodels procedure [12,13]. It computes stable models for ground logic programs but supports several types of rules extending normal logic programs. Our language extends that handled by smodels-2 further: it is first-order with conditional literals, variables, and built-in functions; both upper and lower bounds of a constraint can be given; and a weight constraint is also allowed in the head of a rule. However, we show that it is possible to translate a set of weight rules containing variables to a set of simple ground rules supported by smodels-2.
This provides the basis for our implementation.
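Read declaratively, the two knapsack rules given earlier in this introduction select item sets whose total weight is at most W (the head constraint of the first rule) and whose total value exceeds V (the integrity constraint rejects any set with value at most V). The brute-force Python sketch below checks exactly that condition; the function name and the item data are hypothetical, and it enumerates candidate sets directly instead of computing stable models as smodels-2 would.

```python
from itertools import combinations


def knapsack_models(items, max_weight, min_value):
    """Yield item sets allowed by the two knapsack weight rules.

    items maps an item name to a (weight, value) pair.  A set is kept when its
    total weight is at most max_weight (the head constraint of the first rule)
    and its total value exceeds min_value (otherwise the integrity constraint
    <- {i1 = v1, ..., in = vn} <= V would be violated).
    """
    names = sorted(items)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            total_weight = sum(items[i][0] for i in chosen)
            total_value = sum(items[i][1] for i in chosen)
            if total_weight <= max_weight and total_value > min_value:
                yield set(chosen)


# Hypothetical data: name -> (weight, value)
items = {"i1": (3, 4), "i2": (4, 5), "i3": (2, 3)}
for model in knapsack_models(items, max_weight=6, min_value=6):
    print(sorted(model))  # prints ['i1', 'i3'], then ['i2', 'i3']
```

Enumeration is exponential in the number of items; it only serves to make the meaning of the two rules concrete.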
2
Weight Constraint Rules
We extend logic program rules by allowing weight constraints of the type (8) with conditional literals that have real-valued weights. First we develop a semantics for ground rules and then we show how to generalize this to rules with variables.
2.1
Ground Rules
The basic building block of a weight constraint is a conditional atom, which is an expression of the form p : q where the proper part p and the conditional part q are atomic formulae. In ground rules the formulae p and q are variable-free (ground) atoms. If q is ⊤, i.e., always true, it is typically omitted. A conditional literal is a conditional atom or its negation, an expression of the form not p : q. Note that not is intended as a nonmonotonic, default negation. A weight constraint C is an expression of the form
$$l(C) \le \mathit{lit}(C) \le u(C)$$
where lit(C) is a set of conditional literals and l(C), u(C) are two real numbers denoting the lower and upper bounds, respectively. The bounds l(C), u(C) can also be missing, in which case we denote them by l(C) = −∞ and u(C) = ∞, respectively. To each constraint C we associate a local weight function w(C) from the set of literals in C to the real numbers, typically specified directly as in the constraint for C below:
$$2.1 \le \{p : d_1 = 1.1,\ \mathit{not}\ q : d_2 = 1.0001\}$$
where, e.g., w(C)(not q) = 1.0001 and u(C) = ∞. The extension to allow < in the constraints is straightforward, but for brevity we discuss only ≤. Finally, a weight program is a set of weight rules, i.e., expressions of the form (7) where each $C_i$ is a weight constraint and where the head $C_0$ contains no negative literals. Our semantics for weight rules generalizes the stable model semantics for normal logic programs and is given in terms of models that are sets of atoms. First we define when a model satisfies a rule and then, using this concept, the notion of stable models.
Definition 1. A set of atoms S satisfies a weight constraint C (S ⊨ C) iff l(C) ≤ W(C, S) ≤ u(C) holds for the weight W(C, S) of C in S, where
$$W(C,S) = \sum_{p \in \mathit{plit}(C,S)} w(C)(p) + \sum_{\mathit{not}\ p \in \mathit{nlit}(C,S)} w(C)(\mathit{not}\ p)$$
with
$$\mathit{plit}(C,S) = \{p \mid p : q \in \mathit{lit}(C),\ \{p, q\} \subseteq S\}, \qquad \mathit{nlit}(C,S) = \{\mathit{not}\ p \mid \mathit{not}\ p : q \in \mathit{lit}(C),\ p \notin S,\ q \in S\},$$
which are the positive and negative literals satisfied by S, respectively. A rule r of the form (7) is satisfied by S (S ⊨ r) iff S satisfies $C_0$ whenever it satisfies $C_1, \ldots, C_n$. We also allow integrity constraints, i.e., rules without the head constraint $C_0$, which are satisfied if at least one of the body constraints $C_1, \ldots, C_n$ is not.
Example 1. Consider the weight constraints
C1: 2 ≤ {p : d1 = 1, not q : d1 = 2, r : d2 = 1.5} ≤ 5
C2: 2 ≤ {p : d2 = 1, not q : d2 = 2, r : d1 = 1.5} ≤ 5
and a set of atoms S = {p, d1, r}. Now plit(C1, S) = {p} and nlit(C1, S) = {not q} and, hence, W(C1, S) = 1 + 2 = 3. Similarly, W(C2, S) = 1.5, since only r : d1 is satisfied. Thus, S ⊨ C1 but S ⊭ C2, and S ⊨ C1 ← C2 but S ⊭ C2 ← C1. Moreover, S ⊨ ← C1, C2 but S ⊭ ← C1.
We define stable models first for weight programs with non-negative weights. We then show how the general case, i.e., programs with negative weights, reduces to this case. In the definition we need the notion of a deductive closure of rules in a special form
$$P \leftarrow C_1, \ldots, C_n$$
where P is a ground atom and each weight constraint $C_i$ contains only positive literals and non-negative weights, and has only a lower bound condition. We call such rules Horn weight rules. A set of atoms is closed under a set of rules if each rule is satisfied by the atom set. A set of Horn weight rules P has a unique smallest set of atoms closed under P. We call it the deductive closure and denote it by cl(P). The uniqueness is implied by the fact that Horn weight rules are monotonic, i.e., if the body of a rule is satisfied by a model S, then it is satisfied by any superset of S. Note that the closure can be constructed iteratively by starting from the empty set of atoms and repeatedly adding the head of a rule that is not yet satisfied, until no unsatisfied rules are left.
Example 2. Consider a set of Horn weight rules P:
a ← 1 ≤ {a = 1}
b ← 0 ≤ {b = 100}
c ← 6 ≤ {b = 5, d = 1}, 2 ≤ {b = 2, a = 2}
The deductive closure of P is the set of atoms {b}, which can be constructed iteratively by starting from the empty set and realizing that the body of the second rule is satisfied by the empty set and, hence, b should be added to the closure. This set is already closed under the rules. If a rule d ← 1 ≤ {a = 1, b = 1, c = 1} is added, then the closure is {b, d, c}.
Stable models for programs with non-negative weights are defined in the following way using the concept of a reduct. The idea is to define a stable model of a program P as an atom set S that satisfies all rules of P and that is the deductive closure of a reduct of P w.r.t. S. The role of the reduct is to provide the possible justifications for the atoms in S. Each atom in a stable model is justified by the program P in the sense that it is derivable from the reduct. We introduce the reduct in two steps. First we define the reduct of a constraint and then generalize this to rules. The reduct $C^S$ of a constraint C w.r.t. a set of atoms S is the constraint
$$L' \le \{p : q = w \mid p : q = w \in \mathit{lit}(C)\}, \quad \text{where } L' = l(C) - \sum_{\mathit{not}\ p \in \mathit{nlit}(C,S)} w(C)(\mathit{not}\ p).$$
Hence, in the reduct all negative literals and the upper bound are removed, and the lower bound is decreased by w for each not p : q = w satisfied by S. The idea here is that for negative literals satisfied by S, their weights contribute to satisfying the lower bound. However, this does not yet capture the condition part of the negative literals satisfied by S. In order to guarantee that the conditions are justified by the program, a set j(C, S) of justification constraints is used:
$$j(C,S) = \{1 \le \{q = 1\} \mid \mathit{not}\ p : q = w \in \mathit{lit}(C),\ p \notin S,\ q \in S\}$$
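For example, for a constraint C: 3 ≤ {not p : q = 2, not r : p = 3, p : q = 1} ≤ 4 and a set S = {q}, only not p : q = 2 is satisfied by S, so the lower bound drops from 3 to 1 and we get the reduct and justification constraint
$$C^S = 1 \le \{p : q = 1\}, \qquad j(C, S) = \{1 \le \{q = 1\}\}$$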
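Definition 1 and the constraint reduct translate almost line by line into code. The Python sketch below assumes a hypothetical `Lit`/`Constraint` representation in which a condition of `None` stands for ⊤; it reproduces the weights of Example 1 and the reduct just computed. As a simplifying assumption, no justification constraint is generated for an always-true condition, since 1 ≤ {⊤ = 1} would be trivially satisfied.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Lit:
    atom: str              # proper part p
    cond: Optional[str]    # conditional part q; None stands for the always-true condition
    weight: float
    negated: bool = False  # True for a literal `not p : q`


@dataclass
class Constraint:
    lits: List[Lit]
    lower: float = float("-inf")
    upper: float = float("inf")


def holds(cond: Optional[str], s: Set[str]) -> bool:
    return cond is None or cond in s


def W(c: Constraint, s: Set[str]) -> float:
    """Weight of c in s: literals in plit(C, S) and nlit(C, S) (Definition 1)."""
    total = 0.0
    for l in c.lits:
        literal_true = (l.atom not in s) if l.negated else (l.atom in s)
        if literal_true and holds(l.cond, s):
            total += l.weight
    return total


def satisfies(c: Constraint, s: Set[str]) -> bool:
    return c.lower <= W(c, s) <= c.upper


def reduct(c: Constraint, s: Set[str]) -> Tuple[Constraint, List[Constraint]]:
    """Return C^S and j(C, S) for a constraint with non-negative weights."""
    nlit = [l for l in c.lits if l.negated and l.atom not in s and holds(l.cond, s)]
    lower = c.lower - sum(l.weight for l in nlit)
    positive = [l for l in c.lits if not l.negated]
    # 1 <= {q = 1} for each satisfied `not p : q`; skipped when q is always true.
    just = [Constraint([Lit(l.cond, None, 1)], lower=1)
            for l in nlit if l.cond is not None]
    return Constraint(positive, lower=lower), just


# Example 1
S = {"p", "d1", "r"}
C1 = Constraint([Lit("p", "d1", 1), Lit("q", "d1", 2, negated=True),
                 Lit("r", "d2", 1.5)], lower=2, upper=5)
C2 = Constraint([Lit("p", "d2", 1), Lit("q", "d2", 2, negated=True),
                 Lit("r", "d1", 1.5)], lower=2, upper=5)
print(W(C1, S), W(C2, S), satisfies(C1, S), satisfies(C2, S))  # 3.0 1.5 True False

# The reduct example: 3 <= {not p : q = 2, not r : p = 3, p : q = 1} <= 4, S = {q}
C = Constraint([Lit("p", "q", 2, negated=True), Lit("r", "p", 3, negated=True),
                Lit("p", "q", 1)], lower=3, upper=4)
CS, j = reduct(C, {"q"})
print(CS.lower, [(l.atom, l.cond, l.weight) for l in CS.lits])  # 1 [('p', 'q', 1)]
print([(jc.lower, jc.lits[0].atom) for jc in j])                # [(1, 'q')]
```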
The reduct $P^S$ for a program P w.r.t. a set of atoms S is a set of Horn weight rules which contains a rule r′ with an atom p as the head if p ∈ S and there is a rule r ∈ P such that p : q = w appears in the head with q ∈ S, and the upper bounds of the constraints in the body of r are satisfied by S. The condition q is moved to the body, as q is the justification condition for p, and the body of r′ is obtained by taking the reduct of the constraints in the body of r and adding the corresponding justification constraints. Formally the reduct is defined as follows.
Definition 2. Let P be a weight program with non-negative weights and S a set of atoms. The reduct $P^S$ of P w.r.t. S is defined by
$$P^S = \{p \leftarrow 1 \le \{q = 1\}, C_1^S, j(C_1,S), \ldots, C_n^S, j(C_n,S) \mid C_0 \leftarrow C_1, \ldots, C_n \in P,\ p : q = w \in \mathit{lit}(C_0),\ \{p,q\} \subseteq S,\ W(C_i,S) \le u(C_i) \text{ for all } i = 1, \ldots, n\}$$
Definition 3. Let P be a weight program with non-negative weights. Then S is a stable model of P iff the following two conditions hold: (i) S ⊨ P, (ii) S = cl($P^S$).
Example 3. Consider first the program P1, which demonstrates the role of justification constraints:
0 ≤ {p : p = 2} ≤ 2 ←
2 ≤ {p = 2} ≤ 2 ← 2 ≤ {not q : p = 3}
The empty set is a stable model of P1 because it satisfies both rules and the reduct $P_1^{\emptyset} = \emptyset$. For S = {p} the reduct $P_1^S$ is
p ← 1 ≤ {p = 1}
p ← −1 ≤ {}, 1 ≤ {p = 1}
Now cl($P_1^S$) = {}, implying that S is not a stable model although it satisfies P1. Consider the program P2:
2 ≤ {b = 2, c = 3} ≤ 4 ← 2 ≤ {not a = 2, b = 4} ≤ 5
The definition of stable models guarantees that atoms in a model must be justifiable by the program in terms of the reduct and thus, e.g., P2 cannot have a stable model containing a. The empty set is not a stable model, as {} ⊭ P2. Nor is S = {b}: the body weight 2 + 4 = 6 exceeds the upper bound 5, so the reduct $P_2^S$ is empty and cl($P_2^S$) = {} ≠ S. However, S = {c} is a stable model, as S ⊨ P2 and cl($P_2^S$) = {c}, where $P_2^S$ = {c ← 0 ≤ {b = 4}}.
Note that as there are no conditional literals, no justification constraints are needed.
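Putting Definitions 1 to 3 together yields a naive stable-model test for ground weight programs with non-negative weights: check S ⊨ P, build the reduct $P^S$, and compare its deductive closure with S. The Python sketch below is illustrative only; the tuple encodings and all function names are assumptions of this sketch, heads are assumed to contain only positive literals (as required above), and it makes no attempt at the search that smodels-2 performs.

```python
INF = float("inf")

# Encoding (assumed for this sketch):
#   literal    = (negated, p, q, w), where q is None for the always-true condition
#   constraint = (lower, [literals], upper)
#   rule       = (head constraint, [body constraints]); head None = integrity constraint


def holds(q, s):
    return q is None or q in s


def weight(c, s):
    """W(C, S) of Definition 1."""
    return sum(w for neg, p, q, w in c[1]
               if holds(q, s) and ((p not in s) if neg else (p in s)))


def sat(c, s):
    return c[0] <= weight(c, s) <= c[2]


def sat_rule(rule, s):
    head, body = rule
    if all(sat(c, s) for c in body):
        return head is not None and sat(head, s)
    return True


def justified(q):
    """The justification constraint 1 <= {q = 1} (empty when q is always true)."""
    return [] if q is None else [(1, [(False, q, None, 1)], INF)]


def constraint_reduct(c, s):
    """C^S followed by the justification constraints j(C, S)."""
    lower, lits, _ = c
    nlit = [(q, w) for neg, p, q, w in lits if neg and p not in s and holds(q, s)]
    reduced = (lower - sum(w for _, w in nlit), [l for l in lits if not l[0]], INF)
    return [reduced] + [j for q, _ in nlit for j in justified(q)]


def program_reduct(program, s):
    """P^S of Definition 2, as Horn rules (head atom, [body constraints])."""
    horn = []
    for head, body in program:
        if head is None or any(weight(c, s) > c[2] for c in body):
            continue
        reduced_body = [r for c in body for r in constraint_reduct(c, s)]
        for _, p, q, _ in head[1]:
            if p in s and holds(q, s):
                horn.append((p, justified(q) + reduced_body))
    return horn


def closure(horn):
    """cl(P): keep adding heads of unsatisfied Horn rules until nothing changes."""
    s, changed = set(), True
    while changed:
        changed = False
        for p, body in horn:
            if p not in s and all(sat(c, s) for c in body):
                s.add(p)
                changed = True
    return s


def is_stable(program, s):
    """Definition 3: S satisfies P and S = cl(P^S)."""
    return all(sat_rule(r, s) for r in program) and closure(program_reduct(program, s)) == s


# P2 of Example 3: 2 <= {b = 2, c = 3} <= 4 <- 2 <= {not a = 2, b = 4} <= 5
P2 = [((2, [(False, "b", None, 2), (False, "c", None, 3)], 4),
       [(2, [(True, "a", None, 2), (False, "b", None, 4)], 5)])]
for s in [set(), {"b"}, {"c"}]:
    print(sorted(s), is_stable(P2, s))  # [] False, then ['b'] False, then ['c'] True
```

The `closure` loop mirrors the iterative construction described after the definition of Horn weight rules; it terminates because every pass either adds an atom or stops.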
Our definition is a generalization of the stable model semantics for normal programs, as a simple literal l in a normal program can be seen as a shorthand for 1 ≤ {l = 1} ≤ 1. Thus, e.g., a normal rule a ← b, not c is a shorthand for 1 ≤ {a = 1} ≤ 1 ← 1 ≤ {b = 1} ≤ 1, 1 ≤ {not c = 1} ≤ 1. The reduct of the rule w.r.t. S = {a, b} is a ← 1 ≤ {b = 1}, 0 ≤ {}, whose closure is {} and, hence, S is not a stable model of the rule although it satisfies the rule. We use this abbreviation frequently and, furthermore, we often omit the weight of a literal if it is 1. Definition 3 does not cover constraints with negative weights. However, it turns out that these can be transformed to constraints with non-negative weights by simple linear algebraic manipulation, which translates a constraint C
$$L \le \{a_1 = w_{a_1}, \ldots, a_n = w_{a_n}, \mathit{not}\ b_1 = w_{b_1}, \ldots, \mathit{not}\ b_m = w_{b_m}\} \le U$$
to an equivalent form C′ with only non-negative weights
$$L + \sum_{w_{a_i} < 0} |w_{a_i}| + \cdots$$
is irreflexive), and is used to represent priority information among rules. A defeasible theory T is a triple (F, R, >) where F is a finite set of literals (called facts), R a finite set of rules, and > a superiority relation on R.
A conclusion of T is a tagged literal and can have one of the following four forms:
– +∆q, which is intended to mean that q is definitely provable in T.
– −∆q, which is intended to mean that we have proved that q is not definitely provable in T.
– +∂q, which is intended to mean that q is defeasibly provable in T.
– −∂q, which is intended to mean that we have proved that q is not defeasibly provable in T.
A derivation (or proof) in T = (F, R, >) is a finite sequence P = (P(1), ..., P(n)) of tagged literals satisfying the following conditions (P(1..i) denotes the initial part of the sequence P of length i):
+∆: If $P(i+1) = +\Delta q$ then either $q \in F$ or $\exists r \in R_s[q]\ \forall a \in A(r): +\Delta a \in P(1..i)$
−∆: If $P(i+1) = -\Delta q$ then $q \notin F$ and $\forall r \in R_s[q]\ \exists a \in A(r): -\Delta a \in P(1..i)$
+∆ denotes forward chaining provability, and −∆ denotes its strong negation, that is, finite failure to prove definitely.
+∂: If $P(i+1) = +\partial q$ then either (1) $+\Delta q \in P(1..i)$ or (2) $\exists r \in R_{sd}[q]$ such that
A Comparison of Sceptical NAF-Free Logic Programming Approaches
(2.1) $\forall a \in A(r): +\partial a \in P(1..i)$ and
(2.2) $-\Delta {\sim} q \in P(1..i)$ and
(2.3) $\forall s \in R[{\sim} q]$ either
  (2.3.1) $\exists a \in A(s): -\partial a \in P(1..i)$ or
  (2.3.2) $\exists t \in R_{sd}[q]$ such that $\forall a \in A(t): +\partial a \in P(1..i)$ and $t > s$
−∂: If $P(i+1) = -\partial q$ then
(1) $-\Delta q \in P(1..i)$ and
(2) (2.1) $\forall r \in R_{sd}[q]\ \exists a \in A(r): -\partial a \in P(1..i)$ or
  (2.2) $+\Delta {\sim} q \in P(1..i)$ or
  (2.3) $\exists s \in R[{\sim} q]$ such that
    (2.3.1) $\forall a \in A(s): +\partial a \in P(1..i)$ and
    (2.3.2) $\forall t \in R_{sd}[q]$ either $\exists a \in A(t): -\partial a \in P(1..i)$ or $t \not> s$
We give a brief explanation of the +∂ rule. One way of proving q defeasibly is to prove q definitely. The other way requires us to find a rule with head q whose antecedents have already been proven defeasibly (2.1). In addition, we must consider and discard potential attacks against q: we must be sure that the negation of q is not definitely provable (2.2), and for every attack on q by a rule with head ∼q there must be a stronger (counterattacking) rule with head q¹. The elements of a derivation are called lines of the derivation. We say that a tagged literal L is provable (or derivable) in T = (F, R, >), denoted T ⊢ L, iff there is a derivation P in T such that L is a line of P. Even though the definition seems complicated, it follows ideas which are intuitively appealing. For an explanation of this definition see [11]. In the remainder of this paper we will only need to consider defeasible rules and a superiority relation; facts, strict rules and defeaters will not be necessary. We conclude this section with an example, adapted from [6].
r1: bird(X) ⇒ fly(X)
r2: penguin(X) ⇒ ¬fly(X)
r3: walkslikepeng(X) ⇒ penguin(X)
r4: ¬flatfeet(X) ⇒ ¬penguin(X)
r5: penguin(X) ⇒ bird(X)
f1: bird(tweety)
f2: walkslikepeng(tweety)
f3: ¬flatfeet(tweety)
r2 > r1
r4 > r3
¹ It should also be noted that defeaters are only used as potential attacks on conclusions, but are never used to support a conclusion (directly or in a counterattack). This treatment is consistent with the intuitive idea of a defeater as explained previously.
G. Antoniou
We can derive +∂¬penguin(tweety) because both rules r3 and r4 are applicable (with instantiation tweety) and r4 is stronger than r3. For the same reason we can derive −∂penguin(tweety). The fact f1 allows us to derive +∆bird(tweety), thus also +∂bird(tweety). Therefore rule r1 (with instantiation tweety) is applicable. Moreover, rule r2, the only possible way for proving ¬fly(tweety), cannot be applied because we have already derived −∂penguin(tweety). Thus we can derive +∂fly(tweety).
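For ground theories the +∂ and −∂ conditions can be evaluated mechanically. The Python sketch below is an illustration, not code from this paper: it assumes only facts and defeasible rules (no strict rules, no defeaters), so that +∆q holds exactly when q is a fact, and it simply iterates the two conditions until no new tagged literal appears. The function name `defeasible`, the rule encoding, and the `~` prefix for ¬ are hypothetical choices.

```python
def neg(lit):
    """Complement of a literal; '~' marks classical negation."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def defeasible(facts, rules, sup):
    """Return the sets of literals tagged +d (+∂) and -d (-∂).

    rules maps a rule name to (antecedents, head); sup holds (stronger, weaker)
    pairs.  Assumes ground rules, no strict rules and no defeaters, so that
    +Delta q holds iff q is a fact and -Delta q iff it is not.
    """
    lits = set(facts) | {h for _, h in rules.values()}
    lits |= {a for body, _ in rules.values() for a in body}
    lits |= {neg(l) for l in lits}
    plus, minus = set(), set()

    def rules_for(q):
        return [r for r, (_, h) in rules.items() if h == q]

    def applicable(r):  # every antecedent is already +d
        return all(a in plus for a in rules[r][0])

    def discarded(r):   # some antecedent is already -d
        return any(a in minus for a in rules[r][0])

    changed = True
    while changed:  # iterate the +d / -d conditions up to a fixpoint
        changed = False
        for q in lits:
            nq = neg(q)
            prove = q in facts or (
                nq not in facts
                and any(applicable(r) for r in rules_for(q))
                and all(discarded(s) or any(applicable(t) and (t, s) in sup
                                            for t in rules_for(q))
                        for s in rules_for(nq)))
            refute = q not in facts and (
                all(discarded(r) for r in rules_for(q))
                or nq in facts
                or any(applicable(s) and all(discarded(t) or (t, s) not in sup
                                             for t in rules_for(q))
                       for s in rules_for(nq)))
            if prove and q not in plus:
                plus.add(q)
                changed = True
            if refute and q not in minus:
                minus.add(q)
                changed = True
    return plus, minus


rules = {
    "r1": (["bird"], "fly"),
    "r2": (["penguin"], "~fly"),
    "r3": (["walkslikepeng"], "penguin"),
    "r4": (["~flatfeet"], "~penguin"),
    "r5": (["penguin"], "bird"),
}
plus, minus = defeasible({"bird", "walkslikepeng", "~flatfeet"}, rules,
                         {("r2", "r1"), ("r4", "r3")})
print(sorted(plus))  # ['bird', 'fly', 'walkslikepeng', '~flatfeet', '~penguin']
```

Both conditions only grow as `plus` and `minus` grow, so the loop reaches a fixpoint; the tweety instance yields +∂fly and −∂penguin, as argued above.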
3
LPwNF
In LPwNF [6], a logic program consists of a set of rules of the form p ← q1, ..., qn, where p, q1, ..., qn are literals, and an irreflexive and transitive priority relation > among rules. A proof theory and a corresponding argumentation framework were introduced in [6]. The main idea of LPwNF is the following: in order to prove a literal q, a type A derivation must be found which proves q. One part of this derivation is a top-level proof of q in the sense of logic programming (SLD-resolution). But additionally every attack on this argument must be counterattacked. Attacks are generated in type B derivations. For an A derivation to succeed, all B derivations must fail. In general, a rule r in a type B derivation can attack a rule r′ in a type A derivation if they have complementary heads and r is not weaker than r′, that is, r′ ≯ r. On the other hand, a rule r in a type A derivation can attack a rule r′ in a type B derivation if they have complementary heads and r > r′. This reflects the notion of scepticism: it should be easier to attack a positive argument than to counterattack (i.e., attack the attacker). For example, consider the following program, which is the same as the example in the previous section except for variations of syntax.
r1: fly(X) ← bird(X)
r2: ¬fly(X) ← penguin(X)
r3: penguin(X) ← walkslikepeng(X)
r4: ¬penguin(X) ← ¬flatfeet(X)
r5: bird(X) ← penguin(X)
r6: bird(tweety) ←
r7: walkslikepeng(tweety) ←
r8: ¬flatfeet(tweety) ←
r2 > r1
r4 > r3
Here it is possible to prove fly(tweety). Firstly, there is a standard SLD refutation (A derivation) of ← fly(tweety) via the rules r1 and r6. Additionally we need to consider all possible attacks on this refutation. In our case, r1 can be attacked by r2. Thus we start a B derivation with goal ← ¬fly(tweety) (with first rule r2), and have to show that this proof fails. This happens because the rule r3 is successfully counterattacked by r4. There are no other attacks on the original derivation. Fig. 1 illustrates how the reasoning proceeds. Below we give the formal definition. LPwNF can support either credulous or sceptical reasoning. Since in this paper we are interested in a comparison with defeasible logic, we will restrict ourselves to the sceptical case (as we have already done so far in this section). Also, our presentation is slightly simpler than that of [6]. The reason is that in their paper, Dimopoulos and Kakas showed the soundness of their proof theory w.r.t. an argumentation framework, and they had to make the definition of derivations more complicated in order to collect the rules which are used to build an appropriate argument. This is not our concern here, so we just focus on the derivation of formulae.
[Figure: columns "argument (A derivation)", "attack (B derivation)", "counter-attack (A derivation)"]
Fig. 1. A derivation in LPwNF
A type A derivation from (G1, r) to (Gn, r) is a sequence ((G1, r), (G2, r), ..., (Gn, r)), where r is a rule, and each Gi has the form ← q, Q, where q is the selected literal and Q a sequence of literals. For Gi, i ≥ 1, if there is a rule ri such that either
1. i = 1, ri > r, ri resolves with Gi on q, and there is a type B derivation from ({← ∼q}, ri) to (∅, ri), or
2. i > 1, ri resolves with Gi on q, and there is a type B derivation from ({← ∼q}, ri) to (∅, ri),
then Gi+1 is the resolvent of ri with Gi.
A type B derivation from (F1, r) to (Fn, r) is a sequence (F1, r), (F2, r), ..., (Fn, r), where every Fi is of the form Fi = {← q, Q} ∪ F′i, q is the selected literal, and Fi+1 is constructed from Fi as follows:
1. For i = 1, F1 must have the form ← q. Let R be the set of rules ri which resolve with ← q and which satisfy the condition ri ≮ r. Let C be the set of resolvents of ← q with the rules in R. If [] ∉ C then F2 = C; otherwise there is no F2.
2. For i > 1, let R be the set of rules ri which resolve with ← q, Q on q. Let R′ be the subset of R containing all rules ri such that there is no A derivation from (← ∼q, ri) to ([], ri). Let C be the set of all resolvents of the rules in R′ with the rule ← q, Q, by resolving on q. If [] ∉ C then Fi+1 = C ∪ F′i; otherwise there is no Fi+1.
4
A Comparison of LPwNF and Defeasible Logic
Given a logic program without negation as failure P, let T(P) be the defeasible theory containing the same rules as P, written as defeasible rules, and the same superiority relation. In other words, rules in LPwNF are represented as defeasible rules in defeasible logic. First we show that every conclusion provable in LPwNF can be derived in defeasible logic. The proof goes by induction on the length of a derivation and is found in the full version of this paper.
Theorem 1. Let q be a literal which can be sceptically proven in the logic program without negation as failure P, that is, there is a type A derivation from (← q, r) to ([], r) for some rule r. Then T(P) ⊢ +∂q.
However, the converse of the theorem does not hold. The reason is that LPwNF argues on the basis of individual rules, whereas defeasible logic argues on the basis of teams of rules with the same head. The difference can be illustrated by the following simple example.
r1: monotreme(X) ⇒ mammal(X)
r2: hasFur(X) ⇒ mammal(X)
r3: laysEggs(X) ⇒ ¬mammal(X)
r4: hasBill(X) ⇒ ¬mammal(X)
r1 > r3
r2 > r4
monotreme(platypus)
hasFur(platypus)
laysEggs(platypus)
hasBill(platypus)
Intuitively we conclude that platypus is a mammal because for every reason against this conclusion (r3 and r4) there is a stronger reason for mammal(platypus) (r1 and r2, respectively). It is easy to see that +∂mammal(platypus) is indeed provable in defeasible logic: there is a rule in support of mammal(platypus), and every rule for ¬mammal(platypus) is overridden by a rule for mammal(platypus). On the other hand, the corresponding logic program without negation as failure is unable to prove mammal(platypus): if we start with r1, trying to build an A derivation, then we must counter the attack r4 (which is not inferior to r1) used in a B derivation. But LPwNF does not allow counterattacks on r4 by another rule with head mammal(platypus), but only by an attack on the body of r4. The latter is impossible in our case (there is no rule matching ¬hasBill(platypus)). Thus the attack via r4 succeeds and the proof of mammal(platypus) via r1 fails. Similarly, the proof of mammal(platypus) via r2 fails, due to an attack via rule r3. Thus mammal(platypus) cannot be proven. Our analysis so far has shown that defeasible logic is stronger than LPwNF because it allows attacks to be counterattacked by different rules. But note that a counterattacking rule needs to be stronger than the attacking rule. Thus it is not surprising that if the priority relation is empty, both approaches coincide.
Theorem 2. Let P be a logic program without negation as failure with empty priority relation. Then a literal q can be sceptically proven in P iff T(P) ⊢ +∂q.
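The difference between arguing with teams of rules and with individual rules can be isolated in the one-step case where every rule body consists of facts, so that attacks on bodies can never succeed. In the hypothetical Python sketch below, `team_wins` follows the +∂ condition (each applicable attacker must be beaten by some applicable rule for q), while `single_rule_wins` is a simplification of the LPwNF behaviour described above, in which one rule r for q must itself be stronger than every applicable attacker. It is not a full implementation of type A and B derivations, and it assumes that neither q nor its complement is a fact.

```python
def applicable_rules(head, facts, rules):
    """Names of rules for `head` whose antecedents are all facts."""
    return [r for r, (body, h) in rules.items()
            if h == head and all(a in facts for a in body)]


def team_wins(q, nq, facts, rules, sup):
    """Defeasible-logic style: every applicable attacker is beaten by SOME rule for q."""
    mine = applicable_rules(q, facts, rules)
    attackers = applicable_rules(nq, facts, rules)
    return bool(mine) and all(any((t, s) in sup for t in mine) for s in attackers)


def single_rule_wins(q, nq, facts, rules, sup):
    """Simplified LPwNF style: ONE rule r for q must be stronger than every
    applicable attacker, since an attacker s with not r > s could only be
    countered through its body, and here all bodies are facts."""
    mine = applicable_rules(q, facts, rules)
    attackers = applicable_rules(nq, facts, rules)
    return any(all((r, s) in sup for s in attackers) for r in mine)


facts = {"monotreme", "hasFur", "laysEggs", "hasBill"}
rules = {"r1": (["monotreme"], "mammal"), "r2": (["hasFur"], "mammal"),
         "r3": (["laysEggs"], "~mammal"), "r4": (["hasBill"], "~mammal")}
sup = {("r1", "r3"), ("r2", "r4")}
print(team_wins("mammal", "~mammal", facts, rules, sup))         # True
print(single_rule_wins("mammal", "~mammal", facts, rules, sup))  # False
```

With an empty `sup`, both functions return the same answer in this one-step setting, in line with Theorem 2.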
5
Other Approaches
5.1
Courteous Logic Programs
Courteous logic programs [7] share some basic ideas of defeasible logic. In particular, the approach is logic programming based, implements sceptical reasoning, and is based on competing teams of rules and a priority relation. It imposes a total stratification on the logic program by demanding that the atom dependency graph be acyclic. This ensures that each stratum contains only rules with head p or ¬p. An answer set is built gradually, stratum by stratum. Compared to defeasible logic, courteous logic programs are more specialized in the following respects: (i) The atom dependency graph of a courteous logic program must be acyclic. This condition is central in the courteous logic program framework, but is not necessary in defeasible logic; (ii) Defeasible logic distinguishes between strict and defeasible conclusions, whereas courteous logic programs do not. Thus defeasible logic is more fine-grained; (iii) Defeasible logic has the concept of a defeater; courteous logic programs do not. Thus defeasible logic offers greater flexibility in the expression of information. On the other hand, there seems to be a major difference between the two approaches, in that courteous logic programs may use negation as failure. However, a courteous logic program C with negation as failure can be modularly translated into a program C′ without negation as failure, using a technique suggested in [10]: every rule
$$r: L \leftarrow L_1 \wedge \cdots \wedge L_n \wedge \mathit{fail}\ M_1 \wedge \cdots \wedge \mathit{fail}\ M_k$$
can be replaced by the rules
$$r: L \leftarrow L_1 \wedge \cdots \wedge L_n \wedge p_r \qquad p_r \leftarrow \qquad \neg p_r \leftarrow M_1 \quad \ldots \quad \neg p_r \leftarrow M_k$$
where $p_r$ is a new propositional atom. If we restrict attention to the language of C, the programs C and C′ have the same answer set. Thus, without loss of generality we may assume that a courteous logic program C does not use negation as failure. The corresponding defeasible theory df(C) is obtained by representing every rule in C′ by an equivalent defeasible rule, and by using the same priority relation as C.
Then we are able to show that courteous logic programs are a special case of defeasible logic:
Theorem 3. Let C be a courteous logic program. A literal q is in the answer set of C iff df(C) ⊢ +∂q.
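The rewriting of `fail` literals used above is purely syntactic, so it is easy to mechanize. A minimal Python sketch, assuming rules are (head, body) pairs of strings, `~` standing for ¬, and a fresh atom named `p_` plus the rule name (a hypothetical naming scheme that must not clash with existing atoms):

```python
def remove_fail(name, head, body, failed):
    """Replace  r: L <- L1 ∧ ... ∧ Ln ∧ fail M1 ∧ ... ∧ fail Mk  by NAF-free rules.

    Returns (head, body) pairs: r with the new atom appended to its body, the
    fact p_r <-, and one rule ~p_r <- Mi per failed literal ('~' stands for ¬).
    """
    p = "p_" + name  # hypothetical fresh atom; assumed not to occur elsewhere
    return [(head, list(body) + [p]), (p, [])] + [("~" + p, [m]) for m in failed]


for rule in remove_fail("r", "L", ["L1", "L2"], ["M1", "M2"]):
    print(rule)
# ('L', ['L1', 'L2', 'p_r'])
# ('p_r', [])
# ('~p_r', ['M1'])
# ('~p_r', ['M2'])
```

On our reading (the text above does not spell this out), no priorities among the new rules are needed: once some Mi becomes defeasibly provable, the rules for p_r and ¬p_r conflict with neither stronger, so p_r, and with it L via r, is not derived sceptically.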
5.2
Priority Logic
Priority logic [17,18] is a knowledge representation language where a theory consists of logic programming-like rules and a priority relation among them. The meaning of the priority relation is that once a rule r is included in an argument, all rules inferior to r are automatically blocked from being included in the same argument. The semantics of priority logic is based on the notion of a stable argument for the credulous case, and the well-founded argument for the sceptical case. Priority logic is a general framework with many instantiations (based on so-called extensibility functions), and supports both credulous and sceptical reasoning. To allow a fair comparison to defeasible logic, one has to impose the following restrictions: (i) We will only consider defeasible rules in the sense of defeasible logic. That is, we will not distinguish between strict and defeasible rules, and we will restrict attention to rules in which only propositional literals occur (but not more general formulae, as in priority logic). Also, there will be no defeaters. (ii) The priority/superiority relation will only be defined on pairs of rules with complementary heads. (iii) We will consider the two basic instantiations of priority logic, as determined by the extensibility functions R1 and R2 (see [17,18] for details). (iv) We will compare defeasible logic to the sceptical interpretation of priority logic. Under these conditions, the difference between defeasible logic and priority logic is highlighted by the following example:
r1: quaker ←
r2: republican ←
r3: pacifist ← quaker
r4: ¬pacifist ← republican
r5: footballfan ← republican
r6: antimilitary ← pacifist
r7: ¬antimilitary ← footballfan
The priority relation is empty.
(Obviously in defeasible logic we consider r1-r7 to be defeasible rules.) In priority logic, if we use the extensibility relation R1, then the well-founded argument is the set of all rules, and therefore inconsistent. On the other hand, in the defeasible logic version T of the priority logic program, T ⊬ +∂pacifist, so the approaches are different. And if we use the extensibility relation R2, then priority logic does not allow one to prove ¬antimilitary. But defeasible logic can prove +∂¬antimilitary. The difference is caused by the fact that defeasible logic does not propagate ambiguity, as extension-based formalisms like priority logic do (for a discussion of this issue see [15]).
5.3
Inheritance Networks
Nonmonotonic inheritance networks [14,9] were an early nonmonotonic reasoning approach which had powerful implementations, even though they lacked declarativity. Moreover they are based on the use of rules and an implicit notion of priority among rules. In [3] it was shown that inheritance networks as defined in [8] can be represented in defeasible logic. We outline the translation below.
A nonmonotonic inheritance network consists of a set of objects, a set of properties, and a set of arcs which is acyclic. Below is a list of the possible kinds of arcs, where a is an object, and p and q are properties (we use a variation of syntax to be consistent with this paper):
– a ⇒ p, meaning that a has the property p.
– a ⇏ p, meaning that a does not have the property p.
– p ⇒ q, meaning that an object with property p typically has property q.
– p ⇏ q, meaning that an object with property p typically does not have property q.
A nonmonotonic inheritance network N is naturally translated into a defeasible theory T(N):
– For every arc a ⇒ p in N, include the fact p(a) in T(N).
– For every arc a ⇏ p in N, include the fact ¬p(a) in T(N).
– For every path a ⇒ ... ⇒ p ⇒ q in N, include the rule p(a) ⇒ q(a) in T(N).
– For every path a ⇒ ... ⇒ p ⇏ q in N, include the rule p(a) ⇒ ¬q(a) in T(N).
We have omitted the definition of the superiority relation which simulates specificity in the inheritance networks of [8]. The complicated definition is found in [3]. That paper also proposes a way of compiling specificity into the definition of a derivation, which can be used to make the translation of a nonmonotonic inheritance network into a defeasible theory modular.
Result 5.2 Let N be a nonmonotonic inheritance network. Then we may construct a defeasible theory T(N) such that for every literal q, q is supported by N iff T(N) ⊢ +∂q.
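The facts and rules of T(N) listed above can be generated by a reachability search over positive arcs. The Python sketch below is a hypothetical illustration: arcs are encoded as (source, target, positive) triples, `~` marks ¬, and, like the text above, it does not construct the superiority relation that simulates specificity.

```python
from collections import deque


def translate(objects, arcs):
    """Translate an acyclic inheritance network into facts and defeasible rules.

    arcs are (source, target, positive) triples, where the source is an object or
    a property.  Returns (facts, rules); a rule is an (antecedent, conclusion)
    pair of ground literals and '~' marks negation.  The superiority relation
    that simulates specificity is not constructed.
    """
    out = {}
    for src, dst, positive in arcs:
        out.setdefault(src, []).append((dst, positive))
    facts, rules = set(), set()
    for a in objects:
        for p, positive in out.get(a, []):  # arc a => p or a =/=> p
            facts.add(f"{p}({a})" if positive else f"~{p}({a})")
        # Properties p reachable from a through positive arcs a => ... => p.
        reach, queue = set(), deque(p for p, pos in out.get(a, []) if pos)
        while queue:
            p = queue.popleft()
            if p not in reach:
                reach.add(p)
                queue.extend(q for q, pos in out.get(p, []) if pos)
        for p in reach:  # path a => ... => p, followed by p => q or p =/=> q
            for q, positive in out.get(p, []):
                rules.add((f"{p}({a})", f"{q}({a})" if positive else f"~{q}({a})"))
    return facts, rules


# Hypothetical network: tweety => penguin => bird => flies, penguin =/=> flies
arcs = [("tweety", "penguin", True), ("penguin", "bird", True),
        ("bird", "flies", True), ("penguin", "flies", False)]
facts, rules = translate(["tweety"], arcs)
print(sorted(facts))  # ['penguin(tweety)']
for rule in sorted(rules):
    print(rule)
```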
6
Conclusion
We have looked at the relationship between four logic programming-based formalisms that employ a priority relation among rules and take a sceptical approach to inference. Three of them (defeasible logic, LPwNF and courteous logic programs) belong to the same “school” of conservative reasoning in the classification of [16], while priority logic takes a fundamentally different approach, which is evident in its propagation of ambiguity. In addition, a class of nonmonotonic inheritance networks can be embedded into defeasible logic, so it too belongs to the school of conservative reasoning, even though it is not a logical formalism. Of the three formalisms in the conservative reasoning school, defeasible logic is the most powerful. It is able to draw more conclusions (from the same rules) than LPwNF can, principally because it argues on the basis of teams of rules. Courteous logic programs also employ teams of rules, but the approach is severely restricted in that the atom dependency graph is required to be acyclic. In addition, of course, defeasible logic makes a distinction between definite knowledge (obtained by facts and strict rules) and defeasible knowledge.
The results of this paper indicate that defeasible logic deserves more attention. In other papers [2,11] we have studied the logic as a formal system, including representation results, properties of the inference relation, and semantics.
References
1. G. Antoniou. Nonmonotonic Reasoning. MIT Press, 1997.
2. G. Antoniou, D. Billington, and M.J. Maher. Normal forms for defeasible logic. In Proc. 1998 Joint International Conference and Symposium on Logic Programming, MIT Press, 1998.
3. D. Billington, K. de Coster, and D. Nute. A modular translation from defeasible nets to defeasible logic. Journal of Experimental and Theoretical Artificial Intelligence 2 (1990): 151–177.
4. D. Billington. Defeasible Logic is Stable. Journal of Logic and Computation 3 (1993): 370–400.
5. M.A. Covington, D. Nute, and A. Vellino. Prolog Programming in Depth. Prentice Hall, 1997.
6. Y. Dimopoulos and A. Kakas. Logic Programming without Negation as Failure. In Proc. ICLP-95, MIT Press, 1995.
7. B.N. Grosof. Prioritized Conflict Handling for Logic Programs. In J. Maluszynski (ed.), Proc. Int. Logic Programming Symposium, MIT Press, 1997, 197–211.
8. J.F. Horty, R.H. Thomason, and D. Touretzky. A skeptical theory of inheritance in nonmonotonic semantic networks. In Proc. AAAI-87, 358–363.
9. J.F. Horty. Some direct theories of nonmonotonic inheritance. In D.M. Gabbay, C.J. Hogger, and J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming Vol. 3, Clarendon Press, 1994, 111–187.
10. A.C. Kakas, P. Mancarella, and P.M. Dung. The Acceptability Semantics for Logic Programs. In Proc. Eleventh International Conference on Logic Programming (ICLP’94), MIT Press, 1994, 504–519.
11. M.J. Maher, G. Antoniou, and D. Billington. A Study of Provability in Defeasible Logic. In Proc. 11th Australian Joint Conference on Artificial Intelligence, LNAI 1502, Springer, 1998, 215–226.
12. D. Nute. Defeasible Reasoning. In Proc. 20th Hawaii International Conference on Systems Science, IEEE Press, 1987, 470–477.
13. D. Nute. Defeasible Logic. In D.M. Gabbay, C.J. Hogger, and J.A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming Vol. 3, Oxford University Press, 1994, 353–395.
14. D. Touretzky. The mathematics of inheritance systems. Morgan Kaufmann, 1986.
15. D. Touretzky, J.F. Horty, and R.H. Thomason. A clash of intuitions: The current state of nonmonotonic multiple inheritance systems. In Proc. IJCAI-87, Morgan Kaufmann, 1987, 476–482.
16. G. Wagner. Ex contradictione nihil sequitur. In Proc. 12th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1991.
17. X. Wang, J. You, and L. Yuan. Nonmonotonic reasoning by monotonic inferences with priority constraints. In J. Dix, P. Pereira, and T. Przymusinski (eds.), Nonmonotonic Extensions of Logic Programming, LNAI 1216, Springer, 1997, 91–109.
18. X. Wang, J. You, and L. Yuan. Logic programming without default negation revisited. In Proc. IEEE International Conference on Intelligent Processing Systems, IEEE, 1997.
Characterizations of Classes of Programs by Three-Valued Operators
Pascal Hitzler¹ and Anthony Karel Seda²
¹ National University of Ireland, Cork, Ireland, [email protected], WWW home page: http://maths.ucc.ie/~pascal/index.html
² National University of Ireland, Cork, Ireland, [email protected], WWW home page: http://maths.ucc.ie/~seda/index.html
Abstract. Several important classes of normal logic programs, including the classes of acyclic, acceptable, and locally hierarchical programs, have the property that every program in the class has a unique two-valued supported model. In this paper, we call such classes unique supported model classes. We analyse and characterize these classes by means of operators on three-valued logics. Our studies motivate the definition of a larger unique supported model class, which we call the class of Φ*-accessible programs. Finally, we show that the class of Φ*-accessible programs is computationally adequate, in that every partial recursive function can be implemented by such a program.
1 Introduction
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, pp. 357–371, 1999. © Springer-Verlag Berlin Heidelberg 1999
A good deal of recent research in logic programming has been devoted to determining standard, or intended, models for normal logic programs. Some standard semantics, such as the well-founded semantics ([14]) or the stable model semantics ([15]), are applicable to very large classes of programs. However, whilst the general applicability of these semantics is certainly desirable, the study of these large classes of programs has a natural practical limitation: it is possible to assign standard models to logic programs for which useful interpreters have not yet been implemented, and for which it is questionable whether this will ever be possible. It is therefore reasonable to study smaller classes of programs whose behaviour is more controlled, so long as these classes are large enough for practical purposes. On the other hand, certain classes of logic programs have been defined purely in order to study termination and computability properties. For instance, the acyclic programs of Cavedon [8] (initially called locally ω-hierarchical programs by him) are precisely the terminating programs, and were shown by Bezem [7] to be able to compute all the total computable functions; see also [1]. Next, the class of acceptable programs ([3]) was introduced by Apt and Pedreschi. Such programs are left-terminating and, conversely, left-terminating non-floundering
programs are acceptable. In fact, the class of all acceptable programs strictly contains the acyclic programs but, nevertheless, is not computationally adequate, i.e. not every partial recursive function can be implemented by such a program. Finally, the class of all locally hierarchical programs was introduced in [8]. However, this class, which also contains all acyclic programs, is computationally adequate under Prolog if the use of safe cuts is allowed ([23]). All the programs contained in the classes mentioned in the previous paragraph have a common property: they have unique supported models. These classes will be called here unique supported model classes. In fact, they even have unique three-valued models under Fitting’s Kripke-Kleene semantics ([11]). Thus, the programs in question leave little doubt about the semantics, i.e. the model, which is to be assigned to them as standard model and, in addition, they have interesting computational properties under existing interpreters, as noted above. In this paper, we will analyse and characterize unique supported model classes by means of certain three-valued logics, and study computability properties of these. In particular, in Section 2 we will introduce three different three-valued logics and their associated consequence operators, and study the relationships between them. In Sections 3.1 and 3.2, we will characterize acceptable and locally hierarchical programs by means of the behaviour of these operators. We will also give constructions of their canonical level mappings. Prompted by the studies of acceptable and locally hierarchical programs, we will define a new class of programs which we call the Φ∗ -accessible programs. We study this class in Section 3.3, where it is shown that the Φ∗ -accessible programs contain the acceptable and the locally hierarchical programs. 
Moreover, we will show that each Φ*-accessible program has a unique supported model, that each has a canonical level mapping, and that the class of Φ*-accessible programs is computationally adequate under SLDNF-resolution. Many-valued logics have been employed in several studies of the semantics of logic programs. In particular, they have been used to assign special truth values to atoms which exhibit certain computational behaviour, such as being non-terminating ([11,20]), being ill-typed ([21]), floundering ([4]), or failing when backtracking ([6]). The motivation for the three-valued logics used in this paper comes from two sources. Primarily, these logics are formulated to allow easy analysis and characterization of the programs, or classes of programs, in question, by using the logic to mimic the defining property of the program or class. This idea is akin to some of those considered in the papers just cited (see also [6]), and it is a component of work being undertaken by the authors in [16], where a program transformation that turns an acceptable program into a locally hierarchical one is used to characterize acceptable programs. Natural questions, partly answered here, then arise as to the different ways in which different classes of programs can be characterized. On the other hand, the present work can also be viewed as a contribution to the asymmetric semantics proposed by Fitting and Ben-Jacob in [13], where it is noted that certain differences between Pascal,
LISP and Prolog, for example, are easily described in terms of three-valued logic. Thus, [13] is also a source of motivation for our definitions. However, all programs analysed herein do have unique supported models, so the third truth value, undefined, is used only to obtain the unique supported two-valued model. Hence, interpretations of undefined from the point of view of computation (such as non-halting) are not needed in this paper.

Preliminaries and Notation

Our notation basically follows [18], but we include a short review of the main terminology used. Given a normal logic program P, we work over an arbitrary preinterpretation J (complete generality is needed in [16] and hence also in this companion paper). We refer to variable assignments which map into the domain D of J as J-variable assignments; the underlying first-order language of P will be denoted by L. By $B_{P,J}$ we denote the set of all J-ground instances of atoms in L. Thus, $B_{P,J}$ is the set of all $p(d_1, \dots, d_n)$, where p is an n-ary predicate symbol in L and $d_1, \dots, d_n \in D$. An element $A = p(d_1, \dots, d_n)$ of $B_{P,J}$ is called a J,v-(ground) instance or J-(ground) instance of an atomic formula $A' = p(t_1, \dots, t_n)$ in L if there exists a J-variable assignment v such that $A'|_v = A$, meaning that $t_i|_v = d_i$ for $i = 1, \dots, n$, where $t|_v$ is the denotation of a term t relative to J and v. Since each $t_i|_v \in D$, any J-instance of $A'$ is variable free. This extends easily to literals L, where $L = \neg A' = \neg p(t_1, \dots, t_n)$, say. Thus, the symbol $\neg p(d_1, \dots, d_n)$ is called a J,v-(ground) instance or J-(ground) instance of the literal L if there exists a J-variable assignment v such that $p(t_1, \dots, t_n)|_v = p(d_1, \dots, d_n)$. We often loosely refer to J-ground instances of atoms and of literals as J-ground atoms and J-ground literals, respectively, or even as ground atoms and ground literals if J is understood.
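To make the denotation $t|_v$ and the notion of J-ground instance concrete, here is a small Python sketch. It assumes, purely for illustration, a finite domain D (the paper allows arbitrary domains), terms encoded as nested tuples (f, t1, ..., tk) with variables as plain strings, and hypothetical helper names term_vars, denote and ground_instances; the preinterpretation J_FUNCS is an invented example.

```python
from itertools import product

def term_vars(term):
    """Variables occurring in a term; variables are encoded as plain strings."""
    if isinstance(term, str):
        return {term}
    return set().union(*(term_vars(t) for t in term[1:]))

def denote(term, funcs, v):
    """The denotation t|_v of a term relative to J (given by funcs) and v."""
    if isinstance(term, str):
        return v[term]                   # variable: look it up in the assignment
    f, *args = term                      # constants are 0-ary, e.g. ("zero",)
    return funcs[f](*(denote(t, funcs, v) for t in args))

def ground_instances(atom, domain, funcs):
    """All J-ground instances p(d1, ..., dn) of atom = (p, t1, ..., tn)."""
    pred, *terms = atom
    variables = sorted(set().union(*(term_vars(t) for t in terms)))
    instances = set()
    for values in product(sorted(domain), repeat=len(variables)):
        v = dict(zip(variables, values))   # one J-variable assignment
        instances.add((pred, *(denote(t, funcs, v) for t in terms)))
    return instances

# Hypothetical preinterpretation: D = {0, 1, 2}, zero ↦ 0, s ↦ successor mod 3.
D = {0, 1, 2}
J_FUNCS = {"zero": lambda: 0, "s": lambda d: (d + 1) % 3}
```

With this J, the atom p(X, s(X)) has exactly the three J-ground instances p(0, 1), p(1, 2) and p(2, 0); different terms may denote the same domain element, which is why instances are collected in a set.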
In accordance with [22, Definition 1], we write $\mathrm{ground}_J(P)$ for the set of all J-(ground) instances of clauses, or J-ground clauses, or simply ground clauses, in P; the last term is used, of course, when J is understood. Thus, typically, if $A' \leftarrow L_1, \dots, L_n$ is a clause in P, then $A'|_v \leftarrow L_1|_v, \dots, L_n|_v$ is an element of $\mathrm{ground}_J(P)$, where v is a J-variable assignment such that $A = A'|_v$ is a J-instance of $A'$ and $L_i|_v$ is a J-instance of $L_i$ for $i = 1, \dots, n$. All elements of $\mathrm{ground}_J(P)$ are obtained in this way from some clause and some J-variable assignment.
Example 1. As an example of a normal logic program, we give the following program from [3] for computing the transitive closure of a graph.

  r(X, Y, E, V) ← m([X, Y], E)
  r(X, Z, E, V) ← m([X, Y], E), ¬m(Y, V), r(Y, Z, E, [Y|V])
  m(X, [X|T]) ←
  m(X, [Y|T]) ← m(X, T)
  e(a) ←        for all a ∈ N

Here, N denotes a finite set containing the nodes appearing in the graph as elements. In the program, uppercase letters denote variable symbols, lowercase
letters constant symbols, and lists are written using square brackets as usual in Prolog. One evaluates a goal ← r(x, y, e, [x]), where x and y are nodes and e is a graph specified by a list of pairs denoting its edges. The goal is supposed to succeed when x and y can be connected by a path in the graph. The predicate m implements list membership. The last argument of the predicate r acts as an accumulator which collects the list of nodes already visited in an attempt to reach y from x. The transitive closure program has been studied in detail in [3,12].
The set of all two-valued interpretations based on J for a given normal program P will be denoted by $I_{P,J}$. Elements of $I_{P,J}$ are called J-interpretations, and are called J-models of P if they are also models of P. The set $I_{P,J}$ is a complete lattice with respect to the ordering ⊆ defined by I ⊆ K if and only if I ⊨ A implies K ⊨ A for every $A \in B_{P,J}$. To simplify notation, we identify $I_{P,J}$ with the power set $2^{B_{P,J}}$; the ordering ⊆ is then indeed set inclusion. For $I \in I_{P,J}$, we set $I^c = B_{P,J} \setminus I$. With this convention, and following [22, Section 2], in classical two-valued logic we write $I \models p(d_1, \dots, d_n)$ (respectively $I \models \neg p(d_1, \dots, d_n)$) if $p(d_1, \dots, d_n) \in I$ (respectively $p(d_1, \dots, d_n) \notin I$). By abusing the meaning of conjunction, and its notation, in the obvious way (see [22, Section 2]), it is now meaningful to write $I \models L_1|_v, \dots, L_n|_v$, where $L_1|_v, \dots, L_n|_v$ denotes a “conjunction” $L_1|_v \wedge \dots \wedge L_n|_v$ of J-instances of literals. The immediate consequence operator $T_{P,J}$ for a given program P is defined as usual as a mapping on $I_{P,J}$ as follows (where body denotes a conjunction of J-instances of literals):
$$T_{P,J}(I) = \{A \in B_{P,J} \mid \text{there exists } A \leftarrow \mathit{body} \text{ in } \mathrm{ground}_J(P) \text{ with } I \models \mathit{body}\}.$$
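The procedural reading of the transitive closure program of Example 1 can be mirrored in Python. This is a hedged illustration rather than the program itself: edges are assumed to be a list of pairs, the accumulator V becomes the list visited, Prolog's depth-first, left-to-right search is imitated by an ordinary recursive loop, and the function names r and connected are ours.

```python
def r(x, z, edges, visited):
    """Mirror of the two clauses for r/4 (encoding and names are ours)."""
    # r(X, Y, E, V) <- m([X, Y], E): a direct edge suffices.
    if (x, z) in edges:
        return True
    # r(X, Z, E, V) <- m([X, Y], E), not m(Y, V), r(Y, Z, E, [Y|V])
    for a, y in edges:
        if a == x and y not in visited:
            if r(y, z, edges, [y] + visited):
                return True
    return False

def connected(x, y, edges):
    """The goal <- r(x, y, e, [x])."""
    return r(x, y, edges, [x])
```

Because a node is only expanded if it is not yet in the accumulator, the search terminates even on cyclic graphs: with edges a→b, b→c, c→a, the call connected("a", "d", ...) fails after visiting b and c once each.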
Finally, recall from [2] that a two-valued J-interpretation M is a supported J-model of P if and only if M (together with Clark’s Equality Theory) is a J-model of the Clark completion of P, if and only if $T_{P,J}(M) = M$.
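For a finite ground (propositional) program, $T_{P,J}$ and the fixpoint characterization of supported models are easy to compute. The sketch below assumes clauses encoded as (head, body) pairs with body literals as (atom, positive) pairs; the names tp and is_supported_model are hypothetical.

```python
def tp(program, interp):
    """Immediate consequence operator on a finite ground normal program.

    program: list of (head, body) pairs, body a list of (atom, positive) pairs.
    interp:  set of atoms that are true; all other atoms are false.
    """
    def holds(atom, positive):
        return (atom in interp) == positive
    return {head for head, body in program
            if all(holds(a, pos) for a, pos in body)}

def is_supported_model(program, interp):
    """M is a supported model iff it is a fixpoint of T_P."""
    return tp(program, interp) == set(interp)
```

The program p ← p has the two supported models ∅ and {p}, and the program p ← ¬q, q ← ¬p has {p} and {q}; programs like these, with more than one supported model, are exactly what the unique supported model classes studied in this paper exclude.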
2 Three-Valued Semantics
A three-valued J-interpretation of a program P is a pair (T, F) of disjoint sets $T, F \subseteq B_{P,J}$. Given such a J-interpretation I = (T, F), a J-ground atom A is true (t) in I if A ∈ T, false (f) in I if A ∈ F, and undefined (u) otherwise; ¬A is true in I iff A is false in I, ¬A is false in I iff A is true in I, and ¬A is undefined in I iff A is undefined in I. Given I = (T, F), we denote T by $I^+$ and F by $I^-$. Thus, $I = (I^+, I^-)$. If $I^+ \cup I^- = B_{P,J}$, we call I a total three-valued J-interpretation of the program P. Total three-valued interpretations can be identified with elements of $I_{P,J}$. Given a program P, the set $I_{P,J,3}$ of all three-valued J-interpretations of P forms a complete partial order (in fact, a complete semi-lattice) with the ordering ≤ defined by
$$I \le K \quad\text{if and only if}\quad I^+ \subseteq K^+ \text{ and } I^- \subseteq K^-,$$
with least element (∅, ∅), which we denote by ⊥. Notice that total three-valued J-interpretations are maximal elements in this ordering. In our present context, it suffices to give truth tables for conjunction and disjunction, and we will make use of three different three-valued logics, which we now define. It should be noted here that the truth tables for disjunction are the same in all three logics and that disjunction is commutative. The first logic, which we denote by L1, evaluates conjunction as in Fitting’s Kripke-Kleene semantics [11] (in fact, as in Kleene’s strong three-valued logic, see [13]). Fitting’s work built on [20] and was subsequently studied in the literature by Kunen in [17], Apt and Pedreschi in [3], and Naish in [21]. Disjunction, though, is evaluated differently, as indicated by the truth table in Table 1.
Table 1. Truth tables for the logics L1, L2, and L3
            |  Logic L1   |  Logic L2   |  Logic L3
  p    q    |  p∧q  p∨q   |  p∧q  p∨q   |  p∧q  p∨q
  t    t    |   t    t    |   t    t    |   t    t
  t    u    |   u    u    |   u    u    |   u    u
  t    f    |   f    t    |   f    t    |   f    t
  u    t    |   u    u    |   u    u    |   u    u
  u    u    |   u    u    |   u    u    |   u    u
  u    f    |   f    u    |   u    u    |   u    u
  f    t    |   f    t    |   f    t    |   f    t
  f    u    |   f    u    |   f    u    |   u    u
  f    f    |   f    f    |   f    f    |   f    f
  Operator  | $\Phi_{P,1}$ | $\Phi_{P,2}$ | $\Phi_{P,3}$
The second three-valued logic, L2, will be used for studying acceptable programs; its conjunction is not commutative. It suffices to evaluate u ∧ f to u instead of f, leaving the truth table of L1 otherwise unchanged. This way of defining conjunction was employed in [4] and [6]; see also the discussion of LISP in [13]. The truth table is again given in Table 1. The third logic, L3, will be used for studying locally hierarchical and acyclic programs. For this purpose, we use a commutative version of L2 in which f ∧ u also evaluates to u instead of f; see the discussion in [13] of Kleene’s weak three-valued logic in relation to Pascal. The truth table is shown in Table 1. Let P be a normal logic program, and let Li denote one of the three-valued logics above, where i = 1, 2 or 3. Corresponding to each of these logics we define an operator $F_{P,J}$ on $I_{P,J,3}$ as follows. For $I \in I_{P,J,3}$, let $F_{P,J}(I) = (T, F)$, where T denotes the set
$$\{A \in B_{P,J} \mid \text{there is } A \leftarrow \mathit{body} \in \mathrm{ground}_J(P) \text{ such that } \mathit{body} \text{ is true}_i \text{ in } I\},$$
and F denotes the set
$$\{A \in B_{P,J} \mid \text{for every } A \leftarrow \mathit{body} \in \mathrm{ground}_J(P),\ \mathit{body} \text{ is false}_i \text{ in } I\}.$$
Of course, true_i and false_i here denote truth and falsehood, respectively, in the logic Li. Notice that if A is not the head of any clause in P, then A is false in $F_{P,J}(I)$ for any I. It is clear that $F_{P,J}$ is monotonic in all three cases. We set $F_{P,J}\uparrow 0 = \bot$, $F_{P,J}\uparrow\alpha = F_{P,J}(F_{P,J}\uparrow(\alpha-1))$ for α a successor ordinal, and
$$F_{P,J}\uparrow\alpha = \bigcup_{\beta<\alpha} F_{P,J}\uparrow\beta \quad\text{for } \alpha \text{ a limit ordinal.}$$
[…] $l(L_i)$.
A program is called J-acceptable with respect to l if l is a level mapping and there exists a J-model I such that the program is J-acceptable with respect to l and I. A program is called J-acceptable, or just acceptable if J is understood, if it is J-acceptable with respect to some level mapping and some J-model.
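For a finite set of ground clauses, the operator $F_{P,J}$ defined above can be iterated from ⊥ until it reaches a fixpoint, with the conjunction of the chosen logic Li plugged in (Table 1 lists the corresponding operators as $\Phi_{P,i}$). The sketch below assumes a finite ground program, so no transfinite stages are needed, uses the (head, body) encoding with (atom, positive) literals, and all function names are hypothetical. In the example program, q ← q keeps q undefined and r is the head of no clause, so the body of p reads u ∧ f and the body of s reads f ∧ u.

```python
from functools import reduce

T, U, F = "t", "u", "f"

def and1(p, q):
    """L1: strong Kleene conjunction."""
    if F in (p, q):
        return F
    return T if p == q == T else U

def and2(p, q):
    """L2: left-to-right conjunction (u ∧ f = u, but f ∧ u = f)."""
    return q if p == T else p

def and3(p, q):
    """L3: weak Kleene conjunction (any u gives u)."""
    if U in (p, q):
        return U
    return T if p == q == T else F

def value(lit, interp):
    """Truth value of a ground literal (atom, positive) in interp = (T-set, F-set)."""
    atom, positive = lit
    true_set, false_set = interp
    if atom in true_set:
        v = T
    elif atom in false_set:
        v = F
    else:
        return U
    return v if positive else (F if v == T else T)

def step(program, atoms, interp, conj):
    """One application of F_{P,J}: A is true if some clause body for A is true,
    and false if every clause body for A is false (vacuously so without clauses)."""
    true_set, false_set = set(), set()
    for a in atoms:
        bodies = [reduce(conj, (value(l, interp) for l in body), T)
                  for head, body in program if head == a]
        if T in bodies:
            true_set.add(a)
        elif all(b == F for b in bodies):
            false_set.add(a)
    return frozenset(true_set), frozenset(false_set)

def iterate(program, atoms, conj):
    """Compute the stages from ⊥ = (∅, ∅) until a fixpoint is reached."""
    interp = (frozenset(), frozenset())
    while True:
        nxt = step(program, atoms, interp, conj)
        if nxt == interp:
            return interp
        interp = nxt

# q <- q,   p <- q, r,   s <- r, q   (r is the head of no clause)
PROG = [("q", [("q", True)]),
        ("p", [("q", True), ("r", True)]),
        ("s", [("r", True), ("q", True)])]
ATOMS = {"p", "q", "r", "s"}
```

With and1 both p and s end up false, with and2 only s does, and with and3 neither does; q stays undefined in all three cases.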
Example 3. The transitive closure program given in Example 1 is Herbrand-acceptable; for details of the model and level mapping required, see [3]. We are able to characterize J-acceptable programs by means of the operator $\Phi_{P^*,2}$, and we do this next. We will need the following proposition from [16].
Proposition 5. Suppose that P is J-acceptable with respect to a level mapping l. Then $M_{P,J} = \Phi_{P,1}\uparrow\omega$ is total, $M_{P,J}^+$ is the unique supported J-model of P, and P is J-acceptable with respect to l and $M_{P,J}^+$.
Lemma 1. Let P be J-acceptable. Then $M = \Phi_{P^*,2}\uparrow\omega$ is total. Furthermore, $M = \Phi_{P,2}\uparrow\omega$, and $M^+$ is the unique supported J-model of P.
Proof. Let l be a level mapping with respect to which P is J-acceptable. By Proposition 5, P is J-acceptable with respect to l and $M_{P,J}^+$. Assume that there is a J-ground atom A which is undefined in M. Without loss of generality we can assume that l(A) is minimal. Then, by definition of L2, there is precisely one pseudo clause in $P^*$ of the form $A \leftarrow \bigvee_i C_i$, and in it at least one of the $C_i$, say $C_1$, is undefined. Thus, there must occur a left-most J-ground body literal B in $C_1$ which is undefined in M, and this ground literal lies to the left in $C_1$ of the first ground literal which is false in M. Hence, all ground literals occurring to the left of B must be true in M. Since $M \le M_{P,J}$ by Proposition 4, all these ground literals must also be true in $M_{P,J}^+$. By acceptability of P we therefore conclude that l(B) < l(A), contradicting the minimality of l(A). By Proposition 4, the second statement holds. The last statement follows from Proposition 3.
Definition 3. Let P be J-acceptable. Define its canonical level mapping as follows: $l_P(A)$ is the lowest ordinal α such that A is not undefined in $\Phi_{P^*,2}\uparrow(\alpha+1)$.
Proposition 6. Let P be J-acceptable. Then $l_P$ is an ω-level mapping and P is J-acceptable with respect to $l_P$ and $M_{P,J}$.
Furthermore, if l is another level mapping with respect to which P is J-acceptable, then lP (A) ≤ l(A) for all A ∈ BP,J . In particular, lP is exactly the canonical level mapping defined in [16]. Proof. By the previous lemma, lP is indeed an ω-level mapping. Let A be the head of a J-ground clause C in P with lP (A) = n. Then the body ∨i Ci of the corresponding pseudo clause in P ∗ is either true or false (i.e. is not undefined) in N = ΦP ∗ ,2 ↑ n. If ∨i Ci is true, each Ci evaluates to true or false in N . If Ci evaluates to true in N (and at least one must), then all J-ground literals in Ci are true in N , and therefore have level less than or equal to n − 1. If Ci evaluates to false in N , then there must be a ground literal in Ci which is false in N such that all ground literals occurring to the left of it are true in N . Moreover all these ground literals are not undefined in N and hence have level less than or equal to n − 1. A similar argument applies if ∨i Ci is false in N . Since N ≤ MP,J , it is now clear that the clause C satisfies the condition of acceptability given in Definition 2 with respect to lP and MP,J .
Now let l be another level mapping with respect to which P is J-acceptable. By Proposition 5, P is J-acceptable with respect to l and $M_{P,J}$. Let $A \in B_{P,J}$ with l(A) = n. We show by induction on n that $l(A) \ge l_P(A)$. If n = 0, then A appears only as the head of unit clauses, and therefore $l_P(A) = 0$. Now let n > 0. Then in every clause with head A, the left prefix of the corresponding body, up to and including the first ground literal which is false in $M_{P,J}$, contains only ground literals L with l(L) < n. By the induction hypothesis, $l_P(L) < n$ for all these ground literals L and, consequently, $l_P(A) \le l(A)$ by definition of $l_P$. The last statement follows from [16], where it is shown that the given minimality property characterizes $l_P$.
We are now in a position to characterize J-acceptable programs.
Theorem 1. Let P be a normal logic program. Then P is J-acceptable if and only if $M = \Phi_{P^*,2}\uparrow\omega$ is total.
Proof. By Lemma 1, it remains to show that totality of M implies acceptability. Define the ω-level mapping $l_P$ for P as in Definition 3. Since M is total, $l_P$ is indeed an ω-level mapping for P. We will show that P is J-acceptable with respect to $l_P$ and M. Arguing as in the proof of the previous proposition, let A be the head of a J-ground clause C in P with $l_P(A) = n$. Then the body of C evaluates to true or false in $N = \Phi_{P^*,2}\uparrow n$. If it evaluates to true in N, then all J-ground literals in the body of C are true in N, and therefore have level less than or equal to n − 1. If it evaluates to false in N, then there must be a ground literal in the body of C which is false in N such that all ground literals occurring to the left of it are true in N. Again, none of these ground literals is undefined in N, and hence they all have level less than or equal to n − 1. Since N ≤ M, the clause C satisfies the condition of acceptability given in Definition 2.
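Theorem 1 and Definition 3 suggest a simple test for finite ground programs: form, for each atom A, the pseudo clause A ← C1 ∨ … ∨ Ck from the bodies of A's clauses, iterate $\Phi_{P^*,2}$ from ⊥, check totality, and read off $l_P(A)$ as the least n such that A is defined in $\Phi_{P^*,2}\uparrow(n+1)$. The Python sketch below rests on our reading of $P^*$ from the proof of Lemma 1 and assumes a finite ground program, where the stage ↑ω is reached after finitely many steps; all names are hypothetical.

```python
from functools import reduce

T, U, F = "t", "u", "f"

def and2(p, q):
    """L2 conjunction: evaluated left to right."""
    return q if p == T else p

def or_(p, q):
    """Disjunction shared by L1-L3: any u gives u."""
    if U in (p, q):
        return U
    return T if T in (p, q) else F

def value(lit, interp):
    """Truth value of a ground literal (atom, positive) in interp = (T-set, F-set)."""
    atom, positive = lit
    if atom in interp[0]:
        v = T
    elif atom in interp[1]:
        v = F
    else:
        return U
    return v if positive else (F if v == T else T)

def phi_star(program, atoms, interp, conj=and2):
    """One step of Φ_{P*,i}: atom A has the single pseudo clause
    A <- C1 v ... v Ck, where C1, ..., Ck are the bodies of A's clauses in P."""
    true_set, false_set = set(), set()
    for a in atoms:
        bodies = (reduce(conj, (value(l, interp) for l in body), T)
                  for head, body in program if head == a)
        disj = reduce(or_, bodies, F)       # the empty disjunction is false
        if disj == T:
            true_set.add(a)
        elif disj == F:
            false_set.add(a)
    return frozenset(true_set), frozenset(false_set)

def acceptable_levels(program, atoms):
    """Return (total, levels): total says whether Φ_{P*,2}↑ω is total, and
    levels[A] is the least n such that A is defined in Φ_{P*,2}↑(n+1)."""
    interp, levels, n = (frozenset(), frozenset()), {}, 0
    while True:
        nxt = phi_star(program, atoms, interp)
        n += 1                              # nxt is now Φ_{P*,2}↑n
        for a in nxt[0] | nxt[1]:
            levels.setdefault(a, n - 1)
        if nxt == interp:
            return len(levels) == len(atoms), levels
        interp = nxt

GOOD = [("q", []), ("p", [("q", False), ("p", True)])]   # q <-,  p <- not q, p
BAD = [("q", []), ("p", [("p", True), ("q", False)])]    # q <-,  p <- p, not q
```

Under these assumptions GOOD yields a total fixpoint with levels q ↦ 0 and p ↦ 1, whereas BAD, which differs only in the order of the body literals, leaves p undefined; this reflects the left-to-right character of acceptability.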
In [19], it was shown that the class of programs which terminate under Chan’s constructive negation ([10]) coincides with the class of programs which are acceptable with respect to a model based on a preinterpretation whose domain is the Herbrand universe and contains infinitely many constant and function symbols. We therefore obtain the following result.
Theorem 2. A normal logic program P terminates under Chan’s constructive negation if and only if $\Phi_{P^*,2}\uparrow\omega$ is total, where $\Phi_{P^*,2}$ is computed with respect to a preinterpretation whose domain is the Herbrand universe and contains infinitely many constant and function symbols.

3.2 Locally Hierarchical Programs
Locally hierarchical programs were introduced in [8], for the special case of the Herbrand base, as a natural generalization of acyclic programs. They were further studied in [9] and in [23] (and also called strictly level-decreasing there). Here, we consider them over an arbitrary preinterpretation J and our definition and subsequent results are therefore completely general.
Definition 4. A normal logic program P is called locally hierarchical if there exists a level mapping l : BP,J → α, where α is some countable ordinal, such that for every clause A ← L1 , . . . , Ln in groundJ (P ) we have l(A) > l(Li ) for all i. If, further, α = ω, we call P acyclic. We will now give a new characterization of these programs along the lines of Theorem 1, using the operator ΦP ∗ ,3 . Lemma 2. Let P be locally hierarchical with respect to the level mapping l and let A ∈ BP,J be such that l(A) = α. Then A is true or false in ΦP ∗ ,3 ↑ (α + 1). In particular, there exists an ordinal αP such that ΦP ∗ ,3 ↑ αP is total. Proof. The proof is by transfinite induction on α. The base case follows directly from the fact that if α = 0, then A appears as head of unit clauses only. Now let α = β + 1 be a successor ordinal. Then all J-ground literals appearing in bodies of clauses with head A have level less than or equal to β. By the induction hypothesis, they are all not undefined in ΦP ∗ ,3 ↑ (β + 1) and therefore A is either true or false in ΦP ∗ ,3 ↑ (α + 1). If α is a limit ordinal, then all ground literals occurring in bodies of clauses with head A have level strictly less than α. Hence, by the induction hypothesis and since α is a limit ordinal, all these ground body literals are not undefined in ΦP ∗ ,3 ↑ α, and therefore A is true or false in ΦP ∗ ,3 ↑ (α + 1). Corollary 1. Let P be a locally hierarchical program with level mapping l : BP,J → α and let M = ΦP,1 ↑ α. Then M is total and MP,J = M + is the unique supported J-model of P . Proof. By Propositions 1 and 4, we have ΦP ∗ ,3 ↑ β ≤ ΦP,3 ↑ β ≤ ΦP,1 ↑ β for all ordinals β. Since ΦP ∗ ,3 ↑ α is total by Lemma 2, the given statement holds using Proposition 3. Definition 5. Let P be locally hierarchical. Define the canonical level mapping lP of P as a function lP : BP,J → αP where lP (A) is the least ordinal α such that A is true or false in ΦP ∗ ,3 ↑ (α + 1). Proposition 7. 
Let P be locally hierarchical with respect to some level mapping l. Then lP is a level mapping for P and, for all A ∈ BP,J , we have lP (A) ≤ l(A). Furthermore, the notion of canonical level mapping as defined here coincides with the same notion defined by different methods in [23]. Proof. The mapping lP is indeed a level mapping by Lemma 2. Let A ∈ BP,J with l(A) = α. We show the given minimality statement by transfinite induction on α. If α = 0, then A appears as the head of unit clauses only, and so lP (A) = 0. If α = β + 1 is a successor ordinal, then all J-ground literals L occurring in bodies of clauses with head A have level l(L) ≤ β. By the induction hypothesis, we obtain lP (L) ≤ β for all those ground literals, and so lP (A) ≤ α = l(A) by construction of lP . If α is a limit ordinal, then all ground literals L occurring in bodies of clauses with head A have level l(L) < α. Since lP (L) ≤ l(L) and since
α is a limit ordinal, we obtain that none of these ground literals L is undefined in $\Phi_{P^*,3}\uparrow\alpha$, and therefore $l_P(A) \le \alpha = l(A)$, as desired. The last statement follows since the minimality property just proved characterizes the canonical level mapping, as was shown in [23]. Note that it is an easy corollary of the previous results that if a program P is acyclic, then $\Phi_{P^*,3}\uparrow\omega$ is total.
Theorem 3. A normal logic program P is locally hierarchical if and only if $\Phi_{P^*,3}\uparrow\alpha$ is total for some ordinal α. It is acyclic if and only if $\Phi_{P^*,3}\uparrow\omega$ is total.
Proof. Let P be a normal logic program such that $\Phi_{P^*,3}\uparrow\alpha$ is total for some α. We define a mapping $l : B_{P,J} \to \alpha$ by analogy with the definition of the canonical level mapping for locally hierarchical programs. From the definition of L3 it is now obvious that P is indeed locally hierarchical with canonical level mapping l. The converse was shown in the previous proposition. The statement for acyclic programs now follows immediately as well.

3.3 Φ*-Accessible Programs
Our investigations of J-acceptable and locally hierarchical programs suggest we define a class of programs by the property that ΦP ∗ ,1 ↑ α is total for some ordinal α. We will do this next and show also that this class is computationally adequate. Definition 6. A normal logic program P will be called a Φ∗ -accessible program if ΦP ∗ ,1 ↑ α is total for some ordinal α. Theorem 4. Every Φ∗ -accessible program has a unique supported J-model. Furthermore, the class of Φ∗ -accessible programs contains all J-acceptable and all locally hierarchical programs. Proof. Immediate by Propositions 3 and 4. Definition 7. The canonical level mapping l∗ for a given Φ∗ -accessible program is defined as follows. For every A ∈ BP,J , set l∗ (A) = α, where α is the minimal ordinal such that A is true or false in ΦP ∗ ,1 ↑ (α + 1). The following is immediate by Proposition 4. Proposition 8. If P is J-acceptable or locally hierarchical with canonical level mapping lP , then l∗ (A) ≥ lP (A) for all J-ground atoms A. Proposition 9. Let P be Φ∗ -accessible with unique supported J-model M . Let C be an arbitrary element of groundJ (P ), let A be its head, and let l∗ (A) = α. Then the following property (∗) holds: Either the body of C is true in M , in which case every J-ground literal L in this body has level l∗ (L) < α, or there exists a ground body literal B in C which is false in M , and in this case l∗ (B) < α. Furthermore, if l is a level mapping for P which satisfies (∗), then l∗ (A) ≤ l(A) for every A ∈ BP,J .
Proof. Since P is Φ*-accessible, every body of every J-ground clause with head A is either true or false in $\Phi_{P^*,1}\uparrow\alpha$. In particular, the body of C is true or false in $\Phi_{P^*,1}\uparrow\alpha$. If it is true, then all J-ground literals L in the body are true in $\Phi_{P^*,1}\uparrow\alpha$, and so $l^*(L) < \alpha$ by definition of $l^*$. If the body is false, then there is a ground body literal B which is false in $\Phi_{P^*,1}\uparrow\alpha$, and again by definition of $l^*$ we obtain $l^*(B) < \alpha$. The minimality property of $l^*$ is shown by transfinite induction along the same lines as in the proofs of Propositions 6 and 7.
It was shown in [23] that the class of all locally hierarchical programs is computationally adequate, in the sense that every partial recursive function can be computed by such a program if the use of safe cuts is allowed. For Φ*-accessible programs, the cut need not be used, as we show next. The proof basically shows that, given a partial recursive function, there is a definite program as given in [18] which computes that function, and this program turns out to be Φ*-accessible.
Theorem 5. Let f be a partial recursive function. Then there exists a definite Φ*-accessible program which computes f.
Proof. We make use of the definite program $P_f$ given in [18, Theorem 9.6], and refer the reader to the proof of that theorem for details. It is easily seen that we only have to consider the minimalization case. In [18], the following program $P_f$ was given as an implementation of a function f which is the result of applying the minimalization operator to a partial recursive function g, which is in turn implemented by a predicate $p_g$. We abbreviate $X_1, \dots, X_n$ by X.

  p_f(X, Y) ← p_g(X, 0, U), r(X, 0, U, Y)
  r(X, Y, 0, Y) ←
  r(X, Y, s(V), Z) ← p_g(X, s(Y), U), r(X, s(Y), U, Z)

This program is not Φ*-accessible. However, we can replace it with a program $P'_f$ which has the same procedural behaviour and is Φ*-accessible.
In fact, we replace the definition of r by

  r(X, Y, 0, Y) ←
  r(X, Y, s(V), Z) ← p_g(X, s(Y), U), r(X, s(Y), U, Z), lt(Y, Z)

where the predicate lt is in turn defined by

  lt(0, s(X)) ←
  lt(s(X), s(Y)) ← lt(X, Y)

and is obviously Φ*-accessible. By a straightforward analysis of the original program $P_f$, it is clear that adding lt(Y, Z) to the second defining clause of r does not alter the behaviour of the program. Since lt and $p_g$ are Φ*-accessible, it is now easy to see that r is Φ*-accessible, and therefore so is $P'_f$.
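The procedural behaviour of $P'_f$ can be mirrored in Python, with natural numbers as ints instead of 0 and s(·) terms and an arbitrary Python function g standing in for the predicate $p_g$. This is only an illustrative sketch with hypothetical names: it computes the least y with g(x, y) = 0 and, like the logic program, does not terminate if there is no such y. The assertion plays the role of lt(Y, Z) and never fails, in line with the claim that the added guard does not change the behaviour.

```python
def r(x, y, u, g):
    """Mirror of the clauses for r in P'_f."""
    if u == 0:                          # r(X, Y, 0, Y) <-
        return y
    z = r(x, y + 1, g(x, y + 1), g)     # p_g(X, s(Y), U), r(X, s(Y), U, Z)
    assert y < z                        # lt(Y, Z): always satisfied here
    return z

def p_f(x, g):
    """p_f(X, Y) <- p_g(X, 0, U), r(X, 0, U, Y): least y with g(x, y) = 0."""
    return r(x, 0, g(x, 0), g)
```

For instance, with g(x, y) = 0 if y·y ≥ x and 1 otherwise, p_f(10, g) returns 4, the least y whose square is at least 10.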
P. Hitzler and A.K. Seda
It is worth noting that negation is not needed here in order to obtain full computational power, so Theorem 5 strengthens the result of [18] referred to in its proof. By contrast, as already noted, definite locally hierarchical programs seem not to provide full computational power. Despite some known drawbacks of SLDNF-resolution, it is interesting to know that, relative to it, the class of all Φ∗-accessible programs has full computational power; neither the class of acyclic programs nor even the class of J-acceptable programs has this property.
4 Conclusions
The rather simple characterizations of the classes discussed in this paper are a contribution to exploring the “space” of all normal programs, a task which appears not yet to have been addressed very extensively. Both the class of locally hierarchical programs and the class of J-acceptable programs are natural generalizations of acyclic programs; the first can be understood as a generalization in semantical terms, and the second as a generalization expressing termination. The results presented in this paper establish a common framework which highlights more clearly the differences and the similarities between these generalizations: each can be obtained uniquely by suitably defining conjunction in the underlying three-valued logic whilst retaining a fixed meaning for disjunction. Our approach then leads naturally to the definition of the class of all Φ∗-accessible programs, by choosing yet another definition of conjunction. This class is remarkable for two reasons: (i) each program in it has a unique supported J-model, and (ii) the class itself has full computational power under SLDNF-resolution whilst containing all J-acceptable and all locally hierarchical programs, but not all definite programs. However, a simple syntactical description of this class and of how it relates to other, better-known classes is not yet known to us, nor is the complexity of deciding whether a program is Φ∗-accessible. Other classes of programs may well be susceptible to the sort of analysis presented here, and this is also ongoing research of the authors. As already noted in the Introduction, such an investigation carries forward the suggestion made in [13] that asymmetric semantics is worthy of further study.
Acknowledgements
The authors wish to thank three anonymous referees for their comments, which substantially helped to improve the style of this paper. The first-named author acknowledges financial support under grant SC/98/621 from Enterprise Ireland.
References
1. Apt, K.R., Bezem, M.: Acyclic Programs. In: Warren, D.H.D., Szeredi, P. (Eds.): Proceedings of the Seventh International Conference on Logic Programming. MIT Press, Cambridge MA, 1990, pp. 617–633
2. Apt, K.R., Blair, H.A., Walker, A.: Towards a Theory of Declarative Knowledge. In: Minker, J. (Ed.): Foundations of Deductive Databases and Logic Programming. Morgan Kaufmann Publishers Inc., Los Altos, 1988, pp. 89–148
Characterizations of Classes of Programs by Three-Valued Operators
3. Apt, K.R., Pedreschi, D.: Reasoning about Termination of Pure Prolog Programs. Information and Computation 106 (1993) 109–157
4. Andrews, J.H.: A Logical Semantics for Depth-first Prolog with Ground Negation. Theoretical Computer Science 184 (1–2) (1997) 105–143
5. Bidoit, N., Froidevaux, C.: Negation by default and unstratifiable logic programs. Theoretical Computer Science 78 (1991) 85–112
6. Barbuti, R., De Francesco, N., Mancarella, P., Santone, A.: Towards a Logical Semantics for Pure Prolog. Science of Computer Programming 32 (1–3) (1998) 145–176
7. Bezem, M.: Characterizing Termination of Logic Programs with Level Mappings. In: Lusk, E.L., Overbeek, R.A. (Eds.): Proceedings of the North American Conference on Logic Programming. MIT Press, Cambridge MA, 1989, pp. 69–80
8. Cavedon, L.: Continuity, Consistency, and Completeness Properties for Logic Programs. In: Levi, G., Martelli, M. (Eds.): Proceedings of the 6th International Conference on Logic Programming. MIT Press, Cambridge MA, 1989, pp. 571–584
9. Cavedon, L.: Acyclic Logic Programs and the Completeness of SLDNF-Resolution. Theoretical Computer Science 86 (1991) 81–92
10. Chan, D.: Constructive Negation Based on the Completed Database. In: Proc. of the 5th Int. Conf. and Symp. on Logic Programming, 1988, pp. 111–125
11. Fitting, M.: A Kripke-Kleene Semantics for General Logic Programs. J. Logic Programming 2 (1985) 295–312
12. Fitting, M.: Metric Methods: Three Examples and a Theorem. J. Logic Programming 21 (3) (1994) 113–127
13. Fitting, M., Ben-Jacob, M.: Stratified, Weak Stratified, and Three-Valued Semantics. Fundamenta Informaticae XIII (1990) 19–33
14. Van Gelder, A., Ross, K.A., Schlipf, J.S.: The Well-Founded Semantics for General Logic Programs. Journal of the ACM 38 (3) (1991) 620–650
15. Gelfond, M., Lifschitz, V.: The Stable Model Semantics for Logic Programming. In: Kowalski, R.A., Bowen, K.A. (Eds.): Proceedings of the 5th International Conference and Symposium on Logic Programming, MIT Press, 1988, pp. 1070–1080
16. Hitzler, P., Seda, A.K.: Acceptable Programs Revisited. Preprint, Department of Mathematics, University College Cork, Cork, Ireland, 1999, pp. 1–15
17. Kunen, K.: Negation in Logic Programming. J. Logic Programming 4 (1987) 289–308
18. Lloyd, J.W.: Foundations of Logic Programming. Second Edition, Springer, Berlin, 1988
19. Marchiori, E.: On Termination of General Logic Programs with respect to Constructive Negation. J. Logic Programming 26 (1) (1996) 69–89
20. Mycroft, A.: Logic Programs and Many-valued Logic. In: Fontet, M., Mehlhorn, K. (Eds.): STACS 84, Symposium of Theoretical Aspects of Computer Science, Paris, France, 1984, Proceedings. Lecture Notes in Computer Science, Vol. 166, Springer, 1984, pp. 274–286
21. Naish, L.: A Three-Valued Semantics for Horn Clause Programs. Technical Report 98/4, University of Melbourne, pp. 1–11
22. Seda, A.K.: Topology and the Semantics of Logic Programs. Fundamenta Informaticae 24 (4) (1995) 359–386
23. Seda, A.K., Hitzler, P.: Strictly Level-decreasing Logic Programs. In: Butterfield, A., Flynn, S. (Eds.): Proceedings of the Second Irish Workshop on Formal Methods 1998 (IWFM’98), Electronic Workshops in Computing, British Computer Society, 1999, pp. 1–18
Using LPNMR for Problem Specification and Code Generation
Marco Cadoli
Dipartimento di Informatica e Sistemistica
Università di Roma “La Sapienza”
Via Salaria 113, I-00198 Roma, Italy
[email protected]
WWW home page: http://www.dis.uniroma1.it/~cadoli
In an ongoing research project¹ we use a form of LPNMR as the formal basis for some code generation tools, which take as input the specification for a problem and give as output the code to solve it, in C++ or Prolog. Formally, we defined a logic-based specification language, called np-spec, extending negation-free datalog by allowing a limited use of some second-order predicates of predefined forms. The semantics of np-spec is fully declarative and is based on the notion of minimal model, typical of circumscription. np-spec programs specify solutions to problems in a very abstract and concise way, and are executable. As an example, this is the np-spec program for the “graph 3-coloring” problem, which specifies both an instance (i.e., a graph, in the DATABASE section) and the question (in the SPECIFICATION section).
    DATABASE
    NODE = {1..6};
    EDGE = {(1,2), (1,3), (2,3), (6,2), (6,5), (5,4), (3,5)};
    SPECIFICATION
    Partition(NODE,coloring,3).
    non_3_colorable 0, w
We run a policy g as follows. We let v0 be the “observation” START. We then compute an infinite sequence of observations by letting vi+1 be the observation resulting from the execution of the action g(vi ). Note that observations are Herbrand terms and hence this definition allows the observation vi to be the entire history of primitive actions and their observations. If M is a finite MDP then vi might be a single symbol naming the current state. In an elevator controller or a robosoccer controller the action g(v) might rely entirely on sensing and ignore the observation v.
D. McAllester
A policy g and an initial state s0 determine a probability distribution over infinite sequences of states. We can evaluate the utility of a policy by introducing a reward function on states. A reward function maps each state to a real number called the reward of that state. Intuitively, the reward function expresses the goal of the robot: it should behave so as to maximize reward. Here we will be concerned with (undiscounted) asymptotic average reward. A given behavior of the robot leads, ultimately, to an infinite sequence of states s0, s1, s2, . . .. We define the asymptotic average reward of such a sequence to be the following quantity, where r is the reward function.
$$\lim_{k\to\infty} \frac{1}{k}\sum_{i=0}^{k} r(s_i)$$
In general, over the choice of the infinite sequence, there can be a nonzero probability that this limit does not exist (even for bounded reward). However, if the set of states is finite and the set of observations passed between runs of the policy g is also finite, then the limit exists with probability 1 over the generation of the sequence. In section 7 we will assume that the sets of states and observations reachable by the policy from the start state s0 are finite. Of course, the semantics of action expressions and policies supports other methods of evaluating a policy. We could consider discounted reward, or the expected time to reach a goal state. However, for the formalism developed here, asymptotic average reward turns out to be the most easily computed.
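To make the definitions of running a policy and of asymptotic average reward concrete, the Python sketch below truncates the limit at a finite k along a single run. It is an illustration under simplifying assumptions and is not the paper's method, which computes the limit exactly (section 8). The world is given by a hypothetical `step(state, action)` function that returns the next state together with the next observation. Each action is assumed to produce exactly one new state. The first observation is START.

```python
from typing import Any, Callable, Tuple

def estimate_average_reward(
    policy: Callable[[Any], Any],
    step: Callable[[Any, Any], Tuple[Any, Any]],
    reward: Callable[[Any], float],
    s0: Any,
    k: int,
) -> float:
    """Finite-k value of (1/k) * sum_{i=0}^{k} r(s_i) along one run of the policy.

    Simplifying assumption: every action yields exactly one new state.
    """
    state, obs = s0, "START"              # v_0 is the observation START
    total = reward(state)                 # r(s_0)
    for _ in range(k):
        action = policy(obs)              # g(v_i)
        state, obs = step(state, action)  # new state and next observation v_{i+1}
        total += reward(state)
    return total / k
```

As a toy example, take a policy that always returns a hypothetical ADVANCE action in a three-state cycle, with reward 1 in state 0 and reward 0 elsewhere. The estimate then approaches 1/3 as k grows.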
5 Symbolic POSDPs
In this section we give a method of constructing POSDPs. We define a symbolic POSDP to be a pair ⟨P, A⟩, where P is a consistent stochastic program as defined in section 2 and A is a set of non-constructor function symbols defined in P which we identify with the primitive action functions. In the program P an n-ary primitive action function is defined as an (n+1)-ary function; the last argument is interpreted as the state in which the action is executed. To formally define the semantics of a symbolic POSDP we first define an internal action value to be a Herbrand term of the following form.
PAIR[INSERT[s1, . . . , INSERT[sn, EMPTY] . . .], w]
If u is a term of this form then we define s(u) to be the state sequence ⟨s1, . . . , sn⟩ and we define o(u) to be the observation w. If u is not of this form then we define s(u) to be EMPTY and o(u) to be FAILURE[BAD-ACTION-VALUE]. We now define the semantics of a symbolic POSDP M consisting of P and A by taking the set of states to be the set of Herbrand terms and by defining the semantics of primitive actions with the following equation.
$$P(\langle s, h\{v_1,\ldots,v_n\}\rangle \rightarrow \langle \alpha, w\rangle \mid M) = \sum_{u:\; s(u)=\alpha,\; o(u)=w} P(h(v_1,\ldots,v_n,s) \rightarrow u)$$
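The defining equation aggregates a distribution over internal action values u into a distribution over pairs (s(u), o(u)). The Python sketch below shows that step under an assumed encoding that does not come from the paper: Herbrand terms are nested tuples, so that PAIR[INSERT[s1, INSERT[s2, EMPTY]], w] becomes ("PAIR", ("INSERT", s1, ("INSERT", s2, "EMPTY")), w). The input dictionary plays the role of the values P(h(v1, . . . , vn, s) → u), which are assumed to be given.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple

EMPTY = "EMPTY"
BAD_VALUE = ("FAILURE", "BAD-ACTION-VALUE")

def decode(u: Any) -> Tuple[Tuple[Any, ...], Any]:
    """Return (s(u), o(u)) for a term encoded as nested tuples ("PAIR", list, w)."""
    if isinstance(u, tuple) and len(u) == 3 and u[0] == "PAIR":
        seq, node = [], u[1]
        # Walk INSERT[s1, INSERT[s2, ... EMPTY]] collecting s1, s2, ...
        while isinstance(node, tuple) and len(node) == 3 and node[0] == "INSERT":
            seq.append(node[1])
            node = node[2]
        if node == EMPTY:
            return tuple(seq), u[2]
    return (), BAD_VALUE          # not an internal action value: empty sequence, failure

def action_distribution(value_dist: Dict[Any, float]) -> Dict[Tuple[Tuple[Any, ...], Any], float]:
    """Sum P(h(v1, ..., vn, s) -> u) over all u with s(u) = alpha and o(u) = w."""
    out: Dict[Tuple[Tuple[Any, ...], Any], float] = defaultdict(float)
    for u, p in value_dist.items():
        out[decode(u)] += p
    return dict(out)
```

Malformed values all collapse onto the single outcome (⟨⟩, FAILURE[BAD-ACTION-VALUE]), so their probabilities add up there. This mirrors the sum over u in the equation.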
World-Modeling vs. World-Axiomatizing
It is interesting to note that the McCarthy frame problem does not arise in this approach to constructing world models. For example, a natural representation of a state of the world is a list of assertions, i.e., a list of Herbrand terms each of which intuitively represents some aspect of the world. A blocks world state might include assertions such as ON[A, B] and COLOR[A, GREEN]. We can “implement” a primitive action MOVE-FROM-TO so that if s contains CLEAR[x], ON[x, y] and CLEAR[z], then MOVE-FROM-TO(x, y, z, s) returns (the Herbrand representation of) ⟨⟨u⟩, SUCCESS⟩, where u is the state resulting from removing the assertions ON[x, y] and CLEAR[z] and adding the assertions CLEAR[y] and ON[x, z]. We can also easily arrange that if the required conditions on the input state are not met then MOVE-FROM-TO(x, y, z, s) returns ⟨⟨⟩, FAILURE[PRECONDITIONS-NOT-MET]⟩. Note that this action will automatically not affect assertions about colors: there is no need to list the properties unaffected by the action. The need to list unaffected properties does not arise in the modeling approach.
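A small Python sketch of MOVE-FROM-TO shows why no frame axioms are needed: the new state is computed from the old one, so any assertion the action does not mention is carried over. The encoding is an assumption made for illustration. A state is a frozenset of assertion tuples rather than a Herbrand list. The result is a pair (state sequence, observation) standing in for ⟨⟨u⟩, SUCCESS⟩ or ⟨⟨⟩, FAILURE[PRECONDITIONS-NOT-MET]⟩.

```python
from typing import FrozenSet, Tuple

Assertion = Tuple[str, ...]            # e.g. ("ON", "A", "B") or ("COLOR", "A", "GREEN")
State = FrozenSet[Assertion]

def move_from_to(x: str, y: str, z: str, s: State) -> Tuple[Tuple[State, ...], object]:
    """Return (state sequence, observation), mirroring <<u>, SUCCESS> and <<>, FAILURE[...]>."""
    required = {("CLEAR", x), ("ON", x, y), ("CLEAR", z)}
    if not required <= s:
        return (), ("FAILURE", "PRECONDITIONS-NOT-MET")
    # Only the assertions the action is about are touched; everything else is copied.
    u = (s - {("ON", x, y), ("CLEAR", z)}) | {("CLEAR", y), ("ON", x, z)}
    return (u,), "SUCCESS"
```

For instance, moving A from B to C in a state that also contains ("COLOR", "A", "GREEN") yields a successor state that still contains that color assertion, even though the action never refers to colors.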
6 Computing Value Distributions for Program Expressions
This section gives an algorithm for computing the probability distribution over the values of a given program expression. This will be a required step in later algorithms and also provides a warm-up exercise for the slightly more complex computation of value distributions for action expressions. Given a closed program expression e, we define the computation graph of e to be the least set of assertions containing the following. The computation graph contains all assertions of the form Eval(e′, ρ) such that there is a nonzero probability that the evaluation of e will cause an evaluation of ⟨e′, ρ⟩. The computation graph also includes all assertions of the form ⟨e′, ρ⟩ → v such that it includes Eval(e′, ρ) and the evaluation of ⟨e′, ρ⟩ has a nonzero probability of returning the value v. The computation graph of a given expression e can be computed using a bottom-up logic program, i.e., a set of rules for deriving new assertions. We start with the single assertion Eval(e, ∅), where e is the given top-level expression and ∅ is the empty variable substitution. We then add new assertions as they become derivable under the rules. For example, there is a rule stating that if we derive Eval(g(e1, . . . , en), ρ) then we also derive Eval(ei, ρ) for each ei. Furthermore, if we derive Eval(g(e1, . . . , en), ρ) and ⟨e1, ρ⟩ → v1, . . ., ⟨en, ρ⟩ → vn, then we derive Eval(u, ρ′), where g(x1, . . . , xn) ≡ u ∈ P and ρ′ is the environment mapping xi to vi. It is possible to write down “evaluation rules” for each of the five types of program expressions. This generation process terminates if and only if the computation graph of e is finite. Our algorithm for computing value distributions requires that the computation graph be finite. As an example, suppose that we have defined a function NEXT-STATE such that for any Herbrand expression v we have that NEXT-STATE(v) stochastically
computes one of a finite set of Herbrand constants representing a finite set of states. Now suppose we define the following program.
    TERMINAL-STATE(s) ≡ CASE s OF
        A : A
        B : B
        z : TERMINAL-STATE(NEXT-STATE(s))
Now suppose we take the top-level expression to be TERMINAL-STATE(C). If the transition matrix defined by the function NEXT-STATE is ergodic, then the procedure TERMINAL-STATE terminates with probability 1. Furthermore, it has only two possible values, the constants A and B. We wish to compute the relative probabilities of these two possible outcomes. Assuming that calls to NEXT-STATE produce finite computation graphs, calls to TERMINAL-STATE also produce finite computation graphs. The graph consists, essentially, of assertions of the form NEXT-STATE(D) → E and TERMINAL-STATE(D) → A. We now give a general algorithm for computing value distributions for expressions with finite computation graphs. For each “edge” ⟨e′, ρ⟩ → v in the computation graph we compute the probability of that edge, i.e., the probability that the evaluation of ⟨e′, ρ⟩ gives value v. This is done with a numerical least fixed point calculation on the (finite) computation graph. More specifically, for each edge in the graph we define P^0(⟨e′, ρ⟩ → v) to be zero. For each such edge we then compute P^{i+1}(⟨e′, ρ⟩ → v) using the equations of figure 1, with P replaced by P^{i+1} on the left-hand side of each equation and by P^i on the right-hand side. The edge probability P(⟨e′, ρ⟩ → v) equals the limit as i → ∞ of P^i(⟨e′, ρ⟩ → v). In practice the numerical computation can be terminated when the edge probabilities have stabilized. In the above example, this process will essentially compute all probabilities of the form P(NEXT-STATE(D) → E) and P(TERMINAL-STATE(D) → A).
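The Python sketch below runs the two phases described above on the TERMINAL-STATE example. First it collects the finite set of states D for which TERMINAL-STATE(D) is evaluated. It then starts every edge probability at zero and iterates until the values stabilize. As a simplifying assumption, NEXT-STATE is supplied directly as a probability table instead of being derived from its own computation graph, and the outcome constants are fixed to A and B. The function name and the table format are hypothetical.

```python
from typing import Dict, Tuple

NextState = Dict[str, Dict[str, float]]   # P(NEXT-STATE(D) -> E) as a table

def terminal_state_probs(next_state: NextState, start: str,
                         tol: float = 1e-12, max_iter: int = 100_000) -> Tuple[float, float]:
    """Return (P(TERMINAL-STATE(start) -> A), P(TERMINAL-STATE(start) -> B))."""
    outcomes = ("A", "B")
    # Computation graph: every D for which TERMINAL-STATE(D) may be evaluated.
    nodes, frontier = set(), [start]
    while frontier:
        d = frontier.pop()
        if d in nodes:
            continue
        nodes.add(d)
        if d not in outcomes:                 # the CASE recurses only on other states
            frontier.extend(next_state[d])
    # Least fixed point: P^0 = 0 everywhere, then P^{i+1} from P^i.
    p = {(d, v): 0.0 for d in nodes for v in outcomes}
    for _ in range(max_iter):
        new = {}
        for d in nodes:
            for v in outcomes:
                if d in outcomes:
                    new[(d, v)] = 1.0 if d == v else 0.0
                else:
                    new[(d, v)] = sum(q * p[(e, v)] for e, q in next_state[d].items())
        delta = max(abs(new[key] - p[key]) for key in p)
        p = new
        if delta < tol:
            break
    return p[(start, "A")], p[(start, "B")]
```

Consider, for example, the table {"C": {"A": 0.5, "D": 0.5}, "D": {"B": 0.5, "C": 0.5}}. The iteration converges to P(TERMINAL-STATE(C) → A) = 2/3 and P(TERMINAL-STATE(C) → B) = 1/3. These are the solutions of x = 0.5 + 0.5y and y = 0.5x.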
7 Computing Value Distributions for Action Expressions
We now assume a given symbolic POSDP defined by a stochastic program P and assume a given action program Q. We will compute distributions of “values” for actions. For any state s and action expression a we define the computation graph of ⟨s, a⟩ to be the least set of assertions containing the following. First, we include all assertions of the form Eval(s′, a′, ρ) such that running action a in state s has a nonzero probability of causing a′ to run in s′ under environment ρ. Second, we include all “edges” of the form ⟨s′, a′, ρ⟩ → ⟨s″, w⟩ such that the computation graph contains Eval(s′, a′, ρ) and there is a nonzero probability that running a′ in state s′ under environment ρ produces observation w and a state sequence ending in the state s″. Third, we include the computation graph for all terms of the form h(v1, . . . , vn, sn+1) such that the graph includes assertions of the form Eval(s1, h{e1, . . . , en}, ρ) and ⟨s1, e1, ρ⟩ → ⟨s2, v1⟩, . . ., ⟨sn, en, ρ⟩ → ⟨sn+1, vn⟩. The computation graph for ⟨s, a⟩ can be computed with a bottom-up logic program for generating these assertions. We require that the computation graph for ⟨s, a⟩ be finite.
If ⟨s, a⟩ has a finite computation graph, then we can compute a probability for each edge ⟨s′, a′, ρ⟩ → ⟨s″, w⟩ using a numerical least fixed point calculation analogous to that in section 6.
8 Computing Asymptotic Average Reward
Finally, we define the computation graph for a policy g and initial state s0 to be the least set of assertions containing the following. First, it contains the computation graph of the state-action pair ⟨s0, g(START)⟩. Second, if the graph contains an edge of the form ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩, then it also includes the computation graph for ⟨s′, g(v′)⟩. We require that the computation graph of ⟨s0, g⟩ be finite. The edge probabilities for each edge in this graph can be computed using the numerical least fixed point calculation mentioned in the previous section. Given a finite computation graph for ⟨s0, g⟩ with computed edge probabilities, we now compute two additional numbers for each edge ⟨s, a, ρ⟩ → ⟨s′, w⟩. First we compute the expected time of the edge, i.e., the expected number of states in the state sequence generated by the execution of ⟨s, a, ρ⟩ given that the execution produces ⟨s′, w⟩. Given the edge probabilities (which are all nonzero), the expected times of the edges can be computed using a numerical least fixed point calculation on the (finite) computation graph. Finally, for each edge we compute the expected total reward of that edge, i.e., the expected sum of the rewards for the states in the state sequence generated by the execution of ⟨s, a, ρ⟩ given that the execution produces ⟨s′, w⟩. Given the edge probabilities, the expected rewards of the edges can again be computed by a numerical least fixed point calculation. Given the probability, expected time, and expected total reward of each edge, we can compute the asymptotic average reward as follows. We define S to be the set of pairs ⟨s, v⟩ such that the computation graph contains Eval(s, g(v), ∅). We define a probability transition matrix M on S, where the probability of the transition from ⟨s, v⟩ to ⟨s′, v′⟩ is the probability of the edge ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩ if the computation graph contains this edge and zero otherwise.
Let D_0 be the probability distribution on S concentrating all mass on the element ⟨s0, START⟩. Now define D_{i+1} to be (D_0 + i·D_i M)/(i + 1). It is possible to show that the limit as i → ∞ of D_i exists, is a stationary distribution of M, and equals the long-term distribution of the elements of S under the transitions defined by M (whether or not M is ergodic). We let D be this limit distribution. We let T be the average time per transition, i.e., the quantity
$$\sum_{\langle s,v\rangle,\,\langle s',v'\rangle} D(\langle s,v\rangle)\, M(\langle s,v\rangle,\langle s',v'\rangle)\, T(\langle s,v\rangle,\langle s',v'\rangle)$$
where T(⟨s, v⟩, ⟨s′, v′⟩) is the expected transition time for ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩. Similarly, we define R to be the average transition reward, i.e., the quantity
$$\sum_{\langle s,v\rangle,\,\langle s',v'\rangle} D(\langle s,v\rangle)\, M(\langle s,v\rangle,\langle s',v'\rangle)\, R(\langle s,v\rangle,\langle s',v'\rangle)$$
where R(⟨s, v⟩, ⟨s′, v′⟩) is the expected transition reward for ⟨s, g(v), ∅⟩ → ⟨s′, v′⟩. The asymptotic average reward is now just R/T. To see this for the case where M is ergodic, consider sampling an infinite run of the policy starting at state s0. For any finite number k, let R(k) be the sum of the rewards up to time k and let n(k) be the number of top-level iterations of the policy up to time k. We now have the following.
$$\lim_{k\to\infty} \frac{R(k)}{k} = \lim_{k\to\infty} \frac{R(k)/n(k)}{k/n(k)} = \frac{\lim_{k\to\infty} R(k)/n(k)}{\lim_{k\to\infty} k/n(k)} = \frac{R}{T}$$
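The last step, going from M, the expected transition times and the expected transition rewards to R/T, can be sketched in Python. The sketch assumes these quantities have already been computed by the least fixed point calculations above. They are passed in as plain dictionaries, and the elements of S are represented by arbitrary hashable keys. The iteration D_{i+1} = (D_0 + i·D_i M)/(i + 1) is implemented literally. Unrolling it shows that D_i is the running average of D_0, D_0 M, . . . , D_0 M^{i−1}. This is why the iteration also converges for periodic chains, although convergence is in general only of order 1/i.

```python
from typing import Dict, Hashable, Tuple

Node = Hashable                                   # stands for a pair <s, v> in S
Matrix = Dict[Node, Dict[Node, float]]            # M(<s,v>, <s',v'>), zero entries omitted
EdgeValues = Dict[Tuple[Node, Node], float]       # expected time or reward per transition

def long_run_distribution(m: Matrix, start: Node, iters: int = 10_000) -> Dict[Node, float]:
    """Iterate D_{i+1} = (D_0 + i * D_i M) / (i + 1), with D_0 the point mass on start."""
    nodes = set(m) | {t for row in m.values() for t in row}
    d0 = {n: (1.0 if n == start else 0.0) for n in nodes}
    d = dict(d0)
    for i in range(iters):
        dm = {n: 0.0 for n in nodes}              # the row vector D_i M
        for src, row in m.items():
            for dst, prob in row.items():
                dm[dst] += d[src] * prob
        d = {n: (d0[n] + i * dm[n]) / (i + 1) for n in nodes}
    return d

def asymptotic_average_reward(m: Matrix, start: Node,
                              time: EdgeValues, reward: EdgeValues,
                              iters: int = 10_000) -> float:
    """R / T, where T and R weight each transition by D(<s,v>) * M(<s,v>, <s',v'>)."""
    d = long_run_distribution(m, start, iters)
    transitions = [(s, t, prob) for s, row in m.items() for t, prob in row.items()]
    avg_time = sum(d[s] * prob * time[(s, t)] for s, t, prob in transitions)
    avg_reward = sum(d[s] * prob * reward[(s, t)] for s, t, prob in transitions)
    return avg_reward / avg_time
```

As a check, take a hypothetical two-element S that alternates deterministically between a and b. The transition a → b takes time 1 and earns reward 2, while b → a takes time 3 and earns reward 0. In this case D is (1/2, 1/2), T = 2 and R = 1, which gives R/T = 1/2. This matches a direct count of 2 units of reward per 4 time steps.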
9 Conclusions
We have argued that world knowledge is often more usefully expressed as a world model than as world axioms. A particular formalism for expressing world models, symbolic POSDPs, has been defined, as well as a high-level language for writing policies for these models. Finally, an algorithm has been given for computing asymptotic average reward. There are many directions for further research. It should be possible to give an algorithm for computing expected discounted reward or expected time to a goal state. It should also be possible to enrich the programming languages with types, exceptions, and concurrency. Finally, it should be possible to write more sophisticated analysis algorithms, such as algorithms for verifying the consistency of stochastic programs.
Practical Nonmonotonic Reasoning: Extending Inheritance Techniques to Solve Real-World Problems
Leora Morgenstern
IBM T.J. Watson Research
30 Saw Mill River Drive
Hawthorne, NY 10532
[email protected]
Despite the obvious relevance of plausible reasoning to real-world problem solving, nonmonotonic logics are rarely used in commercial applications or large-scale commonsense reasoning systems. This is largely because few efficient algorithms and tools have thus far been developed. A notable exception is nonmonotonic inheritance, which provides a natural model for commonsense taxonomic reasoning and for which low-order polynomial algorithms are available (Horty et al., 1990; Stein, 1992). However, inheritance is not sufficiently powerful to model the reasoning needed in many real-world applications. This talk discusses how the paradigm of nonmonotonic inheritance can be extended to a broader and more powerful kind of nonmonotonic reasoning. This is done by introducing formula-augmented semantic networks (FANs), semantic networks which attach well-formed formulae to nodes. The problem of inheriting well-formed formulae within this structure is explored, and an algorithm based on selecting preferred maximal consistent subsets of wffs, subject to various preference criteria, is given and discussed. We examine in detail several real-world problems which have been or can be solved using FANs. These include benefits inquiry in the medical insurance domain (Morgenstern, 1998; Morgenstern and Singh, 1997), rapid search of large knowledge bases for helpdesk applications (Hantler et al.), and legal reasoning in the tax domain using a combination of taxonomic and case-based reasoning techniques (Ashley et al.).
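The abstract does not spell out the FAN inheritance algorithm. The Python sketch below is therefore only a rough illustration of the general idea of selecting a preferred maximal consistent subset. It makes strong simplifying assumptions that do not come from the abstract. Attached wffs are restricted to propositional literals, written as strings with a leading "-" for negation. Specificity is the only preference criterion. The subset is built greedily, which for literals under a total priority order yields a maximal consistent subset. Real FANs attach arbitrary wffs, need a genuine consistency check, and combine several preference criteria. All node and literal names below are hypothetical.

```python
from typing import Dict, List, Set

def negate(lit: str) -> str:
    """Literals are strings; a leading '-' marks negation."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def inherit(chain: List[str], attached: Dict[str, List[str]]) -> Set[str]:
    """Greedy preferred maximal consistent subset along a chain of nodes.

    chain lists nodes from most specific to most general, so more specific
    formulas are considered first and win conflicts (a specificity preference).
    """
    chosen: Set[str] = set()
    for node in chain:
        for lit in attached.get(node, []):
            if negate(lit) not in chosen:     # consistency test is trivial for literals
                chosen.add(lit)
    return chosen
```

In a toy insurance-style hierarchy, "cosmetic-surgery" is placed below "surgery", which is placed below "medical-service". Here the specific default "-covered" overrides the general "covered", while "requires-precert" is still inherited from "surgery". Within a single node, the order of the list decides conflicts. A skeptical reasoner would instead handle such conflicts with further preference criteria.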
References
1. Ashley, K., Horty, J., Morgenstern, L., and Thomason, R.: work in progress.
2. Hantler, S., Laker, M., and Morgenstern, L.: work in progress.
3. Horty, J., Thomason, R., and Touretzky, D.: A skeptical theory of inheritance in nonmonotonic semantic networks, Artificial Intelligence 42 (1990): 311–349.
4. Morgenstern, L.: Inheritance comes of age: applying nonmonotonic techniques to problems in industry, Artificial Intelligence 103 (1998): 237–271.
5. Morgenstern, L. and Singh, M.: An expert system using nonmonotonic techniques for benefits inquiry in the insurance industry, Proceedings IJCAI-97, Morgan Kaufmann, San Francisco, 655–661, 1997.
6. Stein, L.: Resolving ambiguity in nonmonotonic inheritance hierarchies, Artificial Intelligence 55 (1992): 259–310.
M. Gelfond, N. Leone, G. Pfeifer (Eds.): LPNMR ’99, LNAI 1730, p. 389–389, 1999.
© Springer-Verlag Berlin Heidelberg 1999
Author Index
Alferes, J. J., 162
Antoniou, G., 347
Billington, 347
Cadoli, M., 372
Cenzer, D., 34
Cui, B., 206
Damásio, C. V., 262
Dekhtyar, M., 132
De Vos, M., 236
Dikovsky, A., 132
Dudakov, S., 132
Erdem, E., 107
Faber, W., 177
Gottlob, G., 1
Governatori, G., 347
Greco, S., 221
Hitzler, P., 357
Inoue, K., 147
Janhunen, T., 19
Kakas, A., 78
Leone, N., 177
Lifschitz, V., 92, 107, 373
Lin, F., 117
Linke, T., 247
Lukasiewicz, T., 277
Maher, M. J., 347
Marek, V., 49
Mateis, C., 290
McAllester, D., 375
Miller, R., 78
Morgenstern, L., 389
Niemelä, I., 317
Pereira, L. M., 162, 262
Pfeifer, G., 177
Pivkina, I., 49
Przymusinska, H., 162
Przymusinski, T., 162
Remmel, J. B., 34
Rosati, R., 332
Sakama, C., 147
Scarcello, F., 1
Schaub, T., 247
Seda, A. K., 357
Sefranek, J., 63
Shen, Y., 192
Sideri, M., 1
Simons, P., 305, 317
Soininen, T., 317
Spyratos, N., 132
Swift, T., 206, 262
Toni, F., 78
Truszczyński, M., 49
Turner, H., 92
Vanderbilt, A., 34
Vermeir, D., 236
Wang, K., 117
Warren, D. S., 206
Yuan, L., 192
You, J., 192
Zhou, N., 192