Lecture Notes in Artificial Intelligence Subseries of Lecture Notes in Computer Science LNAI Series Editors Randy Goebel University of Alberta, Edmonton, Canada Yuzuru Tanaka Hokkaido University, Sapporo, Japan Wolfgang Wahlster DFKI and Saarland University, Saarbrücken, Germany
LNAI Founding Series Editor Joerg Siekmann DFKI and Saarland University, Saarbrücken, Germany
7205
Madalina Croitoru Sebastian Rudolph Nic Wilson John Howse Olivier Corby (Eds.)
Graph Structures for Knowledge Representation and Reasoning Second International Workshop, GKR 2011 Barcelona, Spain, July 16, 2011 Revised Selected Papers
Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany Volume Editors Madalina Croitoru University Montpellier II, France E-mail: [email protected] Sebastian Rudolph Karlsruher Institut für Technologie, Germany E-mail: [email protected] Nic Wilson University College Cork, Ireland E-mail: [email protected] John Howse University of Brighton, UK E-mail: [email protected] Olivier Corby INRIA, Sophia Antipolis, France E-mail: [email protected] ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-29448-8 e-ISBN 978-3-642-29449-5 DOI 10.1007/978-3-642-29449-5 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2012934971 CR Subject Classification (1998): I.2, F.4.1, F.1, H.3, F.2, F.3 LNCS Sublibrary: SL 7 – Artificial Intelligence © Springer-Verlag Berlin Heidelberg 2012 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. 
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The development of effective techniques for knowledge representation and reasoning (KRR) is a crucial aspect of successful intelligent systems. Different representation paradigms, as well as their use in dedicated reasoning systems, have been extensively studied in the past. Nevertheless, new challenges, problems, and issues have emerged in the context of knowledge representation in artificial intelligence, involving the logical manipulation of increasingly large information sets (as, for example, in the Semantic Web, bioinformatics, and various other areas). Improvements in storage capacity and performance of computing infrastructure have also affected the nature of KRR systems, shifting their focus toward higher representational power and scalability. Consequently, KRR research is facing the challenge of developing knowledge representation structures optimized for large-scale reasoning. This new generation of KRR systems includes graph-based knowledge representation formalisms such as Bayesian networks, semantic networks, conceptual graphs, formal concept analysis, CP nets, GAI nets, all of which have been successfully used in numerous applications. The goal of the GKR workshop series is to bring together researchers involved in the development and application of graph-based knowledge representation formalisms and reasoning techniques. The First International Workshop on Graph Structures for Knowledge Representation and Reasoning was held at IJCAI 2009 and well received by the community. This volume contains the papers presented at GKR 2011, the Second International Workshop on Graph Structures for Knowledge Representation and Reasoning held at IJCAI 2011 on July 16, 2011, in Barcelona, Spain. We received 12 submissions, each of which was reviewed by at least 3 Program Committee members. The committee decided to accept seven papers. In addition, the proceedings feature one invited paper. 
We wish to thank the organizers of IJCAI 2011 who made this workshop possible by hosting it and providing all the necessary facilities. We thank the members of the Program Committee and the additional reviewers for their thorough work which helped to ensure the workshop’s high-quality standards. February 2012
Madalina Croitoru Sebastian Rudolph Nic Wilson John Howse Olivier Corby
Organization
Workshop Chairs
Madalina Croitoru    Université Montpellier II, France
Sebastian Rudolph    Karlsruher Institut für Technologie, Germany
Nic Wilson    University College Cork, Ireland
John Howse    University of Brighton, UK
Olivier Corby    INRIA, France
Program Committee
Jean-François Baget    INRIA, France
Olivier Corby    INRIA, France
Cornelius Croitoru    Universitatea Al.I. Cuza, Romania
Madalina Croitoru    Université Montpellier II, France
Catherine Faron-Zucker    Université de Nice Sophia Antipolis, France
Fabien Gandon    INRIA, France
Christophe Gonzales    Université Pierre et Marie Curie, France
Tarik Hadzic    United Technologies Research Center, Ireland
John Howse    University of Brighton, UK
Hamamache Keheddouci    Université Claude Bernard Lyon 1, France
Jérôme Lang    Université Paris Dauphine/CNRS, France
Michel Leclère    Université Montpellier II, France
Federica Mandreoli    Università di Modena e Reggio Emilia, Italy
Radu Marinescu    IBM, Ireland
Marie-Laure Mugnier    Université Montpellier II, France
Wilma Penzo    Università di Bologna, Italy
Sebastian Rudolph    Karlsruher Institut für Technologie, Germany
Eric Salvat    IMERIR, France
Gem Stapleton    University of Brighton, UK
Rallou Thomopoulos    INRA/IATE, France
Nic Wilson    University College Cork, Ireland

Additional Reviewers
Peter Chapman
Anika Schumann
Mohammed Amin Tahraoui
Pierre-Henri Wuillemin
Table of Contents
Local Characterizations of Causal Bayesian Networks
Elias Bareinboim, Carlos Brito, and Judea Pearl

Boolean Formulas of Simple Conceptual Graphs (SGBF)
Olivier Carloni

Conflict, Consistency and Truth-Dependencies in Graph Representations of Answer Set Logic Programs
Stefania Costantini and Alessandro Provetti

Bucket and Mini-bucket Schemes for M Best Solutions over Graphical Models
Natalia Flerova, Emma Rollon, and Rina Dechter

Supporting Argumentation Systems by Graph Representation and Computation
Jérôme Fortin, Rallou Thomopoulos, Jean-Rémi Bourguet, and Marie-Laure Mugnier

Representing CSPs with Set-Labeled Diagrams: A Compilation Map
Alexandre Niveau, Hélène Fargier, and Cédric Pralet

A Semantic Web Interface Using Patterns: The SWIP System
Camille Pradel, Ollivier Haemmerlé, and Nathalie Hernandez

Visually Interacting with a Knowledge Base Using Frames, Logic, and Propositional Graphs
Daniel R. Schlegel and Stuart C. Shapiro

Author Index
Local Characterizations of Causal Bayesian Networks⋆
Elias Bareinboim¹, Carlos Brito², and Judea Pearl¹
¹ Cognitive Systems Laboratory, Computer Science Department, University of California, Los Angeles, CA 90095, {eb,judea}@cs.ucla.edu
² Computer Science Department, Federal University of Ceará, [email protected]
Abstract. The standard definition of causal Bayesian networks (CBNs) invokes a global condition according to which the distribution resulting from any intervention can be decomposed into a truncated product dictated by its respective mutilated subgraph. We analyze alternative formulations which emphasize local aspects of the causal process and can therefore serve as more meaningful criteria for coherence testing and network construction. We first examine a definition based on “modularity” and prove its equivalence to the global definition. We then introduce two new definitions: the first interprets the missing edges in the graph, and the second interprets “zero direct effect” (i.e., ceteris paribus). We show that these formulations are equivalent but carry different semantic content.
1 Introduction
Graphical models are now standard tools for encoding probabilistic and causal information (Pearl, 1988; Spirtes et al., 1993; Heckerman and Shachter, 1995; Lauritzen, 1999; Pearl, 2000; Dawid, 2001; Koller and Friedman, 2009). One of the most popular representations is a causal Bayesian network, namely, a directed acyclic graph (DAG) $G$ which, in addition to the traditional conditional independencies, also conveys causal information and permits one to infer the effects of interventions. Specifically, if an external intervention fixes any set $X$ of variables to some constant $x$, the DAG permits us to infer the resulting post-intervention distribution, denoted by $P_x(v)$,¹ from the pre-intervention distribution $P(v)$. The standard reading of post-interventional probabilities invokes a mutilation of the DAG $G$, cutting off incoming arrows to the manipulated variables, and leads to a “truncated product” formula (Pearl, 1993), also known as the “manipulation theorem” (Spirtes et al., 1993) and the “G-computation formula” (Robins, 1986). A local characterization of CBNs invoking the notion of modularity was presented in
⋆ This work was supported in part by National Institutes of Health #1R01 LM009961-01, National Science Foundation #IIS-0914211 and #IIS-1018922, and Office of Naval Research #N000-14-09-1-0665 and #N00014-10-1-0933.
¹ (Pearl, 2000) used the notation $P(v \mid set(t))$, $P(v \mid do(t))$, or $P(v \mid \hat{t})$ for the post-intervention distribution, while (Lauritzen, 1999) used $P(v \,\|\, t)$.
M. Croitoru et al. (Eds.): GKR 2011, LNAI 7205, pp. 1–17, 2012. © Springer-Verlag Berlin Heidelberg 2012
(Pearl, 2000, p.24) and will be shown here to imply as well as to be implied by the truncated product formula. This characterization requires the network builder to judge whether the conditional probability $P(Y \mid PA_y)$ for each parent-child family remains invariant under interventions outside this family. Whereas the “truncated product” formula computes post-intervention from pre-intervention probabilities, given a correctly specified CBN, the local condition assists the model builder in constructing a correctly specified CBN. It instructs the modeller to focus on each parent-child family separately and judge whether the parent set is sufficiently rich so as to “shield” the child variable from “foreign” interventions. A second type of local characterization treated in this paper gives causal meaning to individual arrows in the graph or, more accurately, to its missing arrows. These conditions instruct the modeller to focus on non-adjacent pairs of nodes in the DAG and judge whether it is justified to assume that there is no (direct) causal effect between the corresponding variables. Two such conditions are formulated; the first requires that any variable be “shielded” from the combined influence of its non-neighbours once we hold its parents constant; the second requires that, for every non-adjacent pair in the graph, one of the variables in the pair be “shielded” from the influence of the other, holding every other variable constant (ceteris paribus). From a philosophical perspective, these characterizations define the empirical content of a CBN since, in principle, each of these assumptions can be tested by controlled experiments and, if any fails, we know that the DAG structure is not a faithful representation of the causal forces in the domain, and will fail to correctly predict the effects of some interventions.
From a practical viewpoint, however, the main utility of the conditions established in this paper lies in their guidance to model builders, for they permit the modeller to focus judgement on local aspects of the model and ensure that the sum total of those judgements is consistent with one’s knowledge and that all predictions, likewise, cohere with that knowledge. In several ways the conditions introduced in this paper echo the global, local, and pairwise conditions that characterize directed Markov random fields (Pearl, 1988; Lauritzen, 1996). The global condition requires that every d-separation condition in the DAG be confirmed by a corresponding conditional independence condition in the probability distribution. The local Markov condition requires that every variable be independent of its non-descendants, conditional on its parents. Finally, the pairwise condition requires that every pair of variables be independent conditional on all other variables in the graph. The equivalence of the three conditions has been established by several authors (Pearl and Verma, 1987; Pearl, 1988; Geiger et al., 1990; Lauritzen, 1996). Our characterization will of course differ in its semantics, since our notion of “dependence” is causal; it is nevertheless similar in its attempt to replace global with local conditions for the sake of facilitating judgement of coherence. (Tian and Pearl, 2002) provides another characterization of causal Bayesian networks with respect to three norms of coherence called Effectiveness, Markov and Recursiveness, and shows their use in learning and identification when the causal graph is not known in advance. This characterization relies on equalities among products of probabilities under different interventions and therefore lacks the qualitative guidance needed for constructing the network.
The rest of the paper is organized as follows. In Section 2, we introduce the basic concepts, and present the standard global and local definitions of CBNs together with discussion of their features. In Section 3, we prove the equivalence between these two definitions. In Section 4, we introduce two new definitions which explicitly interpret the missing links in the graph as representing absence of causal influence. In Section 5, we prove the equivalence between these definitions and the previous ones. Finally, we provide concluding remarks in Section 6.
2 Causal Bayesian Networks and Interventions
The notions of intervention and causality are tightly connected. Interventions are usually interpreted as an external agent setting a variable to a certain level (e.g., treatment), which contrasts with an agent just passively observing variables’ levels. The dichotomy between observing and intervening is extensively studied in the literature (Pearl, 1994; Lindley, 2002; Pearl, 2009, pp. 384-387), and one example of its utilization is in the context of randomized clinical trials. It is known that performing the trial (intervening), and then collecting the underlying data, is equivalent to applying the treatment uniformly over the entire population. This class of experiments is entirely different from simply collecting passive census data, from which no causal information can be obtained. The concept of intervention precedes any graphical notion. We consider here the most elementary kind of intervention, that is, the atomic one, where a set $X$ of variables is fixed to some constant $X = x$. All the probabilistic and causal information about a set of variables $V$ is encoded in a collection of interventional distributions over $V$, of which the distribution associated with no intervention (also called the pre-intervention or observational distribution) is a special case.
Definition 1 (Set of interventional distributions). Let $P(v)$ be a probability distribution over a set $V$ of variables, and let $P_x(v)$ denote the distribution resulting from the intervention $do(X = x)$ that sets a subset $X$ of variables to constant $x$. Denote by $P_*$ the set of all interventional distributions $P_x(v)$, $X \subseteq V$, including $P(v)$, which represents no intervention (i.e., $X = \emptyset$). We assume that $P_*$ satisfies the following condition for all $X \subseteq V$:
i. [Effectiveness] $P_x(v_i) = 1$, for all $V_i \in X$ whenever $v_i$ is consistent with $X = x$.
The space of all interventional distributions can be arbitrarily large and complex; we therefore seek formal schemes to parsimoniously represent the set of such distributions without being required to explicitly list all of them. It is remarkable that a single graph can represent the sum total of all interventional distributions in such a compact and convenient way. This compactness, however, means that the interventional distributions are not arbitrary but highly structured. In other words, they are constrained by one another through a set of constraints that forces one interventional distribution to share properties with another. Our goal is to find meaningful and economical representations of these constraints by identifying their “basis”, namely, a minimal set of constraints that imply all the others.
This exercise is similar in many ways to the one conducted in the 1980s on ordinary Bayes networks (Pearl, 1988), where an economical basis was sought for the set of observational distributions represented in a DAG. Moving from probabilistic to causal Bayesian networks will entail encoding both probabilistic and interventional information by a single basis. Formally, a causal Bayesian network (also known as a Markovian model) consists of two mathematical objects: (i) a DAG $G$, called a causal graph, over a set $V = \{V_1, ..., V_n\}$ of vertices, and (ii) a probability distribution $P(v)$ over the set $V$ of discrete variables that correspond to the vertices in $G$. The interpretation of the underlying graph has two components, one probabilistic and another causal, and we discuss in turn global and local characterizations of these two components.
2.1 Global Characterization
We begin by reviewing the global conditions that provide an interpretation for causal Bayesian networks.² The probabilistic interpretation specifies that the full joint distribution is given by the product

$$P(v) = \prod_i P(v_i \mid pa_i) \qquad (1)$$
where $pa_i$ are (assignments of values to) the parents of variable $V_i$ in $G$. The causal interpretation is based on a global compatibility condition, which makes explicit the joint post-intervention distribution under any arbitrary intervention, and parallels the full factorization of the (pre-interventional) probabilistic interpretation. This condition states that any intervention is associated with the removal of the terms corresponding to the variables under intervention, reducing the product given by the expression in eq. (1) to the so-called “truncated product” formula. This operation is formalized in the following definition.
Definition 2 (Global causal condition (Pearl, 2000)). A DAG $G$ is said to be globally compatible with a set of interventional distributions $P_*$ if and only if the distribution $P_x(v)$ resulting from the intervention $do(X = x)$ is given by the following expression:

$$P_x(v) = \begin{cases} \prod_{\{i \mid V_i \notin X\}} P(v_i \mid pa_i) & v \text{ consistent with } x, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

Eq. (2) is known as the truncated factorization product, since it has the factors corresponding to the manipulated variables “removed”. This formula can also be found in the literature under the name of “manipulation theorem” (Spirtes et al., 1993) and is implicit in the “G-computation formula” (Robins, 1986). Even when the graph is not available in its entirety, knowledge of the parents of each manipulated variable is sufficient for computing post-intervention from pre-intervention distributions.
² A more refined interpretation, called functional, is also common (Pearl, 2000), which, in addition to interventions, supports counterfactual readings. The functional interpretation assumes deterministic functional relationships between variables in the model, some of which may be unobserved. Complete axiomatizations of deterministic counterfactual relations are given in (Galles and Pearl, 1998; Halpern, 1998).
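The truncated product of eq. (2) can be illustrated with a minimal sketch. The three-variable binary chain A → B → C, its conditional probability tables, and all function names below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C; every number here is made up.
parents = {"A": (), "B": ("A",), "C": ("B",)}
# cpt[var][parent_values] = P(var = 1 | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.7},
}
order = ["A", "B", "C"]


def family_prob(var, v):
    """P(v[var] | pa(var)), read off the conditional probability table."""
    p_one = cpt[var][tuple(v[p] for p in parents[var])]
    return p_one if v[var] == 1 else 1.0 - p_one


def truncated_product(v, do):
    """P_x(v) as in eq. (2): factors of intervened variables are dropped."""
    if any(v[x] != val for x, val in do.items()):
        return 0.0  # v is inconsistent with X = x
    prob = 1.0
    for var in order:
        if var not in do:
            prob *= family_prob(var, v)
    return prob


def all_assignments():
    """Enumerate every full assignment v of the three variables."""
    for values in product([0, 1], repeat=len(order)):
        yield dict(zip(order, values))


def marginal(var, val, do):
    """P_x(var = val), obtained by summing the truncated product."""
    return sum(truncated_product(v, do)
               for v in all_assignments() if v[var] == val)


p_c_do_b1 = marginal("C", 1, {"B": 1})  # downstream effect of do(B = 1)
p_a_do_b1 = marginal("A", 1, {"B": 1})  # A is upstream of B
total = sum(truncated_product(v, {"B": 1}) for v in all_assignments())
```

With an empty `do` dictionary the function reduces to the ordinary factorization of eq. (1). In this toy chain, intervening on B leaves the marginal of its parent A at its pre-intervention value, which is the behaviour the mutilated graph is meant to capture.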
2.2 Local Characterization
The truncated product is effective in computing post-interventional distributions but offers little help in the process of constructing the causal graph from judgemental knowledge. We next present a characterization that explicates a set of local assumptions leading to the global condition. Since the two definitions are syntactically very different, it is necessary to prove that they are (logically) equivalent. The local characterization of causal Bayesian networks also consists of a DAG $G$ and a probability distribution over $V$, and the probabilistic interpretation (Pearl, 1988) in this characterization views $G$ as representing conditional independence restrictions on $P$: each variable is independent of all its non-descendants given its parents in the graph. This property is known as the Markov condition, and can characterize the Bayesian network absent any causal reading. Interestingly, the collection of independence assertions formed in this way suffices to derive the global assertion in eq. (1), and vice versa. It is worth remarking that this local characterization is most useful in constructing Bayesian networks, because selecting as parents the “direct causes” of a given variable automatically satisfies the local conditional independence conditions. On the other hand, the (probabilistic) global semantics leads directly to a variety of algorithms for reasoning. More interestingly, the arrows in the graph $G$ can be viewed as representing potential causal influences between the corresponding variables, and the factorization of eq. (1) still holds, but now the factors are further assumed to represent autonomous data-generation processes.
That is, each family conditional probability $P(v_i \mid pa_i)$ represents a stochastic process by which the values of $V_i$ are assigned in response to the values $pa_i$ (previously chosen for $V_i$’s parents), and the stochastic variation of this assignment is assumed independent of the variations in all other assignments in the model. This interpretation implies all conditional independence relations of the graph (dictated by Markov), and follows from two facts: (1) when we fix all parents, the only source of randomness for each variable is the stochastic variation pointing to the nodes;³ (2) the stochastic variations are independent among themselves, which implies that each variable is independent of all its non-descendants. This fact, together with the additional assumption known as modularity (i.e., each assignment process remains invariant to possible changes in the assignment processes that govern other variables in the system), enables us to predict the effects of interventions, whenever interventions are described as specific modifications of some factors in the product of eq. (1). Note that the truncated factorization of the global definition follows trivially from this interpretation because, assuming modularity, the post-intervention probabilities $P(v_i \mid pa_i)$ corresponding to variables in $X$ are either 1 or 0, while those corresponding to unmanipulated variables remain unaltered.⁴
³ In the structural interpretation, they are represented by the error terms (Pearl, 2000, Ch. 7).
⁴ In the literature, the other side of the implication is implicitly assumed to hold, but it is not immediately obvious, and it is the object of our formal analysis in the next section.
In order to formally capture the idea of invariance of the autonomous mechanism for each family entailed by the local characterization, the following definition encodes this feature to facilitate subsequent discussion.
Definition 3 (Conditional invariance (CInv)). We say that $Y$ is conditionally invariant with respect to $X$ given $Z$, denoted $(Y \perp\!\!\!\perp_{ci} X \mid Z)_{P_*}$, if intervening on $X$ does not change the conditional distribution of $Y$ given $Z = z$, i.e., $\forall x, y, z$, $P_x(y \mid z) = P(y \mid z)$.
We view CInv relations as the causal image of conditional independence (or simply CInd) relations, and a causal Bayesian network as representing both. Recast in terms of conditional invariance, (Pearl, 2000) proposed the following local definition of causal Bayesian networks:
Definition 4 (Modular causal condition (Pearl, 2000, p.24)). A DAG $G$ is said to be locally compatible with a set of interventional distributions $P_*$ if and only if the following conditions hold for every $P_x \in P_*$:
i. [Markov] $P_x(v)$ is Markov relative to $G$;
ii. [Modularity] $(V_i \perp\!\!\!\perp_{ci} X \mid PA_i)_{P_*}$, for all $V_i \notin X$ whenever $pa_i$ is consistent with $X = x$.⁵
In summary, the two definitions of CBNs emphasize different aspects of the causal model; Definition 4 ensures that each conditional probability $P(v_i \mid pa_i)$ (locally) remains invariant under interventions that do not directly include $V_i$, while Definition 2 ensures that each manipulated variable is not influenced by its previous parents (before the manipulation), and every other variable is governed by its pre-interventional process. Because the latter invokes theoretical conditions on the data-generating process, it is not directly testable, and the question whether a given implemented intervention conforms to an investigator’s intention (e.g., no side effects) is discernible only through the testable properties of the truncated product formula (2). Definition 4 provides in essence a series of local tests for Eq.
(2), and the equivalence between the two (Theorem 1 below) ensures that all empirically testable properties of Eq. (2) are covered by the local tests provided by Definition 4.
2.3 Example
Figure 1 illustrates a simple yet typical causal Bayesian network. It describes the causal relationships among the season of the year ($X_1$), whether it is raining ($X_2$), whether the sprinkler is on ($X_3$), whether the pavement is wet ($X_4$), and whether the pavement is slippery ($X_5$). In the probabilistic interpretation given by the global definition, we can use eq. (1) and write the full joint distribution:

$$P(x_1, x_2, x_3, x_4, x_5) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_4) \qquad (3)$$
⁵ Explicitly, modularity states: $P(v_i \mid pa_i, do(s)) = P(v_i \mid pa_i)$ for any set $S$ of variables disjoint from $\{V_i, PA_i\}$.
Fig. 1. A causal Bayesian network representing influence among five variables
Equivalently, the probabilistic interpretation entailed by the modular characterization induces the joint distribution respecting the constraints of conditional independence entailed by the graph through the underlying families. For example, $P(x_4 \mid x_2, x_3)$ is the probability of wetness given the values of sprinkler and rain, and it is independent of the value of season. Nevertheless, both probabilistic interpretations say nothing about what will happen if a certain intervention occurs, i.e., if a certain agent interacts with the system and externally changes the value of a certain variable (also known as an action). For example, what if I turn the sprinkler on? What effect does that have on the season, or on the connection between wetness and slipperiness? The causal interpretation, intuitively speaking, adds the idea that whenever the sprinkler node is set to $X_3 = on$, the event ($X_3 = on$) receives all the probability mass, which is clearly equivalent to removing the causal link between the season $X_1$ and the sprinkler $X_3$.⁶ Assuming that all other causal links and conditional probabilities remain intact in the model, which is the least intrusive assumption possible, the new model that generates the process is given by the equation:

$$P(x_1, x_2, x_4, x_5 \mid do(X_3 = on)) = P(x_1)\,P(x_2 \mid x_1)\,P(x_4 \mid x_2, X_3 = on)\,P(x_5 \mid x_4)$$

where we informally demonstrate the semantic content of the do operator (also known as the interventional operator). As another point, consider the problem of inferring the causal structure with two variables such that $V = \{F, S\}$, in which $F$ stands for “Fire” and $S$ stands for “Smoke”. If we consider only the probabilistic interpretation, both structures, $G_1 = \{F \to S\}$ and $G_2 = \{S \to F\}$, are equivalent, and both networks are equally capable of representing any joint distribution over these two variables. The global interpretation is hard to apply in this construction stage, but the modular interpretation is useful here.
To see why, the definition helps one in choosing the causal network $G_1$ over $G_2$, because they encode different mechanisms, and so formally different responses under intervention; notice that there is a directed edge from $S$ to $F$ in $G_2$, but not in $G_1$. The modular condition as a collection of autonomous mechanisms that may be reconfigured
⁶ This can be shown more formally without difficulty.
locally by interventions, with the correspondingly local changes in the model, rejects the second network $G_2$ based on our understanding of the world. (The reasoning that makes us prefer structure $G_1$ over $G_2$ should become even clearer when we discuss missing links in Section 4.)
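The Fire and Smoke discussion can be made concrete with a small sketch. The joint distribution below is hypothetical, and the variable names are assumptions introduced only for this illustration. Both structures reproduce the same observational joint, yet applying the truncated product of eq. (2) under do(S = 1) gives different answers for Fire.

```python
# Hypothetical joint distribution over Fire (F) and Smoke (S), keys are (f, s).
joint = {(0, 0): 0.89, (0, 1): 0.01, (1, 0): 0.01, (1, 1): 0.09}

# Observational marginals P(f) and P(s).
p_f = {f: sum(p for (ff, _), p in joint.items() if ff == f) for f in (0, 1)}
p_s = {s: sum(p for (_, ss), p in joint.items() if ss == s) for s in (0, 1)}


def p_s_given_f(s, f):
    """P(s | f), the family of S in G1 = {F -> S}."""
    return joint[(f, s)] / p_f[f]


def p_f_given_s(f, s):
    """P(f | s), the family of F in G2 = {S -> F}."""
    return joint[(f, s)] / p_s[s]


# Eq. (1) for each structure: both recover the same observational joint.
g1_joint = {(f, s): p_f[f] * p_s_given_f(s, f) for f in (0, 1) for s in (0, 1)}
g2_joint = {(f, s): p_s[s] * p_f_given_s(f, s) for f in (0, 1) for s in (0, 1)}

# Eq. (2) under do(S = 1): the factor of S is removed.
# In G1 the remaining factor is P(f); in G2 it is P(f | S = 1).
fire_after_smoke_g1 = p_f[1]
fire_after_smoke_g2 = p_f_given_s(1, 1)
```

Under these made-up numbers, G1 predicts that producing smoke leaves the probability of fire at its prior value, whereas G2 predicts that producing smoke makes fire very likely. The observational data cannot separate the two graphs; the judgement about which mechanism is autonomous does.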
3 The Equivalence between the Local and Global Definitions
We prove next that the local and global definitions of causal Bayesian networks are equivalent. To the best of our knowledge, the proof of equivalence has not been published before.
Theorem 1 (Equivalence between local and global compatibility). Let $G$ be a DAG and $P_*$ a set of interventional distributions. Then the following statements are equivalent:
i. $G$ is locally compatible with $P_*$
ii. $G$ is globally compatible with $P_*$
Proof. (Definition 4 ⇒ Definition 2) Given an intervention $do(X = x)$, $X \subseteq V$, assume that conditions 4:(i-ii) are satisfied. For any arbitrary instantiation $v$ of variables $V$, consistent with $X = x$, we can express $P_x(v)$ as

$$
\begin{aligned}
P_x(v) &\overset{\text{def.4:(i)}}{=} \prod_i P_x(v_i \mid pa_i) \\
&= \prod_{\{i \mid v_i \in X\}} P_x(v_i \mid pa_i) \times \prod_{\{i \mid v_i \notin X\}} P_x(v_i \mid pa_i) \\
&\overset{\text{effectiveness}}{=} \prod_{\{i \mid v_i \notin X\}} P_x(v_i \mid pa_i) \\
&\overset{\text{def.4:(ii)}}{=} \prod_{\{i \mid v_i \notin X\}} P(v_i \mid pa_i) \qquad (4)
\end{aligned}
$$
which is the truncated product as desired. (Definition 2 ⇒ Definition 4) We assume that the truncated factorization holds, i.e., the distribution $P_x(v)$ resulting from any intervention $do(X = x)$ can be computed as in eq. (2). To prove effectiveness, consider an intervention $do(X = x)$, and let $V_i \in X$. Let $Dom(V_i) = \{v_{i1}, v_{i2}, ..., v_{im}\}$ be the domain of variable $V_i$, with only one of those values consistent with $X = x$. Since $P_x(v)$ is a probability distribution, we must have $\sum_j P_x(V_i = v_{ij}) = 1$. According to eq. (2), all terms not consistent with $X = x$ have probability zero, and thus we obtain $P_x(v_i) = 1$, $v_i$ consistent with $X = x$. To show Definition 4:(ii), we consider an ordering $\pi : (v_1, ..., v_n)$ of the variables, consistent with the graph $G$ induced by the truncated factorization with no intervention $P(v) = \prod_i P(v_i \mid pa_i)$. Now, given an intervention $do(X = x)$,

$$
\begin{aligned}
P_x(v_i \mid pa_i) &= \frac{P_x(v_i, pa_i)}{P_x(pa_i)} \\
&\overset{\text{marginal.}}{=} \frac{\sum_{v_j \notin \{V_i, PA_i\}} P_x(v)}{\sum_{v_j \notin \{PA_i\}} P_x(v)} \\
&\overset{\text{eq.(2)}}{=} \frac{\sum_{v_j \notin \{V_i, PA_i, X\}} \prod_{v_k \notin X} P(v_k \mid pa_k)}{\sum_{v_j \notin \{PA_i, X\}} \prod_{v_k \notin X} P(v_k \mid pa_k)} \\
&= P(v_i \mid pa_i) \times \frac{\sum_{v_j \notin \{V_i, PA_i, X\}} \prod_{v_k \notin X,\, k \neq i} P(v_k \mid pa_k)}{\sum_{v_j \notin \{PA_i, X\}} \prod_{v_k \notin X} P(v_k \mid pa_k)} \qquad (5)
\end{aligned}
$$
The last step is due to the fact that variables in $\{V_i, PA_i\}$ do not appear in the summations in the numerator. Rewriting the numerator, breaking it in relation to variables before and after $v_i$, we obtain

$$\sum_{v_j \notin \{V_i, PA_i, X\}} \; \prod_{v_k \notin X,\, k \neq i} P(v_k \mid pa_k) = [\ldots]$$

[...] appear in the summation. Thus, we obtain [...]
[...] $a > b$ holds (i.e., $a$ is better than $b$) if there exists $c \in A$ such that $a = b \oplus c$.

We denote by $D_Y$ the set of tuples over a subset of variables $Y$, also called the domain of $Y$. Functions are defined on subsets of the variables $X$, called scopes, and their range is a set of valuations $A$. If $f : D_Y \to A$ is a function, the scope of $f$, denoted $var(f)$, is $Y$. In the following, we will use $D_f$ as a shorthand for $D_{var(f)}$.

Definition 2 (combination operator). Let $f : D_f \to A$ and $g : D_g \to A$ be two functions. Their combination, noted $f \otimes g$, is a new function with scope $var(f) \cup var(g)$, s.t.

$$\forall t \in D_{var(f) \cup var(g)}, \quad (f \otimes g)(t) = f(t) \otimes g(t)$$

Definition 3 (marginalization operator). Let $f : D_f \to A$ be a function and $W \subseteq X$ be a set of variables. The marginalization of $f$ over $W$, noted $\Downarrow_W f$, is a function whose scope is $var(f) - W$, s.t.

$$\forall t \in D_{var(f) - W}, \quad (\Downarrow_W f)(t) = \bigoplus_{t' \in D_W} f(t \cdot t')$$

Example 1. Consider three variables $X_1$, $X_2$ and $X_3$ with domains $D_1 = D_2 = D_3 = \{1, 2, 3\}$. Let $f(X_1, X_2) = X_1 X_2$ and $g(X_2, X_3) = 2X_2 + X_3$ be two functions. If the combination operator is product (i.e., ×), then $(f \otimes g)(X_1, X_2, X_3) = (f \times g)(X_1, X_2, X_3) = X_1 X_2 \times (2X_2 + X_3)$. If the marginalization operator is max, then $(\Downarrow_{X_1} f)(X_2) = \max\{f(X_1 = 1, X_2), f(X_1 = 2, X_2), f(X_1 = 3, X_2)\} = \max\{1X_2, 2X_2, 3X_2\} = 3X_2$.
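Definitions 2 and 3 can be made concrete with a small sketch. It assumes functions are stored as a pair (scope, table), which is a representation chosen here for illustration and not one prescribed by the text; the helper names `make_function`, `combine` and `marginalize` are likewise hypothetical. The data reproduce Example 1.

```python
from itertools import product

DOMAIN = (1, 2, 3)  # D1 = D2 = D3 in Example 1


def make_function(scope, rule):
    """Tabulate rule(*values) over every tuple of the given scope."""
    table = {vals: rule(*vals) for vals in product(DOMAIN, repeat=len(scope))}
    return tuple(scope), table


def combine(f, g, otimes):
    """Definition 2: (f (x) g)(t) = f(t) (x) g(t) over the union of the scopes."""
    (sf, tf), (sg, tg) = f, g
    scope = sf + tuple(v for v in sg if v not in sf)
    table = {}
    for vals in product(DOMAIN, repeat=len(scope)):
        t = dict(zip(scope, vals))
        table[vals] = otimes(tf[tuple(t[v] for v in sf)],
                             tg[tuple(t[v] for v in sg)])
    return scope, table


def marginalize(f, w, oplus):
    """Definition 3: fold oplus over all extensions of t to the variables in w."""
    sf, tf = f
    scope = tuple(v for v in sf if v not in w)
    table = {}
    for vals, a in tf.items():
        key = tuple(x for v, x in zip(sf, vals) if v not in w)
        table[key] = a if key not in table else oplus(table[key], a)
    return scope, table


f = make_function(("X1", "X2"), lambda x1, x2: x1 * x2)
g = make_function(("X2", "X3"), lambda x2, x3: 2 * x2 + x3)

fg = combine(f, g, lambda a, b: a * b)   # product as combination
f_max = marginalize(f, {"X1"}, max)      # max as marginalization
```

Here `f_max` maps each value of X2 to 3·X2, as stated in Example 1.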
N. Flerova, E. Rollon, and R. Dechter
2.2 Graphical Model

Definition 4 (graphical model). A graphical model is a tuple $M = (X, D, A, F, \otimes)$, where: $X = \{X_1, \ldots, X_n\}$ is a set of variables; $D = \{D_1, \ldots, D_n\}$ is the set of their finite domains of values; $A$ is a set of valuations $(A, \otimes, \oplus)$; $F = \{f_1, \ldots, f_r\}$ is a set of discrete functions, where $var(f_j) \subseteq X$ and $f_j : D_{f_j} \to A$; and $\otimes$ is the combination operator over functions (see Definition 2). The graphical model $M$ represents the function $C(X) = \bigotimes_{f \in F} f$.

Definition 5 (reasoning task). A reasoning task is a tuple $P = (X, D, A, F, \otimes, \Downarrow)$, where $(X, D, A, F, \otimes)$ is a graphical model and $\Downarrow$ is a marginalization operator (see Definition 3). The reasoning task is to compute $\Downarrow_X C(X)$.

For a reasoning task $P = (X, D, A, F, \otimes, \Downarrow)$ the choice of $(A, \otimes, \oplus)$ determines the combination $\otimes$ and marginalization $\Downarrow$ operators over functions, and thus the nature of the graphical model and its reasoning task. For example, if $A$ is the set of non-negative reals and $\otimes$ is product, the graphical model is a Markov network or a Bayesian network. If $\Downarrow$ is max, the task is to compute the Most Probable Explanation (MPE), while if $\Downarrow$ is sum, the task is to compute the Probability of the Evidence.

The correctness of the algorithmic techniques for computing a given reasoning task relies on the properties of its valuation structure. In this paper we consider reasoning tasks $P = (X, D, A, F, \otimes, \Downarrow)$ such that their valuation structure $(A, \otimes, \oplus)$ is a semiring. Several works [22,1,17] showed that the correctness of inference algorithms over a reasoning task P is ensured whenever P is defined over a semiring.

Example 2. The MPE task is defined over semiring K = (R, ×, max), a CSP is defined over semiring K = ({0, 1}, ∧, ∨), and a Weighted CSP is defined over semiring K = (N ∪ {∞}, +, min). The task of computing the Probability of the Evidence is defined over semiring K = (R, ×, +).
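To illustrate how the choice of semiring changes the reasoning task, the following sketch computes $\Downarrow_X C(X)$ by plain enumeration for the four semirings of Example 2. The class `Semiring`, the function `solve_by_enumeration` and the toy functions `f` and `g` are assumptions introduced here for illustration only; they are not part of the paper.

```python
import operator
from dataclasses import dataclass
from functools import reduce
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    otimes: Callable[[Any, Any], Any]  # combination operator
    oplus: Callable[[Any, Any], Any]   # marginalization operator


MPE = Semiring(operator.mul, max)                # (R, x, max)
EVIDENCE = Semiring(operator.mul, operator.add)  # (R, x, +)
WCSP = Semiring(operator.add, min)               # (N u {inf}, +, min)
CSP = Semiring(operator.and_, operator.or_)      # ({0, 1}, and, or)


def solve_by_enumeration(domains, functions, semiring):
    """Combine all functions on every complete assignment, then marginalize."""
    variables = list(domains)
    totals = []
    for values in product(*(domains[v] for v in variables)):
        t = dict(zip(variables, values))
        totals.append(reduce(
            semiring.otimes,
            (fn(*(t[v] for v in scope)) for scope, fn in functions)))
    return reduce(semiring.oplus, totals)


# Hypothetical toy model over two binary variables A and B.
domains = {"A": [0, 1], "B": [0, 1]}
f = {0: 3, 1: 2}
g = {(0, 0): 1, (0, 1): 4, (1, 0): 5, (1, 1): 2}
numeric = [(("A",), lambda a: f[a]), (("A", "B"), lambda a, b: g[a, b])]
boolean = [(("A", "B"), lambda a, b: a != b), (("A",), lambda a: a == 1)]

best = solve_by_enumeration(domains, numeric, MPE)
total = solve_by_enumeration(domains, numeric, EVIDENCE)
cheapest = solve_by_enumeration(domains, numeric, WCSP)
satisfiable = solve_by_enumeration(domains, boolean, CSP)
```

Enumeration is exponential in the number of variables; it serves only as a reference against which inference algorithms can be checked.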
2.3 Bucket and Mini-bucket Elimination

Bucket Elimination (BE) [8] (see Algorithm 1) is a well-known inference algorithm that generalizes dynamic programming for many reasoning tasks. The input of BE is a reasoning task $P = (X, D, A, F, \otimes, \Downarrow)$ and an ordering $o = (X_1, X_2, \ldots, X_n)$, dictating an elimination order for BE, from last to first. Each function from F is placed in the bucket of its latest variable in o. The algorithm processes the buckets from $X_n$ to $X_1$, computing for each bucket $X_i$, noted $B_i$, $\Downarrow_{X_i} \bigotimes_{j=1}^{n} \lambda_j$, where the $\lambda_j$'s are the functions in $B_i$, some of which are original $f_i$'s and some are earlier computed messages. The result of the computation is a new function, also called message, that is placed in the bucket of its latest variable in the ordering o.

Depending on the particular instantiation of the combination and marginalization operators, BE solves a particular reasoning task. For example, algorithm elim-opt, which solves the optimization task, is obtained by substitution of the operators $\Downarrow_X f = \max_{S-X} f$ and $\bigotimes_j = \prod_j$. The message passing between buckets follows a bucket-tree structure.
Bucket and Mini-bucket Schemes for M Best Solutions over Graphical Models
Algorithm 1. Bucket elimination
Input: A reasoning task $P = (X, D, A, F, \otimes, \Downarrow)$; an ordering of variables $o = \{X_1, \ldots, X_n\}$.
Output: A zero-arity function $\lambda_1 : \emptyset \to A$ containing the solution of the reasoning task.
Initialize: Generate an ordered partition of functions in buckets $B_1, \ldots, B_n$, where $B_i$ contains all the functions whose highest variable in their scope is $X_i$.
Backward:
for i ← n down to 1 do
    Generate $\lambda_i = \Downarrow_{X_i} (\bigotimes_{f \in B_i} f)$
    Place $\lambda_i$ in the bucket $B_j$ where j is the largest-index variable in $var(\lambda_i)$
end for
Return: $\lambda_1$
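Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: functions are assumed to be (scope, table) pairs, every variable is assumed to occur in at least one scope, and the name `bucket_elimination` is hypothetical. The test data are the three functions of the example in Section 4.2.

```python
import operator
from itertools import product


def bucket_elimination(domains, functions, order, otimes, oplus):
    """Sketch of Algorithm 1; a function is a pair (scope, table)."""
    pos = {v: i for i, v in enumerate(order)}
    buckets = {v: [] for v in order}
    constants = []  # zero-arity messages

    def place(scope, table):
        if scope:  # bucket of the latest variable of the scope
            buckets[max(scope, key=pos.get)].append((scope, table))
        else:
            constants.append(table[()])

    for scope, table in functions:
        place(scope, table)

    for var in reversed(order):  # process buckets from last to first
        bucket = buckets[var]
        if not bucket:
            raise ValueError(f"{var} appears in no function scope")
        scope = tuple(sorted({v for s, _ in bucket for v in s} - {var},
                             key=pos.get))
        message = {}
        for values in product(*(domains[v] for v in scope)):
            t = dict(zip(scope, values))
            total = None
            for x in domains[var]:
                t[var] = x
                combined = None
                for s, table in bucket:  # combine all functions in the bucket
                    a = table[tuple(t[v] for v in s)]
                    combined = a if combined is None else otimes(combined, a)
                total = combined if total is None else oplus(total, combined)
            message[values] = total  # marginalize var out
        place(scope, message)

    result = constants[0]
    for c in constants[1:]:
        result = otimes(result, c)
    return result


domains = {"T": [0, 1], "Z": [0, 1], "X": [0, 1, 2], "Y": [0, 1, 2]}
f1 = (("Z", "X"), {(0, 0): 2, (1, 0): 2, (0, 1): 5, (1, 1): 1, (0, 2): 4, (1, 2): 3})
f2 = (("Z", "Y"), {(0, 0): 6, (1, 0): 7, (0, 1): 2, (1, 1): 4, (0, 2): 8, (1, 2): 2})
f3 = (("T", "Z"), {(0, 0): 1, (1, 0): 2, (0, 1): 4, (1, 1): 3})
order = ["T", "Z", "X", "Y"]

best_cost = bucket_elimination(domains, [f1, f2, f3], order, operator.mul, max)
sum_cost = bucket_elimination(domains, [f1, f2, f3], order, operator.mul, operator.add)
```

Swapping the two operator arguments is all that is needed to move between the max-product and sum-product tasks, which is the point of the semiring formulation.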
Definition 6 (bucket tree). Bucket Elimination defines a bucket tree, where a node of the bucket is associated with its bucket variable and the bucket of each $X_i$ is linked to the destination bucket of its message (called the parent bucket).

The complexity of Bucket Elimination can be bounded using the graph parameter of induced width.

Definition 7 (induced graph, induced width). The induced graph of a graphical model relative to ordering o is an undirected graph that has variables as its vertices. The edges of the graph are added by: 1) connecting all variables that are in the scope of the same function, 2) processing nodes from last to first in o, recursively connecting preceding neighbors of each node. The induced width $w(o)$ relative to ordering o is the maximum number of preceding neighbors across all variables in the induced graph. The induced width of the graphical model $w^*$ is the minimum induced width of all orderings. It is also known as the treewidth of the graph.

Theorem 1 (BE correctness and complexity). [8] Given a reasoning task P defined over a semiring $(A, \otimes, \oplus)$, BE is sound and complete. Given an ordering o, the time and space complexity of BE(P) is exponential in the induced width of the ordering.

Mini-bucket Elimination (MBE) [11] (see Algorithm 2) is an approximation designed to avoid the space and time complexity of BE. Consider a bucket $B_i$ and an integer bounding parameter z. MBE creates a z-partition $Q = \{Q_1, \ldots, Q_p\}$ of $B_i$, where each set $Q_j \in Q$, called mini-bucket, includes no more than z variables. Then, each mini-bucket is processed separately, thus computing a set of messages $\{\lambda_{ij}\}_{j=1}^{p}$, where $\lambda_{ij} = \Downarrow_{X_i} (\bigotimes_{f \in Q_j} f)$. In general, greater values of z increase the quality of the bound.
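The two steps of Definition 7 translate directly into code. The sketch below assumes the model is given only by its function scopes; the function name `induced_width` is hypothetical.

```python
def induced_width(scopes, order):
    """Induced width w(o) of the graph defined by `scopes` along `order`."""
    pos = {v: i for i, v in enumerate(order)}
    neighbors = {v: set() for v in order}
    # Step 1: connect all variables that share a function scope.
    for scope in scopes:
        for u in scope:
            neighbors[u].update(w for w in scope if w != u)
    width = 0
    # Step 2: from last to first, connect the preceding neighbors of each node.
    for v in reversed(order):
        preceding = {u for u in neighbors[v] if pos[u] < pos[v]}
        width = max(width, len(preceding))
        for u in preceding:
            neighbors[u].update(preceding - {u})
    return width


star = [("Z", "X"), ("Z", "Y"), ("T", "Z")]
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

w_good = induced_width(star, ["T", "Z", "X", "Y"])
w_bad = induced_width(star, ["X", "Y", "T", "Z"])
w_cycle = induced_width(cycle, ["A", "B", "C", "D"])
```

The same star-shaped graph has width 1 or 3 depending on the ordering, which is why the choice of o matters for the complexity bound of Theorem 1.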
Theorem 2. [11] Given a reasoning task P , MBE computes a bound on P . Given an integer control parameter z, the time and space complexity of MBE is exponential in z.
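The effect of the z-partition can be seen on a single bucket. The sketch below is an illustration under stated assumptions: the greedy first-fit `partition`, the helper `max_out` and the toy functions are hypothetical, the task is max-product, and the final bound is formed by multiplying the maxima of the messages, which is valid here only because the messages have disjoint scopes.

```python
from itertools import product


def partition(bucket, z):
    """Greedy first-fit z-partition: each mini-bucket spans at most z variables."""
    minibuckets = []  # list of (scope set, list of functions)
    for scope, table in bucket:
        for mb_scope, mb_funcs in minibuckets:
            if len(mb_scope | set(scope)) <= z:
                mb_scope.update(scope)
                mb_funcs.append((scope, table))
                break
        else:
            minibuckets.append((set(scope), [(scope, table)]))
    return [funcs for _, funcs in minibuckets]


def max_out(funcs, var, domains):
    """Message of one mini-bucket: max over `var` of the product of its functions."""
    scope = sorted({v for s, _ in funcs for v in s} - {var})
    msg = {}
    for vals in product(*(domains[v] for v in scope)):
        t = dict(zip(scope, vals))
        best = None
        for x in domains[var]:
            t[var] = x
            val = 1
            for s, table in funcs:
                val *= table[tuple(t[v] for v in s)]
            best = val if best is None else max(best, val)
        msg[vals] = best
    return tuple(scope), msg


domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
f = (("A", "C"), {(0, 0): 1, (0, 1): 5, (1, 0): 4, (1, 1): 2})
g = (("B", "C"), {(0, 0): 6, (0, 1): 1, (1, 0): 3, (1, 1): 2})
bucket_C = [f, g]


def upper_bound(z):
    bound = 1
    for funcs in partition(bucket_C, z):
        _, msg = max_out(funcs, "C", domains)
        bound *= max(msg.values())
    return bound


relaxed = upper_bound(2)  # f and g are processed in separate mini-buckets
exact = upper_bound(3)    # one mini-bucket, i.e. plain bucket elimination
```

With z = 2 the variable C is maximized independently in the two mini-buckets, so the result can only overestimate the exact maximum obtained with z = 3.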
Algorithm 2. Mini-Bucket elimination
Input: A reasoning task $P = (X, D, A, F, \otimes, \Downarrow)$; an integer parameter z; an ordering of variables $o = \{X_1, \ldots, X_n\}$.
Output: A bound on the solution of the reasoning task.
Initialize: Generate an ordered partition of functions in buckets $B_1, \ldots, B_n$, where $B_i$ contains all the functions whose highest variable in their scope is $X_i$.
Backward:
for i ← n down to 1 do
    $\{Q_1, \ldots, Q_p\}$ ← Partition($B_i$, z)
    for k ← 1 up to p do
        $\lambda_{i,k} \leftarrow \Downarrow_{X_i} (\bigotimes_{f \in Q_k} f)$
        Place $\lambda_{i,k}$ in the bucket $B_j$ where j is the largest-index variable in $var(\lambda_{i,k})$
    end for
end for
Return: $\lambda_1$

3 M-best Optimization Task

In this section we formally define the problem of finding a set of best solutions over an optimization task. We consider optimization tasks defined over a set of totally ordered valuations. In other words, we consider reasoning tasks where the marginalization operator $\Downarrow$ is min or max. Without loss of generality, in the following we assume maximization tasks (i.e., $\Downarrow$ is max).

Definition 8 (optimization task). Given a graphical model M, its optimization task is P = (M, max). The goal is to find a complete assignment t such that $\forall t' \in D_X$, $C(t) \geq C(t')$. C(t) is called the optimal solution.

Definition 9 (m-best optimization task). Given a graphical model M, its m-best optimization task is to find m complete assignments $T = \{t_1, \ldots, t_m\}$ such that $C(t_1) > \cdots > C(t_m)$ and, for every $t' \in D_X \setminus T$, either $C(t') = C(t_j)$ for some $1 \leq j \leq m$ or $C(t_m) > C(t')$. The solution is the set of valuations $\{C(t_1), \ldots, C(t_m)\}$, called m-best solutions.

3.1 M-best Valuation Structure

One of the main goals of this paper is to phrase the m-best optimization task as a reasoning task over a semiring, so that well-known algorithms can be immediately applied to solve this task. Namely, given an optimization task P over a graphical model M, we need to define a reasoning task $P^m$ that corresponds to the set of m best solutions of M. We introduce the set of ordered m-best elements of a subset $S \subseteq A$.

Definition 10 (set of ordered m-best elements). Let S be a subset of a set of valuations A. The set of ordered m-best elements of S is $Sorted^m\{S\} = \{s_1, \ldots, s_j\}$, such that $s_1 > s_2 > \ldots > s_j$, where j = m if |S| ≥ m and j = |S| otherwise, and $\forall s' \in S \setminus Sorted^m\{S\}$, $s_j > s'$.

Definition 11 (m-space). Let A be a set of valuations. The m-space of A, noted $A^m$, is the set of subsets of A that are sets of ordered m-best elements. Formally, $A^m = \{S \subseteq A \mid Sorted^m\{S\} = S\}$.

The combination and addition operators over the m-space $A^m$, noted $\otimes^m$ and $sort^m$ respectively, are defined as follows.
Definition 12 (combination and addition over the m-space). Let A be a set of valuations, and $\otimes$ and max be its combination and marginalization operators, respectively. Let $S, T \in A^m$. Their combination, noted $S \otimes^m T$, is the set $Sorted^m\{a \otimes b \mid a \in S, b \in T\}$, while their addition, noted $sort^m\{S, T\}$, is the set $Sorted^m\{S \cup T\}$.

Proposition 1. When m = 1, the valuation structure $(A^m, \otimes^m, sort^m)$ is equivalent to $(A, \otimes, \max)$.

Theorem 3. The valuation structure $(A^m, \otimes^m, sort^m)$ is a semiring.

The theorem is proved in Appendix A. It is worthwhile to examine the ordering defined by the semiring $(A^m, \otimes^m, sort^m)$, because it is important in the extension of Mini-Bucket Elimination (Section 5). Recall that by definition, given two elements $S, T \in A^m$, $S > T$ if $S = Sorted^m\{T \cup W\}$, where $W \in A^m$. We call S an m-best bound of T.

Definition 13 (m-best bound). Let T and S be two sets of ordered m-best elements. S is an m-best bound of T iff there exists a set of ordered m-best elements W such that $S = Sorted^m\{T \cup W\}$.

Let us illustrate the previous definition by the following example. Let T = {10, 6, 4}, S = {10, 7, 4}, and R = {10, 3} be three sets of ordered 3-best elements. S is not a 3-best bound of T because there is no set W such that $S = Sorted^m\{T \cup W\}$. Note that one possible set W is S \ T = {7}, but $Sorted^m\{\{10, 6, 4\} \cup \{7\}\} = \{10, 7, 6\}$, which is different from S. However, S is a 3-best bound of R because $S = Sorted^m\{\{10, 3\} \cup \{7, 4\}\}$.

3.2 Vector Functions

We will refer to functions over the m-space $A^m$, $f : D_f \to A^m$, as vector functions. Abusing notation, we extend the $\otimes^m$ and $sort^m$ operators to operate over vector functions, similar to how operators $\otimes$ and $\oplus$ were extended to operate over scalar functions in Definition 2.

Definition 14 (combination and marginalization over vector functions). Let $f : D_f \to A^m$ and $g : D_g \to A^m$ be two vector functions. Their combination, noted $f \otimes g$, is a new function with scope $var(f) \cup var(g)$, s.t.

$$\forall t \in D_{var(f) \cup var(g)}, \quad (f \otimes g)(t) = f(t) \otimes^m g(t)$$

Let $W \subseteq X$ be a set of variables. The marginalization of f over W, noted $sort^m_W\{f\}$, is a new function whose scope is $var(f) - W$, s.t.

$$\forall t \in D_{var(f) - W}, \quad sort^m_W\{f\}(t) = sort^m_{t' \in D_W}\{f(t \cdot t')\}$$
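The m-space operators of Definition 12 and the bound test of Definition 13 can be sketched as follows. Sets of ordered m-best elements are represented as descending tuples, and $\otimes$ is assumed to be product; both are choices made here for illustration, and the function names are hypothetical. The test in `is_m_best_bound` relies on the observation that if any witness W exists, then W = S is also a witness, so it suffices to check $S = Sorted^m\{T \cup S\}$.

```python
def sorted_m(values, m):
    """Sorted^m{S}: the (at most) m best distinct elements of S, best first."""
    return tuple(sorted(set(values), reverse=True)[:m])


def combine_m(s, t, m, otimes=lambda a, b: a * b):
    """S (x)^m T = Sorted^m{a (x) b | a in S, b in T}."""
    return sorted_m((otimes(a, b) for a in s for b in t), m)


def add_m(s, t, m):
    """sort^m{S, T} = Sorted^m{S u T}."""
    return sorted_m(set(s) | set(t), m)


def is_m_best_bound(s, t, m):
    """True iff S = Sorted^m{T u W} for some W; W = S is tried as the witness."""
    return sorted_m(s, m) == tuple(s) and add_m(t, s, m) == tuple(s)


T = (10, 6, 4)
S = (10, 7, 4)
R = (10, 3)

s_bounds_t = is_m_best_bound(S, T, 3)
s_bounds_r = is_m_best_bound(S, R, 3)
```

For m = 1 the two operators reduce to $\otimes$ and max on single values, in line with Proposition 1.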
Example 3. Figure 1 shows the combination and marginalization over two vector functions h1 and h2 for m = 2 and ⊗ = ×.
h1:
X1  X2  h1
a   a   {4,2}
a   b   {3,1}
b   a   {5}
b   b   {2}

h2:
X2  h2
a   {3,1}
b   {1}

h1 ⊗m h2:
X1  X2  h1 ⊗m h2
a   a   {12,6}
a   b   {3,1}
b   a   {15,5}
b   b   {2}

sortm_X2 {h1}:
X1  sortm_X2 {h1}
a   {4,3}
b   {5,2}

Fig. 1. Combination and marginalization over vector functions for m = 2 and ⊗ = ×. For each pair of values of (X1, X2) the result of h1 ⊗m h2 is an ordered set of size 2 obtained by choosing the 2 larger elements out of the result of pair-wise multiplication of the corresponding elements of h1 and h2. The result of sortm_X2 {h1} is an ordered set containing the two larger values of function h1 for each value of X1.
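The tables of Figure 1 can be recomputed directly from Definition 14. In this sketch vector functions are plain dictionaries from value tuples to descending tuples, a representation assumed here for illustration.

```python
M = 2


def sorted_m(values, m=M):
    """The (at most) m best distinct values, best first."""
    return tuple(sorted(set(values), reverse=True)[:m])


# Vector functions of Figure 1.
h1 = {("a", "a"): (4, 2), ("a", "b"): (3, 1), ("b", "a"): (5,), ("b", "b"): (2,)}
h2 = {"a": (3, 1), "b": (1,)}

# Combination: pairwise products of the vectors attached to each (X1, X2).
combined = {
    (x1, x2): sorted_m(u * v for u in vec for v in h2[x2])
    for (x1, x2), vec in h1.items()
}

# Marginalization over X2: merge the vectors that share the same X1.
marginal = {}
for (x1, x2), vec in h1.items():
    marginal[x1] = sorted_m(marginal.get(x1, ()) + vec)
```

Vectors shorter than m, such as {5} and {2}, simply produce shorter results.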
3.3 M-best Optimization as a Graphical Model

The m-best extension of an optimization problem P is a new reasoning task $P^m$ that expresses the m-best task over P.

Definition 15 (m-best extension). Let $P = (X, D, A, F, \otimes, \Downarrow)$ be an optimization problem defined over a semiring $(A, \otimes, \max)$. Its m-best extension is a new reasoning task $P^m = (X, D, A^m, F^m, \otimes, sort^m)$ over semiring $(A^m, \otimes^m, sort^m)$. Each function $f : D_f \to A$ in F is trivially transformed into a new vector function $f' : D_f \to A^m$ defined as $f'(t) = \{f(t)\}$. In words, function outcomes of f are transformed to singleton sets in f′. Then, the set $F^m$ contains the new f′ vector functions.

The following theorem shows that the optimum of $P^m$ corresponds to the set of m-best valuations of P.

Theorem 4. Let $P = (X, D, A, F, \otimes, \Downarrow)$ be an optimization problem defined over semiring $(A, \otimes, \max)$ and let $\{C(t_1), \ldots, C(t_m)\}$ be its m best solutions. Let $P^m$ be the m-best extension of P. Then, the optimization task $P^m$ computes the set of m-best solutions of P. Formally,

$$sort^m_X \Big\{ \bigotimes_{f \in F^m} f \Big\} = \{C(t_1), \ldots, C(t_m)\}$$

The theorem is proved in Appendix B. It is easy to see how the same extension applies to minimization tasks. The only difference is the set of valuations selected by operator $sort^m$.
4 Bucket Elimination for the m-best Task In this section we provide a formal description of the extension of Bucket Elimination algorithm for m best solutions based on the operators over the m-space defined in the previous section. We also provide algorithmic details for the operators and show through an example how the algorithm can be derived from first principles.
Algorithm 3. elim-m-opt algorithm
Input: An optimization task $P = (X, D, A, F, \otimes, \max)$; an ordering of variables $o = \{X_1, \ldots, X_n\}$.
Output: A zero-arity function $\lambda_1 : \emptyset \to A^m$ containing the solution of the m-best optimization task.
Initialize: Transform each function $f \in F$ into a singleton vector function $h(t) = \{f(t)\}$; generate an ordered partition of vector functions h in buckets $B_1, \ldots, B_n$, where $B_i$ contains all the functions whose highest variable in their scope is $X_i$.
Backward:
for i ← n down to 1 do
    Generate $\lambda_i = sort^m_{X_i} (\bigotimes_{f \in B_i} f)$
    Generate assignment $x_i = argsort^m_{X_i} (\bigotimes_{f \in B_i} f)$, concatenate with relevant elements of the previously generated assignment messages.
    Place $\lambda_i$ and corresponding assignments in the bucket of the largest-index variable in $var(\lambda_i)$
end for
Return: $\lambda_1$
4.1 The Algorithm Definition

Consider an optimization task $P = (X, D, A, F, \otimes, \max)$. Algorithm elim-m-opt (see Algorithm 3) is the extension of BE to solve $P^m$ (i.e., the m-best extension of P). First, the algorithm transforms scalar functions in F to their equivalent vector functions as described in Definition 15. Then it processes the buckets from last to first as usual, using the two new combination and marginalization operators $\otimes$ and $sort^m$, respectively. Roughly, the elimination of variable $X_i$ from a vector function will produce a new vector function $\lambda_i$, such that $\lambda_i(t)$ will contain the m-best extensions of t to the eliminated variables $X_{i+1}, \ldots, X_n$ with respect to the subproblem below the bucket variable in the bucket tree. Once all variables have been eliminated, the resulting zero-arity function $\lambda_1$ contains the m-best cost extensions to all variables in the problem. In other words, $\lambda_1$ is the solution of the problem.

The correctness of the algorithm follows from the formulation of the m-best optimization task as a reasoning task over a semiring.

Theorem 5 (elim-m-opt correctness). Algorithm elim-m-opt is sound and complete for finding the m best solutions over an optimization task P.

There are several ways to generate the set of m-best assignments. The one presented next uses the $argsort^m$ operator.

Definition 16. Operator $argsort^m_{X_i} f$ returns a vector function $x_i(t)$ such that, for all $t \in D_{var(f) \setminus X_i}$, $f(t \cdot x_i^1), \ldots, f(t \cdot x_i^m)$ are the m-best valuations extending t to $X_i$, where $x_i^j$ denotes the j-th element of $x_i(t)$. In words, $x_i(t)$ is the vector of assignments to $X_i$ that yields the m-best extensions to t.
4.2 Illustrating the Algorithm’s Derivation through an Example

We have in mind the MPE (most probable explanation) task in probabilistic networks. Consider a graphical model with four variables {X, Y, Z, T} having the following functions (for simplicity we use un-normalized functions). Let m = 3.

x  z  f1(z, x)
0  0  2
0  1  2
1  0  5
1  1  1
2  0  4
2  1  3

y  z  f2(z, y)
0  0  6
0  1  7
1  0  2
1  1  4
2  0  8
2  1  2

z  t  f3(t, z)
0  0  1
0  1  2
1  0  4
1  1  3
Finding the m best solutions to $P(t, z, x, y) = f_3(t, z) \cdot f_1(z, x) \cdot f_2(z, y)$ can be expressed as finding Sol, defined by:

$$Sol = sort^m_{t,x,z,y} \big( f_3(t, z) \cdot f_1(z, x) \cdot f_2(z, y) \big) \tag{1}$$

Since operator $sort^m$ is an extension of operator max, it inherits its distributive properties over multiplication. Due to this distributivity, we can apply symbolic manipulation and migrate each of the functions to the left of the $sort^m$ operator over variables that are not in its scope. In our example we rewrite as:

$$Sol = sort^m_t \, sort^m_z \, f_3(t, z) \; sort^m_x f_1(z, x) \; sort^m_y f_2(z, y) \tag{2}$$

The output of $sort^m$ is a set, so in order to make (2) well defined, we replace the multiplication operator by the combination over vector functions as in Definition 14.

$$Sol = sort^m_t \, sort^m_z \big( \bar{f}_3(t, z) \otimes (sort^m_x \bar{f}_1(z, x)) \otimes (sort^m_y \bar{f}_2(z, y)) \big) \tag{3}$$
BE computes (3) from right to left, which corresponds to the elimination ordering o = {T, Z, X, Y}. We assume that the original input functions extend to vector functions, e.g., $f_i$ is extended as $\bar{f}_i(t) = \{f_i(t)\}$. Figure 2 shows the messages passed between buckets and the bucket tree under o. Bucket $B_Y$ containing function $\bar{f}_2(z, y)$ is processed first. The algorithm applies operator $sort^m_y$ to $\bar{f}_2(z, y)$, generating a message, which is a vector function denoted by $\lambda_Y(z)$, that is placed in $B_Z$. Note that this message associates each z with the vector of m-best valuations of $f_2(z, y)$. Namely,

$$sort^m_y \bar{f}_2(z, y) = (\lambda_Y^1(z), \ldots, \lambda_Y^j(z), \ldots, \lambda_Y^m(z)) = \lambda_Y(z) \tag{4}$$

where for each z, $\lambda_Y^j(z)$ is the j-th best value of $f_2(z, y)$. A similar computation is carried out in $B_X$, yielding $\lambda_X(z)$, which is also placed in $B_Z$.

z  λX(z)    x          λY(z)    y
0  {5,4,2}  ⟨1, 2, 0⟩  {8,6,2}  ⟨2, 0, 1⟩
1  {3,2,1}  ⟨2, 0, 1⟩  {7,4,2}  ⟨0, 1, 2⟩
[Figure 2. (a) Messages passed between buckets. Bucket Y: $\bar{f}_2(z, y)$. Bucket X: $\bar{f}_1(z, x)$. Bucket Z: $\bar{f}_3(t, z)$, $\lambda_X(z)$, $\lambda_Y(z)$. Bucket T: $\lambda_Z(t)$. (b) Bucket-tree: node T receives $\lambda_Z(t)$ from node Z, which holds $f_3(t, z)$ and receives $\lambda_X(z)$ from node X, holding $f_1(z, x)$, and $\lambda_Y(z)$ from node Y, holding $f_2(z, y)$.]

Fig. 2. Example of applying elim-m-opt
When processing $B_Z$, we compute (see Eq. 3):

$$\lambda_Z(t) = sort^m_z \big[ \bar{f}_3(t, z) \otimes \lambda_X(z) \otimes \lambda_Y(z) \big]$$

The result is a new vector function that has $m^2$ elements for each tuple (t, z) as shown below.

t  z  f̄3(t, z) ⊗ λX(z) ⊗ λY(z)
0  0  {40, 32, 30, 16, 24, 12, 10, 8, 4}
0  1  {84, 56, 48, 32, 28, 24, 16, 16, 8}
1  0  {80, 64, 60, 48, 32, 24, 20, 16, 8}
1  1  {63, 42, 36, 24, 21, 18, 12, 12, 6}

Applying $sort^m_z$ to the resulting combination generates the m-best elements out of those $m^2$, yielding message $\lambda_Z(t)$ along with its variable assignments:

t  λZ(t)       ⟨x, y, z⟩
0  {84,56,48}  {⟨2, 0, 1⟩, ⟨0, 0, 1⟩, ⟨2, 1, 1⟩}
1  {80,64,63}  {⟨1, 2, 0⟩, ⟨2, 2, 0⟩, ⟨2, 0, 1⟩}

In Sect. 4.3 we show that it is possible to apply a more efficient procedure that would calculate at most 2m elements per tuple (t, z) instead. Finally, processing the last bucket yields the vector of m best solution costs for the entire problem and the corresponding assignments: $Sol = \lambda_T = sort^m_t \lambda_Z(t)$ (see Fig. 2a).

λT          ⟨x, y, z, t⟩
{84,80,64}  {⟨2, 0, 1, 0⟩, ⟨1, 2, 0, 1⟩, ⟨2, 2, 0, 1⟩}
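The cost messages of this example can be recomputed with a few lines of Python. The sketch follows the bucket order Y, X, Z, T of the example, tracks costs only (the argsort assignments are omitted), and treats each vector as a set of distinct values; the variable names are chosen here for illustration. A brute-force enumeration over all 36 assignments is included as a cross-check.

```python
from itertools import product

M = 3
f1 = {(0, 0): 2, (1, 0): 2, (0, 1): 5, (1, 1): 1, (0, 2): 4, (1, 2): 3}  # f1[z, x]
f2 = {(0, 0): 6, (1, 0): 7, (0, 1): 2, (1, 1): 4, (0, 2): 8, (1, 2): 2}  # f2[z, y]
f3 = {(0, 0): 1, (1, 0): 2, (0, 1): 4, (1, 1): 3}                        # f3[t, z]


def sorted_m(values):
    """The (at most) M best distinct values, best first."""
    return tuple(sorted(set(values), reverse=True)[:M])


# Buckets Y and X: one vector of m best values per value of z.
lam_Y = {z: sorted_m(f2[z, y] for y in range(3)) for z in range(2)}
lam_X = {z: sorted_m(f1[z, x] for x in range(3)) for z in range(2)}

# Bucket Z: combine f3 with both messages, then keep the m best over z.
lam_Z = {
    t: sorted_m(f3[t, z] * a * b
                for z in range(2) for a in lam_X[z] for b in lam_Y[z])
    for t in range(2)
}

# Bucket T: keep the m best over t.
sol = sorted_m(v for t in range(2) for v in lam_Z[t])

# Reference: enumerate every complete assignment (t, z, x, y).
brute = sorted_m(f3[t, z] * f1[z, x] * f2[z, y]
                 for t, z, x, y in product(range(2), range(2), range(3), range(3)))
```

The bucket for Z forms all $m^2$ products here; Section 4.3 describes how to avoid that.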
4.3 Processing a Bucket and the Complexity of elim-m-opt

We will next show that the messages computed in a bucket can be obtained more efficiently than through a brute-force application of $\otimes$ followed by $sort^m$. Consider processing $B_Z$ (see Fig. 2a). A brute-force computation of

$$\lambda_Z(t) = sort^m_z \big( \bar{f}_3(z, t) \otimes \lambda_Y(z) \otimes \lambda_X(z) \big)$$

for each t combines $f_3(z, t)$, $\lambda_Y(z)$ and $\lambda_X(z)$ for all $z \in D_Z$ first. This results in a vector function with scope {T, Z} having $m^2$ elements that we call candidate elements and denote by E(t, z). The second step is to apply $sort^m_z E(t, z)$, yielding the desired m best elements $\lambda_Z(t)$.

However, since $\lambda_Y(z)$ and $\lambda_X(z)$ can be kept sorted, we can generate only a small subset of these $m^2$ candidates as follows. We denote by $e_z^{i,j}(t)$ the candidate element obtained by the product of the scalar function value $f_3(t, z)$ with the i-th element of $\lambda_Y(z)$ and the j-th element of $\lambda_X(z)$, having cost $c_z^{i,j}(t) = f_3(t, z) \cdot \lambda_Y^i(z) \cdot \lambda_X^j(z)$. We would like to generate the candidates $e_z^{i,j}$ in decreasing order of their costs while taking their respective indices i and j into account.

The child elements of $e_z^{i,j}(t)$, $children(e_z^{i,j}(t))$, are obtained by replacing in the product either an element $\lambda_Y^i(z)$ with $\lambda_Y^{i+1}(z)$, or $\lambda_X^j(z)$ with $\lambda_X^{j+1}(z)$, but not both. This leads to a forest-like search graph whose nodes are the candidate elements, where each search subspace corresponds to a different value of z, denoted by $G_{Z=z}$ and rooted in $e_{Z=z}^{1,1}(t)$. Clearly, the cost along any path from a node to its descendants is non-increasing. The m best elements $\lambda_Z(t)$ can then be generated using a greedy best-first search across the forest search space $G_{Z=0} \cup G_{Z=1}$. It is easy to show that we do not need to keep more than m nodes on the OPEN list (the fringe of the search) at the same time. The general algorithm is described in Algorithm 4. The trace of the search for the elements of cost message $\lambda_Z(t = 1)$ for our running example is shown in Figure 3.

[Figure 3. Search forest with one tree per value of Z. Tree $G_{Z=0}$: root $e_{Z=0}^{1,1}$ (c = 80) with children of cost 64 and 60. Tree $G_{Z=1}$: root $e_{Z=1}^{1,1}$ (c = 63) with children of cost 42 and 36.]

Fig. 3. The explored search space for T = 1 and m = 3. The resulting message is $\lambda_Z(1) = \{80, 64, 63\}$.

Proposition 2 (complexity of bucket processing). Given a bucket of a variable X over scope S having j functions $\{\lambda_1, \ldots, \lambda_j\}$ of dimension m, where m is the number of best solutions sought and k bounds the domain size, the complexity of bucket processing is $O(k^{|S|} \cdot m \cdot j \log m)$, where |S| is the scope size of S.

Proof.
To generate each of the m solutions, the bucket processing routine removes the current best element from OPEN (in constant time), generates its j children and puts them on OPEN, while keeping the list sorted, which takes $O(\log(m \cdot j))$ per node, since the maximum length of OPEN is $O(m \cdot j)$. This yields time complexity of $O((m \cdot j) \cdot \log(m \cdot j))$ for all m solutions. The process needs to be repeated for each of the $O(k^{|S|})$ tuples, leading to overall complexity $O(k^{|S|} \cdot m \cdot j \log(m \cdot j))$. □

Algorithm 4. Bucket processing
Input: $B_X$ of variable X containing a set of ordered m-vector functions $\{\lambda_1(S_1, X), \ldots, \lambda_d(S_d, X)\}$
Output: m-vector function $\lambda_X(S)$, where $S = \cup_{i=1}^{d} S_i - X$.
for all $t \in D_S$ do
    for all $x \in D_X$ do
        OPEN ← OPEN ∪ $\{e_{X=x}^{1,\ldots,1}(t)\}$; Sort OPEN;
    end for
    for j ← 1 to m do
        n ← first element $e_{X=x}^{i_1,\ldots,i_d}(t)$ in OPEN. Remove n from OPEN;
        $\lambda_X^j(t)$ ← n; {the j-th element is selected}
        C ← children(n) = $\{e_{X=x}^{i_1,\ldots,i_r+1,\ldots,i_d}(t) \mid r = 1..d\}$;
        Insert each c ∈ C into OPEN maintaining order based on its computed value. Check for duplicates; retain the m best nodes in OPEN, discard the rest.
    end for
end for

Theorem 6 (complexity of elim-m-opt). Given a graphical model $(X, D, F, \otimes)$ having n variables, whose domain size is bounded by k, an ordering o with induced width $w^*$ and an operator $\Downarrow = \max$, the time complexity of elim-m-opt is $O(n k^{w^*} m \log m)$ and its space complexity is $O(m n k^{w^*})$.

Proof. Let $deg_i$ be the degree of the node corresponding to the variable $X_i$ in the bucket-tree. Each bucket $B_i$ contains $deg_i$ functions and at most $w^*$ different variables with largest domain size k. We can express the time complexity of computing a message between two buckets as $O(k^{w^*} m \cdot deg_i \log m)$ (Proposition 2), yielding the total time complexity of elim-m-opt of $O(\sum_{i=1}^{n} k^{w^*} m \cdot deg_i \log m)$. Assuming $deg_i \leq deg$ and since $\sum_{i=1}^{n} deg_i \leq 2n$, we get the total time complexity of $O(n m k^{w^*} \log m)$.

The space complexity is dominated by the size of the messages between buckets, each containing m costs-to-go for each of $O(k^{w^*})$ tuples. Having at most n such messages yields the total space complexity of $O(m n k^{w^*})$. □
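The best-first scheme of Algorithm 4 can be sketched with a priority queue. This is an illustration under stated assumptions: valuations are non-negative numbers combined by product, so costs never increase from a node to its children; OPEN is a binary heap from Python's `heapq` module with negated costs; and, unlike the last step of Algorithm 4, OPEN is not pruned to m nodes. The function name `m_best_products` and its argument layout are hypothetical.

```python
import heapq


def m_best_products(scalars, vectors_by_value, m):
    """Best-first generation of the m best candidate costs for one tuple t.

    scalars[x]          -- scalar factor for X = x (e.g. f3(t, z) for Z = z)
    vectors_by_value[x] -- list of d vectors, each sorted in decreasing order
    """
    open_list = []  # max-heap emulated with negated costs
    seen = set()    # duplicate check on (x, index tuple)

    def cost(x, idx):
        c = scalars[x]
        for vec, i in zip(vectors_by_value[x], idx):
            c *= vec[i]
        return c

    def push(x, idx):
        in_range = all(i < len(v) for v, i in zip(vectors_by_value[x], idx))
        if in_range and (x, idx) not in seen:
            seen.add((x, idx))
            heapq.heappush(open_list, (-cost(x, idx), x, idx))

    for x in scalars:  # one root e^{1,...,1} per value of X
        push(x, (0,) * len(vectors_by_value[x]))

    best = []
    while open_list and len(best) < m:
        neg_cost, x, idx = heapq.heappop(open_list)
        best.append(-neg_cost)
        for r in range(len(idx)):  # children: advance exactly one index
            push(x, idx[:r] + (idx[r] + 1,) + idx[r + 1:])
    return best


# Bucket Z of the running example: vectors are (lambda_Y(z), lambda_X(z)).
vectors = {0: [(8, 6, 2), (5, 4, 2)], 1: [(7, 4, 2), (3, 2, 1)]}
lam_Z_t1 = m_best_products({0: 2, 1: 3}, vectors, 3)  # f3(t=1, z)
lam_Z_t0 = m_best_products({0: 1, 1: 4}, vectors, 3)  # f3(t=0, z)
```

Only a handful of the 18 candidate products are ever generated for each t, in contrast to the brute-force combination followed by sorting.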
5 Mini-bucket Elimination for m-best Task We next extend the elim-m-opt to the mini-bucket scheme. We prove that the new algorithm computes an m-best bound on the set of m-best solutions of the original problem, and describe how the m-best bound can be used to tighten the bound on the best solution of an optimization task. 5.1 The Algorithm Definition Algorithm mbe-m-opt (Algorithm 5) is a straightforward extension of MBE to solve the m-best reasoning task, where the combination and marginalization operators are the ones defined over vector functions. The input of the algorithm is an optimization task P , and its output is a collection of bounds (i.e., an m-best bound (see Definition 13)) on the m best solutions of P .
Algorithm 5. mbe-m-opt algorithm
Input: An optimization task $P = (X, D, A, F, \otimes, \max)$; an ordering of variables $o = \{X_1, \ldots, X_n\}$; parameter z.
Output: Bounds on each of the m-best solution costs and the corresponding assignments for the expanded set of variables (i.e., node duplication).
Initialize: Generate an ordered partition of functions $\bar{f}(t) = \{f(t)\}$ into buckets $B_1, \ldots, B_n$, where $B_i$ contains all the functions whose highest variable in their scope is $X_i$ along o.
Backward:
for i ← n down to 1 (processing bucket $B_i$) do
    Partition functions in bucket $B_i$ into $\{Q_{i1}, \ldots, Q_{il}\}$, where each $Q_{ij}$ has no more than z variables.
    Generate cost messages $\lambda_{ij} = sort^m_{X_i} (\bigotimes_{f \in Q_{ij}} f)$
    Generate assignment using duplicate variables for each mini-bucket: $x_{ij} = argsort^m_{X_i} (\bigotimes_{f \in Q_{ij}} f)$, concatenate with relevant elements of the previously generated assignment messages
    Place each $\lambda_{ij}$ and $x_{ij}$ in the bucket of the largest-index variable in $var(\lambda_{ij})$
end for
Return: The set of all buckets, and the vector of m-best cost bounds in the first bucket.
Theorem 7 (mbe-m-opt bound). Given a maximization task P, mbe-m-opt computes an m-best upper bound on the m-best optimization task $P^m$.

The theorem is proved in Appendix C.

Theorem 8 (mbe-m-opt complexity). Given a maximization task P and an integer control parameter z, the time and space complexity of mbe-m-opt is $O(m n k^z \log(m))$ and $O(m n k^z)$, respectively, where k is the maximum domain size and n is the number of variables.

The theorem is proved in Appendix D.

5.2 Using the m-best Bound to Tighten the First-best Bound

Here is a simple, but quite fundamental observation: whenever upper or lower bounds are generated by solving a relaxed version of a problem, the relaxed problem’s solution set contains all the solutions to the original problem. We next discuss the ramification of this observation.

Proposition 3. Let P be an optimization problem, and let $\tilde{C} = \{\tilde{p}_1 \geq \tilde{p}_2 \geq \ldots \geq \tilde{p}_m\}$ be the m best solutions of P generated by mbe-m-opt. Let $p_{opt}$ be the optimal value of P, and let $j_0$ be the first index such that $\tilde{p}_{j_0} = p_{opt}$, or else we assign $j_0 = m + 1$. Then, if $j_0 > m$, $\tilde{p}_m$ is an upper bound on $p_{opt}$, which is as tight or tighter than all other $\tilde{p}_1, \ldots, \tilde{p}_{m-1}$. In particular, $\tilde{p}_m$ is tighter than the bound $\tilde{p}_1$.

Proof. Let $\tilde{C} = \{\tilde{p}_1 \geq \tilde{p}_2 \geq \ldots \geq \tilde{p}_{N_1}\}$ be the ordered set of valuations of all tuples over the relaxed problem (with duplicate variables). By the nature of any relaxation, $\tilde{C}$ must also contain all the probability values associated with solutions of the original problem P, denoted by $C = \{p_1 \geq \cdots \geq p_{N_2}\}$. Therefore, if $j_0$ is the first index such that $\tilde{p}_{j_0}$ coincides with $p_{opt}$, then clearly for all $i < j_0$, $p_{opt} \leq \tilde{p}_i$, with $\tilde{p}_{j_0 - 1}$ being the tightest upper bound. Also, when $j_0 > m$ we have $\tilde{p}_m \geq p_{opt}$. □

In other words, if $j_0 \leq m$, we already have the optimal value; otherwise we can use $\tilde{p}_m$ as our better upper bound. Such tighter bounds would be useful during search algorithms such as A*. It is essential therefore to decide efficiently whether a bound coincides with the exact optimal cost. Luckily, the nature of the MBE relaxation supplies us with an efficient decision scheme, since, as mentioned above, it is known that an assignment in which duplicates of variables take on identical values yields an exact solution.

Proposition 4. Given a set of bounds produced by mbe-m-opt, $\tilde{p}_1 \geq \tilde{p}_2 \geq \ldots \geq \tilde{p}_m$, deciding if $\tilde{p}_j = p_{opt}$ can be done in polynomial time, more specifically in O(nm) steps.

Proof. mbe-m-opt provides both the bounds on the m-best costs and, for each bound, a corresponding tuple maintaining assignments to duplicated variables. The first assignment from these m-best bounds (going from largest to smallest) corresponding to a tuple whose duplicate variables are assigned identical values is optimal. And if no such tuple is observed, the optimal value is smaller than $\tilde{p}_m$. Since the above tests require just O(nm) steps applied to m-best assignments already obtained in polytime, the claim follows. □
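The decision scheme in the proof of Proposition 4 amounts to a linear scan. The sketch below assumes a hypothetical data layout in which each relaxed solution maps every original variable to the tuple of values taken by its duplicates; the bounds and assignments shown are made-up illustrative data, not results from the paper.

```python
def tighten_bound(bounds, assignments):
    """Scan the m-best bounds from largest to smallest.

    bounds      -- relaxed costs, sorted in non-increasing order
    assignments -- for each bound, a dict {variable: tuple of duplicate values}
    Returns (value, is_exact).
    """
    for bound, assignment in zip(bounds, assignments):
        # A tuple is consistent when all duplicates of each variable agree.
        if all(len(set(copies)) == 1 for copies in assignment.values()):
            return bound, True   # first consistent tuple gives the optimum
    return bounds[-1], False     # no exact solution seen: last bound is tightest


# Hypothetical output of a mini-bucket run with m = 3 and a duplicated variable C.
bounds = [30, 27, 24]
assignments = [
    {"A": (0,), "B": (0,), "C": (1, 0)},  # duplicates of C disagree
    {"A": (0,), "B": (1,), "C": (1, 0)},  # duplicates of C disagree
    {"A": (1,), "B": (0,), "C": (0, 0)},  # consistent
]

value, is_exact = tighten_bound(bounds, assignments)
fallback = tighten_bound(bounds[:2], assignments[:2])
```

When no consistent tuple is found among the m relaxed solutions, the last bound is returned as the improved upper bound described in Proposition 3.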
6 Empirical Demonstrations

We evaluated the performance of mbe-m-opt on four sets of instances taken from the UAI 2008 competition [7] and compared our algorithm with the BMMF scheme [23].

6.1 Weighted Constraint Satisfaction Problems

The first part of our empirical evaluation assumed solving the Weighted CSP task, i.e., a summation-minimization problem. We ran mbe-m-opt on 20 WCSP instances using z-bound equal to 10 and number of solutions m equal to 10. Table 1 shows for each instance the time in seconds it took mbe-m-opt to solve the 10-best problem and the values of the lower bounds on each of the first ten best solutions. For each problem instance we also show the number of variables n, the largest domain size k and the induced width $w^*$. Note that 9 of the instances have induced width less than the z-bound of 10 and thus are solved exactly.

We see that as the index number of the solution goes up, the value of the corresponding lower bound increases, getting closer to the exact best solution. This demonstrates that there is a potential for improving the bound on the optimal assignment using the m-best bounds, as discussed in Sect. 5.2. Figure 4 illustrates this observation in graphical form, showing the dependency of the lower bounds on the solution index number for selected instances.
Table 1. The lower bounds on the 10 best solutions found by mbe-m-opt run with z-bound=10 and m = 10. We also report the runtime in seconds, number of variables n, induced width w∗ and largest domain size k. Columns 1 to 10 are the solution index numbers.
Instance           n  k   w∗    time           1           2           3           4           5           6           7           8           9          10
1502.uai         209  4    6    0.11  228.955109    228.9552  228.955292  228.955414  229.053192  229.053284  229.053406  229.053497  229.141693  229.141785
29.uai            82  4   14   55.17  147.556778  147.557236  147.924484  147.924942  148.188965  148.189423  148.556671  148.557129  148.924393   148.92485
404.uai          100  4   19    3.96  147.056229  148.001511  148.056122  149.001404  149.056015   149.05603  150.001297  150.001312  150.055923  151.001205
408.uai          200  4   35   80.27  436.551117   437.17923  437.549042  437.550018  437.550995  438.177155  438.178131  438.179108  438.547943  438.549896
42.uai           190  4   26   61.16   219.98053  219.980713  220.014938  220.015106  220.048157  220.048325   220.08255  220.082733  220.912811  220.912827
503.uai          143  4    9    3.58  225.038483  225.039368  225.039398  225.040283  226.037476  226.037918  226.037933  226.038361  226.038376  226.038391
GEOM30a_3.uai     30  3    6    0.03    0.008100    1.008000    2.007898    2.007899    2.007899    3.007798    3.007798    3.007798    3.007799    4.007700
GEOM30a_4.uai     30  4    6    0.19    0.008100    1.008000    2.007899    2.007899    3.007799    3.007799    4.007701    4.007701    4.007702    5.007601
GEOM30a_5.uai     30  5    6    0.84    0.008100    1.008000    2.007898    2.007899    2.007899    3.007798    3.007798    3.007798    3.007799    4.007700
GEOM40_2.uai      40  2    5       0    0.007800    2.007599    2.007599    2.007600    3.007499    3.007499    3.007500    4.007398    4.007399    4.007400
GEOM40_3.uai      40  3    5    0.01    0.007800    2.007599    2.007599    2.007600    3.007500    4.007399    4.007399      4.0074    4.007400    4.007401
GEOM40_4.uai      40  4    5    0.11    0.007800    2.007598    2.007599    2.007599    2.007599    2.007599    3.007499    3.007499    3.007500    4.007400
GEOM40_5.uai      40  5    5    0.16    0.007800    2.007599    2.007600    2.007600    3.007500    4.007399    4.007400    4.007400    4.007401    4.007401
le450_5a_2.uai   450  2  293    6.06    0.571400    1.571300    1.571300    1.571300    1.571300    1.571300    1.571300    2.571198    2.571199   20.569397
myciel5g_3.uai    47  3   19    6.39    0.023600    1.023500    1.023500    2.023399    3.023299    4.023202   10.022601   11.022501   11.022502   11.022503
myciel5g_4.uai    47  4   19  129.54    0.023600    1.023500    1.023500    2.023397    2.023398    2.023398    2.023398    2.023399    3.023297    3.023298
queen5_5_3.uai    25  3   18    5.53     0.01600    1.015900    1.015901    2.015797    2.015797    2.015798    3.015694    3.015696    3.015697    3.015697
queen5_5_4.uai    25  4   18  122.26     0.01600    1.015900    1.015900    1.015901    1.015901    2.015796    2.015797    2.015797    2.015797    2.015790
Fig. 4. The change in the cost of the j-th solution as j increases for chosen WCSP instances. Results are obtained by mbe-m-opt with z-bound=10.
6.2 Most Probable Explanation Problems
For the second part of the evaluation, mbe-m-opt solved the MPE problem, i.e., the max-product task, on three sets of instances: pedigrees, grids and mastermind. We searched for m ∈ [1, 5, 10, 20, 50, 100, 200] solutions with the z-bound equal to 10.
Pedigrees. The set of pedigrees contains 15 instances with several hundred variables and induced width from 15 to 30. Table 2 contains the runtimes in seconds for each number of solutions m, along with the parameters of the problems. Figure 5 presents the runtime in seconds against the number of solutions m for chosen pedigrees. Figure 6 demonstrates the difference between the way the runtime would scale
Bucket and Mini-bucket Schemes for M Best Solutions over Graphical Models
according to the theoretical worst-case performance analysis and the empirical runtimes obtained for various values of m. For three chosen instances we plot the experimental runtimes in seconds against the number of solutions m, together with the theoretical curve obtained by multiplying the empirical runtime for m = 1 by the factor m log m for m equal to 5, 10, 50, 100 and 200. We see that the empirical curve lies much lower than the theoretical one for all instances.
Figure 7 illustrates the potential usefulness of the upper bounds on the m best solutions as an approximation of the best solution. We plot in logarithmic scale the values of the upper bounds on the 100 best solutions found by mbe-m-opt for z-bounds ranging from 10 to 15. When using MBE as an approximation scheme, the common rule of thumb is to run the algorithm with the highest z-bound possible. In general, a higher z-bound indeed corresponds to better accuracy; however, increasing the parameter by a small amount (one or two) is not guaranteed to produce better results, as we can see in our example, where mbe-m-opt with z-bound=10 achieves better accuracy than the runs with z-bound=11 and z-bound=12. Such behaviour can be explained by the differences in the partitioning of the buckets into mini-buckets caused by changing the control parameter z, which greatly influences the accuracy of MBE results. On the other hand, the upper bound on each next solution is always at least as good as the previous one; thus an increase in m never leads to a worse bound and may produce a better one. However, we acknowledge that the power of mbe-m-opt with larger m for improving the upper bound on the 1st best solution is quite weak compared with using a higher z-bound.
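The construction of the theoretical curve can be reproduced with a short script. This is only a sketch: the base of the logarithm is not stated in the text, so base 2 is an assumption here, and pedigree1 (runtimes copied from Table 2) is picked purely for illustration and is not necessarily one of the instances plotted in Figure 6. The function name `theoretical_curve` is hypothetical.

```python
import math


def theoretical_curve(runtime_m1, ms, log_base=2.0):
    """Scale the m = 1 runtime by m * log(m) for each requested m.

    The logarithm base is an assumption of this sketch; the text only
    speaks of "the factor m log m".
    """
    return {m: runtime_m1 * m * math.log(m, log_base) for m in ms}


# Empirical runtimes (sec) of pedigree1 from Table 2, z-bound = 10.
empirical = {1: 0.22, 5: 0.57, 10: 1.01, 20: 1.46, 50: 3.35, 100: 6.87, 200: 25.46}
ms = [5, 10, 20, 50, 100, 200]
predicted = theoretical_curve(empirical[1], ms)

for m in ms:
    print(f"m={m:3d}  empirical={empirical[m]:7.2f}  scaled by m*log2(m)={predicted[m]:8.2f}")
```

For this instance every empirical runtime stays below the scaled value, which matches the qualitative observation made for Figure 6.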
Although theory suggests that the time and memory complexity of mbe-m-opt is exponential in the parameter z, while depending on the number of solutions only by a factor of m log m, our experiments show that in order to obtain a substantial improvement of the bound it might be necessary to use high values of m. For example, for a problem with binary variables, mbe-m-opt with m = 1 and a certain z-bound z is equivalent in terms of complexity to mbe-m-opt with m = 3 and z-bound (z − 1). We observed that the costs of the first and third solutions are quite close for the instances we considered. In order to characterize when the use of mbe-m-opt with higher m would add power over increasing the z-bound, the study of additional classes of instances is required.
Grids. The set of grids contains 30 instances with 100 to 2500 binary variables and tree-width from 12 to 50. The parameters of each instance can be seen in Table 3, which contains the runtimes in seconds for each value of the number of solutions m. Theory suggests that the runtimes for m = 1 and m = 100 should differ by at least two orders of magnitude; however, we can see that in practice mbe-m-opt scales much better. Figure 8 shows graphically the dependency of the runtime in seconds on the number of solutions m for 10 selected instances.
Mastermind. The mastermind set contains 15 instances with several thousand binary variables and tree-width ranging from 19 to 37. The instance parameters can be seen in Table 4, which shows how the run time changes with the number of best solutions m. We refrain from reporting and discussing the values of the upper bounds found, since the mastermind instances in question typically have a large set of solutions with the same costs, making the values of the bounds not particularly informative.
Table 2. Runtime (sec) of mbe-m-opt on pedigree instances searching for the following number of solutions: m ∈ [1, 5, 10, 20, 50, 100, 200] with the z-bound=10. We report the number of variables n, largest domain size k and induced width w∗.
Instance         n  k  w∗     m=1     m=5    m=10    m=20    m=50   m=100   m=200
pedigree1      334  4  15    0.22    0.57    1.01    1.46    3.35    6.87   25.46
pedigree13    1077  3  30    0.64    1.06    1.32    1.65    2.77    5.06   23.80
pedigree19     793  3  21    1.84    4.67    7.65   10.17   24.12   44.17  194.79
pedigree20     437  5  20    0.54    1.22    1.83    2.34    5.00    9.43   50.36
pedigree23     402  5  20    0.92    2.09    2.89    3.58    7.51   14.63   87.22
pedigree30    1289  5  20    0.38    0.66    1.00    1.26    2.48    4.58   19.53
pedigree31    1183  5  28    0.83    1.82    2.68    3.60    7.65   13.16   57.35
pedigree33     798  4  24    0.38    0.76    1.11    1.23    2.60    4.72   27.81
pedigree37    1032  5  20    1.64    3.27    4.56    6.25   14.15   26.43  158.74
pedigree38     724  5  16    4.52   11.77   19.63   28.87   73.21  127.65  552.30
pedigree39    1272  5  20    0.33    0.63    0.89    1.25    2.42    4.64   18.31
pedigree41    1062  5  28    1.45    3.33    4.43    5.56   11.67   20.59  120.79
pedigree51     871  5  39    0.76    1.24    1.65    2.16    3.98    6.97   33.95
pedigree7      867  4  32    0.66    1.17    1.61    2.15    4.45    8.01   39.26
pedigree9      935  7  27    0.85    1.48    2.12    2.77    5.70    9.49   50.58
Fig. 5. The run time (sec) for pedigree instances as a function of the number of solutions m. mbe-m-opt was run with z-bound=10.
6.3 Comparison with BMMF
BMMF [23] is a Belief Propagation based algorithm which is exact when run on junction trees and approximate if the problem graph has loops. We compared the performance of mbe-m-opt and BMMF on randomly generated 10 by 10 binary grids. The algorithms differ in the nature of their outputs: BMMF provides approximate solutions with no guarantees, while mbe-m-opt generates bounds on all the m-best solutions. Moreover, the runtimes of the algorithms are not comparable, since our algorithm is implemented in C and BMMF in Matlab, which is inherently slower. For most instances that mbe-m-opt can solve exactly in under a second, BMMF takes more than 5 minutes.
Fig. 6. The empirical and theoretical runtime scaling with number of solutions m for chosen pedigree instances. The theoretical curve is obtained by multiplying the experimental runtime in seconds obtained for m = 1 by the factor of m log m for values m = 5, 10, 20, 50, 100, 200.
Fig. 7. The upper bounds on the 100 best solutions (in log scale) found by mbe-m-opt run with z-bounds ∈ [10, 11, 12, 13, 14, 15] for the pedigree30 instance. The parameters of the problem: n=1289, k=5, w∗=20.
Still, some information can be learned from viewing the two algorithms side by side, as is demonstrated by typical results in Figure 10. For two chosen instances we plot the values of the 10-best bounds output by both algorithms in logarithmic scale as a function of the solution index. We also show the exact solutions found by the algorithm elim-m-opt. We can see that mbe-m-opt with the z-bound equal to 10 can produce upper bounds that are considerably closer to the exact solutions than the results output by BMMF. Admittedly, these experiments are quite preliminary and not conclusive.
Table 3. Binary grid instances: runtime (sec) of mbe-m-opt for the number of required solutions m ∈ [1, 5, 10, 20, 50, 100, 200] with the z-bound=10. We report the number of variables n and induced width w∗.
Instance      n  w∗     m=1     m=5    m=10    m=20    m=50   m=100   m=200
50-15-5     144  15    0.07    0.15    0.22    0.30    0.68    1.27    5.68
50-16-5     256  21    0.07    0.17    0.25    0.33    0.68    1.34    6.30
50-17-5     289  22    0.11    0.24    0.33    0.45    1.00    1.87    8.70
50-18-5     324  24    0.14    0.29    0.35    0.52    1.05    2.04    9.13
50-19-5     361  25    0.13    0.29    0.41    0.54    1.15    2.24    9.87
50-20-5     400  27    0.18    0.33    0.44    0.59    1.20    2.28   10.65
75-16-5     256  21    0.08    0.17    0.21    0.27    0.56    1.09    5.92
75-17-5     289  22    0.10    0.21    0.27    0.36    0.75    1.46    7.83
75-18-5     324  24    0.12    0.23    0.30    0.40    0.79    1.58    8.34
75-19-5     361  25    0.14    0.26    0.34    0.47    0.94    1.86    9.22
75-20-5     400  27    0.18    0.30    0.38    0.52    0.97    1.80    9.78
75-21-5     441  28    0.20    0.36    0.44    0.60    1.07    2.03   10.91
75-22-5     484  30    0.25    0.40    0.53    0.68    1.28    2.49   12.40
75-23-5     529  31    0.29    0.47    0.56    0.71    1.36    2.44   13.11
75-24-5     576  32    0.34    0.51    0.65    0.81    1.49    2.87   14.58
75-25-5     625  34    0.41    0.62    0.74    0.93    1.71    3.18   16.08
75-26-5     676  36    0.49    0.73    0.90    1.17    2.06    3.86   19.06
90-20-5     400  27    0.17    0.27    0.35    0.44    0.81    1.57    9.26
90-21-5     441  28    0.02    0.35    0.41    0.52    0.97    1.91   10.72
90-22-5     484  30    0.25    0.41    0.47    0.61    1.10    2.08   11.85
90-23-5     529  31    0.29    0.46    0.55    0.66    1.17    2.27   12.63
90-24-5     576  33    0.34    0.49    0.60    0.74    1.36    2.61   13.98
90-25-5     625  34    0.42    0.58    0.70    0.83    1.50    2.80   15.26
90-26-5     676  36    0.49    0.71    0.85    1.01    1.87    3.42   18.36
90-30-5     900  42    0.93    1.25    1.40    1.59    2.60    4.62   24.26
90-34-5    1156  48    1.69    2.07    2.29    2.60    4.15    6.77   32.93
90-38-5    1444  55    2.86    3.26    3.57    3.98    5.72    9.27   41.33
90-42-5    1764  60    4.57    5.10    5.49    5.88    8.32   12.31   50.70
90-46-5    2116  68    6.81    7.42    7.97    8.33   11.09   16.06   64.88
90-50-5    2500  74    11.3   12.07   12.51    13.2   16.25   22.09   78.70
Table 4. The runtime (sec) of mbe-m-opt for the mastermind instances. Number of required solutions m ∈ [1, 5, 10, 20, 50, 100, 200], z-bound=10. We report the number of variables n, induced width w∗, domain size k.
Instance                       n  k  w∗     m=1     m=5    m=10    m=20    m=50   m=100   m=200
mastermind_03_08_03-0006    1220  2  19    0.44    0.57    0.64    0.72    1.03    2.68   13.11
mastermind_03_08_03-0007    1220  2  18    0.25    0.33    0.35    0.40    0.67    1.81    8.90
mastermind_03_08_03-0014    1220  2  20    0.68    0.84    0.92    0.98    1.51    3.76   17.77
mastermind_03_08_04-0004    2288  2  30    1.98    2.25    2.28    2.49    3.42    7.11   32.12
mastermind_03_08_04-0005    2288  2  30    1.92    2.20    2.35    2.50    3.44    7.16   32.08
mastermind_03_08_04-0010    2288  2  29    2.53    2.82    2.97    3.09    4.17    8.25   34.89
mastermind_03_08_04-0011    2288  2  29    3.55    3.85    4.00    4.16    5.40    9.90   38.48
mastermind_03_08_05-0001    3692  2  37    6.33    6.73    7.02    7.24    9.10   15.83   59.03
mastermind_03_08_05-0005    3692  2  37    6.33    6.85    7.04    7.29    9.08    15.6   58.77
mastermind_03_08_05-0009    3692  2  37    3.44    3.72    3.81    3.97    5.18    9.59   38.62
mastermind_03_08_05-0010    3692  2  37    6.23    6.57    6.90    7.10    8.80   14.87   56.43
mastermind_04_08_03-0000    1418  2  24    1.12    1.30    1.41    1.49    2.16    4.51   20.33
mastermind_04_08_03-0013    1418  2  23    1.12    1.33    1.43    1.51    2.19    4.60   20.73
mastermind_05_08_03-0004    1616  2  27    1.22    1.43    1.47    1.57    2.22    4.60   20.39
mastermind_05_08_03-0006    1616  2  27    0.21    0.23    0.25    0.26    0.37    0.82    3.84
Fig. 8. Selected binary grid instances: mbe-m-opt run time (sec) as a function of number of solutions m. The z-bound=10.
7 Related Work: The m-best Algorithms
The previous works on finding the m best solutions describe either exact or approximate schemes. The exact algorithms can be characterized as inference based or search based, as we elaborate next.
Earlier exact inference schemes. As mentioned above, one of the most influential works among the algorithms solving the m-best combinatorial optimization problems is the widely applicable iterative scheme developed by Lawler [18]. Given a problem with n variables, the main idea of Lawler’s approach is to find the best solution first and then to formulate n new problems that exclude the best solution found, but include all others. Each one of the new problems is solved optimally, yielding n candidate solutions, the best among which becomes the overall second best solution. The procedure is repeated until m best solutions are found. The complexity of the algorithm is O(nmT(n)), where T(n) is the complexity of finding a single best solution.
Hamacher and Queyranne [15] built upon Lawler’s work and presented a method that assumes the ability to directly find both the best and second best solutions to a problem. After finding the two best solutions, a new problem is formulated, so that the second best solution to the original problem is the best solution to the new one. The second best solution for the new problem is found, to become the overall third best solution, and the procedure is repeated until all m solutions are found. The time complexity of the algorithm is O(m · T_2(n)), where T_2(n) is the complexity of finding the second best solution to the problem. The complexity of this method is always bounded from above by that of Lawler, seeing as Lawler’s scheme can be used as an algorithm for finding the second best solution.
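Lawler's partitioning idea described above can be sketched as follows. This is an illustrative toy for a minimization task, not code from any of the cited works: the single-best solver is a brute-force stand-in for the T(n) oracle, and the names `lawler_m_best` and `solve`, as well as the cost function, are assumptions of this example. Note also that the sketch enumerates distinct assignments, whereas the m-best task in this paper considers solutions with different valuations.

```python
import heapq
from itertools import product


def lawler_m_best(cost, domains, m):
    """Return the m lowest-cost assignments as (cost, assignment) pairs."""
    n = len(domains)

    def solve(prefix, banned):
        """Best x with x[:len(prefix)] == prefix and x[len(prefix)] not in banned.

        Brute force; in practice this would be any exact single-best solver.
        """
        p = len(prefix)
        best = None
        for tail in product(*domains[p:]):
            if tail and tail[0] in banned:
                continue
            x = prefix + tail
            c = cost(x)
            if best is None or c < best[0]:
                best = (c, x)
        return best

    heap = []
    first = solve((), frozenset())
    if first is not None:
        heapq.heappush(heap, (first[0], first[1], (), frozenset()))

    results = []
    while heap and len(results) < m:
        c, x, prefix, banned = heapq.heappop(heap)
        results.append((c, x))
        # Partition the remaining solutions of this subproblem: child i keeps
        # x[0..i-1] as in x and forbids the value x[i] for variable i.
        for i in range(len(prefix), n):
            inherited = banned if i == len(prefix) else frozenset()
            child_banned = inherited | {x[i]}
            child = solve(x[:i], child_banned)
            if child is not None:
                heapq.heappush(heap, (child[0], child[1], x[:i], child_banned))
    return results


# Toy problem: three variables with small domains and a made-up cost function.
domains = [range(2), range(3), range(2)]
cost = lambda x: 3 * x[0] + 2 * x[1] + 5 * x[2] + x[0] * x[1]
top5 = lawler_m_best(cost, domains, 5)
```

Each extracted solution triggers at most n calls to the single-best solver, which mirrors the O(nmT(n)) bound quoted above.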
Lawler’s approach was applied by Nilsson to a join-tree [19]. Unlike Lawler or Hamacher and Queyranne, who solve n problems from scratch in each iteration of the algorithm, Nilsson is able to utilize the results of previous computations for solving newly formulated problems. The worst-case complexity of the algorithm is O(m · T(n)), where T(n) is the complexity of finding a single solution by the max-flow algorithm. If applied to a bucket-tree, Nilsson’s algorithm has a run time of O(nk^{w∗} + mn log(mn) + mnk).
Fig. 9. mbe-m-opt run time (sec) as a function of number of solutions m for the mastermind instances. The z-bound=10.
Fig. 10. Comparison of mbe-m-opt with z-bound 10 and BMMF on random 10x10 grids. The exact solutions are obtained by elim-m-opt. mbe-m-opt provides upper bounds on the solutions, while BMMF gives no guarantee as to whether it outputs an upper or a lower bound. In this particular example BMMF outputs lower bounds on the exact solutions.
More recently, Yanover and Weiss [23] extended Nilsson’s idea to the max-product Belief Propagation algorithm, yielding a belief propagation approximation scheme for loopy graphs, called BMMF, which also finds solutions iteratively and which we compared against. At each iteration BMMF uses loopy Belief Propagation to solve two new problems obtained by restricting the values of certain variables. When applied to a junction tree, it can function as an exact algorithm with complexity O(mnk^{w∗}).
Two algorithms based on dynamic programming, similar to elim-m-opt, were developed by Seroussi and Golmard [21] and Elliott [12]. Unlike the previously mentioned works, Seroussi and Golmard do not find solutions iteratively from the 1st down to the m-th, but extract the m solutions directly, by propagating the m best partial solutions along a pre-compiled junction tree. Given a junction tree with p cliques, each having at most deg children, the complexity of the algorithm is O(m² p · k^{w∗} deg). Nilsson showed that for most problem configurations his algorithm is superior to the one by Seroussi and Golmard. Elliott [12] explored the representation of Valued And-Or Acyclic Graph, i.e., smooth deterministic decomposable negation normal form (sd-DNNF) [6]. He propagates the m best partial assignments to the problem variables along the DNNF structure, which is pre-compiled as well. The complexity of Elliott’s algorithm is O(nk^{w∗} m log(m · deg)), excluding the cost of constructing the sd-DNNF.
Earlier search schemes. The task of finding m best solutions is closely related to the problem of k shortest paths (KSP), which is usually solved using search. It is known that many optimization problems can be transformed into problems of finding a path in a graph.
For example, the task of finding the lowest cost solution to a weighted constraint satisfaction problem can be represented as a search for a shortest path in a graph whose vertices correspond to assignments of the original problem variables and whose edge lengths are chosen according to the cost functions of the constraint problem. A good survey of different k shortest path algorithms can be found in [5] and [13]. The majority of the algorithms developed for solving KSP assume that the entire search graph is available as an input and thus are not directly applicable to the tasks formulated over graphical models, since for most of them storing the search graph explicitly is infeasible. One very recent exception is the work by Aljazzar and Leue [2]. Their method, called K∗, finds the k shortest paths while generating the search graph "on-the-fly" and thus can be potentially useful for solving problems defined over graphical models. Assuming application to an AND/OR search graph [10] and given a consistent heuristic, K∗ yields asymptotic worst-case time and space complexity of O(n · k^{w∗} · w∗ log(nk) + m).
In a recent paper [9] we proposed two new algorithms, m-A* and m-B&B, that extend best-first and branch-and-bound search, respectively, to finding the m best solutions, and their modifications for graphical models: m-AOBF and m-AOBB. We showed that m-A* is optimally efficient compared to any other algorithm that searches the same search space using the same heuristic function. The theoretical worst-case time complexity for m-AOBF is O(n · m · k^{w∗}) and for m-AOBB is O(n · deg · m log m · k^{w∗}). However, we showed that the worst-case analysis does not provide an accurate picture of the algorithms’ performance and that in practice they are in most cases considerably more efficient.
We also presented BE-Greedy-m-BF, a hybrid of variable elimination and best-first search. The idea behind the method is to use the Bucket Elimination algorithm to calculate the costs of the best path from any node to the goal and to use this information as an exact heuristic for A* search. BE-Greedy-m-BF has time and space complexity of O(nk^{w∗} + nm) and, unlike K∗, our scheme does not require complex data structures or precomputed heuristics.
Earlier approximation schemes. In addition to BMMF, another extension of Nilsson’s and Lawler’s idea that yields an approximation scheme is an algorithm called STRIPES by Fromer and Globerson [14]. They focus on the m-MAP problem over binary Markov networks, solving each new subproblem by an LP relaxation. The algorithm solves the task exactly if the solutions to all LP relaxations are integral, and provides an upper bound on each of the m MAP assignments otherwise. In contrast, our algorithm mbe-m-opt can compute bounds over any graphical model (not only binary) and over a variety of m-best optimization tasks.
Other related works. Very recently, Brafman et al. [4] studied the computational complexity of computing the next solution in some graphical models, such as constraint and preference-based networks. They showed that the complexity of this task depends on the structure of the graphical model and on the strict order imposed over its solutions. It is easy to see that our m-best task can be solved by iteratively finding the next solution until m solutions with different valuations have been found. However, since our m-best task defines a partial order over solutions and only considers solutions with different valuations, further study is needed to determine whether the tractability of our problem is the same as that of the problem of finding the next solution.
8 Conclusions
We presented a formulation of the m-best reasoning task within a semiring framework, thus making all existing inference and search algorithms immediately applicable to the task via the definition of the combination and elimination operators. We then focused on inference algorithms and provided a bucket elimination algorithm, elim-m-opt, for the task. An analysis of the algorithm’s performance and of its relation to earlier work is provided. We emphasize that the practical significance of the algorithm is primarily for approximation through the mini-bucket scheme, since other exact schemes have better worst-case performance. Furthermore, it could also lead to loopy propagation message-passing schemes, which are highly popular for approximations in graphical models. For example, elim-m-opt can be extended into a loopy max-prod for the m-best task, which would differ from the approach by Yanover and Weiss that uses loopy max-prod for solving a sequence of optimization problems in the style of Lawler’s approach. Our empirical analysis demonstrates that mbe-m-opt scales as a function of m better than the worst-case analysis predicts. Comparison with other exact and approximation algorithms is left for future work.
Acknowledgments. This work was partially supported by NSF grants IIS-0713118 and IIS-1065618, NIH grant 5R01HG004175-03, and Spanish CECyT project TIN200913591-C02-0.
A Proof of Theorem 3
Let S, T, R be arbitrary elements of A^m. We prove the required conditions one by one.
– Commutativity of ⊗^m. By definition, S ⊗^m T = Sorted^m{a ⊗ b | a ∈ S, b ∈ T}. Since ⊗ is commutative, the previous expression is equal to Sorted^m{b ⊗ a | b ∈ T, a ∈ S} = T ⊗^m S.
– Associativity of ⊗^m. We have to prove that (S ⊗^m T) ⊗^m R = S ⊗^m (T ⊗^m R). Suppose that the previous equality does not hold. Then it would imply that:
  i. there exists an element a ∈ (S ⊗^m T) ⊗^m R such that a ∉ S ⊗^m (T ⊗^m R); or,
  ii. there exists an element a ∈ S ⊗^m (T ⊗^m R) such that a ∉ (S ⊗^m T) ⊗^m R.
  We show that both cases are impossible. Consider the first case. Let {a_1, ..., a_m} = S ⊗^m (T ⊗^m R), where a_i > a_m for all 1 ≤ i < m. Since a ∉ S ⊗^m (T ⊗^m R), it means that a_m > a. Element a comes from the combination of three elements, a = (s ⊗ t) ⊗ r. Each element a_i comes from the combination of three elements, a_i = s_{a_i} ⊗ (t_{a_i} ⊗ r_{a_i}). By associativity of operator ⊗, a_i = (s_{a_i} ⊗ t_{a_i}) ⊗ r_{a_i}. Then,
  • If s_{a_i} ⊗ t_{a_i} ∈ S ⊗^m T for all 1 ≤ i ≤ m, then (s_{a_i} ⊗ t_{a_i}) ⊗ r_{a_i} > (s ⊗ t) ⊗ r for all 1 ≤ i ≤ m, and a ∉ (S ⊗^m T) ⊗^m R, which contradicts the hypothesis.
  • If s_{a_j} ⊗ t_{a_j} ∉ S ⊗^m T for some 1 ≤ j ≤ m, then there exists an element s′ ⊗ t′ > s_{a_j} ⊗ t_{a_j}. By monotonicity of >, (s′ ⊗ t′) ⊗ r_{a_j} > (s_{a_j} ⊗ t_{a_j}) ⊗ r_{a_j}. As a consequence, (s_{a_m} ⊗ t_{a_m}) ⊗ r_{a_m} ∉ (S ⊗^m T) ⊗^m R. Since a_m > a, then a ∉ (S ⊗^m T) ⊗^m R, which contradicts the hypothesis.
  The proof for the second case is the same as above, but interchanging the roles of a and {a_1, ..., a_m}, and of S and R.
– Commutativity of sort^m. By definition, sort^m{S, T} = Sorted^m{S ∪ T}. Since set union is commutative, Sorted^m{S ∪ T} = Sorted^m{T ∪ S}, which is by definition sort^m{T, S}.
– Associativity of sort^m. By definition, sort^m{sort^m{S, T}, R} = Sorted^m{Sorted^m{S ∪ T} ∪ R}, and sort^m{S, sort^m{T, R}} = Sorted^m{S ∪ Sorted^m{T ∪ R}}. Clearly, the two expressions are equivalent to Sorted^m{S ∪ T ∪ R}.
– ⊗^m distributes over sort^m. Let us proceed by induction:
  1. Base case. When m = 1, by Proposition 1, the valuation structure (A^m, ⊗^m, sort^m) is a semiring and, as a consequence, ⊗^m distributes over sort^m.
  2. Inductive step. Suppose that, up to m, operator ⊗^m distributes over sort^m, and let {a_1, ..., a_m} be its result. We have to prove that S ⊗^{m+1} (sort^{m+1}{T, R}) = sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}. By definition of the operators, the result is the same ordered set of elements {a_1, ..., a_m} plus one element a_{m+1}. Suppose that ⊗^{m+1} does not distribute over sort^{m+1}. Then it would imply that:
    i. element a_{m+1} ∈ S ⊗^{m+1} (sort^{m+1}{T, R}), but a_{m+1} ∉ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}; or,
    ii. element a_{m+1} ∉ S ⊗^{m+1} (sort^{m+1}{T, R}), but a_{m+1} ∈ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}.
We show that both cases are impossible. Consider the first case. Since a_{m+1} ∉ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}, it means that ∃a′ ∈ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R} such that a′ > a_{m+1}. Element a′ comes from the combination of two elements, a′ = s′ ⊗ u′, where s′ ∈ S and u′ ∈ T or u′ ∈ R. Then:
• If u′ ∈ sort^{m+1}{T, R}, then since a′ > a_{m+1}, by definition of ⊗^{m+1}, a_{m+1} ∉ S ⊗^{m+1} (sort^{m+1}{T, R}), which contradicts the hypothesis.
• If u′ ∉ sort^{m+1}{T, R}, then ∃u′′ ∈ sort^{m+1}{T, R} such that u′′ > u′. By monotonicity of the order, u′′ ⊗ s′ > u′ ⊗ s′ and, by transitivity, u′′ ⊗ s′ > a_{m+1}. By definition of ⊗^{m+1}, a_{m+1} ∉ S ⊗^{m+1} (sort^{m+1}{T, R}), which contradicts the hypothesis.
Consider now the second case. Since a_{m+1} ∉ S ⊗^{m+1} (sort^{m+1}{T, R}), it means that ∃a′ ∈ S ⊗^{m+1} (sort^{m+1}{T, R}) such that a′ > a_{m+1}. Element a′ comes from the combination of two elements, a′ = s′ ⊗ u′, where s′ ∈ S and u′ ∈ T or u′ ∈ R. Then:
• If u′ ∈ T:
  ∗ and a′ ∈ S ⊗^{m+1} T. If a_{m+1} ∈ S ⊗^{m+1} T, since a′ > a_{m+1} and by definition of ⊗^{m+1}, a_{m+1} ∉ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}, which contradicts the hypothesis. If a_{m+1} ∉ S ⊗^{m+1} T, since a′ > a_{m+1} and by definition of sort^{m+1}, a_{m+1} ∉ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}, which contradicts the hypothesis.
  ∗ and a′ ∉ S ⊗^{m+1} T. Then, ∃a′′ ∈ S ⊗^{m+1} T such that a′′ > a′. By transitivity of the order, a′′ > a_{m+1}. Then, either by definition of ⊗^{m+1} or by definition of sort^{m+1}, a_{m+1} ∉ sort^{m+1}{S ⊗^{m+1} T, S ⊗^{m+1} R}, which contradicts the hypothesis.
• If u′ ∈ R. The reasoning is the same as above, but interchanging the roles of T and R. ⊓⊔
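The properties established in the proof of Theorem 3 can be checked on small inputs. The sketch below is an illustration under stated assumptions: ⊗ is taken to be multiplication over positive integers (so that products are exact and the order is strictly monotone), larger values are better, and the function names `sorted_m`, `combine_m` and `sort_m` are hypothetical stand-ins for Sorted^m, ⊗^m and sort^m.

```python
from itertools import product


def sorted_m(values, m):
    """Sorted^m: the m largest distinct values, in decreasing order."""
    return tuple(sorted(set(values), reverse=True)[:m])


def combine_m(S, T, m):
    """S ⊗^m T, with ⊗ assumed to be integer multiplication."""
    return sorted_m((a * b for a, b in product(S, T)), m)


def sort_m(S, T, m):
    """sort^m{S, T}: the m best elements of the union of S and T."""
    return sorted_m(set(S) | set(T), m)


m = 3
S, T, R = (6, 3, 2), (5, 4, 1), (7, 2, 1)

commutative = combine_m(S, T, m) == combine_m(T, S, m)
associative = combine_m(combine_m(S, T, m), R, m) == combine_m(S, combine_m(T, R, m), m)
distributive = combine_m(S, sort_m(T, R, m), m) == sort_m(combine_m(S, T, m), combine_m(S, R, m), m)
```

A check on one example is of course not a proof; it only makes the definitions concrete. Integers are used instead of floating-point numbers because floating-point multiplication is not exactly associative.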
B Proof of Theorem 4
By definition of sort^m,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = Sorted^m { ∪_{t ∈ D_X} ( ⊗^m_{f ∈ F^m} f(t) ) }
By definition of F^m,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = Sorted^m { ∪_{t ∈ D_X} ( ⊗^m_{f ∈ F} {f(t)} ) }
Since all {f(t)} are singletons, {f(t)} ⊗^m {g(t)} = {f(t) ⊗ g(t)}. Then,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = Sorted^m { ∪_{t ∈ D_X} { ⊗_{f ∈ F} f(t) } }
By definition of C,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = Sorted^m { ∪_{t ∈ D_X} {C(t)} }
By definition of the set union,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = Sorted^m { {C(t) | t ∈ D_X} }
By definition of the set of ordered m-best elements,
  sort^m_X { ⊗^m_{f ∈ F^m} f } = {C(t_1), ..., C(t_m)}. ⊓⊔
C Proof of Theorem 7
Let C^m = {C(t_1), ..., C(t_m)} be the m-best solutions of P. Let P̃ be the relaxed version of P solved by mbe-m-opt, and let C̃^m = {C̃(t′_1), ..., C̃(t′_m)} be its m-best solutions. We have to prove that (i) C̃^m is an m-best upper bound of C^m; and (ii) mbe-m-opt(P) computes C̃^m.
i. It is clear that C̃^m = Sorted^m{C^m ∪ W}, where W is the set of solutions for which duplicated variables are assigned different domain values. Therefore, by definition, C̃^m is an m-best bound of C^m.
ii. As shown in Theorem 5, elim-m-opt(P̃) computes C̃^m, and by definition of mini-bucket elimination, elim-m-opt(P̃) = mbe-m-opt(P). Therefore, mbe-m-opt(P) computes C̃^m. ⊓⊔
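Part (i) of the proof rests on the fact that taking the m best elements of a superset can only raise each of the m values. A minimal numeric illustration for a maximization task follows; the cost values are made up for this example and `sorted_m` is a hypothetical stand-in for Sorted^m.

```python
def sorted_m(values, m):
    """The m largest distinct values in decreasing order (maximization task)."""
    return sorted(set(values), reverse=True)[:m]


# Made-up costs: consistent solutions of P, and extra solutions W of the
# relaxed problem in which duplicated variables take different values.
exact_costs = [0.20, 0.15, 0.12, 0.05]
inconsistent_costs = [0.30, 0.18, 0.01]

m = 3
C_m = sorted_m(exact_costs, m)
C_tilde_m = sorted_m(exact_costs + inconsistent_costs, m)

# The j-th relaxed value is never below the j-th exact value.
dominates = all(upper >= exact for upper, exact in zip(C_tilde_m, C_m))
```

Here the relaxed list is [0.30, 0.20, 0.18] against the exact [0.20, 0.15, 0.12], so each relaxed value is an upper bound on the exact value with the same index.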
D Proof of Theorem 8
Given a control parameter z, each mini-bucket contains at most z variables. Let deg_i be the number of functions in the bucket B_i of variable X_i, i.e., the degree of the node in the original bucket tree. Let l_i be the number of mini-buckets created from B_i and let mini-bucket Q_{ij} contain deg_{ij} functions, where Σ_{j=1}^{l_i} deg_{ij} = deg_i. The time complexity of computing a message between two mini-buckets is bounded by O(k^z m · deg_{ij} log m) (Proposition 2), and the complexity of computing all messages in mini-buckets created out of B_i is O(Σ_{j=1}^{l_i} k^z m · deg_{ij} log m) = O(k^z m · deg_i log m). Taking into account that Σ_{i=1}^{n} deg_i ≤ 2n, we obtain the total runtime complexity of mbe-m-opt of O(Σ_{i=1}^{n} k^z m · deg_i log m) = O(nmk^z log m). ⊓⊔
References
1. Aji, S.M., McEliece, R.J.: The generalized distributive law. IEEE Transactions on Information Theory 46(2), 325–343 (2000)
2. Aljazzar, H., Leue, S.: A heuristic search algorithm for finding the k shortest paths. Artificial Intelligence 175, 2129–2154 (2011)
3. Bistarelli, S., Fargier, H., Montanari, U., Rossi, F., Schiex, T., Verfaillie, G.: Semiring-based CSPs and valued CSPs: Basic properties and comparison. Over-Constrained Systems, 111–150 (1996)
4. Brafman, R.I., Pilotto, E., Rossi, F., Salvagnin, D., Venable, K.B., Walsh, T.: The next best solution. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA (2011)
5. Brander, A.W., Sinclair, M.C.: A comparative study of k-shortest path algorithms. In: Proceedings 11th UK Performance Engineering Workshop for Computer and Telecommunications Systems, pp. 370–379 (1995)
6. Darwiche, A.: Decomposable negation normal form. Journal of the ACM (JACM) 48(4), 608–647 (2001)
7. Darwiche, A., Dechter, R., Choi, A., Gogate, V., Otten, L.: Results from the probabilistic inference evaluation of UAI 2008. In: UAI Applications Workshop (2008), a web-report in http://graphmod.ics.uci.edu/uai08/Evaluation/Report
8. Dechter, R.: Bucket elimination: A unifying framework for reasoning. Artificial Intelligence 113(1), 41–85 (1999)
9. Dechter, R., Flerova, N.: Heuristic search for m best solutions with applications to graphical models. In: 11th Workshop on Preferences and Soft Constraints, p. 46 (2011)
10. Dechter, R., Mateescu, R.: AND/OR search spaces for graphical models. Artificial Intelligence 171(2-3), 73–106 (2007)
11. Dechter, R., Rish, I.: Mini-buckets: A general scheme for bounded inference. Journal of the ACM (JACM) 50(2), 107–153 (2003)
12. Elliott, P.H.: Extracting the K Best Solutions from a Valued And-Or Acyclic Graph. Master’s thesis, Massachusetts Institute of Technology (2007)
13. Eppstein, D.: Finding the k shortest paths. In: Proceedings 35th Symposium on the Foundations of Computer Science, pp. 154–165. IEEE Comput. Soc. Press (1994)
14. Fromer, M., Globerson, A.: An LP View of the M-best MAP problem. In: Advances in Neural Information Processing Systems, vol. 22, pp. 567–575 (2009)
15. Hamacher, H.W., Queyranne, M.: K best solutions to combinatorial optimization problems. Annals of Operations Research 4(1), 123–143 (1985)
16. Kask, K., Dechter, R., Larrosa, J., Dechter, A.: Unifying cluster-tree decompositions for automated reasoning. Artificial Intelligence Journal (2005)
17. Kohlas, J., Wilson, N.: Semiring induced valuation algebras: Exact and approximate local computation algorithms. Artif. Intell. 172(11), 1360–1399 (2008)
18. Lawler, E.L.: A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Management Science 18(7), 401–405 (1972)
19. Nilsson, D.: An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. Statistics and Computing 8(2), 159–173 (1998)
20. Pearl, J.: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann (1988)
21. Seroussi, B., Golmard, J.L.: An algorithm directly finding the K most probable configurations in Bayesian networks. International Journal of Approximate Reasoning 11(3), 205–233 (1994)
22. Shafer, G.R., Shenoy, P.P.: Probability propagation. Annals of Mathematics and Artificial Intelligence 2, 327–352 (1990)
23. Yanover, C., Weiss, Y.: Finding the M Most Probable Configurations Using Loopy Belief Propagation. In: Advances in Neural Information Processing Systems 16. The MIT Press (2004)
Supporting Argumentation Systems by Graph Representation and Computation

Jérôme Fortin1, Rallou Thomopoulos2, Jean-Rémi Bourguet3, and Marie-Laure Mugnier1

1 Université Montpellier II, place Eugène Bataillon, 34095 MONTPELLIER CEDEX 5, France [email protected], [email protected]
2 INRA/IATE, 2 pl Pierre Viala, 34000 MONTPELLIER, France [email protected]
3 Université Montpellier III, route de Mende, 34090 MONTPELLIER, France [email protected]
Abstract. Argumentation is a reasoning model based on arguments and on attacks between arguments. It consists in evaluating the acceptability of arguments, according to a given semantics. Due to its generality, Dung’s framework for abstract argumentation systems, proposed in 1995, is a reference in the domain. Argumentation systems are commonly represented by graph structures, where nodes and edges respectively represent arguments and attacks between arguments. However, beyond this graphical support, graph operations have not been considered as reasoning tools in argumentation systems. This paper proposes a conceptual graph representation of an argumentation system and a computation of argument acceptability relying on conceptual graph default rules.
1 Introduction

Argumentative reasoning is based on the construction and the evaluation of interacting arguments. Most of the existing argumentation models are grounded on the abstract argumentation framework proposed by Dung in [11]. This framework consists of a set of arguments and a binary relation on this set, expressing conflicts among arguments. Argumentation systems are commonly represented by graph structures, where nodes represent arguments and edges represent attacks between arguments. Two main tools are currently available for argument visualization: Araucaria [1] and Carneades [8]. These tools facilitate the visual display of arguments and in particular their structure (e.g., premises and conclusion); however, they are not reasoning tools. This paper deals with the reasoning issue. Moreover, it works at a different level: it does not focus on the structure of an argument, but on the representation of a whole base of arguments and on the computation of maximal acceptable sets of arguments from it.

The chosen formalism is the conceptual graph (CG) model [16], as developed in [10]. Indeed, it is the only Artificial Intelligence formalism that combines a graphical representation and graph-based operations, together with an equivalent logical interpretation, providing a well-founded graphical and logical reasoning formalism. A recent study [13] has considered the representation of “argument maps” as conceptual graphs. It designs an architecture that translates argument maps into conceptual graphs, visualizes the arguments put forward by different actors and retrieves them through queries. The authors present the advantages of using conceptual graphs, and highlight that, beyond this first step, further reasoning features remain to be explored. In this paper, we focus on this reasoning issue. Non-monotonic reasoning is used as a computational tool to determine maximal acceptable sets of arguments. It is instantiated by conceptual graph defaults [2,3].

The paper is organized as follows. Section 2 presents the background on argumentation systems and conceptual graphs. Section 3 introduces a representation of argumentation systems in the conceptual graph formalism. Section 4 proposes a way of computing four kinds of acceptable sets of arguments, namely the naive, the admissible, the preferred, and the stable sets, according to the semantics of argumentation systems.

M. Croitoru et al. (Eds.): GKR 2011, LNAI 7205, pp. 119–136, 2012. © Springer-Verlag Berlin Heidelberg 2012
2 Background

This section introduces fundamental notions of argumentation systems and the basic conceptual graph formalism as well as one of its extensions, namely CG defaults.

2.1 Argumentation Systems

Three main steps can be distinguished in an argumentation process:
1. constructing arguments and counter-arguments,
2. evaluating the collective acceptability of sets of arguments, and
3. drawing justified conclusions.

In [11] an abstract argumentation framework is defined as follows.

Definition 1 (Dung’s argumentation framework). An argumentation framework is a pair AF = ⟨A, R⟩ where A is a set of arguments and R ⊆ A × A is an attack relation. An argument α attacks an argument β if and only if (α, β) ∈ R.

In the above definition, arguments are abstract entities, whose origin and structure are left unknown. With each argumentation system is naturally associated a directed graph whose nodes are the arguments and whose edges represent the attack relation between them. A set of arguments is said to be conflict-free if there is no attack between arguments in this set:

Definition 2 (Conflict-free). Let B ⊆ A. B is conflict-free if and only if ∄ α, β ∈ B such that (α, β) ∈ R.

A set of arguments B defends an argument α if every argument that attacks α is attacked by at least one argument of B:

Definition 3 (Defense). Let B ⊆ A. B defends an argument α ∈ B if and only if for each argument β ∈ A, if (β, α) ∈ R, then ∃ γ ∈ B such that (γ, β) ∈ R.
Among all the conflicting arguments, one has to select acceptable subsets of arguments for inferring conclusions and making decisions. In [11,5], several semantics for the notion of acceptability have been proposed. For the purpose of this paper, we recall four of them, namely the naive, admissible, preferred, and stable semantics. Other semantics (e.g., complete, grounded) are not presented here.

Definition 4 (Acceptability semantics). Let B ⊆ A.
• B is a naive set if and only if it is a maximal (w.r.t. set-inclusion) conflict-free set.
• B is an admissible set if and only if it is a conflict-free set that defends all its elements.
• B is a preferred set if and only if it is a maximal (w.r.t. set-inclusion) admissible set.
• B is a stable set if and only if it is a maximal (w.r.t. set-inclusion) conflict-free set such that every element of (A \ B) is attacked by an element in B.
[Figure 1: directed graph over the arguments A1, A2 and A3.]
Fig. 1. Graph of attacks
Figure 1 presents a graph of attacks, where arguments A1 and A2 attack each other and argument A3 attacks A2. There are two naive sets, S1 = {A1, A3} and S2 = {A2}. S1 is admissible, preferred and stable, while S2 has none of these properties. Note that a stable set is a fortiori a preferred set (which is itself an admissible set). Given an argumentation framework, there is always a naive (resp. admissible, preferred) set of arguments, possibly equal to the empty set. This is not true for the stable semantics. For example, consider the argumentation framework presented in Figure 2. In this argumentation framework, the three arguments A1, A2, A3 create a cycle of attacks. It does not have any stable set of arguments because, for any conflict-free set, there is at least one argument outside this set that is not attacked by any argument of the set.
[Figure 2: directed graph in which the arguments A1, A2 and A3 form a cycle of attacks.]
Fig. 2. Graph of attacks without stable set
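Definitions 2 to 4 can be checked mechanically on the two examples above. The following is a minimal brute-force sketch, not the method developed in this paper: it enumerates all subsets, so it is only usable on tiny frameworks. The function names are illustrative, and the orientation chosen for the cycle of Figure 2 (A1 attacks A2, A2 attacks A3, A3 attacks A1) is an assumption, since the text only states that the three arguments form a cycle.

```python
from itertools import combinations

def subsets(arguments):
    """All subsets of a set of arguments, as frozensets."""
    items = sorted(arguments)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def conflict_free(B, R):
    # Definition 2: no attack between two members of B.
    return not any((a, b) in R for a in B for b in B)

def defends(B, a, A, R):
    # Definition 3: every attacker of a is attacked by some member of B.
    return all(any((c, b) in R for c in B) for b in A if (b, a) in R)

def maximal(family):
    # Keep the sets that are not strictly included in another set of the family.
    return [S for S in family if not any(S < T for T in family)]

def acceptable_sets(A, R):
    """Brute-force reading of Definition 4 (exponential in |A|)."""
    cf = [B for B in subsets(A) if conflict_free(B, R)]
    adm = [B for B in cf if all(defends(B, a, A, R) for a in B)]
    # A conflict-free set that attacks every outside argument is necessarily
    # maximal, so no separate maximality test is needed for stable sets.
    stable = [B for B in cf
              if all(any((b, a) in R for b in B) for a in A - B)]
    return {"naive": maximal(cf), "admissible": adm,
            "preferred": maximal(adm), "stable": stable}

A = {"A1", "A2", "A3"}
FIG1 = {("A1", "A2"), ("A2", "A1"), ("A3", "A2")}
CYCLE = {("A1", "A2"), ("A2", "A3"), ("A3", "A1")}  # assumed orientation

print(acceptable_sets(A, FIG1))
print(acceptable_sets(A, CYCLE)["stable"])  # prints []
```

On the framework of Figure 1 the sketch returns {A2} and {A1, A3} as naive sets and {A1, A3} as the only preferred and stable set; on the cycle it returns no stable set, in line with the discussion above.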
2.2 The Conceptual Graph Formalism

The conceptual graph formalism [16,10] is a knowledge representation and reasoning formalism based on labelled graphs. In its simplest form, a CG knowledge base is composed of two parts: the support, which encodes terminological knowledge (and constitutes a part of the represented domain ontology), and basic conceptual graphs built on this support, which express assertional knowledge, or facts. The knowledge base can be further enriched by other kinds of knowledge built on the support. Here, we will consider two kinds of rules: “usual rules” and CG defaults, which lead to non-monotonic reasoning.

The support. It provides the ground vocabulary used to build the knowledge base. It is composed of a set of concept types, denoted by TC, and a set of relation types (or simply relations), denoted by TR. Relation types represent the possible relationships between concept instances, or properties of concept instances for unary relation types. TC is partially ordered by a “kind of” relation, with ⊤ being its greatest element. TR is also partially ordered by a “kind of” relation, with any two comparable relation types having necessarily the same arity (i.e., number of arguments). Each relation type has a signature that specifies its arity and the maximal concept type of each of its arguments. Figure 3 shows the sets of concept types and of relation types (each with their signature) used in the application. TR is partitioned into two sets, TR1 and TR2. TR1 is a set of unary relations representing properties of sets (of arguments): being a naive set (Accna), an admissible set (Accad), a preferred set (Accpr), a stable set (Accst) or a non-stable set (nonAccst). TR2 is a set of binary relations, which contains for instance the attack relation R (with signature (Arg, Arg)).

Basic conceptual graphs. A basic CG is a bipartite graph composed of:
(i) a set of concept nodes (pictured as rectangles), which represent entities, attributes, states or events;
(ii) a set of relation nodes (pictured as ovals), which express the nature of relationships between concept nodes;
(iii) a set of edges linking relation nodes to concept nodes;
(iv) a labelling function, which labels each node or edge: the label of a concept node is a pair t : m, where t is a concept type and m is a marker; the label of a relation node is a relation type; the label of an edge is its rank in the total order on the arguments of the incident relation node.

Furthermore, a basic CG has to satisfy relation signatures: the number of edges incident to a relation node is equal to the arity of its type r, and the concept type assigned to its neighbor by an edge labelled i is less than or equal to the ith element of the signature of r. The marker of a concept node can be either an identifier referring to a specific individual (for instance A1 of type Arg in Figure 4) or the generic marker (noted ∗) referring to an unspecified instance of the associated concept type. The generic marker followed by a variable name (for instance ∗x) is used in a basic CG or a rule to indicate that the instance (noted x) represented by several concept nodes is the same (see for instance Figure 5). A basic CG without occurrence of the generic marker is said to be totally instantiated. Figure 4 shows a totally instantiated basic CG built on the support of Figure 3, which encodes the attack graph of Figure 1.
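The signature condition can be illustrated with a small data-structure sketch. The encoding below is hypothetical and is not taken from an existing CG library: the dictionaries SUPERTYPE and SIGNATURE, the ASCII names standing for ∈, ∉ and ⊂, and the node identifiers c1 to c3 are all assumptions made for the illustration. The sketch only checks arities and argument types.

```python
# Hypothetical encoding of the support: the "kind of" order on concept types
# and the signature of each relation type.
SUPERTYPE = {"Top": None, "Arg": "Top", "Set": "Top"}
SIGNATURE = {
    "R": ("Arg", "Arg"), "in": ("Arg", "Set"), "notin": ("Arg", "Set"),
    "subset": ("Set", "Set"),
    "Acc_na": ("Set",), "Acc_ad": ("Set",), "Acc_pr": ("Set",), "Acc_st": ("Set",),
}

def is_kind_of(t, u):
    """True if concept type t is u or a specialization of u."""
    while t is not None:
        if t == u:
            return True
        t = SUPERTYPE[t]
    return False

def signature_errors(concepts, relations):
    """concepts: node id -> (concept type, marker);
    relations: list of (relation type, node ids ordered by edge rank)."""
    errors = []
    for rel, neighbours in relations:
        sig = SIGNATURE[rel]
        if len(neighbours) != len(sig):
            errors.append(f"{rel}: expected {len(sig)} edges, found {len(neighbours)}")
            continue
        for rank, (node, max_type) in enumerate(zip(neighbours, sig), start=1):
            if not is_kind_of(concepts[node][0], max_type):
                errors.append(f"{rel}: neighbour {rank} must be a kind of {max_type}")
    return errors

# The totally instantiated graph encoding the attack graph of Figure 1.
concepts = {"c1": ("Arg", "A1"), "c2": ("Arg", "A2"), "c3": ("Arg", "A3")}
attack_graph = [("R", ("c1", "c2")), ("R", ("c2", "c1")), ("R", ("c3", "c2"))]

print(signature_errors(concepts, attack_graph))             # prints []
print(signature_errors(concepts, [("in", ("c1", "c2"))]))   # one error: c2 is not a Set
```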
[Figure 3: (a) TC: ⊤ with the concept types Arg and Set. (b) TR1: ⊤1 with the unary relation types Accna, nonAccst, Accad, Accpr and Accst, each with signature (Set). (c) TR2: ⊤2 with the binary relation types R (Arg, Arg), ∈ (Arg, Set), ∉ (Arg, Set) and ⊂ (Set, Set).]
Fig. 3. CG Support with concept and relation types for representing argumentation systems
Logical translation. Conceptual graphs have a logical translation in first-order logic, which is given by a mapping classically denoted by φ. φ assigns a formula φ(S) to a support S, and a formula φ(G) to any basic CG G on this support. First, each concept or relation type is translated into a predicate (a unary predicate for a concept type, and a predicate with the same arity for a relation type) and each individual marker occurring on the graphs is translated by a constant. Then, the “kind of” relation between types of the support is translated by logical implications. For example, the fact that Accst is a specialization of Accpr (Figure 3b) is translated by:

∀x (Accst(x) → Accpr(x))

Then, given a basic conceptual graph G on S, φ(G) is built as follows. A distinct variable is assigned to each concept node with a generic marker. An atom of the form t(e) is assigned to each concept node with label t : m, where e is the variable assigned to this node if m = ∗, otherwise e = m. An atom of the form r(e1, ..., ek) is assigned to each relation node with label r, where ei is the variable or the constant corresponding to the ith neighbor of the relation. φ(G) is then the existential closure of the conjunction of all atoms assigned to its nodes.
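The construction of φ(G) is simple enough to be sketched in a few lines. The representation below (a list of concept nodes and a list of relation nodes pointing to concept indices) and the function name phi are assumptions made for the illustration; the translation φ(S) of the support is not covered.

```python
def phi(concepts, relations):
    """Build the formula of a basic CG as a string.

    concepts: list of (concept type, marker), with "*" as the generic marker;
    relations: list of (relation type, tuple of concept indices by edge rank).
    """
    terms, variables = [], []
    for _, marker in concepts:
        if marker == "*":
            # A distinct variable for each generic concept node.
            variables.append(f"x{len(variables) + 1}")
            terms.append(variables[-1])
        else:
            terms.append(marker)
    atoms = [f"{ctype}({terms[i]})" for i, (ctype, _) in enumerate(concepts)]
    atoms += [f"{rel}({', '.join(terms[i] for i in args)})"
              for rel, args in relations]
    body = " ∧ ".join(atoms)
    if not variables:
        return body
    # Existential closure over the variables of the generic nodes.
    return "".join(f"∃{v} " for v in variables) + f"({body})"

# A totally instantiated graph: three arguments and three attacks.
instantiated = phi([("Arg", "A1"), ("Arg", "A2"), ("Arg", "A3")],
                   [("R", (0, 1)), ("R", (1, 0)), ("R", (2, 1))])
# A graph with one generic node: "some argument attacks A2".
generic = phi([("Arg", "*"), ("Arg", "A2")], [("R", (0, 1))])
print(instantiated)
print(generic)
```

The first call yields a variable-free conjunction, and the second yields a formula of the form ∃x1 (Arg(x1) ∧ Arg(A2) ∧ R(x1, A2)).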
[Figure 4: basic CG with the concept nodes Arg : A1, Arg : A2 and Arg : A3 linked by three R relation nodes.]
Fig. 4. Basic conceptual graph
For instance, the logical translation of the conceptual graph represented in Figure 4 is the following: Arg(A1) ∧ Arg(A2) ∧ Arg(A3) ∧ R(A1, A2) ∧ R(A2, A1) ∧ R(A3, A2). Note that in this case, the graph is totally instantiated, thus its logical translation has no variable.

Specialization relation, homomorphism. Any set of conceptual graphs is partially preordered by a specialization relation, which can be computed by a graph homomorphism (allowing the restriction of the node labels), also called projection in the conceptual graph community. The specialization relation, and thus homomorphism, between two graphs, corresponds to the logical entailment between the corresponding formulas, i.e., there is a homomorphism from G to H, both built on a support S, if and only if φ(G) is entailed by φ(H) and φ(S) (see e.g., [10] for details).¹

¹ Note that, for the homomorphism completeness part, H has to be in normal form: each individual marker appears at most once in it, i.e., there are no two concept nodes in H representing the same identified individual.

Basic CG rules. Basic CG rules [15] are an extension of basic CGs. A CG rule (notation: R = (H, C)) is of the form “if H then C”, where H and C are two basic CGs (respectively called the hypothesis and the conclusion of the rule), which may share some concept nodes. Generic markers referenced by a variable name such as ∗x refer to the same individual in the hypothesis and in the conclusion. Formally, a rule can be defined as a single bicolored basic CG, as illustrated in Figure 5: the hypothesis is composed of the white nodes; the conclusion is induced by the black nodes and the white concept nodes that are linked to a black relation node; these latter nodes, which belong both to the hypothesis and the conclusion, are called frontier nodes. Intuitively, the rule of Figure 5 says that “if an argument x attacks an argument y that attacks an argument z, then x defends z”.

[Figure 5: bicolored basic CG. Hypothesis: the concept nodes Arg : *x, Arg : *y and Arg : *z linked by two R relation nodes. Conclusion: a defends relation node linking Arg : *x to Arg : *z.]
Fig. 5. CG rule

A rule R is applicable to a basic CG G if there is a homomorphism from its hypothesis to G. Let π be such a homomorphism. Then, the application of R on G according to π produces a basic CG obtained from G by adding the conclusion of R according to π, i.e., merging each frontier node c of the added conclusion with the node of G that is the image of c by the homomorphism π. For instance, the rule in Figure 5 can be applied three times to the basic CG of Figure 4, which allows us to infer that A3 defends A1 (the concept nodes with label Arg : ∗x, Arg : ∗y and Arg : ∗z are respectively mapped to the concept nodes with individual marker A3, A2 and A1) and that A1 and A2 defend themselves. The resulting conceptual graph is represented in Figure 6.

[Figure 6: the basic CG of Figure 4 extended with three defends relation nodes.]
Fig. 6. Entailed basic CG

The mapping φ to first-order logic is extended to CG rules. Let R = (H, C) be a CG rule, and let φ′(R) denote the conjunction of atoms associated with the basic CG underlying R (all variables are kept free). Then, φ(R) = ∀x1 ... ∀xk (φ′(H) → (∃y1 ... ∃yq φ′(C))), where φ′(H) and φ′(C) are the restrictions of φ′(R) to the nodes of H and C respectively, x1, ..., xk are the variables appearing in φ(H) and y1, ..., yq are the variables appearing in φ(C) but not in φ(H). For example, the rule of Figure 5 is translated as follows: ∀x∀y∀z (Arg(x) ∧ Arg(y) ∧ Arg(z) ∧ R(x, y) ∧ R(y, z) → defends(x, z)); since there are no variables introduced in the rule conclusion, there are no existentially quantified variables.
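Rule application by homomorphism can be sketched on a flat, atom-based encoding of a totally instantiated graph. This is a simplified illustration under stated assumptions, not the CG machinery itself: facts are tuples such as ("R", "A1", "A2"), variables are strings starting with "?", the type hierarchy is ignored, and conclusions that introduce new individuals are not handled. The function names are illustrative.

```python
def homomorphisms(pattern, facts, binding=None):
    """Yield every assignment of the '?'-variables of pattern to constants
    such that each atom of the pattern becomes one of the facts."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        new, ok = dict(binding), True
        for term, value in zip(first[1:], fact[1:]):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    ok = False      # variable already bound to another individual
                    break
            elif term != value:
                ok = False          # constant mismatch
                break
        if ok:
            yield from homomorphisms(rest, facts, new)

def apply_rule(hypothesis, conclusion, facts):
    """Add the conclusion once for every homomorphism from the hypothesis."""
    images = list(homomorphisms(hypothesis, sorted(facts)))
    added = {tuple([atom[0]] + [pi.get(t, t) for t in atom[1:]])
             for pi in images for atom in conclusion}
    return images, set(facts) | added

facts = {("Arg", "A1"), ("Arg", "A2"), ("Arg", "A3"),
         ("R", "A1", "A2"), ("R", "A2", "A1"), ("R", "A3", "A2")}
hypothesis = [("R", "?x", "?y"), ("R", "?y", "?z"),
              ("Arg", "?x"), ("Arg", "?y"), ("Arg", "?z")]
conclusion = [("defends", "?x", "?z")]

images, entailed = apply_rule(hypothesis, conclusion, facts)
print(len(images))                                       # prints 3
print(sorted(f for f in entailed if f[0] == "defends"))
```

The three homomorphisms found correspond to the three applications mentioned above, and the added atoms are defends(A1, A1), defends(A2, A2) and defends(A3, A1).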
The rule application mechanism is logically sound and complete: given a set of rules R, basic CGs G and H (representing for instance a query and a set of facts), all defined on a support S, φ(G) is entailed by φ(H), φ(S) and the logical formulas assigned to R if and only if there is a sequence of rule applications with rules of R leading from H to a basic CG H′ such that there is a homomorphism from G to H′ (in other words, by applying rules to H, it is possible to obtain H′ which entails G).

When a rule is applied, it may create new individuals (one for each generic concept node in its conclusion, i.e., one for each existential variable yi in the logical translation of the rule). In the following, we will assume that all facts (represented as basic CGs) are completely instantiated. Then, when a rule is applied, we will instantiate each new generic concept node created, by replacing its generic marker with a new individual marker (which can be seen as a Skolem function, moreover without variable in this case). This convention will allow us to present CG defaults in a simpler way.

2.3 Conceptual Graph Defaults

A Brief Introduction to Reiter’s Default Logic. Let us recall some basic definitions of Reiter’s default logic. For a more precise description and examples, the reader is referred to [14,7].

Definition 5 (Reiter’s default theory). A Reiter default theory is a pair (∆, W) where W is a set of FOL formulae and ∆ is a set of defaults of the form

$$\delta = \frac{\alpha(\vec{x}) : \beta_1(\vec{x}), \cdots, \beta_n(\vec{x})}{\gamma(\vec{x})}, \quad n \geq 0,$$

where $\alpha(\vec{x})$, $\beta_i(\vec{x})$ and $\gamma(\vec{x})$ are FOL formulae for which each free variable is in the tuple of variables $\vec{x} = (x_1, \cdots, x_k)$.

The intuitive meaning of a default δ is “For all individuals $\vec{x} = (x_1, \cdots, x_k)$, if $\alpha(\vec{x})$ is believed and each of $\beta_1(\vec{x}), \cdots, \beta_n(\vec{x})$ can be consistently believed, then one is allowed to believe $\gamma(\vec{x})$”. $\alpha(\vec{x})$ is called the prerequisite, the $\beta_i(\vec{x})$ are called the justifications and $\gamma(\vec{x})$ is called the consequent. A default is said to be closed if $\alpha(\vec{x})$, $\beta_i(\vec{x})$ and $\gamma(\vec{x})$ are all closed FOL formulae. A default theory (∆, W) is said to be closed if all its defaults are closed.

Intuitively, an extension of a default theory (∆, W) is a set of formulae that can be obtained from (∆, W) while being consistently believed. More formally, an extension E of (∆, W) is a minimal deductively closed set of formulae containing W such that for any $\frac{\alpha : \beta}{\gamma} \in \Delta$, if α ∈ E and ¬β ∉ E, then γ ∈ E. The following theorem [14] provides an equivalent characterization of extensions that we use here as a formal definition.

Theorem 1 (Extension). Let (∆, W) be a closed default theory and E be a set of FOL formulae. We inductively define $E_0 = W$ and, for all $i \geq 0$,

$$E_{i+1} = Th(E_i) \cup \left\{ \gamma \;\middle|\; \frac{\alpha : \beta_1, \cdots, \beta_n}{\gamma} \in \Delta,\ \alpha \in E_i \text{ and } \neg\beta_1, \cdots, \neg\beta_n \notin E \right\},$$

where $Th(E_i)$ is the deductive closure of $E_i$. Then E is an extension of (∆, W) if and only if $E = \bigcup_{i=0}^{\infty} E_i$.

Note that this characterization is not effective for computational purposes since both $E_i$ and $E = \bigcup_{i=0}^{\infty} E_i$ are required for computing $E_{i+1}$ (for more details on generating extensions, see [14,12]). Moreover, Theorem 1 holds for a closed default theory, which is less expressive than the general case, since no variable can be shared between the hypothesis, the conclusion or a justification. When we want to apply some non-closed default, we first have to instantiate each free variable by all the constants that may appear in the knowledge base, which yields a set of closed defaults.

The fundamental problems addressed in Reiter’s default logic are the following:
– EXTENSION: Given a default theory (∆, W), does it have an extension?
– SKEPTICAL DEDUCTION: Given a default theory (∆, W) and a formula Q, does Q belong to all extensions of (∆, W)? In this case we note (∆, W) ⊢S Q.
– CREDULOUS DEDUCTION: Given a default theory (∆, W) and a formula Q, does Q belong to an extension of (∆, W)? In this case we note (∆, W) ⊢C Q.

Default Rules in the Conceptual Graph Formalism. We now present an extension of CG rules, which has been introduced in [2,3] and allows for default reasoning. It can be seen as a graphical implementation of a subset of Reiter’s default logic: indeed, we restrict the kind of formulae that can be used in the three components of a default. On the other hand, we can deal directly with non-closed defaults, i.e., without instantiating free variables before processing the defaults. In Reiter’s logic, the application of a default is subject to a consistency check with respect to current knowledge: each justification J has to be consistent with the current knowledge, i.e., ¬J should not be entailed by it. In CG defaults, justifications are replaced by graphs called constraints; a constraint C can be seen as the negation of a justification: C should not be entailed by current knowledge.

Definition 6 (CG defaults). A CG default is a tuple D = (H, C, C1, ..., Ck) where H is called the hypothesis, C the conclusion and C1, ..., Ck are called the constraints of the default; all components of D are themselves basic CGs and may share some concept nodes.
Briefly said, H, C and each Ci respectively correspond to the prerequisite, the consequent and the negation of a justification in a Reiter’s default. H, C and the Ci ’s can share some concept nodes that have the same marker. These markers can be individual or generic, in which case the identification of the concept nodes is made by comparing the name of the variable associated with this generic marker. In this paper, we will represent a CG default by a multi-colored basic CG (with a distinct color for each component of the tuple). As in a CG rule, the hypothesis is encoded by the white nodes. The conclusion is encoded by the black nodes and frontier nodes. Each constraint is visualized with a different level of gray. The intuitive meaning of a CG default is rather simple: “for all individuals x1 . . . xk , if H[x1 . . . xk ] holds true, then C[x1 . . . xk ] can be inferred provided that no Ci [x1 . . . xk ] holds true”. If we can map by homomorphism the hypothesis H of a default to a fact graph G, then we can add the conclusion of the default according to this homomorphism (as in a standard rule application), unless this homomorphism can be extended to map one of the constraints from the default. As already pointed out, while the negation of a justification in a Reiter’s default should not be entailed, in a CG default the constraint itself should not be entailed. The entailment mechanism is based on the construction of a default derivation tree.
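The role of a constraint in a single default application can be sketched with a deliberately small example. The default used here is hypothetical and does not appear in this paper: its hypothesis is Arg(x), its conclusion is unattacked(x) and its only constraint is R(y, x). Facts are encoded as tuples, which is an assumption of the sketch, and the propagation of constraints to later derivation steps is left out.

```python
def apply_default(facts):
    """Apply the hypothetical default (H, C, C1) with
    H = Arg(x), C = unattacked(x), C1 = R(y, x) to a set of atoms."""
    derived = set(facts)
    for fact in sorted(facts):
        if fact[0] != "Arg":
            continue
        x = fact[1]   # homomorphism pi from H to the facts: x is mapped to an individual
        # Try to extend pi to the constraint R(y, x) by finding an image for y.
        constraint_maps = any(f[0] == "R" and f[2] == x for f in facts)
        if not constraint_maps:
            # The constraint is not entailed, so the conclusion can be added.
            derived.add(("unattacked", x))
    return derived

facts = {("Arg", "A1"), ("Arg", "A2"), ("Arg", "A3"),
         ("R", "A1", "A2"), ("R", "A2", "A1"), ("R", "A3", "A2")}
print(sorted(apply_default(facts) - facts))   # prints [('unattacked', 'A3')]
```

The hypothesis maps to each of the three arguments, but for A1 and A2 the mapping extends to the constraint, so the conclusion is only added for A3.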
Let K = ((S, G, R), D) be a knowledge base, where G is a basic CG representing the initial fact,² R is a set of CG rules, and D is a set of CG defaults, all defined on the support S. As previously mentioned, we will assume that G is completely instantiated. Then the rule application mechanism ensures that all derived facts are also completely instantiated. A node of the default derivation tree DDT(K) is labelled by a basic CG called fact and a set of basic CGs called constraints. A node of DDT(K) with label (G, C) is said to be valid if there is no homomorphism from a constraint in C, or a constraint occurring in the label of one of its ancestors, to G. We now define inductively the tree DDT(K):
– the root is labelled by (G, ∅); note that it is valid;
– if x is a valid node of DDT(K) with label (F, C), then for every default D = (H, C, C1, ..., Ck) in D and for every homomorphism π from H to a basic CG F′ R-derived from F, x has a successor whose fact is obtained by the application of D as a classical rule R = (H, C), without considering it as a default, and whose constraints are the π(Ci), if and only if that successor is valid.

In the above definition, π(Ci) is obtained from Ci by replacing the labels of concept nodes that belong to the domain of π (thus are shared with H) with their image by π. This binds some nodes of Ci to nodes of the current fact.

Let us consider for instance the DDT in Figure 8, obtained with the set of rules in Figure 7 and the initial fact G in Figure 4. The successor of the root is obtained by applying the rule RN, which has an empty hypothesis and no constraint: hence, the conclusion is simply added to G, after instantiation of the generic concept node [Set:*] with a new individual marker (denoted by E in the figure). The leftmost successor of this tree node (let us call it n3) is obtained by applying the default DN with the concept nodes [Set:*] and [Arg:*] from the hypothesis being respectively mapped to [Set:E] and [Arg:A1]. Both constraints of DN are instantiated accordingly. It is checked that n3 is valid: indeed, the instantiated constraints do not map to the newly built fact. Thus n3 is actually added to the tree. Note that all descendants of n3 will have to satisfy the constraints labelling n3. The leaves of DDT(K) exactly encode extensions of a default theory (see [2,3]).
² Since basic CGs do not need to be connected, a set of facts can be seen as a single fact.

3 Argumentation System Modelling

This section proposes a representation of an abstract argumentation system in the CG formalism, as previously introduced. The associated reasoning mechanisms, which compute acceptable sets of arguments, will be presented in Section 4.

3.1 Support Description

To encode an argumentation system, independently from a given application domain, the elements that constitute an argumentation framework (arguments, sets of arguments), their properties (admissibility semantics) and the relations between them (attack relation, membership and inclusion relations) have to be introduced in the CG support.
Figure 3 shows a support that represents these elements. The fact that stable sets of acceptable arguments are specializations of preferred sets, which are themselves specializations of admissible sets, is directly encoded in the support through the “kind of” relation.

3.2 Graph of Attacks

Based on the generic support of Figure 3, a graph of attacks is defined as follows.

Definition 7 (Graph of attacks). A graph of attacks is a basic CG such that:
– concept nodes are labelled by pairs Arg : i, where i denotes (an instance of) an argument;
– relation nodes are labelled by R, which represents the attack relation.

The basic CG pictured in Figure 4 is an example of a graph of attacks in which A1 attacks A2, A2 attacks A1 and A3 attacks A2.
4 Computing Acceptable Sets of Arguments Using CG Defaults

In this section, we use CG rules and defaults to compute acceptable sets of arguments.

4.1 Computing Naive Sets of Acceptable Arguments

We now show how to compute all the naive sets of arguments, using two rules (a “classic” CG rule and a CG default, see Figure 7). The way we compute the naive sets is thus purely declarative. Using these rules, the default computation mechanism calculates the default extensions. Each of these extensions encodes in a graphical manner a naive set of arguments.

[Figure 7: (a) RN, a CG rule with an empty hypothesis whose conclusion is a concept node Set : * carrying the relation Accna; (b) DN, a CG default over the concept nodes Set : * and Arg : * and the relations Accna, ∈ and R, with two constraints.]
Fig. 7. Rules generating naive sets of acceptable arguments

[Figure 8: default derivation tree with nodes (G1, C1) to (G7, C7). The root G1 is the graph of attacks; G2 is obtained from G1 by RN; G3, G5 and G6 are obtained from G2 by DN applied on A1, A2 and A3 respectively; G4 is obtained from G3 by DN applied on A3, and G7 from G6 by DN applied on A1.]
Fig. 8. DDT

The first rule encodes the following information: “A naive set of arguments exists”. Indeed, as mentioned in Section 2.1, for any graph of attacks, a naive set of arguments always exists. This rule (denoted by RN) is a classic CG rule with an empty hypothesis (see Figure 7(a)). Given a conflict-free set of arguments E, a naive set of arguments (which has to be maximal) can be built by iteratively adding some arguments. An argument a may be added to E if E ∪ {a} is still conflict-free. The CG default DN given in Figure 7(b) is designed in such a way that the argument a is added to the set E only if it is not in conflict with any argument of E. This is guaranteed by the two constraints of DN:
– the first one (in light gray) ensures that a does not attack any argument of E;
– the second one (in dark gray) ensures that a is not attacked by any argument of E.

Therefore, applying this default to a graph of attacks preserves the property that the group of arguments linked to the set E by the relation ∈ is conflict-free. The deduction mechanism based on the default derivation tree ensures that when a default extension is computed, the default DN cannot be applied any more. Hence the node labelled by E encodes a maximal conflict-free set of arguments, i.e., a naive set. Figures 8, 9 and 10 show the default derivation tree that is computed to obtain the default extensions encoding the naive sets. The default extensions are encoded in the leaves of the tree. Note that two of the leaves of the tree are identical (G4 and G7), so we obtain only two distinct default extensions: $Acc^{na}_1 = \{A_2\}$ and $Acc^{na}_2 = \{A_1, A_3\}$.
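The exploration of the derivation tree for RN and DN can be mimicked by a short recursive sketch. This is a simplification made for illustration, not the CG implementation: a tree node is reduced to the members of the set E together with the arguments whose two DN constraints have been instantiated, and the function name naive_leaves is hypothetical.

```python
def naive_leaves(arguments, attacks):
    """Return the members of E at each leaf of the tree explored with DN."""
    leaves = []

    def valid(members, constrained):
        # The constraints instantiated for an argument a forbid any member y
        # of E such that a attacks y or y attacks a.
        return not any((a, y) in attacks or (y, a) in attacks
                       for a in constrained for y in members)

    def expand(members, constrained):
        successors = []
        for a in sorted(arguments - members):
            new_members = members | {a}
            new_constrained = constrained | {a}
            # A successor is kept only if it is valid with respect to its own
            # constraints and to those inherited from its ancestors.
            if valid(new_members, new_constrained):
                successors.append((new_members, new_constrained))
        if not successors:
            leaves.append(members)          # DN cannot be applied any more
        for node in successors:
            expand(*node)

    expand(frozenset(), frozenset())        # state reached after applying RN
    return leaves

arguments = {"A1", "A2", "A3"}
attacks = {("A1", "A2"), ("A2", "A1"), ("A3", "A2")}
leaves = naive_leaves(arguments, attacks)
print(len(leaves), len(set(leaves)))        # prints 3 2
```

On the running example the sketch reaches three leaves, two of which carry the same set {A1, A3}, so that two distinct naive sets are obtained.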
[Figure 9: the fact graphs G3 and G4 of the DDT with their constraint sets C3 and C4.]
Fig. 9. Nodes 3 and 4 of the DDT in Figure 8
[Figure 10: the fact graphs G6 and G7 of the DDT with their constraint sets C6 and C7.]
Fig. 10. Nodes 6 and 7 of the DDT in Figure 8
4.2 Computing Preferred Sets of Acceptable Arguments

One way to obtain the preferred sets from the naive ones is to iteratively remove the non-defended arguments. For that, we use the two CG defaults shown in Figure 11. Given a set of arguments, if there is an argument in this set that is not defended, the CG default DCP1 creates a new set, and the non-defended argument is declared as not belonging to this set. The CG default DCP2 adds all the arguments to the new set of arguments, unless they have been declared as not belonging to this set. We need to apply these defaults to the graphs that represent naive sets obtained in the previous subsection. By applying these two defaults, we obtain graphs that contain a path of “subset” relation nodes starting from the original naive set. Going from a concept node of this path to its successor corresponds to removing exactly one non-defended argument. The last concept node of the path thus represents the smallest set, and has the property of being:
– conflict-free, since all sets encoded in the concept nodes are subsets of naive sets;
– without non-defended arguments, since the default DCP1 cannot be applied anymore.

Hence, each of these sets is a candidate to be a preferred set. It remains to check that it has the property of being maximal. To select maximal sets, one would have to compare extensions, which is not possible in our framework. Thus, tagging such sets by the Accpr relation, which indicates that they are preferred sets, is performed outside the CG framework.
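The removal of non-defended arguments followed by the external maximality test can be sketched as follows. This is a plain set-based illustration, not the CG defaults DCP1 and DCP2 themselves; the function names are hypothetical, and the argument removed at each step is simply the first one in alphabetical order.

```python
def undefended(B, A, R):
    """Members of B having an attacker that no member of B attacks."""
    return sorted(a for a in B
                  if any((b, a) in R and not any((c, b) in R for c in B)
                         for b in A))

def shrink(B, A, R):
    """Chain of sets obtained by removing one non-defended argument at a time
    (the analogue of the path of 'subset' relation nodes)."""
    chain = [frozenset(B)]
    while True:
        bad = undefended(chain[-1], A, R)
        if not bad:
            return chain
        chain.append(chain[-1] - {bad[0]})

def preferred_candidates(naive_sets, A, R):
    """Last set of each chain, filtered by the maximality test that the text
    performs outside the CG framework."""
    candidates = {shrink(B, A, R)[-1] for B in naive_sets}
    return [S for S in candidates if not any(S < T for T in candidates)]

A = {"A1", "A2", "A3"}
R = {("A1", "A2"), ("A2", "A1"), ("A3", "A2")}
naive_sets = [{"A2"}, {"A1", "A3"}]

print(shrink({"A2"}, A, R))                  # [frozenset({'A2'}), frozenset()]
print(preferred_candidates(naive_sets, A, R))
```

Starting from {A2}, the only member is not defended against A3 and is removed, which leaves the empty set; {A1, A3} is left unchanged and is the only candidate that passes the maximality test.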
[Figure 11: the CG defaults (a) DCP1 and (b) DCP2, built over the concept nodes Arg : * and Set : * and the relations R, ∈, ∉ and ⊂.]
Fig. 11. Rules generating preferred sets of acceptable arguments
Figure 12 represents the preferred sets of our running example (selected by the maximality test). The first set E′, derived from Acc^na_1, is the empty set (since there is no ∈ relation linking E′ to any argument). As it is included in the set Acc^pr_2 = {A1, A3}, derived from Acc^na_2, it is not maximal and thus not tagged as a preferred set.
Fig. 12. Extensions computed from the leaves of Figure 8 with the CG defaults of Figure 11: (a) Acc^pr_1; (b) Acc^pr_2
Fig. 13. Rule that tags preferred sets of arguments as "not stable"
4.3 Computing Stable Sets of Acceptable Arguments

A stable set of arguments is a fortiori a preferred set. To be stable, a preferred set of arguments has to attack all the arguments that are not in the set. It turns out that once a preferred set of arguments is computed, it is easy to check whether it is also a stable set. This can be done using the CG default pictured in Figure 13, which starts from a preferred set and expresses a condition to reject it if it is not a stable set: the set is tagged as "not stable" (denoted by notAcc^st) unless each argument belongs to it (dark gray constraint) or is attacked by an argument that belongs to it (light gray constraint). Then, when a default extension is computed, a preferred extension is identified as a stable extension if and only if it is not tagged as "not stable". In our running example, the only preferred set of arguments, Acc^pr_2 = {A1, A3}, is also a stable set, since the only argument that is not in the set, A2, is attacked by A1 and A3.
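The stability test above amounts to a single pass over the arguments outside the set. The sketch below restates it with plain Python sets rather than CG defaults; the function name is hypothetical, and only the attacks explicitly mentioned in the text are encoded (the full attack relation of the running example may contain more pairs).

```python
def is_stable(candidate, arguments, attacks):
    """True iff every argument outside `candidate` is attacked by a member of it.

    `attacks` is a set of (attacker, target) pairs.
    """
    outside = set(arguments) - set(candidate)
    return all(
        any((member, target) in attacks for member in candidate)
        for target in outside
    )


arguments = {"A1", "A2", "A3"}
# Only the attacks stated in the text: A2 is attacked by A1 and by A3.
attacks = {("A1", "A2"), ("A3", "A2")}

stable = is_stable({"A1", "A3"}, arguments, attacks)
```

A set for which `is_stable` returns False corresponds to one that the default of Figure 13 would tag as "not stable".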
5 Conclusion

In this paper, we have shown how an argumentation framework can be represented in the CG formalism. This formalism also makes it possible to compute different kinds of acceptable sets of arguments. However, it does not capture the notion of maximality in the definition of preferred sets. Therefore, in order to be properly used in an argumentation context, this formalism still needs to be extended.

Another benefit of using conceptual graphs in this context would be to represent not only the relationships between arguments, but also the internal structure of arguments. Indeed, basic CGs can be extended to nested CGs, in which concept nodes not only have a type and a marker, but also a description, which is itself a nested CG [9]. The basic homomorphism notion can be easily extended to nested CGs. Generally speaking, this allows for a hierarchical representation of knowledge and for reasoning that takes this structuring into account. In our application case, the first level would correspond to arguments seen as "black boxes". Then, by "zooming" in on arguments, one would have access to the internal description of arguments. Since the internal structure of an argument is a CG, it benefits from the graph mechanisms for reasoning.

In the literature, there have been several proposals to represent this internal structure (e.g. [4]). In a preliminary study [6], whose aim was to represent the viewpoints of different actors and the associated arguments in a health policy case, we chose to represent an argument with several parts, one of them being the action advocated by the argument. Any two different actions were either in a specialization relation (i.e., one action is more specific than the other) or incompatible. The attack relation between arguments was computed from the action parts of the arguments: an argument a attacks an argument b if the action of a is not entailed by the action of b, i.e., either the actions of a and b are incompatible, or the action of a is a strict specialization of the action of b.

In a CG framework, an argument can be represented as a concept node with a nested description, which is itself partitioned into several CGs, one of them corresponding to the action advocated by the argument. A set of arguments is then a nested CG, in which each concept node is an argument. The attack relation can be computed automatically by comparing the action graphs of arguments. For the above attack relation, this can be done with a simple homomorphism check: if the action of a does not map by homomorphism to the action of b, then a attacks b. This is only a simple example of how the internal structure of arguments can be represented, and the attack relation generated, in the CG framework. As for further work, we want to study how well this framework fits the proposals in the argumentation literature.
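As a rough illustration of the action-based attack relation described above, the sketch below replaces the CG homomorphism check with a lookup in a specialization tree, which is a deliberate simplification and not the CG mechanism. All action and argument names are hypothetical, and encoding the hierarchy as a parent map is an assumption made for this example.

```python
def generalizations(action, parent):
    """Yield `action` followed by all more general actions above it in the tree."""
    while action is not None:
        yield action
        action = parent.get(action)


def entails(specific, general, parent):
    """An action entails itself and every action it specializes."""
    return general in generalizations(specific, parent)


def attacks(a, b, action_of, parent):
    """a attacks b iff the action of a is not entailed by the action of b."""
    return not entails(action_of[b], action_of[a], parent)


# Hypothetical specialization tree: child action -> more general action.
parent = {"reduce_salt": "change_recipe", "reduce_sugar": "change_recipe"}
# Hypothetical arguments and the action each one advocates.
action_of = {"arg1": "change_recipe", "arg2": "reduce_salt", "arg3": "reduce_sugar"}

strict_specialization = attacks("arg2", "arg1", action_of, parent)  # attack
more_general = attacks("arg1", "arg2", action_of, parent)           # no attack
incompatible = attacks("arg2", "arg3", action_of, parent)           # attack
```

In this encoding, two actions lying in different branches of the tree play the role of incompatible actions.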
References

1. Araucaria (website), http://araucaria.computing.dundee.ac.uk/
2. Baget, J.-F., Croitoru, M., Fortin, J., Thomopoulos, R.: Default Conceptual Graph Rules: Preliminary Results for an Agronomy Application. In: Rudolph, S., Dau, F., Kuznetsov, S.O. (eds.) ICCS 2009. LNCS (LNAI), vol. 5662, pp. 86–99. Springer, Heidelberg (2009)
3. Baget, J.-F., Fortin, J.: Default Conceptual Graph Rules, Atomic Negation and Tic-Tac-Toe. In: Croitoru, M., Ferré, S., Lukose, D. (eds.) ICCS 2010. LNCS, vol. 6208, pp. 42–55. Springer, Heidelberg (2010)
4. Bentahar, J., Moulin, B., Bélanger, M.: A taxonomy of argumentation models used for knowledge representation. Artif. Intell. Rev. 33(3), 211–259 (2010)
5. Bondarenko, A., Dung, P.M., Kowalski, R.A., Toni, F.: An abstract, argumentation-theoretic approach to default reasoning. Artificial Intelligence Journal 93, 63–101 (1997)
6. Bourguet, J.R.: Contribution aux méthodes d'argumentation pour la prise de décision. Application à l'arbitrage au sein de la filière céréalière. Ph.D. thesis, Université Montpellier II (2010)
7. Brewka, G., Eiter, T.: Prioritizing default logic: Abridged report. In: Festschrift on the Occasion of Prof. Dr. W. Bibel's 60th Birthday. Kluwer (1999)
8. Carneades (website), http://carneades.berlios.de/
9. Chein, M., Mugnier, M.L., Simonet, G.: Nested Graphs: A Graph-based Knowledge Representation Model with FOL Semantics. In: Proc. of KR 1998, pp. 524–534. Morgan Kaufmann (1998)
10. Chein, M., Mugnier, M.L.: Graph-based Knowledge Representation and Reasoning. Computational Foundations of Conceptual Graphs. Advanced Information and Knowledge Processing. Springer, London (2009)
11. Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence Journal 77, 321–357 (1995)
12. Brewka, G., Niemelä, I., Truszczynski, M.: Nonmonotonic reasoning. In: Lifschitz, V., Porter, B., van Harmelen, F. (eds.) Handbook of Knowledge Representation, pp. 239–284. Elsevier (2007)
13. de Moor, A., Park, J., Croitoru, M.: Argumentation Map Generation with Conceptual Graphs: the Case for ESSENCE. In: Proc. of the 4th ICCS Conceptual Structures Tool Interoperability Workshop (CS-TIW 2009), Russia, pp. 58–69 (2009)
14. Reiter, R.: A logic for default reasoning. Artificial Intelligence 13, 81–132 (1980)
15. Salvat, E., Mugnier, M.L.: Sound and Complete Forward and Backward Chaining of Graph Rules. In: Eklund, P., Mann, G.A., Ellis, G. (eds.) ICCS 1996. LNCS, vol. 1115, pp. 248–262. Springer, Heidelberg (1996)
16. Sowa, J.F.: Conceptual Structures: Information Processing in Mind and Machine. Addison–Wesley (1984)
Representing CSPs with Set-Labeled Diagrams: A Compilation Map

Alexandre Niveau¹, Hélène Fargier², and Cédric Pralet³

¹ CRIL/Université d'Artois, Lens, France [email protected]
² IRIT/CNRS, Toulouse, France [email protected]
³ Onera/DCSD, Toulouse, France [email protected]
Abstract. Constraint Satisfaction Problems (CSPs) offer a powerful framework for representing a great variety of problems. Unfortunately, most of the operations associated with CSPs are NP-hard. As some of these operations must be addressed online, compilation structures for CSPs have been proposed, e.g. finite-state automata and Multivalued Decision Diagrams (MDDs). The aim of this paper is to draw a compilation map of these structures. We cast all of them as fragments of a more general framework that we call Set-labeled Diagrams (SDs), as they are rooted, directed acyclic graphs with variable-labeled nodes and set-labeled edges; contrary to MDDs and Binary Decision Diagrams, SDs are not required to be deterministic (the sets labeling the edges going out of a node are not necessarily disjoint), ordered, or even read-once. We study the relative succinctness of different subclasses of SDs, as well as the complexity of classically considered queries and transformations. We show that a particular subset of SDs, satisfying a focusing property, has theoretical capabilities very close to those of Decomposable Negation Normal Forms (DNNFs), although they do not satisfy the decomposability property stricto sensu.
M. Croitoru et al. (Eds.): GKR 2011, LNAI 7205, pp. 137–171, 2012. © Springer-Verlag Berlin Heidelberg 2012

1 Introduction

Constraint Satisfaction Problems (CSPs) [RBW06] offer a powerful framework for representing a great variety of problems, e.g. planning or configuration problems. Different kinds of operations can be posted on a CSP, such as extraction of a solution (the most classical query), strong consistency of the domains, addition or retraction of constraints (dynamic CSP), counting of the number of solutions, and even combinations of these operations. For instance, the interactive solving of a configuration problem amounts to a series of (unary) constraint additions and retractions while maintaining the strong consistency of the domains, i.e. each value in a domain is involved in at least one solution.

Most of these operations are NP-hard, but must sometimes be addressed online. A possible way of resolving this contradiction is to use knowledge compilation, which consists in transforming the problem offline in such a way that its online resolution becomes tractable. As a matter of fact, Multivalued Decision Diagrams (MDDs) [SKMB90, Vem92, KVBSV98, AHHT07] have been proposed as a way to "compile" CSPs, and successfully used in product configuration [AFM02]. Figure 1 shows how an MDD can represent the set of solutions of a CSP.

Fig. 1. This figure shows the constraint graph (on the left) of the 3-coloring problem (the domain of the variables is {1, 2, 3}), and an MDD (on the right) representing the set of solutions of this CSP

Fig. 2. An example of non-reduced SD. Variable domains are all {0, 1, . . . , 10}. The two nodes marked (1) are isomorphic; node (2) is stammering; node (3) is undecisive; the edges marked (4) are contiguous; edge (5) is dead.

In the present paper, we investigate this landscape by capturing these existing compilation structures as subsets of a more general framework called "set-labeled diagrams". The latter also covers new structures relaxing the requirements of determinism and ordering, which, as we show, can lead to exponentially more compact graphs without losing much in efficiency. In particular, we identify a subset of set-labeled diagrams that has theoretical capabilities very close to those of DNNFs (Decomposable Negation Normal Forms) [DM02], although it does not satisfy decomposability stricto sensu. Moreover, while most of the operations considered in classical knowledge compilation maps deal with reasoning problems, we introduce in the present map a few new operations that are motivated by the use of the CSP framework for some more decision-oriented applications, such as planning and configuration. Proofs are gathered in Appendices A and B.
2 Set-Labeled Diagrams

Let us first formally define set-labeled diagrams, their interpretation, and their place among other knowledge compilation languages.
2.1 Structure and Semantics

The definition of set-labeled diagrams is similar to that of classical decision diagram structures:

Definition 2.1 (Set-labeled diagram). A set-labeled diagram (SD) is a couple φ = ⟨X, Γ⟩, with

– X (also denoted Var(φ)) a finite and totally ordered set of variables whose domains are finite sets of integers;
– Γ a directed acyclic graph¹ with at most one root and at most one leaf (the sink). Non-leaf nodes are labeled by a variable of X or by the disjunctive symbol ⊻. Each edge is labeled by a finite subset of ℕ.

This definition allows the graph to be empty (no node at all, the only case in which there is not exactly one root and one leaf) or to contain only one node (both root and sink). It does not prevent edge labels from being empty, and ensures that every edge belongs to at least one path from the root to the sink. Figure 2 gives an example of SD.

We will now introduce useful notations. For x ∈ X, Dom(x) ⊆ ℕ denotes the domain of x. By convention, Dom(⊻) = {0}. For Y = {y1, . . . , yk} ⊆ X, such that the yi are sorted in ascending order, Dom(Y) denotes Dom(y1) × · · · × Dom(yk), and y denotes a Y-assignment of variables from Y, i.e. y ∈ Dom(Y). When Y ∩ Z = ∅, y.z is the concatenation of y and z. Last, y(yi) denotes the value assigned to yi in y (by convention, y(⊻) = 0).

Let φ = ⟨X, Γ⟩ be a set-labeled diagram, N a node and E an edge in Γ; let us use the following notations:

– Root(φ) the root of Γ and Sink(φ) its sink;
– |φ| the size of φ, i.e. the sum of the cardinalities of all labels in φ plus the cardinalities of the variables' domains;
– Outφ(N) (resp. Inφ(N)) the set of outgoing (resp. incoming) edges of N;
– Varφ(N) the variable labeling N (not defined for Sink(φ));
– Srcφ(E) the node from which E comes and Destφ(E) the node to which E points;
– Lblφ(E) the set labeling E;
– Varφ(E) = Varφ(Srcφ(E)) the variable associated with E.

We shall drop the φ subscript whenever there is no ambiguity.

An SD can be seen as a compact representation of a Boolean function over discrete variables. This function is the interpretation function of the set-labeled diagram:

Definition 2.2 (Semantics of a set-labeled diagram). A set-labeled diagram φ on X = Var(φ) represents a function I(φ) from Dom(X) to {⊤, ⊥}, called the interpretation function of φ, and defined as follows: for a given X-assignment x, I(φ)(x) = ⊤ if and only if there exists a path p from the root to the sink of φ such that for each edge E along p, x(Var(E)) ∈ Lbl(E). We say that x is a model of φ whenever I(φ)(x) = ⊤. Mod(φ) denotes the set of models of φ. φ is said to be equivalent to another SD ψ (denoted φ ≡ ψ) iff Mod(φ) = Mod(ψ).

¹ Actually, depending on the definition, Γ may not strictly be a graph, but rather a multigraph, since we allow two edges to go in parallel from one node to another (see e.g. Figure 2): the set of edges is a subset of 𝒩 × 𝒩 × 2^ℕ, 𝒩 being the set of nodes.
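Definition 2.2 can be illustrated with a small Python sketch. The encoding below (node names as strings, edges as triples, the marker OR standing for the disjunctive symbol) is an assumption made for this example and is not prescribed by the paper. Since a complete assignment fixes which edges can be crossed, checking for a suitable path reduces to a reachability search.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

OR = "OR"  # hypothetical marker for the disjunctive symbol; its value is 0 by convention


@dataclass
class SD:
    root: Optional[str]  # None encodes the empty SD
    sink: Optional[str]
    var: dict            # non-leaf node -> variable name (or OR)
    edges: list          # triples (source node, label set, destination node)


def is_model(sd, x):
    """I(sd)(x): is the sink reachable using only edges whose label contains
    the value that x gives to the variable of the edge's source node?"""
    if sd.root is None:
        return False  # the empty SD has no root-to-sink path
    stack, seen = [sd.root], set()
    while stack:
        node = stack.pop()
        if node == sd.sink:
            return True
        if node in seen:
            continue
        seen.add(node)
        label_var = sd.var[node]
        value = 0 if label_var == OR else x[label_var]
        for src, label, dest in sd.edges:
            if src == node and value in label:
                stack.append(dest)
    return False


def models(sd, dom):
    """Enumerate Mod(sd) by brute force (exponential; for tiny examples only)."""
    names = sorted(dom)
    assignments = (dict(zip(names, vals)) for vals in product(*(sorted(dom[n]) for n in names)))
    return [x for x in assignments if is_model(sd, x)]


# Illustrative SD: x in {2}, or x in {0, 1} and y in {1}.
dom = {"x": {0, 1, 2}, "y": {0, 1, 2}}
sd = SD(root="n1", sink="s", var={"n1": "x", "n2": "y"},
        edges=[("n1", {0, 1}, "n2"), ("n1", {2}, "s"), ("n2", {1}, "s")])
mods = models(sd, dom)
```

With this encoding the one-node SD (root equal to sink, no edge) accepts every assignment, while the empty SD accepts none.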
Fig. 3. In this SD, all edges are focusing but the one marked (NF) (it is not included in the one marked (†)), and all nodes are deterministic but the ones marked (ND). This SD is the reduced form of the SD presented in Figure 2: isomorphic nodes marked (1) have been merged into node (1′), stammering node (2) has been collapsed into node (2′), contiguous edges marked (4) have been merged into edge (4′), and undecisive node (3) and dead edge (5) have been removed.
Note that the interpretation function of the empty SD always returns ⊥, since it contains no path from the root to the sink. Conversely, the interpretation function of the one-node SD always returns ⊤, since in the one-node SD, the only path from the root to the sink contains no edge.

From these two definitions it follows that SDs are strongly related to ordered binary decision diagrams [Bry86] and multivalued decision diagrams [SKMB90, Vem92, APV99, AHHT07] as a way to represent a set of assignments of discrete variables (or typically, the set of solutions of a CSP). They actually generalize these data structures in two ways. First, there is no restriction on the order in which the variables are encountered along a path, and variables can be repeated along a path. Second, SDs are not necessarily deterministic: the sets labeling edges going out of a node are not required to be pairwise disjoint, and thus a single model can be captured by several paths. SDs even support pure non-deterministic "OR" nodes (the ⊻-nodes) that allow the unrestricted union of several subgraphs. Lifting these two restrictions is valuable both theoretically, to generalize a large class of data structures, and practically, since SDs can be more compact than their ordered and deterministic variants (see Section 3.1). Let us define determinism formally and then introduce useful concepts.

Definition 2.3 (Deterministic set-labeled diagrams). A node N in a set-labeled diagram φ is deterministic if the sets labeling its outgoing edges are pairwise disjoint. A deterministic set-labeled diagram (dSD) is an SD containing only deterministic nodes.

The notion of determinism is illustrated in Figure 3.

Definition 2.4 (Consistency, validity, context). Let φ be a set-labeled diagram on X. φ is said to be consistent (resp. valid) if and only if Mod(φ) ≠ ∅ (resp. Mod(φ) = Dom(X)).
A value v ∈ ℕ is said to be consistent for a variable y ∈ X in φ if and only if there exists an X-assignment x in Mod(φ) such that x(y) = v. The set of all consistent values for y in φ is called the context of y in φ and denoted Ctxtφ(y).

We will see in the following that deciding whether an SD is consistent is not tractable. One of the reasons is that the sets restricting a variable along a path can be conflicting, hence in the worst case all paths must be explored before a consistent one is found. Figure 4 shows an example of SD with no consistent path at all. To avoid this, we will consider SDs in which labeling sets can only shrink from the root to the sink, thus preventing conflicts:

Fig. 4. Before an SD is proven inconsistent, every path must be checked. Here is an example of an SD all of whose paths are inconsistent.

Definition 2.5 (Focusing set-labeled diagrams). A focusing edge in a set-labeled diagram φ is an edge E such that all edges E′ on a path from the root of φ to Src(E) such that Var(E) = Var(E′) satisfy Lbl(E) ⊆ Lbl(E′). A focusing set-labeled diagram (FSD) is an SD containing only focusing edges.

The notion of focusing edge is illustrated in Figure 3. It is sufficient for the consistency query to be polynomial, but for some other operations (such as obtaining the conjunction of two SDs), a stricter restriction is necessary. An interesting one, very common in knowledge compilation, is to impose an order on the variables encountered along the paths; applying it to SDs, we recover Multivalued Decision Diagrams (MDDs) in their practical sense² [SKMB90, Vem92, AHHT07].

Definition 2.6 (Ordered diagrams). Let < be a total order on the variables of X. A set-labeled diagram is said to be ordered w.r.t. < iff, for any two nodes N and M, if N is an ancestor of M then Var(N) < Var(M). A dSD ordered w.r.t. < is called an MDD_<. The language MDD is the union of all MDD_< languages.³

Obviously, if no variable occurs twice on a path, all edges are focusing. Hence: MDD_< ⊆ MDD ⊆ dFSD ⊆ FSD ⊆ SD and dFSD ⊆ dSD ⊆ SD. We will study in the next sections the main properties of the SD family, and their relationships with classical Boolean decision diagrams. But before that, let us show how to reduce an SD in order to make it as compact as possible and thus save space.

² The original definition of MDDs neither requires determinism nor introduces an order on the variables. Nevertheless, the papers resorting to these structures work only with ordered and deterministic MDDs; that is why we abusively designate ordered dSDs as MDDs.
³ A language is a set of graph structures, equipped with an interpretation function. We denote SD the language of SDs, dSD the language of dSDs, and so on.

2.2 Reduction

Like a BDD, an SD can be reduced in size without changing its semantics. Reduction is based on several operations; some of them are straightforward generalizations of those introduced in the context of BDDs [Bry86], namely merging isomorphic nodes (that are equivalent) and collapsing undecisive edges (that are always crossed), while others are specific to set-labeled diagrams, namely suppressing dead edges (that are never crossed), merging contiguous edges (that have the same source and the same destination) and collapsing stammering nodes (successive decisions that pertain to the same variable). All these notions are illustrated in the SD of Figure 2, and the reduced form of this SD is shown in Figure 3. Formally:

Definition 2.7.
– Two edges E1, E2 are contiguous iff Src(E1) = Src(E2) and Dest(E1) = Dest(E2).
– Two nodes N1, N2 are isomorphic iff Var(N1) = Var(N2) and there exists a bijection σ from Out(N1) onto Out(N2), such that ∀E ∈ Out(N1), Lbl(E) = Lbl(σ(E)) and Dest(E) = Dest(σ(E)).
– An edge E is dead iff Lbl(E) ∩ Dom(Var(E)) = ∅.
– A node N is undecisive iff |Out(N)| = 1 and E ∈ Out(N) is such that Dom(Var(E)) ⊆ Lbl(E).
– A non-root node N is stammering iff all parent nodes of N are labeled by Var(N), and either Σ_{E∈Out(N)} |E| = 1 or Σ_{E∈In(N)} |E| = 1.

Definition 2.8 (Reduced form). A set-labeled diagram φ is said to be reduced iff no node of φ is isomorphic to another, stammering, or undecisive; and no edge of φ is dead or contiguous to another.

In the following, we can consider only reduced SDs, since reduction can be done in time polynomial in their size; indeed, each reduction step (removal of isomorphic nodes, contiguous edges, etc.) is polytime and removes more nodes and edges than it adds, hence even if we have to traverse the graph several times, the global complexity remains polynomial.

Proposition 2.9 (Reduction). Let L be a sublanguage of SD among {SD, FSD, dSD, dFSD, MDD, MDD_<}. There exists a polytime algorithm that transforms any φ in L into an equivalent reduced φ′ in L such that |φ′| ≤ |φ|.

We have seen that SDs are strongly related to BDDs and MDDs; we will now detail these relations.
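Two of the reduction steps of Definition 2.7 are simple enough to sketch directly. The fragment below is an illustration under assumed conventions (edges as triples of source node, label set and destination node; hypothetical function names), not the reduction algorithm behind Proposition 2.9: it only drops dead edges and merges contiguous ones, and does not clean up nodes that lose all their paths to the sink.

```python
from collections import defaultdict


def remove_dead_edges(edges, var, dom):
    """Drop every edge E with Lbl(E) ∩ Dom(Var(E)) = ∅."""
    return [(src, lbl, dst) for (src, lbl, dst) in edges if lbl & dom[var[src]]]


def merge_contiguous(edges):
    """Replace edges sharing source and destination by one edge labeled with
    the union of their labels, which preserves the set of models."""
    merged = defaultdict(set)
    for src, lbl, dst in edges:
        merged[(src, dst)] |= lbl
    return [(src, frozenset(lbl), dst) for (src, dst), lbl in merged.items()]


# Illustrative data: all domains are {0, ..., 10}, as in Figure 2.
var = {"n1": "x", "n2": "y"}
dom = {"x": set(range(11)), "y": set(range(11))}
edges = [
    ("n1", frozenset({3}), "n2"),
    ("n1", frozenset({6}), "n2"),     # contiguous with the previous edge
    ("n1", frozenset({15}), "sink"),  # dead: 15 is outside Dom(x)
    ("n2", frozenset({1}), "sink"),
]
reduced = merge_contiguous(remove_dead_edges(edges, var, dom))
```

For ⊻-nodes, the same code applies provided the domain map gives the disjunctive symbol the domain {0}.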
2.3 SDs and the Decision Diagram Family

Binary Decision Diagrams (BDDs, [Bry86]) are rooted, directed acyclic graphs that represent Boolean functions of Boolean variables. They have two leaves, respectively labeled ⊥ and ⊤; their non-leaf nodes are labeled by a Boolean variable and have two outgoing edges, respectively labeled ⊥ and ⊤. A free BDD (FBDD) is a BDD that satisfies the read-once property (each path contains at most one occurrence of each variable). Whenever the same order is imposed on the variables along every path, we get an ordered BDD (OBDD). OBDDs have been extended to enumerated domains as MDDs by [SKMB90, Vem92, APV99] and later worked out by [AHHT07].

SDs are obviously not decision diagrams in the sense of Bryant since they do not have a ⊥ sink, but classical MDDs do not either. Whether or not such a sink is added is actually harmless, and does not represent a real difference. The first main difference between decision diagrams and SDs is that SDs can be non-deterministic. Relationships between SDs and their Boolean counterparts are formally provided hereafter.

Definition 2.10 (Polynomial translatability). A sublanguage L2 of SD is polynomially translatable into another sublanguage L1 of SD, which we denote L1 ≤P L2, if and only if there exists a polytime algorithm mapping any element from L2 to an equivalent element from L1.

For any subclass L of SD and any D ⊆ ℕ, let L_D be the sublanguage of L for which all domains are included in D. We will consider in particular the classes dSD_{0,1}, dFSD_{0,1}, and FSD_{0,1}, which generalize BDD, FBDD, and DNF respectively.

When an order is imposed on the variables, MDD_{0,1} and OBDD are equivalent representation languages, up to the existence of a ⊥ sink, which, once again, is harmless. More generally, [SKMB90] have shown that a log encoding of the domains allows any MDD to be transformed into an equivalent OBDD, which incidentally provides a convenient way to implement an MDD package on top of a BDD package. Let us put the emphasis on the new languages, namely focusing SDs:

Proposition 2.11 (FSD_{0,1} ≤P DNF). Any formula in the DNF language can be expressed in the form of an FSD_{0,1} in linear time.

Proposition 2.12 (dFSD_{0,1} ≤P FBDD). Any FBDD (and thus any OBDD) can be expressed in the form of an equivalent dFSD_{0,1} in time linear in the FBDD's size.

dFSD actually generalizes FBDD, and we will see that it reaches the same performance as this fragment, except for the counting query. But it is worth noticing that, contrary to FBDD, dFSD allows a variable to be met twice on a path. dFSD_{0,1} is thus a proper superset of FBDD. FSDs are more general than usual MDD compilations of CSPs, since they do not require any order or even determinism; we will see in the following that this can lead to exponential savings in space.
Last, but not least, it should be noticed that dFSDs and FSDs are not decomposable structures in the sense of Negation Normal Forms (NNFs). Indeed, the definition of decomposability [Dar01], when applied to a decision diagram, implies that variables cannot be repeated along a path. Since they are not decomposable, dFSDs do not define a subclass of AND/OR MDDs [MD06], of structured DNNFs [PD08], nor of tree automata [FV04], that are decomposable (and ordered) structures.
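One possible construction behind Proposition 2.11 can be sketched as follows. The paper's own proof is in its appendix and is not reproduced in this section, so the construction here is an assumption: a ⊻-labeled root with one chain of variable nodes per term. Terms are assumed non-empty and free of repeated variables, so no variable occurs twice on a path and every edge is trivially focusing. The dictionary encoding and the function names are hypothetical.

```python
from itertools import product

OR = "OR"  # hypothetical marker for the disjunctive symbol; its value is 0 by convention


def dnf_to_fsd(terms):
    """terms: non-empty list of non-empty dicts mapping a variable to True
    (positive literal) or False (negative literal). Linear in the DNF size."""
    var, edges = {"root": OR}, []
    for i, term in enumerate(terms):
        prev, prev_label = "root", frozenset({0})  # leaving the OR root costs nothing
        for j, (v, positive) in enumerate(sorted(term.items())):
            node = f"t{i}_{j}"
            var[node] = v
            edges.append((prev, prev_label, node))
            prev, prev_label = node, frozenset({1 if positive else 0})
        edges.append((prev, prev_label, "sink"))
    return {"root": "root", "sink": "sink", "var": var, "edges": edges}


def is_model(sd, x):
    """Reachability of the sink through edges compatible with assignment x."""
    stack, seen = [sd["root"]], set()
    while stack:
        node = stack.pop()
        if node == sd["sink"]:
            return True
        if node in seen:
            continue
        seen.add(node)
        v = sd["var"][node]
        value = 0 if v == OR else x[v]
        stack.extend(d for (s, lbl, d) in sd["edges"] if s == node and value in lbl)
    return False


# (a AND NOT b) OR c
fsd = dnf_to_fsd([{"a": True, "b": False}, {"c": True}])
agrees = all(
    is_model(fsd, {"a": a, "b": b, "c": c}) == bool((a and not b) or c)
    for a, b, c in product((0, 1), repeat=3)
)
```

The resulting diagram is in general not deterministic, since all edges leaving the root carry the same label {0}.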
3 A Compilation Map of SDs

In the following, we put forward the properties of the SD language and its sublanguages, according to their spatial efficiency (succinctness), and to their capacity to make queries and transformations tractable.
Table 1. Results about succinctness. ∗ means that the result is true unless the polynomial hierarchy PH collapses. The graph above illustrates those results, dotted lines meaning that we lack the information to prove both directions (i.e. it is proven that one of the languages is at least as succinct as the other, but it is not known whether the converse is true or false).
≤s
SD
dSD