STAIRS 2006
Frontiers in Artificial Intelligence and Applications FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including “Information Modelling and Knowledge Bases” and “Knowledge-Based Intelligent Engineering Systems”. It also includes the biennial ECAI, the European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the European Coordinating Committee on Artificial Intelligence – sponsored publications. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection. Series Editors: J. Breuker, R. Dieng, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen and N. Zhong
Volume 142 Recently published in this series Vol. 141. G. Brewka et al. (Eds.), ECAI 2006 – 17th European Conference on Artificial Intelligence Vol. 140. E. Tyugu and T. Yamaguchi (Eds.), Knowledge-Based Software Engineering – Proceedings of the Seventh Joint Conference on Knowledge-Based Software Engineering Vol. 139. A. Bundy and S. Wilson (Eds.), Rob Milne: A Tribute to a Pioneering AI Scientist, Entrepreneur and Mountaineer Vol. 138. Y. Li et al. (Eds.), Advances in Intelligent IT – Active Media Technology 2006 Vol. 137. P. Hassanaly et al. (Eds.), Cooperative Systems Design – Seamless Integration of Artifacts and Conversations – Enhanced Concepts of Infrastructure for Communication Vol. 136. Y. Kiyoki et al. (Eds.), Information Modelling and Knowledge Bases XVII Vol. 135. H. Czap et al. (Eds.), Self-Organization and Autonomic Informatics (I) Vol. 134. M.-F. Moens and P. Spyns (Eds.), Legal Knowledge and Information Systems – JURIX 2005: The Eighteenth Annual Conference Vol. 133. C.-K. Looi et al. (Eds.), Towards Sustainable and Scalable Educational Innovations Informed by the Learning Sciences – Sharing Good Practices of Research, Experimentation and Innovation Vol. 132. K. Nakamatsu and J.M. Abe (Eds.), Advances in Logic Based Intelligent Systems – Selected Papers of LAPTEC 2005 Vol. 131. B. López et al. (Eds.), Artificial Intelligence Research and Development Vol. 130. K. Zieliński and T. Szmuc (Eds.), Software Engineering: Evolution and Emerging Technologies
ISSN 0922-6389
STAIRS 2006 Proceedings of the Third Starting AI Researchers’ Symposium
Edited by
Loris Penserini ITC-irst, Trento, Italy
Pavlos Peppas University of Patras, Greece
and
Anna Perini ITC-irst, Trento, Italy
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2006 The authors. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher. ISBN 1-58603-645-9 Library of Congress Control Number: 2006929618 Publisher IOS Press Nieuwe Hemweg 6B 1013 BG Amsterdam Netherlands fax: +31 20 687 0019 e-mail: [email protected] Distributor in the UK and Ireland Gazelle Books Services Ltd. White Cross Mills Hightown Lancaster LA1 4XS United Kingdom fax: +44 1524 63232 e-mail: [email protected]
Distributor in the USA and Canada IOS Press, Inc. 4502 Rachael Manor Drive Fairfax, VA 22032 USA fax: +1 703 323 3668 e-mail: [email protected]
LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
Preface

STAIRS 2006 is the third European Starting AI Researcher Symposium, an international meeting aimed at AI researchers, from all countries, at the beginning of their career: PhD students or people holding a PhD for less than one year. A total of 59 papers were submitted from 22 different countries: Australia, Austria, Brazil, Canada, Czech Republic, France, Germany, Greece, India, Ireland, Italy, Japan, México, Netherlands, Serbia and Montenegro, Scotland, Slovenia, Spain, Sweden, Switzerland, UK, USA. The areas of the submitted and accepted papers/posters range from traditional AI areas, such as Knowledge Representation, Machine Learning, Natural Language Processing, Constraints, Planning and Scheduling, Agents, to AI applications, such as Semantic Web, Data Clustering for medical applications, E-learning. The program committee decided to accept 20 full papers, which amounts to an acceptance rate of 33.8%, and 15 posters. The STAIRS 2006 best paper award, sponsored by IOS, goes to Giorgos Flouris, Dimitris Plexousakis, and Grigoris Antoniou for their paper "On Generalizing the AGM Postulates". Congratulations! The symposium programme includes two invited talks and technical sessions in which the accepted papers and posters will be presented and discussed. We thank the authors, the PC members and the additional reviewers for making STAIRS 2006 a high-quality scientific event. We would also like to thank EasyChair for the courtesy license of its conference management software. STAIRS 2006 has benefited from ECAI 2006 in terms of administration support, grants, and sponsorship. June 2006
Loris Penserini Pavlos Peppas Anna Perini
Symposium Chairs Loris Penserini, SRA, ITC-irst, Trento, Italy Pavlos Peppas, University of Patras, Greece Anna Perini, SRA, ITC-irst, Trento, Italy
Programme Committee Maurice Pagnucco, Australia Loris Penserini, Italy Mikhail Prokopenko, Australia Jochen Renz, Australia Panagiotis Rondogiannis, Greece Andrea Sboner, Italy Torsten Schaub, Germany Luca Spalazzi, Italy Steffen Staab, Germany Geoff Sutcliffe, USA Armando Tacchella, Italy Valentina Tamma, UK Thanassis Tiropanis, Greece Isabel Trancoso, Portugal George Vouros, Greece Renata Wassermann, Brazil Mary-Anne Williams, Australia Nirmalie Wiratunga, UK
Eyal Amir, USA Anbulagan A, Australia Grigoris Antoniou, Greece Silvia Coradeschi, Sweden Alessandro Cucchiarelli, Italy Jim Delgrande, Canada Sofoklis Efremidis, Greece Norman Foo, Australia Rosella Gennari, Italy Dina Goren-Bar, Israel Renata Guizzardi, Italy Costas Koutras, Greece Ornella Mich, Italy Dunja Mladenic, Slovenia Leora Morgenstern, USA Abhaya Nayak, Australia Antreas Nearchou, Greece Eva Onaindia, Spain Mehmet Orgun, Australia
Additional Reviewers Christian Anger Paolo Busetta João Paulo Carvalho Sutanu Chakraborti Alfredo Gabaldon Manolis Gergatsoulis
Randy Goebel Giancarlo Guizzardi Lars Karlsson Kathrin Konczak Rahman Mukras Christos Nomikos
Amandine Orecchioni Nikolaos Papaspyrou Joana L. Paulo Eugene Santos Josef Urban
Contents

Preface (Loris Penserini, Pavlos Peppas and Anna Perini)  v
Conference Organization  vii
FULL PAPERS

Agents
Cognitive Learning with Automatic Goal Acquisition (Josef Kittler, Mikhail Shevchenko and David Windridge)  3
Semantics of Alan (Francesco Pagliarecci)  14
A Compact Argumentation System for Agent System Specification (Insu Song and Guido Governatori)  26
Social Responsibility Among Deliberative Agents (Paolo Turrini, Mario Paolucci and Rosaria Conte)  38

Information Retrieval
Knowledge Base Extraction for Fuzzy Diagnosis of Mental Retardation Level (Alessandro G. Di Nuovo)  50
Tuning the Feature Space for Content-Based Music Retrieval (Aleksandar Kovačević, Branko Milosavljević and Zora Konjović)  62
Personalizing Trust in Online Auctions (John O'Donovan, Vesile Evrim, Barry Smyth, Dennis McLeod and Paddy Nixon)  72

Information Systems
An Hybrid Soft Computing Approach for Automated Computer Design (Alessandro G. Di Nuovo, Maurizio Palesi and Davide Patti)  84
FUNEUS: A Neurofuzzy Approach Based on Fuzzy Adaline Neurons (Constantinos Koutsojannis and Ioannis Hatzilygeroudis)  96
Empirical Evaluation of Scoring Methods (Luca Pulina)  108

Knowledge Representation
Binarization Algorithms for Approximate Updating in Credal Nets (Alessandro Antonucci, Marco Zaffalon, Jaime S. Ide and Fabio G. Cozman)  120
On Generalizing the AGM Postulates (Giorgos Flouris, Dimitris Plexousakis and Grigoris Antoniou)  132
The Two-Variable Situation Calculus (Yilan Gu and Mikhail Soutchanski)  144
Base Belief Change and Optimized Recovery (Frances Johnson and Stuart C. Shapiro)  162

Machine Learning
Unsupervised Word Sense Disambiguation Using the WWW (Ioannis P. Klapaftis and Suresh Manandhar)  174
Relational Descriptive Analysis of Gene Expression Data (Igor Trajkovski, Filip Zelezny, Nada Lavrac and Jakub Tolar)  184

Scheduling
Solving Fuzzy PERT Using Gradual Real Numbers (Jérôme Fortin and Didier Dubois)  196
Approaches to Efficient Resource-Constrained Project Rescheduling (Jürgen Kuster and Dietmar Jannach)  208

Semantic Web
A Comparison of Web Service Interface Similarity Measures (Natallia Kokash)  220
Finding Alternatives Web Services to Parry Breakdowns (Laure Bourgois)  232
POSTERS

Smart Ride Seeker Introductory Plan (Sameh Abdel-Naby and Paolo Giorgini)  247
Spam Filtering: The Influence of the Temporal Distribution of Training Data (Anton Bryl)  249
An Approach for Evaluating User Model Data in an Interoperability Scenario (Francesca Carmagnola and Federica Cena)  251
On the Improvement of Brain Tumour Data Clustering Using Class Information (Raúl Cruz and Alfredo Vellido)  253
Endowing BDI Agents with Capability for Modularizing (Karl Devooght)  255
Rational Agents Under ASP in Games Theory (Fernando Zacarías Flores, Dionicio Zacarías Flores, José Arrazola Ramírez and Rosalba Cuapa Canto)  257
Automatic Generation of Natural Language Parsers from Declarative Specifications (Carlos Gómez-Rodríguez, Jesús Vilares and Miguel A. Alonso)  259
Reconsideration on Non-Linear Base Orderings (Frances Johnson and Stuart C. Shapiro)  261
Dynamic Abstraction for Hierarchical Problem Solving and Execution in Stochastic Dynamic Environments (Per Nyblom)  263
A Comparison of Two Machine-Learning Techniques to Focus the Diagnosis Task (Oscar Prieto and Aníbal Bregón)  265
Argumentation Semantics for Temporal Defeasible Logic (Régis Riveret, Guido Governatori and Antonino Rotolo)  267
NEWPAR: An Optimized Feature Selection and Weighting Schema for Category Ranking (Fernando Ruiz-Rico and Jose-Luis Vicedo)  269
Challenges and Solutions for Hierarchical Task Network Planning in E-Learning (Carsten Ullrich and Okhtay Ilghami)  271

INVITED TALKS

Artificial Intelligence and Unmanned Aerial Vehicles (Patrick Doherty)  275
Writing a Good Grant Proposal (Chiara Ghidini)  276

Author Index  277
Full Papers
Cognitive Learning with Automatic Goal Acquisition

Josef KITTLER, Mikhail SHEVCHENKO and David WINDRIDGE
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, United Kingdom
{j.kittler,m.shevchenko,d.windridge}@surrey.ac.uk

Abstract. Traditional algorithms of machine learning implemented in cognitive architectures demonstrate a lack of autonomous exploration in an unknown environment. However, the latter is one of the most distinctive attributes of cognitive behaviour. The paper proposes an approach of self-reinforcement cognitive learning combining unsupervised goal acquisition, active Markov-based goal attaining and spatial-semantic hierarchical representation within an open-ended system architecture. The novelty of the method consists in the division of goals into the classes of parameter goal, invariant goal and context goal. The system exhibits incremental learning in such a manner as to allow effective transferable representation of high-level concepts.

Keywords. Cognitive architectures, Autonomous Robotics, Unsupervised Learning, Markov Decision Process
1. Introduction

The field of Artificial Cognition has arisen recently with the intention of developing agents capable of perceiving, exploring, learning and reasoning about the external world. A number of machine learning approaches have been adopted in order to make agent behaviour cognitive. However, most existing systems rely on an external expert guiding the process of learning, either explicitly defining correct examples (supervised methods) or providing rewards (reinforcement learning) [1,2]. Fully autonomous world exploration and self-driven learning have not yet been achieved. Nevertheless, some successful progress towards developing cognitive agents can be found in the approaches of Open-Ended Architecture, Simultaneous Localization and Mapping (SLAM Robotics) and Spatial-Semantic Hierarchy. Most of the open-ended architectures employ the Hierarchical Perception-Action Approach, first introduced by Brooks [3], followed by the OPAL architecture [4] and Low-Level Perception-Action Linking [5]. This direction is primarily intended to mimic the connection between perception and action found in living organisms. A subsumption principle has also been suggested, based on hierarchical control of behaviour, wherein abstract symbolic models emerge by bottom-up inference and the higher levels adjust the lower ones via inhibition. In SLAM robotics the number of cognitive tasks is constrained by a particular problem of
determining robot location. The main issue here is to overcome the accumulating perceptual uncertainty of measuring distances and directions of movement. Usually the robot starts with building a primitive environmental model which is then refined via exploration. The most successful approaches in this area are given by Bayesian mapping [6]. The spatial-semantic hierarchy (SSH) principle [7] proposes to find distinctive perceptual states and intentionally move towards those attractive places, regions where there is probably no ambiguity of visual description. By doing so an agent builds up a map of discrete locations, transferring its internal representation from the continuous sensor-motor space to a discrete domain of perceptual states with allowed actions. Investigating the problem of creating a hierarchical spatial-semantic representation of an unknown environment, we propose an approach replacing a number of learning methods used for the SSH framework [8] by a universal mechanism of cognition within an open-ended architecture. The method of distinctive state detection is an internal, unsupervised capturing of spatial goals, used instead of the external mechanism of rewards necessary for traditional reinforcement learning. A novel feature of our technique is the introduction of three different classes of goal representation: parametric, invariant and contextual. A parametric goal is one that is detected at the low level by clustering perceptual information. Policies, the functions that define the system reaction to the given input, are represented as a hierarchical structure of perception-action transformations. In order to associate a policy with the detected goal, the latter is transferred into an invariant goal. A context goal is a projection of the invariant goal onto the current visual context, used for applying the policy to perform the action. The system starts operating by acquiring simple goals and learning low-level policies, bootstrapping itself by reusing obtained information on each next level of the hierarchy. The algorithm of learning is based on a Markov Decision Process derived from the reinforcement learning technique. In Section 2 we present a method of unsupervised goal acquisition and introduce different goal representations within the self-reinforcement system framework. Section 3 is devoted to the problem of learning policies by online analysis of distances to the goal. Section 4 is the learning scenario where the self-rewarding open-ended architecture is applied to a particular kind of visual environment. Finally, Section 5 discusses the results and future work.
2. Unsupervised goal acquisition A typical reinforcement learning task consists of a set of environment states {S}, actions {ΔM }, and scalar reinforcement commands {r(1) , . . . r(k) } . At each iteration the system perceives state S i , generates action ΔM i and receives new state S i+1 with reward ri+1 (Figure 1 (A)). The learning process is intended to derive policy Q mapping states to actions which maximizes the cumulative reward: R=
\sum_i r_i .   (1)
In the self-reinforcement approach the external expert giving rewards to the system is
Figure 1. Classical reinforcement (A) and self-reinforcement (B) frameworks
replaced by the internal mechanism which provides feedback to the learning module. The key principle of self-reinforcement is unsupervised acquisition of the behavioural goal. Suppose the system has a visual input and perceptual state S^c_i is a vector of visual features f_j:

S^c_i = \{f_j\}.   (2)
We introduce an invariant system goal as a perceptual state S^g which contains one or more significant perceptual features \{f^g_\sigma\}_{\sigma=1,\ldots} – parameter goals. The algorithm of parameter goal detection is automatic selection of distinctive peaks on feature frequency histograms (FFHs). FFHs are calculated by integrating features perceived by the system over a set of perception-action iterations. For any feature f_s its FFH I(f_s) is:

I(f_s) = \sum_{i,j} \begin{cases} 1, & f_s = f_{ij} \\ 0, & f_s \neq f_{ij} \end{cases} \qquad j = 1 \ldots n(i), \; i = 1 \ldots N,   (3)
where n(i) is the number of features detected on the ith step and N is the number of iterations. A visual state with the given invariant goal and the other feature vector components taken from the current visual scene is considered as a context goal. The invariant goal does not depend on the current visual context; it is used for representing detected visual structures or events, together with the corresponding policies of perception-action mapping, as symbolic entities at the high level. The context goal implements its invariant prototype when the system applies a high-level command to generate action. Invariant goals can be combined into a complex invariant goal, generating new kinds of symbolic classes and corresponding novel behaviour. Figure 1(B) demonstrates the self-reinforcement framework. The current perceptual state S^c_i is the input to the goal detector (1) that returns the parameter goal S^g. Both S^c_i and S^g are taken by the internal rewarding module (2) generating the reinforcement signal r_i. Finally, r_i and S^c_i feed the procedure of learning policies (3).
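To make the histogram-based goal detection concrete, the following Python sketch (not the authors' code) accumulates feature frequencies over a window of perception-action iterations, following equation (3), and returns the distinctive peaks as parameter goals. The peak-selection rule and the string encoding of features are illustrative assumptions.

```python
from collections import Counter

def feature_frequency_histogram(observations):
    """Accumulate I(f) over all features f_ij seen in a set of
    perception-action iterations (eq. 3); observations is a list of
    perceptual states, each a list of hashable feature values."""
    histogram = Counter()
    for state in observations:          # i = 1 .. N
        for feature in state:           # j = 1 .. n(i)
            histogram[feature] += 1
    return histogram

def detect_parameter_goals(observations, peak_ratio=0.5):
    """Return features whose frequency forms a distinctive peak: here,
    counts above peak_ratio times the largest count (an illustrative
    choice of threshold, not taken from the paper)."""
    histogram = feature_frequency_histogram(observations)
    if not histogram:
        return []
    top = max(histogram.values())
    return [f for f, count in histogram.items() if count >= peak_ratio * top]

# Example: the quantized movement "dx=+1" keeps reappearing,
# so it is detected as a parameter goal.
frames = [["dx=+1", "id=arm"], ["dx=+1", "id=arm"], ["dy=-1", "id=arm"], ["dx=+1", "id=arm"]]
print(detect_parameter_goals(frames))   # e.g. ['dx=+1', 'id=arm']
```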
3. Attaining goals by action

3.1. Goal-driven reinforcement

The mechanism of internal reinforcement operates only with the parameter goal. The rewarding signal is proportional to the difference between the goal distances calculated for the current visual state S^c_i: d_i = d(f_{i,j}, f^g), and for the next one S^c_{i+1}: d_{i+1} = d(f_{i+1,j}, f^g), obtained after policy Q has been applied:

r_i = k(d_{i+1} - d_i)   (4)
where k is the weight factor and f^g is the goal feature. The algorithm of calculating the parameter distance depends on the particular representation of visual scenes (see Section 4 for details). Further, the method for learning policies by attaining visual goals is based on a binary reinforcement signal, r \in \{0, 1\}, where the factor k is calculated as follows:

k = \begin{cases} 0, & d_{i+1} - d_i \ge 0 \\ 1/(d_{i+1} - d_i), & d_{i+1} - d_i < 0 \end{cases}   (5)
The last equation defines those actions that cause a decrease of the goal distance as successful, and the system generates the self-reward r = 1. Any other policy outputs and actions generated thereafter are punished by r = 0. In the next section we explicitly include the goal distance into the algorithm instead of considering the separate reward generating mechanism. Since our analysis will be concerned with abstract models for optimal policies, the parameter goal distance is replaced by its invariant analogue: D_i = D(S^c_i, S^g). The algorithm of calculating D_i, as well as that for the parameter distance, depends on the particular feature vector representation and will be discussed in the experimental section.

3.2. Finding a policy

Suppose that the system has detected an invariant goal and it is not equal to the current visual state S^c_i. Any physical action \Delta M_i changes the state of the external world which, in its turn, modifies the current perceptual state S^c_{i+1}. We denote this transformation, which carries a sense of the physical model of the external world, as L:
\{M\} \to \{S\}, \qquad S^c_{i+1} = L(\Delta M_i).   (6)
A system internal transformation containing the learning mechanism is represented as another function mapping a perceptual state onto a system response (Figure 2):
Q : \{S\} \to \{M\}, \qquad \Delta M_{i+1} = Q(S^c_{i+1}).   (7)
The objective of the algorithm is to find a function Q^g which generates an appropriate action moving the system towards the invariant goal state S^g. Formally we can consider the task as follows:
Figure 2. Transformations L and Q
1. Prove that a policy Q^g(S^c) exists such that the sequence S^c_{[i]} converges on the goal, S^c_{[i]} \to S^g, in a finite number of steps i for the given L.
2. Find a policy Q^g(S^c) such that the sequence S^c_{[i]} converges on the goal, S^c_{[i]} \to S^g, in a minimal number of steps i, for any initial state S^c_0 and the given L.

The default model for Q^g is based on repeating successful actions obtained by random trials. The first movement is purely random:

Q^g(S^c_0) \in \{M\}, \qquad \forall S^c_0 \in \{S\}.   (8)
If the trial is followed by a decrease of the invariant goal distance then, on the next step, the system performs the same action; if not, then Q^g(S^c_1) will be chosen randomly again. For any i:

Q^g(S^c_{i+1}) = \begin{cases} \in \{M\}, & D_i \ge D_{i-1} \\ Q^g(S^c_i), & D_i < D_{i-1} \end{cases}   (9)

D_i = D(S^c_i, S^g).   (10)
The process goes on until the distance D becomes small:

D_K = D(S^c_K, S^g) \le \varepsilon.   (11)
The sequence of the movements and the corresponding perceptual states converging on the goal provide the system with a set of samples that form a primitive mapping from perception to action. Our objective is to find a function which transforms any percept into the movement. The transformation must be optimal and invariant to the starting visual state. The process of finding such a policy has two stages. Firstly, the most significant visual states are detected by calculating their frequencies (or probabilities) within the sample sequence. The selected key states build up initial policy Qg . Secondly, Qg is being refined by setting up other experiments, obtaining new samples and updating the state probabilities in order to find the states which do not depend on the starting configuration.
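A minimal Python sketch of the goal-attaining episode described by equations (8)-(11) is given below. It assumes a simulator exposing a `step(action)` call and a distance function `distance(state, goal)`; these names, and the bound on episode length, are illustrative and not from the paper.

```python
import random

def attain_goal(step, distance, actions, goal, start_state, eps=1e-3, max_steps=500):
    """Goal attaining by repeating successful actions (eqs. 8-11):
    the first move is random; a move is repeated while the invariant
    goal distance keeps decreasing, otherwise a new random move is tried.
    Returns the sampled (state, action) pairs and the final state."""
    state = start_state
    action = random.choice(actions)                 # eq. (8): purely random start
    samples = []
    prev_d = distance(state, goal)
    for _ in range(max_steps):
        samples.append((state, action))
        state = step(action)                        # environment transformation L, eq. (6)
        d = distance(state, goal)                   # D_i = D(S^c_i, S^g), eq. (10)
        if d <= eps:                                # eq. (11): goal reached
            break
        if d >= prev_d:                             # eq. (9): no improvement, try another action
            action = random.choice(actions)
        prev_d = d
    return samples, state
```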
3.3. Acquiring a primitive model

Suppose that P_r is the probability of finding perceptual state S^c_r in the sequence S^c_{[i]}, i = 1 \ldots N, such that

P_r = \sum_{i=1}^{K} (S^c_r = S^c_i)/N.   (12)
The visual states having high values of P_r are treated as the significant ones and added to the model keystate list \{S^0_t\}:

P_r > \delta : \qquad S^0_t = S^c_r, \quad S^c_r \in \{S^c_i\}   (13)
where \delta is the threshold of "significance". The response of policy Q^g is defined by the sample Q^g(S^0_a) if there exists an index a which satisfies the following:

D(S^0_a, S^g) < D(S^c, S^g), \qquad a = \arg\min_{a=1 \ldots T} \{ D(S^c, S^0_a) \}.   (14)
If such an index is not found then the response is the action corresponding to the goal state:

Q^g(S^c) = \begin{cases} Q^g(S^0_a), & \exists a \\ Q^g(S^g), & \nexists a \end{cases}   (15)

Obviously, the state probabilities for the first run give the same value for any visual state in the sample sequence. All of them are taken as the keystates defining the initial model of perception-action transformation. This can already be considered an improvement, since the implementation of Q^g converges on the goal. During the following series of runs with different starting configurations the policy will be refined by updating the model keystate list.

3.4. Improving the model

Let us suppose that after the first trial the primitive model Q^g_0 has been sampled^1. We also have the current perceptual state S^c_0 which is the starting point of the next experiment. Since the system already has the model, even if it is a primitive one, the first movement is not taken randomly: it is the corresponding value of Q^g_0(S^c_0):

Q^g_1(S^c_0) = Q^g_0(S^c_0).   (16)
The rest of the procedure, which takes the sample Q^g_1(S^c_i), is the same as for the first experiment (see eq. 9) except that the previously obtained model is used instead of random trials:

(^1 New indexing for policy Q^g is introduced here in order to denote the current run.)
Q^g_1(S^c_{i+1}) = \begin{cases} Q^g_0(S^c_i), & D_i \ge D_{i-1} \\ Q^g_1(S^c_i), & D_i < D_{i-1} \end{cases}   (17)
The model update is done by recalculating the state probabilities for the current sample within the distribution P_r:

P_r = \frac{P_{r,0} + P_{r,1}}{2}   (18)
where P_{r,0} and P_{r,1} are the state probability distributions for the first and second experiments respectively. Without losing generality we can write the algorithm of model update for the nth run:

Q^g_n(S^c_0) = Q^g_{n-1}(S^c_0)   (19)

Q^g_n(S^c_{i+1}) = \begin{cases} Q^g_{n-1}(S^c_i), & D_i \ge D_{i-1} \\ Q^g_n(S^c_i), & D_i < D_{i-1} \end{cases}   (20)

P_r = \frac{P_{r,n-1} + P_{r,n}}{2}.   (21)
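The keystate model of equations (12)-(21) can be sketched in Python as follows. The class and method names are invented for illustration, and the fallback to random exploration when no keystate applies is an assumption consistent with the experimental description in Section 4.

```python
from collections import Counter

class KeystatePolicy:
    """Illustrative sketch of eqs. (12)-(21): keystates are perceptual
    states that occur frequently in successful episodes; their
    probabilities are averaged over runs, and the action recorded for
    the nearest keystate is replayed."""

    def __init__(self, significance=0.1):
        self.significance = significance      # threshold delta in eq. (13)
        self.prob = Counter()                 # running estimate of P_r
        self.action_for = {}                  # keystate -> sampled action

    def update(self, samples):
        """Fold one episode (list of (state, action) pairs) into the model:
        per-run frequencies as in eq. (12), averaging as in eqs. (18), (21)."""
        n = len(samples) or 1
        freq = Counter(state for state, _ in samples)
        run_prob = {s: c / n for s, c in freq.items()}
        for s in set(self.prob) | set(run_prob):
            self.prob[s] = (self.prob.get(s, 0.0) + run_prob.get(s, 0.0)) / 2
        for state, action in samples:
            self.action_for.setdefault(state, action)

    def keystates(self):
        """Eq. (13): states whose probability exceeds the threshold."""
        return [s for s, p in self.prob.items() if p > self.significance]

    def respond(self, state, distance_to):
        """Replay the action of the nearest keystate (cf. eqs. (14)-(15));
        callers fall back to random exploration when None is returned."""
        candidates = self.keystates()
        if not candidates:
            return None
        nearest = min(candidates, key=lambda s: distance_to(state, s))
        return self.action_for.get(nearest)
```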
4. Goal detection scenario for learning motor control

4.1. Visual representation and distance measuring

We carried out our experiments on software simulating the physical "world" as well as the system itself. The world is a 2D square box which has boundaries that restrict movements of the manipulator. The system motor domain has four DOFs defining the position of the arm: length R, angle \varphi, gripper angle \theta and gripper state \gamma:

M = (R, \varphi, \theta, \gamma)   (22)
The visual system performs attractor detection, visual attractor description, organizing the visual memory, and recognition. The mechanism of discovering attractors is based on motion detection and tracking. It eliminates static, unknown "background" objects from the processing. For instance, during the random exploration mode this mechanism takes into account only the manipulator if it is the only object moving on the scene. It can also be other objects moved by the robot arm or a user. The visual scene is represented by a graph, each vertex of which is an attractor feature vector with the following components: attractor id, positions in Cartesian coordinates x and y, attractor orientation \alpha and changes of position and orientation after the last action dx, dy, d\alpha (Figure 3):

S^c = \{f_j\} = \{(id, x, y, \alpha, dx, dy, d\alpha)_j\}.   (23)
The visual distance between two scenes is a normalized sum of distances among the
Figure 3. Motor and visual parameters
attractors:

D(S^c_\lambda, S^c_n) = \sum_{i,k=1}^{\Lambda,N} d(i,k) \,/\, (\Lambda \cdot N)   (24)
where Λ, N are the numbers of the attractors for scenes S cλ and S cn respectively. The attractor distance is calculated as a weighted sum of the distances between corresponding components of the feature vectors:
d(i,k) = \delta_{id} + \delta_C + \delta_A   (25)

\delta_{id} = \begin{cases} 0, & id_i = id_k \\ 1, & id_i \neq id_k \end{cases}   (26)

\delta_C = \sqrt{(x_i - x_k)^2 + (y_i - y_k)^2 + (dx_i - dx_k)^2 + (dy_i - dy_k)^2} \,/\, \sqrt{2X^2 + 2Y^2}   (27)

\delta_A = \sqrt{(\alpha_i - \alpha_k)^2 + (d\alpha_i - d\alpha_k)^2} \,/\, (2\sqrt{2}\,\pi)   (28)

where

f_i = (id_i, x_i, y_i, \alpha_i, dx_i, dy_i, d\alpha_i)   (29)

f_k = (id_k, x_k, y_k, \alpha_k, dx_k, dy_k, d\alpha_k),   (30)

and where X, Y are the horizontal and vertical sizes of the workspace. The elementary goal is calculated within a short-term visual memory. It stores the scene descriptors for up to 30 frames; each frame is taken after a movement has been detected on the scene. The detected goals are converted into the invariant representation, stored in a permanent memory and linked with the policies Q^g obtained after the consequent learning.
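A small Python sketch of the scene distance of equations (24)-(30) follows. The dictionary-based feature encoding is an assumption, and the square roots in (27)-(28) are reconstructed from the normalisation constants, so the sketch should be read as an illustration rather than the authors' implementation.

```python
import math

def attractor_distance(f_i, f_k, X, Y):
    """Weighted attractor distance d(i,k) of eqs. (25)-(28); each feature
    is a dict with keys id, x, y, alpha, dx, dy, dalpha, and X, Y are the
    workspace sizes used for normalisation."""
    delta_id = 0.0 if f_i["id"] == f_k["id"] else 1.0
    delta_c = math.sqrt((f_i["x"] - f_k["x"]) ** 2 + (f_i["y"] - f_k["y"]) ** 2 +
                        (f_i["dx"] - f_k["dx"]) ** 2 + (f_i["dy"] - f_k["dy"]) ** 2)
    delta_c /= math.sqrt(2 * X ** 2 + 2 * Y ** 2)
    delta_a = math.sqrt((f_i["alpha"] - f_k["alpha"]) ** 2 +
                        (f_i["dalpha"] - f_k["dalpha"]) ** 2) / (2 * math.sqrt(2) * math.pi)
    return delta_id + delta_c + delta_a

def scene_distance(scene_a, scene_b, X, Y):
    """Normalised sum over all attractor pairs, eq. (24)."""
    if not scene_a or not scene_b:
        return 0.0
    total = sum(attractor_distance(fi, fk, X, Y) for fi in scene_a for fk in scene_b)
    return total / (len(scene_a) * len(scene_b))
```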
Figure 4. Visual quantization of local movements
Figure 5. Local horizontal movements
4.2. Obtaining local motor control

Suppose that the system starts movement in the random mode. Only the manipulator is detected as a visual attractor on the scene. The random generator produces small changes of the motor parameters, so we consider the resulting movements as local ones. The precision of measuring the corresponding local changes on the visual scene is low; therefore, we are allowed to quantize visual movements. Let us define four directions within a local surrounding of the manipulator position (see Figure 4). Any local movement is perceived as one of the four quantum steps of the closest direction and of unit length. It is obvious that after a series of small random movements all the quantum steps will be detected as significant events (since the corresponding values of dx or dy are constant), and each of those movements q_r, r = 1, \ldots, 4, is considered as a parameter goal. Figure 5 shows an example of learning motor control to move the manipulator right (the quantum step 1). The samples are taken from various starting configurations and the trajectories demonstrate how well the model describes arm control on each stage, namely at the beginning, after 10 runs to the right boundary, and after 20 runs to the right boundary. On the global scale even the best trajectories do not strictly follow the horizontal line. This happens sometimes because Q^g does not return the appropriate response and the system switches to the random exploration mode in order to find the needed motor changes and update the current model. But most of the time local movements are correct, i.e. they belong to the chosen quantum, and the trajectories can be explained by errors of the local visual measurements.
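The quantization of a small observed displacement into one of the four quantum steps of Figure 4 can be sketched as follows; the tie-breaking rule is an arbitrary illustrative choice.

```python
def quantize_local_move(dx, dy):
    """Map a small observed displacement onto one of the four unit quantum
    steps (right, up, left, down); ties are broken towards the horizontal
    axis, which is an illustrative assumption."""
    if abs(dx) >= abs(dy):
        return (1, 0) if dx >= 0 else (-1, 0)
    return (0, 1) if dy >= 0 else (0, -1)

# Example: a noisy measurement (0.8, -0.3) is perceived as the unit step "right".
print(quantize_local_move(0.8, -0.3))   # (1, 0)
```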
Figure 6. Approaching an object from different starting positions
Using the simulator, the same method has been applied to learn control of local gripper rotation and grasping movements.

4.3. Learning global motor control

Let us add an object to the scene. Visually the system detects another attractor and the spatial relation between the arm and the object:

S^c = \{f_1, f_2, f_{1,2}\}   (31)

f_{1,2} = d(1, 2).   (32)
If the system frequently finds that its manipulator is at the same position as the object, then a new parameter goal is detected. The corresponding behavioural task is the intention to attain the object position with the manipulator. On this level of representation, instead of the parameters directly controlling different motors in the motor domain, the system operates within the set of previously acquired competencies, the policies performing local movements. Applying the learning mechanism (eqs. 19-21), the system obtains the model of approaching objects from any starting position. The example of goal attaining in the random mode, and after 10 and 20 training series, is demonstrated in Figure 6. Other visual goals detected on this level (and the corresponding learned models of the system response) are grasping an object and moving an object.
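The hierarchical reuse of competencies can be illustrated with the sketch below: the action set is no longer raw motor changes but the previously learned local-movement policies, and the goal distance is the arm-object relation f_{1,2} = d(1, 2). The callable interface and the stopping threshold are illustrative assumptions.

```python
import random

def approach_object(local_policies, arm_object_distance, max_steps=200, eps=1.0):
    """Sketch of the behaviour of Section 4.3: the same repeat-successful-action
    rule used for low-level learning (eq. 9) is applied one level up, with the
    learned local policies as actions."""
    actions = list(local_policies)
    move = random.choice(actions)
    prev_d = arm_object_distance()
    trace = []
    for _ in range(max_steps):
        local_policies[move]()            # execute the learned local policy
        trace.append(move)
        d = arm_object_distance()
        if d <= eps:                      # goal: manipulator at the object position
            break
        if d >= prev_d:                   # no improvement: try another local policy
            move = random.choice(actions)
        prev_d = d
    return trace
```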
5. Conclusions

We have presented a method of learning by unsupervised goal acquisition. The approach replaces the external supervisor giving rewards by an internal mechanism of reinforcement. To introduce a system goal, three classes of goal representation have been defined: parametric, invariant and contextual. Autonomous goal acquisition consists of calculating feature frequency histograms and detecting distinctive peaks. The algorithm of learning policies is a modified (goal-driven) MDP. Goal attaining is controlled by the dynamics of the visual goal distance: successful actions are those that reduce the distance to the
goal. The experimental section demonstrates the efficiency of the learning mechanisms for the task of obtaining arm control. The system starts from knowing "nothing" about visual manipulator guidance and bootstraps itself up to the level of attaining the goal position in the motor space. The transferable representation of learned policies is evidence for the open-ended character of the proposed architecture. Further research will be carried out in the direction of learning ways of arranging objects according to particular rules. One possible scenario is a shape-sorter game, where the task is to insert various blocks into holes of corresponding shapes. The system operating in such an environment should detect visual goals and build up the new levels of symbolic hierarchy implementing the game rules. This demonstration will prove the system's ability not only to create directly grounded competencies such as arm or object movement control but also to understand complex world events and generate high-level behaviour.
References

[1] L. Kaelbling, M. Littman, A. Moore, Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4 (1996), 237–285.
[2] V. Gullapalli, Reinforcement learning and its application to control, Ph.D. thesis, University of Massachusetts, Amherst, MA, 1992.
[3] R. Brooks, A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, 14 (1986), 23.
[4] J. Modayil and B. Kuipers, Towards bootstrap learning for object discovery. In AAAI-2004 Workshop on Anchoring Symbols to Sensor Data, 2004.
[5] G. Granlund, A Cognitive Vision Architecture Integrating Neural Networks with Symbolic Processing, Kunstliche Intelligenz, 2 (2005), 18–24.
[6] Y. Endo and R. Arkin, Anticipatory robot navigation by simultaneously localizing and building a cognitive map. Tech. rep., Mobile Robot Laboratory, Georgia Institute of Technology, 2003.
[7] B. Kuipers, The spatial semantic hierarchy. Artificial Intelligence, 119 (2000), 191–233.
[8] B. Kuipers, P. Beeson, J. Modayil, J. Provost, Bootstrap learning of foundational representations. To appear in Connection Science, 18.2 (2006), special issue on Developmental Robotics.
[9] M. Minsky, P. Singh and A. Sloman, The St. Thomas common sense symposium: designing architectures for human-level intelligence. AI Magazine, 25 (2004), 113–124.
Semantics of Alan

Francesco Pagliarecci
Università Politecnica delle Marche, Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione, Via Brecce Bianche - 60131 Ancona - ITALY
e-mail: [email protected]

Abstract. This paper presents a formal definition of Alan. Alan is a programming language that aims to integrate both agent-oriented and object-oriented programming. The aim is to take advantage of both paradigms. We define the formal specification of Alan in the rewriting logic language Maude. In this respect, this paper represents the first step towards a complete formal definition of the operational semantics of Alan. This opens up the possibility of using the wide spectrum of formal modeling and reasoning supported by Maude: analyzing Alan programs by means of model checking, proving properties of particular Alan programs, and proving general properties of the Alan language.

Keywords. Agent, Agent Oriented Software Engineering
1. Introduction

Agent-oriented programming is a programming paradigm that aims to build any kind of software system by means of defining the "mental attitudes" that such a system must exhibit [11]. There are several different approaches to agent-oriented programming; in this paper we refer to the so-called Belief-Desire-Intention (BDI) model [1,9]. According to this model, we have three mental attitudes: the beliefs, the desires, and the intentions. The beliefs represent what an agent knows about its internal state and the state of its environment (including other agents and their attitudes). The desires represent goals that an agent would wish to satisfy. The intentions represent desires that an agent has committed to achieve by means of appropriate plans. As a consequence, according to the BDI model, agent-oriented programming means determining what the beliefs, the desires, and the plans of a program are. On the other hand, object-oriented programming means the use of principles such as abstraction, encapsulation, inheritance, and polymorphism. Finally, we have some examples of integration of agent-oriented and object-oriented programming [4,6,8,10,12,14]. AUML [6] is not a real programming language, but an example of an agent-object oriented modeling language for software engineering. Wagner [14] focuses on database models, namely he integrates the agent-oriented paradigm with the object-relational data model. Romero-Hernandez and Koning [10] focus only on inheritance mechanisms for agent interaction protocols. Jack [4] and Jadex [8] are agent-oriented programming languages that allow programmers the use of Java for some parts of a program (a sort of procedural attachment). Alan [12,7] is a fully integrated agent-object pro-
gramming language. Indeed, beliefs, desires, and plans can be programmed in terms of abstraction, encapsulation, inheritance, and polymorphism. The previous examples prove that the integration of agent-oriented and object-oriented paradigms is a feasible and noteworthy approach. This paper aims to formally define the agent-object oriented paradigm. This goal is accomplished by defining the syntax and the operational semantics of Alan using the rewriting logic language Maude [5]. Maude is a high-performance reflective language and system supporting both equational and rewriting logic specification and programming for a wide range of applications. It can model almost anything, from the set of rational numbers to a biological system or a programming language (e.g., see [13]). Its semantics is based on the fundamentals of category theory. The reader can refer to [2,3] for a detailed presentation of Maude. Our definition in Maude fully captures the intended meaning of an Alan program as described in [12]. Furthermore, it opens up the possibility of using the wide spectrum of formal modeling and reasoning supported by Maude. Namely, we can analyze Alan programs by means of model checking, prove properties of particular Alan programs, and prove general properties of the Alan language. The paper is structured as follows. In Section 2, we provide the syntax of Alan. Section 3 formalizes the semantics of Alan. Finally, in Section 4 we give some conclusions.
2. Alan Syntax

Agent-object oriented programming means establishing what the interactions of the program with the other programs are, and what its beliefs, its desires, and its plans are. Beliefs, desires, and plans must be determined in terms of abstraction, encapsulating their attributes and methods, exploiting any inheritance relations among them, and exploiting polymorphism. In Maude, the notion that in Alan everything is a belief, a desire, or a plan can be formalized in two steps: first, we define the basic notions of belief, desire, and plan by means of the three Maude modules in Figure 1; after that, each specific belief, desire, or plan in a given Alan program must comply with the above definition.

Example. In order to explain the formal definition in Maude of an Alan program, in the rest of the paper we consider the example of an e-commerce application. Let us suppose we have to program an agent that must play the role of searching for a partner (another agent) that is able to provide a given web service. As a consequence, an Alan program is composed of a collection of beliefs, a collection of desires, and a collection of plans (see Figure 2).

2.1. Beliefs

The beliefs play a role similar to the role of the data structures in traditional programming languages. They are used to represent the knowledge of the system. According to our goal of combining the agent-oriented and the object-oriented paradigm, we model the set of beliefs using the object-oriented approach. This means that we encapsulate in each belief all the attributes that describe it and all the methods that we need to manipulate it. Concerning the attributes, we have to take into account the fact that in Alan we have
mod BELIEF is
  including CONFIGURATION .
  sort Belief .
  op belief : -> Cid [ctor] .
  ...
endm

mod ACTIVE-BELIEF is
  protecting BELIEF .
  protecting DESIRE .
  sorts Expression ActiveBelief .
  subsort ActiveBelief < Belief .
  ...
  op activeBelief : -> Cid [ctor] .
  op connDesire :_ : Desire -> Attribute [ctor] .
  op beliefCondition :_ : Expression -> Attribute [ctor] .
  ...
  op setDesire : Oid Desire -> Msg [ctor] .
  ...
endm

mod DESIRE is
  including CONFIGURATION .
  protecting PLAN .
  sorts Desire State .
  op Desire : -> Cid [ctor] .
  op Ready : -> State [ctor] .
  op Running : -> State [ctor] .
  op Wait : -> State [ctor] .
  op Succeeded : -> State [ctor] .
  op Failed : -> State [ctor] .
  ...
  op planList :_ : Plan -> Attribute [ctor] .
  op desireState :_ : State -> Attribute [ctor] .
  ...
  ops selected suspended resumed success failure : Oid -> Msg [ctor] .
  ...
  var D : Oid .
  ...
  rl [selected] :
    < D : Desire | desireState : ready >
    selected (D)
    =>
    < D : Desire | desireState : running >
  ...
endm

mod PLAN is
  including CONFIGURATION .
  ...
  sort Plan .
  op plan : -> Cid [ctor] .
  ...
  op precondition :_ : Void -> Bool .
  op planbody :_ : Void -> Void .
  ...
endm

Figure 1. The definition in Maude of Beliefs, Active Beliefs, Desires, and Plans
three kinds of structures: beliefs, desires, and plans. As a consequence, we have three kinds of attributes: belief attributes, desire attributes, and plan attributes. Each belief can be modified by any action (e.g., an Alan statement, or the reception of a message) that satisfies its encapsulation constraints. As in any other programming language, some beliefs are predefined; they are used to represent basic data types such as integers, strings, arrays, messages, protocols, and so on. Furthermore, in Alan, there exists a special kind of beliefs that are able to activate plans: the active beliefs. These beliefs are used to program the reactive behavior of a system. Each active belief has a method condition and is linked to a desire that is able to activate the needed plan. As a consequence, every time a new active belief is asserted (an event to which the system may react), the condition is evaluated; when the condition is true (the system must react), the corresponding desire is asserted as well, and finally the appropriate plan (i.e. the reaction) is executed. The condition is a Boolean function that must be provided by the programmer.

Example. Service, Agent, Acquaintances, and Request have been defined as Beliefs. Service has belief attributes that describe the kind of service and methods (e.g., perform) that are needed to provide the service. Any service the agent needs to represent must be an instance of this belief. Notice that, in our example, the program must access web
F. Pagliarecci / Semantics of Alan
public Belief Agent { Belief String : name; Belief String : address; Belief Service : service[]; public boolean searchService (Service: s) {for (int : i=0; i Attribute [ctor] . ... op perform : Configuration -> Void . ... endm mod WEB-SERVICE is protecting BELIEF DESIRE PLAN . including SERVICE . ... sort WebService . subsort WebService < Service op webService : -> Cid [ctor] . op url : String -> Attribute [ctor] . ... endm Figure 3. The definition in Maude of Service, WebService mod AGENT is protecting BELIEF DESIRE PLAN. ... sorts Agent Service . subsort Agent < Belief . subsort Service < Belief . op agent : -> Cid [ctor] . op name :_ : String -> Attribute [ctor] . op address :_ : String -> Attribute [ctor] . op service :_ : List{Service} -> Attribute [ctor] . op searchService : Service -> Bool . ... endm Figure 4. The definition in Maude of Agent
and then executed. The plan is selected among all the plans in such a way the execution of the selected plan can produce the satisfaction of the goal. Usually, the agent-oriented languages represent a desire as a logic formula. In Alan, coherently with the previous assumptions, a desire is an extension of the predefined class Desire. Each desire has an arbitrary number of methods, belief attributes, desire attributes, and plan attributes. Each desire has also a set of plans that must be tried in order to satisfy the desire itself. Therefore, each desire has the predefined plan attribute plan and the programmer must declare which type of plans can fill this attribute. A programmer can apply the object-oriented methodologies in designing the desires, as well. For example, we can define a desire as extension of another desire. We can also encapsulate in it attributes and methods that can be used by the plan to satisfy the intention, making the plan independent by certain pa-
F. Pagliarecci / Semantics of Alan
19
mod REQUEST is including ACTIVE-BELIEF . protecting DESIRE PLAN . sorts Request ProvideServiceTo Agent Service . subsort Request < ActiveBelief . subsorts Agent Service < Belief . subsort ProvideServiceTo < Desire . ... op request : -> Cid [ctor] . op connDesire :_ : ProvideServiceTo -> Attribute [ctor] . op agent :_ : Agent -> Attribute [ctor] . op service :_ : Service -> Attribute [ctor] . ... endm Figure 5. The definition in Maude of Request
rameters. The desires that have been requested to be satisfied become intentions. Another important attribute of a desire is the belief attribute state. When the desire is created, its state is ready to be examined by the interpreter. When the desire is selected as intention, its state becomes running. When the plan related to the desire is suspended (e.g., when the plan has required the satisfaction of a sub-desire), its state becomes wait. When the plan is resumed, it goes back to the running state. Finally, the desire state can be succeeded or failed depending on the final result of the last plan execution. The semantics of a desire coincides with the semantics of classes in traditional object-oriented programming languages, for what concerns encapsulation, inheritance, and polymorphism. Furthermore, the semantics of a desire takes into account the mechanism for a plan selection and execution (as formally described in Section 3). Example. The desire nameServicemod NAME-SERVICE-PROVIDER is including NAME-ACQUAINTANCES . ... sorts NameServiceProvider SearchProvider SearchOldProvider SearchNewProvider Service . subsort NameServiceProvider < NameAcquaintances . subsort SearchProvider < Plan . subsort Service < Belief . subsort SearchOldProvider SearchNewProvider < SearchProvider . op NameServiceProvider : -> Cid [ctor] . op planList :_ : SearchProvider -> Attribute [ctor] . op service :_ : Service -> Attribute [ctor] . ... endm Figure 6. The definition in Maude of NameServiceProvider
Provider (see Figure 2) represents the goal of finding programs able to provide a given service. Which kind of service is reported in the corresponding attribute. The attribute plan is of type searchOldProvider or searchNewProvider that are the plans that must be tried in order to satisfy this desire. They are tried in this order. This kind of desire can be specialized in the goal of finding a provider of a given web service (the desire nameWeb-
20
F. Pagliarecci / Semantics of Alan
ServiceProvider). This is simply obtained by extending the desire nameServiceProvider (see Figure 2). In this case, this desire inherits the attribute plan of its superdesire. 2.3. Plans In BDI programming languages, plans play a role similar to the role played by procedures and methods in traditional programming languages. Each plan is specifically designed in order to satisfy a desire, a goal. As previously mentioned, to each desire is linked a set of possible plans that must be tried to satisfy the goal. In Alan, according to our goal of combining agent and object oriented paradigm, each plan has an arbitrary number of methods, belief attributes, desire attributes, and plan attributes. Among them, the predefined methods that must be defined by the programmer. Precondition. It is a Boolean method that checks whether the plan can be executed. Notice that, in traditional agent-oriented programming languages, preconditions are logic formulas that must be true. In Alan, a precondition is a piece of code that must be executed and return a boolean value. In other words, we adopt a procedural approach instead of a logic-based approach. Planbody. It is a procedure (i.e., a method) that contains a set of actions to be executed in order to accomplish the intention (to satisfy a desire or react to an event). Using Java-like statements for precondition and planbody allows the programmer to exploit the object-oriented principles. When a plan is selected and instantiated, the precondition is evaluated. When the precondition returns the value true, the plan is executed. A planbody consists of statements as in usual Java class methods. Therefore, a programmer has to activate appropriate methods in order to modify beliefs, send or receive messages, and generate internal events. As a consequence, the execution of a plan is something similar to the activation of a class method. Nevertheless, its semantics has some differences. For each intention, we can have several plans that can be executed. The choice depends on two facts: in which order they are listed in the attribute plan, and what are the values returned by the preconditions. Example. In our example, we have the plan Serve for providing a service. This plan is independent by the actual procedure to provide a given service. Moreover, we have the two plans connected to the desire nameServiceProvider: searchOldProvider and searchNewProvider. Notice that, they are altermod SEARCH-OLD-PROVIDER is protecting BELIEF DESIRE . including PLAN . ... sorts SearchOldProvider Service Acquaintances . subsort SearchOldProvider < Plan . subsorts Service Acquaintances < Belief . op searchOldProvider : -> Cid [ctor] . op acq :_ : Acquaintances -> Attribute [ctor] . op service :_ : Service -> Attribute [ctor] . ... endm Figure 7. The definition in Maude of SearchOldProvider
native plans from the point of view of the desire. Nevertheless, the two plans are very
F. Pagliarecci / Semantics of Alan
21
similar, this justifies the definition of the second plan as an extension of the first one. This has also effect in the definition of the corresponding planbody. Indeed, as reported in Figure 2, the planbody of the second plan is obtained by extending the planbody of the first one. Notice how the second plan instantiates a new desire (with the statement Message.internalEvent) and waits for its satisfaction before completing its execution. Both plans have no preconditions (see Figure 2).
3. The Alan Semantics To specify the semantics of Alan, we first explain what is the global state of a program. We then discuss the transition rules for the main Alan statements.
sort State MsgList . sorts IntentionBase BeliefBase DesireBase PlanBase . subsort Msg < MsgList . subsorts IntentionBase BeliefBase DesireBase PlanBase < Configuration . ... op intentionBase : Desire PlanList -> IntentionBase [ctor] . op nilBList : Belief -> BeliefBase [ctor] . op beliefBase : BeliefBase BaliefBase -> BeliefBase [ctor assoc comm id: nilBList] . ... op state : BeliefBase DesireBase IntentionBase PlanBase MsgList -> State [ctor] . ...
Figure 8. The sort and costructor declarations
The global state of a program is a configuration modeled as multiset (the four sorts BeliefBase, DesireBase, IntentionBase, and PlanBase) whose elements are beliefs (the sort Belief), desires (the sort Desire), intentions (the sort Intention), plans (the sort Plan), and messages (i.e., methods activations represented by the sort MsgList). The sort and constructor declarations are reported in Figure 8. Each belief, desire, or plan must be modeled as an object that has its own attributes and method. This can be easily fulfilled thanks to the expressivness of Maude. Indeed, we can model a belief, a desire, or a plan as a module (see Figure 1) that encapsulates all its characteristics. Each module can be included in another one (e.g., see Belief that is included in the definition of ActiveBelief). This allows us to model hierarchies and instantiations. The inclusion of a module in another one automatically models the inheritance of all the entities defined in the included module. The main statements of an Alan program aim at the management (adding, removing) of Alan elements (beliefs, desires, intentions, and plans); activations and selections of desires and plans; activations of methods. All these statements are modeled by means of messages (see Figure 9) and transition rules (see Figure 10). Messages are used to model the activation of a given statement while transition rules are used to model the behavior of such a statement. Indeed, as obvious, each statement changes the global state of the program. Each message has an appropriate constructor
22 op op op op op op op
F. Pagliarecci / Semantics of Alan newBelief : Belief -> Msg [ctor] . delBelief : Belief -> Msg [ctor] . newActBelief : ActiveBelief -> Msg [ctor] . newDesire : Desire -> Msg [ctor] . startDesire : Desire -> Msg [ctor] . newIntention : Intention -> Msg [ctor] . selectPlan : Plan PlanList -> Msg [ctor] . Figure 9. The message declarations
and is used to select the appropriate transition rule. Notice that, we may have conditional rules. In a rule, the computation of the new state can be performed by means of the support of appropriatre equations. More in detail, let us consider the example of adding a new belief (Figure 10). This activation of this statement is modeled by the message newBelief. The rule newBelief models the state transition associated to this statement. Therefore, this rule models a transition from a state where the belief base is BB and the message list contains the message newBelief. The arrival state is a state where the belief base is the result of the equation addBelief(B, BB) (i.e., the belief B has been added to BB) and the message newBelief has been removed from the message list. Removing a belief, adding or removing a desire, an intention, or a plan are modeled in a similar way. From an intuitive point of view, the selection of a desire produces the instantiation of a new intention. Each intention has a set of plans. These plans are ordered in a queue according to an order established by the programmer. They are tried one by one until a plan able to satisfy the intention is found. Therefore, the intention is modeled by a pair (D, PL), where D is the selected desire and PL is the plan list. This behavior is described by the rules selectDesire and selectPlan. selectDesire is a conditional rule (Figure 10) that models a transition from a state where the message list contains the message selectDesire to a state where it is requested a new intention (the message newIntention) and the change of the status of the desire (the message start(D)). The message start(D) activates a rule (see Figure 11) that describes a transition of the desire D from a state where desireState is ready to a state where desireState is running. selectPlan is a rule (Figure 10) that models a transition from a state that contains the message selectPlan to a state that contains the message start(P) for the plan P. The message start(P) activates a conditional rule (see Figure 12) that describes a transition of the plan P from a state where planState is Ready to a state that contains the message planbody and where planState is Running. This rule can be applied only if the precondition is true. The message planbody a set of rules that models the behavior of the plan, in other words the set of statements that must be executed in order to satisfy the intention. The activation of any kind of (belief, desire, or plan) method is modeled in a similar way. Once we have defined the semantics of an Alan program, as explained above, we can verify some properties of the program. For instance, we can verify whether given an intention is optional (inevitable) its satisfaction. This can easily verified checking whether from a state where a new intention is asserted, the system can reach a state where desireState of the given intention is succeeded (the system can not reach a state where stateDesire is failed). Another property we can verify consists of
F. Pagliarecci / Semantics of Alan
23
rl [newBelief] : (BB, DD, II, PP, (newBelief (B) ML)) => (addBelief (B, BB), DD, II, PP, ML) .
rl [delBelief] : (BB, DD, II, PP, (delBelief (B) ML)) => (remBelief (B, BB), DD, II, PP, ML) .
crl [newActBelief] : (BB, DD, II, PP, (newActBelief (AB) ML)) => (addBelief (AB, BB), DD, II, PP, (ML newDesire(D))) if < A : ActiveB | connDesire : D, ATTS > := AB .
rl [newDesire] : (BB, DD, II, PP, (newDesire (D) ML)) => (BB, addDesire(D, DD), II, PP, ML) .
crl [selectDesire] : (BB, DD, II, PP, (selectDesire (D) ML)) => (BB, DD, II, PP, (newIntention (I) selected (D) ML)) if < A : Desire | desireState : ready, planList : PL > := D /\ (D, PL) := I .
rl [newIntention] : (BB, DD, II, PP, (newIntention (I) ML)) => (BB, DD, addIntention (I, II), PP, ML) .
rl [selectPlan] : (BB, DD, II, PP (D, PL), (selectPlan (P, PL) ML)) => (BB, DD, II, PP (D, PL), (start (P) ML)) .
Figure 10. The rewriting rules of the interpreter
rl [start] : < D : Desire | desireState : ready, ATTS > start (D) => < D : Desire | desireState : running, ATTS > .
Figure 11. The semantics of a desire
verifying the termination of a plan. In order to do that, we have to verify whether the system reaches a state where planState of the given plan is succeeded or failed.
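As a hedged illustration of the kind of check meant here, the following Python sketch performs a breadth-first reachability test over an abstract transition relation; the state representation and the helper names reachable, successors and goal are ours, not part of the Alan or Maude formalization (in Maude itself this role can be played by the search command or the LTL model checker).

from collections import deque

def reachable(initial, successors, goal):
    # Breadth-first exploration of the states reachable from 'initial'.
    # 'successors(state)' yields the states reachable in one rewriting step;
    # 'goal(state)' tests the property of interest, e.g. whether desireState
    # of a given intention is succeeded, or planState is succeeded or failed.
    seen, frontier = {initial}, deque([initial])
    while frontier:
        state = frontier.popleft()
        if goal(state):
            return True          # some reachable state satisfies the property
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False                 # no reachable state satisfies the property

Such a check establishes the "optional" (possible) reading of satisfaction; the "inevitable" reading would additionally require inspecting every execution path.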
crl [start] : < P : plan | planState : Ready, ATTS > start (P) => < P : plan | planState : Running, ATTS > planbody (P) if precondition = true .
rl [planbody] : < P : plan | planState : Running, ATTS > planbody (P) => < P : plan | planState : Running, ATTS' > ML .
Figure 12. The semantics of a plan
4. Conclusions This paper presents the semantics of agent-object programming, an emerging programming paradigm. In order to do that, we have modeled Alan, a language that fully integrates the agent-oriented and the object-oriented paradigms. The semantics has been formalized in Maude, a rewriting logic language. This paper presented preliminary results on defining an agent-object oriented language. This language allows us to establish relations (such as associations, generalizations, and so on) among beliefs, desires, and plans that cannot be represented in a simple and natural way with logic-based languages. Notice that, according to a recent but well consolidated direction in agent-oriented software engineering [6], it is easier to design a software system by combining traditional object-oriented methodologies and tools (such as UML) with agent-oriented methodologies. The two paradigms are amalgamated at the same level. Beliefs and Desires are objects and plan preconditions are procedural methods. For instance, this allows a programmer to define a plan that is independent of the objects it has to manipulate. Indeed, such methods can be encapsulated in the beliefs that represent such objects. For example, if a plan has to manipulate geometrical solids (e.g., to compute the volume), it can be independent of the shape of such objects (the method to compute the volume of a given solid is encapsulated in the belief representing that solid). On the other hand, this language is not a simple extension of an object-oriented programming language such as Java. An object-oriented program is a set of classes, and at start-up the interpreter executes the main program; classes interact with each other by means of internal or remote method invocation. In our case, a program is a set of beliefs, desires, and plans, and at start-up the interpreter starts to listen for messages; the interaction is supported by sending and receiving (internal and external) messages. Our goal is to provide a sound and complete formal definition of an agent-object programming language based on rewriting logic. In order to do that, we use Maude, a language based on rewriting logic. In this paper, we have provided a formal definition of the notions of beliefs, desires, plans, and the interpreter. This is enough to prove some properties about the behaviour of a program written with this language. Nevertheless, this is only a first step towards a full definition. Indeed, as future work, we plan to represent a complete example in Maude as well.
Acknowledgment The author would like to thank C. L. Talcott and M.-O. Stehr for many fruitful discussions about Maude and the formalization of Alan in Maude.
References [1] M. E. Bratman. Intentions, Plans and Practical Reason. Harvard University Press, Cambridge, Mass., 1987. [2] M. Clavel, F. Durán, S. Eker, P. Lincoln, N. Marti-Oliet, J. Meseguer, and C. Talcott. Maude 2.0 Manual, 2003. http://maude.cs.uiuc.edu. [3] M. Clavel, F. Durán, S. Eker, P. Lincoln, N. Martí-Oliet, J. Meseguer, and C. L. Talcott. The Maude 2.0 system. In Robert Nieuwenhuis, editor, Rewriting Techniques and Applications (RTA 2003), volume 2706, pages 76–87. Springer-Verlag, 2003. [4] N. Howden, R. Rönnquist, A. Hodgson, and A. Lucas. Jack™ – summary of an agent infrastructure. In 5th International Conference on Autonomous Agents, Montreal, Canada, 2001. [5] José Meseguer. A logical theory of concurrent objects and its realization in the Maude language. In Gul Agha, Peter Wegner, and Akinori Yonezawa, editors, Research Directions in Concurrent Object-Oriented Programming, pages 314–390. MIT Press, 1993. [6] J. Odell, H. Van Dyke Parunak, and B. Bauer. Extending UML for Agents. In AOIS Workshop at AAAI 2000, 2000. [7] F. Pagliarecci, L. Spalazzi, and G. Capuzzi. Formal Definition of an Agent-Object Programming Language. In Proc. of the International Symposium on Collaborative Technologies and Systems (CTS06). IEEE Computer Society Press, 2006. [8] A. Pokahr, L. Braubach, and W. Lamersdorf. Jadex: Implementing a BDI-infrastructure for JADE agents. EXP - In Search of Innovation (Special Issue on JADE), 3(3):76–85, September 2003. [9] A. S. Rao and M. P. Georgeff. Modeling rational agents within a BDI architecture. In J. Allen, R. Fikes, and E. Sandewall, editors, Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann Publishers, 1991. [10] I. Romero-Hernandez and J.-L. Koning. State controlled execution for agent-object hybrid languages. In Advanced Distributed Systems: Third International School and Symposium (ISSADS), volume 3061 of Lecture Notes in Computer Science, pages 78–90. Springer, 2004. [11] Y. Shoham. Agent-oriented programming. Artificial Intelligence, 60(1):51–92, 1993. [12] L. Spalazzi and F. Pagliarecci. Alan: An agent-object programming language. In Proceedings of the IADIS International Conference WWW/Internet 2005, 2005. [13] M.-O. Stehr and C. Talcott. Plan in Maude: Specifying an Active Network Programming Language. Electronic Notes in Theoretical Computer Science, 71, 2002. [14] G. Wagner. The agent-object-relationship metamodel: Towards a unified view of state and behavior. Information Systems, 28(5), 2003.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
A Compact Argumentation System for Agent System Specification Insu Song and Guido Governatori School of Information Technology & Electrical Engineering The University of Queensland, Brisbane, QLD, 4072, Australia e-mail: {insu,guido}@itee.uq.edu.au
Abstract. We present a non-monotonic logic tailored for specifying compact autonomous agent systems. The language is a consistent instantiation of a logic-based argumentation system extended with Brooks' subsumption concept and varying degree of belief. Particularly, we present a practical implementation of the language by developing a meta-encoding method that translates logical specifications into compact general logic programs. The language allows n-ary predicate literals with the usual first-order term definitions. We show that the space complexity of the resulting general logic program is linear in the size of the original theory. Keywords. Argumentation, Automated Reasoning, Agent
1. Introduction Over the past years, we have witnessed massive production of small electronic consumer devices such as cell phones, set-top boxes, home network devices, and MP3 players. The size of the devices gets smaller and the behaviors of the devices get more and more complex. In order to survive in the current competitive market, vendors now must scramble to offer a greater variety of innovative products faster than ever before, because most modern consumer electronic devices have comparatively low run rates and/or short market windows [7]. One solution for reducing the development cost and time is developing a more expressive and intuitive specification language for describing the behaviors of products. But we must make the resulting systems compact and efficient to meet the current market demands: smaller systems and longer battery lifespan. One promising candidate is a nonmonotonic logic-based agent architecture, because nonmonotonic logics are close to our natural languages and many agent models are suitable for specifying complex autonomous behaviors. In short, our aim is to develop a more expressive logic-based language than existing industrial specification languages, such as Ladder Logic in PLCs (Programmable Logic Controllers), while maintaining its simplicity, robustness (when implemented), and implementability on low-profile devices. However, existing logical approaches [1,21,10,4,19,8,6,3] are not suitable for this purpose, because they suffer from the following major shortcomings when embedded in small, low-powered devices: (a) they have difficulties in expressing behaviors, (b) they
require high computing power, and (c) they are not suitable for mission critical applications, as they require sophisticated theorem provers running on a high powered CPU. We solve these problems (1) by devising a nonmonotonic logic that can be mapped into a computationally compact structure that is suitable for hardware implementation; (2) by allowing expressions of relative certainties in knowledge bases; (3) by allowing decomposition of systems into parallel interacting behaviors, similarly to the subsumption architecture [5]. In particular, we develop a layered argumentation system called LAS that extends a logic based proposal of argumentation with the subsumption concept and varying degree of confidence. The reasoning mechanism of each layer is the argumentation system, and more confident layers are subsumed by less confident layers. Moreover, we present a practical implementation of LAS by developing a meta-encoding method that translates LAS into a general logic program. Unlike other existing implementations of argumentation systems, the language allows n-ary predicate literals with the usual first-order term definitions, including function expressions. Importantly, we show that the size of the resulting general logic program is linear in the size of the original theory. Similar meta-encoding schemas can be developed for other variants of logic based argumentation systems under our framework. Thus, we believe the framework will also provide a platform for extending and developing practical implementations of logic based argumentation systems. In this paper we detail the logical language (LAS), the mapping of the language into a general logic program, and the benefits it offers in comparison with other approaches. The paper is structured as follows. In the next section we discuss a behavior based decomposition and its relation with LAS. In Section 3, we define the Layered Argumentation System (LAS) and the formal semantics of the underlying language. After that, in Section 5, we present a meta-encoding schema which transforms first-order knowledge bases of LAS into general logic programs. Then, in Section 6 we compare LAS with other logic based approaches and conclude with some remarks in Section 7.
2. Behavior Based Decomposition In [5], Rodney Brooks introduced the subsumption architecture, which decomposes a system into several layers of (possibly prioritized) parallel behaviors of increasing levels of competence rather than following the standard functional decomposition. The basic idea is that it is easier to model a complex behavior by gradually implementing it, from less competent sub-behaviors to more competent sub-behaviors. In addition, the relation between layers is that more competent layers subsume less competent layers. However, the original subsumption architecture does not scale well for non-physical systems [1]. To overcome this limitation, we need to develop a logic based subsumption architecture (e.g., [1]). To do this, we need to introduce a varying degree of confidence and a subsumption architecture into a logical system. For instance, let us consider a cleaning robot. A typical functional decomposition of this system might resemble the sequence: sensors→perception→modelling→planning→task selection→motor control The decomposition of the same system in terms of behaviors would yield the following set of parallel behaviors:
avoid objects < avoid water < clean < wander < map area, where < denotes increasing levels of competence. However, less competent layers are usually given higher priorities (i.e., given more confidence) than more competent layers. That is, decisions made by less competent behaviors usually override decisions made by more competent behaviors (more task specific behaviors). The reason is that less competent behaviors are usually more reactive and urgent behaviors. However, this is not always the case, because a strong will can suppress a reactive behavior. Therefore, since the layers of LAS represent levels of confidence instead of competence, the layers of the subsumption architecture [5] do not exactly correspond to the layers of LAS, as shown below.
[Figure: correspondence between behavior layers (5, 4, 3, 2, 1) and LAS layers (1, 2, 3, 4, 5).]
In this figure, the bottom layer is the most confident layer of LAS, whereas the right most layer is the most competent behavior layer. It shows that less competent behaviors tend to be related with more confident layers, whereas more competent behaviors tend to spread over several layers of LAS. As we will see in the following sections, the subsumption concept of [5] is used in decomposing concepts as well, because a less confident knowledge base subsumes more confident knowledge bases by rule subsumption in LAS. Example 1 Let us consider an example specification for the part that controls the vacuum cleaning unit of the cleaning robot. Suppose that the robot performs the vacuuming action (v) if it detects (on sensor sA) that the area is dirty (d), but it stops the action if it detects some water (w) on sensor sB. That is, we have two parallel behaviors: avoiding water and cleaning. However, avoiding water has higher priority than cleaning. This specification can be represented as a set of defeasible rules decomposed into two levels of rules as follows:
R1 = {r1 : sA → d, r2 : sB → w, r3 : w → ¬c}
R2 = {r4 : d → c, r5 : c → v}
The arrows represent defeasible inferences. For instance, sA → d is read as 'if sA is true, then it is usually dirty'. The levels represent relative confidences between levels, such that level-n conclusions are more confident than level-(n + 1) conclusions. Then, if an area is both dirty and wet, the vacuuming unit will be turned off: v is not true. The reason is that since w is a level-1 conclusion by r2, ¬c is a level-1 conclusion by r3. As level-1 conclusions are more confident than level-2 conclusions, ¬c is also a level-2 conclusion, overriding c in R2. Thus, we cannot conclude v.
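Purely as an illustration of the data involved (the representation below is ours and is not part of the paper's formal machinery), the two rule sets of Example 1 can be written down as follows; an executable reading of such layered theories is sketched in Section 4.

# Each rule is (body, head); "~c" stands for the complement of c.
# The position in 'layers' encodes relative confidence: level-1 rules
# are more confident than level-2 rules.
R1 = [(("sA",), "d"),    # r1: sA -> d
      (("sB",), "w"),    # r2: sB -> w
      (("w",),  "~c")]   # r3: w -> ~c
R2 = [(("d",), "c"),     # r4: d -> c
      (("c",), "v")]     # r5: c -> v
layers = [R1, R2]        # layer-2 reuses R1 as well (rule subsumption)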
3. Layered Argumentation System As the underlying logical language, we start with essentially propositional inference rules r : L → l, where r is a unique label, L is a finite set of literals, and l is a literal. If l is a literal, ∼l is its complement. From now on, let level-n denote the degree of confidence of layer-n. An LAS theory is a structure T = (R, N) where R = {R1 , ..., Rn , ..., RN } is a set of finite sets of rules, Rn is the set of level-n rules, and N is the number of layers. We now define the layers of an LAS theory and their conclusions. Definition 1 Let T = (R, N) be an LAS theory. Let Cn be a finite set of literals denoting the set of layer-n conclusions of T . Let n be a positive integer over the range [1, N]. The layers of T are defined inductively as follows: 1. T0 = (∅, ∅); 2. T1 = (∅, R^1) is the layer-1 theory, where R^1 = R1 ; 3. Tn = (Cn−1 , R^n) is the layer-n theory, where R^n = R1 ∪ ... ∪ Rn . We should note that by this definition layer-n subsumes (includes) the layer-(n − 1) rules. That is, unlike other layered or hierarchical approaches (e.g., [1,21,14]), lower layer rules (more confident rules) are reused in higher layers. This feature is important because facts and rules with different levels of belief can interact, similarly to Possibilistic Logic approaches [8,6] and Fuzzy Logic approaches. For example, the confidence of the resulting argument formed by a set A of rules is the same as the confidence of the rule having the minimum confidence in A. We will see an example of this (Example 2) after we define the semantics of LAS. We will also define and discuss a set of operators corresponding to the Fuzzy Logic operators in Section 6.2. As for the semantics of the language, we modify the argumentation framework given in [11] to introduce layers into the argumentation system. Argumentation systems usually contain the following basic elements: an underlying logical language, and the definitions of: argument, conflict between arguments, and the status of arguments. As usual, arguments are defined to be proof trees. An argument for a literal p based on a set of rules R is a (possibly infinite) tree with nodes labelled by literals such that the root is labelled by p and for every node with label h: 1. If b1 ,...,bn label the children of h then there is a rule in R with body b1 ,...,bn and head h. 2. The arcs in a proof tree are labelled by the rules used to obtain them. Given a layer Tn of an LAS theory T , the set of layer-n arguments is denoted by ArgsTn, which also denotes the set of arguments that can be generated from R^n. The degree of confidence of an argument in ArgsTn is level-n. Thus, a layer-n argument of T is more confident than the layer-(n + 1) arguments of T . We define ArgsT0 to be the empty set. A literal labelling a node of an argument A is called a conclusion of A. However, when we refer to the conclusion of an argument, we refer to the literal labelling the root of the argument. We now introduce a set of usual notions for argumentation systems. An argument A attacks an argument B if the conclusion of A is the complement of a conclusion of B and the confidence of B is equal to or less than that of A. A set S of arguments attacks an argument
B if there is an argument A in S that attacks B. An argument A is supported by a set of arguments S if every proper subargument of A is in S. An argument A is undercut by a set of arguments S if S supports an argument B attacking a proper subargument of A. Example 2 Consider the Example 1 theory with the following assumptions (arguments) added: R1 = {→ sB} and R2 = {→ sA}. Now we consider the arguments below:
[Figure: three proof trees. Layer 1: argument A = (sB ⇒ w ⇒ ∼c). Layer 2: argument B = (sA ⇒ d) and argument C = (sA ⇒ d ⇒ c ⇒ v), with B a sub-argument of C.]
A is a layer-1 argument for ∼c, and thus it is also a layer-2 argument, because layer-1 arguments are more confident than layer-2 arguments. B is a layer-2 argument for d and a sub-argument of C. We should note that we have B because the level-1 rule r1 is subsumed by layer-2. That is, the level-2 evidence sA and the level-1 rule r1 are combined to produce a level-2 argument. In addition, since there is no level-1 evidence for sA, there is no layer-1 argument for d. This cannot be represented in existing layered approaches such as [14]. Unlike Fuzzy Logic approaches [8,6], the inference process is very simple and more intuitive, since LAS requires neither number crunching nor subjective measures of confidence. The heart of an argumentation semantics is the notion of an acceptable argument. Based on this concept it is possible to define justified arguments and justified conclusions, conclusions that may be drawn even taking conflicts into account. Given an argument A and a set S of arguments (to be thought of as arguments that have already been demonstrated to be justified), we assume the existence of the concept: A is acceptable w.r.t. S. Based on this concept we proceed to define justified arguments and justified literals. Definition 2 Let T = (R, N) be an LAS theory. We define JiTn as follows:
• JArgsT0 = ∅;
• J0Tn = JArgsTn−1;
• Ji+1Tn = {a ∈ ArgsTn | a is acceptable w.r.t. JiTn};
• JArgsTn = ∪∞i=1 JiTn is the set of justified layer-n arguments.
We can now give the definition of Cn as the set of conclusions of the arguments in JArgsTn. A literal p is level-n justified (denoted as Tn ⊨l p) if it is the conclusion of an argument in JArgsTn. In Example 2, the literal ∼c is both level-1 and level-2 justified, but the literal v is not justified because argument C is undercut by A. We now give two definitions of acceptable from [11]. The following is an argumentation semantics that corresponds to Dung's skeptical semantics (called grounded semantics) [9,10], which has been widely used to characterize several defeasible reasoning systems [10,4,19]. Definition 3 An argument A for p is acceptable w.r.t. a set of arguments S if A is finite, and every argument attacking A is attacked by S.
The following definition is a modified notion of acceptable, in order to capture defeasible provability in Defeasible Logic (DL) [17] with ambiguity blocking [11]. Definition 4 An argument A for p is acceptable w.r.t. a set of arguments S if A is finite, and every argument attacking A is undercut by S. In this paper, we use this ambiguity blocking argumentation semantics as the semantics of LAS. However, the grounded semantics can also easily be adopted for LAS.
4. An Implementation of LAS We obtain the meta-program representation of an LAS theory from the meta-program formalization of Defeasible Logic (DL) given in [16] by removing defeaters, strict rules, and priority relations, and by converting the relationship between strict rules and defeasible rules in DL into the relationship between layer-(n − 1) and layer-n in LAS. Thus, it is also an ambiguity blocking Dung-like argumentation system [11]. The details of the meta-program representation of an LAS theory T = (R, N) and its layers are now described. First, we obtain the conclusion-meta-program ΠC (T ) consisting of the following clauses for each layer-n, where 1 ≤ n ≤ N:
C1. conclusionn(x) :- conclusionn−1(x).
C2. conclusionn(x) :- supportedn(x), not supportedn(∼x), not conclusionn−1(∼x).
where not denotes negation as failure and ∼x maps a literal x to its complement. Next, we obtain the rule-meta-program ΠR (T ) consisting of the following clause for each rule (q1 , ..., qm → p) ∈ R^n of each layer-n:
R1. supportedn(p) :- conclusionn(q1),...,conclusionn(qm).
Then, the corresponding general logic program of T is Π(T ) = ΠC (T ) ∪ ΠR (T ). In [11], it is shown that the following theorem holds for the ambiguity blocking argumentation system. Theorem 1 Let D be a defeasible theory and p be a literal. Then, D ⊢ +∂ p iff p is justified, where +∂ p means that p is defeasibly provable. From this theorem and the correctness of the meta-program of Defeasible Logic [16], we obtain the following theorem. Let ⊨κ denote logical consequence under Kunen's semantics of logic programs [15]. Theorem 2 Let Tn be layer-n of an LAS theory T and let Dn be the meta-program counterpart of Tn . 1. Dn ⊨κ conclusion(p) iff p is level-n justified (i.e., Tn ⊨l p). 2. Dn ⊨κ ¬conclusion(p) iff p is not level-n justified (i.e., Tn ⊭l p).
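To make the intended reading of C1, C2 and R1 concrete, here is a small Python sketch of ours (not part of the original formalization) that computes the layered conclusions of the Example 1 theory, taking the observations sA and sB as assumed level-1 facts. It uses a naive fixpoint that is adequate for this small example; a faithful implementation would compute the Kunen (or well-founded) semantics of the meta-program.

def neg(l):
    # complement of a literal: "p" <-> "~p"
    return l[1:] if l.startswith("~") else "~" + l

def las_conclusions(layers):
    # layers: one list of rules per layer; a rule is (body, head) with body a
    # tuple of literals. Returns the conclusion sets C1, ..., CN.
    conclusions, prev, rules = [], set(), []
    for layer in layers:
        rules = rules + layer            # R^n = R1 U ... U Rn (rule subsumption)
        conc = set(prev)
        while True:
            # R1: supported_n(p) if some rule for p has its whole body concluded
            supported = {h for (body, h) in rules
                         if all(b in conc for b in body)}
            # C1 and C2: keep lower-layer conclusions, add unopposed supported literals
            new_conc = prev | {x for x in supported
                               if neg(x) not in supported and neg(x) not in prev}
            if new_conc == conc:
                break
            conc = new_conc
        prev = conc
        conclusions.append(conc)
    return conclusions

# Example 1, with sA and sB assumed as level-1 observations:
R1 = [((), "sA"), ((), "sB"),
      (("sA",), "d"), (("sB",), "w"), (("w",), "~c")]
R2 = [(("d",), "c"), (("c",), "v")]
C1, C2 = las_conclusions([R1, R2])
print(sorted(C1))   # ['d', 'sA', 'sB', 'w', '~c']
print("v" in C2)    # False: c is blocked by ~c, so v is never supported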
The following theorem is a direct consequence of rule C1. Theorem 3 (Conclusion subsumption) Let Tn be layer-n of an LAS theory T and Tn+1 be layer-(n+1) of T . Let Cn be the set of conclusions of Tn and Cn+1 be the set of conclusions of Tn+1 . Then Cn ⊆ Cn+1 . The following layer consistency follows from [16] (correctness of the meta-program of defeasible logic) and the consistency of defeasible logic. Theorem 4 (Layer consistency) Let Tn be layer-n of an LAS theory T . Then, for each literal p, if Tn ⊨l p then Tn ⊭l ∼p. Then, from Theorem 3 and Theorem 4, we can show that T is consistent. Theorem 5 (LAS consistency) Let T be an LAS theory. Let T ⊨n q denote that Tn ⊨l q. Then, for all 1 ≤ n ≤ N, if T ⊨n q then T ⊭m (∼q) for all 1 ≤ m ≤ N. As Π(T ) is simply the union of the meta-program counterparts Dn of the layers of T , and it is formed through a level mapping of each Dn into a hierarchical program such that no literals appearing in the layer-n program appear in layer-(n − 1) rules, the following theorem holds. Theorem 6 Let T = (R, N) be an LAS theory and Π(T ) be its meta-program counterpart. Then, the following holds: T ⊨n q iff Π(T ) ⊨κ conclusionn(q); T ⊭n q iff Π(T ) ⊨κ ¬conclusionn(q). 5. Meta-Encoding We now formally define the meta-encoding schema that translates LAS theories into propositional general logic programs. First, we define two literal encoding functions that encode literals in an LAS theory to previously unused positive literals. Let q be a literal and n a positive integer. Then, these functions are defined below:
Supp(q, n) = p_n^{+s} if q is a positive literal p, and p_n^{-s} if q is a negative literal ¬p.
Con(q, n) = p_n^{+} if q is a positive literal p, and p_n^{-} if q is a negative literal ¬p.
Supp(q, n) denotes a support of q at layer-n and Con(q, n) denotes a conclusion of q at layer-n. Supp(q, n) corresponds to supportedn(q). Con(q, n) corresponds to conclusionn(q). Let ConA(A, n) be the set of new positive literals obtained from a set A of literals by replacing each literal q ∈ A by Con(q, n): ConA(A, n) = {Con(q, n) | q ∈ A}. With these functions, we now define the meta-encoding schema. Let T = (R, N) be an LAS theory, Π(T ) the corresponding meta-program, and L the set of all propositional letters in T . Then HT = L ∪ ∼L is the Herbrand universe of Π(T ), where ∼L = {∼p | p ∈ L}. The translated Herbrand base G(T ) of Π(T ) is obtained according to the following guidelines for each layer-n (1 ≤ n ≤ N):
G1: For each q ∈ HT , add Con(q, n) ← Con(q, n − 1).
G2: For each q ∈ HT , add Con(q, n) ← Supp(q, n), not Supp(∼q, n), not Con(∼q, n − 1).
G3: For each r ∈ R^n , add Supp(C(r), n) ← ConA(A(r), n).
For most LAS theories, this direct translation of T results in a lot of redundant rules that will never be used for generating conclusions. But, if we know the set of all the literals that will ever be supported, we can reduce the number of rules in G2 by replacing 'For each q ∈ HT' by 'For each q ∈ SLn', where SLn is the set of all supportive literals in layer-n of T . SLn can be obtained as follows: SLn = {C(r) | r ∈ R^n}. Let G2(T ) be the set of rules introduced by G2 in G(T ). Then, we can also reduce the number of rules in G1 by replacing 'For each q ∈ HT' by 'For each q ∈ CLn', where CLn is the set of all conclusion literals in layer-n of T . This can be obtained as follows: CL0 = ∅ and CLn = CLn−1 ∪ {p | Con(p, n − 1) ∈ G2(T )}. For example, let us consider an LAS theory T = ({R1}, 1) where R1 = {→ sA}. The corresponding meta-program G(T ) of T is (after removing literals with n = 0):
sA_1^{+s}.
sA_1^{+} :- sA_1^{+s}, not sA_1^{-s}.
Furthermore, even if we extend the language of LAS to allow for n-ary predicate literals that can be formed by the usual inductive definitions for classical logic, this meta-encoding can be used to convert first-order LAS theories. For example, if we replace sA with sA(x), where x is a variable, then the corresponding logic program becomes:
sA_1^{+s}(x).
sA_1^{+}(x) :- sA_1^{+s}(x), not sA_1^{-s}(x).
Let us consider an LAS theory T = (R, N) containing x unique rules in each layer. Then, the total number of rules is X = xN. The number of rules and facts created by the guidelines is bounded by the following equation: |G(T )| ≤ X(0.5 + 1.5N). That is, the size of G(T ) is linear in the size of T . In fact, if N = 1, |G(T )| ≤ 2|T |. In practice many subsumed rules can be removed, because not all of the rules in a layer interact with other layers. For instance, in Example 1, if we do not need the level-2 conclusions of sB, w and ∼c, there is no need to subsume r2 and r3 in layer-2.
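As a concrete, hedged illustration of guidelines G1–G3 (the function names, the textual clause format, and literal spellings such as sA+1 or sA+s1 are our own choices), the following Python fragment emits the direct, unreduced translation for the one-rule theory above; the paper's reduced version, which uses SLn and CLn and drops n = 0 literals, keeps only the two clauses shown in the text.

def compl(q):
    return q[1:] if q.startswith("~") else "~" + q

def lit(q, n, support=False):
    # Con(q,n) -> e.g. sA+1 / sA-1 ; Supp(q,n) -> e.g. sA+s1 / sA-s1 (our spelling)
    p, sign = (q[1:], "-") if q.startswith("~") else (q, "+")
    return f"{p}{sign}{'s' if support else ''}{n}"

def meta_encode(layers, herbrand):
    # layers: one rule list per layer, a rule being (body, head);
    # herbrand: the set H_T of all literals. Returns the clauses of G(T).
    clauses = []
    for n in range(1, len(layers) + 1):
        rules_n = [r for layer in layers[:n] for r in layer]      # R^n
        for q in herbrand:                                        # G1
            clauses.append(f"{lit(q, n)} :- {lit(q, n - 1)}.")
        for q in herbrand:                                        # G2
            clauses.append(f"{lit(q, n)} :- {lit(q, n, True)}, "
                           f"not {lit(compl(q), n, True)}, not {lit(compl(q), n - 1)}.")
        for body, head in rules_n:                                # G3
            b = ", ".join(lit(q, n) for q in body)
            clauses.append(f"{lit(head, n, True)} :- {b}." if b
                           else f"{lit(head, n, True)}.")
    return clauses

# T = ({R1}, 1) with R1 = {-> sA}; H_T = {sA, ~sA}: prints 5 clauses in total
for clause in meta_encode([[((), "sA")]], ["sA", "~sA"]):
    print(clause)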
6. Comparisons With Other Approaches 6.1. Hierarchical Approaches Our approach to decomposing systems differs from meta-hierarchical approaches [21] and functional decompositions of knowledge-bases [13]. Unlike hierarchical approaches, the layers in LAS represent varying degrees of belief, such as 'Jane might be tall', 'Jane is surely tall'. That is, there are no layers representing beliefs about one's own beliefs and so on. Unlike functional decompositions, the subsumption architecture decomposes a system based on the confidence degree of knowledge: one layer is more or less confident than the other layers, rather than more or less complex (or dependent). It also differs from preference based logics, which use preferences only to resolve conflicts. A knowledge base of a preference based logic (e.g., Defeasible Logic [16,2]) basically corresponds to a single layer in LAS. The reason for this is that priority relations between rules (and defeaters in Defeasible Logic) can be represented as just defeasible rules. The layered structure of LAS is similar to the hierarchic autoepistemic logic (HAEL) [14], in which conclusions of lower layers are stronger than those of higher layers. However, HAEL has no notion of support (evidence) as used in LAS. Thus, it has some limitations, such as inconsistent extensions. In LAS, the notion of support is used to prevent credulous conclusions. In addition, HAEL does not allow higher layers to subsume rules of lower layers. That is, in HAEL a level-2 observation (support) cannot be used with level-1 rules to produce level-2 conclusions. Most importantly, LAS has a computationally realizable implementation, and the language of LAS is also much more intuitive than autoepistemic logic, as formulas are free of modal operators. The logic based subsumption proposed by [1] is also similar to our work, but the language is based on Circumscription, thus it requires second-order theorem provers, and rules are not subsumed. 6.2. Fuzzy Logic Approaches Similarly to Possibilistic Logic approaches [8,6], facts and rules with different levels of confidence can be combined, as shown in Example 2. For instance, let q and p be justified arguments and L(q) be the level of confidence of the conclusion of a justified argument q; then it is easy to see that the following relations, corresponding to the Fuzzy Logic operators, hold in LAS:
L(q) = max({the levels of the rules used in q})
L(q and p) = max(L(q), L(p))
L(q or p) = min(L(q), L(p))
For instance, in Example 2, L(B) = 2 because the levels of sA and r1 are 2 and 1, respectively. However, unlike Fuzzy Logic, the value of L(not q) is undefined (or infinite, meaning almost impossible), because the agent has already committed to believe the conclusion of q until there is contrary evidence. Thus, L(q and not q) is not possible. However, unlike Fuzzy Logic approaches, LAS does not rely on measurements of the degree of belief for each rule and fact. All the rules in LAS theories can be considered to represent near 100% conditional probability (or agents' commitment despite possible risks) of the consequents when the corresponding premises are all provable. The layers represent relative precedence of rules and relative risk/benefits of conclusions.
For example, in LAS, if an association rule r1 occurs more frequently than r2 , we simply place r1 at level-1 and r2 at level-2, whereas a possibilistic logic requires exact figures, such as certainty degrees, for both rules and facts in order to draw conclusions. Thus, LAS is more suitable when only relative preferences over rules and evidence can be obtained. 6.3. Nonmonotonic Logics Unlike many existing implementations of argumentation systems, LAS is a concrete implementation with first-order knowledge-bases that considers arguments for and against grounded n-ary predicate literals. [3] proposes an argumentation system that considers arguments for first-order formulas with a notion of argument strength, but it is not clear how a practical reasoning system could be built for it and what the complexity of such systems might be. LAS incorporates the idea of team defeat [2]. For example, let us consider the following abstract LAS theory:
S1 = {a1 , a2 , b1 , b2 }
R1 = {r1 : b1 → ∼q, r2 : b2 → ∼q}
R2 = {r3 : a1 → q, r4 : a2 → q}
The argumentation system (PS Logic) developed by Prakken and Sartor [18,19] cannot conclude ∼q when it is given the priority relation {r1 > r3 , r2 > r4 } [12], because an attack on a rule with head q by a rule with head ∼q may be defeated by a different rule with head q, but Defeasible Logic [16] can conclude ∼q. It is easy to see that both r1 and r2 are justified in LAS. Another interesting problem for testing the semantics of nonmonotonic logics is the reinstatement problem, in which unjustified arguments are considered as reasons for their conclusions when these are also conclusions of other justified arguments [12]. As an example, let us suppose that birds usually fly (r1 ). But, as we all know, penguins usually do not fly (r2 ). Now, imagine that genetically modified penguins usually fly (r3 ). Then, the argument for 'fly' by r1 is reinstated in PS Logic [12]. This information can be modelled in the following LAS theory:
R1 = {r3 : gp → f , gp → p, p → b}
R2 = {r2 : p → ∼f }
R3 = {r1 : b → f }
We should note that r1 is not a justified argument in LAS, unlike in PS Logic and standard Defeasible Logic.
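As a small sanity check of the team-defeat theory above, treating the elements of S1 as level-1 facts (our reading, since the paper does not state at which level these facts sit), the las_conclusions sketch from Section 4 concludes ∼q at both layers and never concludes q:

facts = [((), "a1"), ((), "a2"), ((), "b1"), ((), "b2")]   # S1 read as level-1 facts
R1 = facts + [(("b1",), "~q"), (("b2",), "~q")]
R2 = [(("a1",), "q"), (("a2",), "q")]
C1, C2 = las_conclusions([R1, R2])
print("~q" in C1, "~q" in C2, "q" in C2)   # True True False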
7. Conclusion This paper presented an argumentation system extended with the concepts of subsumption and varying degree of confidence, along with its interesting properties. LAS can provide conceptual decomposition as well as behavioral decomposition of agent systems through rule and conclusion subsumption. The conclusions of LAS theories contain the confidence level information, so that agents can better cope with dynamic situations by adjusting the acceptance level of confidence depending on the risks involved in each situation. The meta-program G(T ) contains only three types of simple rules that can be easily represented as simple combinational logic. A mapping of a propositional LAS system to
module testModule( sAcp1, sBcp1, vcp2);
input sAcp1, sBcp1;
output vcp2;
wor wsp2, dcp2, dcp1, wsp1, sAcp2, sBcp2;
wor ccn2, ccn1, csp2, vcp2, wcp2, dsp2;
wor dsp1, wcp1, csn1, csn2, ccp2, vsp2;
wire sAcp1;
wire sBcp1;
assign ccn1 = csn1;
assign dcp1 = dsp1;
assign wsp1 = sBcp1;
assign wcp1 = wsp1;
assign dsp1 = sAcp1;
assign csn1 = wcp1;
assign dcp2 = dcp1;
assign dcp2 = dsp2;
assign sAcp2 = sAcp1;
assign sBcp2 = sBcp1;
assign ccn2 = ccn1 | csn2 & ~csp2;
assign vcp2 = vsp2;
assign wsp2 = sBcp2;
assign wcp2 = wcp1 | wsp2;
assign csp2 = dcp2;
assign dsp2 = sAcp2;
assign csn2 = wcp2;
assign ccp2 = csp2 & ~csn2 & ~ccn1;
assign vsp2 = ccp2;
endmodule
Figure 1. This figure shows a compilation result of the vacuum controller unit into Verilog HDL code. This controller has two inputs and one output. All the symbols represent Boolean variables ranging over the set {0,1}. For example, sAcp1 represents the state of the level-1 positive conclusion of sA. Value 1 of the state represents that “it is known that sA is true” and value 0 represents “it is unknown that sA is true.”
[Figure 2 schematic: the inputs sAcp1 and sBcp1 feed a small network of INV, OR, NOR, and AND gates producing the output vcp2.]
Figure 2. An implementation of the Verilog description of the vacuum control unit in a Xilinx Spartan-3 FPGA. This figure shows an RTL (Register Transfer Level) description of the Verilog code for Configurable Logic Blocks (CLBs) of a Spartan-3 FPGA. (Note: the logic circuit does not necessarily represent an optimal logic circuit since it must fit into the CLBs.)
Verilog HDL (Hardware Description Language) has been developed for designing agent silicon chips [20] without a CPU, using purely combinational logic and registers. Figure 1 shows an example of the compilation of Example 1 into Verilog HDL. Figure 2 shows an implementation of the Verilog code on a Xilinx™ Spartan™-3 FPGA. Limited forms of first-order LAS theories can also be directly translated into Verilog descriptions using registers, adders, and counters without a CPU, making the resulting systems robust and reactive. To the best of our knowledge, this is the first practical implementation of a first-order logic based argumentation system with the subsumption concept and varying degree of belief.
References [1] Eyal Amir and Pedrito Maynard-Zhang. Logic-based subsumption architecture. Artificial Intelligence, 153(1-2):167–237, 2004.
[2] Grigoris Antoniou, David Billington, Guido Governatori, Michael J. Maher, and Andrew Rock. A family of defeasible reasoning logics and its implementation. In Proceedings of the 14th European Conference on Artificial Intelligence, pages 459–463, 2000. [3] Ph. Besnard and A. Hunter. Practical first-order argumentation. In AAAI'2005, pages 590–595. MIT Press, 2005. [4] Andrei Bondarenko, Phan Minh Dung, Robert A. Kowalski, and Francesca Toni. An abstract, argumentation-theoretic approach to default reasoning. Artificial Intelligence, 93:63–101, 1997. [5] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14–23, 1986. [6] Carlos I. Chesnevar, Guillermo R. Simari, Teresa Alsinet, and L. Godo. A logic programming framework for possibilistic argumentation with vague knowledge. In Proc. AUAI '04: the 20th conference on Uncertainty in artificial intelligence, pages 76–84, 2004. [7] Suhel Dhanani. FPGAs enabling consumer electronics – a growing trend. FPGA and Programmable Logic Journal, June 2005. [8] Didier Dubois, Jérôme Lang, and Henri Prade. Possibilistic logic. In Dov Gabbay, Christopher J. Hogger, and J. A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 3: Nonmonotonic Reasoning and Uncertain Reasoning, pages 439–513. Oxford University Press, Oxford, 1994. [9] Phan Minh Dung. An argumentation semantics for logic programming with explicit negation. In ICLP, pages 616–630, 1993. [10] Phan Minh Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–358, 1995. [11] Guido Governatori, Michael J. Maher, Grigoris Antoniou, and David Billington. Argumentation semantics for defeasible logics. Journal of Logic and Computation, 14(5):675–702, 2004. [12] John F. Horty. Argument construction and reinstatement in logics for defeasible reasoning. Artificial Intelligence and Law, 9(1):1–28, 2001. [13] J. Huang, N. R. Jennings, and J. Fox. An agent architecture for distributed medical care. In Intelligent Agents: Theories, Architectures, and Languages, pages 219–232. Springer-Verlag: Heidelberg, Germany, 1995. [14] Kurt Konolige. Hierarchic autoepistemic theories for nonmonotonic reasoning. In Non-Monotonic Reasoning: 2nd International Workshop, volume 346, pages 42–59. 1989. [15] Kenneth Kunen. Negation in logic programming. J. Log. Program., 4(4):289–308, 1987. [16] M. J. Maher and G. Governatori. A semantic decomposition of defeasible logics. In AAAI '99, pages 299–305, 1999. [17] Donald Nute. Defeasible logic. In Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 3: Nonmonotonic Reasoning and Uncertain Reasoning, pages 353–395. Oxford University Press, Oxford, 1994. [18] Henry Prakken and Giovanni Sartor. A system for defeasible argumentation, with defeasible priorities. In Proc. FAPR '96: the International Conference on Formal and Applied Practical Reasoning, pages 510–524, London, UK, 1996. Springer-Verlag. [19] Henry Prakken and Giovanni Sartor. Argument-based extended logic programming with defeasible priorities. Journal of Applied Non-Classical Logics, 7(1), 1997. [20] Insu Song and Guido Governatori. Designing agent chips. In Fifth International Joint Conference on Autonomous Agents & Multi Agent Systems AAMAS06, in print, available at http://eprint.uq.edu.au/archive/00003576/. ACM Press, 2006. [21] Michael Wooldridge, Peter McBurney, and Simon Parsons.
On the meta-logic of arguments. In Proc. AAMAS ’05, pages 560–567, 2005.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Social Responsibility among deliberative agents Paolo Turrini a,1 , Mario Paolucci a and Rosaria Conte a a Institute of Cognitive Science and Technology, National Research Council; Via san Martino della Battaglia, 44 00185 Rome
Abstract. In this paper we aim at giving a formal characterization of the notion of responsibility in multi-agent systems. A clearer view of responsibility is critical for regulating multiagent settings: understanding what kinds of responsibility are at stake in a given scenario can help predict the system's future behaviour and improve its efficiency. However, although attempts at formal theories of responsibility are increasing, they usually reduce it to causation, while we underline the importance of a theory of responsibility before a damage could ever take place. We propose an objective notion of responsibility, grounded upon cognitive, social, and material powers. In turn, aversive power - essential to account for ante-hoc responsibility - will be based on social damage, seen as a reduction of one's power. Furthermore, the multiagent dimension of the phenomenon will be addressed: shared versus collective responsibility will be distinguished. We will apply a modification of the ATEL-R* language (Alternating Time Epistemic Logic with Recall), dealing with goals and past history, in order to characterize all the necessary ingredients (goals, ability, powers, awareness of strategies, damage) of multiagent responsibility in several scenarios. System validities will be given and discussed, and a brief discussion of results together with ideas for future work will conclude the paper. Keywords. Theory, Multi-Agent Systems, Responsibility, Temporal Reasoning.
1. Introduction Responsibility is a fundamental notion for the study and regulation of multi-agent systems. In AI and Computer Science the importance of the notion of responsibility is increasingly acknowledged, and proposals for its formalization are being made [2], [3], [4], [5]. It is now clear that understanding what kinds of responsibility are at stake in a given social scenario can help predict the system's future behaviour and improve its safety. But agreement on the required formal ingredients is still beyond reach. Responsibility is often related to action, causation and change, and usually called into question in direct modifications of the world state. Cognition and the mental counterparts of being responsible (like awareness of one's own capacities) are rarely focused on or distinguished from objective states of the world. 1 Correspondence to: Paolo Turrini, ISTC-CNR Via san Martino della Battaglia, 44 00185 Rome. Tel.: +39 0644595252; Fax: +39 0644595243; E-mail: [email protected].
In this paper responsibility is analyzed in its primary nature [4], thus not concerned with role or task attribution (like, for instance, Jones and Sergot's "institutionalized power" [6] or role theory in organizational ontology [16] [18] [17]). In natural language, responsibility has at least two meanings: one can be responsible for a work or task, and one can be responsible for harm. In the present work, we focus on the notion of responsibility only for what concerns social harm, both actual and potential. In other words, we will not consider an artist to be responsible for a work of art nor a manager to be responsible for a project. Instead, from now on, we will use "responsible" only in connection with damage. Responsibility will be defined as an objective notion grounded upon cognitive, social, and material powers. In particular, the notion of aversive power will be used to construct our notion of responsibility. We will underline the difference between shared and collective responsibility. All the ingredients will be defined without using any deontic operator, while still dealing with and overcoming traditional moral dilemmas such as Sophie's Choice [23]. Finally, the theory developed will be applied to the formal description of several multi agent scenarios and some system validities will be shown.
2. The Problem Responsibility and causation are often claimed to be intertwined. For instance, Jones and Sergot argue that an agent is responsible for a state of the world that "(...) might have been otherwise but for his action" [6], whereas Chockler and Halpern [5] argue that the degree of responsibility of a process in a given context is computable on the grounds of its being the cause of a given effect, and the degree of blame on the grounds of the agents' expected responsibility. However, in many situations the mentioned links among responsibility, causation and even representation are not so clear-cut. For instance,
• An agent may cause a given effect without being responsible for it. If somebody made a cake, this would not be enough to consider him responsible for it. No damage for which to be responsible exists. Instead, if the cake had been intentionally poisoned and this caused someone's death, then the agent would be responsible.
• An agent may not cause and nevertheless be responsible. Suppose an artificial agent could extinguish a fire. The robot did not actually cause the wood to burn, since without the robot being there the wood would be burning anyway. The robot is nevertheless responsible, since a different choice (not to let the wood burn) would have prevented the ordeal.
• An agent may have the epistemic incapacity to predict and yet be responsible. This is a classical example [5] in which a doctor causes a damage to a patient for not being certain of the effect of a treatment. The doctor does not know whether the treatment would be fatal, though he should have known.
• Epistemic prediction alone is not enough even for blame. Suppose a robot has been programmed to explode, and suppose it had a perfect expectation and representation of the damage it was going to cause. This is not enough, since it did not have the power to avoid the explosion.
Epistemic and causal explanations seem to miss the point: they are neither necessary nor sufficient for one to be attributed responsibility. Not even norm violation is enough
to attribute responsibility: a law can be violated by an agent that does not have the power to behave any differently. This is the main reason why deontic operators are argued to be insufficient to characterize responsibility. The capacity to predict and to avoid a damage, instead, seems to be the crucial point. In the next section, we intuitively establish the cognitive and social ingredients that in our view are needed to account for responsibility, to be formalized later on in the paper.
3. An answer: the Power to prevent In this paper we adopt the view of [4], in which responsibility is the power to prevent a given harmful state of the world from being brought about, and presupposes the agents' deliberative capacity. Agents may therefore cause events they cannot be held responsible for, since they had no power to predict them: if a baby with a sharp knife scratched a valuable canvas, not even enormous damage would be enough to hold him responsible, because children, though materially capable of causing damage, are not able to represent and prevent it. Conversely, the baby's parents could be held responsible although they did not cause the wreckage. Responsibility presupposes a set of mental properties, including the capacity to represent a damage as a state consequent to a given action, and the capacity to act in order to prevent it. As far as groups are concerned, the responsibility that comes from social action may not necessarily be reduced to individual guilt. In fact, some powers of a group cannot be reduced to the powers of its members. In this regard, we distinguish the case in which each member of a group is individually able to avert a damage from the case in which only the combined action of all members can avert the damage. We therefore distinguish shared responsibility, that is, the state in which group members are all individually able to avoid a damage that could be brought about, from collective responsibility, in which, instead, only collective action could avoid it. 3.1. Power and Responsibility As argued before, to have responsibility one needs to have power. For modelling power we adopt Castelfranchi's view [8], according to which power is the set of internal means (mental capacities and abilities) and external means (material resources) that allow agents to pursue a set of given goals. The social side of power is composite. In fact, agents can have the power to realize or compromise their own goals - and in this case we talk of Power-of - but they can also have the power to realize or compromise others' goals, and in this case we talk of Power-over. Power-over represents the social and interactive side of power, allowing agents to favour or threaten others, to create or break dependence relations, etc. Responsibility is associated with social harm, that is, agents' loss of power [4]. Harm is not merely a goal that has been compromised: a parent holding back a fat child from eating too much is indeed compromising some of the child's goals, but is not committing any social damage. Instead, if the parent, though aware of the danger, let him eat too much, then he would compromise much of the child's future capacities. In the next sections we build and develop a formal system on the philosophical investigation made so far.
4. Formal Tools 4.1. From ATEL-R* to ATEL-R+G* Our logic is a straightforward extension of ATEL-R*, an alternating time temporal epistemic logic with recall by Jamroga and Van der Hoek [19] that extends ATL [11] with epistemic modalities among agents with imperfect information and able to recall the history of the game. ATEL-R* has been developed in order to be consistent with the assumption that agents have incomplete information about their current state, by having agents that "know how to play", i.e. they can identify and choose the same (uniform) strategy in situations that they cannot tell apart. The formalization of the notion of responsibility requires mental states and temporal reasoning - the capacity to reason on history - to be core ingredients of a logic of cooperation. Moreover, agents' powers cannot simply coincide with what agents can do independently of their knowledge. ATEL-R* provides all the key ingredients for responsibility: it seems extremely suitable for modelling situations in which an agent "had the power to prevent". Nevertheless, ATEL-R* is not yet sufficient, because an agent designed according to the cognitive theory of practical reasoning and social interaction must be governed by volitive attitudes such as goals or intentions [12]. Moreover, goals are necessary in our definition of power. Thus we limit our extension to a history dependent volitive modality, as will be shown in the next section, in which we provide a brief description of the core notions together with syntax, semantics and validities. 4.2. ATEL-R+G* To build a formal system for a theory of responsibility we propose a relational structure with the following shape: S = < Π, Σ, Q, I, π, ζ, ∼1 , ..., ∼n , G1 , ..., Gn , d > where Π is a countable set of atomic propositions, Σ is an ordered set of agents {a1 , ..., an }, Q is a finite nonempty set of states {q0 , q1 , ..., qn }, I ⊆ Q is the set of initial states, π : Q → 2^Π is the evaluation function, and ζ : Q × Σ → 2^(2^Q) is the system transition function that associates to a couple world-agent a set of sets of worlds. We require, as in [10], the intersection of all agents' choices to always be a singleton, in order to determine univocally the next state. ∼a ⊆ Q × Q and Ga ⊆ Q × Q are respectively an epistemic and a volitive accessibility relation. We require the epistemic operator to satisfy an S5 normal system of modal logic, whereas for goals - as traditional - we require KD. Let us recall that the epistemic accessibility relation represents agents' inability to tell states apart. General, distributed and common knowledge (∼EΓ , ∼DΓ , ∼CΓ ) are derived from the epistemic accessibility relation. General knowledge is given by the union of individual knowledge, distributed knowledge by the intersection, and common knowledge by the reflexive transitive closure. Finally, d : Σ × Q → N specifies how many options are available to an agent in a given state. Typically, an agent a at state q can choose his decision from the set {1, ..., da (q)}, and a tuple of decisions < j1 , ..., jn > for the n agents of Σ in a state q determines a system transition of the form δ(q, j1 , ..., jn ).
4.2.1. Strategies and Computations We are going to interpret formulas in a temporal structure whose behaviour can be described considering the set of possible histories of the system. We recall some notation for computations: for any sequence of states λ = q0 , q1 , ..., λ[i] is the i-th position in λ; λ|i = q0 , q1 , ..., qi denotes λ up to i; λi = qi , qi+1 , ... denotes λ from i on. Q+ indicates the set of all possible sequences of states. At each world in this structure an agent - say a - has a strategy fa which specifies the decisions available to it for every possible history of the game, fa : Q+ → N. For groups Γ, collective strategies are indicated by FΓ : Γ → (Q+ → N). Moreover, out(q, FΓ ) is the set of all possible computations starting from state q and consistent with FΓ . A computation λ ∈ Q+ is consistent with a collective strategy FΓ if for i = 0, 1, ... there exists a tuple of decisions 0 < jk ≤ dak (λ[i]), such that FΓ (ak )(λ|i ) = jk for ak ∈ Γ, with δ(λ[i], j1 , ..., jn ) = λ[i + 1]. Let moreover ℓ(λ) be the length of a computation, and let two computations be indistinguishable, λ ≈a λ′, iff λ[i] ∼a λ′[i] for every i. We can straightforwardly generalize to common, general and distributed knowledge with the operator ≈κΔ , where κ ∈ {E, D, C}. Taking Λ to be a run of the system, that is, a computation starting from an initial state, a computation is said to be feasible if and only if: Λ|n ≈a λ, where n = ℓ(λ) − 1; and Λ is consistent with fa . With the characterization of strategy described above, agents may happen to bring about some states of the world without having a proper power to do so (not knowing in which world they are, see [19],[20]). In order to solve this problem, the function out needs to be modified. In this regard let us restate out∗ (Λ, fa ) = {λ | λ is feasible, given Λ and fa }. As well argued in [19], such a function should modify the semantics of the cooperation modality ≪ a ≫ in order to describe de re strategies 2 . Strategies are now properly uniform: fa (λ) ≤ da (q), where q is the last state in λ, and if λ ≈a λ′ then fa (λ) = fa (λ′). 4.2.2. Syntax The syntax of the new system is given by the following grammar: φ ::= ⊤ | p | ¬φ | φ ∨ ψ | Xφ | φU ψ | κΓ φ | Goala φ | ≪ Γ ≫κ(Δ) φ | X −1 φ. where φ, ψ are formulas of ATEL-R+G*, Γ and Δ are groups of agents, and κ is any of the epistemic operators in {E, D, C}. Derived operators are: φ ∧ ψ ≡ ¬(¬φ ∨ ¬ψ) (with the usual procedures for implication, biconditional and so on), Ka φ ≡ Ca φ, F φ ≡ ⊤U φ, etc., [[Γ]]κ(Δ) φ ≡ ¬ ≪ Γ ≫κ(Δ) ¬φ (duality), Gφ ≡ ¬F ¬φ. 4.2.3. Semantics So as to include past tense modalities in our semantics we need to interpret our formulas over paths, as shown in [22]. Let us extend the function out∗ to out∗κ(Γ) (λ, FΔ ) = {Λ | Λ|n ≈κΓ λ and Λn is consistent with FΔ , where n = ℓ(λ)}. 2 There is a subtle difference between the existence of strategies such that agents know they bring about a certain state of affairs ("de re") and the knowledge of the existence of strategies that bring about a certain state of affairs ("de dicto"). In order to have full knowledge of how to play, agents are supposed to be endowed with "de re" strategies.
The semantics is given by:
• Λ, n ⊨ ⊤;
• Λ, n ⊨ p iff p ∈ π(Λ[n]) (where p ∈ Π);
• Λ, n ⊨ ¬φ iff Λ, n ⊭ φ;
• Λ, n ⊨ φ ∨ ψ iff Λ, n ⊨ φ or Λ, n ⊨ ψ;
• Λ, n ⊨ Xφ iff Λ, n + 1 ⊨ φ;
• Λ, n ⊨ X −1 φ iff n > 0 and Λ, n − 1 ⊨ φ;
• Λ, n ⊨ φU ψ iff there is a k ≥ n s.t. Λ, k ⊨ ψ and Λ, i ⊨ φ for all n ≤ i < k;
• Λ, n ⊨ κΔ φ iff for every Λ′ such that Λ|n ≈κΔ Λ′|n we have Λ′, n ⊨ φ;
• Λ, n ⊨ GoalΔ φ iff there exists Λ′ s.t. Λ|n ≈κΔ Λ′|n and Λ[n] GΔ Λ′[n] and Λ′, n ⊨ φ;
• Λ, n ⊨ ≪ Γ ≫κ(Δ) φ iff there is a collective uniform strategy FΓ such that for all Λ′ ∈ out∗κ(Δ) (Λ|n , FΓ ), we have Λ′, n ⊨ φ.
5. Social Ingredients

In this section, the social notions needed to talk of responsibility will be specified.

5.1. Ability

Social agents interact with and influence one another in order to realize their own goals and exploit their abilities. Abilities denote the whole set of material and mental resources that are needed to change the state of the world. The important feature that ATEL-R+G∗ adds to abilities is the concept of uniform strategy, that is, the fact that agents will play the same strategies in indistinguishable worlds in order to achieve a given proposition. Note that in the operator ≪ Γ ≫κ(Δ) the sets of agents (Γ, Δ) are in general distinct; in other words, ≪ Γ ≫κ(Γ) can simply be read as "Γ knows how", while ≪ Γ ≫κ(Δ) should be read as "Δ knows a strategy for Γ", which does not imply that Γ knows the strategy. As for epistemic modalities, we will consider the general case - unless otherwise specified - but sometimes we will be dealing with the strongest one, ≪ Γ ≫C(Γ), which requires the least amount of additional communication [19]. Abilities are not yet a form of power, since they may concern propositions of no interest for agents. Moreover, abilities can be of various kinds (cf. [15], [21]): strong, weak, shared, unique and so on.

Forms of ability. We consider first a strong form of ability (S-Able) in which a group of agents is capable of forcing a computation, whatever the others do. Basically, a group of agents Γ is strongly able to bring about a formula φ iff they can reach a world in which this formula is satisfied: S-Able(Γ, φ) ≡ ≪ Γ ≫κ(Γ) φ. This puts forward the following considerations about system validities:
1. If a group of agents is strongly able to bring about a state of the world in which φ is true, then the other agents cannot prevent this from happening. Formally, ≪ Γ ≫κ(Γ) φ → ¬ ≪ Σ − Γ ≫κ(Σ−Γ) ¬φ.
2. S-Ability of a group does not decrease by adding new members. Formally, ≪ Γ ≫κ(Γ) φ → ≪ Γ ∪ {a} ≫κ(Γ) φ.
3. Group S-Ability increases by adding new members with new abilities. Formally, ≪ Γ ≫E(Γ) φ ∧ ≪ Z ≫E(Z) ψ → ≪ Γ ∪ Z ≫E(Γ∪Z) φ ∧ ψ.³

3 This only works with general knowledge, because distributed and common knowledge do not distribute over union.
S-Ability actually rules out the capacity of agents to obstruct each other. Weaker degrees of ability can thus be observed by looking at the sets of world states in the choice functions of the agents. For instance, agents that cooperate with others are in a way still able, in a weaker sense, to bring about φ. Let us express formally the possibility that an agent can enforce φ next only together with another agent: ¬ ≪ a1 ≫κ(a1) Xφ ∧ ≪ a1 ∪ Γ ≫κ(Γ∪a1) Xφ. The possibility that an agent is given an ability can instead be rendered as: ¬ ≪ a1 ≫κ(a1) Xφ ∧ ≪ Σ ≫κ(Σ) X ≪ a1 ≫κ(a1) Xφ. It is also possible to force others to assign some ability: ¬ ≪ a1 ≫κ(a1) XXGφ ∧ ≪ a2 ≫κ(a2) X ≪ Σ ≫κ(Σ) X ≪ a1 ≫κ(a1) Gφ. Whatever ability we take into account, this important validity of ATEL-R∗ holds: ≪ Γ ≫κ(Δ) φ → κΔ ≪ Γ ≫κ(Δ) φ, which claims that if a group Γ of agents can bring about φ by knowledge of a group Δ, then Δ has a corresponding epistemic representation of that. It is also provable that ≪ Γ ≫E(Γ) φ → ≪ Γ ≫E(Γ∪Δ) φ, that is, general knowledge of strategies spreads to larger groups. In addition it can be shown that common knowledge does not: the corresponding implication ≪ Γ ≫C(Γ) φ → ≪ Γ ≫C(Γ∪Δ) φ does not hold. Finally, we are able to set a constraint on the system: all agents have the goal to know their (uniform) strategies. This makes sense, since the knowledge of strategies increases ability. We set an axiom that we call (KG):

κΓ ≪ Γ ≫κ(Δ) φ → GoalΓ ≪ Γ ≫κ(Γ) φ

5.2. Power

Some abilities agents have can realize their own or others' goals. Such abilities matter much more for the agents, because they affect plan realization. Therefore, together with Castelfranchi [8], we call Power the ability to realize a goal. From a cognitive point of view it makes a great difference whether the goal I am able to realize belongs to me or to another agent. Thus we distinguish between Power-of, the ability of an agent to realize (or not) its own goals, and Power-over, the ability of an agent to realize others' goals. Such powers are not mutually exclusive.

Power-of. A group Γ has Power-of towards the realization of its goal φ iff Γ has the positive and negative ability and the goal that φ is true. P W-OF(Γ, φ) ≡ ≪ Γ ≫κ(Γ) φ ∧ ≪ Γ ≫κ(Γ) ¬φ ∧ GoalΓ φ.
Power-over. A group Γ has Power-over towards the realization of goal φ of a group Z (Γ ∩ Z = ∅) iff Γ has the ability to make φ true and the ability to make it false, and Z has the goal that φ. Formally: P W-OV(Γ, Z, φ) ≡ ≪ Γ ≫κ(Γ) φ ∧ ≪ Γ ≫κ(Γ) ¬φ ∧ GoalZ φ. Is it reasonable to claim that Γ has power over Z even when Z has the power to realize its own goal? It seems so, at least in a weak sense, since goal realization represents a cost that Z may prefer Γ to pay.
Strong Power-over. A group Γ has strong Power-over towards the realization of goal φ of a group Z (Γ ∩ Z = ∅) iff Γ has positive and negative ability over φ, while Z does not, and Z has the goal that φ. Formally: SP W-OV(Γ, Z, φ) ≡ ≪ Γ ≫κ(Γ) φ ∧ ≪ Γ ≫κ(Γ) ¬φ ∧ GoalZ φ ∧ ¬ ≪ Z ≫κ(Z) φ. Strong Power-over is a suitable notion for describing dependence relations.
Figure 1. Power distribution (a1-a2)
In order to realize their goal, agents in Z depend on Γ. This may give rise to patterns like cooperation, social exchange or even exploitation.

5.3. Damage

Given its importance within the theoretical analysis of responsibility, the notion of damage will now be considered. It would be an oversimplification to treat damage simply as goals that are compromised. For instance, children can have their power limited by parents without being considered damaged, and a state may well limit citizens' powers without damaging them. Damage is hence more widely seen as power limitation, as in [4]. Let us exemplify it. We say that an agent a1 damages an agent a2 iff, as a consequence of a1's choice, a2's total power is reduced. By total power we mean the set of formulas an agent (or a group) is able to bring about, intersected with its own goals in the case of power-of, and with others' goals in the case of power-over. Let us imagine the following situation [Figure 1], and for the sake of simplicity let us not consider knowledge of strategies:
• δ(q0, a1) = {{q1}, {q2}}
• δ(q0, a2) = {{q1, q2}}
• δ(q1, a1) = {{q3}}
• δ(q1, a2) = {{q3}}
• δ(q2, a1) = {{q3, q4}}
• δ(q2, a2) = {{q3}, {q4}}
• q3 and q4 are terminal.
In this situation agent a1 can decide about a2's future. Suppose moreover that π(q0) = π(q3) = φ, π(q1) = π(q4) = φ ∧ ψ, π(q2) = ∅, and that q0 |= G Goala2 Xφ ∧ G Goala2 Xψ. Thus a2 has two always-goals in q0. Note that, even though a1 does not compromise a2's goals by choosing to go to q1, it brings the system to loop in a world in which a2's goals are always compromised; whereas, even though a1 compromises a2's goals by choosing to go to q2, it leaves a2 with the choice to go to a world in which all of its goals are satisfied. Let us introduce T Ab(a, q) = {φ | M, q |= ≪ a ≫K(a) φ}, which is, most generally, the set of formulas an agent can bring about, and the resulting set T P W (a, q) = T Ab(a, q) ∩ {φ | q |= Goala φ}, which represents the total power of agent a at a given world. In our simple example, given a2's goals it can be shown that #T P W (a2, q1) < #T P W (a2, q2), that is, the power of agent a2 at state q1 is strictly less than that at state q2.
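As an informal check of this claim, the toy model can be coded directly. The Python sketch below uses a simplified one-step reading in which an agent can force Xφ at q iff one of its choices leads only to states satisfying φ, and, like the example, it ignores knowledge of strategies.

```python
# Choice sets delta(q, agent) and valuations pi from the example above.
delta = {
    ('q0', 'a1'): [{'q1'}, {'q2'}],
    ('q0', 'a2'): [{'q1', 'q2'}],
    ('q1', 'a1'): [{'q3'}],
    ('q1', 'a2'): [{'q3'}],
    ('q2', 'a1'): [{'q3', 'q4'}],
    ('q2', 'a2'): [{'q3'}, {'q4'}],
}
pi = {'q0': {'phi'}, 'q1': {'phi', 'psi'}, 'q2': set(),
      'q3': {'phi'}, 'q4': {'phi', 'psi'}}

def can_force_next(agent, q, atoms):
    """True iff the agent has a choice at q all of whose outcomes satisfy the atoms."""
    return any(all(atoms <= pi[s] for s in choice) for choice in delta[(q, agent)])

goals_a2 = [{'phi'}, {'psi'}]   # a2's two always-goals, read one step ahead

def tpw(agent, q):
    """One-step stand-in for TPW(agent, q): goals the agent can actually force next."""
    return [g for g in goals_a2 if can_force_next(agent, q, g)]

print(len(tpw('a2', 'q1')), len(tpw('a2', 'q2')))   # 1 2, i.e. #TPW(a2,q1) < #TPW(a2,q2)
```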
Figure 2. Sophie’s Choice
Although after each move of the agents the possible futures tend to shrink, considering the formulas that are wanted represents a fair parameter to check agents' powers.

5.4. Aversive Power

An important kind of power linked with the notion of damage is Aversive Power, that is, the power to avoid a damage. A group Γ has aversive power towards the harm of a group Z iff, given a goal X¬φ of Z, Z does not have a uniform strategy for X¬φ, whereas Γ does:

A-P W (Γ, Z, φ) ≡ ¬ ≪ Z ≫κ(Z) X¬φ ∧ ≪ Γ ≫κ(Γ) X¬φ ∧ G GoalZ X¬φ,

which defines aversive power in relation with social damage.

5.5. Responsibility

Responsibility can be characterized with the formal and social ingredients we have put forward so far. A good indicator of the responsibility agents are endowed with (before any action can ever commence) is aversive power, which differs from responsibility after a damage has been brought about. Group Γ has responsibility for a damage φ wrt group Z iff it previously (immediately in the past) had the power to avoid φ, and now φ is true. Resp(Γ, Z, φ) ≡ φ ∧ X−1(A-P W (Γ, Z, φ)).
Example: Sophie's Choice. One of the most widely discussed moral dilemmas [23] that can be treated with a formal theory of responsibility is "Sophie's Choice". Sophie is detained with her two children in a Nazi concentration camp. Suddenly a Nazi official comes over and tells her that only one of her children will be allowed to live, whereas the other will be killed; but it is up to Sophie to decide which one. Sophie can prevent one from dying by ordering the other to be killed. What is more, if she chooses neither, both will be killed. Suppose a world in which there are two agents {a1, a2} and two observable atoms {c1, c2}, with the intended meaning that child 1 (respectively 2) is alive. In the initial state q0 no child is dead. Straight arcs describe Sophie's uniform strategies. Now two alternatives are at stake [Figure 2]. If the dotted arc is not present, then in whatever world we end up, Sophie (s) cannot be reckoned responsible. On the contrary, if the dotted arc is present, Sophie is responsible, because q0 |= ≪ s ≫K(s) X(c1 ∧ c2). But what is the dotted arc? It is the possibility for Sophie to escape the dilemma, for instance by convincing the guard to withdraw his order, or by attempting to physically escape the concentration camp. In fact, the analysis of Sophie's Choice in terms of responsibility is strictly dependent on the model we design.
Let us now move to more complex forms of multiagent responsibility.

5.5.1. Shared Responsibility

Two agents a1 and a3 in a group Γ share responsibility for a damage φ iff they are both responsible for φ towards a same agent a2. SH-Resp(Γ, a2, Xφ) ≡ Resp(a1, a2, Xφ) ∧ Resp(a3, a2, Xφ).

Example: Criminals. Suppose a model with four worlds q0, q1, q2, q3, with Π = {d1, d2} and π(q0) = ∅, π(q1) = {d1}, π(q2) = {d2}, π(q3) = {d1, d2}. Suppose that two criminals a1, a2 have cooperated in order to commit a murder. The cooperation would have failed if one had refused to participate in the crime. Although their action has been cooperative, their responsibility is therefore individual and shared. Imagine that a third agent wants the two criminals not to accomplish their joint action: Goala3 ¬X(d1 ∧ d2). Now suppose the following distribution of powers:
• δ(q0, a1) = {{q1, q3}, {q0, q2}}
• δ(q0, a2) = {{q2, q3}, {q0, q1}}
• δ(q0, a3) = {{q0, q1, q2, q3}}
Each element of the couple {a1, a2} would be enough to ensure that the damage is avoided, although both are necessary for the damage to be brought about. Their responsibility is therefore shared: hence, they are both individually responsible.

5.5.2. Collective Responsibility

Instead, agents a1 and a2 in a group Γ are collectively responsible for a damage Xφ towards an agent a3 iff they are interdependent towards its avoidance (they are all needed for the damage to be avoided; no one is sufficient but all are necessary). Collective responsibility is thus characterized as follows: C-Resp(Γ, a3, Xφ) ≡ φ ∧ X−1(G Goala3 X¬φ ∧ ¬φ ∧ ≪ a1 ≫κ(a1) Xφ ∧ ≪ a2 ≫κ(a2) Xφ ∧ ¬ ≪ a1 ≫κ(a1) X¬φ ∧ ¬ ≪ a2 ≫κ(a2) X¬φ ∧ ≪ Γ ≫κ(Γ) X¬φ).

Example: Pollution. Suppose a company pollutes the environment. Who is responsible for such damage? No member of the company has a totalitarian power, but the members together could have cooperated to prevent the company from polluting: they are collectively responsible. Let us take an agent a3 that wants neither a1 nor a2 to bring about a damage. Formally: Goala3 ¬X(d1 ∨ d2). Now suppose:
• δ(q0, a1) = {{q1, q3}, {q0, q2}}
• δ(q0, a2) = {{q2, q3}, {q0, q1}}
• δ(q0, a3) = {{q0, q1, q2, q3}}
Notice that a single agent of the couple {a1, a2} is no longer sufficient to ensure that the damage is avoided. Their responsibility cannot be individual: no one is sufficient to avoid the damage, but all of them are necessary. Let us conclude with some results about responsibility.

Proposition 1 If a group Γ = {a1, ..., an} is collectively responsible for damage φ to Z, then the members a1, ..., an of Γ do not share responsibility for φ. C-Resp(Γ, Z, φ) → ¬SH-Resp(Γ, Z, φ).
Proof. The proof follows from the definition of collective responsibility, whose condition is that single members cannot see to it that φ, which is a necessary condition for sharing responsibility. C-Resp(Γ, Z, Xφ) → (¬ ≪ a1 ≫κ(a1 ) Xφ ∧ ... ∧ ¬ ≪ an ≫κ(an ) Xφ) → ¬SH-Resp(Γ, Z, Xφ).
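To make the contrast between the Criminals and the Pollution examples concrete, the sketch below (Python) checks, one step ahead, who can avoid each damage; a coalition's choices are taken to be the intersections of its members' choices, and knowledge of strategies is again ignored, as in the examples themselves.

```python
# Choice sets at q0 and valuations from the two examples above.
delta = {'a1': [{'q1', 'q3'}, {'q0', 'q2'}],
         'a2': [{'q2', 'q3'}, {'q0', 'q1'}],
         'a3': [{'q0', 'q1', 'q2', 'q3'}]}
pi = {'q0': set(), 'q1': {'d1'}, 'q2': {'d2'}, 'q3': {'d1', 'd2'}}

def can_avoid(choices, damage):
    """True iff some choice leads only to states where the damage formula is false."""
    return any(all(not damage(pi[s]) for s in choice) for choice in choices)

# The coalition {a1, a2} chooses by intersecting one choice of each member.
coalition = [c1 & c2 for c1 in delta['a1'] for c2 in delta['a2']]

murder = lambda v: 'd1' in v and 'd2' in v     # damage of the Criminals example
pollution = lambda v: 'd1' in v or 'd2' in v   # damage of the Pollution example

print(can_avoid(delta['a1'], murder), can_avoid(delta['a2'], murder))       # True True  -> shared
print(can_avoid(delta['a1'], pollution), can_avoid(delta['a2'], pollution)) # False False
print(can_avoid(coalition, pollution))                                      # True       -> collective
```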
Proposition 2 Let us assume (KG) and consider a social damage ¬φ, and the fact that a group Γ is not aware of having a strategy to avert ¬φ, but another group Δ is, and has the ability to make Γ aware of that. Formally, ≪ Γ ≫κ(Δ) φ ∧ ≪ Δ ≫κ(Δ) X ≪ Γ ≫κ(Γ) φ. We can conclude, using the definitions, that Resp(Δ, Γ, X¬κΓ ≪ Γ ≫κ(Γ) φ). This puts forward a "communicative responsibility" agents have in order to avoid social damage. In fact it provides a formal analogue of situations in which an agent "should have told" another how to avoid a given damage. This is furthermore consistent with what is said in [19], in which knowledge of uniform strategies is distinguished according to the requested amount of communication.
6. Results and Conclusions

In this paper the notion of responsibility has been formally decomposed and analysed. On the one hand, it has been shown that no deontic operators are necessary to account even for the most complex multiagent scenarios, involving shared and collective responsibility. Moreover, action causation has been distinguished from responsibility, showing how the simple fact of being forced to make a choice does not by itself allow for responsibility attribution. On the other hand, scenarios in which responsible agents did not directly cause a damage have also been analysed: the power to prevent damages represents a microfoundation of the notion of responsibility and hence of responsibility-based social regulation in general.

Applications and future work. There are many fields in which a theory of responsibility could be of applicative use: in legal and juridical contexts (see also [6], [7]), where responsibility is to be distinguished from damage, causation and guilt; in distributed artificial intelligence, both for protocols (compare with control systems in [15]) and for autonomous agents, in groups [3] or institutionalized settings [17]. As emphasized in [17], responsibility in role attribution is strictly interlinked with the notion of autonomy, which is extremely important for MAS theory and practice. In general, understanding who is responsible for what damage (and where aversive power is located) could be of great help for all those systems built on the concepts of liveness and safety. Our paper shows a cross-fertilization between formal methods and theoretical advances, elucidating how social science phenomena can contribute to multiagent systems, providing an empirical test for formal tools, and how multiagent systems can be applied to account for social and cognitive science in general, providing rigorous and formal specifications for theories. Future work will be directed towards understanding the role of collective responsibility, extending the present formal language to important multiagent notions, such as accountability and task-based responsibility, and clarifying the dynamics of responsibility in institutionalized settings. Further cognitive counterparts of being responsible (shame, sense of guilt, feeling of responsibility) will be analyzed. On the formal side the work will be directed to better understanding the complexity of computing multiagent responsibility.
References
[1] Latané, B., Darley, J.: Bystander intervention in emergencies: diffusion of responsibility. Journal of Personality and Social Psychology. (1968) 377–383.
[2] Jennings, N.R.: On being responsible. Decentralized AI 3 - Proceedings of the Third European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW-91). E. Werner and Y. Demazeau. (1992).
[3] Norman, T.J., Reed, C.: Group Delegation and Responsibility. Proceedings of AAMAS. (2002) 491–499.
[4] Conte, R., Paolucci, M.: Responsibility for Societies of Agents. Journal of Artificial Societies and Social Simulation. 7-4 (2004).
[5] Chockler, H., Halpern, J.: Responsibility and Blame: A Structural Model Approach. Procs of IJCAI. (2004) 147–153.
[6] Jones, A., Sergot, M.: A Formal Characterization of Institutionalised Power. Journal of the IGPL. 4(3) (1995) 429–445.
[7] Carmo, J., Pacheco, O.: Deontic and Action Logics for Organized Collective Agency Modeled through Institutionalized Agents and Roles. Fundamenta Informaticae. 48 (2001) 129–163.
[8] Castelfranchi, C.: The Micro-Macro Constitution of Power. Protosociology. 18-19 (2003) 208–268.
[9] Castelfranchi, C., Miceli, M., Cesta, A.: Dependence relations among autonomous agents. E. Werner and Y. Demazeau. Decentralized A.I. (1992) 215–231. Elsevier Science Publishers.
[10] Wooldridge, M., van der Hoek, W.: Cooperation, Knowledge, and Time: Alternating-time Temporal Epistemic Logic and its Applications. Studia Logica. (2003) 125–157.
[11] Alur, R., Henzinger, T., Kupferman, O.: Alternating-Time Temporal Logic. Journal of the ACM. (2002) 672–713.
[12] Bratman, M.: Intentions, Plans, and Practical Reason. (1987). Harvard University Press.
[13] Conte, R., Castelfranchi, C.: Cognitive and social action. (1995). UCL Press. London.
[14] Wooldridge, M.: Reasoning about Rational Agents. (1998). MIT Press. Cambridge, Massachusetts.
[15] Wooldridge, M., van der Hoek, W.: On the logic of cooperation and propositional control. Artificial Intelligence (2005). 81–119.
[16] Esteva, M., Padget, J., Sierra, C.: Formalizing a language for institutions and norms. Intelligent Agents VIII, LNAI 2333 (2002). Springer-Verlag.
[17] Dignum, V., Vazquez-Salceda, J., Dignum, F.: OMNI: Introducing Social Structure, Norms and Ontologies into Agent Organizations. Programming Multi-Agent Systems: Second International Workshop ProMAS (2004). Springer-Verlag.
[18] Bottazzi, E., Ferrario, R.: A Path to an Ontology of Organizations. Proc. of the Workshop on Vocabularies (2005). Enschede, CTIT.
[19] Jamroga, W., van der Hoek, W.: Agents that Know How to Play. Fundamenta Informaticae (2004). 1–35.
[20] Herzig, A., Troquard, N.: Uniform Choices in STIT. Proc. 5th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS-06), ACM Press, Hakodate, Japan (2006).
[21] Pauly, M.: A modal logic for coalitional power in games. Journal of Logic and Computation (2002). 149–166.
[22] Schnoebelen, P.: The complexity of Temporal Logic Model Checking. Advances in Modal Logic (2003). 393–436.
[23] Stanford Encyclopedia of Philosophy: Moral Dilemmas, http://plato.stanford.edu/entries/moral-dilemmas.
Knowledge Base Extraction for Fuzzy Diagnosis of Mental Retardation Level

Alessandro G. Di Nuovo
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Università degli Studi di Catania
Abstract. In psychopathological diagnosis, a correct classification of the mental retardation level is needed to choose the best treatment for rehabilitation and to assure a quality of life suitable for the patient's specific condition. In order to meet this need, this paper presents a new approach that permits performing automatic diagnoses efficiently and reliably and, at the same time, is an easy-to-use tool for psychotherapists. The approach is based on a computational intelligence technique that integrates fuzzy logic and genetic algorithms in order to learn from samples a transparent fuzzy rule-based diagnostic system. Empirical tests on a database of patients with mental retardation and comparisons with established techniques showed the efficiency of the proposed approach, which also gives a great deal of useful information for diagnostic purposes.
Keywords. Automated diagnosis, Feature selection, Fuzzy System Design, Knowledge Extraction from Data.
1. Introduction

Psychopathology is an excellent field where fuzzy set theory can be applied with success, due to the high prominence of sources of uncertainty that should be taken into account when the diagnosis of a disease must be formulated. In psychopathological diagnosis a correct classification of a patient's level of mental retardation, especially during childhood and adolescence, is of fundamental importance in order to guarantee appropriate treatment for proper rehabilitation and a quality of life suited to the patient's condition. The methods currently adopted in this field use various diagnostic tools which require time and expertise to be administered correctly. Simplifying the use of these tools would make the identification of the most suitable treatment faster and more efficient. What is needed, therefore, is a technique that is capable of executing automatic diagnoses efficiently and reliably and, at the same time, is easy to use, so as to meet the needs of practitioners. To meet this need, in this paper a new methodology is investigated and tested on a database of 186 previously diagnosed adults, to whom the Wechsler psychometric intelligence scale [19] was administered. Details about the Wechsler scale and about the database are given in section 4.

1 Corresponding author: Alessandro G. Di Nuovo, Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Viale A. Doria 6, 95125 Catania, Italy; E-Mail [email protected].
The creation of an easy-to-use automatic diagnostic tool, which at the same time is able to reduce the number of tests to be administered to a patient, is extremely useful because a faster (and no less reliable) diagnosis allows us to plan effective treatment in a very short time. This work was focused on practical application and consequently on the effect that any error of assessment might have on an already disadvantaged human being. The aims of the tool are above all to assist clinical diagnosis and to avoid serious classification errors, which are inadmissible because they would lead to an incorrect diagnosis and thus make automatic diagnosis useless. It was chosen to use a fuzzy rule-based classifier which gives convincing indications towards the correct diagnosis even in badly classified cases, and which can advise the psychotherapist to investigate more thoroughly, guided by the fuzzy affinities with the various groups or pathologies. The two main objectives of the work were the following:
• Automatic recognition of the level of mental retardation by administering the Wechsler intelligence scales.
• Analysis of the set of data to discover the importance of each feature for diagnostic purposes, and consequent generation of a subset of attributes that would allow faster application of the scale.
The problem is that the objectives being optimized are conflicting: the lowest classification error is achieved with the highest number of features, and vice versa. Many approaches could be used to find a solution to this problem, but they give only one solution, which often cannot be applied in practice. For this reason the result given by the proposed approach is a set of trade-off solutions, called a Pareto set, among which the practitioners can choose the one that is suitable for a specific task. To meet these requirements, in this paper I propose a new hybrid computational intelligence system which is able to generate automatically a Pareto set of readable and transparent fuzzy rule-based classifiers. A description of the proposed approach and details about the solutions adopted to integrate the two algorithms are given in section 3. The rest of the paper is organized as follows. Section 2 briefly introduces some similar works in the medical field. In section 4 a case study is presented, where the proposed approach is applied on the database of patients with mental retardation and compared with a neuro-fuzzy and two widely used machine learning approaches. Finally, section 5 gives the conclusions.
2. Previous Works

No similar work has been done in the psychological field, but several fuzzy approaches have been presented for medical diagnosis [1, 5] that could be extended also to psychological diagnosis. The use of fuzzy mathematics in medicine was well explored in [9]. In [14] the authors use 'fuzzy logic' in a broad sense to formalize approximate reasoning in a medical diagnostic system, and they applied this formalism to build fuzzy expert systems for the diagnosis of several diseases. Nauck and Kruse developed NEFCLASS [11, 12], which uses generic fuzzy perceptrons to model Mamdani-type neuro-fuzzy systems. The authors observe that a neuro-fuzzy system should be easy to implement, handle and understand. Reinforcement learning is found to be more suitable than supervised learning for handling control problems. Castellano et al. [4] presented an approach, based on a fuzzy clustering
technique defined in three sequential steps, for the automatic discovery of transparent diagnostic rules from medical data. Recently, different genetic programming-based intelligent methodologies for the construction of rule-based systems were compared in two medical domains [18], demonstrating the methodologies' effectiveness.
3. A Genetic Fuzzy Approach for Knowledge Base Extraction from Examples

The main aim of this research was to create a tool that would provide a set of fuzzy rules that any practitioner could apply. It was found that high performance can be achieved, without sacrificing accuracy and transparency, by using a genetic algorithm to optimize the parameters of a fuzzy classifier algorithm. An optimized classifier can give useful data from which to obtain an efficient rule-based fuzzy system [2]. The proposal is therefore to use a genetic algorithm (SPEA, [23]) that provides the fuzzy classifier (PFCM, [13]) with feedback in the form of optimal parameters and a minimum subset of features that will ensure accuracy and compactness.
Figure 1. Systemic representation of the proposed approach.
3.1. The Possibilistic Fuzzy C-Means (PFCM)

The PFCM [13] was chosen for several reasons: first of all, it is robust to noise and is thus more suitable than others for a greater number of applications. Another important factor is that, besides the normal fuzzy affinity expressed as the probability of belonging to a group, PFCM provides some additional information (possibilistic typicality and centroid values) that is useful for the construction of fuzzy rules. Another advantage is its widespread use, which has led to a number of extensions, including the possibility of adapting it to sets of different shapes by choosing suitable metrics [22], or analyzing sets with missing values [8]. The version implemented proposes to minimize the following objective function:

J = \sum_{j=1}^{K} \sum_{i=1}^{N} (a\,u_{ij}^{m} + b\,t_{ij}^{\eta}) \, d(x_i, c_j)     (1)
where c_j is the prototype of the j-th cluster, d(·,·) is a metric appropriately chosen for the pattern space, and x_i is the i-th pattern. The objective function (1) contains two degrees, u and t. The first, u, indicates the probabilistic fuzzy degree, whereas t expresses "typicality", i.e. the possibilistic fuzzy degree. It is possible to associate two parameters, a and b, to the two weights u and t: by means of these the algorithm can be guided towards the features of the probabilistic (b = 0) or the possibilistic (a = 0) C-Means algorithm.
To summarize, there are four parameters to be identified: m, η, a and b. Integrated implementation of the two algorithms caused certain problems: a pure PFCM requires random initialization, on which the result depends. It is obviously not possible to operate in this way with the genetic algorithm, because the fitness value would vary from one generation to another. A way to solve this problem is to insert the initial values of the centroids among the variables of the GA. Hall et al. [7] studied the effects of this strategy, concluding that the use of a GA caused an increase in computing time of two orders of magnitude as compared with normal execution. In normal conditions it is therefore preferable to execute the algorithm several times, starting from different initial values, which gives similar, if not identical, results. This paper aims to show the performance of the proposed approach on a data set in which all the patterns had already been classified, so it was chosen to initialize the centroids with the average values of the single groups. This yielded the same results reachable with both random initialization and using the GA, but obviously meant saving a considerable amount of time. In a similar way γ, a vector of user-defined parameters needed by PFCM, was defined; its components are chosen by computing (2):

\gamma_j = \frac{\sum_{i=1}^{N} u_{ij} \, d(x_i, g_j)}{\sum_{i=1}^{N} u_{ij}}     (2)
where g_j is the vector of average values of the single groups, u_{ij} = 1 if the i-th pattern belongs to the j-th group, and u_{ij} = 0 otherwise.

3.2. Distance measure

Different distance metrics are used by different authors. At any rate there is no clear indication about which of them provides the best results; the data themselves have the last say about which distance could provide the best results. In this paper a variable Minkowski metric with a weight matrix was chosen, which makes the algorithm extremely flexible [22] and allows it to adapt to sets of any shape [3]. This is also beneficial for centroid initialization: as the space can be varied at will, the genetic algorithm will be able to find the parameters that will guarantee optimal convergence, even though the initialization is not equally optimal. The metric chosen was:

d(x, y) = \left( \sum_{k=1}^{D} w_k^{p} \, |x_k - y_k|^{p} \right)^{1/p}     (3)
where D is the dimension of the feature space and w_k is the weight assigned to the k-th feature, which is inserted as a parameter to be estimated by the GA. The order p of the metric will also be estimated by the GA.

3.3. Normalization of data

As for the distance metric, the right choice of the normalization method is still an open issue: there is no method that is optimal for every classification problem. For this reason this subsection analyzes the impact of normalization on PFCM classification performance.
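As an illustration of Eq. (3) and of the rescaling normalization discussed in this subsection, here is a minimal sketch in Python (NumPy assumed available); the vectors, weights and exponent are made-up values used only to exercise the two functions.

```python
import numpy as np

def rescale(X):
    """Rescaling normalization: divide every feature (column) by its maximum value."""
    return X / X.max(axis=0)

def minkowski(x, y, w, p):
    """Weighted Minkowski distance of Eq. (3): (sum_k w_k^p |x_k - y_k|^p)^(1/p)."""
    return float(np.sum((w ** p) * np.abs(x - y) ** p) ** (1.0 / p))

X = np.array([[2.0, 3.0], [5.0, 7.0], [4.0, 6.0]])
Xn = rescale(X)                                   # every feature now lies in (0, 1]
print(minkowski(X[0], X[1], w=np.ones(2), p=2))   # 5.0: plain Euclidean with unit weights
print(minkowski(Xn[0], Xn[1], w=np.array([2.0, 1.0]), p=3))
```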
In order to choose the normalization method that best fits the proposed algorithm, different normalization methods were compared in terms of classification performance on different datasets. In the tests the Euclidean metric (i.e. p = 2 in equation (3)) was used as the distance measure, and the initialization method was the one that uses the average values of the predefined groups. The normalization methods compared are: 1) classical normalization, Xnorm = (X − min)/(max − min); 2) rescaling normalization, the division by the maximum value, Xnorm = X / max.

Table 1. Normalization methods comparison: classification error (%) of Euclidean PFCM with m = 2, η = 4, a = 1, b = 1.

Data Set       Without Norm   Classical Norm   Rescaling Norm
               Error %        Error %          Error %
Iris           10.7           11.3             4.0
Wine           30.3           5.0              7.9
Diabetes       35.0           28.9             30.2
New Thyroid    32.8           9.8              8.4
Sonar          45.2           40.9             39.4
Table 1 shows clearly that data normalization is very useful to improve classification quality; in fact the normalization methods considered improve the PFCM performance. However there is no method which dominates the others for all the data sets. Table 1 also shows that the general rescaling normalization method often leads to the best accuracy, so it was decided to use this one, also because it is very simple and intuitive for a human reader to apply. These tests were also useful to set the stop criterion for PFCM. Figure 2 shows the objective function values for varying numbers of iterations. As can be seen, no appreciable improvements in the objective function are found after the 15th iteration, so the maximum number of iterations beyond which PFCM stops, even if the error criterion is not satisfied, was set at 15. From Figure 2 it can also be seen that there are no appreciable differences in the number of iterations required by PFCM to converge with and without the rescaling normalization.
Figure 2. Objective function value (normalized using division by the minimum value) for varying numbers of iterations. The normalization method used was the rescaling one.
3.4. The Strength Pareto Evolutionary Algorithm (SPEA)

Multi-objective optimization is an area in which evolutionary algorithms have achieved great success. Most real-world problems involve several objectives (or criteria) to be optimized simultaneously, but a single, perfect solution seldom exists for a multi-objective problem. Due to the conflicting nature of at least some of the objectives, only compromise solutions may exist, where improvement in some objectives must always be traded off against degradation in other objectives. Such solutions are called Pareto-optimal solutions, and there may be many of them for any given problem. For this work SPEA2 [23] was preferred, since it is very effective in sampling from along the entire Pareto-optimal front and in distributing the solutions generated over the trade-off surface. SPEA2 is an elitist multi-objective evolutionary algorithm which incorporates a fine-grained fitness assignment strategy, a density estimation technique, and an enhanced archive truncation method.
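For intuition, the sketch below (Python) shows how a Pareto set can be read off a candidate population for the two objectives used in this paper, the number of selected features and the number of misclassifications, both to be minimized. The population values are invented, and SPEA2 itself adds strength-based fitness, density estimation and archive truncation on top of this basic dominance test.

```python
def dominates(a, b):
    """a dominates b (minimization): no worse on every objective, better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# Each hypothetical solution is (number of selected features, number of misclassifications).
population = [(11, 33), (10, 33), (9, 35), (10, 40), (6, 42), (6, 45), (5, 43)]
print(sorted(pareto_front(population)))   # [(5, 43), (6, 42), (9, 35), (10, 33)]
```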
[ m | η | a | b | p | w1 … wD ]   (the PFCM parameters m, η, a, b, p followed by the D feature weights)

Figure 3. Structure of a chromosome, whose genes are binary Gray coded with 8-bit precision.
The chromosome (Figure 3) is defined with as many genes as there are free parameters, and each gene is coded according to the set of values it can take. In the case study, the parameters of PFCM and the feature weights are mapped onto a chromosome whose genes are binary Gray coded, with 8-bit precision. Crossover (recombination) and mutation operators produce the offspring. In this specific case, the mutation operator randomly modifies the value of a parameter chosen at random.
The crossover between two configurations exchanges the values of two parameters chosen at random. For each objective to be optimized it is necessary to define the respective measurement function. These objective functions represent cost functions to be minimized. In this paper the number of features and the number of misclassifications (i.e. the number of bad diagnoses) were chosen as objective functions. If all D weights w_r are 0, the objective values are set to infinity. A stop criterion based on convergence makes it possible to stop the iterations when there is no longer any appreciable improvement in the Pareto sets found. The convergence criterion proposed uses the coverage function [24] between two sets to establish when the GA has reached convergence.

3.5. Fuzzy rules generation process

Once the described algorithm has achieved optimal classification, it is possible to obtain the input memberships associated with the subset of features indicated by the algorithm in various ways. PFCM provides a membership for each class, but this will obviously be defined in the D-dimensional space of the subset of features selected and will thus be difficult to interpret. It may be more transparent to generate projections on the planes identified by the single features and the fuzzy affinity axis u, exploiting the information provided by the algorithm. In the case of projections, all the classical functions normally characterizing membership can be used – triangular, trapezoidal, Gaussian and so on – or combinations of them. The choice of the type of membership to adopt mainly depends on the application of the fuzzy system. Obviously, if the goal is to create rules a human can read, they will be easier to understand if they use triangular or trapezoidal rather than sinusoidal functions. For this reason it was chosen to use triangles in order to achieve the maximum transparency of the fuzzy sets. Generation of the fuzzy rule base is therefore a specific operation which should ideally be separated from the classification algorithm and is shown in the representation of the system (Figure 1) as an external unit called a "fuzzy rule builder". The fuzzy rule builder can be modified in order to meet the specific requirements of the problem being addressed. The choice of the shape and number of membership functions can be left to the genetic algorithm, taking all possible combinations and optimizations into consideration. The fuzzy set generation technique is therefore very simple: the vertices of the K triangles representing the membership functions fAr associated with feature r are the points corresponding to the values of the K centroids (C1r, …, Cjr, …, CKr). The j-th triangle has fAr(Cjr) = 1 and fAr(Cj−1r) = fAr(Cj+1r) = 0. For j = 1 and j = K the triangle is assumed to have a vertex at infinity. The choice of the number of rules is not arbitrary: for greater readability the number of rules must be equal to the number of groups to be classified. For this reason in this work the rules are obtained using the PFCM centroids, i.e. a rule is formed by the sets generated starting from its associated group centroid. It is, however, possible to obtain an arbitrary number of rules with the proposed approach: it is sufficient to increase the number of groups. This may be useful when greater accuracy is required.
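A minimal sketch of this triangular construction for a single feature, in Python; the centroid values at the end are hypothetical and serve only to show the shape of the resulting sets (the first and last triangles are open-ended, i.e. have a vertex "at infinity").

```python
def triangular_sets(centroids):
    """Build K triangular membership functions whose vertices are the sorted centroids."""
    c = sorted(centroids)
    K = len(c)
    def make(j):
        def f(x):
            if x == c[j]:
                return 1.0
            if x < c[j]:   # rising edge from the previous centroid (or open on the left)
                return 1.0 if j == 0 else max(0.0, (x - c[j - 1]) / (c[j] - c[j - 1]))
            # falling edge towards the next centroid (or open on the right)
            return 1.0 if j == K - 1 else max(0.0, (c[j + 1] - x) / (c[j + 1] - c[j]))
        return f
    return [make(j) for j in range(K)]

# Hypothetical centroids of one subtest for the three groups (moderate, mild, borderline).
low, medium, high = triangular_sets([2.0, 6.0, 11.0])
print(medium(2.0), medium(6.0), medium(11.0))    # 0.0 1.0 0.0, as required by the text
print(round(low(5.0), 2), round(medium(5.0), 2), round(high(5.0), 2))   # 0.25 0.75 0.0
```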
Once the rules have been obtained, it would be possible to apply optimization techniques to obtain a smaller number of sets, although it is preferable to have as many sets per feature as there are groups, in such a way that by looking at each single feature it is possible to have a direct idea of the classification, a property that helps in reading the system of rules generated. Other methods and more complex function shapes can be used to generate the rule base. In this study only the one that would guarantee maximum simplicity in both the algorithm and the rules was considered, so as to achieve maximum readability. The choice was confirmed by the good numerical results obtained. For the numerical examples a Mamdani-Assilian inference model of the following type was used:

IF x_{i1} is A_{1j} AND … AND x_{iD} is A_{Dj} THEN x_i is T_j with t_{ij}

where t_{ij} is the degree of truth obtained by means of the weighted average (4):

t_{ij} = \frac{\sum_{r=1}^{D} w_r \, f_{Bjr}(x_{ir})}{\sum_{r=1}^{D} w_r}     (4)
where w_r is the weight assigned to the r-th feature by the genetic algorithm and f_{Bjr}(x_{ir}) is the degree of membership of the i-th pattern in the j-th group according to the r-th fuzzy set. The algorithm thus provides not only "crisp" membership in one group rather than another, but also the degree of membership, seen as the probability of belonging to the group. Obviously it is also possible to obtain a "crisp" classification by associating each pattern i with the j-th group for which t_{ij} is highest (a small sketch of this computation is given after the summary below). To summarize, the proposed algorithm is as follows:
1) Normalize the data by dividing by the maximum value of each feature.
2) Initialize the centroids.
3) Execute SPEA-PFCM in order to obtain a Pareto set of optimized fuzzy rule base classifiers.
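The sketch below makes Eq. (4) and the final crisp assignment concrete; the per-feature membership degrees and the two feature weights are hypothetical placeholders, not values taken from the case study.

```python
def degree_of_truth(memberships, weights):
    """Eq. (4): weighted average of the per-feature memberships f_Bjr(x_ir) of one rule."""
    return sum(w * m for w, m in zip(weights, memberships)) / sum(weights)

def classify(rule_memberships, weights, labels):
    """rule_memberships[j][r]: membership of the pattern in rule j according to feature r."""
    degrees = {lab: degree_of_truth(ms, weights)
               for lab, ms in zip(labels, rule_memberships)}
    crisp = max(degrees, key=degrees.get)
    return crisp, degrees

labels = ['moderate', 'mild', 'borderline']
weights = [80, 58]                 # hypothetical GA weights for two selected subtests
memberships = [[0.1, 0.0],         # rule 1: degrees in the LOW sets of the two subtests
               [0.9, 0.6],         # rule 2: degrees in the MEDIUM sets
               [0.0, 0.4]]         # rule 3: degrees in the HIGH sets
print(classify(memberships, weights, labels))   # crisp class is 'mild'
```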
4. A case study

In preliminary tests it was found that better results could be achieved with a mutation probability of 0.1, a crossover probability of 0.8, a population size of 30 chromosomes and 200 as the maximum number of generations. The stop criterion used for the PFCM was the achievement of a maximum variation lower than 0.01, or 15 iterations. The range of the weights is from 0 to 100. To prove the effectiveness of the proposed approach on unseen patterns, the 10-fold cross-validation technique was applied; this practice is also useful to counterbalance the stochastic nature of the GA.

4.1. WAIS-R: method, sample and results

David Wechsler developed a scale for the measurement of intelligence in 1939 on the basis of his experience as a clinical psychologist at the Bellevue Psychiatric Hospital in New York. The test comprises various tasks, grouped into verbal and performance subscales.
Two main scales were developed and adapted for use in several countries and languages: WAIS-R (Wechsler Adult Intelligence Scale – Revised) for adults (over 16 years) and WISC-R (Wechsler Intelligence Scale for Children – Revised) for children (under 17 years). Complete administration of a scale takes about an hour and a half. WAIS-R [20] derived from the need to update both the norms and the contents of the WAIS. It is one of the most efficient tools for diagnosis and research and is currently considered the best tool for measuring intelligence in adults. WAIS-R comprises 11 subtests, 6 included in the Verbal subscale: Information (Info), Digit Span (DgtS), Vocabulary (Voc), Arithmetic (Arit), Comprehension (Com) and Similarities (Simi); and 5 belonging to the Performance subscale: Picture Completion (PicC), Picture Arrangement (PicA), Block Design (Blk), Object Assembly (Obj) and Digit Symbol (Digt). These subtests are the basic features of the data set. The database included 186 mentally retarded adults: 44 cases diagnosed as borderline, 88 mild, and 54 moderate mental retardation. Unlike normal diagnosis, in which the scores obtained are weighted using a standardized adjustment, the raw scores were used, i.e. the scores deriving directly from the administration of each subtest. This was due to the fact that the scale was devised and adjusted for "normal" subjects and thus has a "floor effect" when applied to mentally retarded people. By using the raw scores the algorithm had to re-standardize the scales for the database, considerably improving the degree of accuracy. In psychological assessment the Wechsler scales are only one of many tools used for diagnosis. The results given below should therefore be read bearing this in mind. The reference diagnosis used in the classification for the database was made using other tools, such as the Vineland Adaptation Scale [16] and clinical observations and interviews. Further information can be found in [6].

Table 2. WAIS-R, Pareto set solutions. The results were obtained by 10-fold cross-validation.

N. of       Test set          WAIS-R subtests average weights
Subtests    Classification    Info  Com  DgtS  Simi  Arit  Voc  PicC  PicA  Blk  Obj  Digt
Selected    Error (%)
11          17.74             80    14   34    50    80    58   45    82    10   24   8
10          17.74             80    4    30    50    80    58   45    82    0    24   12
9           18.82             80    4    34    60    80    58   42    82    0    24   0
8           20.43             58    0    30    72    80    30   58    70    0    0    6
7           22.04             58    0    0     72    80    30   58    70    0    0    6
6           22.58             58    0    0     72    80    30   58    70    0    0    0
5           23.12             58    0    0     72    48    0    40    50    0    0    0
4           30.65             0     0    0     72    0     30   58    70    0    0    0
3           32.80             0     0    40    0     48    0    40    0     0    0    0
2           36.56             0     0    0     0     48    0    32    0     0    0    0
Table 2 presents the Pareto set solutions (i.e. the best trade-off solutions between the number of features selected and the classification error) obtained with the proposed approach. Table 2 is very useful for practitioners, who can choose the appropriate set of subtests for the analysis to be made.
If, for example, the aim is to confirm a previous diagnosis, a fast administration is needed, so the choice could be to use the five-subtest solution, saving about one hour in testing time with a quite good assessment precision of over 75%. The fuzzy rules extracted from the database are very simple; they can be summarized as follows:
1. If the subtest raw scores are LOW then the Retardation Level is MODERATE
2. If the subtest raw scores are MEDIUM then the Retardation Level is MILD
3. If the subtest raw scores are HIGH then the Retardation Level is BORDERLINE
To give a simple example, Figure 4 shows the fuzzy sets obtained by the proposed approach with only two subtests. The weights associated with them are, for reasons of simplicity, assumed to be equal; in this way the degree of truth of the rules is the average of the two subtest degrees.
Figure 4. Examples of fuzzy sets for the Vocabulary and Arithmetic subtests of the WAIS-R scale.
Using the fuzzy knowledge base, i.e. the 3 rules and the fuzzy sets shown in Figure 4, a practitioner is able to make a diagnosis easily. If, for example, a patient scores 2 in the Vocabulary subtest and 3 in the Arithmetic subtest, his diagnosis would be a moderate retardation level with a degree of truth of 1.0; if he scores 5 in Vocabulary and 9 in Arithmetic, his retardation level is mild with a degree of truth of 0.9 and borderline with 0.1. Thanks to the simplicity of the rules and the transparency of the fuzzy sets, it is clear that no particular technical knowledge or skill is required to use them to make an assessment. Another relevant characteristic of the proposed approach is that in many misclassified cases the algorithm yields a good membership in the group to which the subject really belongs, thus providing the psychologist with reliable guidance as to how to use other assessment tools. For example, using the sets in Figure 4, a mild-level patient who scores 4 in Vocabulary and 13 in Arithmetic could receive a borderline retardation level diagnosis, but because the degrees of truth for mild and borderline are 0.35 and 0.40 respectively, the uncertainty will lead the practitioner to a more accurate analysis. Table 3 shows a comparison between the approach proposed in this paper and three well-known classification approaches:
• NEFCLASS-J [10], which uses generic fuzzy perceptrons to model Mamdani-type neuro-fuzzy systems.
• C5.0 [17], the newest version of C4.5 [15], which is the most famous algorithm belonging to the class of decision tree methods.
• The Naïve Bayes Classifier [21], which is one of the simplest yet most powerful techniques to construct predictive models from labeled training sets.
None of these algorithms has the possibility to find a Pareto set automatically. So the results shown in Table 3 were obtained following two steps: first a feature selection method was used to rank the features, and then the classification approaches were applied
on pruned data sets with a varying number of features selected on the basis of their rank. The ranking was made using an exhaustive wrapper approach and the 10-fold cross-validation technique, i.e. the data set was divided into 10 folds and then, with an exhaustive search for every fold, a feature subset was selected using the precision of the naïve Bayes classification algorithm to evaluate each candidate subset. Finally, each feature was ranked on the basis of the number of times that it was selected: the lower the number, the lower the rank. Naïve Bayes and C5.0 classification performance was obtained by applying them to subsets with varying numbers of features, which were selected from those with a higher rank. In the case of NEFCLASS-J, on the other hand, its option to create a pruned system was used. Starting from the complete data set comprising all 11 subtests, the pruned fuzzy system was created with 56 rules, 10 features and 35 misclassifications (18.82%) using the 10-fold cross-validation technique. To obtain the other results shown in Table 3, a trial-and-error iterative approach was used:
1. A feature is erased according to its rank.
2. NEFCLASS-J creates a pruned classifier.
3. Steps 1 and 2 are repeated until the fuzzy classifier created has the desired number of features.

Table 3. Classification performance comparisons for varying numbers of subtests. Classification error (%) on the test set (10-fold cross-validation).

N. of Subtests   Proposed approach   Naïve Bayes   C5.0    NEFCLASS-J
11               17.74               20.43         25.81   21.51
10               17.74               18.82         29.03   18.82
9                18.82               20.43         29.57   20.43
8                20.43               22.04         28.60   21.51
7                22.04               25.81         31.80   22.58
6                22.58               29.03         31.80   25.81
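For illustration, a rough sketch of the wrapper-style ranking described above, assuming scikit-learn and NumPy are available; it scores candidate subsets with plain naïve Bayes accuracy rather than precision and is written for clarity, not speed (the exhaustive search over all subsets of 11 features is feasible but slow).

```python
from itertools import combinations
import numpy as np
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB

def wrapper_rank(X, y, n_splits=10):
    """Rank features by how often they appear in the best subset of each fold."""
    n_feat = X.shape[1]
    counts = np.zeros(n_feat)
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        best_score, best_subset = -1.0, ()
        for size in range(1, n_feat + 1):
            for subset in combinations(range(n_feat), size):
                clf = GaussianNB().fit(X[np.ix_(train, subset)], y[train])
                score = clf.score(X[np.ix_(test, subset)], y[test])
                if score > best_score:
                    best_score, best_subset = score, subset
        counts[list(best_subset)] += 1        # a feature's rank = its selection count
    return np.argsort(-counts)                # feature indices, best ranked first
```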
5. Conclusion

In this paper, inspired by a practical need, a new computational intelligence technique was presented for mining a fuzzy rule-based diagnostic system from examples, and its potential was shown by a real-world application of the technique to automatic mental retardation level recognition. The approach proved to be a very useful tool for knowledge extraction in a readable and transparent manner and for feature selection on the basis of the desired classification accuracy. The main characteristic of the proposed technique is that it gives a Pareto set of best trade-off solutions, among which practitioners can choose the one that is suitable for a specific task.
Thanks to the low number of parameters to be set, this technique considerably simplifies the work of researchers, who do not need particular skills or knowledge to use it, and thus allows it to be applied in everyday situations occurring in psychological rehabilitation and in general medical practice.
References
[1] James C. Bezdek, James M. Keller, Raghu Krishnapuram, Ludmila I. Kuncheva, and Nikhil R. Pal, Will the Real Iris Data Please Stand Up?, IEEE Transactions on Fuzzy Systems, 7(3), (1999).
[2] James C. Bezdek, Jim Keller, Raghu Krishnapuram, and Nikhil R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of Fuzzy Sets Series, ed. D. Dubois and H. Prade: Springer, (1999).
[3] L. Bobrowski and J.C. Bezdek, C-Means Clustering with the L1 and L∞ Norms, IEEE Transactions on Systems, Man and Cybernetics, 21(3): pp. 545-554, (1991).
[4] Giovanna Castellano, Anna M. Fanelli, and Corrado Mencar, A Fuzzy Clustering Approach for Mining Diagnostic Rules, in IEEE International Conference on Systems, Man and Cybernetics, pp. 2007-2012, (2003).
[5] Maysam F. Abbod, Diedrich G. von Keyserlingk, Derek A. Linkens, and Mahdi Mahfouf, Survey of Utilisation of Fuzzy Technology in Medicine and Healthcare, Fuzzy Sets and Systems 120: pp. 331-349, (2001).
[6] Santo Di Nuovo and Serafino Buono, Strumenti Psicodiagnostici Per Il Ritardo Mentale, Linea Test, Milano: Franco Angeli, (2002).
[7] Lawrence O. Hall, Ibrahim B. Ozyurt, and James C. Bezdek, Clustering with a Genetically Optimized Approach, IEEE Transactions on Evolutionary Computation, 3(2): pp. 103-112, (1999).
[8] R.J. Hathaway and J.C. Bezdek, Fuzzy C-Means Clustering of Incomplete Data, IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 31(5), (2001).
[9] J.N. Mordeson and D.S. Malik, Fuzzy Mathematics in Medicine, New York: Physica-Verlag, (2000).
[10] D. Nauck, Fuzzy Data Analysis with Nefclass, International Journal of Approximate Reasoning, 32: pp. 103-130, (2003).
[11] Detlef Nauck and Rudolf Kruse, A Neuro-Fuzzy Method to Learn Fuzzy Classification Rules from Data, Fuzzy Sets and Systems, 89(3): pp. 277-288, (1997).
[12] Detlef Nauck and Rudolf Kruse, Obtaining Interpretable Fuzzy Classification Rules from Medical Data, Artificial Intelligence in Medicine, 16: pp. 149-169, (1999).
[13] Nikhil R. Pal, Kuhu Pal, James M. Keller, and James C. Bezdek, A Possibilistic Fuzzy C-Means Clustering Algorithm, IEEE Transactions on Fuzzy Systems, 13(4): pp. 517-530, (2005).
[14] Nguyen Hoang Phuong and Vladik Kreinovich, Fuzzy Logic and Its Applications in Medicine, International Journal of Medical Informatics, 62: pp. 165-173, (2001).
[15] J.R. Quinlan, C4.5: Programs for Machine Learning: Morgan Kaufmann, (1993).
[16] Sara S. Sparrow, David A. Balla, and Domenic V. Cicchetti, The Vineland Adaptive Behavior Scales, Circle Pines, MN: American Guidance Service, (1984).
[17] RuleQuest Research Data Mining Tools, http://www.rulequest.com.
[18] Athanasios Tsakonas, Georgios Dounias, Jan Jantzen, Hubertus Axer, Beth Bjerregaard, and Diedrich Graf von Keyserlingk, Evolving Rule-Based Systems in Two Medical Domains Using Genetic Programming, Artificial Intelligence in Medicine, 32(3): pp. 195-216, (2004).
[19] David Wechsler, WAIS-R Wechsler Adult Intelligence Scale Revised – Manual, U.S.A.: The Psychological Corporation, (1981).
[20] David Wechsler, WISC-R Wechsler Intelligence Scale for Children Revised – Manual, San Antonio, TX: The Psychological Corporation, (1974).
[21] Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, San Francisco: Morgan Kaufmann, (2005).
[22] Bo Yuan, George J. Klir, and John F. Swan-Stone, Evolutionary Fuzzy C-Means Clustering Algorithm, in FUZZ-IEEE '95, pp. 2221-2226, (1995).
[23] Eckart Zitzler, Marco Laumanns, and Lothar Thiele, SPEA2: Improving the Performance of the Strength Pareto Evolutionary Algorithm, in EUROGEN 2001. Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, Athens, Greece, pp. 95-100, (2001).
[24] Eckart Zitzler and Lothar Thiele, Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach, IEEE Transactions on Evolutionary Computation, 4(3): pp. 257-271, (1999).
Tuning the Feature Space for Content-Based Music Retrieval

Aleksandar Kovačević, Branko Milosavljević, Zora Konjović
Faculty of Engineering, University of Novi Sad
Abstract. This paper presents a tunable content-based music retrieval (CBMR) system suitable for retrieval of music audio clips. Audio clips are represented as extracted feature vectors. The CBMR system is expert-tunable by altering the feature space. The feature space is tuned according to expert-specified similarity criteria expressed in terms of clusters of similar audio clips. The tuning process utilizes our genetic algorithm that optimizes cluster compactness. The R-tree index for efficient retrieval of audio clips is based on the clustering of feature vectors. For each cluster a minimal bounding rectangle (MBR) is formed, thus providing objects for indexing. Inserting new nodes into the R-tree is conducted efficiently because of the chosen Quadratic Split algorithm. Our CBMR system implements the point query and the n-nearest-neighbors query with O(log n) time complexity. The paper includes experimental results measuring retrieval performance in terms of precision and recall. A significant improvement in retrieval performance over the untuned feature space is reported.

Keywords. music retrieval, genetic algorithms, spatial access methods
Introduction

The field of information retrieval (IR) deals with problems of finding and accessing information. The concept of a document is commonly used as an information container. In a multimedia environment, documents may contain different media types such as text, images, audio and video clips. Content-based retrieval (CBR) in multimedia databases comprises techniques for information retrieval that are based on the document content. Queries in CBR are usually modelled to belong to the same media type as the documents themselves. This paper presents a novel music information retrieval (MIR) system with CBR capabilities. The system operates by extracting feature vectors from audio clips and organizing them into clusters. The R-tree [7], an effective structure for indexing multidimensional rectangles, is used for indexing the spatial coordinates of the clusters. For each cluster a minimal bounding rectangle (MBR) is formed, thus providing objects for indexing with the R-tree. Similarity between two audio clips is determined by calculating the modified Euclidean distance of the corresponding feature vectors.

1 Correspondence to: Aleksandar Kovačević, Faculty of Engineering, Fruškogorska 11, 21000 Novi Sad. Tel.: +381 21 485-2422; Fax: +381 21 350-757; E-mail: [email protected].
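As an illustration only (not the authors' implementation), the two ingredients just mentioned, a per-feature weighted ("modified") Euclidean distance and the minimal bounding rectangle of a cluster of feature vectors, can be sketched in Python with NumPy; all values below are made up.

```python
import numpy as np

def weighted_euclidean(x, y, coeffs):
    """Modified Euclidean distance with tunable per-feature coefficients."""
    return float(np.sqrt(np.sum(coeffs * (x - y) ** 2)))

def mbr(cluster):
    """Minimal bounding rectangle of a cluster: lower-left and upper-right corners."""
    pts = np.asarray(cluster, dtype=float)
    return pts.min(axis=0), pts.max(axis=0)

cluster = np.array([[0.20, 0.50, 0.10],
                    [0.30, 0.40, 0.20],
                    [0.25, 0.45, 0.15]])
low, high = mbr(cluster)            # the corners that would be inserted into the R-tree
print(low, high)
print(weighted_euclidean(cluster[0], cluster[1], coeffs=np.array([1.0, 2.0, 0.5])))
```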
of the feature space are tuned by adjusting the coefficients in the similarity metric using genetic algorithms (GA). Our MIR system has been developed as an add-on to XMIRS, an extensible multimedia document retrieval system. XMIRS, presented in detail in [17,16], provides facilities for embedding multiple add-ons with retrieval capabilities for different media types. The rest of the paper is structured as follows. Section 1 reviews the related work. Section 2 outlines our approach, and Section 3 presents our variant of the R-tree used for feature vector indexing. The process of tuning the feature space is described in Section 4. Section 5 presents our experimental results in using the system. Section 6 concludes the paper and outlines further research directions.
1. Related Work
Audio information retrieval (AIR) has become an increasingly important field of research in recent years. In AIR systems of earlier date, the query is based on text-based metadata. Content-based audio retrieval (CBAR) allows users to query by audio content instead of metadata. Much work has been done on the development of CBAR. Query by humming or singing [5,9] are common approaches for retrieval from acoustic input. The queries were melodies hummed or sung by the user, and were transcribed into symbolic MIDI format. Query by tapping is another query method that takes the beat information for retrieval [11]. Recently, several researchers have explored polyphonic content-based audio retrieval [5,22,24]. The goal of CBAR systems is to retrieve the audio clips that are similar to the query in melody, rhythm, pitch, etc. Recent work done in [13] presents a content-based music retrieval system that allows users to query music by melody style. An extensive survey of music information retrieval systems is given in [26]. CBAR systems differ from each other mostly by the features that are extracted from audio clips and the similarity measure that is used. A survey on feature selection techniques is presented in [21]. In our system we use a set of features taken from [19]. Most of the published results in CBAR use a similar set of features directly or as a basis for a further feature selection process. Classification, as a means of retrieval, is commonly used in CBAR systems. The “Muscle Fish” [18] system utilizes a normalized Euclidean (Mahalanobis) distance and the nearest neighbor (NN) rule to classify the query sound into one of the sound classes in the database. In [15] the classification is performed by a neural network. Foote [4] used a tree-structured vector quantizer to partition the feature space into a discrete number of regions. In [6] a novel approach using support vector machines is used for classification. Many of the CBAR systems incorporate some kind of indexing structure to enhance their search engines. Inverted files are used in [14], a tree of transition matrices in [8], Vantage objects in [25] and clustering in [2]. To the authors’ knowledge, no system uses a combination of clustering and indexing. In our novel approach an R-tree is used for indexing the spatial coordinates of clusters after the classification process. This technique decreases the number of similarity (distance) calculations when processing a query and when indexing a new audio clip. Although applied in other areas, structures for multidimensional indexing like R-trees have not been used in CBAR systems in this way to the best of our knowledge. In a recent paper [12], the R*-tree is used to index the MBRs comprising the groups
of features, so that each feature vector is represented by a MBR, but no feature vector clustering is used. Feature space tuning is also a part of some multimedia retrieval systems. The input gained from user’s interaction with the system is used for either (1) determining importance of particular features for that user, implemented by assigning weights to features (the M.A.R.S system [20]), or tuning the feature space so the results of the next query from the same user will be more appropriate, implemented by adjusting the weighted distance metric (the MindReader system [23]). The aforementioned systems perform feature space tuning online. 2. Our Approach In our approach, a subset of the collection of audio clips is partitioned into clusters by an expert. Partitioning is performed according to given criteria, which may be defined by the future user of the system or by the expert himself. Clusters obtained this way represent the training set for genetic algorithms used to tune the feature space. The result of the tuning process is an optimized similarity metric. Complete-link clustering using the optimized metric is then applied to the whole collection of audio clips. The feature vectors of audio clips are then indexed by an R-tree structure. Subsequent queries, being audio clips as well, are processed using the R-tree. Components of the feature vector and the algorithms for calculating their values are adopted from [19]. However, our CBMR system does not essentially depend on these features. Since the feature space is optimized later, the feature set may be altered. Our system for content-based audio retrieval operates in the following manner. The user provides an audio clip as a query, and the system returns a list of audio clips from the collection ordered by similarity with the given query. Indexing of audio clips comprises several steps: 1. A predefined set of features is extracted from the audio clip. A feature vector is composed of these values. 2. Calculations from step 1 are repeated for all clips in the given collection. 3. The collection of feature vectors is clustered. The clustering method used is agglomerative complete-link clustering [10]. The system does not directly depend on the selected clustering technique, as long as it produces disjunct clusters. 4. For each cluster a MBR is formed. 5. Rectangles formed in step 4 are inserted into the R-tree. Queries are assumed to be audio clips represented internally by their feature vectors. For the given feature vector the R-tree returns the rectangle that contains it. That rectangle represents the MBR of the cluster containing the query vector. If the query vector is not in the database, the R-tree returns the MBR which is the closest to the query. The list of nearest neighbors is then formed using the retrieved MBR. The R-tree algorithms that are used in our system are reviewed in the next section. 3. The R-tree Variant Our CBMR system uses the original R-tree designed by Guttman [7], with some modifications due to the nature of the system. Most of the data structures derived from the
original R-tree (e.g., R*-tree, TV-tree, X-tree) focus on the handling of overlapping rectangles and perform additional calculations in their operation. These calculations represent an unnecessary computing overhead in our case, because the overlap of MBRs is insignificant due to the fact that MBRs are formed around disjoint clusters. The Quadratic Split variant [7] of the node-splitting algorithm was specifically chosen because it improves the efficiency of inserting new nodes into the R-tree. The time complexity of this algorithm is O(M²), where M is the number of elements of the node in which the split occurred. The node with M + 1 elements is split in the following way: two of the elements are chosen as the representatives of the newly made nodes. The pair of elements that would leave the most dead space if both elements were in the same node is chosen: for every pair of rectangles the difference between the sum of their areas and the area of their covering rectangle is calculated, and the pair with the greatest difference is chosen. Rectangle P with area P_P is the covering rectangle of the rectangles S_i, where i ∈ {1, 2, . . . , n}, if the following conditions hold: 1. The area of each rectangle S_i is a subset of the area of P. 2. P is the rectangle with the smallest area such that the first condition is met.
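To make the dead-space criterion concrete, the following minimal Python sketch (ours, not the authors' implementation; the MBR representation and function names are illustrative assumptions) computes the covering rectangle of two MBRs and the waste measure d used when picking seeds:

```python
from typing import List, Tuple

MBR = List[Tuple[float, float]]  # one (min, max) interval per dimension

def area(r: MBR) -> float:
    """Volume of a hyper-rectangle ('area' in the paper's 2-D terminology)."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result

def covering_rectangle(a: MBR, b: MBR) -> MBR:
    """Smallest rectangle containing both a and b (conditions 1 and 2 above)."""
    return [(min(alo, blo), max(ahi, bhi)) for (alo, ahi), (blo, bhi) in zip(a, b)]

def dead_space(a: MBR, b: MBR) -> float:
    """d = area(J) - area(a) - area(b), the waste measure used by Pick Seeds."""
    return area(covering_rectangle(a, b)) - area(a) - area(b)
```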
The remaining M − 1 elements are distributed between the two new nodes as follows. For an element that is to be inserted, it is calculated for each of the two nodes how much the covering rectangle of the rectangles already in the node would have to be expanded to also cover the inserted rectangle, and the node with the smaller expansion is chosen. The following is a definition of the Quadratic Split algorithm.
Quadratic Split. Divide a set of M + 1 elements into two groups:
QS1 [Pick first element for each group] Apply algorithm Pick Seeds to choose two elements to be the first elements of the two groups. Assign each to a group.
QS2 [Check if done] If all elements have been assigned, stop. If one group has so few elements that all the rest must be assigned to it in order for it to have the minimum number m, assign them and stop.
QS3 [Select element to assign] Invoke algorithm Pick Next to choose the next element to assign. Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the element to the group with the smaller area, then to the one with fewer elements, then to either. Repeat from QS2.
Pick Seeds. Select two elements to be the first elements of the groups.
PS1 [Calculate inefficiency of grouping elements together] For each pair of elements E1 and E2, compose a rectangle J including E1.I and E2.I (MBRs of feature vectors in leaf nodes; MBRs of contained MBRs in other nodes). Calculate d = area(J) − area(E1.I) − area(E2.I).
PS2 [Choose the most wasteful pair] Choose the pair with the largest d.
Pick Next. Select one remaining element for classification into a group.
PN1 [Determine cost of putting each element in each group] For each element E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include E.I (an MBR of a feature vector in leaf nodes; an MBR of contained MBRs in other nodes). Calculate d2 similarly for Group 2.
PN2 [Find entry with the greatest preference for one group] Choose an entry with the maximum difference between d1 and d2.
that are located in the leaves of the R-tree. All of the nodes in the R-tree can be traversed both preorder and postorder. The significant types of queries supported by our R-tree are the following: 1. Point query: The rectangle that contains the query point (in this case a feature vector) is returned. In contrast to the common R-tree, which may contain overlapping MBRs, thus returning multiple rectangles for this type of query, our variant of the R-tree contains only non-overlapping leaf MBRs and therefore always returns a single MBR. 2. Nearest neighbor: For a given query point, the list of all the rectangles (not necessarily in the leaves) that were visited during the search for the nearest neighbor is returned. The nearest neighbor is the last node in the list that is located in a leaf (contains the pointer to a cluster MBR). The nearest neighbor query is implemented using the algorithm described in [3].
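As a sketch of how a point query can be answered over such non-overlapping leaf MBRs, the following illustrative Python fragment (ours; the node structure and names are assumptions, and it omits the nearest-neighbour search of [3]) descends from the root and returns the single leaf whose MBR contains the query vector:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MBR = List[Tuple[float, float]]

@dataclass
class RTreeNode:
    mbr: MBR
    children: List["RTreeNode"] = field(default_factory=list)  # empty list => leaf
    cluster_id: Optional[int] = None                            # set on leaves only

def contains(r: MBR, point: List[float]) -> bool:
    return all(lo <= x <= hi for (lo, hi), x in zip(r, point))

def point_query(node: RTreeNode, point: List[float]) -> Optional[RTreeNode]:
    """Descend only into children whose MBR contains the point; because leaf MBRs
    do not overlap in this variant, at most one leaf can be returned."""
    if not contains(node.mbr, point):
        return None
    if not node.children:
        return node
    for child in node.children:
        hit = point_query(child, point)
        if hit is not None:
            return hit
    return None
```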
4. Tuning the Feature Space
Our CBMR system uses a weighted Euclidean distance metric as a vector similarity measure. Since not all features have the same influence in the vector similarity measure, each feature is assigned a weight. We define the weighted Euclidean distance for vectors x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) as:
d(x, y) = \sum_{i=1}^{n} c_i \, |x_i - y_i|
where ci ∈ R+ ∪ {0}. Tuning the feature space is a process of calculating the values ci in such a way that distance between vectors in each of the clusters is minimized (i.e., cluster compactness is maximized). The tuning result is a vector (c1 , c2 , . . . , cn ) containing optimal values for the distance metric coefficients. We tune the feature space using a genetic algorithm described in the rest of this section. 4.1. The Tuning Process The first step in solving problems with GA is forming the appropriate representation of potential solutions (a genotype in the GA context). In this case, the real value vector representation is used. Each potential solution is represented by a vector (c1 , c2 , . . . , cn ), ci ∈ R+ ∪ {0}. Components of the vector are called genes. The next step is determining the objective function. An objective function is a function which is to be minimized. In our case, the objective function is a measure of cluster compactness. Let K be a collection of clusters K = {K1 , K2 , . . . , KN }, where N is the number of clusters in K. The cluster Ki has the form of Ki = {v1 , . . . , vpi }, consisting of feature vectors, where pi is the number of vectors in the cluster Ki . For each cluster the following value is calculated:
fitt_{K_i} = \sum_{i=1}^{p_i} \sum_{j=1}^{p_i} d(v_i, v_j)
The value fitt_{K_i} is the sum of all distances between vectors in the cluster K_i. The objective function for our GA is defined as:
fitt = \sum_{i=1}^{N} fitt_{K_i}
The argument of the objective function is a vector of coefficients ci from the weighted Euclidean distance. From the collection (a population in the GA context) of (c1, c2, . . . , cn) vectors, the global minimum is determined by the process of artificial evolution. This process is realized using the chosen genetic operators described in the following subsections.
4.2. Selection
We chose the tournament selection operator, which functions as follows: two members (individuals) are chosen at random from the population; the member with a better (in this case lesser) value of the objective function is selected for further genetic processing.
4.3. Crossover
After the selection operator is applied to the new population, special operators, called crossover and mutation, are applied with a certain probability. For applying the crossover operator, the status of each population member is determined. Every population member is assigned a status as a survivor or non-survivor. The number of population members with survivor status is approximately equal to population size * (1 - probability of crossover). The number of non-surviving members is approximately equal to population size * probability of crossover. The non-surviving members in a population are then replaced by applying crossover operators to randomly selected surviving members. Several crossover operators exist. The operator used in this case is uniform crossover. With uniform crossover, two surviving population members (parents) are randomly selected. The genes between the two parents are exchanged to produce two children. The probability of exchanging any given gene in a parent is 0.5. Thus, for every gene in a parent, a pseudo-random number is generated. If the value of the pseudo-random number is greater than 0.5, then the genes are exchanged, otherwise they are not.
4.4. Mutation
A mutation operator randomly picks a gene in a surviving population member (with a probability equal to the probability of mutation) and replaces it with a new gene (in this case, a new real random number). Using the aforementioned GA tuning process we have calculated the optimum vector of coefficients for the weighted Euclidean distance. The optimized metric, calculated off-line, is further used in all vector similarity calculations in our CBMR system – feature vector indexing (clustering and R-tree operations) and query processing.
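A minimal sketch of the compactness objective and the tournament-selection step described above is given below (ours, not the authors' implementation; the data structures and helper names are illustrative assumptions):

```python
import random
from typing import List

Vector = List[float]

def weighted_distance(x: Vector, y: Vector, c: Vector) -> float:
    """d(x, y) = sum_i c_i * |x_i - y_i|, the weighted distance from Section 4."""
    return sum(ci * abs(xi - yi) for ci, xi, yi in zip(c, x, y))

def cluster_compactness(cluster: List[Vector], c: Vector) -> float:
    """fitt_K: sum of the distances between all pairs of vectors in one cluster."""
    return sum(weighted_distance(u, v, c) for u in cluster for v in cluster)

def objective(clusters: List[List[Vector]], c: Vector) -> float:
    """fitt: total compactness over all clusters; the GA minimizes this value."""
    return sum(cluster_compactness(k, c) for k in clusters)

def tournament_select(population: List[Vector], clusters: List[List[Vector]]) -> Vector:
    """Tournament selection: draw two coefficient vectors and keep the better (lower) one."""
    a, b = random.sample(population, 2)
    return a if objective(clusters, a) <= objective(clusters, b) else b
```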
5. Experimental Results
We have conducted extensive experiments to measure retrieval performance in terms of precision and recall for both the tuned and the untuned system. This section presents the results of our experiments. The experiments were conducted on a data set represented by a large collection of music clips of various genres. The clips are in MP3 format with a bit rate of 128 kbit/s. The collection comprises 10500 audio clips ripped from the original CDs. Music genres include alternative, blues, country, classical music, electronic, folk, gospel, jazz, latin, new age, pop, reggae, hip-hop, R’n’B, soul, rock, and ethno music from Africa, Asia, Europe and the Middle East. A small subset (1500 clips) of the data set was chosen and then partitioned by experts (music editors from the Serbian national radio station RTS) into clusters based on the predefined criteria (the music genre). The clusters were then used to tune the similarity metric. The complete-link clustering algorithm using the tuned metric was then applied to the complete unpartitioned data set. We use the precision and recall measures as defined in [1]. Figure 1 presents the average recall versus precision for the tuned and the untuned vector space. To evaluate the retrieval performance we average the precision figures as follows:
\bar{P}(r) = \frac{\sum_{i=1}^{N_q} P_i(r)}{N_q}
where P¯ (r) is the precision at recall level r, Nq is the number of queries used, and Pi (r) is the precision at recall level r for the i-th query. The values for recall range from 0% to 100% in steps of 10%, and the number of queries is 50. The set of relevant documents used for performance measurement is formed by the same expert that has defined the initial set of clusters. The source of potential differences between the relevant set and the answer set in this case is the location of the query vector inside the corresponding cluster (i.e., its MBR) and the query radius depending on the number of requested nearest neighbors in the result. Figure 2 illustrates this situation. If the following conditions are met: • the query point is near to the edge of the cluster, • query radius r is greater than the distance between the query and the edge, and • there is a neighboring cluster with some of its vectors enclosed by the result circle of radius r, there is a possibility that the set of relevant clips (as retrieved by the brute-force method) differs from the set retrieved by the R-tree. The R-tree will return only vectors from the cluster containing the query vector. The results of experiments presented in Figure 1 prove that the feature space tuning has a significant impact on retrieval performance. The use of the tuned feature space has provided better precision at same recall levels while preserving the same time complexity for query evaluation. Moreover, for the given recall level of 100% precision is always greater than 0, which implies that all clips from the relevant set have been retrieved. For recall levels above 40% retrieval precision in the tuned feature space is significantly improved.
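For illustration, a tiny Python sketch of the averaging defined above (ours; the data layout and function name are assumptions, and the numbers are made up):

```python
from typing import Dict, List

def average_precision(per_query: List[Dict[int, float]], recall_levels: List[int]) -> Dict[int, float]:
    """Average P_i(r) over all N_q queries at each standard recall level r.
    Each entry of `per_query` maps a recall level to that query's precision."""
    n_q = len(per_query)
    return {r: sum(q[r] for q in per_query) / n_q for r in recall_levels}

# Two made-up queries evaluated at three recall levels:
queries = [{0: 1.0, 50: 0.6, 100: 0.2}, {0: 1.0, 50: 0.8, 100: 0.4}]
print(average_precision(queries, recall_levels=[0, 50, 100]))  # {0: 1.0, 50: 0.7, 100: 0.3}
```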
Figure 1. Average recall versus precision for the tuned and the untuned system
Figure 2. Potential difference between the relevant set and the answer set
6. Conclusions This paper presented a tunable CBMR system suitable for retrieval of music audio clips. Audio clips are represented as feature vectors. Prior to the creation of index data structures, the feature space is tuned according to the expert-specified similarity criteria expressed in terms of clusters with similar audio clips. The tuning process utilizes our genetic algorithm that optimizes cluster compactness by assigning proper weights to components of the Euclidean distance metric. The R-tree index for efficient retrieval of audio clips is based on the clustering of feature vectors. Since hierarchical clustering facilitates the adjustment of the number of resulting clusters more easily than other clustering methods, we opted for this clustering technique. Our R-tree index is optimized for the following types of queries, both calculated with the time complexity of O(log n):
• Point query: For the given feature vector returns a list of all rectangles that contain the query point. • Nearest neighbors query: For the given feature vector returns the rectangle nearest to the query point. For the efficient query processing we have chosen the classical R-tree indexing data structure. A specifically chosen algorithm Quadratic Split has improved the efficiency of inserting new nodes into the R-tree. Results of experiments conducted on our system prove that the feature space tuning has a significant impact on retrieval performance. The use of the tuned feature space has provided better precision at same recall levels while preserving the same time complexity for query evaluation. Our feature space tuning process provides a possibility of creating multiple indexes optimized for different retrieval criteria. This way, a more versatile CBMR system supporting multiple retrieval criteria with the same efficiency may be built. Since a retrieval result in our CBMR system represents a list of hits ranked by similarity to the query, our system may easily be integrated into a more comprehensive multimedia retrieval system capable of handling different types of media, such as XMIRS [16,17]. References [1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press / AddisonWesley, 1999. [2] E. Brochu and N. de Freitas. ’name that song!’: A probabilistic approach to querying on music and text. In Neural Information Processing Systems: Natural and Synthetic, 2002. [3] K. L. Cheung and A. W.-C. Fu. Enhanced nearest neighbour search on the r-tree. SIGMOD Record, 27(3):16–21, 1998. [4] J. F. et al. Content-based retrieval of music and audio. In Proc. SPIE Multimedia Storage Archiving Systems II, volume 3229, page 138147, 1997. [5] A. Ghias, J. Logan, D. Chamberlin, and B. Smith. Query by humming: Musical information retrieval in an audio database. In ACM International Multimedia Conference, 1995. [6] G. Guo and S. Z. Li. Content-based audio classification and retrieval by support vector machines. IEEE Trans. Neural Networks, 14(1), 2003. [7] A. Guttman. R-trees: A dynamic index structure for spatial searching. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, pages 47–57, 1984. [8] H. Hoos, K. Renz, and M. Gorg. Guido/mir - an experimental musical information retrieval system based on guido music notation. In Int. Symposium on Music Information Retrieval (ISMIR), page 4150, 2001. [9] N. Hu and R. Dannenberg. A comparison of melodic database retrieval techniques using sung queries. In ACM Joint Conference on Digital Libraries, 2002. [10] A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 31(3):264–323, 1999. [11] J. Jang, H. Lee, and C. Yeh. Query by tapping: A new paradigm for content-based music retrieval from acoustic input. In IEEE Pacific-Rim Conference on Multimedia, 2001. [12] I. Karydis, A. Nanopoulos, A. N. Papadopoulos, and Y. Manolopoulos. Audio indexing for efficient music information retrieval. In Proceedings of the 11th IEEE International Multimedia Modelling Conference (MMM’05), 2005. [13] F.-F. Kuo and M.-K. Shan. Looking for new, not known music only: music retrieval by melody style.
[14] F. Kurth, A. Ribbrock, and M. Clausen. Identification of highly distorted audio material for querying large scale data bases. In 112th Convention of the Audio Engineering Society, 2002. [15] Z. Liu, J. Huang, Y. Wang, and T. Chen. Audio feature extraction and analysis for scene classification. In IEEE Signal Processing Soc. Workshop Multimedia Signal Processing, 1997. [16] B. Milosavljevi´c. Models for extensible multimedia document retrieval. In IEEE Multimedia Software Engineering (MSE2004), pages 218–221, 2004. [17] B. Milosavljevi´c and Z. Konjovi´c. Design of an xml-based extensible multimedia information retrieval system. In IEEE Multimedia Software Engineering (MSE2002), pages 114–121, 2002. [18] www.musclefish.com. last visited 08/22/2005. [19] www.musicbrainz.org. last visited 08/27/2005. [20] M. Ortega, Y. Rui, K. Chakrabarti, K. Porkaew, S. Mehrotra, and T. S. Huang. Supporting ranked boolean similarity queries in mars. IEEE Transactions on Knowledge and Data Engineering, 10(6):905–925, 1998. [21] J. Pickens. A survey of feature selection techniques for music information retrieval, 2001. Technical report, Center for Intelligent Information Retrieval, Departament of Computer Science, University of Massachussetts, 2001. [22] J. Pickens and T. Crawford. Harmonic models for polyphonic music retrieval. In ACM International Conference on Information and Knowledge Management, 2002. [23] Y. Rui and T. S. Huang. A novel relevance feedback technique in image retrieval. In Seventh ACM international conference on Multimedia (Part 2), pages 67–70, 1999. [24] S. Shalev-Shwartz, S. Dubnov, N. Friedman, and Y. Singer. Robust temporal and spectral modeling for query by melody. In ACM SIGIR Conference on Research and Development in Information Retrieval, 2002. [25] R. Typke, P. Giannopoulos, R. Veltkamp, F. Wiering, and R. van Oostrum. Using transportation distances for measuring melodic similarity. In Int. Symposium on Music Information Retrieval (ISMIR), pages 107–114, 2003. [26] R. Typke, F. Wiering, and R. C. Veltkamp. A survey of music information retrieval systems. Retrieved 04/12/2004 from http://mirsystems.info.
Personalizing Trust in Online Auctions
John O’Donovan a,1, Vesile Evrim b, Barry Smyth a, Dennis McLeod b, Paddy Nixon a
a School of Computer Science and Informatics, University College Dublin, Ireland
b Semantic Information Research Laboratory, Viterbi School of Engineering, University of Southern California, Los Angeles
Abstract. The amount of business taking place in online marketplaces such as eBay is growing rapidly. At the end of 2005 eBay Inc. reported annual growth rates of 42.5% [3] and in February 2006 received 3 million user feedback comments per day [1]. Now we are faced with the task of using the limited information provided on auction sites to transact with complete strangers with whom we will most likely interact only once. People will naturally be comfortable with old-fashioned “corner store” business practice [14], based on a person-to-person trust which is lacking in large-scale electronic marketplaces such as eBay and Amazon.com. We analyse reasons why the current feedback scores on eBay and most other online auctions are too positive. We introduce AuctionRules, a trust-mining algorithm which captures subtle indications of negativity from user comments in cases where users have rated a sale as positive but still voiced some grievance in their feedback. We explain how these new trust values can be propagated using a graph representation of the eBay marketplace to provide personalized trust values for both parties in a potential transaction. Our experimental results show that AuctionRules beats seven benchmark algorithms by up to 21%, achieving up to 97.5% accuracy, with a false negative rate of 0% in comment classification tests, compared with up to 8.5% from the other algorithms tested. Keywords. Trust, Transitivity, Online Auctions, Personalization
1. Introduction
The majority of feedback comments on online auction transactions are positive [14]. According to our analysis eBay is over 99% biased towards positive comments. Currently, eBay compiles a “generic” trust value from these comments, meaning that any seller’s trust value that gets presented to a buyer is (a) presented to every other person who looks up that user, and (b) compiled from a system which is 99% positively biased. This yields unnaturally positive trust scores in the system. Furthermore, eBay actually removes some negative comments from the system. As of September 2005, negative comments from users who have been in the system for less than 90 days are deleted from the system [1]. We are proposing that trust values can be propagated throughout an e-commerce application between buyers and sellers, and that we can harness this information to compute a tailored trust value for a previously unseen user. A very basic example of this propagation might be as follows: Bob purchases from Mary and leaves a comment. Mary purchases from John and leaves a comment. If Bob’s comment on Mary is positive, and Mary’s comment on John is positive, then we might assume that if Bob were to purchase from John, there would be a positive comment. We make assumptions about the transitivity of trust in online applications. Section 2.1.2 outlines our arguments in detail.
1 Correspondence to: John O’Donovan, Adaptive Information Cluster, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland. E-mail: [email protected]
Figure 1. Personalized and Non-Personalized Trust Computation
Considering the fact that eBay is biased towards positive comments, we cannot use the existing eBay trust scores in our propagation mechanism. We examine several ways to compute trust-values which are different from the current eBay implementation, and introduce AuctionRules, an algorithm which we developed for this purpose. Details of the AuctionRules algorithm are given in section 3.1. In our evaluation section we use a dataset of auction feedback comments which have been classified by real users in an online survey. These are used to test the classification accuracy of AuctionRules against 7 popular classification algorithms. Our results show that AuctionRules outperforms all 7 benchmark classification algorithms by an average of 21%, and achieves a false-negative rate of zero, compared with an average false-negative rate of 5% for the other algorithms. We also outline ongoing experiments for testing the accuracy of our trust propagation system on the same dataset. Figure 1 illustrates the difference between the current non-personalized approach to trust-computation in eBay and our proposed personalized approach. Suppose user C wants to know how trustworthy user E is. Traditional (non-personalized) trust values are given by Equations 1 and 2 as a combination of all the incoming arcs. In these equations, n is the number of nodes used in the calculation. One of many computations for personalized trust is shown by Equation 3 as the combined trust of all the users in the path between nodes C and E. For example, Figure 1 shows the personalization in Equation 3 graphically in that node G will receive a trust value on node E through the connecting node B, whereas node C will receive a different trust value computed from the connecting nodes B and D. The benefit of this technique is that each node will receive a different, tailor-made trust prediction on user E, analogous to asking a friend about the seller in the local corner shop. The main contributions of this paper are: firstly, the introduction of AuctionRules, an algorithm for extracting subtle indications of negativity from user-comments in online auctions, and secondly a trust-propagation mechanism which enables personalization of trust in online auctions.

Trust(E) = \frac{Trust(F, E) + Trust(A, E) + Trust(B, E) + Trust(D, E)}{n}   (1)

Trust(C, E) = Trust(E)   (2)
Trust(C, E) = \frac{Trust(C, B) \cdot Trust(B, E) + Trust(C, D) \cdot Trust(D, E)}{n}   (3)
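To make the contrast concrete, here is a small Python sketch (ours, under the assumption of a simple edge-dictionary trust graph; the names and the tiny example values are illustrative) of the non-personalized score of Equations 1–2 versus a path-based personalized score in the spirit of Equation 3:

```python
from typing import Dict, Tuple

# trust[(i, j)] is the trust user i places in user j, mined from i's comment about j.
TrustGraph = Dict[Tuple[str, str], float]

def reputation(graph: TrustGraph, target: str) -> float:
    """Non-personalized score (Equations 1 and 2): average of all incoming trust edges."""
    incoming = [t for (_, j), t in graph.items() if j == target]
    return sum(incoming) / len(incoming) if incoming else 0.0

def personalized_trust(graph: TrustGraph, source: str, target: str) -> float:
    """Personalized score in the spirit of Equation 3: average over the source's
    neighbours m of trust(source, m) * trust(m, target)."""
    terms = [t_sm * graph[(m, target)]
             for (s, m), t_sm in graph.items()
             if s == source and (m, target) in graph]
    return sum(terms) / len(terms) if terms else reputation(graph, target)

# Tiny example: C has rated B and D; B, D and A have all rated E.
g = {("C", "B"): 0.9, ("C", "D"): 0.4, ("B", "E"): 0.8, ("D", "E"): 0.2, ("A", "E"): 1.0}
print(reputation(g, "E"), personalized_trust(g, "C", "E"))  # ~0.67 vs 0.40
```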
2. Background Research Background work for this paper is in two areas. Firstly, we examine the related work in the area of trust computation and transitivity in online auction sites. The second part of our background work involves a comparative survey of ten popular auction sites. This is predominantly an applicability survey, in which we examine how suitable our trustpropagation algorithm is to each auction site. 2.1. Related Work on Trust in Online Auctions A large amount of research effort has focussed on issues of trust in e-commerce [15][14][2][5][17] and other online systems [10][11][9][4]. We will now examine work in formalising the concept of trust, it’s transitive properties and it’s use in e-commerce applications. 2.1.1. What is Trust? A clear definition of what is meant by trust can be somewhat elusive. Marsh describes a formalized model of trust in [9]. This computational model considers both the social and the technological aspects of trust. Marsh defines some important categories for trust which are useful to our research. Context-Specific Trust arises when one user must trust another with respect to a specific situation. Marsh also defines System Trust as the trust that a user places in the system as a whole. Both of these concepts are especially relevant to our experiments with online auctions, in dealing with the individual users with whom we transact, and with the environment in which we make the transaction. For our experiments with online marketplaces we must have a clear understanding of the differences between concepts of trust, and those of reputation. Resnick clearly differentiates between the concepts of trust and reputation in [14], citing the Oxford definition of reputation: Reputation is what is generally said or believed about a person or things character or standing. Resnick [14] sums up the difference in the following two plausible, normal statements. “I trust you because of your good reputation”. “I trust you despite your bad reputation”. Relating back to our previous example in Figure 1, reputation is given by Equations 1 and 2, whereas Equation 3 defines trust in the personalized sense given in [14]. 2.1.2. Transitivity of Trust in Online Applications Real world trust and reputation is propagated through friends and colleagues. [15] A potential stumbling block of this work lies in assumption that trust is transitive in the online world. For our experiments to be successful we require our trust values to have this property. There have been works which argue against this idea, for example Christianson shows in [2] that trust is not implicitly transitive. However, far more research supports transitive aspects of trust. Golbeck [4] introduces the filmtrust system, which operates successfully by propagation of trust. Similarly, the moleskiing system in [10] uses transitive trust. Experiments in [17] show trust propagation working successfully in PeerTrust, an experimental e-commerce application. Work in [11] and [5] also supports this concept in recommendation and e-commerce systems respectively. Xiong and Liu [17] outline three basic trust parameters for online auction systems. 1: The amount of user satisfaction. 2: The context, defined as the number of previous transactions a user has been involved in. 3: A balancing factor of trust. (to offset false feedback). Using the PeerTrust system, a simulated network of 128 peers, they show that trust can be propagated with reasonable accuracy by using their three parameters.
Work by Jøsang et al. in [7] and [6] describe approaches to trust network analysis using subjective logic. They define a method for simplifying complex trust networks so they may be expressed in concise form, and use a formal notation for describing trust transitivity and parallel combination of trust paths. The core idea of this work is that trust can be represented as beliefs, and therefore be computed securely with subjective logic. The approach in [7] compares favourably with normalisation approaches such as Google’s PageRank and the EigenTrust algorithm [8]. 2.1.3. Trust on eBay Resnick highlights some relevant points which affect the current eBay reputation system in [14] and [15]. Buyers reputation matter less since they hold the goods until they are paid. Feedback can be affected by the person who makes the first comment, ie: feedback can be reciprocated. Retaliatory feedback and potential for lawsuits are strong disincentives for leaving negative comments. Anonymity is possible in eBay since real names are not revealed and the only thing validated at registration is an email address. Users can choose not to display feedback comments. Also “Unpaid Item” buyers cannot leave feedback [1], and users can agree to mutually withdraw feedback [1]. All of these points help to explain the lack of negative comments on eBay. Of course this does not mean that customers are satisfied. The eBay forums 2 highlight the fact that false advertising does occur on eBay. This should lead to more negative comments, but they are not being displayed. Xiong et al. [17] provide further reasoning for the imbalance of positive comments on eBay. Our proposed model of trust for eBay should provide a more realistic scale than the existing system. 2.2. Comparative Survey of Online Auction Sites A survey was conducted of 10 online auction/retail sites. Full results of this survey are available on the web3 . For each site we asked several specific questions relating to the potential applicability of a trust modelling and propagation module to the system. To assess whether or not our trust propagation system would be applicable to the site, we asked the following questions: • What are the user roles in the system. (Buyer/Seller/Both) • Is there an existing trust value? (y/n) • If so, is it personalized (y/n) • What is the percentage of positive comments? • What are the review types (Product/People/Both) • What are the requirements for making a rating (Purchase/Registration/None) • What types of sale are provided (auction/retail/both) Interestingly, none of the sites surveyed provided any personalization of trust scores. The trust propagation system required that users could be both buyers and sellers, comments were provided on transactions, and that the comments be mostly genuine. For this reason we enquired about the potential for malicious attacking [12] by assessing the cost of making a rating. We found that our application would be deployable on 7/10 of the surveyed sites. Users have the option of performing buyer or seller roles on 8/10 sites. On 6 of the sites, reviews were made about other users, and on 1 site reviews were on products only. 3 sites allowed reviews of both. An important statistic we gathered was that on average user feedback was over 94% positive across all the sites reviewed. This reaffirms the need for a negativity-capturing algorithm if we are to build a new trust 2 http://forums.ebay.com 3 http://www.johnod.net/research.jsp
model. AuctionRules() captures negativity in comments where users are dissatisfied with transactions but still provide positive ratings. This situation can arise for many reasons, such as fear of reciprocal negative comments as described by Resnick in [14]. Section 3.1 details this algorithm. 3. Building a Model of Trust Figure 2 illustrates the process by which we build trust values based on rating data from the eBay site and from corresponding user-evaluations of those comments. Firstly a web crawler was designed to pull information from the live auction site. This system is easily generalisable to a broad range of auction sites, as long as comment information is available. We have a set of interfaces that can crawl data from several popular auction sites. For our experiments in this paper we use a subset of data from eBay. Crawled data is stored in an relational database in the basic form (useri , userj , trust(i,j) ), along with other information such as eBay trust score, number of transactions etc. Some of the comments left by users will not contain much more information than the binary positive or negative rating that the commenter has given already. An example of such a comment is “Great Seller, Thanks!”. This comment does not provide us with any real information surplus to what we know already about this particular transaction. However, many of the comments on eBay do provide us with extra information which can be incorporated into the trust modelling process. Take the following for example: “Product delivered on time, perfect condition, nicely packaged, would buy from again!!”. This comment provides us a wealth of extra information about the seller, such as punctuality of delivery, package and product quality. These are some of the salient features of a trustworthy seller. It would require very advanced natural language processing techniques to fully analyse and understand every user comment on eBay. We have developed a technique for approximating the goodness of a user comment for the purposes of building our trust graph. The following section outlines this technique. More restricted notions of trust are being examined. For example User A may be trusted in the context of book-recommendation, but not cars. Future work will show how a domain-constrained PageRank algorithm can produce trust values with more realistic transitive properties. 3.1. The AuctionRules Algorithm AuctionRules operates under the assumption that people will generally use the same set of terms to express some form of dissatisfaction in their online auction comments. This has become apparent after initial manual examination of online auction comments, and subsequent automated testing. The algorithm captures negativity in comments where users have complained but still marked the comment as positive. This has been shown in [14] to occur a lot in situations where users are afraid of retaliatory negative comments. AuctionRules is a machine learning algorithm. As with most ML techniques, training examples were required for the algorithm to learn. The algorithm works only with words and phrases which explicitly express negativity. Many of the words, expressions and characters in the raw comments were of no value to the learning process, so before training examples were compiled, preprocessing was done to reduce complexity. This algorithm does require some context information, in the form of short lexicons of special terms, and negative words. 
This information is not taken from, or specific to, one site, and works on any auction site with user feedback, so the context-dependency is very broad and the system is widely applicable. An implementation of the “Porter stemming” algorithm [13] was used to shorten comments. The standard Porter stemmer uses many rules for removing suffixes. For example,
Figure 2. Graphical overview of the trust-modelling process. [Current implementation only uses eBay as a source for ratings.]
Figure 3. Stemming, stop-word removal and classification of a sample comment from eBay.
all of the terms in c are conflated to the root term “connect”: c = connect, connected, connecting, connection, connections. This reduces the number of terms and therefore the complexity of the data. The stemmer algorithm was modified to also strip characters not used by AuctionRules, such as “?, !, *, (, )”. Data complexity is further reduced by the removal of stop-words. Initially, the words on Google’s stop-word list4 were removed. This list was extended with over 50 frequent words manually selected from user comments (on different auction sites) deemed to be of no help in ascertaining negativity. Examples of words in this list are: “Product, Item, Seller, Those, These, Book, Buyer”. In a sample test of 4065 terms, only 556 were unique after stemming and stop-word removal. This is one unique term in every 7.3, which is a large overlap. When AuctionRules works in probabilistic mode, it will output the number of
bad terms in the comment, which is 3 in this example, over the maximum number of bad terms found in a comment, which was 5 in our tests, meaning that the probability of the example comment being negative is 60%. AuctionRules starts as a majority class predictor over the training data. Negative comments are isolated using lexicon of single negative terms such as “bad”, “awful” etc, and a set of checks for negative prefixes such as “not” before positive terms such as “good”. Each term in the comment is tested individually, with the preceding term where possible. A hit arises when a negative term or phrase is found in the comment. A threshold value is used to determine the number of negative hits required to sway the classifier. Figure 3 shows AuctionRules working on a sample comment from eBay. In this particular example, AuctionRules will always classify the comment as negative because it triggers a special rule. There are a number of such rules used in the algorithm, e.g. when the stemmed term “‘ship” is found with any term from a term-list indicating “costly”, the comment is classified as negative. These rules are manually defined from examining user-comment data over a range of sites. They are completely independent of the collected data. Currently the rules are simple and there are cases that will be misclassified, although these are rare. One such example is: "the ship is anything but bad". The algorithm is currently being improved to handle more complicated cases. To provide training examples for the algorithms we had to manually classify the data. User’s opinions were collected on each comment in an online evaluation. Details of this evaluation are given in the following section. Comments were rated on a scale of 1 to 5 (1 = strong negative). A threshold value of 2 was chosen as the pivotal point for a positive comment by the AuctionRules algorithm. This value was chosen because it produced the highest accuracies when we varied the threshold for a positive comment over different runs of several popular classifiers in pre-testing. Any comment rated 2 or lower was tagged as negative, and the rest positive. Figure 2 also shows a query expansion module, which is un-tested at present. This module queries an online thesaurus5 for a list of synonyms, and expands each individual term AuctionRules is testing to include the list of synonyms. Each word in this new list is then compared to the lexicon of negative (and/or positive) words. When this expansion is used there are many false hits, so the threshold for the number of hits to sway the classifier is greatly increased. Experiments to test this technique in more detail will be explained in a future paper. 3.2. Live User Evaluations of eBay Comments From the user evaluations6 1000 ratings on eBay comments were collected. In this survey, users were asked to rate the positiveness of each comment on a Likert scale of 1 to 5. 10 comments were presented to a user in each session. Each comment was made by different buyers about one seller. Users were required to answer the following: • How positive is the comment (Average rating: 3.8442) • How informative is the comment (Average rating: 3.1377) • Would you buy from this seller (Average rating: 4.0819) In the current implementation of AuctionRules, only results from the first question are used to develop training examples. For future experiments we may incorporate results from the other questions. Permission was sought from eBay inc. to use the information from the eBay website in our experiments. 
We had to gather the data ourselves by using a specially tailored web crawler. 5 http://thesaurus.reference.com 6 www.johnod.net/Survey1.jsp
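A much-simplified sketch of the comment-classification step from Section 3.1 is shown below (ours, in Python; the lexicons, the crude stemmer, the threshold constants and the example comment are illustrative assumptions — the authors' actual term lists, special rules and query expansion are not reproduced here):

```python
import re

# Illustrative placeholder lexicons; the authors' actual term lists are not reproduced here.
STOP_WORDS = {"the", "a", "this", "that", "item", "product", "seller", "buyer"}
NEGATIVE_TERMS = {"bad", "awful", "slow", "broken", "never", "poor", "disappoint"}
POSITIVE_TERMS = {"good", "great", "fast", "perfect", "excellent"}
MAX_BAD_TERMS = 5   # assumed cap on negative hits per comment
HIT_THRESHOLD = 1   # assumed number of hits needed to sway the classifier

def crude_stem(token: str) -> str:
    """Very rough stand-in for the Porter stemmer used in the paper."""
    return re.sub(r"(ing|ed|ly|s)$", "", token)

def auction_rules_score(comment: str) -> float:
    """Return an estimated probability that the comment is negative (hits / max hits)."""
    raw = [t.lower() for t in re.findall(r"[a-zA-Z']+", comment)]
    tokens = [crude_stem(t) for t in raw if t not in STOP_WORDS]
    hits = 0
    for i, tok in enumerate(tokens):
        if any(tok.startswith(neg) for neg in NEGATIVE_TERMS):
            hits += 1
        elif tok in POSITIVE_TERMS and i > 0 and tokens[i - 1] == "not":
            hits += 1  # negative prefix before a positive term, e.g. "not good"
    return min(hits, MAX_BAD_TERMS) / MAX_BAD_TERMS

def is_negative(comment: str) -> bool:
    """Binary mode: classify as negative once the hit threshold is reached."""
    return auction_rules_score(comment) * MAX_BAD_TERMS >= HIT_THRESHOLD

print(auction_rules_score("Item arrived broken and seller was not good about refunds"))  # 0.4
```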
3.3. Classification of Comments in WEKA
In our experimental evaluation we show results of a range of classification algorithms running on our pre-classified set of feedback comments. We used the Weka corpus [16] of machine learning algorithms to run most of the experiments. To use the Weka classification algorithms from their system, data must be input in “ARFF” format. An ARFFCompiler program was written to automatically feed the comments into the Weka system. Figure 2 shows where this fits into the architecture. Each comment was tagged with its classification from the user evaluation. The header files consisted of a list of attributes used by Weka’s algorithms. The attribute list for our data is a list of comma-separated unique terms across the entire set of comments. Details of the Weka experiments compared with AuctionRules are shown in the evaluation. AuctionRules can be set to output binary or probabilistic classifications of comments, based on the number of negative terms. As shown in Figure 2, output from the algorithm is a set of triples of the form: e(i,j) = (useri, userj, trust(i,j)). These triples form our basic units of trust to be used in the propagation mechanisms described in the following section.
4. Representing the Auction as a Trust Graph
So far we have explained how the AuctionRules algorithm generates a broader scale of trust than the current feedback we see on eBay by drawing on information in user comments. Figure 6.2 in the evaluation section shows this improvement. The new trust values are of the form e(i,j) = (useri, userj, trust(i,j)). Now we explain how the model generated by AuctionRules enables us to generate personalized trust values. We construct a trust graph of our subset of the eBay marketplace so that we can find paths between buyers and sellers. There were a number of factors to consider when constructing this graph. The three scenarios in Figure 4 show these considerations. In the graph an arrow represents some unidirectional trust value. On eBay, users can play three different roles: buyer, seller or both. This creates different communication patterns between users with different roles. For example, Resnick explains in [14] that trust in buyers matters less, since sellers hold the product until payment is made. When a transaction happens on eBay, a buyer leaves a comment on a seller and vice-versa. We could consider only buyers’ comments on sellers, since the bulk of the hazard in the transaction lies with trusting the sellers [14]. Figure 4 (a) depicts this approach. However, as the diagram shows, there is much less connectivity in the resulting graph. In fact, the only way a buyer node r can learn about a seller p is through a node q that performs both roles, and has interacted with r and p. To overcome this constraint, we assume all nodes to be of type q, that is, they can all be both buyers and sellers. There is no distinction drawn between comments from buyers and those from sellers. This simplification allows much greater connectivity in the trust graph. Every node in the resulting graph will have an even number of edges entering and leaving it. Furthermore, there will be an equal number of edges entering and leaving the node. Figure 4 (b) shows this graphically. In many cases there are multiple transactions between the same buyer and seller, as shown in Figure 4 (c). When this occurs, we take the average value over each linking edge in the graph.
5.
Generating Personalized Trust Figure 5 shows the implementation of the personalization stage of our system. An ID value for the user seeking the trust value is passed to the system, along with an ID for the potential seller or buyer that the user is querying. The system queries the database of trust values and returns the two shortest paths between the two nodes. A path between two
Figure 4. Graphical depiction of possible interactions between buyers and sellers in the eBay marketplace
Figure 5. Graphical overview of the trust prediction phase of the system. [Some of the analysis modules are not included in the illustration.]
users is represented in the form P = (usource , u, t), (u, u, t), ......(u, usink , t) Where usource is the user seeking a trust value t on user usink . The system then combines the trust values along these paths in four different ways. These combinations are given below. The system can be set to use any of the four techniques for combining trust scores along the paths. • weightedDistance - The average trust score over all the edges in the shortest path, discounted by the distance from the source. • meanPath - The average trust score over all the edges in the shortest path between the source node and target node. • twoPathMean - The average of the meanPath of the shortest path in the graph and the meanPath of the second shortest path. • SHMPath - The simple harmonic mean of the trust scores over all edges in the shortest path. 6. Experimental Evaluation Ideally our data should have a high level of overlap between buyers and sellers, as well as rich textual comments on each transaction made. eBay provided a structured way for users to leave good textual comments on their sales and purchases. This domain has millions of users so we chose a subset which had very high overlap amongst its users. This subset was the purchase and sales of Egyptian Antiques, because there were a large number of users who played the roles of both buyer and seller. In cases where the trust
Figure 6. Classification Accuracy [classification distribution from user evaluations: 36% positive, 63% negative, using a threshold of 4 or higher for a positive comment.]
graph sparse, or not connected at all, the system can present trust for a node simply by averaging the incoming trust values for that node. Initially we crawled over 10,000 comments from the eBay site. For the following experiments, we used only the set of comments which were rated by real people in our user evaluations. This is a set of 1000 classified user comments. From this set we found that on average there were 5.08 terms per comment, the max number of terms in a comment was 16. The set consists of entries from 313 buyers on 14 sellers. 6.1. Preliminary Experiment 1: Comparing Classifier Accuracy for Computing Trust We needed to assess how well the system extracted trust values from raw textual comments. To examine this empirically, the classification accuracy of the AuctionRules algorithm was tested against 7 popular algorithms. We chose three rule-based learners, Zero-r, One-r and Decision Table, a tree learner C4.5 rules, two Bayes learners, Naive Bayes and BayesNet, and a lazy learning algorithm K-Star. Figure 6 shows results of this experiment. For each algorithm we performed three runs. a 60:40 train-test split, an 80:20 split, and a 10-fold cross validation of the training set, which randomly selects a training set from the data over 10 runs of the classifier and averages the result. In the experiment each algorithm made a prediction for every value in the test set, and this prediction was compared against the training set. AuctionRules beat all of the other classifiers in every test we performed, achieving over 90% accuracy in all of the evaluations, 97.5% in the 80:20 test, beating the worst performer K-Star by 17.5%, (relative 21.2%) and it’s closest competitor Naive Bayes by 10.5%, giving a relative accuracy increase of 12.7%. In addition to numerical accuracy, we examined where the high accuracy results were coming from more closely by assessing the confusion matrix output by the algorithms. This was necessary since prediction of false negatives would have an adverse effect on the resulting trust graph. This phenomenon has been discussed by Massa in [10] with respect to the Moleskiing application, and Golbeck in [4] with respect to the TrustMail application. Table 1 shows AuctionRules outperforming all of the other algorithms by predicting no false negatives. All of the algorithms displayed similar trend to the ones in Table 1, which shows results of the 80:20 classification experiment which had a test set of 234 comments.
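As a small illustration of how such a confusion matrix and false-negative rate can be tallied from classifier output, here is a generic Python sketch (ours, not the authors' Weka pipeline; the label pairs and counts are made up):

```python
from typing import Dict, List, Tuple

def confusion(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Tally (predicted, actual) label pairs ('pos'/'neg') into percentages."""
    counts = {(p, a): 0 for p in ("pos", "neg") for a in ("pos", "neg")}
    for predicted, actual in pairs:
        counts[(predicted, actual)] += 1
    total = len(pairs)
    return {k: 100.0 * v / total for k, v in counts.items()}

# Toy predictions: a false negative means predicting 'neg' for a comment users rated 'pos'.
pairs = [("pos", "pos")] * 8 + [("pos", "neg")] * 1 + [("neg", "neg")] * 1
print(confusion(pairs)[("neg", "pos")])  # 0.0 -> no false negatives in this toy run
```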
             AuctionRules      NaiveBayes       Decision Table    One-r
             +’ve    -’ve      +’ve    -’ve     +’ve    -’ve      +’ve    -’ve
+’ve         91.4    4.7       84.1    11.1     84.6    12.3      77.3    8.5
-’ve         0       4.7       1.2     2.9      1.2     1.7       8.1     5.9
Table 1. Confusion matrices showing percentage true and false negatives and positives for four of the algorithms tested. [All of the other algorithms had similar results to the ones displayed.]
6.2. Preliminary Experiment 2: Trust Distributions
As we mentioned in the introduction, there are too many positive comments on online auction sites such as eBay. Our AuctionRules algorithm performs accurate classifications when compared against classifications of real people. Now we must ask what effect this will have on the resultant trust values, especially in comparison with the existing values on eBay. Figure 6.2 shows the distributions of trust values produced by AuctionRules compared to the existing eBay trust values, and with what we believe, based on our distribution research in [11], to be the ideal standard for trust distribution in an online auction, where the model can isolate small numbers of highly trustworthy and similarly highly untrustworthy users. AuctionRules was set to output probabilistic classifications which were used as trust values for the distribution graphs. This was done by counting the number of negative terms in a comment and then dividing by the max number of negative terms found in a comment to produce normalized values. Figure 6.2 shows that the vast majority of the computed trust scores are still highly positive. This may not look like a large improvement at first glance, but considering that most of the comments in our data genuinely were positive, and that AuctionRules takes about 16.6% of the eBay trust values from the 83-100 bracket and distributes them across the entire scale, we can see that this is a positive result for the algorithm.
Figure 6.2: Comparison of distributions between current eBay trust values and AuctionRules generated values.
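The normalization just described can be sketched in a few lines of Python (ours; the inversion into a trust-style score is an assumption about how the probabilistic output maps onto trust):

```python
def negativity_probability(neg_term_count: int, max_bad_terms: int = 5) -> float:
    """Probabilistic output described above: negative hits over the maximum observed hits."""
    return min(neg_term_count, max_bad_terms) / max_bad_terms

# One plausible mapping to a trust-style score (an assumption, not the paper's exact formula):
trust_scores = [1.0 - negativity_probability(c) for c in [0, 1, 3, 5]]
print(trust_scores)  # [1.0, 0.8, 0.4, 0.0]
```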
Our system also facilitates the propagation of the trust values throughout the social network that is formed by the eBay marketplace. Propagation of trust values allows us to compute a personalized trust score for a prospective seller to be presented to a buyer. In our background work we have carried out a comparative survey of ten popular online auction sites, which determined that our application would be deployable on seven of the ten sites. In our evaluation section we outlined two initial experiments to test the validity of our system: the first tests the accuracy of the trust mining algorithm, and the second examines the relative distributions of the existing trust rating system against our computed trust values. Results show a more realistic distribution using the AuctionRules values, and consistent improvements of up to 21% over seven popular classification algorithms. AuctionRules also produces a false negative rating of 0% compared with 8.1% from other tested algorithms. Ongoing research includes testing the accuracy and coverage of the trust-propagation mechanisms, testing the query expansion module, incorporating new data from other auction sites, and refining the AuctionRules algorithm to produce higher accuracy.
8. Acknowledgements
This material is based upon work supported by Science Foundation Ireland under Grant No. 03/IN.3/I361, and in part by the Integrated Media Systems Center, a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152.
References
[1] Brian Burke. Ebay town-hall meeting. Transcript from Senior Manager, eBay Marketplace Rules and Policy, eBay Inc, San Jose, CA, Feb 28th 2006. Available at http://pics.ebay.com/aw/pics/commdev/TownHallTranscript_022806.pdf.
[2] Bruce Christianson and William S. Harbison. Why isn't trust transitive? In Proceedings of the International Workshop on Security Protocols, pages 171–176, London, UK, 1997. Springer-Verlag.
[3] eBay Incorporated. Fourth quarter and full year financial results for 2005. Press release, San Jose, CA, Jan 18 2006. Available at http://investor.ebay.com/news/Q405/EBAY0118-123321.pdf.
[4] Jennifer Golbeck and James Hendler. Accuracy of metrics for inferring trust and reputation in semantic web-based social networks. In Proceedings of EKAW'04, LNAI 2416, p. 278 ff., 2004.
[5] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust, 2004.
[6] Audun Jøsang, Elizabeth Gray, and Michael Kinateder. Analysing Topologies of Transitive Trust. In Theo Dimitrakos and Fabio Martinelli, editors, Proceedings of the First International Workshop on Formal Aspects in Security and Trust (FAST2003), pages 9–22, Pisa, Italy, September 2003.
[7] Audun Jøsang, Elizabeth Gray, and Michael Kinateder. Simplification and Analysis of Transitive Trust Networks. Web Intelligence and Agent Systems: An International Journal, pages 1–1, September 2005. ISSN 1570-1263.
[8] Sepandar Kamvar, Mario Schlosser, and Hector Garcia-Molina. The EigenTrust Algorithm for Reputation Management in P2P Networks. In Proceedings of WWW2003. ACM, 2003.
[9] S. Marsh. Formalising trust as a computational concept. Ph.D. Thesis, Department of Mathematics and Computer Science, University of Stirling, 1994.
[10] Paolo Massa and Bobby Bhattacharjee. Using trust in recommender systems: an experimental analysis. In Proceedings of the 2nd International Conference on Trust Management, Oxford, England, pages 221–235, 2004.
[11] John O'Donovan and Barry Smyth. Trust in recommender systems. In Proceedings of the 10th International Conference on Intelligent User Interfaces, pages 167–174. ACM Press, 2005.
[12] John O'Donovan and Barry Smyth. Is trust robust? An analysis of trust-based recommendation. In Proceedings of the 11th International Conference on Intelligent User Interfaces, pages 101–108. ACM Press, 2006.
[13] M. F. Porter. An algorithm for suffix stripping. Readings in Information Retrieval, pages 313–316, 1997.
[14] Paul Resnick and Richard Zeckhauser. Trust among strangers in internet transactions: Empirical analysis of eBay's reputation system. The Economics of the Internet and E-Commerce, Volume 11 of Advances in Applied Microeconomics, December 2002.
[15] Paul Resnick, Richard Zeckhauser, Eric Friedman, and Ko Kuwabara. Reputation systems. Communications of the ACM, Vol. 43, No. 12, December 2000.
[16] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann, San Francisco, 2005.
[17] Li Xiong and Ling Liu. Building trust in decentralized peer-to-peer electronic communities. In Fifth International Conference on Electronic Commerce Research (ICECR-5), 2002.
An Hybrid Soft Computing Approach for Automated Computer Design
Alessandro G. Di Nuovo, Maurizio Palesi and Davide Patti
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Università degli Studi di Catania
Correspondence to: Alessandro G. Di Nuovo, Viale A. Doria 6, 95125 Catania, Italy. Tel.: +39 095 738 2353; E-mail: [email protected]
Abstract. In this paper we present an intelligent approach to Computer Aided Design that is capable of learning from its experience in order to speed up the design process. The proposed approach integrates two well-known soft-computing techniques, Multi-Objective Genetic Algorithms (MOGAs) and Fuzzy Systems (FSs): the MOGA smartly explores the design space, while the FS learns from the experience accumulated during the MOGA evolution, storing knowledge in fuzzy rules. The joined rules build the Knowledge Base through which the integrated system quickly predicts the results of complex simulations, thus avoiding their long execution times. The methodology is applied to a real case study and evaluated in terms of both efficiency and accuracy, demonstrating the superiority of the intelligent approach over brute-force random search.
Keywords. Genetic Fuzzy Systems, Computer Aided Design, Multi-objective optimization, Design Space Exploration.
1. Introduction
In this paper we introduce a methodology which uses Artificial Intelligence (AI) techniques to attack a very complex and time-consuming process: computer system design. The shrinking of the time to market has led to design methodologies which stress design reuse by exploiting a new design paradigm known as Platform Based Design (PBD) [6]. When a PBD methodology is used, the platform is the pillar around which the overall design activity revolves. The embedded systems market is without doubt the largest and most significant application area for PBD [2]. There are basically two reasons for its success. The first is the shorter lifecycle for products based on embedded systems (especially for mainstream consumer products), which has led to increased competition between manufacturers. The second is the constant increase in the number, complexity and heterogeneous nature of the functions these products have to offer. Cell phones, for instance, now provide many functions which go beyond their core function, such as Web browsing capabilities, personal digital assistant functions, short message services, and even gaming. The reduction
in the time-to-market has also made it unfeasible to design a processor from scratch for a specific application. On the other hand, the design of an embedded system is application-specific, and so the use of general-purpose microprocessors is often not only inappropriate but also unfeasible in terms of performance, cost, power, etc. A platform is a predesigned computing system, typically integrating a parameterized microprocessor, a parameterized memory hierarchy, parameterized interconnect buses, and parameterized peripherals. Such systems, also known as system-on-a-chip (SoC) platforms, must be general enough to be used across several different applications in order to be economically viable. Different applications often have very different power and performance requirements. Therefore, these parameterized SoC platforms must be optimally configured to meet the varied power and performance requirements of a large class of applications. Variations in parameters have a considerable impact on the performance indexes being optimized (such as performance, power consumption, area, etc.). Defining strategies to “tune” parameters so as to establish the optimal configuration for a system is a challenge known as Design Space Exploration (DSE). Obviously, it is computationally unfeasible to use an exhaustive exploration strategy, since the size of the design space grows as the product of the cardinalities of the variation sets for each parameter. In addition, evaluation of a single configuration almost always requires the use of simulators or analytical models which are often highly complex. Another problem is that the objectives being optimized are often conflicting. The result of the exploration will therefore not be a single solution but a set of tradeoffs which make up the Pareto set. Most contributions to DSE found in the literature address the problem only from the point of view of parameter tuning [8,4,7]. Artificial Intelligence (AI) has found application in various VLSI design environments [9]. In VLSI design synthesis a Knowledge Based Expert System approach provides a framework for organizing solutions to problems that were solved by experts using large amounts of domain-specific knowledge, in problems relating to high-level synthesis such as scheduling [10] and design optimization [12], in reliable chip testing through efficient test vector generation [14], and so on. In this paper we propose an approach which tackles the DSE problem on two fronts: parameter tuning and reducing the time required to evaluate system configurations. To achieve this, we propose the use of a Genetic Fuzzy System to increase efficiency or, with the same level of efficiency, improve the accuracy of any DSE strategy. We propose the use of a genetic algorithm for heuristic exploration and a fuzzy system as an evaluation tool. The methodology proposed is applied to exploration of the design space of a parameterized SoC platform based on a VLIW processor. The high degree of parametrization that these platforms feature, combined with the heterogeneous nature of the parameters being investigated, both hardware (architectural, micro-architectural and technology-dependent parameters) and software (compilation strategies and application parameters), demonstrates the scalability of the approach. The rest of the paper is organized as follows. A formal statement of the problem is given in Section 2. Section 3 is a general description of our proposal.
In Section 4 the methodology is applied to a real case study and evaluated in terms of both efficiency and accuracy. Finally Section 5 summarizes our contribution and outlines some directions for future work.
2. Formulation of the Problem
Although the methodology we propose is applied to and evaluated on a specific case study (optimization of a highly parameterized VLIW-based SoC platform), it is widely applicable. For this reason, in this section we will provide a general formulation of the problem of Design Space Exploration. Let S be a parameterized system with n parameters. The generic parameter pi, i ∈ {1, 2, . . . , n}, can take any value in the set Vi. A configuration c of the system S is an n-tuple (v1, v2, . . . , vn) in which vi ∈ Vi is the value fixed for the parameter pi. The configuration space (or design space) of S [which we will indicate as C(S)] is the complete range of possible configurations [C(S) = V1 × V2 × . . . × Vn]. Naturally not all the configurations of C(S) can really be mapped on S. We will call the set of configurations that can be physically mapped on S the feasible configuration space of S [and indicate it as C∗(S)]. Let m be the number of objectives to be optimized (e.g. power, cost, performance, etc.). An evaluation function E : C∗(S) × B −→ ℜm is a function that associates each feasible configuration of S with an m-tuple of values corresponding to the objectives to be optimized when any application belonging to the set of benchmarks B is executed. Given a system S, an application b ∈ B and two configurations c′, c′′ ∈ C∗(S), c′ is said to dominate (or eclipse) c′′, indicated as c′ ≻ c′′, if, given o′ = E(c′, b) and o′′ = E(c′′, b), it results that o′ ≤ o′′ and o′ ≠ o′′, where vector comparisons are interpreted component-wise and are true only if all of the individual comparisons are true (o′i ≤ o′′i ∀ i = 1, 2, . . . , m). The Pareto-optimal set of S for the application b is the set: P(S, b) = {c ∈ C∗(S) : ∄ c′ ∈ C∗(S), c′ ≻ c}, that is, the set of configurations c ∈ C∗(S) not dominated by any other configuration. Pareto-optimal configurations are configurations belonging to the Pareto-optimal set, and the Pareto-optimal front is the image of the Pareto-optimal configurations, i.e. the set: PF(S, b) = {o : o = E(c, b), c ∈ P(S, b)}. The aim of the paper is to define a Design Space Exploration (DSE) strategy that will give a good approximation of the Pareto-optimal front for a system S and an application b, visiting (evaluating) as few configurations as possible.
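To make the definitions above concrete, the following Python sketch enumerates a toy two-parameter configuration space, evaluates each configuration with an illustrative two-objective cost function (stand-ins for power and execution time, not the paper's models), and extracts the Pareto-optimal set and front with the dominance test defined in this section.

    from itertools import product

    # Toy two-parameter system: C(S) = V1 x V2, two objectives to minimise.
    V1, V2 = [1, 2, 4], [16, 32]                 # value sets of the two parameters
    design_space = list(product(V1, V2))

    def evaluate(c):
        """Illustrative E(c): returns (power, execution time)."""
        units, registers = c
        time = 10.0 / min(units, 2) + 64.0 / registers   # beyond 2 units the app gains nothing
        power = units * 0.6 + registers * 0.02
        return (power, time)

    def dominates(o1, o2):
        """o1 dominates o2: no worse in every objective and not identical."""
        return all(a <= b for a, b in zip(o1, o2)) and o1 != o2

    objectives = {c: evaluate(c) for c in design_space}
    pareto_set = [c for c in design_space
                  if not any(dominates(objectives[d], objectives[c]) for d in design_space)]
    pareto_front = [objectives[c] for c in pareto_set]
    print(pareto_set)
    print(pareto_front)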
3. The Genetic Fuzzy Approach for Intelligent Design Space Exploration
In [3,4] it has been shown that the use of Multi-Objective Genetic Algorithms (MOGAs) to tackle the problem of DSE gives optimal solutions in terms of both accuracy and efficiency as compared with the state of the art in exploration algorithms. Unfortunately, MOGA exploration may still be expensive when a single simulation requires a long compilation and/or execution time. Here we present an intelligent extension of those methods which can achieve better results using less computational time. Table 1 shows the total time (compilation + execution) needed for one simulation of some multimedia benchmarks on a Pentium IV Xeon 2.8 GHz Linux Workstation.
Table 1. Total simulation time

  Benchmark        Description                                                    Avg time (s)
  ieee810          IEEE-1180 reference inverse DCT                                37.5
  adpcm-encode     Adaptive Differential Pulse Code Modulation speech encoding    22.6
  adpcm-decode     Adaptive Differential Pulse Code Modulation speech decoding    20.2
  mpeg2dec         MPEG-2 video bitstream decoding                                113.7
A little multiplication shows that a few thousand simulations (just a drop in the immense ocean of feasible configurations) could last from a day to weeks! The primary goal of this work was to create a new approach which could run as few simulations as possible without affecting the very good performance of the MOGA approach. For this reason we started to develop an intelligent MOGA approach which has the ability to avoid the simulation of configurations that it assumes are not good enough to enter the Pareto set, and to give them fitness values according to a fast estimation of the objectives. This feature was implemented using a Fuzzy System (FS) to approximate the unknown function from configuration space to objective space. The approach can be described as follows: the MOGA evolves normally, while the FS learns from simulations until it becomes expert and reliable. From this moment on the MOGA stops launching simulations and uses the FS to estimate the objectives. Only if the estimated objective values are good enough to enter the Pareto set will the associated configuration be simulated. We chose a Fuzzy System as an approximator above all because it has cheap additional computational time requirements for the learning process, which are negligible as compared with simulation time. More complex methods, like Neural Networks, need an expensive learning process which could heavily affect time savings without reasonable performance improvements. Algorithms 1 and 2 explain how the proposed approach evaluates a configuration suggested by the MOGA. The MOGA adopted in our applications is SPEA2 [19] and is described in subsection 3.1, while fuzzy function approximation is briefly introduced in subsection 3.2. Figure 1 shows the general flow of the proposed DSE methodology.

Algorithm 1 Evaluation of a configuration.
Require: c ∈ C
Ensure: power, ex_time
if Feasible(c) = true then
  ConfigurePlatform(c)
  results = RunSimulation()
  power = PowerEstimation(results)
  ex_time = ExTimeEstimation(results)
else
  power = ex_time = ∞
end if
Algorithm 2 RunSimulation().
if FuzzyApproximatorReliable() == true then
  results = FuzzyEstimation(c)
  if IsGoodForPareto(results) == true then
    results = SimulatePlatform(c)
    FuzzyApproximatorLearn(c, results)
  end if
else
  results = SimulatePlatform(c)
  FuzzyApproximatorLearn(c, results)
end if
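The following Python sketch is one possible reading of Algorithms 1 and 2: a configuration proposed by the MOGA is simulated only when the fuzzy approximator is not yet reliable, or when its estimate looks good enough to enter the Pareto set. The helper objects (fuzzy approximator, Pareto archive, platform simulator) are hypothetical stand-ins, not the authors' implementation.

    # Sketch of the gated evaluation in Algorithms 1 and 2 (our reading, not the
    # authors' code); the injected helpers are hypothetical stand-ins.
    INFINITY = float("inf")

    def evaluate(config, fuzzy, archive, feasible, simulate_platform):
        """Return (power, ex_time) for a configuration proposed by the MOGA."""
        if not feasible(config):
            return INFINITY, INFINITY          # unfeasible configurations get worst fitness
        if fuzzy.is_reliable():
            results = fuzzy.estimate(config)   # cheap fuzzy estimate instead of a simulation
            if not archive.would_enter_pareto(results):
                return results                 # not promising: keep the estimate, skip simulation
        # promising estimate, or fuzzy system not yet reliable: run the real simulation
        results = simulate_platform(config)
        fuzzy.learn(config, results)           # every simulation becomes new fuzzy knowledge
        return results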
Figure 1. Exploration flow diagram.
The FuzzyApproximatorReliable() function, which establishes whether the Fuzzy System is reliable for estimating results, could be implemented in three ways:
1. The FS is reliable when a certain number of examples has been used in the learning process. In this way it is possible to define the number of simulations to be made a priori.
2. The FS is reliable when the Fuzzy System performance is higher than a prefixed threshold. In this way better approximation performance can be achieved, but it is not possible to know a priori how many simulations will be made. Fuzzy System performance can be tracked by using indexes such as the average error, the maximum error and/or the maximum average error.
3. A combination of ways 1 and 2. For example, we could set a minimum and maximum number of simulations to be done in the learning phase and a performance threshold to stop the learning phase before the maximum number of simulations is reached but after the minimum number of simulations has been run, as sketched below.
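A minimal sketch of the combined criterion (way 3), assuming illustrative bounds and an average-error threshold; the actual values and error indexes used in this work are not specified here.

    # Sketch of the combined reliability criterion (way 3). The bounds and the
    # error threshold are illustrative values, not the ones used in the paper.
    def make_reliability_check(min_sims=200, max_sims=1000, max_avg_error=0.05):
        """Return a predicate deciding whether the fuzzy approximator can be trusted."""
        def is_reliable(num_simulations, recent_errors):
            if num_simulations < min_sims or not recent_errors:
                return False                    # always keep simulating at first
            if num_simulations >= max_sims:
                return True                     # hard cap on the learning phase
            avg_error = sum(recent_errors) / len(recent_errors)
            return avg_error <= max_avg_error   # stop early once estimates are accurate
        return is_reliable

    check = make_reliability_check()
    print(check(500, [0.03, 0.04, 0.02]))  # True: enough samples and low average error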
In this work the third way was chosen in order to test the effectiveness of the more general and intelligent approach.
3.1. Multi-Objective Genetic Algorithm (MOGA)
Multi-objective optimization is an area in which evolutionary algorithms have achieved great success. Most real-world problems involve several objectives (or criteria) to be optimized simultaneously, but a single, perfect solution seldom exists for a multi-objective problem. Due to the conflicting nature of at least some of the objectives, only compromise solutions may exist, where improvement in some objectives must always be traded off against degradation in other objectives. Such solutions are called Pareto-optimal solutions, and there may be many of them for any given problem. For this work we chose SPEA2 [19], which is very effective in sampling from along the entire Pareto-optimal front and distributing the solutions generated over the trade-off surface. SPEA2 is an elitist multi-objective evolutionary algorithm which incorporates a fine-grained fitness assignment strategy, a density estimation technique, and an enhanced archive truncation method. The representation of a configuration can be mapped on a chromosome whose genes define the parameters of the system. Using the symbols introduced in Section 2, the gene coding the parameter pi can only take the values belonging to the set Vi. The chromosome of the MOGA will then be defined with as many genes as there are free parameters, and each gene will be coded according to the set of values it can take. For each objective to be optimized it is necessary to define the respective measurement functions. These functions, which we will call objective functions, frequently represent cost functions to be minimized (e.g. area, power, delay, etc.). Crossover (recombination) and mutation operators produce the offspring. In our specific case, the mutation operator randomly modifies the value of a parameter chosen at random. The crossover between two configurations exchanges the value of two parameters chosen at random. Application of these operators may generate non-valid configurations (i.e. ones that cannot be mapped on the system). Although it is possible to define the operators in such a way that they will always give feasible configurations, or to define recovery functions, these have not been taken into consideration in the paper. Any unfeasible configurations are filtered by the feasibility function fF : C −→ {true, false}, which assigns a generic configuration c a value of true if it is feasible and false if c cannot be mapped onto S. A stop criterion based on convergence makes it possible to stop iterations when there is no longer any appreciable improvement in the Pareto sets found.
3.2. Fuzzy Function Approximation
The capabilities of fuzzy systems as universal function approximators have been systematically investigated. We recall, for example, [18,16], where the authors demonstrated that a fuzzy system is capable of approximating any real function to arbitrary accuracy, and proved that fuzzy systems perform better than neural networks without an expensive learning process. In our approach we used the well-known Wang and Mendel method [15], which consists of five steps:
Step 1 divides the input and output space of the given numerical data into fuzzy regions; Step 2 generates fuzzy rules from the given data; Step 3 assigns a degree to each of the generated rules for the purpose of resolving conflicts among them; Step 4 creates a combined fuzzy rule base based on both the generated rules and linguistic rules provided by human experts; Step 5 determines a mapping from the input space to the output space based on the combined fuzzy rule base using a defuzzifying procedure. From Steps 1 to 5 it is evident that this method is simple and straightforward, in the sense that it is a one-pass build-up procedure that does not require time-consuming training. In our implementation the output space could not be divided in Step 1, because we had no information about its boundaries. For this reason we used fuzzy rules which have as consequents a real number sj associated with each of the M outputs: if x1 is S1 and . . . and xN is SN then y1 = s1, . . . , yM = sM, where the Si are the fuzzy sets associated with the N inputs, which in our implementation are described by Gaussian functions that intersect at a fuzzy degree of 0.5. The choice of the Gaussian function was due to the better performance the fuzzy system gave in our preliminary tests as compared to that with classical triangular sets. In this work the fuzzy rules were generated from data as follows: for each of the N inputs (xi) the fuzzy set Si with the greatest degree of truth out of those belonging to the term set of the i-th input is selected. After constructing the set of antecedents, the consequent values yj are set equal to the values of the outputs. The rule is then assigned a degree equal to the product of the N highest degrees of truth associated with the chosen fuzzy sets Si. Let us assume, for example, that we are given a set of two-input, one-output data pairs (x1, x2; y), and a total of four fuzzy sets (respectively LOW1, HIGH1 and LOW2, HIGH2) associated with the two inputs. Let us also assume that x1 has a degree of 0.8 in LOW1 and 0.2 in HIGH1, that x2 has a degree of 0.4 in LOW2 and 0.6 in HIGH2, and that y = 25.75. As can be seen from Fig. 2, the fuzzy sets with the highest degree of truth are LOW1 and HIGH2, so the rule generated would be: if x1 is LOW1 and x2 is HIGH2 then y = 25.75. The rule degree is 0.8 × 0.6 = 0.48.
Figure 2. Fuzzy Rule Generation Example.
The rules generated in this way are "and" rules, i.e., rules in which the conditions of the IF part must be met simultaneously in order for the result of the THEN part to occur. Steps 2 to 4 are iterated with the Genetic Algorithm: after every simulation a fuzzy rule is created and inserted into the rule base, according to its degree if there are conflicts. The defuzzifying procedure chosen for Step 5 was, as suggested in [15], the weighted sum of the values (ȳi) estimated by the K rules, with the degree of truth (mi) of the pattern to be estimated as the weight:
ẏ = ( Σ_{i=1}^{K} mi ȳi ) / ( Σ_{i=1}^{K} mi )
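The following Python sketch illustrates the rule-generation and weighted-sum defuzzification steps described above, assuming Gaussian membership functions and using MIN as the conjunction operator when matching a rule against a new pattern (the conjunction operator is not fixed explicitly above); all set parameters and sample data are illustrative.

    import math

    # Minimal sketch of rule generation and weighted-sum defuzzification, under the
    # assumptions stated above; centres/widths and the sample data are illustrative.
    def gaussian(x, centre, sigma):
        return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))

    # Term sets for two inputs: {label: (centre, sigma)}
    TERMS = [
        {"LOW": (0.0, 0.5), "HIGH": (1.0, 0.5)},   # input x1
        {"LOW": (0.0, 0.5), "HIGH": (1.0, 0.5)},   # input x2
    ]

    def make_rule(inputs, output):
        """Pick, for each input, the fuzzy set with the greatest degree of truth."""
        antecedent, degree = [], 1.0
        for x, terms in zip(inputs, TERMS):
            label, mu = max(((lb, gaussian(x, c, s)) for lb, (c, s) in terms.items()),
                            key=lambda t: t[1])
            antecedent.append(label)
            degree *= mu                      # rule degree = product of the highest degrees
        return tuple(antecedent), output, degree

    def defuzzify(rule_base, inputs):
        """Weighted sum of rule consequents, weighted by the rules' degrees of truth."""
        num = den = 0.0
        for antecedent, y_bar, _ in rule_base:
            m = 1.0
            for x, label, terms in zip(inputs, antecedent, TERMS):
                c, s = terms[label]
                m = min(m, gaussian(x, c, s))  # "and" rule: conjunction via MIN (assumption)
            num += m * y_bar
            den += m
        return num / den if den else 0.0

    rules = [make_rule((0.1, 0.9), 25.75)]     # one rule learned from one data pair
    print(defuzzify(rules, (0.2, 0.8)))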
4. Experiments and Results
Using the framework described in subsection 4.1 we conducted an exploration of the low-power/high-performance design space for a set of typical media and communication applications from MediaBench [11]. To assess the performance of the proposed approach we used some quality measures, which are described in subsection 4.2.
4.1. Simulation Framework
To evaluate and compare the performance indexes of different architectures for a specific application, one needs to simulate the architecture running the code of the application. In addition, to make architectural exploration possible, both the compiler and the simulator have to be retargetable. Trimaran [1] provides these tools and thus represents the pillar around which the EPIC-Explorer was constructed [5] to provide a framework that allows any instance of the architecture to be evaluated in terms of area, performance and power. The EPIC-Explorer platform, which can be freely downloaded from the Internet [13], allows the designer to evaluate any application written in C and compiled for any instance of the platform; for this reason it is an excellent testbed for comparison between different design space exploration algorithms. The tunable parameters of the architecture can be classified in three main categories:
• Register files. Each register file is parameterized with respect to the number of registers it contains. These register files include: a set of 32-bit general purpose registers (GPR), a set of 64-bit floating point registers (FPR), a set of predicate registers comprising 1-bit registers used to store Boolean values (PR), and a set of 64-bit branch target registers containing information about possible future branches (BTR).
• Functional units. Four different types of functional units are available: integer, floating point, memory and branch. Here parametrization regards the number of instances of each unit.
• Memory sub-system. Each of the three caches (level 1 data cache, level 1 instruction cache, and level 2 unified cache) is independently specified with the following parameters: size, block size and associativity.
Each of these parameters can be assigned a value from a finite set of values. A complete assignment of values to all the parameters is a configuration. A complete collection of all possible configurations is the configuration space (also known as the design space). A configuration of the system generates an instance that is simulated and evaluated for a specific application. Together with the configuration of the system, the statistics produced by simulation contain all the information needed to apply the area, performance and power consumption estimation models. The results obtained by these models are the input for the exploration strategy, the aim of which is to modify the parameters of the configuration so as to minimize the three objectives.
4.2. Quality Assessment of Pareto set approximations
It is difficult to define appropriate quality measures for Pareto set approximations. Nevertheless, quality measures are necessary in order to compare the outcomes of
multi-objective optimizers in a quantitative manner, and several quality measures have been proposed in the literature; a review of these is to be found in [20]. We chose the one we considered most suitable for our context: the Pareto Dominance index. The value this index takes for a Pareto set is equal to the ratio between the number of points of that set which are also present in a reference Pareto set R and the total number of points in the set. In our comparisons there was no absolute reference set, so we used in its stead the Pareto set deriving from the union of the set obtained via the MOGA approach and that obtained with the MOGA-Fuzzy approach. In this way we were able to make a more direct comparison between the two approaches. In this case a higher value obviously corresponds to a better Pareto set.
4.3. Numerical Results
In this subsection we present a comparison between the performance of random search and that of our new MOGA-Fuzzy approach. The internal and external populations for the genetic algorithm were set to 30 individuals, using a crossover probability of 0.8 and a mutation probability of 0.1. These values were set following an extended tuning phase. The convergence times and accuracy of the results were evaluated with various crossover and mutation probabilities, and it was observed that the performance of the algorithm with the various benchmarks was very similar. That is, the optimal MOGA parameters seem to depend much more on the architecture of the system than on the application. This makes it reasonable to assume that the MOGA parameter tuning phase only needs to be performed once (possibly on a significant set of applications). The MOGA parameters thus obtained can then be used to explore the same platform for different applications. The eighteen input parameters, the parameter space and the number of fuzzy sets associated with them are listed in Table 2.

Table 2. Parameters Space and associated Fuzzy Sets.

  Parameter                    Parameter space           N. of Fuzzy Sets
  GPR / FPR                    16, 32, 64, 128           3
  PR / CR                      32, 64, 128               3
  BTR                          8, 12, 16                 3
  Integer/Float Units          1, 2, 3, 4, 5, 6          5
  Memory/Branch Units          1, 2, 3, 4                3
  L1D/I cache size             1KB, 2KB, ..., 128KB      9
  L1D/I cache block size       32B, 64B, 128B            3
  L1D/I cache associativity    1, 2, 4                   3
  L2U cache size               32KB, 64KB, ..., 512KB    9
  L2U cache block size         64B, 128B, 256B           3
  L2U cache associativity      2, 4, 8, 16               3
  Space size                   7.7397 × 10^10            2.9 × 10^10

As can be seen from Table 2, the size of the configuration space is such that, to be able to explore all the configurations in a 365-day year, a simulation would need to last about 3 ms, a value which is several orders of magnitude away from the time usually needed by a simulation, which could last minutes. A whole human lifetime would not, in fact, be long enough to obtain complete results for an exhaustive exploration of any of
the existing benchmarks, even using the most powerful PC currently available. It is, of course, possible to turn to High Performance Computing systems, but this is extremely expensive and would still require a very long time. Table 2 also gives the number of fuzzy sets associated with each parameter. This number was obtained by means of a GA in a series of preliminary tests. Table 3 and Fig. 3 show that after 50 generations the MOGA-Fuzzy approach yields a Pareto set which in many points dominates the set provided by a longer random search, thus being of more benefit to the designer. This is numerically expressed by the higher Pareto Dominance value, which tells us that this set is qualitatively better. Another interesting feature of the proposed approach is that at the end of the genetic evolution we obtain a fuzzy system that can approximate any configuration. The designer can exploit this to conduct a more in-depth exploration. The average estimation error of the fuzzy system obtained by MOGA-Fuzzy after 100 generations, on a set of 10,000 randomly chosen configurations other than those used in the learning phase, is from 4% to 9% for each objective, and despite the great number of rules the time needed for rule learning and estimation on a Pentium IV 2.8 GHz workstation is several orders of magnitude shorter than that of a simulation: the average learning time is about 1 ms and the average estimation time is about 2 ms.

Table 3. A comparison between Random and MOGA-Fuzzy Pareto sets

                      Simulations               Pareto Dominance
  Benchmark           Random      MOGA-Fuzzy    Random      MOGA-Fuzzy
  adpcm-encode        7500        862           0.6190      0.7353
  adpcm-decode        7500        965           0.6857      0.8333
  g721-encode         7500        807           0.5592      0.7135
  mpeg2-dec           5000        750           0.5271      0.7278
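A sketch of how the Pareto Dominance comparison in Table 3 can be computed, under our reading of the index as the fraction of a front's points that remain non-dominated in the reference set built from the union of the two approximations; the sample fronts are illustrative (execution time in ms, power in W), not measured data.

    # Sketch of the Pareto Dominance comparison (our reading of the index); all
    # numeric data below are illustrative, not the paper's results.
    def dominates(a, b):
        """True if objective vector a dominates b (minimization, component-wise)."""
        return all(x <= y for x, y in zip(a, b)) and a != b

    def pareto_filter(points):
        """Keep only the non-dominated points of a collection."""
        return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

    def pareto_dominance_index(front, reference):
        return sum(1 for p in front if p in reference) / len(front)

    random_front = [(50.0, 1.80), (47.0, 1.85), (46.0, 1.95)]      # (ms, W), illustrative
    moga_fuzzy_front = [(49.0, 1.78), (46.0, 1.90), (44.5, 1.98)]
    reference = pareto_filter(random_front + moga_fuzzy_front)     # union as reference set

    print(pareto_dominance_index(random_front, reference))
    print(pareto_dominance_index(moga_fuzzy_front, reference))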
Figure 3. A comparison between the Pareto sets (execution time in ms versus power in W) obtained for the adpcm-dec benchmark by MOGA-Fuzzy after 50 generations and 965 simulations, and Random search with 7500 simulations.
5. Conclusion
In this paper we have presented a new Intelligent Genetic Fuzzy approach to speed up Design Space Exploration. The speedup is achieved thanks to the ability of the Genetic Fuzzy System to learn from its experience in order to avoid unnecessary simulations. A comparison with the random search approach performed on various multimedia benchmarks showed that integration with the fuzzy system saves a great amount of time and gives better results. Despite the excellent results obtained, we are currently working on improving the algorithm even further. The use of a hierarchical fuzzy system [17], for example, could prove suitable for improving its approximation performance and thus the overall performance of the approach. Further developments may involve the use of acquired knowledge to create a set of generic linguistic rules to speed up the learning phase, providing an aid for designers and a basis for teaching.
References
[1] An infrastructure for research in instruction-level parallelism. http://www.trimaran.org/.
[2] World semiconductor trade statistics bluebook. http://www.wsts.org/, 2003.
[3] Giuseppe Ascia, Vincenzo Catania, and Maurizio Palesi, ‘A GA based design space exploration framework for parameterized system-on-a-chip platforms’, IEEE Transactions on Evolutionary Computation, 8(4), 329–346, (August 2004).
[4] Giuseppe Ascia, Vincenzo Catania, and Maurizio Palesi, ‘A multi-objective genetic approach for system-level exploration in parameterized systems-on-a-chip’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4), 635–645, (April 2005).
[5] Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi, and Davide Patti, ‘EPIC-Explorer: A parameterized VLIW-based platform framework for design space exploration’, in First Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia), pp. 65–72, Newport Beach, California, USA, (October 3–4 2003).
[6] Henry Chang, Larry Cooke, Merrill Hunt, Grant Martin, Andrew McNelly, and Lee Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1999.
[7] William Fornaciari, Donatella Sciuto, Cristina Silvano, and Vittorio Zaccaria, ‘A sensitivity-based design space exploration methodology for embedded systems’, Design Automation for Embedded Systems, 7, 7–33, (2002).
[8] Tony Givargis, Frank Vahid, and Jörg Henkel, ‘System-level exploration for Pareto-optimal configurations in parameterized System-on-a-Chip’, IEEE Transactions on Very Large Scale Integration Systems, 10(2), 416–422, (August 2002).
[9] Thaddeus J. Kowalski, An Artificial Intelligence Approach to VLSI Design, Kluwer Academic Publishers, Norwell, MA, USA, 1985.
[10] David Ku and Giovanni De Micheli, ‘Relative scheduling under timing constraints’, in ACM/IEEE Conference on Design Automation, pp. 59–64, Orlando, Florida, United States, (1990).
[11] Chunho Lee, Miodrag Potkonjak, and William H. Mangione-Smith, ‘MediaBench: A tool for evaluating and synthesizing multimedia and communications systems’, in International Symposium on Microarchitecture, (December 1997).
[12] Chun Wai Liew, ‘Using feedback to improve VLSI designs’, IEEE Expert: Intelligent Systems and Their Applications, 12(1), 67–73, (1997).
[13] Davide Patti and Maurizio Palesi. EPIC-Explorer. http://epic-explorer.sourceforge.net/, July 2003.
[14] Narinder Singh, An Artificial Intelligence Approach to Test Generation, Springer, 1990.
[15] Li-Xin Wang and Jerry M. Mendel, ‘Generating fuzzy rules by learning from examples’, IEEE Transactions on Systems, Man and Cybernetics, 22, 1414–1427, (1992).
[16] Ke Zeng, Nai-Yao Zhang, and Wen-Li Xu, ‘A comparative study on sufficient conditions for Takagi-Sugeno fuzzy systems as universal approximators’, IEEE Transactions on Fuzzy Systems, 8(6), 773–778, (December 2000).
[17] Xiao-Jun Zeng and John A. Keane, ‘Approximation capabilities of hierarchical fuzzy systems’, IEEE Transactions on Fuzzy Systems, 13(5), 659–672, (October 2005).
[18] Xiao-Jun Zeng and Madan G. Singh, ‘Approximation accuracy analysis of fuzzy systems as function approximators’, IEEE Transactions on Fuzzy Systems, 4, 44–63, (February 1996).
[19] Eckart Zitzler, Marco Laumanns, and Lothar Thiele, ‘SPEA2: Improving the performance of the strength Pareto evolutionary algorithm’, in EUROGEN 2001. Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pp. 95–100, Athens, Greece, (September 2001).
[20] Eckart Zitzler, Lothar Thiele, Marco Laumanns, Carlos M. Fonseca, and Viviane Grunert da Fonseca, ‘Performance assessment of multiobjective optimizers: An analysis and review’, IEEE Transactions on Evolutionary Computation, 7(2), 117–132, (April 2003).
FUNEUS: A Neurofuzzy Approach Based on Fuzzy Adaline Neurons
Constantinos KOUTSOJANNIS and Ioannis HATZILYGEROUDIS
Department of Computer Engineering & Informatics, School of Engineering, University of Patras, Hellas (Greece)
Abstract.
Today hybrid computing is a popular framework for solving complex problems. If we have knowledge expressed in rules, we can build an Expert System, and if we have data, or can learn from stimulation (training), then we can use Artificial Neural Networks. In this paper we present the FUzzy NEUrule System (FUNEUS), which is a Neuro-Fuzzy approach based on fuzzy Adaline neurons that uses Differential Evolution for the optimization of membership functions. Following previous Neuro-Fuzzy approaches and a well-defined hybrid system, HYMES, FUNEUS is an attempt in the direction of integrating neural and fuzzy components with Differential Evolution. Despite the fact that it remains difficult to compare neurofuzzy systems conceptually and to evaluate their performance, early experimental results showed promising performance and the need for further evaluation in other application domains.
Key words. Hybrid systems, Neurofuzzy architecture, Differential evolution, fuzzy adaline
1. Introduction
Hybrid systems mix different methods of knowledge engineering and make them “work together” to achieve a better solution to a problem, compared to using a single method for the same problem [1, 2]. Hybrid connectionist production systems incorporate artificial neural networks (ANNs) into production rules with respect to approximate reasoning and learning [2, 3]. Fuzzy inference modules are incorporated into production rules in a similar way [4, 5]. Today Neuro-Fuzzy (NF) computing is a popular framework for solving complex problems. If we have knowledge expressed in linguistic rules, we can build a Fuzzy Expert System (FIS), and if we have data, or can learn from stimulation (training), then we can use ANNs. For building a FIS, we have to specify the fuzzy sets, fuzzy operators and the knowledge base. Similarly, for constructing an ANN for an application the user needs to specify the architecture and learning algorithm. An analysis reveals that the drawbacks pertaining to these approaches seem complementary, and therefore it is natural to consider building an integrated system combining the two concepts. While the learning capability is an advantage from the viewpoint of a FIS, the formation of a linguistic rule base is an advantage from the viewpoint of an ANN [4, 5]. Interestingly, this synergy is still a target yet to be fully achieved. The essence of a successful synergy relies on the retention of the well-defined identity of the two contributing technologies. In most systems (FuzzyCOPE, ANFIS, NEFCLASS, FuNN, etc.) one of the two technologies becomes
predominant, resulting in a commonly visible accuracy-interpretability trade-off [6, 7, 8]. Generally, because the approximation abilities are easier to quantify, and eventually to realize scientifically or technologically, the usual result is a tendency for far more attention to be placed on the neural side of most NF systems, with the approximation capabilities being highly “glorified”, even supported with Evolutionary Programming (EP) techniques, and the interpretation abilities being quietly reduced (FUZZNET, FNES, FALCON, ANFIS, NEFCON, FINEST, FuNN, NEFCLASS, etc.) [4, 3, 5]. As a guideline, for NF systems to be highly intelligent some of the major requirements are: fast learning (memory based, with efficient storage and retrieval capacities), on-line adaptability (accommodating new features like inputs, outputs, nodes, connections, etc.), achievement of a low global error rate, and computational inexpensiveness [3, 4, 7]. Data acquisition and the pre-processing of training data are also quite important for the success of all NF systems [8, 9]. The underlying conjecture of the above is that future NF systems should be constructed on a simple processing unit [4]: a Fuzzy logic Neuron (FN) that includes fuzzy data and fuzzy logic operations in its unit, whose inputs and/or weights are also expressed in terms of membership functions, and whose transparency and learning abilities are accentuated to the highest possible level, as already proposed in the literature [3, 10]. In this paper we introduce fuzzy neurules, a kind of rules that incorporate fuzzy Adaline units for training and adaptivity purposes. However, pre-eminence is still given to the symbolic component. Thus, the constructed knowledge base retains the modularity of fuzzy rules, since it consists of autonomous units (fuzzy neurules), and their naturalness, since they look much like symbolic rules. In previous papers, we have also described a similar method, included in HYMES, for generating neurules directly from empirical (training) data [2]. In this paper we present and evaluate a new architecture, called FUNEUS, in order to work with fuzzy data. Preliminary experimental results show promising performance.
The structure of the paper is as follows. Section 2 presents the hybrid connectionist system and the corresponding architecture. Section 3 presents the basic ideas and architecture of fuzzy neurules. In Section 4, the system architecture and parameter adjustment of FUNEUS is described. In Section 5 the hybrid inference mechanism is presented. Section 6 contains experimental results for our model validation. Finally, Section 7 discusses related work.
2. Low level Hybrid Systems and Fuzzy Neurules
A hybrid system is a mixture of methods and tools used in one system which are loosely [1, 4] or tightly coupled [1, 4] from the functional point of view, and hierarchical [3], flat [3], sequential [3, 4] or parallel [3, 4] from the architectural point of view. Additionally, there are systems that are blended at such a low structural and functional level that they are not separable from the functional and structural point of view [3, 4, 6]. Most Low level Hybrid Connectionist System models use variants of McCulloch and Pitts' neurons to build a network.
2.1 Fuzzy Neurules
2.1.1 Symbolic Rules in Neural Networks (NN): Connectionist Expert Systems
Building a connectionist rule base is possible not only by training a neural network with a set of data examples but also by inserting existing rules into a neural network structure [5, 7].
This approach brings the advantages of connectionism, namely learning,
generalization, modularity, robustness, massive parallelism, etc., to the elegant methods for symbolic processing, logical inference and goal-driven reasoning. Both paradigms can be blended at a low, neuronal level, and structural knowledge can be built up in a neuron and in a NN realized as a connectionist rule-based system [3, 4].
2.1.1.1 Representing symbolic knowledge as a NN
Representation of symbolic knowledge in the form of production rules in a NN structure requires appropriate structuring of the NN and special methods. A NN may have fixed connections, meaning that the NN cannot learn and improve its knowledge, or adaptable connections, meaning that the NN can learn in addition to its inserted structured knowledge. A great advantage of using NNs for implementing rule-based systems is the capacity that they provide for approximate reasoning. This is true only if the neurons used in the network allow grading; if they are binary, only exact reasoning is possible [8, 9].
2.1.1.2 Neurons and NNs that represent Simple Symbolic Rules
A Boolean propositional rule of the form IF x1 and x2 and … xn THEN y, where xi (i = 1, 2, …, n) and y are Boolean propositions, can be represented in a binary-input, binary-output neuron which has a simple summation input function and an activation threshold function f. Similarly, the Boolean propositional rule IF x1 or x2 or … xn THEN y can be realized in a similar binary neuron but with different connection weights and thresholds [7]. These neurons cannot learn. The two simple neurons can be used for building NNs that represent a whole set of rules, but which are not adaptable. Symbolic rules that contain different types of uncertainty can also be realized in a connectionist structure. These include rules where uncertainty is expressed by probabilities, in which case the neuron is set in such a way that it calculates conditional probabilities, as well as rules with confidence factors, that is: IF x1 is A1 and x2 is A2 and … xn is An THEN B (Cf). Such a rule can be realized either by [5]: 1. inserting the rule into the connections of an n-input, one-output neuron, or 2. applying a training procedure to a neuron with training examples whose input and output values represent certainties for existing facts matching the condition elements and confidence for the inferred conclusion. Realizing that more than one rule in a single neuron of the Perceptron type may not be appropriate, bearing in mind the restrictions of Perceptron neurons pointed out by a number of authors [5, 7], the neurules presented in [2] have been developed as a kind of connectionist production system that incorporates Adaline-type neurons to represent sets of simple symbolic rules. In the next sections we describe fuzzy neurules, which are a kind of fuzzy rules in a fuzzy expert system incorporating Fuzzy-Adaline neurons to represent sets of simple fuzzy rules.
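As a small illustration of the rule-to-neuron mapping described above, the following Python sketch encodes a Boolean AND rule and an OR rule as binary threshold neurons; the particular weights and thresholds are one common choice, not necessarily the exact ones used in the cited systems.

    # Sketch: Boolean AND and OR rules as binary threshold neurons. Weights and
    # thresholds are one common choice, not necessarily the cited systems' values.
    def threshold_neuron(inputs, weights, threshold):
        """Fires (returns 1) when the weighted sum of binary inputs reaches the threshold."""
        return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

    def and_rule(inputs):            # IF x1 and x2 and ... xn THEN y
        n = len(inputs)
        return threshold_neuron(inputs, [1.0] * n, threshold=n)            # all inputs must be 1

    def or_rule(inputs):             # IF x1 or x2 or ... xn THEN y
        return threshold_neuron(inputs, [1.0] * len(inputs), threshold=1)  # any input suffices

    print(and_rule([1, 1, 1]), and_rule([1, 0, 1]))   # 1 0
    print(or_rule([0, 0, 1]), or_rule([0, 0, 0]))     # 1 0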
2.2 Integrating Fuzzy Neurules with Fuzzy ADALINE neurons
2.2.1 The Fuzzy Logic neuron
A fuzzy neuron has the following features, which distinguish it from ordinary types of neurons [4, 5]:
• The inputs of the neuron x1, x2, …, xn represent fuzzy labels of the fuzzy input variables.
• The weights are replaced by the membership functions μi of the fuzzy labels xi (i = 1, 2, …, n).
• Excitatory connections are represented by the MIN operation and inhibitory connections by fuzzy logic complements followed by the MIN operation.
• A threshold level is not assigned.
In the fuzzy neuron there is no learning: the membership functions attached to the synaptic connections do not change. The neo-fuzzy neuron [10] is a further development of the fuzzy neuron, with the new features that: a. it incorporates additional weights which are subject to change during training; b. it works with standard triangular membership functions, so that only two membership functions are activated simultaneously by an input; and consequently c. it has proved faster in training and better in accuracy even than a three-layered feedforward NN with the backpropagation algorithm. The underlying conjecture of the above is that new NF systems should be constructed on a simple processing unit, a Fuzzy logic Neuron (FN), that includes fuzzy data and fuzzy logic operations in its unit, whose inputs and/or weights are also expressed in terms of membership functions, and whose transparency and learning abilities are accentuated to the highest possible level [4]. Types of fuzzy neurons have been successfully applied to prediction and classification problems [5, 9].
2.2.2 The Fuzzy ADALINE neuron
The Widrow-Hoff ADALINE can be thought of as the smallest, linear building block of artificial neural networks. This element has been extensively used in science, statistics (in linear regression analysis), engineering (adaptive signal processing, control systems), and so on. Recently, Fuzzy ADALINE neurons were developed by introducing the following modifications to ADALINEs [6]: 1. The signum-type non-linearity has been replaced by a two-level clipper. 2. The position of the non-linear function block has been shifted inside the loop, unlike the case of Widrow's model, where it was on the forward path outside the loop. 3. A provision for stretching the linear portion of the non-linearity has been incorporated in the model. In the proposed model, with the input and output constrained to the range [-1, +1], the unity slope in the non-linearity of the neuron will fail to yield the desired output whenever the target signal is higher than the average of the inputs. In order to circumvent this situation, an adaptive algorithm for adjusting the slope of the linear portion of the non-linear clipper function has been developed. In order for fuzzy rules to work in a fuzzy environment we incorporated such a “fuzzy” neuron to produce sets of rules called fuzzy neurules. Each fuzzy neurule is now considered as a fuzzy Adaline unit that has replaced a number of fuzzy rules in the rule base of a fuzzy system. However, pre-eminence is still given to the fuzzy component. Thus, the constructed knowledge base retains the modularity of production rules, since it consists of autonomous units (fuzzy neurules), and their naturalness, since fuzzy neurules look much like fuzzy rules.
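A minimal sketch of the fuzzy logic neuron characterised above: membership functions play the role of weights, inhibitory connections use the fuzzy complement, and the MIN operation acts as the conjunction. The triangular membership parameters and inputs are illustrative only.

    # Sketch of the fuzzy logic neuron described above; all parameters illustrative.
    def triangular(x, a, b, c):
        """Triangular membership function with feet a, c and peak b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def fuzzy_neuron(inputs, memberships, inhibitory):
        """inputs: crisp values; memberships: one (a, b, c) triple per input;
        inhibitory: flags marking which connections are inhibitory."""
        degree = 1.0
        for x, (a, b, c), inhib in zip(inputs, memberships, inhibitory):
            mu = triangular(x, a, b, c)
            if inhib:
                mu = 1.0 - mu                 # fuzzy complement for inhibitory connections
            degree = min(degree, mu)          # MIN as the conjunction operator
        return degree

    # Two excitatory inputs and one inhibitory one (illustrative values).
    print(fuzzy_neuron([38.5, 0.2, 0.7],
                       [(37.0, 39.0, 41.0), (0.0, 0.25, 0.5), (0.0, 1.0, 2.0)],
                       [False, False, True]))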
3. Fuzzy Neural Networks
A Fuzzy Neural Network (FNN) is a connectionist model for fuzzy rule implementation and inference. There is a great variety of architectures and functionalities of FNNs. The FNNs developed so far differ mainly in the following parameters:
• Type of fuzzy rules implemented, which affects the connectionist structure used.
• Type of inference method implemented, which affects the selection of different neural network parameters and neuronal functions, such as the summation, activation and output functions. It also affects the way the connection weights are initialized before training and interpreted after training.
• Mode of operation, which can be one of three modes: a) fixed mode, with fixed membership functions and a fixed set of rules, that is, a fixed set of rules inserted in a network which performs the inference but does not change its weights, resulting in no learning and adaptation [5]; b) learning mode, where the network is structurally defined to capture knowledge in the format of fuzzy rules and, after random initialization and training with a set of data, the set of fuzzy rules is finally extracted from the structured network; and c) adaptation mode, where the network is structurally set according to a set of fuzzy rules and heuristics and then, after training with a set of data, updated rules are extracted with: 1. fixed membership functions and adaptable rules, or 2. adaptable membership functions and adaptable rules.
FNNs have two major aspects: a. the structural aspect, which refers to the type of neurons that are used, i.e. multilayer perceptrons (MLPs) and radial-basis functions. FuNN is a characteristic example of adaptable FNNs that uses an MLP with a backpropagation training algorithm, with adaptable membership functions of the fuzzy predicates and adaptable fuzzy rules [5].
4. FUNEUS
4.1 FUNEUS: A Hybrid Fuzzy Connectionist Production System
4.1.1 System Structural Components
4.1.1.1 The Fuzzy Neural Network
NF systems can be broadly classified into two types: a) weakly coupled systems and b) tightly coupled systems [6].
Figure 1: A general diagram of FUNEUS
A weakly coupled NF system employs both a neural net unit and a fuzzy system unit in cascade. In a tightly coupled system, neurons, the basic elements of a NN, are constructed by amalgamating the composite characteristics of both a neuronal element and fuzzy logic [4, 11].
4.1.1.2 The GA component
Additionally, a Fuzzy-Genetic algorithm synergism is used for the optimization of the membership functions, which are intuitively chosen in a fuzzy system [6]. Given that the optimization of fuzzy membership functions may involve many changes to many different functions, and that a change to one function may affect others, the large possible solution space for this problem is a natural candidate for a GA-based approach, despite the fact that many NF approaches use a gradient-descent learning algorithm to fine-tune the parameters of the fuzzy system. The GA module for adapting the membership function parameters acts as a stand-alone system that already has the if-then rules. The GA optimises the antecedent and consequent membership functions. The Differential Evolution (DE) algorithm used here is a very simple population-based, stochastic function minimizer which is very powerful and at the same time has turned out to be the best genetic type of algorithm for solving real-valued test functions [14].
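The following Python sketch shows a classical Differential Evolution loop minimising an error function over a flat vector of membership-function parameters (for instance, the side lengths of the triangular membership curves discussed below). The objective, bounds and control parameters are illustrative; the exact DE variant and settings used in FUNEUS are not detailed here.

    import random

    # Sketch of a classical DE loop over membership-function parameters; the toy
    # objective, bounds and control parameters (F, CR, population size) are ours.
    def differential_evolution(objective, dim, bounds=(0.0, 1.0),
                               pop_size=20, F=0.8, CR=0.9, generations=100):
        lo, hi = bounds
        pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
        scores = [objective(ind) for ind in pop]
        for _ in range(generations):
            for i in range(pop_size):
                a, b, c = random.sample([j for j in range(pop_size) if j != i], 3)
                trial = []
                for d in range(dim):
                    if random.random() < CR:
                        # mutation: base vector plus scaled difference of two others
                        v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                        trial.append(min(max(v, lo), hi))
                    else:
                        trial.append(pop[i][d])
                score = objective(trial)
                if score < scores[i]:          # greedy selection
                    pop[i], scores[i] = trial, score
        best = min(range(pop_size), key=lambda k: scores[k])
        return pop[best], scores[best]

    # Toy objective: squared distance of the parameter vector from a "desired" response.
    target = [0.3, 0.7, 0.5, 0.9]
    best, err = differential_evolution(lambda p: sum((x - t) ** 2 for x, t in zip(p, target)),
                                       dim=len(target))
    print(best, err)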
Figure 2. Membership functions adjustment by GA
The crucial idea behind DE is a scheme for generating trial parameter vectors. For this architecture, the evolution of membership functions proceeds at a faster time scale in an environment usually decided by the problem, the architecture and the inference system. Here we use evolution for tuning the triangular membership functions, using a mutation function of the form Xnew − Xold = F(y + z), where x, y, z and F are random numbers (with fuzzy or crisp values), and a mutation function for curves with the representation (x(a), y(b)). The GA used in the system is, in essence, the same as the DE genetic algorithm, with the important exception that the chromosomes are represented as strings of floating-point numbers rather than strings of bits. According to this NF-GA synergism, let x, y and z be three fuzzy variables and µA(x), µB(x), µC(x) the three fuzzy membership functions of the fuzzy variable x with respect to the fuzzy sets A, B and C respectively. Similarly, we have the membership functions µA(y), µB(y), µC(y) and µA(z), µB(z), µC(z) for the variables y and z respectively. These functions have been chosen intuitively. Now, for the optimization of the membership curves, which are isosceles triangles, we denote the two sides by (x(a), y(b)) for each variable. The chromosome in the present context
thus has 18 fields, two for each membership curve on the input side of the system and one more field for the desired output side of the system. The crossover and mutation operations in the present context are realized in a convenient way in correspondence with the specified mutation function; they are not presented here in detail because they essentially amount to a comparison of the system responses F(x,y,z) with the desired system responses Fd(x,y,z) specified by the application (Fig. 8).
4.1.2 Fuzzy Neurule Base Construction
We use the simple and straightforward method of [2] and the method proposed by Wang and Mendel [16] for generating fuzzy rules from numerical input-output training data. The task here is to generate a set of fuzzy rules from the desired input-output pairs and then use these fuzzy rules to determine the complete structure of the rule base. The algorithm for constructing a hybrid rule base from training data is outlined below:
1. Determine the input and output variables (fuzzy) and use dependency information to construct an initial fuzzy rule for each intermediate and output variable.
2. Determine the training set for each initial fuzzy rule from the training data pairs, train each initial fuzzy rule using its training input-output set, and produce the corresponding fuzzy neurule(s).
3. Put the produced fuzzy neurules into the fuzzy neurule base.
In the sequel, we elaborate on each of the first two steps of the algorithm.
4.1.3 Constructing the initial fuzzy neurules
To construct the initial neurules, first we need to know or determine the input, intermediate and output variables. Then, we need dependency information. Dependency information indicates which variables the intermediate and output variables (concepts) depend on. If dependency information is missing, then the output variables depend only on the input variables, as indicated by the training data. In constructing an initial fuzzy neurule, all the conditions involving the input, intermediate and output variables that contribute to drawing a conclusion (which includes an intermediate or an output variable) constitute the inputs of the initial fuzzy neurule, and the conclusion its output. So, a fuzzy neurule has as many conditions as the possible input, intermediate and output variable-value pairs. Also, one has to produce as many initial fuzzy neurules as the different intermediate and output variable-value pairs specified. Each fuzzy neurule is a fuzzy Adaline neuron with the fuzzified values as inputs, the membership functions as weights, and additional weights with initial values of '1'.
4.1.4 Training the initial neurules
From the initial training data, we extract as many (sub)sets as the initial fuzzy neurules. Each such set, called a training set, contains training examples of the form [x1 x2 … xn d], where xi, i = 1, …, n, are the component values, which correspond to the n inputs of the fuzzy neurule, and d is the desired output. Each training set is used to train the corresponding initial neurule and calculate its additional weights, which are initially set to '1'. So, step 2 of the algorithm for each initial fuzzy neurule is analyzed as follows:
2.1 From the initial training data, produce as many initial training sets (x1 x2 … xn d) as the number of initial fuzzy neurules.
2.2 Assign (x1 x2 … xn d) to the region that has the maximum degree, resulting in one value with one membership function for each input parameter.
2.3 For each set of desired input-output pairs we obtain, do the following:
2.3.1 Obtain one fuzzy neurule, e.g., If x1 is low and x2 is high … then d is normal.
2.3.2 Assign an additional weight to each fuzzy neurule. The rule weight is defined as CFi = µA(x1)µB(x2)…µC(d). This step is further performed to delete redundant rules, and therefore obtain a concise fuzzy neurule base.
2.3.3 Produce the corresponding fuzzy neurule, which has the form: If x1 is A and x2 is B … then d is C (CF).
If two or more generated fuzzy neurules have the same conditions and consequents, then the rule that has the maximum degree in the desired output is used. In this way, assigning the additional weight to each rule, the fuzzy rule base can be adapted or updated by the relative weighting strategy: the more task-related the rule becomes, the more weight degree the rule gains. As a result, not only is the conflict problem resolved, but also the number of rules is reduced significantly [2]. Suppose for example that we are given the following set of desired input (x1, x2) - output (y) data pairs (x1, x2; y): (0.6, 0.2; 0.2), (0.4, 0.3; 0.4). In our system, the input variable fever has a degree of 0.8 in low and a degree of 0.2 in low. Similarly, the input variable itching has a degree of 0.6 in low and of 0.3 in medium. Secondly, assign x1_i, x2_i and y_i to the region that has the maximum degree. Finally, obtain one rule from one pair of desired input-output data, for example,
(x1^1, x2^1, y^1) => [x1^1 (0.8 in low), x2^1 (0.2 in low), y^1 (0.6 in normal)],
• R1: if x1 is low and x2 is medium, then y is normal;
(x1^2, x2^2, y^2) => [x1^2 (0.8 in low), x2^2 (0.6 in medium), y^2 (0.8 in normal)],
• R2: if x1 is low and x2 is high, then y is medium;
Assign a degree to each rule. To resolve a possible conflict problem, i.e., rules having the same antecedent but a different consequent, and to reduce the number of rules, we assign a degree to each rule generated from data pairs and accept only the rule from a conflict group that has the maximum degree. In other words, this step is performed to delete redundant rules, and therefore obtain a concise fuzzy neurule base. The following product strategy is used to assign a degree to each rule. For the rule denoted by Ri: if x1 is A and x2 is B, then y is C (wi), the rule additional weight is defined as wi = µA(x1)µB(x2)µC(y). In our example, R1 has a degree of W1 = µ_low(x1) µ_medium(x2) µ_normal(y) = 0.8 x 0.2 x 0.6 = 0.096, and R2 has a degree of W2 = µ_low(x1) µ_high(x2) µ_normal(y) = 0.8 x 0.6 x 0.8 = 0.384. Note that if two or more generated fuzzy rules have the same preconditions and consequents, then the rule that has the maximum degree is used. In this way, assigning the degree to each rule, the fuzzy rule base can be adapted or updated by the relative weighting strategy: the more task-related the rule becomes, the more weight degree the
rule gains. As a result, not only is the conflict problem resolved, but also the number of rules is reduced significantly. After the structure-learning phase (if-then rules), the whole network structure is established, and the network enters the second learning phase to optimally adjust the parameters of the membership functions using the GA algorithm to minimise the error function.
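To make the rule-generation and weighting scheme above concrete, the following is a minimal Python sketch of a Wang–Mendel-style procedure with product rule degrees and conflict resolution by maximum degree. The membership functions, breakpoints and data used here are illustrative assumptions, not the ones of the actual FUNEUS implementation.

```python
# Sketch only: Wang-Mendel-style rule generation with rule degrees.
def triangular(x, a, b, c):
    """Triangular membership function with peak at b (shoulders allowed)."""
    left = (x - a) / (b - a) if b > a else 1.0
    right = (c - x) / (c - b) if c > b else 1.0
    return max(0.0, min(left, right))

# Hypothetical fuzzy sets for each variable: label -> (a, b, c).
SETS = {
    "x1": {"low": (0.0, 0.0, 0.5), "medium": (0.2, 0.5, 0.8), "high": (0.5, 1.0, 1.0)},
    "x2": {"low": (0.0, 0.0, 0.5), "medium": (0.2, 0.5, 0.8), "high": (0.5, 1.0, 1.0)},
    "y":  {"low": (0.0, 0.0, 0.5), "normal": (0.2, 0.5, 0.8), "high": (0.5, 1.0, 1.0)},
}

def best_label(var, value):
    """Label with the maximum membership degree, and that degree."""
    degrees = {lab: triangular(value, *abc) for lab, abc in SETS[var].items()}
    lab = max(degrees, key=degrees.get)
    return lab, degrees[lab]

def generate_rules(pairs):
    """One rule per (x1, x2, y) pair; conflicts resolved by maximum degree."""
    rules = {}
    for x1, x2, y in pairs:
        (l1, d1), (l2, d2), (ly, dy) = best_label("x1", x1), best_label("x2", x2), best_label("y", y)
        weight = d1 * d2 * dy                  # CF = product of membership degrees
        antecedent = (l1, l2)
        if antecedent not in rules or weight > rules[antecedent][1]:
            rules[antecedent] = (ly, weight)   # keep only the highest-degree rule
    return rules

print(generate_rules([(0.6, 0.2, 0.2), (0.4, 0.3, 0.4)]))
```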
5. Inference through FUNEUS
A functional block diagram of the FUNEUS model consists of two phases of learning processes: a) The first phase is the structure-learning (if-then rules) phase, which uses the knowledge acquisition module to produce the fuzzy neurule base. b) The second phase is the parameter-learning phase, in which a Genetic Algorithm fine-tunes the parameters of the fuzzy membership functions to achieve a desired level of performance. In the connectionist structure, the input and output nodes represent the input states and output decision signals, respectively, and in the hidden layers there are nodes functioning as quantifications of membership functions (MFs) and if-then rules. As soon as the initial input data set is given and put in the Working Memory (WM), the resulting output fuzzy neurules are considered for evaluation. One of them is selected for evaluation. Selection is based on textual order. A rule succeeds if the output of the corresponding fuzzy Adaline unit is computed to be '1', after evaluation of its conditions. A condition evaluates to 'true' if it matches a fact in the WM, i.e., there is a fact with the same variable, predicate and value. A condition evaluates to 'unknown' if there is a fact with the same variable, predicate and 'unknown' as its value. A condition cannot be evaluated if there is no fact in the WM with the same variable. In this case, either the user is asked to provide data for the variable, in the case of an input variable, or an intermediate fuzzy neurule in the Fuzzy Neurule Base (FNRB) with a conclusion containing that variable is examined, in the case of an intermediate variable. A condition with an input variable evaluates to 'false' if there is a fact in the WM with the same variable and predicate but a different value. A condition with an intermediate variable evaluates to 'false' if, in addition to the latter, there is no unevaluated intermediate fuzzy neurule in the FNRB that has a conclusion with the same variable. Inference stops either when one or more output fuzzy neurules are fired (success) or when there is no further action (failure). In the training phase, the input membership functions and desired output values of the training data set are used in text form to fine-tune the parameters of the fuzzy sets with the offline use of the GA component.
6. Model Validation Using Testing Data Sets
6.1 Coronary Heart Disease Development (crisp and fuzzy data)
The experimental data set was taken from Takumi Ichimura and Katsumi Yoshida [11], who prepared a medical database named Coronary Heart Disease DataBase (CHD_DB), which makes it possible to assess the effectiveness of classification methods on medical data. The CHD_DB is based on actual measurements of the Framingham Heart Study - one of the most famous prospective studies of
cardiovascular disease. It includes more than 10,000 records related to the development of coronary heart disease (CHD), and its validity has been sufficiently confirmed by statistical analyses. We used FUNEUS to develop a diagnostic system that outputs whether each record is a non-CHD case or a CHD case. The data in the coronary heart disease database are divided into two classes: non-coronary heart disease cases (non-CHD) and coronary heart disease cases (CHD). Each patient's disorder is diagnosed according to the results of eight test items. The eight items tested are Cholesterol (TC), Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP) and Drinking (ALCOHOL), which are used as inputs with fuzzy values, and Left Ventricular Hypertrophy (LVH), Origin (ORIGIN), Education (EDUCATE) and Smoking (TABACCO), which are used as inputs with non-fuzzy (crisp) values. The fuzzy inference system was created using FUNEUS. We used triangular membership functions and each input variable was assigned three MFs. Ten fuzzy rules were created using the methodology mentioned in Section 3 (rule format: IF (…) THEN CHD):
Rule 1: (SBP high)
…
Rule 3: (TC high)
Rule 4: (SBP medium) AND (DBP > high) AND (LVH = 0) AND (EDUCATE < 2) AND (TABACCO > 0) AND (TABACCO < 3)
…
Rule 6: (TC medium) AND (DBP high) AND (ORIGIN = 1) AND (TABACCO > 0) AND (ALCOHOL low)
Rule 7: (TC medium) AND (SBP high) AND (DBP medium) AND (EDUCATE < 1) AND (TABACCO < 2)
…
Rule 10: (TC high) AND (SBP high) AND (DBP > high) AND (LVH = 0) AND (TABACCO > 0) AND (TABACCO < 3) AND (ALCOHOL medium)
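As a concrete illustration of how such mixed fuzzy/crisp rules can be evaluated, the Python sketch below fires a rule in the spirit of Rule 6 above. The triangular breakpoints, the use of the product as AND operator, and the sample record are assumptions made for the example, not values taken from the CHD_DB.

```python
# Illustrative evaluation of one mixed fuzzy/crisp CHD rule (assumed parameters).
def tri(x, a, b, c):
    left = (x - a) / (b - a) if b > a else 1.0
    right = (c - x) / (c - b) if c > b else 1.0
    return max(0.0, min(left, right))

# Assumed triangular fuzzy sets for the fuzzy inputs used by the rule.
TC_MEDIUM   = lambda v: tri(v, 160, 210, 260)   # total cholesterol (mg/dl)
DBP_HIGH    = lambda v: tri(v, 90, 110, 130)    # diastolic blood pressure (mmHg)
ALCOHOL_LOW = lambda v: tri(v, 0, 0, 10)        # drinks per week

def rule6_activation(record):
    """Degree to which the record matches:
    (TC medium) AND (DBP high) AND (ORIGIN = 1) AND (TABACCO > 0) AND (ALCOHOL low)."""
    if not (record["ORIGIN"] == 1 and record["TABACCO"] > 0):   # crisp conditions
        return 0.0
    return TC_MEDIUM(record["TC"]) * DBP_HIGH(record["DBP"]) * ALCOHOL_LOW(record["ALCOHOL"])

patient = {"TC": 220, "DBP": 105, "ORIGIN": 1, "TABACCO": 2, "ALCOHOL": 3}
print(rule6_activation(patient))   # contributes evidence towards the CHD output
```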
We also explored the fine-tuning of membership functions using the DE algorithm. We started with a population size of 10, a tournament selection strategy, a mutation rate of 0.01, and a one-point crossover operator. After a trial-and-error approach of increasing the population size and the number of iterations (generations), we finalized both the population size and the number of iterations as 50. Figure 3 demonstrates the effect of parameter tuning of the membership functions (before and after evolutionary learning) for the input variable ALCOHOL.
6.2 Evaluation results
We used the Train_Z set, which consists of 400 CHD cases and 3600 non-CHD cases. The judgment accuracy for the 4000 training data was as follows: results were correct for 310 of the 400 CHD cases and for 2683 of the 3600 non-CHD cases. Therefore, the recognition rate on the set of training data was 75.0%, comparable with the machine learning method used in [15].
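For concreteness, here is a minimal differential-evolution sketch of the kind of membership-function tuning described above. It uses the classic DE/rand/1 mutation and binomial crossover rather than the exact GA settings reported in the paper, and the error function, bounds and sample data are invented for the example.

```python
# Sketch only: tuning the (a, b, c) breakpoints of one triangular MF with DE.
import random

def evolve(error, dim, pop_size=50, generations=50, F=0.8, CR=0.9, bounds=(0.0, 1.0)):
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            # DE/rand/1 mutation combined with binomial crossover
            trial = [a[k] + F * (b[k] - c[k]) if random.random() < CR else pop[i][k]
                     for k in range(dim)]
            trial = [min(hi, max(lo, v)) for v in trial]
            if error(trial) <= error(pop[i]):
                pop[i] = trial                  # greedy selection
    return min(pop, key=error)

# Hypothetical "low ALCOHOL" observations that the tuned MF should cover well.
samples = [0.05, 0.1, 0.2, 0.15]
def err(params):
    a, b, c = sorted(params)
    def tri(x):
        left = (x - a) / (b - a) if b > a else 1.0
        right = (c - x) / (c - b) if c > b else 1.0
        return max(0.0, min(left, right))
    return sum((1.0 - tri(x)) ** 2 for x in samples)

print(evolve(err, dim=3))
```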
Figure 3. The MFs of the input variable ALCOHOL before and after GA learning

7. Discussion and Related Work
A number of NF systems have been used for various real-life problems. Among others, three neuro-fuzzy systems that, like the FUNEUS system presented in this paper, use hybrid learning for rule generation and parameter tuning have already been described [8]. The first one is called NEFCON (NEuro-Fuzzy CONtroller) and is used for control applications. The next one is NEFCLASS (NEuro-Fuzzy CLASSifier) and is used for classification problems and pattern recognition [5]. The third one is NEFPROX (Neuro-Fuzzy function approximation) and is used for function approximation. The three systems are based on a generic fuzzy perceptron, which is a kind of FNN. Each of these systems can learn a rule base and then tune the parameters of the membership functions. The fuzzy perceptron is used in order to provide a framework in which learning algorithms can be interpreted as a system of linguistic rules and prior knowledge can be used in the form of fuzzy IF-THEN rules. Additionally, fuzzy weights are associated with linguistic terms. The fuzzy perceptron is composed of an input layer, a hidden layer and an output layer, and the connections between them are weighted with fuzzy sets instead of real numbers. In NEFCON the inputs are state variables and the single output neuron outputs the control action applied to a technical system. In NEFCLASS the rule base approximates a function for a classification problem and maps an input pattern to the proper class, using no membership functions in the rules' consequents [8]. There are also examples of NF systems that do not employ any algorithm for rule generation, so the rule base must be known in advance. They can only adjust the parameters of the antecedent and consequent fuzzy sets. The most popular system of this kind is ANFIS (Adaptive-Network-based Fuzzy Inference System), which is one of the first hybrid NF systems for function approximation. Its architecture is a five-layer feed-forward network implementing Takagi-Sugeno type rules [11]. According to Abraham [3], Sugeno-type systems are high performers but often require complicated learning procedures and are computationally expensive. An optimal design of a NF system can only be achieved by the adaptive evolution of membership functions, rule base and learning rules. For each architecture, the evolution of membership functions proceeds at a faster time scale in an environment decided by the problem, the architecture and the inference system. Thus, global search of fuzzy rules and membership functions provides the fastest possible time scale [3]. The main problem here is that the resulting rules are usually not accepted from the expert knowledge point of view, because they are mechanically derived and not related to the real world. So it is desirable to tune a set of predefined rules rather than to produce a set from
data [15]. In this paper we presented FUNEUS, an NF system that is based on fuzzy Adaline neurons and uses Genetic Algorithms for the optimization of membership functions. Taking into account the previous approaches, and the fact that it remains difficult to compare NF systems conceptually and to evaluate their performance, FUNEUS is an attempt in the direction of integrating the best components of such approaches, building on our experience with a well-defined hybrid model, the so-called HYMES. Experimental results showed acceptable performance of the NF system, which needs to be evaluated in more domains, such as Medical Diagnosis [17] and Intelligent Educational Systems [18]. Further work will provide additional experimental results and comparisons with other hybrid approaches.
References
[1] Bonissone P.P., Chen Y.T., Goebel T. and Khedkar P.S.: Hybrid Soft Computing Systems: Industrial and Commercial Applications. Proceedings of the IEEE, 87(9), 1641-1667 (1999).
[2] Hatzilygeroudis I., Prentzas J.: HYMES: A HYbrid Modular Expert System with Efficient Inference and Explanation. Proceedings of the 8th Panhellenic Conference on Informatics, Nicosia, Cyprus, Vol. 1, 422-431 (2001).
[3] Abraham A. and Nath B.: Hybrid Intelligent Systems Design - A Review of a Decade of Research.
[4] Pedrycz W.: Heterogeneous Fuzzy Logic Networks: Fundamentals and Development Studies. IEEE Transactions on Neural Networks, 15(6), 1466-1481 (2004).
[5] Kasabov N.: Foundations of Neural Networks, Fuzzy Systems and Knowledge Engineering. MIT Press (1996).
[6] Konar A.: Computational Intelligence. Springer-Verlag (2005).
[7] Gallant S.I.: Neural Network Learning and Expert Systems. MIT Press (1993).
[8] Shavlik J.: Combining Symbolic and Neural Learning. Machine Learning, 14, 321-331 (1994).
[9] Rutkowska D.: Neuro-Fuzzy Architecture and Hybrid Learning. Physica-Verlag (2002).
[10] Yamakawa T.: Pattern recognition hardware system employing a fuzzy neuron. In Proceedings of the International Conference on Fuzzy Logic and Neural Networks, Iizuka, Japan, July 1990, 943-948.
[11] Pal S.K., Mitra S.: Neuro-Fuzzy Pattern Recognition. John Wiley & Sons, NY (1999).
[12] Jang J.: ANFIS: Adaptive-Network-based Fuzzy Inference System. IEEE Transactions on Systems, Man and Cybernetics, 23, 665-685.
[13] Suka M., Ichimura T. and Yoshida K.: Development of Coronary Heart Disease Database. KES 2004, LNAI 3214, 1081-1088 (2004).
[14] Storn R.: System Design by Constraint Adaptation and Differential Evolution. IEEE Transactions on Evolutionary Computation, 3(1), 22-34 (1999).
[15] Hara A., Ichimura T.: Extraction of Rules from Coronary Heart Disease Database Using Automatically Defined Groups. In M.Gh. Negoita et al. (Eds.), KES 2004, LNAI 3214, 1089-1096 (2004).
[16] Wang L.X. and Mendel J.M.: Generating Fuzzy Rules by Learning from Examples. IEEE Transactions on Systems, Man and Cybernetics, 22(6), 1414-1427 (1992).
[17] Dounias G.D. and Linkens D.A. (Eds.): Adaptive Systems and Hybrid Computational Intelligence in Medicine. Special Session Proceedings of the EUNITE 2001 Symposium, Tenerife, Spain, December 13-14, 2001. A Publication of the University of the Aegean, ISBN 960-7475-19-4.
[18] Hatzilygeroudis I., Giannoulis C. and Koutsojannis C.: Combining Expert Systems and Adaptive Hypermedia Technologies in a Web Based Educational System. ICALT 2005, Kaohsiung, Taiwan, July 5-8, 2005.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Empirical Evaluation of Scoring Methods Luca Pulina 1 Laboratory of Systems and Technologies for Automated Reasoning (STAR-Lab) DIST, Università di Genova, Viale Causa, 13 – 16145 Genova, Italy [email protected] Abstract. The automated reasoning research community has grown accustomed to competitive events where a pool of systems is run on a pool of problem instances with the purpose of ranking the systems according to their performances. At the heart of such ranking lies the method used to score the systems, i.e., the procedure used to compute a numerical quantity that should summarize the performances of a system with respect to the other systems and to the pool of problem instances. In this paper we evaluate several scoring methods, including methods used in automated reasoning contests, as well as methods based on voting theory, and a new method that we introduce. Our research aims to establish which of the above methods maximizes the effectiveness measures that we devised to quantify desirable properties of the scoring procedures. Our method is empirical, in that we compare the scoring methods by computing the effectiveness measures using the data from the 2005 comparative evaluation of solvers for quantified Boolean formulas. The results of our experiments give useful indications about the relative strengths and weaknesses of the scoring methods, and allow us to infer also some conclusions that are independent of the specific method adopted. Keywords. Automated Reasoning, Systems Comparison, Scoring Methods.
1 The work of the author is supported by grants from MIUR.

1. Introduction
The automated reasoning research community has grown accustomed to competitive events where a pool of systems is run on a pool of problem instances with the purpose of ranking the systems according to their performances. A non-exhaustive list of such contests includes the CADE ATP System Competition (CASC) [1] for theorem provers in first order logic, the SAT Competition [2] for propositional satisfiability solvers, the QBF Evaluation [3] for quantified Boolean formulas (QBFs) solvers, and the International Planning Competition (see, e.g., [4]) for symbolic planners. At the heart of the ranking that determines the winner of such events lies the method used to score the systems, i.e., the procedure used to compute a numerical quantity that should summarize the performances of a system. Usually such quantities cannot be interpreted as absolute measures of merit, but they should represent the relative strength of a system with respect to the other competitors based on the difficulty of the problem instances used in the contest. There is a general agreement that, although the results of automated reasoning systems competitions may provide less insight than controlled experiments in the spirit of [5], they play a fundamental role in the advancement of the state of the art, helping
to set research challenges for developers and assess the current technological frontier for users. In this paper we evaluate two different sets of scoring methods. The first set is comprised of some methods used in automated reasoning systems contests, namely CASC [1], the SAT competition [2], and the QBF evaluation [3], and a new method called YASM ("Yet Another Scoring Method") that we are evaluating as a candidate scoring method for the 2006 competitive evaluation of QBF solvers. The second set of scoring methods is comprised of procedures based on voting systems, namely Borda count [6], range voting [7] and sum of victories (based on ideas from [8]). The main difference with respect to the methods of the first set is that using the methods of the second set amounts to considering the solvers as candidates and the problem instances as voters. Each voter ranks the candidates, i.e., the solver with the best performance on the instance is the preferred candidate, and all the other solvers are ranked accordingly. Finally, the votes are pooled to elect the winner of the contest. Our research aims to establish which of the above methods maximizes the measures that we devised to quantify desirable properties of the scoring procedures. In particular, our measures should account for:
• the degree of (dis)agreement between the different scoring methods;
• the degree of stability of each scoring method with respect to perturbations (i) in the size of the test set, (ii) in the amount of resources available (CPU time), and (iii) in the quality of the test set;
• the representativeness of each scoring method with respect to the state of the art expressed by the competitors.
In order to evaluate the relative quality of the scoring methods under test, we compute the above measures using part of the results from the 2005 comparative evaluation of QBF solvers (QBFEVAL 2005) [9]. Our analysis is thus empirical, but we believe that the scenario of QBFEVAL 2005 is representative enough to allow some generalizations of our results, under the hypotheses presented in Subsection 2.2. The paper is structured as follows. In Section 2 we introduce our case study, the 2005 comparative evaluation of QBF solvers [9], and we outline the working hypotheses underlying our analysis. In Section 3 we present the scoring methods, and in Section 4 the effectiveness measures used to compare and evaluate the methods. The results of such comparison are the subject of Section 5, where we analyze the data in order to pin down the methods that enjoy the best performances overall. We conclude the paper in Section 6 with a discussion about the impact of our results on the evaluations of automated reasoning systems, and with some suggestions for further improvements on our current analysis.
2. Preliminaries
2.1. QBFEVAL 2005
QBFEVAL 2005 [9] is the third in a series of non-competitive events established with the aim of assessing the advancements in the field of QBF reasoning and related research. QBFEVAL 2005 accounted for 13 competitors, 553 QBFs and three QBF generators submitted. The test set was assembled using a selection of 3191 QBFs obtained considering the submissions and the instances archived in QBFLIB [10]. The results of QBFEVAL
2005 can be listed in a table that we call RUNS in the following. RUNS is comprised of four attributes (column names): SOLVER, INSTANCE, RESULT, and CPUTIME. The attributes SOLVER and INSTANCE report which solver is run on which instance. RESULT is a four-valued attribute: SAT, i.e., the instance was found satisfiable by the solver, UNSAT, i.e., the instance was found unsatisfiable by the solver, TIME, i.e., the solver exceeded a given time limit without solving the instance (900 seconds in QBFEVAL 2005), and FAIL, i.e., the solver aborted for some reason (e.g., a crash, or insufficient memory to complete the task). Finally, CPUTIME reports the CPU time spent by the solver on the given instance, in seconds. Our empirical analysis of scoring methods is herewith carried out using a selection from the QBFEVAL 2005 RUNS table described above. In particular, we selected only the eight solvers that passed to the second stage of the evaluation and the QBFs coming from classes of instances having fixed structure (see [9] for further details). Under these assumptions, the RUNS table reduces to 4408 entries, one order of magnitude less than the original one. This choice allows us to disregard correctness issues, to reduce considerably the overhead of the computations required for our analyses, and, at the same time, to maintain a significant number of runs.
2.2. Working hypotheses
The scoring methods that we evaluate, the measures that we compute and the results that we obtain are based on the assumption that a table identical to RUNS as described above is the only input required by a scoring method. As a consequence, the scoring methods (and thus our analyses) do not take into account (i) memory consumption, (ii) correctness of the solution, and (iii) "quality" of the solution. As for (i), it turns out that measuring and comparing memory consumption is a tough exercise during a contest, mainly for two reasons. First, there are several definitions of memory consumption, e.g., peak memory usage as opposed to the total number of bytes in memory read/write operations, and it is not clear which one should be preferred. Second, it is complicated to measure memory consumption for systems as black boxes, which is usually the only view that the organizers of the contest have. Regarding (ii), checking the correctness of the solution is desirable for most automated reasoning systems. Unfortunately, the size of the certificate can be prohibitive in practice, and this is precisely the case of certificates for QBF (un)satisfiability, which is a PSPACE-complete problem. Producing reasonably sized certificates, i.e., small proofs, from QBF solvers is still mainly a research issue (see, e.g., [11,12]), and thus we do not have any certificates associated with the QBFEVAL 2005 RUNS table. Finally, (iii) matters only if there is indeed some solution on which quality indicators can be computed for the sake of comparison. No such indicators can be computed for simple SAT/UNSAT results, and, as we said before, obtaining solutions, i.e., proofs, from QBF solvers is not an easy task. Given the setting described above, one more working hypothesis concerns CPU times, which we will assume to be unaffected by noise. It turns out that such an assumption makes for a rather idealistic model, at least on the current QBFEVAL 2005 platform1.
Even under light-load conditions, the noise affecting the CPU time measured by the operating system can be substantial, with standard errors in the order of 1% to 10% of the average CPU time over several runs of the same program with the same inputs.
1 A farm of identical 3GHz PIV PCs with 1GB of main memory, running Debian GNU/Linux (sarge distribution with a 2.4.x kernel).
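For illustration only, a RUNS-like table can be represented as a flat list of records; the Python sketch below shows one possible encoding and a query for the SOTA time on an instance. Field names follow the paper, while the sample rows are invented.

```python
# Sketch only: a possible in-memory representation of the RUNS table.
from collections import namedtuple

Run = namedtuple("Run", "solver instance result cputime")  # RESULT in {SAT, UNSAT, TIME, FAIL}

RUNS = [
    Run("quantor", "qbf-001", "SAT", 1.42),    # invented rows, only to show the shape
    Run("semprop", "qbf-001", "SAT", 3.10),
    Run("quantor", "qbf-002", "TIME", 900.0),
]

def solved(run):
    return run.result in ("SAT", "UNSAT")

def sota_time(instance):
    """CPU time of the SOTA solver (best time among all participants) on an instance."""
    times = [r.cputime for r in RUNS if r.instance == instance and solved(r)]
    return min(times) if times else None

print(sota_time("qbf-001"))   # -> 1.42
```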
3. Analyzed methods
3.1. State of the art scoring methods
In the following we describe in some detail the state of the art scoring methods used in our analysis. For each method we describe only those features that are relevant for our purposes. Further details can be found in the references provided.
CASC [1] The CASC scoring method applied to our setting yields the following guidelines. Solvers are ranked according to the number of problems solved, i.e., the number of times RESULT is either SAT or UNSAT. In case of a tie, the solver faring the lowest average on CPUTIME fields over the problems solved is preferred.
QBF evaluation [3] The QBFEVAL scoring method is the same as CASC, except that ties are broken using the sum of CPUTIME fields over the problems solved.
SAT competition [2] The last SAT competition uses a purse-based method, i.e., the score of a solver on a given instance is obtained by adding up three purses:
• the solution purse, which is divided equally among all solvers that solve the problem;
• the speed purse, which is divided unequally among all the competitors that solve the problem, first by computing the speed factor F_{s,i} of a solver s on a problem instance i:
  F_{s,i} = k / (1 + T_{s,i}^4)    (1)
where k is an arbitrary scaling factor (we set k = 10 according to [13]), and T_{s,i} is the time spent by s to solve i; then by computing the speed award A_{s,i}, i.e., the portion of speed purse awarded to the solver s on the instance i:
  A_{s,i} = (P_i · F_{s,i}) / (Σ_r F_{r,i})    (2)
where r ranges over the solvers, and P_i is the total amount of the speed purse for the instance i;
• the series purse, which is divided equally among all solvers that solve at least one problem in a given series (a series is a family of instances that are somehow related, e.g., different QBF encodings for some problem in a given domain).
The overall score of a solver is just the sum of its scores on all the instances of the test set, and the winner of the contest is the solver with the highest sum.
Borda count [6] Suppose that n solvers are participating in the contest. Each voter (instance) ranks the candidates (solvers) in ascending order considering the value of the CPUTIME field. Let p_{s,i} be the position of a solver s in the ranking associated with instance i (1 ≤ p_{s,i} ≤ n). The score of s computed according to Borda count is just S_{s,i} = n − p_{s,i}. In case of time limit attainment and failure, we default S_{s,i} to 0. The total score S_s of a solver s is the sum of all the scores, i.e., S_s = Σ_i S_{s,i}, and the winner is the solver with the highest score.
Range voting [7] The ranking position p_{s,i} of each solver s on each instance i is computed as in Borda count. Then an arbitrary scale is used to associate a weight w_p with each of the n positions, and the score S_{s,i} is the weight w_{p_{s,i}} associated with the position of s (default to 0 in case of time limit attainment or failure) and S_s = Σ_i S_{s,i}, as in Borda count. To
compute w_p, in our experiments we use a geometric progression with a common ratio r = 2 and a scale factor a = 1, i.e., w_p = ar^{n−p} with 1 ≤ p ≤ n.
Sum of victories The sum of victories is a non-Condorcet method based on ideas from [8] which works as follows. An n × n square matrix M is computed, where the entries M_{s,t} with s ≠ t account for the number of times that solver s is faster than t on some instance, while M_{s,t} = 0 if s = t. No points are awarded in case of ties, time limit attainment and failure. The score S_s according to which a solver s is ranked is computed as S_s = Σ_t M_{s,t}.
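As an informal sketch of how the voting-based scorings operate on the RUNS data, the Python fragment below computes Borda and sum-of-victories scores from per-instance rankings. The treatment of unsolved instances (simply awarding no points) is one possible reading of the definitions above, and the sample data is invented.

```python
# Sketch only: Borda count and sum of victories from per-instance rankings.
def borda_scores(rankings, solvers):
    """rankings: {instance: [solver, ...]} best first (solved runs only)."""
    n = len(solvers)
    score = {s: 0 for s in solvers}
    for ranked in rankings.values():
        for pos, s in enumerate(ranked, start=1):
            score[s] += n - pos            # S_{s,i} = n - p_{s,i}, 0 if unsolved
    return score

def sum_of_victories(rankings, solvers):
    score = {s: 0 for s in solvers}
    for ranked in rankings.values():
        for i, winner in enumerate(ranked):
            score[winner] += len(ranked) - i - 1   # solvers beaten on this instance
    return score

solvers = ["quantor", "semprop", "ssolve"]
rankings = {"qbf-001": ["quantor", "semprop"],           # ssolve did not solve qbf-001
            "qbf-002": ["semprop", "quantor", "ssolve"]}
print(borda_scores(rankings, solvers))
print(sum_of_victories(rankings, solvers))
```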
3.2. YASM: Yet Another Scoring Method
While the scoring methods used in CASCs and QBF evaluations are straightforward, they do not take into account some aspects that are indeed considered by the purse-based method used in the last SAT competition. On the other hand, the purse-based method requires some oracle to assign purses to the problem instances, so the results can be influenced heavily by the oracle. YASM is a first attempt to combine the two approaches: a rich method like the purse-based one, but using only the data obtained from the runs. As such, YASM requires a preliminary classification whereby a hardness degree H_i is assigned to each problem instance i using the same equation as in CASC [1]:
  H_i = 1 − S_i / S_t    (3)
where S_i is the number of solvers that solved i, and S_t is the total number of participants in the contest. Considering equation (3), we notice that 0 ≤ H_i ≤ 1, where H_i = 0 means that i is relatively easy, while H_i = 1 means that i is relatively hard. We can then compute the score SB_{s,i} of a solver s on a given instance i:
  SB_{s,i} = k · H_i · (L − T_{s,i}) / (L − M_i)    (4)
where k is a constant (we fix k = 100 to normalize H_i in the range [0; 100]), L is the time limit, T_{s,i} is the CPU time used up by s to solve i (T_{s,i} ≤ L), and M_i = min_s {T_{s,i}}. Notice that SB_{s,i} = 0 whenever RESULT is TIME, and we force SB_{s,i} = 0 also when RESULT is FAIL. M_i is the time spent on the instance i by the SOTA solver, defined in [9] to be the ideal solver that always fares the best time among all the participants. The score SB_s of a solver is just the sum of the scores obtained on the instances, i.e., SB_s = Σ_i SB_{s,i}. The score SB_s introduced so far is just a combined speed/solution bonus, and it does not take into account explicitly the number of times that RESULT is either TIME or FAIL. To complete the YASM method, we extend the basic scoring mechanism of equations (3) and (4) by awarding bonuses to solvers in such a way that the amount of bonuses received is inversely proportional to the number of times that the solvers reach the time limit or fail. Let Γ be the set of instances used for the contest and Γ_{l,s} (resp. Γ_{f,s}) be the set of instances on which the solver s reaches the time limit (resp. fails). We compute the time limit bonus LB_s of a solver s as:
  LB_s = k_l · (|Γ| − |Γ_{l,s}|) / |Γ| + k_l · C_{l,s}    (5)
Table 1. Homogeneity between scoring methods.

        CASC  QBF   SAT   YASM  Borda  r.v.  s.v.
CASC    –     1     0.71  0.86  0.86   0.71  0.86
QBF           –     0.71  0.86  0.86   0.71  0.86
SAT                 –     0.86  0.71   0.71  0.71
YASM                      –     0.71   0.71  0.71
Borda                           –      0.86  1
r.v.                                   –     0.86
s.v.                                         –
The fail bonus FB_s is computed analogously as:
  FB_s = k_f · (|Γ| − |Γ_{f,s}|) / |Γ| + k_f · C_{f,s}    (6)
where k_l and k_f are constant parameters (we set k_l = k_f = 1/2 · max_s {S_s}); C_l and C_f are the time limit coefficient and the failure coefficient respectively. C_{x,s}, with x ∈ {f, l}, is given by:
  C_{x,s} = (Σ_{i ∈ Γ_{x,s}} H_i) / |Γ_{x,s}|  if |Γ_{x,s}| > 0,  and  C_{x,s} = 1  otherwise    (7)
Equations (5) and (6) convey the intuition that a good solver is one that reaches the time limit or fails in a small number of cases (first addendum), and when it does, this happens mainly on hard instances (second addendum). Notice that LB_s (resp. FB_s) has the highest value when |Γ_{l,s}| = 0 (resp. |Γ_{f,s}| = 0), i.e., there are no instances on which s reaches the time limit (resp. fails). In this case, considering our choice of k_l and k_f detailed above, LB_s = FB_s = max_s {S_s}. The total score S_s of a solver according to YASM is thus computed as a weighted sum of three elements:
  S_s = α · SB_s + β · LB_s + γ · FB_s    (8)
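The following Python fragment is a rough sketch of how the YASM score of equations (3)-(8) could be computed from RUNS-style records. It is an illustration under simplifying assumptions (in particular, the constants k_l and k_f are derived here from the solver's own speed bonus rather than from the maximum over all solvers), not the reference implementation.

```python
# Sketch only: YASM-style score of one solver from RUNS-style records.
from collections import namedtuple

Run = namedtuple("Run", "solver instance result cputime")

def yasm_score(runs, solver, all_solvers, L=900.0, k=100.0,
               alpha=0.5, beta=0.25, gamma=0.25):
    solved = lambda r: r.result in ("SAT", "UNSAT")
    instances = sorted({r.instance for r in runs})

    # Hardness H_i (equation (3)) and SOTA time M_i for every instance.
    H, M = {}, {}
    for i in instances:
        rows = [r for r in runs if r.instance == i]
        H[i] = 1.0 - sum(1 for r in rows if solved(r)) / len(all_solvers)
        M[i] = min((r.cputime for r in rows if solved(r)), default=L)

    own = {r.instance: r for r in runs if r.solver == solver}

    # Speed/solution bonus SB_s (equation (4)); zero for TIME and FAIL results.
    SB = sum(k * H[i] * (L - own[i].cputime) / (L - M[i])
             for i in instances if i in own and solved(own[i]) and L > M[i])

    # Time-limit and fail bonuses (equations (5)-(7)).  NOTE: the paper sets
    # k_l = k_f = (1/2) * max_s {S_s}; here they are tied to this solver's own SB
    # only to keep the sketch self-contained.
    def bonus(bad_results):
        bad = [i for i in instances if i in own and own[i].result in bad_results]
        C = sum(H[i] for i in bad) / len(bad) if bad else 1.0
        kx = 0.5 * SB
        return kx * (len(instances) - len(bad)) / len(instances) + kx * C

    return alpha * SB + beta * bonus(("TIME",)) + gamma * bonus(("FAIL",))

# Tiny invented example with two solvers and two instances.
runs = [Run("quantor", "i1", "SAT", 2.0),    Run("semprop", "i1", "FAIL", 0.0),
        Run("quantor", "i2", "TIME", 900.0), Run("semprop", "i2", "SAT", 30.0)]
print(yasm_score(runs, "quantor", ["quantor", "semprop"]))
```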
The parameters α, β and γ can be tuned to improve the quality of the scoring method, or to take into account specific characteristics of the solver, e.g., in a competition of incomplete solvers it may be reasonable to set γ = 0. In the analyses hereafter presented we set α = 0.5, β = 0.25 and γ = 0.25. 4. Comparative measures Homogeneity The rationale behind this measure is to verify that, on a given test set, the scoring methods considered (i) do not produce exactly the same solver rankings, but, at the same time, (ii) do not yield antithetic solver rankings. If (i) was the case, evaluating different methods would not make a lot of sense, since any of them would produce the same results. If (ii) was the case, since we have no clue about the absolute value of the competitors, it would be impossible to decide which method is the right choice. Measuring homogeneity is thus fundamental to ensure that we are considering alternative scoring methods, yet we are performing an apple-to-apple comparison and that the apples are not exactly the same. To measure homogeneity, we consider the Kendall rank correlation coefficient τ [14]. τ is computed between any two rankings and is such that −1 ≤ τ ≤ 1, where
Figure 1. RDT-stability plots.
τ = −1 means perfect disagreement, i.e., one ranking is the opposite of the other, τ = 0 means independence, i.e., the rankings are not comparable, and τ = 1 means perfect agreement, i.e., the two rankings are the same. For 0 ≤ τ ≤ 1, increasing values of τ imply increasing agreement between the rankings. Table 1 shows the values of τ computed for the scoring methods considered, arranged in a symmetric matrix where we omit the elements below the diagonal (r.v. is a shorthand for range voting, while s.v. is a shorthand for sum of victories). Values of τ close to, but not exactly equal to 1 are desirable. Table 1 shows that this is indeed the case for the scoring methods considered using QBFEVAL 2005 data. Only two couples of methods (QBF-CASC and sum of victories-Borda) show perfect agreement, while all the other couples agree to some extent, but still produce different rankings. RDT-stability Stability on a randomized decreasing test set (RDT-stability for short) aims to measure how much a scoring method is sensitive to perturbations that diminish the size of the original test set. We evaluate RDT-stability using several test sets computed by removing instances uniformly at random from the original test set. On the reduced test sets we compute the scores, and then we take the median over the reduced test sets in order to establish a new ranking. If such ranking is the same as the one obtained on the original test set, then we conclude that the scoring method is RDT-stable up to the number of instances discarded from the original test set. More precisely, assume that n solvers are participating to the contest, let Γ be the set of instances used for the contest, and Rq be the ranking produced by a given scoring method q. We consider a set of k test sets {Γm,1 , . . . , Γm,k }, where each Γm,i is obtained by discarding, uniformly at random without repetitions, m instances from Γ. For each Γm,i , we then apply q considering only the entries in RUNS whose INSTANCE values appear in Γm,i . Thus, for each Γm,i , we obtain the set of n scores {S1 |Γm,i , . . . , Sn |Γm,i }. Let S j |Γm = mediani (Sj |Γm,i ). The set {S 1 |Γm , . . . , S n |Γm } is used to produce a ranking Rq,m . If Rq,m is the same as Rq , then we say that q is RDT-stable up to m. In the experiments presented in Section 5 the value of k is always 100. DTL-stability Stability on a decreasing time limit (DTL-stability for short) aims to measure how much a scoring method is sensitive to perturbations that diminish the maximum amount of CPU time granted to the solvers. Let L be initial value of such quantity.
Figure 2. DTL-stability plots.
To compute DTL-stability of a scoring method q, we apply it considering a new time limit L′ such that L − L′ = t and t > 0. If the ranking Rq′ obtained using L′ instead of L is the same as the original ranking Rq , then we conclude that the scoring method is DTL-stable up to t, i.e., the amount of time subtracted from the original time limit. SBT-stability Stability on a solver biased test set (SBT-stability for short) aims to measure how much a scoring method is sensitive to a test set that is biased in favor of a given solver. Let Γ be the original test set, and Γs be the subset of Γ such that the solver s is able to solve exactly the instances in Γs . Let Rq,s be the ranking obtained by applying the scoring method q on Γs . If Rq,s is the same as the original ranking Rq , then the scoring method q is SBT-stable with respect to the solver s. Notice that SBT-stability is an important indicator of the capacity of a scoring method to detect the absolute quality of the participants, even against flaws in the design of the test set. SOTA-relevance This measure aims to understand the relationship between the ranking obtained with a scoring method and the strength of a solver, as witnessed by its contribution to the SOTA solver. As mentioned in Subsection 3.2, the SOTA solver is the ideal solver that always fares the best time among all the participants. Indeed, a participant contributes to the SOTA solver whenever it is the fastest solver on some instance. The count of such events for a given solver is the quantitative measure of the SOTA contribution. We measure the SOTA-relevance using Kendall coefficient to compare the ranking induced by the SOTA contribution with the rankings obtained using each of the scoring methods considered. 5. Experimental results In this section we present the experimental evaluation of the scoring methods presented in the Subsections 3.1 and 3.2, using the measures introduced in Section 4. We start from RDT-stability and the plots in Figure 1. The histograms are arranged in two rows and three columns: the first row shows, from left to right, the plots regarding QBF/CASC, SAT and YASM scoring methods, while the second row shows, again from left to right, the plots regarding Borda count, range voting and sum of victories. Each histogram in Figure 1 reports, on the x-axis the number of problems m discarded from
Figure 3. SBT-stability plots.
the original test set (0, 100, 200 and 400 out of 551) and on the y-axis the score. For each value of the x-axis, eight bars are displayed, corresponding to the scores of the solvers admitted to the second stage of QBFEVAL 2005. In all the plots of Figure 1 the legend is sorted according to the ranking computed by the specific scoring method, and the bars are also displayed accordingly. This makes it easier to identify perturbations of the original ranking, i.e., of the leftmost group of bars corresponding to m = 0. Considering Figure 1, we can immediately conclude that all the scoring methods considered are RDT-stable up to 400. This means that a random sample of 151 instances is sufficient for all the scoring methods to reach the same conclusions that each one reaches on the much heftier test set of 551 instances used in QBFEVAL 2005. A general consideration that may be extracted from these results is that a substantial number of problem instances is not needed in order to perform a proper evaluation, at least with the scoring methods that we consider. On the other hand, RDT-stability does not allow us to discriminate among the different scoring methods, since all of them perform equally well under this point of view. We continue our analysis by considering DTL-stability. Figure 2 reports six histograms arranged in the same way as Figure 1, except that the x-axis now reports the amount of CPU time (in seconds) used as a time limit when evaluating the scores of the solvers. The leftmost value is L = 900, i.e., the original time limit that produces the ranking according to which the legend and the bars are sorted, and then we consider the values L′ = {700, 500, 300, 100, 50, 10, 1} corresponding to t = {200, 400, 600, 800, 850, 890, 899}. Considering Figure 2 we see that the CASC/QBF scoring methods (Figure 2, top-left) are DTL-stable up to t = 400. For L′ = 300, there is a noticeable perturbation, i.e., an exchange of position in the ranking between WALKQSAT and OPENQBF. The ranking then stabilizes until L′ = 100 and then, as one would expect, it is perturbed more heavily in the rightmost part of the plot. Also the SAT competition scoring method (Figure 2, top-center) is DTL-stable up to t = 400, while perturbations occur for higher values of t. YASM (Figure 2, top-right) is DTL-stable up to t = 850. For t = 890 and t = 899 YASM shows a relatively high instability, but, as for all the other scoring methods analyzed so far, the relative position of the best solver and the worst solver does not change. Borda count (Figure 2, bottom-left) is remarkably DTL-stable up to t = 890, and it is the
Table 2. Comparing rankings on solver-biased test sets vs. the original test set.

            CASC/QBF  SAT   YASM  Borda  r.v.  s.v.
OPENQBF     0.43      0.57  0.36  0.79   0.79  0.71
QBFBDD      0.43      0.43  0.36  0.79   0.86  0.79
QMRES       0.64      0.86  0.76  0.71   0.86  0.71
QUANTOR     1         0.86  0.86  0.93   0.86  1
SEMPROP     0.93      0.71  0.71  0.93   0.86  0.93
SSOLVE      0.71      0.57  0.57  0.86   0.79  0.86
WALKQSAT    0.57      0.57  0.43  0.64   0.79  0.71
YQUAFFLE    0.71      0.64  0.57  0.86   0.86  0.86
Mean        0.68      0.65  0.58  0.81   0.83  0.82
best scoring method as far DTL-stability is concerned. Finally, range voting (Figure 2, bottom-center) and sum of victories (Figure 2, bottom-right) are both DTL-stable up to t = 850. In conclusion, while Borda count shows excellent DTL-stability, YASM turns out to be the best among some of the methods used in automated reasoning contests. Overall, we notice that decreasing the time limit substantially, even up to one order of magnitude, is not influencing the stability of the scoring methods considered, except for some minor perturbations for QBF/CASC and SAT methods. Moreover, independently from the scoring method used and the amount of CPU time granted, the best solver is always the same. A general consideration that may be extracted from these results is that increasing the time limit of a competition is useless, unless the increase is substantial, i.e., orders of magnitude. Figure 3 shows the plots with the results of the SBT-stability measure for each scoring method (the layout is the same as Figures 1 and 2). The x-axis reports the name of the solver s used to compute the solver-biased test set Γs and the y-axis reports the score value. For each of the Γs ’s, we report eight bars showing the scoring obtained by the solvers using only the instances in Γs . The order of the bars (and of the legend) corresponds to the ranking obtained with the given scoring method considering the original test set Γ. As we can see from Figure 3 (top-left), CASC/QBF scoring methods are SBT-stable with respect to their alleged winner QUANTOR, while they are not SBT-stable with respect to all the other solvers: for each of the Γs where s = QUANTOR, the original ranking is perturbed and the winner becomes s. The SAT competition scoring method (Figure 3, top-center) and YASM (Figure 3, top-right) are not SBT-stable with respect to any solver, not even with respect to their alleged winner. Borda count (Figure 3, bottom-left) is not SBT-stable with respect to any solver, but the alleged winner ( QUANTOR) is always the winner on the biased test sets. Moreover, the rankings obtained on the test sets biased on QUANTOR and SEMPROP are not far from the ranking obtained on the original test set. Also range voting (Figure 3, bottom-center), is not SBT-stable with respect to any solver, but the solvers ranking first and last do not change over the biased test sets. Finally, sum of victories (Figure 3, bottom-right) is SBT-stable only with respect to its alleged winner QUANTOR. Looking at the results presented above, we can only conclude that all the scoring methods are sensitive to a bias in the test set. However, some methods seem more robust than others, e.g., the winner according to Borda count and range voting seems less biasprone than the winner according to other methods such as SAT or YASM. SBT-stability is an on-off criteria which does not allow us to quantify these subtle differences. In order to do so, in Table 2, for each scoring method we compute the Kendall coefficient between
Table 3. Comparing scoring methods and SOTA contributes.

                   SOTA ranking
CASC               0.57
QBF                0.57
SAT                0.64
YASM               0.57
Borda              0.71
range voting       0.86
sum of victories   0.71
the ranking obtained on the original test set Γ and each of the rankings obtained on the Γs test sets. In Table 2 each column, but the first, is relative to a scoring method, and each row, but the first and the last, is relative to a specific Γs . The last row reports the mean value of the coefficients for each scoring method. Overall, YASM turns out to be the method more sensitive to a bias in the test set, immediately followed by CASC/QBF and SAT. On the other hand, the methods based on voting theory are all more robust than automated reasoning contests scoring methods. A general consideration that we can extract is that, whatever the choice of the scoring method, it is very important to assemble a test set in such a way to minimize bias, e.g., by extracting random samples from a large instance base. We conclude our analysis with Table 3, showing the relationship between the ranking computed by each scoring method, and the ranking induced by the contribution of each solver to the SOTA solver. Column 2 of Table 3 shows the values of the Kendall coefficient for each scoring method. Notice that the results of this table are very close to those summarized in the last row of Table 2. This seems to imply that robustness and SOTA relevance are somehow related qualities of a scoring method. Under this point of view, the methods based on voting theory seem to have an edge over automated reasoning contests scoring methods. 6. Conclusions Summing up, our analysis allowed us to put forth some considerations that are independent of the specific scoring method used. First, a larger test set is not necessarily a better test set: the results about RDT-stability show that there is no visible difference in rankings even when slashing the original test by 70%. Second, it is not so useful to increase the time limit, unless you are prepared to increase it substantially, e.g., by orders of magnitude: DTL-stability shows that there is no visible difference in rankings unless the initial time limit is reduced substantially. Third, the composition of the evaluation test set may heavily influence the final ranking: the results on SBT-stability tell us that no scoring method considered is immune from bias in the original test set. As for the specific properties of the scoring methods, those based on voting theory seem to have an edge over automated reasoning contests scoring methods, particularly for SBT-stability and SOTA-relevance. Our future directions to improve on the research presented in this paper include: (i) a detailed analysis of the noise on CPU time measurement, meant to establish a sound statistical model that can be used for error estimates inside the various scoring methods and inside the SOTA solver contributes computation; (ii) the analysis of YASM to improve its performances by tuning its parameters, particularly with respect to SBT-stability and SOTA-relevance; (iii) the analysis of the statistical significance of the rankings obtained
in the spirit of [4] and, particularly, the relationship between the various scoring methods and the indications obtained using statistical hypothesis testing. Also interesting, although not directly linked to the results herewith presented, is the investigation on computing efficient certificates for QBF (un)satisfiability. This would allow us to relax further our working hypotheses and broaden the spectrum of our conclusions, since we could take into account the correctness of the solvers and, possibly, the quality of the solutions.
Acknowledgements
The author wishes to thank Armando Tacchella and Massimo Narizzano for helpful discussions, comments and suggestions about this work.
References
[1] G. Sutcliffe and C. Suttner. The CADE ATP System Competition. http://www.cs.miami.edu/~tptp/CASC. Visited in April 2006.
[2] D. Le Berre and L. Simon. The SAT Competition. http://www.satcompetition.org. Visited in April 2006.
[3] M. Narizzano, L. Pulina, and A. Tacchella. QBF solvers competitive evaluation (QBFEVAL). http://www.qbflib.org/qbfeval. Visited in April 2006.
[4] D. Long and M. Fox. The 3rd International Planning Competition: Results and Analysis. Journal of Artificial Intelligence Research, 20:1-59, 2003.
[5] J. N. Hooker. Testing Heuristics: We Have It All Wrong. Journal of Heuristics, 1:33-42, 1996.
[6] D. G. Saari. Chaotic Elections! A Mathematician Looks at Voting. American Mathematical Society, 2001.
[7] RangeVoting.org. http://math.temple.edu/~wds/crv/. Visited in April 2006.
[8] The Condorcet Method. Reference available on line from http://en.wikipedia.org/wiki/Condorcet_method. Visited in April 2006.
[9] M. Narizzano, L. Pulina, and A. Tacchella. The third QBF solvers comparative evaluation. Journal on Satisfiability, Boolean Modeling and Computation, 2:145-164, 2006. Available on-line at http://jsat.ewi.tudelft.nl/.
[10] E. Giunchiglia, M. Narizzano, and A. Tacchella. Quantified Boolean Formulas satisfiability library (QBFLIB), 2001. www.qbflib.org. Visited in April 2006.
[11] Y. Yu and S. Malik. Verifying the Correctness of Quantified Boolean Formula (QBF) Solvers: Theory and Practice. In ASP-DAC, 2005.
[12] M. Benedetti. Evaluating QBFs via Symbolic Skolemization. In Eleventh International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR 2004), volume 3452 of Lecture Notes in Computer Science. Springer Verlag, 2004.
[13] A. Van Gelder, D. Le Berre, A. Biere, O. Kullmann, and L. Simon. Purse-Based Scoring for Comparison of Exponential-Time Programs, 2006. Unpublished draft.
[14] M. Kendall. Rank Correlation Methods. Charles Griffin & Co. Ltd., 1948. Reference available on line from http://en.wikipedia.org/wiki/Rank_correlation.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Binarization Algorithms for Approximate Updating in Credal Nets
Alessandro Antonucci a,1, Marco Zaffalon a, Jaime S. Ide b and Fabio G. Cozman c
a Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, Galleria 2 - 6928 Manno (Lugano), Switzerland
b Escola de Economia de São Paulo, Fundação Getulio Vargas, Rua Itapeva, 474 - São Paulo, SP - Brazil
c Escola Politécnica, Universidade de São Paulo, Av. Prof. Mello Moraes, 2231 - São Paulo, SP - Brazil
Abstract. Credal networks generalize Bayesian networks by relaxing numerical parameters. This considerably expands expressivity, but makes belief updating a hard task even on polytrees. Nevertheless, if all the variables are binary, polytree-shaped credal networks can be efficiently updated by the 2U algorithm. In this paper we present a binarization algorithm that makes it possible to approximate an updating problem in a credal net by a corresponding problem in a credal net over binary variables. The procedure leads to outer bounds for the original problem. The binarized nets are in general multiply connected, but can be updated by the loopy variant of 2U. The quality of the overall approximation is investigated by promising numerical experiments.
Keywords. Belief updating, credal networks, 2U algorithm, loopy belief propagation.
1 Correspondence to: Alessandro Antonucci, IDSIA, Galleria 2 - CH-6928 Manno (Lugano), Switzerland. Tel.: +41 58 666 66 69; Fax: +41 58 666 66 61; E-mail: [email protected].

1. Introduction
Bayesian networks (Section 2.1) are probabilistic graphical models based on precise assessments for the conditional probability mass functions of the network variables given the values of their parents. As a relaxation of such precise assessments, credal networks (Section 2.2) only require the conditional probability mass functions to belong to convex sets of mass functions, i.e., credal sets. This considerably expands expressivity, but it also makes it considerably more difficult to update beliefs about a queried variable given evidential information about some other variables: while in the case of Bayesian networks efficient algorithms can update polytree-shaped models [10], in the case of credal networks updating is NP-hard even on polytrees [4]. The only known exception to this situation is 2U [5], an algorithm providing exact posterior beliefs on binary (i.e., such that all the variables are binary) polytree-shaped credal networks in linear time. The topology of the network, which is assumed to be singly connected, and the number of possible states of the variables, which is limited to two for any variable, are therefore
the main limitations faced by 2U. The limitation about topology is partially overcome: the loopy variant of 2U (L2U) can be employed to update multiply connected credal networks [6] (Section 3). The algorithm typically converges after a few iterations, providing an approximate but accurate method to update binary credal nets of arbitrary topology. The goal of this paper is to overcome also the limitation of 2U about the number of possible states. To this end, a map is defined to transform a generic updating problem on a credal net into a second updating problem on a corresponding binary credal net (Section 4). The transformation can be implemented efficiently, and the posterior probabilities in the binarized network are shown to be an outer approximation of those of the initial problem. The binarized network, which is multiply connected in general, is then updated by L2U. The quality of the approximation is tested by numerical simulations, for which good approximations are obtained (Section 5). Conclusions and outlooks are in Section 6, while some technical parts conclude the paper in Appendix A.
2. Bayesian and Credal Networks
In this section we review the basics of Bayesian networks (BNs) and their extension to convex sets of probabilities, i.e., credal networks (CNs). Both models are based on a collection of random variables, structured as an array X = (X1, . . . , Xn), and a directed acyclic graph (DAG) G, whose nodes are associated with the variables of X. In our assumptions the variables in X take values in finite sets. For both models, we assume the Markov condition to make G represent probabilistic independence relations between the variables in X: every variable is independent of its non-descendant non-parents conditional on its parents. What makes BNs and CNs different is a different notion of independence and a different characterization of the conditional mass functions for each variable given the values of the parents, which will be detailed next. Regarding notation, for each Xi ∈ X, ΩXi = {xi0, xi1, . . . , xi(di−1)} denotes the set of the possible states of Xi, P(Xi) is a mass function for Xi and P(xi) the probability that Xi = xi, where xi is a generic element of ΩXi. A similar notation with uppercase subscripts (e.g., XE) denotes arrays (and sets) of variables in X. Finally, the parents of Xi, according to G, are denoted by Πi, while for each πi ∈ ΩΠi, P(Xi|πi) is the mass function for Xi conditional on Πi = πi.
2.1. Bayesian Networks
In the case of BNs, a conditional mass function P(Xi|πi) has to be defined for each Xi ∈ X and πi ∈ ΩΠi; and the standard notion of probabilistic independence is assumed in the Markov condition. A BN can therefore be regarded as a joint probability mass function over X that, according to the Markov condition, factorizes as follows:
  P(x) = ∏_{i=1}^{n} P(xi|πi)    (1)
for all the possible values of x ∈ ΩX, with the values of xi and πi consistent with x. In the following, we represent a BN as a pair ⟨G, P(X)⟩. Concerning updating, posterior beliefs about a queried variable Xq, given evidence XE = xE, are computed as follows:
P(x_q \mid x_E) = \frac{\sum_{x_M} \prod_{i=1}^{n} P(x_i \mid \pi_i)}{\sum_{x_M, x_q} \prod_{i=1}^{n} P(x_i \mid \pi_i)},    (2)
where XM ≡ X \ ({Xq} ∪ XE), the domains of the arguments of the sums are left implicit and the values of xi and πi are consistent with x = (xq, xM, xE). The evaluation of Equation (2) is an NP-hard task [1], but in the special case of polytree-shaped BNs, Pearl's propagation scheme based on local message propagation allows for efficient updating [10].
2.2. Credal Sets and Credal Networks
CNs relax BNs by allowing for imprecise probability statements: in our assumptions, the conditional mass functions of a CN are just required to belong to a finitely generated credal set, i.e., the convex hull of a finite number of mass functions over a variable. Geometrically, a credal set is a polytope. A credal set contains an infinite number of mass functions, but only a finite number of extreme mass functions: those corresponding to the vertices of the polytope, which are, in general, a subset of the generating mass functions. It is possible to show that updating based on a credal set is equivalent to updating based only on its vertices [11]. A credal set over X will be denoted as K(X). In order to specify a CN over the variables in X based on G, a collection of conditional credal sets K(Xi|πi), one for each πi ∈ ΩΠi, should be provided separately for each Xi ∈ X; regarding the Markov condition, we assume strong independence [2]. A CN associated to these local specifications is said to be with separately specified credal sets. In this paper, we consider only CNs with separately specified credal sets. The specification becomes global considering the strong extension of the CN, i.e.,

K(X) \equiv \mathrm{CH}\left\{ \prod_{i=1}^{n} P(X_i \mid \Pi_i) : P(X_i \mid \pi_i) \in K(X_i \mid \pi_i), \ \forall \pi_i \in \Omega_{\Pi_i}, \ \forall i = 1, \ldots, n \right\},    (3)
where CH denotes the convex hull of a set of functions. In the following, we represent a CN as a pair ⟨G, P(X)⟩, where P(X) = {Pk(X)}_{k=1}^{n_v} denotes the set of the vertices of K(X), whose number is assumed to be n_v. It is an obvious remark that, for each k = 1, ..., n_v, ⟨G, Pk(X)⟩ is a BN. For this reason a CN can be regarded as a finite set of BNs. In the case of CNs, updating is intended as the computation of tight bounds on the probabilities of a queried variable, given some evidence, i.e., Equation (2) generalizes as:

\underline{P}(x_q \mid x_E) = \min_{k=1,\ldots,n_v} \frac{\sum_{x_M} \prod_{i=1}^{n} P_k(x_i \mid \pi_i)}{\sum_{x_M, x_q} \prod_{i=1}^{n} P_k(x_i \mid \pi_i)},    (4)
and similarly with a maximum replacing the minimum for the upper probabilities \overline{P}(x_q|x_E). Exact updating in CNs displays high complexity: updating in polytree-shaped CNs is NP-complete, and NP^PP-complete in general CNs [4]. The only known exact linear-time algorithm for updating a specific class of CNs is the 2U algorithm, which we review in the following section.
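To make the two updating problems above concrete, the following sketch (ours, not from the paper; the tiny two-node chain, its numbers and all function names are purely illustrative) evaluates Equation (2) by direct enumeration and Equation (4) by brute-force minimization over all combinations of vertices of the local credal sets. Such enumeration is exponential in general, which is precisely why dedicated algorithms such as 2U are of interest.

```python
# Brute-force illustration of Equations (2) and (4) on a hypothetical chain X1 -> X2.
from itertools import product

# Precise BN: P(X1) and P(X2|X1), each variable with two states {0, 1}.
P_X1 = [0.7, 0.3]
P_X2_given_X1 = {0: [0.9, 0.1], 1: [0.4, 0.6]}

def joint(x1, x2, p_x1, p_x2_given_x1):
    """Factorization of Equation (1) for the chain."""
    return p_x1[x1] * p_x2_given_x1[x1][x2]

def posterior(x_q, x_e, p_x1, p_x2_given_x1):
    """Equation (2): P(X1 = x_q | X2 = x_e) by summing the joint."""
    num = joint(x_q, x_e, p_x1, p_x2_given_x1)
    den = sum(joint(x1, x_e, p_x1, p_x2_given_x1) for x1 in (0, 1))
    return num / den

# Credal version: each conditional mass function ranges over a finite set of vertices.
K_X1 = [[0.7, 0.3], [0.6, 0.4]]
K_X2_given_X1 = {0: [[0.9, 0.1], [0.8, 0.2]], 1: [[0.4, 0.6], [0.5, 0.5]]}

def lower_upper_posterior(x_q, x_e):
    """Equation (4): optimize the posterior over all vertex combinations."""
    values = [posterior(x_q, x_e, p1, {0: p20, 1: p21})
              for p1, p20, p21 in product(K_X1, K_X2_given_X1[0], K_X2_given_X1[1])]
    return min(values), max(values)

print(posterior(1, 1, P_X1, P_X2_given_X1))   # precise posterior
print(lower_upper_posterior(1, 1))            # lower/upper posterior bounds
```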
3. The 2U Algorithm and its Loopy Extension
The extension to CNs of Pearl's algorithm for efficient updating on polytree-shaped BNs faced serious computational problems. To solve Equation (2), Pearl's propagation scheme computes the joint probabilities P(xq, xE) for each xq ∈ ΩXq; the conditional probabilities associated to P(Xq|xE) are then obtained by normalizing this mass function. Such an approach cannot be easily extended to Equation (4), because \underline{P}(Xq|xE) and \overline{P}(Xq|xE) are not normalized in general. A remarkable exception to this situation is the case of binary CNs, i.e., models for which all the variables are binary. The reason is that a credal set over a binary variable has at most two vertices and can therefore be identified with an interval. This makes an efficient extension of Pearl's propagation scheme possible. The result is an exact algorithm for polytree-shaped binary CNs, called 2-Updating (2U), whose computational complexity is linear in the input size. Loosely speaking, 2U computes lower and upper messages for each node according to the same propagation scheme of Pearl's algorithm but with different combination rules. Each node performs a local computation and the global computation is concluded by updating all the nodes in sequence. See [5] for a detailed description of 2U. Loopy propagation is a popular technique that applies Pearl's propagation to multiply connected BNs [9]: propagation is iterated until probabilities converge or for a fixed number of iterations. In a recent paper [6], Ide and Cozman extend these ideas to belief updating on CNs, by developing a loopy variant of 2U (L2U) that makes 2U usable for multiply connected binary CNs. Initialization of variables and messages follows the same steps used in the 2U algorithm. Then nodes are repeatedly updated following a given sequence. Updates are repeated until convergence of probabilities is observed or until a maximum number of iterations is reached. Concerning computational complexity, L2U is basically an iteration of 2U and its complexity is therefore linear in the input size and in the number of iterations. Overall, the L2U algorithm is fast and returns good results, with low errors after a small number of iterations [6, Sect. 6]. However, at the present moment, there are no theoretical guarantees about convergence. Briefly, L2U overcomes the 2U limitation about topology, at the cost of an approximation; and in the next section we show how to bypass also the limitation on the number of possible states.
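The iteration scheme just described can be summarized by the following skeleton (a minimal sketch, not the authors' implementation: the actual 2U message-combination rules of [5] are abstracted behind a user-supplied update function, and all names are our own).

```python
# Hedged skeleton of the L2U loop: nodes are updated in a fixed sequence until the
# interval-valued beliefs stop changing or a maximum number of iterations is reached.
# `update_node(node, beliefs)` is assumed to recompute the messages of one node from
# its neighbours and to return that node's current posterior interval (lower, upper).
def loopy_2u(nodes, update_node, max_iterations=100, tolerance=1e-6):
    beliefs = {node: (0.0, 1.0) for node in nodes}  # vacuous initialization
    for _ in range(max_iterations):
        max_change = 0.0
        for node in nodes:  # update all nodes following a given sequence
            new_low, new_up = update_node(node, beliefs)
            old_low, old_up = beliefs[node]
            max_change = max(max_change, abs(new_low - old_low), abs(new_up - old_up))
            beliefs[node] = (new_low, new_up)
        if max_change < tolerance:  # convergence of the probability intervals
            break
    return beliefs
```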
4. Binarization Algorithms
In this section, we define a procedure to map updating problems in CNs into corresponding problems in binary CNs. To this end, we first show how to represent a random variable as a collection of binary variables (Section 4.1). Secondly, we employ this idea to represent a BN as an equivalent binary BN (Section 4.3) with an appropriate graphical structure (Section 4.2). Finally, we extend this binarization procedure to the case of CNs (Section 4.4).
4.1. Binarization of Variables
Assume di, which is the number of states of Xi, to be an integer power of two, i.e., ΩXi = {xi0, ..., xi(di−1)}, with di = 2^mi and mi integer. An obvious one-to-one correspondence between the states of Xi and the joint states of an array of mi binary variables (Bi(mi−1), ..., Bi1, Bi0) can be established: we assume that the joint state (bi(mi−1), ..., bi0) ∈ {0,1}^mi is associated to xil ∈ ΩXi, where l is the integer whose mi-bit binary representation is the sequence bi(mi−1) ··· bi1 bi0. We refer to this procedure as the binarization of Xi, and the binary variable Bij is said to be the j-th order bit of Xi. As an example, the state xi6 of Xi, assuming for Xi eight possible values, i.e., mi = 3, would be represented by the joint state (1, 1, 0) for the three binary variables (Bi2, Bi1, Bi0). If the number of states of Xi is not an integer power of two, the variable is said to be not binarizable. In this case we can make Xi binarizable simply by adding to ΩXi a number of impossible states² up to the nearest power of two. For example, we can make a variable with six possible values binarizable by adding two impossible states. Clearly, once the variables of X have been made binarizable, there is an obvious one-to-one correspondence between the joint states of X and those of the array of binary variables returned by the binarization of X, say X̃ = (B1(m1−1), ..., B10, B2(m2−1), ..., Bn(mn−1), ..., Bn0). Regarding notation, for each x ∈ ΩX, x̃ is assumed to denote the corresponding element of ΩX̃ and vice versa. Similarly, x̃E denotes the joint state for the bits of the nodes in XE corresponding to xE.
² This denomination is justified by the fact that, in the following sections, we will set the probabilities for these states equal to zero.
4.2. Graph Binarization
Let G be a DAG associated to a set of binarizable variables X. We call the binarization of G with respect to X a second DAG G̃, associated to the variables X̃ returned by the binarization of X, obtained with the following prescriptions: (i) two nodes of G̃ corresponding to bits of different variables in X are connected by an arc if and only if there is an arc with the same orientation between the relative variables in X; (ii) an arc connects two nodes of G̃ corresponding to bits of the same variable of X if and only if the order of the bit associated to the node from which the arc departs is lower than the order of the bit associated to the remaining node. As an example, Figure 1 reports a multiply connected DAG G and its binarization G̃. Note the arcs connecting all three bits of X0 with both bits of X2, which are drawn because of Prescription (i); while, considering the bits of X0, the arcs between the bit of order zero and those of order one and two, as well as that between the bit of order one and that of order two, are drawn because of Prescription (ii).
Figure 1. A multiply connected DAG (left) and its binarization (right) assuming d0 = 8, d1 = 2 and d2 = d3 = 4.
4.3. Bayesian Networks Binarization
The notion of binarizability extends to BNs as follows: ⟨G, P(X)⟩ is binarizable if and only if X is a set of binarizable variables. A non-binarizable BN can be made binarizable by the following procedure: (i) make the variables in X binarizable; (ii) specify zero values for the conditional probabilities of the impossible states, i.e., P(xij|πi) = 0 for
each j ≥ di, for each πi ∈ ΩΠi and for each i = 1, ..., n; (iii) arbitrarily specify the mass function P(Xi|πi) for each πi such that at least one of the states of the parents Πi corresponding to πi is an impossible state, for i = 1, ..., n. Considering Equation (1) and Prescription (ii), it is easy to note that, if the joint state x = (x1, ..., xn) of X is such that at least one of the states xi, with i = 1, ..., n, is an impossible state, then P(x) = 0, irrespective of the values of the mass functions specified as in Prescription (iii). Thus, given a non-binarizable BN, the procedure described in this paragraph returns a binarizable BN that preserves the original probabilities. This makes it possible to focus on the case of binarizable BNs without loss of generality, as in the following:
Definition 1. Let ⟨G, P(X)⟩ be a binarizable BN. The binarization of ⟨G, P(X)⟩ is a binary BN ⟨G̃, P̃(X̃)⟩ obtained as follows: (i) G̃ is the binarization of G with respect to X; (ii) P̃(X̃) corresponds to the following specifications of the conditional probabilities for the variables in X̃ given their parents:³

\tilde{P}(b_{ij} \mid b_{i(j-1)}, \ldots, b_{i0}, \tilde{\pi}_i) \propto {\sum_l}^{*} P(x_{il} \mid \pi_i), \qquad i = 1, \ldots, n; \; j = 0, \ldots, m_i - 1; \; \pi_i \in \Omega_{\Pi_i},    (5)

where the sum marked with ∗ is restricted to the states xil ∈ ΩXi such that the first j + 1 bits of the binary representation of l are bi0, ..., bij, πi is the joint state of the parents of Xi corresponding to the joint state π̃i for the bits of the parents of Xi, and the symbol ∝ denotes proportionality. In the following, to emphasize the fact that the variables (Bi(j−1), ..., Bi0, Π̃i) are the parents of Bij according to G̃, we denote the joint state (bi(j−1), ..., bi0, π̃i) as πBij. As an example of the procedure described in Definition 1, let X0 be a variable with four states associated to a parentless node of a BN. Assuming for the corresponding mass function [P(x00), P(x01), P(x02), P(x03)] = (.2, .3, .4, .1), we can use Equation (5) to obtain the mass functions associated to the two bits of X0 in the binarized BN. This leads to: P̃(B00) = (.6, .4), P̃(B01|B00 = 0) = (1/3, 2/3), P̃(B01|B00 = 1) = (3/4, 1/4), where the mass function of a binary variable B is denoted as an array [P(B = 0), P(B = 1)].
³ If the sum on the right-hand side of Equation (5) is zero for both the values of Bij, the corresponding conditional mass function is arbitrarily specified.
A BN and its binarization are basically the same probabilistic model, and we can represent any updated belief in the original BN as a corresponding belief in the binarized BN, according to the following:
Theorem 1. Let ⟨G, P(X)⟩ be a binarizable BN and ⟨G̃, P̃(X̃)⟩ its binarization. Then, given a queried variable Xq ∈ X and an evidence XE = xE:

P(x_q \mid x_E) = \tilde{P}(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E),    (6)
where (bq(mq−1), ..., bq0) is the joint state of the bits of Xq corresponding to xq.
4.4. Extension to Credal Networks
In order to generalize the binarization from BNs to CNs, we first extend the notion of binarizability: a CN ⟨G, P(X)⟩ is said to be binarizable if and only if X is binarizable. A non-binarizable CN can be made binarizable by the following procedure: (i) make the variables in X binarizable; (ii) specify zero upper (and lower) probabilities for the conditional probabilities of the impossible states: \overline{P}(x_{ij}|\pi_i) = \underline{P}(x_{ij}|\pi_i) = 0 for each j ≥ di, for each πi ∈ ΩΠi, and for each i = 1, ..., n; (iii) arbitrarily specify the conditional credal sets K(Xi|πi) for each πi such that at least one of the states of the parents Πi corresponding to πi is an impossible state, for i = 1, ..., n. According to Equation (3) and Prescription (ii), it is easy to check that, if the joint state x = (x1, ..., xn) of X is such that at least one of the states xi, with i = 1, ..., n, is an impossible state, then P(x) = 0, irrespective of the conditional credal sets specified as in Prescription (iii), and for each P(X) ∈ K(X). Thus, given a non-binarizable CN, the procedure described in this paragraph returns a binarizable CN that preserves the original probabilities. This makes it possible to focus on the case of binarizable CNs without loss of generality, as in the following:
Definition 2. Let ⟨G, P(X)⟩ be a binarizable CN. The binarization of ⟨G, P(X)⟩ is a binary CN ⟨G̃, P̃(X̃)⟩, with G̃ the binarization of G with respect to X and the following separate specifications of the extreme probabilities:⁴

\underline{\tilde{P}}(b_{ij} \mid \pi_{B_{ij}}) \equiv \min_{k=1,\ldots,n_v} \tilde{P}_k(b_{ij} \mid \pi_{B_{ij}}),    (7)

where ⟨G̃, P̃k(X̃)⟩ is the binarization of ⟨G, Pk(X)⟩ for each k = 1, ..., n_v.
Definition 2 implicitly requires the binarization of all the BNs ⟨G, Pk(X)⟩ associated to ⟨G, P(X)⟩, but the right-hand side of Equation (7) is not a minimum over all the BNs associated to ⟨G̃, P̃(X̃)⟩, being in general P̃(X̃) ≠ {P̃k(X̃)}_{k=1}^{n_v}. This means that it is not possible to represent an updating problem in a CN as a corresponding updating problem in the binarization of the CN, and we should therefore regard ⟨G̃, P̃(X̃)⟩ as an approximate description of ⟨G, P(X)⟩. Remarkably, according to Equation (5), the conditional mass functions for the bits of Xi relative to the value π̃i can be obtained from the single mass function P(Xi|πi).
⁴ Note that in the case of a binary variable a specification of the extreme probabilities as in Equation (7) is equivalent to the explicit specification of the (two) vertices of the conditional credal set K(Bij|πBij): if B is a binary variable and we specify \underline{P}(B = 0) = s and \underline{P}(B = 1) = t, then the credal set K(B) is the convex hull of the mass functions P1(B) = (s, 1 − s) and P2(B) = (1 − t, t).
Therefore, if we use Equation (5) with Pk(X) in place of P(X) for each k = 1, ..., n_v to compute the probabilities P̃k(bij|πBij) in Equation (7), the only mass function required to do such calculations is Pk(Xi|πi). Thus, instead of considering all the joint mass functions Pk(X), with k = 1, ..., n_v, we can restrict our attention to the conditional mass functions P(Xi|πi) associated to the elements of the conditional credal set K(Xi|πi) and take the minimum, i.e.,

\underline{\tilde{P}}(b_{ij} \mid \pi_{B_{ij}}) = \min_{P(X_i|\pi_i) \in K(X_i|\pi_i)} \tilde{P}(b_{ij} \mid \pi_{B_{ij}}),    (8)

where P̃(bij|πBij) is obtained from P(Xi|πi) using Equation (5) and the minimization on the right-hand side of Equation (8) can clearly be restricted to the vertices of K(Xi|πi). The procedure is therefore linear in the input size. As an example, let X0 be a variable with four possible states associated to a parentless node of a CN. Assuming the corresponding credal set K(X0) to be the convex hull of the mass functions (.2, .3, .4, .1), (.25, .25, .25, .25), and (.4, .2, .3, .1), we can use Equation (5) to compute the mass functions associated to the two bits of X0 for each vertex of K(X0) and then consider the minima as in Equation (8), obtaining the lower probabilities: P̃(B00) = (.5, .3), P̃(B01|B00 = 0) = (1/3, 3/7), P̃(B01|B00 = 1) = (1/2, 1/4). The equivalence between an updating problem in a BN and in its binarization as stated by Theorem 1 generalizes in an approximate way to the case of CNs, as stated by the following:
Theorem 2. Let ⟨G, P(X)⟩ be a binarizable CN and ⟨G̃, P̃(X̃)⟩ its binarization. Then, given a queried variable Xq ∈ X and an evidence XE = xE:

\underline{P}(x_q \mid x_E) \geq \underline{\tilde{P}}(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E),    (9)
where (bq(mq−1), ..., bq0) is the joint state of the bits of Xq corresponding to xq.
The inequality in Equation (9), together with its analogue for the upper probabilities, provides an outer bound for the posterior interval associated to a generic updating problem in a CN. Such an approximation is the posterior interval for the corresponding problem on the binarized CN. Note that L2U cannot update joint states of two or more variables: this means that we can compute the right-hand side of Equation (9) by a direct application of L2U only in the case mq = 1, i.e., if the queried variable Xq is binary. If Xq has more than two possible states, a simple transformation of the binarized CN is necessary to apply L2U. The idea is simply to define an additional binary random variable, which is true if and only if (Bq(mq−1), ..., Bq0) = (bq(mq−1), ..., bq0). This variable is a deterministic function of some of the variables in X̃, and can therefore be easily embedded in the CN ⟨G̃, P̃(X̃)⟩. We simply add to G̃ a binary node, say C_{bq(mq−1),...,bq0}, with no children and whose parents are Bq(mq−1), ..., Bq0, and specify the probabilities for the state 1 (true) of C_{bq(mq−1),...,bq0}, conditional on the values of its parents Bq(mq−1), ..., Bq0, equal to one only for the joint value of the parents (bq(mq−1), ..., bq0) and zero otherwise. Then, it is straightforward to check that:

\underline{\tilde{P}}(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E) = \underline{\tilde{P}}'(C_{b_{q(m_q-1)},\ldots,b_{q0}} = 1 \mid \tilde{x}_E),    (10)
where \underline{\tilde{P}}' denotes the lower probability in the CN with the additional node. Thus, according to Equation (10), if Xq has more than two possible values, we simply add the node C_{bq(mq−1),...,bq0} and run L2U on the modified CN. Overall, the joint use of the binarization techniques described in this section with the L2U algorithm represents a general procedure for efficient approximate updating in CNs. Clearly, the lack of a theoretical quantification of the outer approximation provided by the binarization as in Theorem 2, together with the fact that the posterior probabilities computed by L2U can be lower as well as upper approximations, suggests the opportunity of a numerical investigation of the quality of the overall approximation, which is the subject of the next section.
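As a concrete illustration of Equation (8), the following self-contained sketch (ours, with illustrative names) binarizes each vertex of the credal set in the example of this section and takes the component-wise minima, reproducing the lower probabilities reported above.

```python
# Equation (8) for a parentless four-state variable: binarize each vertex of K(X0)
# with Equation (5) and take element-wise minima over the vertices.
K_X0 = [[.2, .3, .4, .1], [.25, .25, .25, .25], [.4, .2, .3, .1]]

def bits_of_vertex(p):
    """Equation (5) for d = 4: mass functions of B00 and of B01 given each B00 value
    (assumes both bit-0 marginals are nonzero, as in this example)."""
    b00 = [p[0] + p[2], p[1] + p[3]]                 # states whose bit of order 0 is 0 / 1
    b01_given_0 = [p[0] / b00[0], p[2] / b00[0]]     # bit of order 0 fixed to 0
    b01_given_1 = [p[1] / b00[1], p[3] / b00[1]]     # bit of order 0 fixed to 1
    return {"B00": b00, "B01|B00=0": b01_given_0, "B01|B00=1": b01_given_1}

tables = [bits_of_vertex(v) for v in K_X0]
lower = {key: [min(t[key][b] for t in tables) for b in (0, 1)] for key in tables[0]}
print(lower)
# expected: B00 -> (.5, .3); B01|B00=0 -> (1/3, 3/7); B01|B00=1 -> (1/2, 1/4)
```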
5. Tests and Results
We have implemented a binarization algorithm to binarize CNs as in Definition 2 and run experiments on two sets of 50 random CNs based on the topology of the ALARM network and generated using BNGenerator [7]. The binarized networks were updated by an implementation of L2U, choosing the node "VentLung", which is a binary node, as the target variable, and assuming no evidence. The L2U algorithm converges after 3 iterations and the overall computation is fast: posterior beliefs for the networks were produced in less than one second on a Pentium computer, while the exact calculations used for the comparisons, based on branch-and-bound techniques [3], took between 10 and 25 seconds for each simulation. Results can be viewed in Figure 2. As a comment, we note a good accuracy of the approximations, with a mean square error around 3% and very small deviations. Remarkably, the quality of the approximation is nearly the same for both sets of simulations. Furthermore, we observe that the posterior intervals returned by the approximate method always include the corresponding exact intervals. This seems to suggest that the approximation due to the binarization dominates that due to L2U. It should also be pointed out that the actual difference between the computational times required by the two approaches would increase dramatically for larger networks: the computational complexity of the branch-and-bound method used for exact updating is exponential in the input size, while both our binarization algorithm and L2U (assuming that it converges) take linear time; of course, both approaches exhibit an exponential increase with the number of categories of the variables.
Figure 2. A comparison between the exact results and the approximations returned by the "binarization+L2U" procedure for the upper and lower values of P(VentLung = 1) on two sets of 50 randomly generated CNs based on the ALARM network, with a fixed number of vertices for each conditional credal set: (a) conditional credal sets with 4 vertices; (b) conditional credal sets with 10 vertices (each panel plots the probabilities against the index of the CN).
6. Conclusions
This paper describes an efficient algorithm for approximate updating on credal nets. This task is achieved by transforming the credal net into a corresponding credal net over binary variables, and updating such a binary credal net by the loopy version of 2U. Remarkably, the procedure can be applied to any credal net, without restrictions related to the network topology or to the number of possible states of the variables. The posterior probability intervals in the binarized network are shown to contain the exact intervals requested by the updating problem (Theorem 2). Our numerical tests show that the quality of the approximation is satisfactory (a few percent), remaining an outer
approximation also after the approximate updating by L2U. Thus, considering also the efficiency of the algorithm, we can regard the "binarization+L2U" approach as a viable and accurate approximate method for fast updating on large CNs. As future research, we intend to explore the possibility of a theoretical characterization of the quality of the approximation associated to the binarization, as well as the identification of particular specifications of the conditional credal sets for which binarization provides high-quality approximations or exact results. The possibility of a formal proof of convergence for L2U, based on similar existing results for loopy belief propagation on binary networks [8], will also be investigated in a future study.
Acknowledgements
The first author was partially supported by the Swiss NSF grant 200020-109295/1. The third author was partially supported by the FAPESP grant 05/57451-8. The fourth author was partially supported by the CNPq grant 3000183/98-4. We thank Cassio Polpo de Campos for providing the exact simulations.
A. Proofs
Proof of Theorem 1. With some algebra it is easy to check that the inverse of Equation (5) is:

P(x_{il} \mid \pi_i) = \prod_{j=0}^{m_i - 1} \tilde{P}(b_{ij} \mid b_{i(j-1)}, \ldots, b_{i0}, \tilde{\pi}_i),    (11)

where (bi(mi−1), ..., bi0) is the mi-bit binary representation of l. Thus, ∀x ∈ ΩX:

P(x) = \prod_{i=1}^{n} P(x_i \mid \pi_i) = \prod_{i=1}^{n} \prod_{j=0}^{m_i - 1} \tilde{P}(b_{ij} \mid b_{i(j-1)}, \ldots, b_{i0}, \tilde{\pi}_i) = \tilde{P}(\tilde{x}),    (12)

where the first passage is because of Equation (1), the second because of Equation (11) and the third because of the Markov condition for the binarized BN. Thus:

P(x_q \mid x_E) = \frac{P(x_q, x_E)}{P(x_E)} = \frac{\tilde{P}(b_{q(m_q-1)}, \ldots, b_{q0}, \tilde{x}_E)}{\tilde{P}(\tilde{x}_E)},    (13)

which proves the thesis as in Equation (6).
Lemma 1. Let {⟨G, Pk(X)⟩}_{k=1}^{n_v} be the BNs associated to a CN ⟨G, P(X)⟩. Let also ⟨G̃, P̃(X̃)⟩ be the binarization of ⟨G, P(X)⟩. Then, the BN ⟨G̃, P̃k(X̃)⟩, which is the binarization of ⟨G, Pk(X)⟩, specifies a joint mass function that belongs to the strong extension of ⟨G̃, P̃(X̃)⟩, i.e.,

\tilde{P}_k(\tilde{X}) \in \tilde{K}(\tilde{X}),    (14)

for each k = 1, ..., n_v, with K̃(X̃) denoting the strong extension of ⟨G̃, P̃(X̃)⟩.
Proof. According to Equation (3), the strong extension of ⟨G̃, P̃(X̃)⟩ is:

\tilde{K}(\tilde{X}) \equiv \mathrm{CH}\left\{ \prod_{B_{ij} \in \tilde{X}} \tilde{P}(B_{ij} \mid \Pi_{B_{ij}}) : \tilde{P}(B_{ij} \mid \pi_{B_{ij}}) \in \tilde{K}(B_{ij} \mid \pi_{B_{ij}}), \ \forall \pi_{B_{ij}} \in \Omega_{\Pi_{B_{ij}}}, \ \forall B_{ij} \in \tilde{X} \right\}.    (15)

On the other side, considering the Markov condition for ⟨G̃, P̃k(X̃)⟩, we have:

\tilde{P}_k(\tilde{X}) = \prod_{B_{ij} \in \tilde{X}} \tilde{P}_k(B_{ij} \mid \Pi_{B_{ij}}).    (16)

But, for each πBij ∈ ΩΠBij and Bij ∈ X̃, the conditional mass function P̃k(Bij|πBij) belongs to the conditional credal set K̃(Bij|πBij) because of Equation (7). Thus, the joint mass function in Equation (16) belongs to the set in Equation (15), and this holds for each k = 1, ..., n_v.
Lemma 1 basically states an inclusion relation between the strong extension of ⟨G̃, P̃(X̃)⟩ and the set of joint mass functions {Pk(X)}_{k=1}^{n_v}, which, according to the equivalence in Equation (12), is just an equivalent representation of ⟨G, P(X)⟩. This will be used to establish a relation between inferences in a CN and in its binarization, as detailed in the following:
Proof of Theorem 2. We have:

\underline{P}(x_q \mid x_E) = \min_{k=1,\ldots,n_v} P_k(x_q \mid x_E) = \min_{k=1,\ldots,n_v} \tilde{P}_k(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E),    (17)

where the first passage is because of Equations (4) and (2), and the second because of Theorem 1 applied to the BN ⟨G, Pk(X)⟩, for each k = 1, ..., n_v. On the other side, the lower posterior probability on the right-hand side of Equation (9) can be equivalently expressed as:

\underline{\tilde{P}}(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E) = \min_{\tilde{P}(\tilde{X}) \in \tilde{K}(\tilde{X})} \tilde{P}(b_{q(m_q-1)}, \ldots, b_{q0} \mid \tilde{x}_E),    (18)

where K̃(X̃) is the strong extension of ⟨G̃, P̃(X̃)⟩. Considering the minima on the right-hand sides of Equations (17) and (18), we observe that they refer to the same function, and the first minimum is over a domain that is included in that of the second because of Lemma 1. Thus, the lower probability in Equation (17) cannot be less than that in Equation (18), which is the thesis.
References
[1] G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393–405, 1990.
[2] F. G. Cozman. Graphical models for imprecise probabilities. Int. J. Approx. Reasoning, 39(2-3):167–184, 2005.
[3] C. P. de Campos and F. G. Cozman. Inference in credal networks using multilinear programming. In Proceedings of the Second Starting AI Researcher Symposium, pages 50–61, Amsterdam, 2004. IOS Press.
[4] C. P. de Campos and F. G. Cozman. The inferential complexity of Bayesian and credal networks. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1313–1318, Edinburgh, 2005.
[5] E. Fagiuoli and M. Zaffalon. 2U: an exact interval propagation algorithm for polytrees with binary variables. Artificial Intelligence, 106(1):77–107, 1998.
[6] J. S. Ide and F. G. Cozman. IPE and L2U: Approximate algorithms for credal networks. In Proceedings of the Second Starting AI Researcher Symposium, pages 118–127, Amsterdam, 2004. IOS Press.
[7] J. S. Ide, F. G. Cozman, and F. T. Ramos. Generating random Bayesian networks with constraints on induced width. In Proceedings of the 16th European Conference on Artificial Intelligence, pages 323–327, Amsterdam, 2004. IOS Press.
[8] J. M. Mooij and H. J. Kappen. Validity estimates for loopy belief propagation on binary real-world networks. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 945–952. MIT Press, Cambridge, MA, 2005.
[9] K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99), pages 467–475, San Francisco, CA, 1999. Morgan Kaufmann.
[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, 1988.
[11] P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York, 1991.
On Generalizing the AGM Postulates Giorgos FLOURIS, Dimitris PLEXOUSAKIS and Grigoris ANTONIOU Institute of Computer Science, FO.R.T.H. P.O. Box 1385, GR 71110, Heraklion, Greece {fgeo, dp, antoniou}@ics.forth.gr
Abstract. The AGM theory is the dominating paradigm in the field of belief change but makes some non-elementary assumptions which disallow its application to certain logics. In this paper, we recast the theory by dropping most such assumptions; we determine necessary and sufficient conditions for a logic to support operators which are compatible with our generalized version of the theory and show that our approach is applicable to a broader class of logics than the one considered by AGM. Moreover, we present a new representation theorem for operators satisfying the AGM postulates and investigate why the AGM postulates are incompatible with the foundational model. Finally, we propose a weakening of the recovery postulate which has several intuitively appealing properties. Keywords. AGM Postulates, Contraction, Belief Revision, Belief Change
Introduction
The problem of belief change deals with the modification of an agent's beliefs in the face of new, possibly contradictory, information [1]. Being able to dynamically change the stored data is very important in any knowledge representation system: mistakes may have occurred during the input; some new information may have become available; or the world represented by the Knowledge Base (KB) may have changed. In all such cases, the agent's beliefs should change to reflect this fact. The most influential work in the field of belief change was developed by Alchourrón, Gärdenfors and Makinson (AGM for short) in [2]. Instead of trying to find a specific algorithm (operator) for dealing with the problem of belief change, AGM chose to investigate the properties that such an operator should have in order to be intuitively appealing. The result was a set of postulates (the AGM postulates) that any belief change operator should satisfy. This work had a major influence on most subsequent works on belief change, being the dominating paradigm in the area ever since and leading to a series of works, by several authors, studying the postulates' effects, providing equivalent formulations, criticizing, commenting or praising the theory etc; a list of relevant works, which is far from being exhaustive, includes [1], [3], [4], [5], [6], [7], [8], [9]. The intuition upon which the AGM postulates were based is independent of the peculiarities of the underlying logic. This fact, along with the (almost) universal acceptance of the AGM postulates in the belief change community as the defining paradigm for belief change operators, motivates us to apply them in any type of logic. Unfortunately, this is not possible because the AGM theory itself was based on certain assumptions regarding the underlying logic, such as the admittance of standard boolean
operators (∧, ∨, ¬, etc), classical tautological implication, compactness and others. These assumptions are satisfied by many interesting logics, such as Propositional Logic (PL), First-Order Logic, Higher-Order Logics, modal logics etc [10]; in the following, we will refer to such logics using the term classical logics. On the other hand, there are several interesting logics which do not satisfy these assumptions, such as various logics used in Logic Programming ([11]), equational logic ([12]), Description Logics ([13]) and others (see [14], [15]). Thus, the AGM postulates (as well as most of the belief change literature) are not applicable to such (non-classical) logics. As a result, we now have a very good understanding of the change process in classical logics, but little can be said about this process in non-classical ones, leading to problems in related fields (such as ontology evolution [16], [15]). This paper is an initial attempt to overcome this problem by generalizing the most influential belief change theory, the AGM theory. We drop most AGM assumptions and develop a version of the theory that is applicable to many interesting logics. We determine the necessary and sufficient conditions required for an operator that satisfies the AGM postulates to exist in a given logic and develop a new representation theorem for such operators. The results of our research can be further applied to shed light on our inability to develop contraction operators for belief bases that satisfy the AGM postulates [4], [7]. Finally, we present a possible weakening of the theory with several appealing properties. Only informal sketches of proofs will be presented; detailed proofs and further results can be found in the full version of this paper ([14], [15]).
1. Tarski's Framework
When dealing with logics we often use Tarski's framework (see [4]), which postulates the existence of a set of expressions (the language, denoted by L) and an implication operator (function) that allows us to conclude facts from other facts (the consequence operator, denoted by Cn). Under this framework, a logic is actually a pair ⟨L, Cn⟩ and any subset X ⊆ L is a belief of this logic. The implications Cn(X) of a belief X are determined by the consequence operator, which is constrained to satisfy three rationality axioms, namely iteration (Cn(Cn(X)) = Cn(X)), inclusion (X ⊆ Cn(X)) and monotony (X ⊆ Y implies Cn(X) ⊆ Cn(Y)). For X, Y ⊆ L, we will say that X implies Y (denoted by X ⊢ Y) iff Y ⊆ Cn(X). This general framework encompasses all monotonic logics and will be our only assumption on the underlying logic throughout this paper, unless otherwise mentioned. Thus, our framework is applicable to all monotonic logics. On the other hand, the AGM theory additionally assumes that the language of the underlying logic is closed under the standard propositional operators (∧, ∨, ¬, etc) and that the Cn operator of the logic includes classical tautological implication, is compact and satisfies the rule of Introduction of Disjunctions in the Premises.
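As a concrete illustration of these axioms, the following sketch (ours, not the authors') encodes the consequence operator of a small finite logic as an explicit table (the same two-proposition logic later used as a counter-example in Section 3) and checks iteration, inclusion and monotony by enumeration; all function names are hypothetical.

```python
# Tarski's framework on a finite logic with language L = {x, y}; Cn is a lookup table.
from itertools import chain, combinations

L = frozenset({"x", "y"})
CN = {frozenset(): frozenset(),
      frozenset({"x"}): frozenset({"x"}),
      frozenset({"y"}): frozenset({"x", "y"}),
      frozenset({"x", "y"}): frozenset({"x", "y"})}

def cn(X):
    return CN[frozenset(X)]

def powerset(s):
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

# Check the three rationality axioms for every belief X, Y over L.
for X in powerset(L):
    assert cn(cn(X)) == cn(X)          # iteration
    assert X <= cn(X)                  # inclusion
    for Y in powerset(L):
        if X <= Y:
            assert cn(X) <= cn(Y)      # monotony
print("this pair <L, Cn> satisfies Tarski's axioms")
```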
2. The AGM Theory and the Generalized Basic AGM Postulates for Contraction In their attempt to formalize the field of belief change, AGM defined three different types of belief change (operators), namely expansion, revision and contraction [2]. Expansion is the trivial addition of a sentence to a KB, without taking any special provisions for maintaining consistency; revision is similar, with the important difference that the result should be a consistent set of beliefs; contraction is required when one wishes to consistently remove a sentence from their beliefs instead of adding
one. AGM introduced a set of postulates for revision and contraction that formally describe the properties that such an operator should satisfy. This paper focuses on contraction, which is the most important operation for theoretical purposes [1], but some ideas on expansion and revision will be presented as well. Under the AGM approach, a KB is a set of propositions K closed under logical consequence (K = Cn(K)), also called a theory. Any expression x ∈ L can be contracted from the KB. The operation of contraction can thus be formalized as a function mapping the pair (K, x) to a new KB K′ (denoted by K′ = K−x). Similarly, expansion is denoted by K+x and revision is denoted by K∗x. Of course, not all functions can be used for contraction. Firstly, the new KB (K−x) should be a theory itself (see (K1) in Table 1). As already stated, contraction is an operation that is used to remove knowledge from a KB; thus the result should not contain any new, previously unknown, information (K2). Moreover, contraction is supposed to return a new KB such that the expression x is no longer believed; hence x should not be among the consequences of K−x (K4). Finally, the result should be syntax-independent (K5) and should remove as little information from the KB as possible (K3), (K6), in accordance with the Principle of Minimal Change. These intuitions were formalized in a set of six postulates, the basic AGM postulates for contraction, which can be found in Table 1 (see also [2]). Reformulating these postulates to be applicable in our more general context is relatively easy; the generalized version of each postulate can be found in the rightmost column of Table 1. Notice that no assumptions (apart from the logic being expressible as an ⟨L, Cn⟩ pair) are made in the generalized set. In addition, the restriction on the contracted belief being a single expression was dropped; in our context, the contracted belief can be any set of propositions, which is read conjunctively. This is a crucial generalization, as, in our context (unlike the classical one), the conjunction of a set of propositions cannot always be expressed using a single one (see also [15]). The fact that X is read conjunctively underlies the formulation of the vacuity and success postulates and implies that we are dealing with choice contraction in the terminology of [17]. Moreover, the KB was allowed to be any set (not necessarily a theory). This is a purely technical relaxation done for aesthetic reasons related to the symmetry of the contraction operands. The full logical closure of K is considered when determining the contraction result, so this relaxation should not be confused with foundational approaches using belief bases ([18], [19]). It is trivial to verify that, in the presence of the AGM assumptions, each of the original AGM postulates is equivalent to its generalized counterpart. In the following, any reference to the AGM postulates will refer to their generalized version, unless stated otherwise.
Table 1. Original and Generalized AGM Postulates
(K1) Closure. Original: K−x is a theory. Generalized: K−X = Cn(K−X)
(K2) Inclusion. Original: K−x ⊆ K. Generalized: K−X ⊆ Cn(K)
(K3) Vacuity. Original: If x ∉ Cn(K), then K−x = K. Generalized: If X ⊈ Cn(K), then K−X = Cn(K)
(K4) Success. Original: If x ∉ Cn(∅), then x ∉ Cn(K−x). Generalized: If X ⊈ Cn(∅), then X ⊈ Cn(K−X)
(K5) Preservation. Original: If Cn({x}) = Cn({y}), then K−x = K−y. Generalized: If Cn(X) = Cn(Y), then K−X = K−Y
(K6) Recovery. Original: K ⊆ Cn((K−x) ∪ {x}). Generalized: K ⊆ Cn((K−X) ∪ X)
3. Decomposability and AGM-compliance
One of the fundamental results related to the original AGM postulates was that all classical logics admit several contraction operators that satisfy them [2]. Unfortunately, it turns out that this is not true for non-classical logics (and the generalized postulates). Take for example L = {x, y}, Cn(∅) = ∅, Cn({x}) = {x}, Cn({y}) = Cn({x, y}) = {x, y}. It can be easily shown that ⟨L, Cn⟩ is a logic. Notice that, in this logic, all theories (except Cn(∅)) contain x. So, if we attempt to contract {x} from {x, y} the result can be no other than Cn(∅), or else the postulate of success would be violated. But then, the recovery postulate is not satisfied, as can be easily verified. Thus, no contraction operator can handle the contraction {x, y}−{x} in a way that satisfies all six AGM postulates. As the following theorem shows, it is the recovery postulate that causes this problem:
Theorem 1. In every logic there exists a contraction operator satisfying (K1)-(K5).
Once the recovery postulate is added to our list of desirable postulates, Theorem 1 fails, as shown by our counter-example. One of the major questions addressed in this paper is the identification of the distinctive property that does not allow the definition of a contraction operator that satisfies the AGM postulates in some logics. The answer to this question can be found by examining the situation a bit closer. Consider two beliefs K, X ⊆ L and the contraction operation Z = K−X, which, supposedly, satisfies the AGM postulates. Let's first consider some trivial, limit cases:
• If Cn(X) = Cn(∅), then the recovery and closure postulates leave us little choice but to set Z = Cn(K).
• If Cn(X) ⊈ Cn(K), then the vacuity postulate forces us to set Z = Cn(K).
These remarks guarantee that a proper Z can be found in both these cases, regardless of the logic at hand. The only case left unconsidered is the principal one, i.e., when Cn(∅) ⊂ Cn(X) ⊆ Cn(K). In this case, the postulates imply two main restrictions:
1. The result (Z) should be implied by K (inclusion postulate); however, Z cannot imply K, due to (K4) and our hypothesis; thus: Cn(Z) ⊂ Cn(K).
2. The union of X and Z should imply K (recovery postulate); however, our hypothesis and (K2) imply that both X and Z are implied by K, so their union is implied by K as well. Thus: Cn(X ∪ Z) = Cn(K).
It turns out that the existence of a set Z satisfying these two restrictions for each (K, X) pair with the aforementioned properties is a necessary and sufficient condition for the existence of an operator satisfying the generalized (basic) AGM postulates for contraction in a given logic. To show this fact formally, we will need some definitions:
Definition 1. Consider a logic ⟨L, Cn⟩ and two beliefs K, X ⊆ L. We define the set of complement beliefs of X with respect to K, denoted by X(K), as follows:
• If Cn(∅) ⊂ Cn(X) ⊆ Cn(K), then X(K) = {Z ⊆ L | Cn(Z) ⊂ Cn(K) and Cn(X ∪ Z) = Cn(K)}
• In any other case, X(K) = {Z ⊆ L | Cn(Z) = Cn(K)}
Definition 2. Consider a logic ⟨L, Cn⟩ and a set K ⊆ L:
• The logic ⟨L, Cn⟩ is called AGM-compliant with respect to the basic postulates for contraction (or simply AGM-compliant) iff there exists an operator that satisfies the basic AGM postulates for contraction (K1)-(K6)
• The set K is called decomposable iff X(K) ≠ ∅ for all X ⊆ L
• The logic ⟨L, Cn⟩ is called decomposable iff all K ⊆ L are decomposable
As already mentioned, the two notions of Definition 2 actually coincide:
Theorem 2. A logic is AGM-compliant iff it is decomposable.
Theorem 2 provides a necessary and sufficient condition to determine whether any given logic admits a contraction operator that satisfies the AGM postulates. The following result shows that the set of complement beliefs (X(K)) of a pair of beliefs K, X contains exactly the acceptable (per AGM) results of K−X, modulo Cn:
Theorem 3. Consider a decomposable logic ⟨L, Cn⟩ and an operator ‘−’. Then ‘−’ satisfies the basic AGM postulates for contraction iff for all K, X ⊆ L, there is some Z ∈ X(K) such that Cn(Z) = K−X.
Theorem 3 provides yet another representation theorem for the AGM postulates for contraction, equivalent to partial meet functions [2], safe contraction [3], systems of spheres [6], epistemic entrenchment [8] etc. However, our representation theorem is unique in the sense that it is applicable to all monotonic logics, unlike the other results, whose formulation assumes that the underlying logic satisfies the AGM assumptions. As already mentioned, all classical logics admit several operators satisfying the original AGM postulates. This should also be true for the generalized postulates, so we would expect all classical logics to be AGM-compliant. This is indeed true:
Theorem 4. All logics satisfying the AGM assumptions are AGM-compliant.
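To make Definitions 1 and 2 concrete, the following sketch (our own illustration, with hypothetical function names) enumerates the complement beliefs X(K) in the finite counter-example logic introduced at the beginning of this section; the empty result for X = {x} shows that K = {x, y} is not decomposable, so by Theorem 2 this logic is not AGM-compliant.

```python
# Enumerate the complement beliefs X(K) of Definition 1 for the counter-example logic.
from itertools import chain, combinations

L = frozenset({"x", "y"})
CN = {frozenset(): frozenset(),
      frozenset({"x"}): frozenset({"x"}),
      frozenset({"y"}): frozenset({"x", "y"}),
      frozenset({"x", "y"}): frozenset({"x", "y"})}
cn = lambda X: CN[frozenset(X)]
subsets = lambda s: [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def complement_beliefs(K, X):
    """The set X(K) of Definition 1."""
    if cn(frozenset()) < cn(X) <= cn(K):                      # principal case
        return [Z for Z in subsets(L) if cn(Z) < cn(K) and cn(X | Z) == cn(K)]
    return [Z for Z in subsets(L) if cn(Z) == cn(K)]          # limit cases

K = frozenset({"x", "y"})
for X in subsets(L):
    print(sorted(X), "->", [sorted(Z) for Z in complement_beliefs(K, X)])
# X = {x} yields an empty list: K is not decomposable, so no operator satisfies (K1)-(K6).
```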
4. More Characterizations of AGM-compliance: Cuts and Max-cuts
Apart from Theorem 2, an alternative (and equivalent) method for determining the AGM-compliance of a given logic can be devised, based on the notion of a cut. A cut is a family of subsets of Cn(K) with the property that every other subset of Cn(K) either implies or is implied by one of them. Such a family actually "cuts" the beliefs implied by K in two (thus the name). Cuts are related to AGM-compliance, so they have a special place in our theory. Formally, we define a cut as follows:
Definition 3. Consider a logic ⟨L, Cn⟩, a set K ⊆ L and a family х of beliefs such that:
• For all Y ∈ х, Cn(Y) ⊂ Cn(K)
• For all Z ⊆ L such that Cn(Z) ⊂ Cn(K), there is a Y ∈ х such that Cn(Y) ⊆ Cn(Z) or Cn(Z) ⊆ Cn(Y)
Then х is called a cut of K.
Now, consider a set K ⊆ L, a cut х of K and a non-tautological set X that is implied by all the sets in the cut х. Take Z = K−X. Since х is a cut and Cn(Z) ⊂ Cn(K) by (K2) and (K4), Z will either imply or be implied by a set in х. If it implies a set in х, then it also implies X, so the operation Z = K−X does not satisfy success. If it is implied by a set in х (say Y ∈ х), then both X and Z are implied by Y, so it is necessarily the case that Cn(X ∪ Z) ⊆ Cn(Y) ⊂ Cn(K), so recovery is not satisfied. Once we deal with some technicalities and limit cases, it turns out that this idea forms the basis for another equivalent characterization of AGM-compliant logics:
Theorem 5. Consider a logic ⟨L, Cn⟩ and a belief K ⊆ L. Then:
• K is decomposable iff Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅) for every cut х of K
• The logic ⟨L, Cn⟩ is AGM-compliant iff for every K ⊆ L and every cut х of K it holds that Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅).
Cuts enjoy an interesting monotonic behavior; informally, the "larger" the beliefs that are contained in a cut, the more likely it is to have a non-tautological intersection. This motivates us to consider the family of all maximal (proper) subsets of Cn(K), which is a special type of cut, containing the strongest (largest) beliefs possible:
Definition 4. Consider a logic ⟨L, Cn⟩, a set K ⊆ L and a family х of beliefs such that:
• For all Y ∈ х, Cn(Y) ⊂ Cn(K)
• For all Z ⊆ L such that Cn(Z) ⊂ Cn(K), there is a Y ∈ х such that Cn(Z) ⊆ Cn(Y)
• For all Y ∈ х, Y = Cn(Y)
• For all Y, Y′ ∈ х, Cn(Y) ⊆ Cn(Y′) implies Y = Y′
Then х is called a max-cut of K.
It is easy to verify that a max-cut is a cut that contains exactly the maximal proper subsets of K. There is no guarantee that every belief will have a max-cut, but, if it does, then it is unique. The importance of max-cuts stems from the fact that, if a max-cut х of K exists and Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅), then the same property will hold for all cuts of K as well (due to monotonicity). This fact allows us to determine the decomposability of a given belief with a single check, formally described in the following theorem:
Theorem 6. Consider a logic ⟨L, Cn⟩ and a belief K ⊆ L which has a max-cut х. Then K is decomposable iff Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅).
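On the small counter-example logic from Section 3 this single check can be carried out mechanically; the following self-contained sketch (ours, with illustrative names) builds the max-cut of K = {x, y} and applies the criterion of Theorem 6.

```python
# Theorem 6 on the counter-example logic: the max-cut of K = {x, y} is {{x}}, whose
# intersection {x} is not tautological, so K is not decomposable.
from itertools import chain, combinations

CN = {frozenset(): frozenset(),
      frozenset({"x"}): frozenset({"x"}),
      frozenset({"y"}): frozenset({"x", "y"}),
      frozenset({"x", "y"}): frozenset({"x", "y"})}
cn = lambda X: CN[frozenset(X)]
subsets = lambda s: [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def max_cut(K, language):
    below = {cn(Z) for Z in subsets(language) if cn(Z) < cn(K)}   # theories strictly below Cn(K)
    return [Y for Y in below if not any(Y < W for W in below)]    # keep only the maximal ones

K = language = frozenset({"x", "y"})
cut = max_cut(K, language)
intersection = frozenset.intersection(*cut) if cut else frozenset()
print([sorted(Y) for Y in cut], sorted(intersection))
print("decomposable:", cn(intersection) == cn(frozenset()))      # False, as Theorem 6 predicts
```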
5. A Foundational AGM Theory Knowledge in a KB can be represented using either belief bases or belief sets [18]. Belief sets are closed under logical consequence, while belief bases can be arbitrary sets of expressions. Belief sets contain explicitly all the knowledge deducible from the KB (so they are often large and infinite sets), while belief bases are (usually) small and finite, because they contain only the beliefs that were explicitly acquired through an observation, rule, measurement, experiment etc [9]; the rest of our beliefs are simply deducible from such facts via the inference mechanism (Cn) of the underlying logic. The selected representation affects the belief change algorithms considered. In the belief set approach, all the information is stored explicitly, so there is no distinction between implicit and explicit knowledge. On the other hand, when changes are performed upon a belief base, we temporarily have to ignore the logical consequences of the base; in effect, there is a clear distinction between knowledge stored explicitly in the base (which can be changed directly) and knowledge stored implicitly (which cannot be changed, but is indirectly affected by the changes in the explicit knowledge). Thus, in the belief base approach, our options for the change are limited to the base itself, so the changes are (in general) more coarse-grained. On the other hand, belief sets are (usually) infinite structures, so, in practice, only a small subset of a belief set is explicitly stored; however, in contrast to the belief base approach, the implicit part of our knowledge is assumed to be of equal value to the explicitly stored one. Some thoughts on the connection between the two approaches appear in [4], [18]. A related philosophical consideration studies how our knowledge should be represented. There are two viewpoints here: foundational theories and coherence theories [19]. Under the foundational viewpoint, each piece of our knowledge serves as a justification for other beliefs; this viewpoint is closer to the belief base approach. On the other hand, under the coherence model, no justification is required for our beliefs; a belief is justified by how well it fits with the rest of our beliefs, in forming a coherent and mutually supporting set; thus, every piece of knowledge helps directly or indirectly to support any other. Coherence theories match with the use of belief sets as a proper knowledge representation format. The AGM theory is based on the coherence model (using belief sets); however, some authors have argued that the foundational model is more appropriate for belief change [4], [9]. It can be shown that the foundational model is incompatible with the
AGM postulates in classical logics, because there are no operators satisfying the foundational version of the AGM postulates in such logics [4]. However, this result leaves open the question of whether there are any (non-classical) logics which are compatible with a generalized version of the foundational AGM postulates. To address this question, we took a path similar to the one taken for the standard (coherence) case, by introducing a generalized foundational version of the AGM postulates (see Table 2). This set of postulates can be derived either by generalizing the foundational postulates that appeared in [4], or by reformulating the postulates that appeared in Table 1 so as to be applicable to the foundational case (the latter method is followed in Table 2). Note that, in order to avoid confusion with the coherence model, a different notation was used in Table 2 for the numbering of the foundational postulates.
Table 2. Generalized AGM Postulates (Coherence and Foundational Model)
(B1) Base Closure. Coherence model: K−X = Cn(K−X). Foundational model: K−X ⊆ L
(B2) Base Inclusion. Coherence model: K−X ⊆ Cn(K). Foundational model: K−X ⊆ K
(B3) Base Vacuity. Coherence model: If X ⊈ Cn(K), then K−X = Cn(K). Foundational model: If X ⊈ Cn(K), then K−X = K
(B4) Base Success. Coherence model: If X ⊈ Cn(∅), then X ⊈ Cn(K−X). Foundational model: If X ⊈ Cn(∅), then X ⊈ Cn(K−X)
(B5) Base Preservation. Coherence model: If Cn(X) = Cn(Y), then K−X = K−Y. Foundational model: If Cn(X) = Cn(Y), then K−X = K−Y
(B6) Base Recovery. Coherence model: K ⊆ Cn((K−X) ∪ X). Foundational model: K ⊆ Cn((K−X) ∪ X)
Notice that the main difference between the two sets of postulates is the fact that in belief set contraction the result should be a subset of the logical closure of the KB, while in belief base contraction the result should be a subset of the KB itself; moreover, the result of a contraction operation need not be a theory in the foundational case. These seemingly small differences have some severe effects on the operators considered. Take for example the operation {x∧y}−{x} in PL. Due to the base inclusion and base success postulates it should be the case that {x∧y}−{x} = ∅; but this violates the base recovery postulate, as can be easily verified. Thus, there can be no AGM-compliant base contraction operator that can handle this case. This simple example can be easily adapted to show that no classical logic can admit an operation satisfying the foundational version of the AGM postulates [4]. But what is the property that allows a logic to admit an AGM-compliant contraction operator for belief bases? The answer is simpler than one would expect. As already mentioned, the only additional requirement that is posed by our transition to the foundational model is the fact that the result should now be a subset of the belief base (instead of its logical closure, i.e., the belief set). Thus, the ideas that were used in the coherence case can be applied here as well, with the additional requirement that the result is a subset of the base. More formally:
Definition 5. Consider a logic ⟨L, Cn⟩ and a set K ⊆ L:
• The logic ⟨L, Cn⟩ is called AGM-compliant with respect to the basic postulates of belief base contraction (or simply base-AGM-compliant) iff there exists an operator that satisfies the basic foundational AGM postulates
• The set K is called base-decomposable iff X(K) ∩ P(K) ≠ ∅ for all X ⊆ L
• The logic ⟨L, Cn⟩ is called base-decomposable iff all K ⊆ L are base-decomposable
In Definition 5, P(K) stands for the powerset of K, i.e., the family of all subsets of K (also denoted by 2^K). Using this definition, we can show the counterparts of Theorems 1, 2 and 3 for the foundational case:
Theorem 7. In every logic there exists a contraction operator satisfying (B1)-(B5).
Theorem 8. A logic is base-AGM-compliant iff it is base-decomposable.
Theorem 9. Consider a base-decomposable logic ⟨L, Cn⟩ and an operator ‘−’. Then ‘−’ satisfies the basic foundational AGM postulates for contraction iff for all K, X ⊆ L for which Cn(X) ⊈ Cn(K) it holds that K−X = K, and for all K, X ⊆ L for which Cn(X) ⊆ Cn(K) it holds that K−X ∈ X(K) ∩ P(K).
As is obvious from Theorems 7, 8 and 9, there is an exceptional symmetry between the foundational and coherence versions of both the postulates and the concepts and results related to the postulates. This symmetry also extends to cuts:
Definition 6. Consider a logic ⟨L, Cn⟩, a set K ⊆ L and a family х of beliefs such that:
• For all Y ∈ х, Cn(Y) ⊂ Cn(K)
• For all Z ⊆ L such that Cn(Z) ⊂ Cn(K) and Z ⊂ K, there is a Y ∈ х such that Cn(Y) ⊆ Cn(Z) or Cn(Z) ⊆ Cn(Y)
Then х is called a base cut of K.
Theorem 10. Consider a logic ⟨L, Cn⟩ and a belief K ⊆ L. Then:
• K is base-decomposable iff Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅) for every base cut х of K
• The logic ⟨L, Cn⟩ is base-AGM-compliant iff for every K ⊆ L and every base cut х of K it holds that Cn(⋂_{Y∈х} Cn(Y)) = Cn(∅)
At this point it would be anticipated (and desirable) that the counterpart of a max-cut for the foundational case could be defined (i.e., base-max-cuts). Unfortunately, the different nature of the foundational case makes the definition of base-max-cuts problematic; for this reason, base-max-cuts are not studied in this paper. The theorems and definitions of this section show that base-AGM-compliance is a strictly stronger notion than AGM-compliance:
Theorem 11. If a logic is base-AGM-compliant, then it is AGM-compliant.
6. Discussion
The original AGM theory focused on a certain class of logics and determined the properties that a belief change operator should satisfy in order to behave rationally. Our approach could be considered as taking the opposite route: starting from the notion of rationality, as defined by the AGM postulates, we tried to determine the logics in which a rational (per AGM) contraction operator can be defined. Our results determine the necessary and sufficient conditions for a logic to admit a rational contraction operator under both the coherence and the foundational model. The relation between the various classes of logics studied in this paper is shown in Figure 1.
Figure 1. Topology of AGM-compliance: logics (Tarski's model), AGM-compliant logics, classical logics, and base-AGM-compliant logics.
In Section 3, we showed the intuitively expected result that all classical logics are AGM-compliant. A related question is whether there are any non-classical logics which are AGM-compliant (or base-AGM-compliant). The answer is positive; take L = {xi | i ∈ I} for any index set I with at least two indices and set Cn(∅) = ∅, Cn({xi}) = {xi} and Cn(X) = L for all X such that |X| ≥ 2. It can be easily shown that this construct is a base-AGM-compliant, non-classical logic (thus, it is AGM-compliant as well). The example just presented is artificial; in [15], [20], it is shown that there are well-known, interesting, non-classical logics, used in real world applications (such as certain Description Logics), which are AGM-compliant. In such logics, the original AGM theory is not applicable, but our generalized theory is. Unfortunately, for the foundational case the situation is not so nice: the following theorem (Theorem 12) shows that we should not expect to find any interesting base-AGM-compliant logic, while its corollary (Theorem 13) recasts the result of [4] in our terminology:
Theorem 12. Consider a logic ⟨L, Cn⟩. If there is a proposition x ∈ L and a set Y ⊆ L such that Cn(∅) ⊂ Cn(Y) ⊂ Cn({x}), then the logic is not base-AGM-compliant.
Theorem 13. No logic satisfying the AGM assumptions is base-AGM-compliant.
Our results could be further exploited by studying the connection of AGM-compliance with the concept of roots [14], [15], which is quite instructive on the intuition behind decomposability. Furthermore, an interesting connection of AGM-compliance with lattice theory [21] can be shown; AGM-compliance is a property that can be totally determined by the structure of the (unique, modulo equivalence) lattice that represents the logic under question [14], [15]. This is an important result for two reasons: first, lattice theory provides a nice visualization of the concepts and results presented in this paper; second, the various results of lattice theory that have been developed over the years can be used directly in our framework. The latter property may allow the development of deeper results regarding AGM-compliant logics. As already mentioned, AGM dealt with two more operations, namely revision and expansion. Expansion can be trivially defined in the standard framework [2] using the equation K+x = Cn(K ∪ {x}). This equation can be easily recast in our framework, by setting K+X = Cn(K ∪ X). For revision, however, things are not so straightforward. For reasons thoroughly explained in [15], the reformulation of the AGM postulates for revision in our more general context requires the definition of a negation-like operator applicable to all monotonic logics. The exact properties of such an operator are a subject of current investigation [22]. A related question is whether contraction and revision in this context are interdefinable in a manner similar to the one expressed via the Levi and Harper identities in the classical setting [1].
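The positive claim about the artificial construction above can be checked mechanically on a small instance. The sketch below (ours, with illustrative names) instantiates it with a two-element index set and verifies base-decomposability (Definition 5), which by Theorem 8 gives base-AGM-compliance.

```python
# Base-decomposability of the two-proposition instance of the construction above:
# L = {x1, x2}, Cn of the empty set is empty, Cn({xi}) = {xi}, Cn(X) = L if |X| >= 2.
from itertools import chain, combinations

L = frozenset({"x1", "x2"})
def cn(X):
    X = frozenset(X)
    return X if len(X) <= 1 else L

subsets = lambda s: [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def complement_beliefs(K, X):
    """The set X(K) of Definition 1."""
    if cn(frozenset()) < cn(X) <= cn(K):
        return [Z for Z in subsets(L) if cn(Z) < cn(K) and cn(X | Z) == cn(K)]
    return [Z for Z in subsets(L) if cn(Z) == cn(K)]

base_decomposable = all(
    any(Z <= K for Z in complement_beliefs(K, X))   # X(K) meets the powerset of K
    for K in subsets(L) for X in subsets(L)
)
print("base-decomposable (hence base-AGM-compliant):", base_decomposable)  # True
```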
Apart from the basic AGM postulates, AGM proposed two additional contraction postulates, the supplementary ones, for which a similar generalization can be done. Under certain conditions (satisfied by all finite logics), decomposability is a necessary and sufficient condition for the existence of a contraction operator satisfying both the basic and the supplementary postulates. However, it is an open problem whether this result extends to all logics; thus, the integration of the supplementary postulates in our framework is still incomplete. For more details on this issue see [15].
7. Towards a Universal AGM Theory
As already mentioned, the AGM postulates express common intuition regarding the operation of contraction and were accepted by most researchers. The only postulate that has been seriously debated is the postulate of recovery (K6) and its foundational counterpart (B6). Some works [4] state that (K6) is counter-intuitive, while others [7], [8] state that it forces a contraction operator to remove too little information from the KB. However, it is generally acceptable that (K6) (and (B6)) cannot be dropped unless replaced by some other constraint that would somehow express the Principle of Minimal Change [1]. Theorems 1 and 7 show that the recovery (and base recovery) postulate is also the reason that certain logics are non-AGM-compliant (and non-base-AGM-compliant). These facts motivate us to look for a replacement of this postulate (say (K6*) and (B6*)) that would properly capture the Principle of Minimal Change in addition to being as close as possible to the original (K6) and (B6). Let us first consider the coherence model. In order for such a postulate to be adequate for our purposes, it should satisfy the following important properties:
1. Existence: In every logic there should exist a contraction operator satisfying (K1)-(K5) and (K6*).
2. AGM Rationality: In AGM-compliant logics, the class of operators satisfying (K1)-(K5) and (K6*) should coincide with the class of operators satisfying (K1)-(K6).
It turns out that the following postulate comes very close to satisfying both goals:
(K6*) If Cn((K − X) ∪ X) ⊂ Cn(Y ∪ X) for some Y ⊆ Cn(K), then Cn(∅) ⊂ Cn(X) ⊆ Cn(Y)
The original recovery postulate guarantees that the contraction operator will remove as little information from K as possible, by requiring that the removed expressions are all “related” to X; thus, the union of (K − X) with X is equivalent to K. Notice that, in the principal case of contraction (where Cn(∅) ⊂ Cn(X) ⊆ Cn(K)), this result is maximal, due to the inclusion postulate. The idea behind (K6*) is to keep the requirement that Cn((K − X) ∪ X) should be maximal, while dropping the requirement that this “maximal” is necessarily equivalent to K. Indeed, according to (K6*), Cn((K − X) ∪ X) is maximal, because, if there is any Y that gives a “larger” union with X than (K − X), then this Y is necessarily a superset of X, so it cannot be considered as a result of (K − X), due to the success postulate. Let’s first verify that the second property (AGM rationality) is satisfied by (K6*). Firstly, if postulate (K6) is satisfied, then there is no Y satisfying the “if part” of (K6*), so (K6*) is satisfied as well. For the opposite, if the logic is AGM-compliant, then, by Theorem 3, for every pair K, X ⊆ L such that Cn(∅) ⊂ Cn(X) ⊆ Cn(K) there is some set Z ⊆ L such that Z ∈ X−(K). For this particular Z it holds that Cn(X) ⊈ Cn(Z) and Cn(Z ∪ X) = Cn(K), so Cn(Z ∪ X) is maximal among all Z ⊆ Cn(K). Therefore, (K6*)
will not be satisfied unless Cn((K − X) ∪ X) = Cn(K), i.e., (K6*) will be satisfied only for contraction operators for which (K6) is also satisfied. Once we deal with some technicalities and limit cases we can show the following:
Theorem 14. Consider an AGM-compliant logic and a contraction operator ‘−’. Then ‘−’ satisfies (K1)-(K6) iff it satisfies (K1)-(K5) and (K6*).
Unfortunately, the first of our desired properties, existence, cannot be guaranteed in general. However, finiteness, as well as decomposability, imply existence; this implication could have a “local” or a “global” character:
Theorem 15. Consider a logic.
• There is a contraction operator K − X which satisfies (K1)-(K5) and (K6*) whenever the belief set Cn(K) is decomposable or it contains a finite number of subsets, modulo logical equivalence.
• If the logic is decomposable, or it contains a finite number of beliefs, modulo logical equivalence, then there is a contraction operator satisfying (K1)-(K5) and (K6*).
As usual, similar results can be shown for the foundational case; the only change required is to restrict our search to all Y which are subsets of the belief base K, instead of its logical closure (Cn(K)). Moreover, notice that, in this case, the assumption of finiteness of the belief base is reasonable, so existence is (usually) achieved [22]:
(B6*) If Cn((K − X) ∪ X) ⊂ Cn(Y ∪ X) for some Y ⊆ K, then Cn(∅) ⊂ Cn(X) ⊆ Cn(Y)
Theorem 16. Consider a base-AGM-compliant logic and a contraction operator ‘−’. Then ‘−’ satisfies (B1)-(B6) iff it satisfies (B1)-(B5) and (B6*).
Theorem 17. Consider a logic.
• There is a contraction operator K − X which satisfies (B1)-(B5) and (B6*) whenever the belief base K is base-decomposable or it contains a finite number of subsets, modulo logical equivalence.
• If the logic is base-decomposable, or it contains a finite number of beliefs, modulo logical equivalence, then there is a contraction operator satisfying (B1)-(B5) and (B6*).
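For a finite logic, adherence to (K6*) can be checked by brute force. The sketch below is only an illustration under the assumption that the logic is given as a finite closure operator (here the artificial two-element logic from Section 6); it tests whether a candidate value for K − X satisfies (K6*) for a given K and X. The function and variable names are ours, not the paper's.

from itertools import chain, combinations

L = frozenset({"x1", "x2"})

def Cn(X):
    X = frozenset(X)
    return L if len(X) >= 2 else X      # the artificial logic of Section 6

def subsets(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def satisfies_K6_star(K, X, result):
    """(K6*): if Cn((K-X) u X) is a proper subset of Cn(Y u X) for some Y within Cn(K),
    then Cn({}) is a proper subset of Cn(X) and Cn(X) is a subset of Cn(Y)."""
    lhs = Cn(result | X)
    for Y in subsets(Cn(K)):
        if lhs < Cn(Y | X):                             # some Y gives a strictly larger union
            if not (Cn(frozenset()) < Cn(X) and Cn(X) <= Cn(Y)):
                return False
    return True

K, X = frozenset({"x1", "x2"}), frozenset({"x1"})
print(satisfies_K6_star(K, X, frozenset({"x2"})))   # True: Cn({x2} u {x1}) = L is already maximal
print(satisfies_K6_star(K, X, frozenset()))         # False: dropping x2 as well removes too much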
8. Conclusion and Future Work
We studied the application of the AGM theory in a general framework including all monotonic logics. We determined the limits of our generalization and showed that there are non-classical logics in which the AGM theory can be applied. We provided a new representation result for the AGM postulates, which has the advantage of being applicable to all monotonic logics, rather than just the classical ones, and studied the connection of the AGM theory with the foundational model. Finally, we proposed a weakening of the recovery postulate with several intuitively appealing properties. For more details refer to [14], [15]; for some applications of our work refer to [15], [20]. Future work includes the further refinement of (K6*), (B6*) so as to satisfy the existence property unconditionally, as well as the study of the operation of revision and the supplementary postulates. Finally, we plan to study the connection of other AGM-related results to AGM-compliance. Our ultimate objective is the development of a complete “AGM-like” theory suitable for all logics, including non-classical ones.
Acknowledgements This paper is a heavily revised and extended version of [23]. The authors would like to thank Zhisheng Huang, Jeff Pan and Holger Wache for permission to include yet unpublished joint work (to appear in [22]).
References
[1] P. Gärdenfors. Belief Revision: An Introduction. In P. Gärdenfors (ed.) Belief Revision, pp. 1-20. Cambridge University Press, 1992.
[2] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic 50:510-530, 1985.
[3] C. Alchourron, D. Makinson. On the Logic of Theory Change: Safe Contraction. Studia Logica 44:405-422, 1985.
[4] A. Fuhrmann. Theory Contraction through Base Contraction. Journal of Philosophical Logic 20:175-203, 1991.
[5] P. Gärdenfors, D. Makinson. Revisions of Knowledge Systems using Epistemic Entrenchment. In Proceedings of the 2nd Conference on Theoretical Aspects of Reasoning about Knowledge, 1988.
[6] A. Grove. Two Modellings for Theory Change. Journal of Philosophical Logic, 17:157-170, 1988.
[7] S.O. Hansson. Knowledge-level Analysis of Belief Base Operations. Artificial Intelligence 82:215-235, 1996.
[8] S.O. Hansson. A Textbook of Belief Dynamics. Kluwer Academic Publishers, 1999.
[9] B. Nebel. A Knowledge Level Analysis of Belief Revision. In Proceedings of the 1st International Conference on Principles of Knowledge Representation and Reasoning (KR'89), pp. 301-311, 1989.
[10] H.B. Enderton. A Mathematical Introduction to Logic. Academic Press, New York, 1972.
[11] J. Lloyd. Foundations of Logic Programming. Springer, 1987.
[12] S.N. Burris. Logic For Mathematics And Computer Science. Prentice Hall, New Jersey, 1998.
[13] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2002.
[14] G. Flouris, D. Plexousakis, G. Antoniou. AGM Postulates in Arbitrary Logics: Initial Results and Applications. Technical Report TR-336, ICS-FORTH, 2004.
[15] G. Flouris. On Belief Change and Ontology Evolution. Doctoral Dissertation, Department of Computer Science, University of Crete, February 2006.
[16] P. Haase, Y. Sure. D3.1.1.b State of the Art on Ontology Evolution. 2004. Available on the Web (last visited April, 2006): http://www.aifb.uni-karlsruhe.de/WBS/ysu/publications/SEKT-D3.1.1.b.pdf
[17] A. Fuhrmann, S.O. Hansson. A Survey of Multiple Contractions. Journal of Logic, Language and Information, 3:39-76, 1994.
[18] S.O. Hansson. Revision of Belief Sets and Belief Bases. In Handbook of Defeasible Reasoning and Uncertainty Management Systems, Vol. 3, pp. 17-75, 1998.
[19] P. Gärdenfors. The Dynamics of Belief Systems: Foundations Versus Coherence Theories. Revue Internationale de Philosophie, 44:24-46, 1992.
[20] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. In Proceedings of the 4th International Semantic Web Conference (ISWC-05), pp. 216-231, 2005.
[21] G. Grätzer. Lattice Theory: First Concepts and Distributive Lattices. W. H. Freeman & Co, 1971.
[22] G. Flouris, Z. Huang, J. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. To appear in Proceedings of the 21st National Conference on Artificial Intelligence, 2006.
[23] G. Flouris, D. Plexousakis, G. Antoniou. Generalizing the AGM Postulates: Preliminary Results and Applications. In Proceedings of the 10th International Workshop on Non-Monotonic Reasoning, 2004.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
The Two-Variable Situation Calculus Yilan GU a , Mikhail SOUTCHANSKI b,1 a Department of Computer Science, University of Toronto, email: [email protected] b Department of Computer Science, Ryerson University, email: [email protected] Abstract. We consider a modified version of the situation calculus built using a two-variable fragment of the first-order logic extended with counting quantifiers. We mention several additional groups of axioms that need to be introduced to capture taxonomic reasoning. We show that the regression operator in this framework can be defined similarly to regression in the Reiter’s version of the situation calculus. Using this new regression operator, we show that the projection problem (that is the main reasoning task in the situation calculus) is decidable in the modified version. We mention possible applications of this result to formalization of Semantic Web services. Keywords. situation calculus, description logic, decidable reasoning about actions
1. Introduction The Semantic Web community makes significant efforts toward integration of Semantic Web technology with the ongoing work on web services. These efforts include use of semantics in the discovery, composition, and other aspects of web services. Web service composition is related to the task of designing a suitable combination of available component services into a composite service to satisfy a client request when there is no single service that can satisfy this request [15]. This problem attracted significant attention of researchers both in academia and in industry. A major step in this direction is creation of ontologies for web services, in particular, OWL-S that models web services as atomic or complex actions with preconditions and effects. An emerging industry standard BPEL4WS (Business Process Execution Language for Web Services) provides the basis for manually specifying composite web services using a procedural language. However, in comparison to error-prone manual service compositions, (semi)automated service composition promises significant flexibility in dealing with available services and also accommodates naturally the dynamics and openness of service-oriented architectures. The problem of the automated composition of web services is often formulated in terms similar to a planning problem in AI: given a description of a client goal and a set of component services (that can be atomic or complex), find a composition of services that achieves the goal [19,20,25,23]. Despite that several approaches to solving this problem have already been proposed, many issues remain to be resolved, e.g., how to give welldefined and general characterizations of service compositions, how to compute all effects 1 Correspondence to: Mikhail Soutchanski, Department of Computer Science, Ryerson University, The Centre for Computing and Engineering, 245 Church Street, ENG281, Toronto, Ontario, M5B 2K3, Canada Tel.: +1 416 979 5000, ext 7954; Fax: +1 (416) 979-5064; E-mail: [email protected]
and side-effects on the world of every action included in composite service, and other issues. Other reasoning problems, well-known in AI, that can be relevant to service composition and discovery are executability and projection problems. Executability problem requires determining whether preconditions of all actions included in a composite service can be satisfied given incomplete information about the world. Projection problem requires determining whether a certain goal condition is satisfied after the execution of all component services given an incomplete information about the current state. In this paper we would like to concentrate on the last problem because it is an important prerequisite for planning and execution monitoring tasks, and for simplicity we start with sequential compositions of the atomic actions (services) only (we mention complex actions in the last section). More specifically, following several previous approaches [19,20,5,25,15], we choose the situation calculus as an expressive formal language for specification of actions. However, we acknowledge openness of the world and represent incomplete information about an initial state of the world by assuming that it is characterized by a predicate logic theory in the general syntactic form. The situation calculus is a popular and well understood predicate logic language for reasoning about actions and their effects [24]. It serves as a foundation for the Process Specification Language (PSL) that axiomatizes a set of primitives adequate for describing the fundamental concepts of manufacturing processes (PSL has been accepted as an international standard) [13,12]. It is used to provide a well-defined semantics for Web services and a foundation for a high-level programming language Golog [5,19,20]. However, because the situation calculus is formulated in a general predicate logic, reasoning about effects of sequences of actions is undecidable (unless some restrictions are imposed on the theory that axiomatizes the initial state of the world). The first motivation for our paper is intention to overcome this difficulty. We propose to use a two-variable fragment FO2 of the first-order logic (FOL) as a foundation for a modified situation calculus. Because the satisfiability problem in this fragment is known to be decidable (it is in NE XP T IME), we demonstrate that by reducing reasoning about effects of actions to reasoning in this fragment, one can guarantee decidability no matter what is the syntactic form of the theory representing the initial state of the world. The second motivation for our paper comes from description logics. Description Logics (DLs) [2] are a well-known family of knowledge representation formalisms, which play an important role in providing the formal foundations of several widely used Web ontology languages including OWL [14] in the area of the Semantic Web [3]. DLs may be viewed as syntactic fragments of FOL and offer considerable expressive power going far beyond propositional logic, while ensuring that reasoning is decidable [6]. DLs have been mostly used to describe static knowledge-base systems. Moreover, several research groups consider formalization of actions using DLs or extensions of DLs. Following the key idea of [8], that reasoning about complex actions can be carried in a fragment of the propositional situation calculus, De Giacomo et al. [9] give an epistemic extension of DLs to provide a framework for the representation of dynamic systems. 
However, the representation and reasoning about actions in this framework are strictly propositional, which reduces the representation power of this framework. In [4], Baader et al. provide another proposal for integrating description logics and action formalisms. They take as foundation the well known description logic ALCQIO (and its sub-languages) and show that the complexity of executability and projection problems coincides with the complexity of standard DL reasoning. However, actions (services) are represented in their paper meta-theoretically,
not as first-order (FO) terms. This can potentially lead to some complications when specifications of other reasoning tasks (e.g., planning) will be considered because it is not possible to quantify over actions in their framework. In our paper, we take a different approach and represent actions as FO terms, but achieve integration of taxonomic reasoning and reasoning about actions by restricting the syntax of the situation calculus. Our paper can be considered as a direct extension of the well-known result of Borgida [6] who proves that many expressive description logics can be translated to two-variable fragment FO2 of FOL. However, to the best of our knowledge, nobody proposed this extension before. The main contribution of our paper to the area of service composition and discovery is the following. We show that by using services that are composed from atomic services with no more than two parameters and by using only those properties of the world which have no more than two parameters (to express a goal condition), one can guarantee that the executability and projection problems for these services can always be solved even if information about the current state of the world is incomplete. Our paper is structured as follows. In Section 2, we briefly review the Reiter’s situation calculus. In Section 3 we review a few popular description logics. In the following section 4 we discuss details of our proposal: a modified situation calculus. In Section 5 we consider an extension of regression (the main reasoning mechanism in the situation calculus). Finally, in Section 6 we provide a simple example and in Section 7 we discuss briefly other related approaches to reasoning about actions. 2. The Situation Calculus The situation calculus (SC) Lsc is a FO language for axiomatizing dynamic systems. In recent years, it has been extended to include procedures, concurrency, time, stochastic actions, etc [24]. Nevertheless, all dialects of the SC Lsc include three disjoint sorts: actions, situations and objects. Actions are FO terms consisting of an action function symbol and its arguments. Actions change the world. Situations are FO terms which denote possible world histories. A distinguished constant S0 is used to denote the initial situation, and function do(a, s) denotes the situation that results from performing action a in situation s. Every situation corresponds uniquely to a sequence of actions. Moreover, notation s′ s means that either situation s′ is a subsequence of situation s or s = s′ .2 Objects are FO terms other than actions and situations that depend on the domain of application. Fluents are relations or functions whose values may vary from one situation to the next. Normally, a fluent is denoted by a predicate or function symbol whose last argument has the sort situation. For example, F (x, do([α1 , · · · , αn ], S0 ) represents a relational fluent in the situation do(αn , do(· · · , do(α1 , S0 ) · · · ) resulting from execution of ground action terms α1 , · · · , αn in S0 .3 The SC includes the distinguished predicate P oss(a, s) to characterize actions a that are possible to execute in s. For any SC formula φ and a term s of sort situation, we say φ is a formula uniform in s iff it does not mention the predicates P oss or ≺, it does 2 Reiter [24] uses the notation s′ ⊑ s, but we use s′ s to avoid confusion with the inclusion relation that is commonly used in description logic literature. In this paper, we use to denote the inclusion relation between concepts or roles. 
3 We do not consider functional fluents in this paper.
not quantify over variables of sort situation, it does not mention equality on situations, and whenever it mentions a term of sort situation in the situation argument position of a fluent, then that term is s (see [24]). If φ(s) is a uniform formula and the situation argument is clear from the context, sometimes we suppress the situation argument and write this formula simply as φ. Moreover, for any predicate with the situation argument, such as a fluent F or Poss, we introduce an operation of restoring a situation argument s back to the corresponding atomic formula without situation argument, i.e., F(x)[s] =def F(x, s) and Poss(A)[s] =def Poss(A, s) for any action term A and object vector x. By the recursive definition, such notation can easily be extended to φ[s] for any FO formula φ in which the situation arguments of all fluents and Poss predicates are left out, to represent the SC formula obtained by restoring situation s back to all the fluents and/or Poss predicates (if any) in φ. It is obvious that φ[s] is uniform in s.
A basic action theory (BAT) D in the SC is a set of axioms written in Lsc with the following five classes of axioms to model actions and their effects [24].
Action precondition axioms Dap: For each action function A(x), there is an axiom of the form Poss(A(x), s) ≡ ΠA(x, s). ΠA(x, s) is a formula uniform in s with free variables among x and s, which characterizes the preconditions of action A.
Successor state axioms Dss: For each relational fluent F(x, s), there is an axiom of the form F(x, do(a, s)) ≡ ΦF(x, a, s), where ΦF(x, a, s) is a formula uniform in s with free variables among x, a and s. The successor state axiom (SSA) for F(x) completely characterizes the value of F(x) in the next situation do(a, s) in terms of the current situation s. The syntactic form of ΦF(x, a, s) is as follows:

F(x, do(a, s)) ≡ [⋁ i=1..m (∃yi)(a = PosActi(ti) ∧ φ+i(x, yi, s))] ∨ F(x, s) ∧ ¬[⋁ j=1..k (∃zj)(a = NegActj(t′j) ∧ φ−j(x, zj, s))],

where for i = 1..m (j = 1..k, respectively), each ti (t′j, respectively) is a vector of terms including variables among x and quantified new variables yi (zj, respectively) if there are any, each φ+i(x, yi, s) (φ−j(x, zj, s), respectively) is an SC formula uniform in s that has free variables among x and yi (zj, respectively) if there are any, and each PosActi(ti) (NegActj(t′j), respectively) is an action term that makes F(x, do(a, s)) true (false, respectively) if the condition φ+i(x, yi, s) (φ−j(x, zj, s), respectively) is satisfied.
Initial theory DS0: It is a set of FO formulas whose only situation term is S0. It specifies the values of all fluents in the initial state. It also describes all the facts that are not changeable by any actions in the domain.
Unique name axioms for actions Duna: Includes axioms specifying that two actions are different if their names are different, and that identical actions have identical arguments.
Fundamental axioms for situations Σ: The axioms for situations which characterize the basic properties of situations. These axioms are domain independent. They are included in the axiomatization of any dynamic system in the SC (see [24] for details).
Suppose that D = Duna ∪ DS0 ∪ Dap ∪ Dss ∪ Σ is a BAT, α1, · · · , αn is a sequence of ground action terms, and G(s) is a uniform formula with one free variable s. One of the most important reasoning tasks in the SC is the projection problem, that is, to determine whether D |= G(do([α1, · · · , αn], S0)). Another basic reasoning task is the executability problem. Let executable(do([α1, · · · , αn], S0)) be an abbreviation of the formula Poss(α1, S0) ∧ ⋀ i=2..n Poss(αi, do([α1, · · · , αi−1], S0)). Then, the
executability problem is to determine whether D |= executable(do([α1, · · · , αn], S0)). Planning and high-level program execution are two important settings where the executability and projection problems arise naturally. Regression is a central computational mechanism that forms the basis for automated solution of the executability and projection tasks in the SC [22,24]. A recursive definition of the regression operator R on any regressable formula φ is given in [24]; we use the notation R[φ] to denote the formula that results from eliminating Poss atoms in favor of their definitions as given by action precondition axioms and replacing fluent atoms about do(α, s) by logically equivalent expressions about s as given by SSAs. A formula W of Lsc is regressable iff (1) every term of sort situation in W is starting from S0 and has the syntactic form do([α1, · · · , αn], S0) where each αi is of sort action; (2) for every atom of the form Poss(α, σ) in W, α has the syntactic form A(t1, · · · , tn) for some n-ary function symbol A of Lsc; and (3) W does not quantify over situations, and does not mention the relation symbols “≺” or “=” between terms of situation sort. The formula G(do([α1, · · · , αn], S0)) is a particularly simple example of a regressable formula because it is uniform in do([α1, · · · , αn], S0), but in the general case, regressable formulas can mention several different ground situation terms. Roughly speaking, the regression of a regressable formula φ through an action a is a formula φ′ that holds prior to a being performed iff φ holds after a. Both precondition axioms and SSAs support regression in a natural way and are no longer needed when regression terminates. The regression theorem proved in [22] shows that one can reduce the evaluation of a regressable formula W to a FO theorem proving task in the initial theory together with unique names axioms for actions: D |= W iff DS0 ∪ Duna |= R[W]. This fact is the key result for our paper. It demonstrates that an executability or a projection task can be reduced to a theorem proving task that does not use the precondition, successor state, or foundational axioms. This is one of the reasons why the SC provides a natural and easy way to represent and reason about dynamic systems. However, because DS0 is an arbitrary FO theory, this type of reasoning in the SC is undecidable. One of the common ways to overcome this difficulty is to introduce the closed world assumption, which amounts to assuming that DS0 is a relational theory (i.e., it has no occurrences of formulas having the syntactic form F1(x1, S0) ∨ F2(x2, S0) or ∃x F(x, S0), etc.) and that all statements that are not explicitly known to be true are assumed to be false. However, in many application domains this assumption is unrealistic. Therefore, we consider a version of the SC formulated in FO2, a syntactic fragment of FO logic that is known to be decidable, or in C2, an extension of FO2 (see below), where the satisfiability problem is still decidable. If all SC formulas are written in this syntactically restricted language, it is guaranteed by the regression theorem that both the executability and the projection problems for ground situations are decidable.
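As an operational illustration of R[·] on ground regressable formulas, here is a small sketch that rewrites fluent atoms down to S0 using SSAs stored in the positive/negative normal form shown above. The formula encoding, the fluent enrolled, the actions enroll and drop, and their effect conditions are hypothetical choices made for this example, not definitions from the paper.

# A minimal regression sketch (an illustration, not the paper's implementation).
# Formulas: ("F", name, args, sit), ("and", f, g), ("or", f, g), ("not", f), ("true",).
# Situations: "S0" or ("do", (action_name, action_args), sit).
# SSAs: fluent -> {"pos": [(action, phi)], "neg": [(action, phi)]}, where
# phi(fluent_args, action_args, sit) returns a formula uniform in sit, or None.

SSA = {
    "enrolled": {
        "pos": [("enroll", lambda fa, aa, s: ("true",) if fa == aa else None)],
        "neg": [("drop",   lambda fa, aa, s: ("true",) if fa == aa else None)],
    },
}

def big_or(fs):
    if not fs:
        return ("not", ("true",))          # empty disjunction is false
    out = fs[0]
    for f in fs[1:]:
        out = ("or", out, f)
    return out

def regress(f):
    """Rewrite a ground regressable formula into one that mentions S0 only."""
    tag = f[0]
    if tag in ("and", "or"):
        return (tag, regress(f[1]), regress(f[2]))
    if tag == "not":
        return ("not", regress(f[1]))
    if tag == "true":
        return f
    _, name, args, sit = f                 # a fluent atom
    if sit == "S0":
        return f
    _, (act, aargs), prev = sit            # sit = do(act(aargs), prev)
    pos = [phi(args, aargs, prev) for a, phi in SSA[name]["pos"]
           if a == act and phi(args, aargs, prev) is not None]
    neg = [phi(args, aargs, prev) for a, phi in SSA[name]["neg"]
           if a == act and phi(args, aargs, prev) is not None]
    persists = ("and", ("F", name, args, prev), ("not", big_or(neg)))
    return regress(big_or(pos + [persists]))

s1 = ("do", ("enroll", ("PSN1", "CS1")), "S0")
print(regress(("F", "enrolled", ("PSN1", "CS1"), s1)))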
3. Description Logics and Two-variable First-order Logics
In this section we review a few popular expressive description logics and related fragments of FO logic. We start with the logic ALCHQI. Let NC = {C1, C2, . . .} be a set of atomic concept names and NR = {R1, R2, . . .} be a set of atomic role names. An ALCHQI role is either some R ∈ NR or an inverse role R− for R ∈ NR. An ALCHQI role hierarchy (RBox) RH is a finite set of role inclusion axioms R1 ⊑ R2, where R1, R2
are ALCHQI roles. For R ∈ NR, we define Inv(R) = R− and Inv(R−) = R, and assume that R1 ⊑ R2 ∈ RH implies Inv(R1) ⊑ Inv(R2) ∈ RH. The set of ALCHQI concepts is the minimal set built inductively from NC and ALCHQI roles using the following rules: all A ∈ NC are concepts, and, if C, C1, and C2 are ALCHQI concepts, R is a simple role and n ∈ N, then also ¬C, C1 ⊓ C2, and (∃≥n R.C) are ALCHQI concepts. We also use the following abbreviations for concepts:
C1 ⊔ C2 =def ¬(¬C1 ⊓ ¬C2)
C1 ⊃ C2 =def ¬C1 ⊔ C2
∃R.C =def (∃≥1 R.C)
(∃≤n R.C) =def ¬(∃≥(n+1) R.C)
(∃=n R.C) =def (∃≥n R.C) ⊓ (∃≤n R.C)
⊤ =def A ⊔ ¬A for some A ∈ NC
⊥ =def A ⊓ ¬A for some A ∈ NC
∀R.C =def ¬∃R.¬C
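These concept constructors have a direct reading in the two-variable counting fragment C2, in the spirit of the translation result of Borgida [6] mentioned earlier. The sketch below is our illustration only: it covers a few constructors, ignores role hierarchies and inverse roles, and turns concepts into C2 formula strings over the two variables x and y.

def tr(concept, var="x"):
    """Standard translation of a concept into a C2 formula string (a sketch)."""
    other = "y" if var == "x" else "x"
    tag = concept[0]
    if tag == "atom":                      # atomic concept A
        return f"{concept[1]}({var})"
    if tag == "not":
        return f"~({tr(concept[1], var)})"
    if tag == "and":
        return f"({tr(concept[1], var)} & {tr(concept[2], var)})"
    if tag == "atleast":                   # (>= n R.C)
        n, role, c = concept[1], concept[2], concept[3]
        return (f"exists>={n} {other}.({role}({var},{other})"
                f" & {tr(c, other)})")
    raise ValueError(tag)

# (>= 4 enrolled.Course), read at x:
print(tr(("atleast", 4, "enrolled", ("atom", "Course"))))
# exists>=4 y.(enrolled(x,y) & Course(y))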
eligFull(x, s) ≡ (∃y)(tuitPaid(x, y, s) ∧ y > 5000),
eligPart(x, s) ≡ (∃y)(tuitPaid(x, y, s) ∧ y ≤ 5000),
qualFull(x, s) ≡ eligFull(x, s) ∧ (∃≥4 y)enrolled(x, y, s),
qualPart(x, s) ≡ eligPart(x, s) ∧ (∃≥2 y)enrolled(x, y, s) ∧ (∃≤3 y)enrolled(x, y, s).
An example of the initial theory DS0 could be the conjunction of the following sentences:
person(PSN1), person(PSN2), · · · , person(PSNm),
preReq(CS1, CS4) ∨ preReq(CS3, CS4),
(∀x) incoming(x, S0) ⊃ x = PSN2 ∨ x = PSN3,
(∀x, y) ¬enrolled(x, y, S0),
(∀x) x ≠ CS4 ⊃ ¬(∃y).preReq(y, x),
(∀x) ¬student(x, S0).
Suppose we denote the above basic action theory as D. Given goal G, for example ∃x.qualF ull(x), and a composite web service starting from the initial situation, for example do([admit(P SN1 ), payT uit(P SN1 , 6000)], S0 ) (we denote the corresponding resulting situation as Sr ), we can check if the goal is satisfied after the execution of this composite web service by solving the projection problem whether D |= G[Sr ]. In our example, this corresponds to solving whether D |= ∃x.qualF ull(x, Sr ). We may also check if a given (ground) composite web service A1 ; A2 ; · · · ; An is possible to execute starting from the initial state by solving the executability problem whether D |= executable(do([A1 , A2 , · · · , An ], S0 )). For example, we can check if the
composite web service admit(PSN1); payTuit(PSN1, 6000) can be executed from the starting state by solving whether D |= executable(Sr). Finally, we give an example of regression of a regressable formula. For instance,
R[(∃x).qualFull(x, do([admit(PSN1), payTuit(PSN1, 6000)], S0))]
= R[(∃x).eligFull(x, do([admit(PSN1), payTuit(PSN1, 6000)], S0)) ∧ (∃≥4 y)enrolled(x, y, do([admit(PSN1), payTuit(PSN1, 6000)], S0))]
= (∃x).R[eligFull(x, do([admit(PSN1), payTuit(PSN1, 6000)], S0))] ∧ (∃≥4 y)R[enrolled(x, y, do([admit(PSN1), payTuit(PSN1, 6000)], S0))]
= · · ·
= (∃x).((∃y)R[tuitPaid(x, y, do([admit(PSN1), payTuit(PSN1, 6000)], S0)) ∧ y > 5000]) ∧ (∃≥4 y)enrolled(x, y, S0)
= · · ·
= (∃x).((∃y)((tuitPaid(x, y, S0) ∧ y > 5000) ∨ (x = PSN1 ∧ y = 6000 ∧ y > 5000))) ∧ (∃≥4 y)enrolled(x, y, S0),
which is false given the above initial theory. We may also introduce some RBox axioms as follows: gradeA ⊑ hadGrade, gradeB ⊑ hadGrade, gradeC ⊑ hadGrade, gradeD ⊑ hadGrade. The RBox axioms are not used in the regression steps of reasoning about executability problems and projection problems. However, they are useful for terminological reasoning when necessary.
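To see why the regressed formula is false, one can evaluate it in a model of the initial theory. The sketch below is an illustration only: it picks one particular finite model of DS0 (assuming, as DS0 permits, that tuitPaid and enrolled are empty at S0) and checks the final formula of the regression above.

# Evaluate the regressed formula in one hand-picked finite model of D_S0
# (an illustrative assumption, not an entailment check over all models).
persons = ["PSN1", "PSN2", "PSN3"]
amounts = [0, 5000, 6000]
courses = ["CS1", "CS2", "CS3", "CS4"]

tuitPaid_S0 = set()      # no tuition facts are asserted in D_S0
enrolled_S0 = set()      # (forall x, y) ~enrolled(x, y, S0)

def regressed_formula_holds():
    for x in persons:
        elig = any((x, y) in tuitPaid_S0 and y > 5000 for y in amounts) \
               or (x == "PSN1" and 6000 > 5000)
        enrolled_count = sum(1 for y in courses if (x, y) in enrolled_S0)
        if elig and enrolled_count >= 4:
            return True
    return False

print(regressed_formula_holds())   # False: nobody is enrolled in four courses at S0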
7. Discussion and Future Work The major consequence of the results proved above for the problem of service composition is the following. If both atomic services and properties of the world that can be affected by these services have no more than two parameters, then we are guaranteed that even in the state of incomplete information about the world, one can always determine whether a sequentially composed service is executable and whether this composite service will achieve a desired effect. The previously proposed approaches made different assumptions: [19] assumes that the complete information is available about the world when effects of a composite service are computed, and [5] considers the propositional fragment of the SC. As we mentioned in Introduction, [19,20] propose to use Golog for composition of Semantic Web services. Because our primitive actions correspond to elementary services, it is desirable to define Golog in our modified SC too. It is surprisingly straightforward to define almost all Golog operators starting from our C 2 based SC. The only restriction in comparison with the original Golog [17,24] is that we cannot define the operator (πx)δ(x), non-deterministic choice of an action argument, because LDL sc regressable formulas cannot have occurrences of non-ground action terms in situation terms. In the original Golog this is allowed, because the regression operator is defined for a larger class of regressable formulas. However, everything else from the original Golog specifications remain in force, no modifications are required. In addition to providing a welldefined semantics for Web services, our approach also guarantees that evaluation of tests in Golog programs is decidable (with respect to arbitrary theory DS0 ) that is missing in other approaches (unless one can make the closed world assumption or impose another restriction to regain decidability).
The most important direction for future research is an efficient implementation of a decision procedure for solving the executability and projection problems. This procedure should handle the modified LDL sc regression and do efficient reasoning in DS0 . It should be straightforward to modify existing implementations of the regression operator for our purposes, but it is less obvious which reasoner will work efficiently on practical problems. There are several different directions that we are going to explore. First, according to [6] and Theorem 2, there exists an efficient algorithm for translating C 2 formulas to ALCQI(⊔, ⊓, ¬, |, id) formulas. Consequently, we can use any resolution-based description logic reasoners that can handle ALCQI(⊔, ⊓, ¬, |, id) (e.g., MSPASS [16]). Alternatively, we can try to use appropriately adapted tableaux-based description logic reasoners, such as FaCT++, for (un)satisfiability checking in ALCQI(⊔, ⊓, ¬, |, id). Second, we can try to avoid any translation from C 2 to ALCQI(⊔, ⊓, ¬, |, id) and adapt resolution based automated theorem provers for our purposes [7]. The recent paper by Baader et al [4] proposes integration of description logics ALCQIO (and its sub-languages) with an action formalism for reasoning about Web services. This paper starts with a description logic and then defines services (actions) metatheoretically: an atomic service is defined as the triple of sets of description logic formulas. To solve the executability and projection problems this paper introduces an approach similar to regression, and reduces this problem to description logic reasoning. The main aim is to show how executability of sequences of actions and solution of the executability and projection problems can be computed, and how complexity of these problems depend on the chosen description logic. In the full version of [4], there is a detailed embedding of the proposed framework into the syntactic fragment of the Reiter’s SC. It is shown that solutions of their executability and projection problems correspond to solutions of these problems with respect to the Reiter’s basic action theories in this fragment for appropriately translated formulas (see Theorem 12 in Section 2.4). To achieve this correspondence, one needs to eliminate TBox by unfolding (this operation can result potentially in exponential blow-up of the theory). Despite that our paper and [4] have common goals, our developments start differently and proceed in the different directions. We start from the syntactically restricted FO language (that is significantly more expressive than ALCQIO), use it to construct the modified SC (where actions are terms), define basic action theories in this language and show that by augmenting (appropriately modified) regression with lazy unfolding one can reduce the executability and projection problems to the satisfiability problem in C 2 that is decidable. Furthermore, C 2 formulas can be translated to ALCQI(⊔, ⊓, ¬, |, id), if desired. Because our regression operator unfolds fluents “on demand” and uses only relevant part of the (potentially huge) TBox , we avoid potential computational problems that may occur if the TBox were eliminated in advance. The advantage of [4] is that all reasoning is reduced to reasoning in description logics (and, consequently, can be efficiently implemented especially for less expressive fragments of ALCQIO). Our advantages are two-fold: the convenience of representing ac2 tions as terms, and the expressive power of LDL sc . 
Because C2 and ALCQI(⊔, ⊓, ¬, |, id) are equally expressive, there are some (situation suppressed) formulas in our SC that cannot be expressed in ALCQIO (that does not allow complex roles). An interesting paper [18] aims to achieve computational tractability of solving projection and progression problems by following an alternative direction to the approach chosen here. The theory of the initial state is assumed to be in the so-called proper form and the query used in the projection problem is expected to be in a certain normal form.
In addition, [18] considers a general SC and impose no restriction on arity of fluents. Because of these significant differences in our approaches, it is not possible to compare them. There are several other proposals to capture the dynamics of the world in the framework of description logics and/or its slight extensions. Instead of dealing with actions and the changes caused by actions, some of the approaches turned to extensions of description logic with temporal logics to capture the changes of the world over time [1,2], and some others combined planning techniques with description logics to reason about tasks, plans and goals and exploit descriptions of actions, plans, and goals during plan generation, plan recognition, or plan evaluation [10]. Both [1] and [10] review several other related papers. In [5], Berardi et al. specify all the actions of e-services as constants, all the fluents of the system have only situation argument, and translate the basic action theory under such assumption into description logic framework. It has a limited expressive power without using arguments of objects for actions and/or fluents: this may cause a blow-up of the knowledge base.
Acknowledgments Thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC) and to the Department of Computer Science of the University of Toronto for providing partial financial support for this research.
References
[1] Alessandro Artale and Enrico Franconi. A survey of temporal extensions of description logics. Annals of Mathematics and Artificial Intelligence, 30(1-4), 2001.
[2] Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
[3] Franz Baader, Ian Horrocks, and Ulrike Sattler. Description logics as ontology languages for the semantic web. In Dieter Hutter and Werner Stephan, editors, Mechanizing Mathematical Reasoning, Essays in Honor of Jörg H. Siekmann on the Occasion of His 60th Birthday, Lecture Notes in Computer Science, vol. 2605, pages 228–248. Springer, 2005.
[4] Franz Baader, Carsten Lutz, Maja Miličić, Ulrike Sattler, and Frank Wolter. Integrating description logics and action formalisms: First results. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 572–577, Pittsburgh, PA, USA, July 2005. Extended version available as LTCS-Report-05-02 from http://lat.inf.tu-dresden.de/research/reports.html.
[5] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella. e-service composition by description logics based reasoning. In Diego Calvanese, Giuseppe de Giacomo, and Enrico Franconi, editors, Proceedings of the 2003 International Workshop in Description Logics (DL-2003), Rome, Italy, 2003.
[6] Alexander Borgida. On the relative expressiveness of description logics and predicate logics. Artificial Intelligence, 82(1-2):353–367, 1996.
[7] Hans de Nivelle and Ian Pratt-Hartmann. A resolution-based decision procedure for the two-variable fragment with equality. In R. Goré, A. Leitsch, and T. Nipkow, editors, IJCAR'01: Proceedings of the First International Joint Conference on Automated Reasoning, pages 211–225, London, UK, 2001. Springer-Verlag, LNAI, V. 2083.
[8] Giuseppe De Giacomo. Decidability of Class-Based Knowledge Representation Formalisms. Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Roma, Italy, 1995.
[9] Giuseppe De Giacomo, Luca Iocchi, Daniele Nardi, and Riccardo Rosati. A theory and implementation of cognitive mobile robots. Journal of Logic and Computation, 9(5):759–785, 1999.
[10] Yolanda Gil. Description logics and planning. AI Magazine, 26(2):73–84, 2005.
[11] Erich Grädel, Martin Otto, and Eric Rosen. Two-variable logic with counting is decidable. In Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science (LICS'97), pages 306–317, Warsaw, Poland, 1997.
[12] Michael Grüninger. Ontology of the process specification language. In Steffen Staab and Rudi Studer, editors, Handbook on Ontologies, pages 575–592. Springer, 2004.
[13] Michael Grüninger and Christopher Menzel. The process specification language (PSL): Theory and applications. AI Magazine, 24(3):63–74, 2003.
[14] Ian Horrocks, Peter Patel-Schneider, and Frank van Harmelen. From SHIQ and RDF to OWL: The making of a web ontology language. Journal of Web Semantics, 1(1):7–26, 2003.
[15] Richard Hull and Jianwen Su. Tools for composite web services: a short overview. SIGMOD Record, 34(2):86–95, 2005.
[16] Ullrich Hustadt and Renate A. Schmidt. Issues of decidability for description logics in the framework of resolution. In R. Caferra and G. Salzer, editors, Automated Deduction, pages 191–205. Springer-Verlag, LNAI, V. 1761, 2000.
[17] Hector Levesque, Ray Reiter, Yves Lespérance, Fangzhen Lin, and Richard Scherl. GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming, 31:59–84, 1997.
[18] Yongmei Liu and Hector J. Levesque. Tractable reasoning with incomplete first-order knowledge in dynamic systems with context-dependent actions. In Proc. IJCAI-05, Edinburgh, Scotland, August 2005.
[19] Sheila McIlraith and Tran Son. Adapting Golog for composition of semantic web services. In D. Fensel, F. Giunchiglia, D. McGuinness, and M.-A. Williams, editors, Proceedings of the Eighth International Conference on Knowledge Representation and Reasoning (KR2002), pages 482–493, Toulouse, France, April 22-25, 2002. Morgan Kaufmann.
[20] Srini Narayanan and Sheila McIlraith. Analysis and simulation of web services. Computer Networks, 42:675–693, 2003.
[21] Leszek Pacholski, Wiesław Szwast, and Lidia Tendera. Complexity of two-variable logic with counting. In Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science (LICS-97), pages 318–327, Warsaw, Poland, 1997. Journal version: SIAM Journal on Computing, 29(4):1083–1117, 1999.
[22] Fiora Pirri and Ray Reiter. Some contributions to the metatheory of the situation calculus. Journal of the ACM, 46(3):325–364, 1999.
[23] Marco Pistore, AnnaPaola Marconi, Piergiorgio Bertoli, and Paolo Traverso. Automated composition of web services by planning at the knowledge level. In Leslie Pack Kaelbling and Alessandro Saffiotti, editors, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), pages 1252–1259, Edinburgh, Scotland, UK, July 30-August 5, 2005. http://ijcai.org/papers/1428.pdf.
[24] Raymond Reiter. Knowledge in Action: Logical Foundations for Describing and Implementing Dynamical Systems. The MIT Press, Cambridge, 2001.
[25] Evren Sirin, Bijan Parsia, Dan Wu, James Hendler, and Dana Nau. HTN planning for web service composition using SHOP2. Journal of Web Semantics, 1(4):377–396, October 2004.
STAIRS 2006 L. Penserini et al. (Eds.) IOS Press, 2006 © 2006 The authors. All rights reserved.
Base Belief Change and Optimized Recovery1 Frances Johnson 2 and Stuart C. Shapiro University at Buffalo, CSE Dept., Buffalo, NY, USA flj|[email protected] Abstract. Optimized Recovery (OR) adds belief base optimization to the traditional Recovery postulate—improving Recovery adherence without sacrificing adherence to the more accepted postulates or to the foundations approach. Reconsideration and belief liberation systems both optimize a knowledge base through consolidation of a chain of base beliefs; and recovered base beliefs are returned to the base. The effects match an iterated revision axiom and show benefits for pre-orders, as well. Any system that can resolve an inconsistent belief base can produce these results. Keywords. Base belief revision, knowledge base optimization, reconsideration, recovery, truth maintenance system (TMS)
1. Introduction 1.1. Motivation This paper shows how existing belief change operations can be used to optimize a belief base and improve the return of previously retracted base beliefs to the base—offering the best aspects of the Recovery postulate while retaining a true foundations approach as is required for implemented systems. Any agent reasoning from a set of beliefs must be able to perform basic belief change operations, including expansion, contraction and consolidation. Briefly, expansion is adding a belief to a set without concern for any inconsistencies it might raise; contraction of a set by a belief results in a set that does not entail (cannot derive) that belief — it is the removal or retraction3 of that belief; and consolidation of a finite set of beliefs produces a consistent subset of the original set. These are discussed in more detail in Section 1.3. The Recovery postulate for belief theory contraction [1] states that a logically closed belief theory K is contained in the belief theory that results from contraction of K by a belief p followed by union with {p} and deductive closure. One feature of Recovery is 1 This
paper is a revised version of a 2005 IJCAI workshop paper [13]. to: Frances Johnson, University at Buffalo, Department of Computer Science and Engineering, 201 Bell Hall, Buffalo, NY 14260-2000, USA. Tel.: +1 716-998-8394; Fax: +1 716-645-3464; E-mail: fl[email protected] 3 The term retraction is also used in the literature to define a specific subclass of contraction. In this paper, we use the term retraction as a synonym for removal. 2 Correspondence
that any beliefs lost, due to contraction by p, should become reasserted when p is returned and closure is performed; it is this feature of recovery that is a key element of our paper. Although a belief base can be infinite and deductively closed (theoretically), we are focusing on the base of an implemented system which consists solely of input to the system (e.g., observations, sensor readings, rules). This follows the foundations approach (see discussion in [7] and [10]), where base beliefs have independent standing and are treated differently from inferred beliefs, and results in a finite base that is not deductively closed.4 We refer to the closure of a belief base as its belief space. The well-accepted Success postulate [10] requires that contraction of a belief base B by some belief p is successful when p is absent from the resulting base and its logical closure (unless p ∈ Cn(∅)). Recovery does not hold in general for belief base contraction, because additional base beliefs removed during contraction by some belief p might not return (or be derivable) when p is returned to the base. This is because the base is not deductively closed prior to the contraction. There are base contraction operations that do satisfy Recovery by inserting “Recovery-enhancing" beliefs into the base during contraction. These inserted beliefs, however, do not come from an input source, so this technique deviates from the foundations approach as we are applying it for an implemented system. Adding new base beliefs during contraction also violates the Inclusion postulate [1,10], which states that the result of contracting a belief theory/base should be a subset (⊆) of that theory/base. This is discussed further in Section 5.2. The research defining belief liberation [2] and reconsideration [11,12] supports the concept that removing a belief from a base might allow some previously removed beliefs to return.5 Both liberation and reconsideration research discuss a re-optimization of the current belief base when the sequence of all base beliefs has been altered. It is this re-optimization that is instrumental in defining Optimized Recovery and in providing the recovery feature that has been missing in base belief change. Our discussion of various recovery formulations is accompanied by a table that illustrates all cases where these formulations do (or do not) hold for base belief change.
1.2. Notation and Terminology
For this paper, we use a propositional language, L, which is closed under the truth functional operators ¬, ∨, ∧, →, and ↔. Atoms of the language L are denoted by lowercase letters (p, q, r, . . .). Sets and sequences are denoted by uppercase letters (A, B, C, . . .). If set A derives p, it is denoted as A ⊢ p. The operation Cn is defined by Cn(A) = {p | A ⊢ p}, and Cn(A) is called the closure of A. A set of beliefs S is consistent iff S ⊬ ⊥, where ⊥ denotes logical contradiction. A belief theory, K, is a logically closed set of beliefs (i.e. K = Cn(K)) [1]. We will use B for a belief base and K for a belief theory. Note that we use the term set to refer to any set of beliefs—whether finite or infinite, deductively closed or not. 4 Note
that we do not consider the rare case of a finite base that is considered “closed" if it contains at least one (but not all) of each logically equivalent belief in its deductive closure. This is rare and unlikely in a real-world implementation of any appreciable size. 5 This is very different from the recovery of retracted beliefs during either saturated kernel contractions [8] or the second part of Hybrid Adjustment [16]
Given a belief base, B, the set of p-kernels of B is the set {A | A ⊆ B, A ⊢ p and (∀A′ ⊊ A) A′ ⊬ p} [8]. Truth maintenance systems (TMSs) [6] retain the information about how a belief is derived, distinguishing between base and derived beliefs. An assumption-based truth maintenance system (ATMS) [5] stores the minimal set of base beliefs underlying a derivation. A nogood in the ATMS literature is an inconsistent set of base beliefs. We will define a ⊥-kernel (falsum-kernel) as a minimally inconsistent nogood: a set S s.t. S ⊢ ⊥, but for all S′ ⊊ S, S′ ⊬ ⊥.
1.3. Background
This section briefly reviews the traditional belief change operations of expansion and contraction of a logically closed belief theory K [1] and expansion, kernel contraction and kernel consolidation of a finite belief base B [8,10].
1.3.1. Expansion
K + p (the expansion of the belief theory K by the belief p) is defined as Cn(K ∪ {p}). B + p (the expansion of the belief base B by the belief p) is defined as B ∪ {p}.
1.3.2. Kernel Contraction
The contraction of a base B [or theory K] by a belief p is written as B ∼ p [K ∼ p]. For this paper, B ∼ p is the kernel contraction [8] of the belief base B by p (retraction of p from B) and, although constrained by several postulates, is basically the base resulting from the removal of at least one element from each p-kernel in B — unless p ∈ Cn(∅), in which case B ∼ p = B. A decision function determines which beliefs should be removed during kernel contraction.6 Although minimal damage to a knowledge base is a desirable feature of a decision function, it often comes with increased computational cost; when choosing a decision function for an implemented system, the tradeoff between minimizing damage and minimizing complexity must be considered. Given a belief base B, if the belief theory K is the belief space for B (K = Cn(B)), then the contraction of this belief space by p through the use of kernel contraction is defined as K ∼ p =def Cn(B ∼ p).
1.3.3. Kernel Consolidation
Consolidation (the removal of any inconsistency) is defined for belief bases only. Any inconsistent belief theory is the set of all beliefs (due to deductive closure), so operations on belief theories focus on preventing inconsistencies, as opposed to resolving them. B! (the kernel consolidation of B) is the removal of at least one element from each ⊥-kernel in B s.t. B! ⊆ B and B! ⊬ ⊥. This means that B! =def B ∼ ⊥. 6 An example of six different decision functions can be seen in the six different adjustment strategies used by SATEN [16].
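Since p-kernels and ⊥-kernels are just the minimal subsets of the base that derive p (respectively ⊥), they can be enumerated directly for small bases. The sketch below does this by brute force for the base {s, d, s→q} that also appears in Example 1 below; the propositional encoding (beliefs as truth-functions) is an illustrative choice of ours, not the paper's representation.

from itertools import combinations, product

ATOMS = ("s", "d", "q")
BASE = [("s", lambda v: v["s"]),
        ("d", lambda v: v["d"]),
        ("s->q", lambda v: (not v["s"]) or v["q"])]

def derives(subset, goal):
    """subset |- goal, checked over every truth assignment of ATOMS."""
    for vals in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, vals))
        if all(f(v) for _, f in subset) and not goal(v):
            return False
    return True

def kernels(base, goal):
    """Minimal subsets of base deriving goal; goal = falsum gives the bottom-kernels."""
    found = []
    for size in range(len(base) + 1):
        for subset in combinations(base, size):
            if derives(list(subset), goal) and \
               not any(set(k) <= set(subset) for k in found):
                found.append(subset)
    return [[name for name, _ in k] for k in found]

s_or_d = lambda v: v["s"] or v["d"]
falsum = lambda v: False
print(kernels(BASE, s_or_d))   # [['s'], ['d']]  -- the (s or d)-kernels
print(kernels(BASE, falsum))   # []              -- the base is consistent: no bottom-kernels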
1.4. Recovery Recovery does not hold for kernel contraction when elements of a p-kernel in B are retracted during the retraction of p, but are not returned as a result of the expansion by p followed by deductive closure. Not only do these base beliefs remain retracted, but derived beliefs that depend on them are also not recovered. Example 1 Given the base B = {s, d, s → q}, B ∼ s ∨ d = {s → q}, and (B ∼ s ∨ d) + s ∨ d = {s ∨ d, s → q}. Not only do we not recover s or d as individual beliefs, but the derived belief q is also not recovered. We feel the assertion of s ∨ d means that its earlier retraction was, in hindsight, not valid for this current state, so all effects of that retraction should be undone. There are various criticisms of Recovery in the literature (see [10] and [15] for discussions and further references); their argument is that Recovery is not as essential an axiom for contraction as the other axioms, which we do not dispute. We do, however, prefer to adhere to Recovery whenever possible, predicated on the fact that recovered beliefs were at one time in the base as base beliefs. The recovery of those previously retracted base beliefs should occur whenever the reason that caused them to be removed is, itself, removed (or invalidated). In such a case, the previously retracted beliefs should be returned to the base, because they were base beliefs and the reason for disbelieving them no longer exists.
2. Reconsideration
2.1. Assuming a Linear Preference Ordering
In defining reconsideration, we make the assumption that there is a linear preference ordering (⪰) over all base beliefs [11,12]. See [14] (also in these proceedings) for a discussion of reconsideration on non-linear pre-orders. Although the beliefs may be ordered by recency, we assume a different ordering may be used. Thus, any base can be represented as a unique sequence of beliefs in order of descending preference: B = ⟨p1, p2, . . . , pn⟩, where pi ≻ pi+1, 1 ≤ i < n. Note: pi ≻ pj means that pi is strictly preferred over pj (is stronger than pj) and is true iff pi ⪰ pj and pj ⋡ pi.
2.2. The Knowledge State for Reconsideration
The knowledge state used to formalize reconsideration [11,12] is a tuple with three elements. Starting with B0 = ∅, Bn is the belief base that results from a series of expansion and consolidation operations on B0 (and the subsequent resulting bases: B1, B2, B3, . . .),7 and B∪ = ⋃0≤i≤n Bi. Xn is the set of base beliefs removed (and currently dis-believed: Bn ∩ Xn = ∅) from these bases during the course of the series of operations: Xn =def B∪ \ Bn. 7 Adding beliefs to a finite base by way of expansion followed by consolidation is a form of non-prioritized belief change called semi-revision [9].
The knowledge state is a triple of the form ⟨B, B∪, ⪰⟩, where ⪰ is the linear ordering of B∪, X = B∪ \ B and Cn(⟨B, B∪, ⪰⟩) = Cn(B). All triples are assumed to be in this form. A numerical value for credibility of a base is calculated from the preference ordering of B∪ = ⟨p1, . . . , pn⟩: Cred(B, B∪, ⪰) = Σ_{pi ∈ B} 2^(n−i) (the bit vector indicating the elements in B) when B ⊬ ⊥. Otherwise, when B ⊢ ⊥, Cred(B, B∪, ⪰) = −1. A linear ordering over bases (subsets of B∪) is also defined: B ⪰B∪ B′ if and only if Cred(B, B∪, ⪰) ≥ Cred(B′, B∪, ⪰).
2.3. Optimal Base
Given a possibly inconsistent set of base beliefs, B∪ = ⟨p1, p2, ..., pn⟩, ordered by ⪰, the base B is considered optimal w.r.t. B∪ and ⪰ if and only if B ⊆ B∪ and (∀B′ ⊆ B∪): B ⪰B∪ B′. This favors retaining a single strong belief over multiple weaker beliefs. As in [11,12], an operation of contraction or consolidation produces the new base B′ by using a global decision function that maximizes Cred(B′, B∪, ⪰) w.r.t. the operation being performed. Note: maximizing Cred(B′, B∪, ⪰) without concern for any specific operation would result in B′ = B∪!.
Observation 1 The consolidation of a base B is the optimal subset of that particular base (w.r.t. B∪ and ⪰): B! ⊆ B and (∀B′ ⊆ B): B! ⪰B∪ B′.
2.4. Operations on a Knowledge State
The following are operations on the knowledge state ⟨B, B∪, ⪰⟩.
Expansion of ⟨B, B∪, ⪰⟩ by p and its preference information, ⪰p, is: ⟨B, B∪, ⪰⟩ + ⟨p, ⪰p⟩ =def ⟨B + p, B∪ + p, ⪰1⟩, where ⪰1 is adjusted to incorporate the preference information ⪰p — which positions p relative to other beliefs in B∪, while leaving the relative order of other beliefs in B∪ unchanged. The resulting ordering is the transitive closure of these relative orderings.8
Contraction of ⟨B, B∪, ⪰⟩ by p is: ⟨B, B∪, ⪰⟩ ∼ p =def ⟨B ∼ p, B∪, ⪰⟩.
Reconsideration of ⟨B, B∪, ⪰⟩ [11,12] is: ⟨B, B∪, ⪰⟩!∪ =def ⟨B∪!, B∪, ⪰⟩.
Theorem 1 [11,12] The base resulting from reconsideration is optimal w.r.t. B∪ and ⪰. Proved using Obs. 1.
Observation 2 (Reworded from [12]) Given any knowledge state for B∪ and ⪰, reconsideration on that state produces the optimal knowledge state: (∀B ⊆ B∪): ⟨B, B∪, ⪰⟩!∪ = ⟨Bopt, B∪, ⪰⟩, where Bopt is the optimal base w.r.t. B∪ and ⪰ (because Bopt = B∪!).
Optimized-addition to ⟨B, B∪, ⪰⟩ (of the pair ⟨p, ⪰p⟩) [11] is: ⟨B, B∪, ⪰⟩ +∪! ⟨p, ⪰p⟩ =def (⟨B, B∪, ⪰⟩ + ⟨p, ⪰p⟩)!∪. 8 We
8 We assume that if p ∈ B∪, the location of p in the sequence might change — i.e. its old ordering information is removed before adding p and performing closure — but all other beliefs remain in their same relative order.
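To make the credibility measure and the consolidation operation concrete, the following is a minimal sketch and not the authors' implementation. It assumes beliefs are propositional literals so that a trivial consistency test suffices; a real system would call a reasoner or TMS instead. For a linear order, the greedy pass from strongest to weakest belief returns the Cred-maximal subset, matching Observation 1; the brute-force search only serves to check that on toy inputs.

```python
from itertools import combinations

# Illustration only: beliefs are literals ("p", "~p"); a set is inconsistent
# iff it contains a complementary pair.  A real system would use a prover here.
def consistent(beliefs):
    return not any(("~" + b) in beliefs for b in beliefs if not b.startswith("~"))

def cred(base, ordering):
    """Cred(B, B-union, >=): bit-vector value over the preference sequence
    (strongest first); -1 if B is inconsistent."""
    n = len(ordering)
    if not consistent(set(base)):
        return -1
    return sum(2 ** (n - i) for i, p in enumerate(ordering, start=1) if p in base)

def consolidate(ordering):
    """B-union!: keep each belief, strongest first, iff it stays consistent
    with what has already been kept.  For a linear order this maximizes cred()."""
    kept = []
    for p in ordering:
        if consistent(set(kept) | {p}):
            kept.append(p)
    return set(kept)

def optimal_base(ordering):
    """Brute-force reference for the 'optimal base' definition above."""
    best, best_cred = set(), -1
    for k in range(len(ordering) + 1):
        for subset in combinations(ordering, k):
            c = cred(subset, ordering)
            if c > best_cred:
                best, best_cred = set(subset), c
    return best

if __name__ == "__main__":
    b_union = ["p", "~q", "q", "~p"]   # descending preference
    assert consolidate(b_union) == optimal_base(b_union) == {"p", "~q"}
```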
If B∪ and ⪰ are known, a shorthand expression is used: (1) B +∪! ⟨p, ⪰p⟩ to stand for ⟨B, B∪, ⪰⟩ +∪! ⟨p, ⪰p⟩; and (2) B∪ +∪! ⟨p, ⪰p⟩ to stand for ⟨B′, B∪, ⪰⟩ +∪! ⟨p, ⪰p⟩ for any B′ ⊆ B∪. If the effect of adjusting the ordering by ⪰p is also known, then ⟨p, ⪰p⟩ can be reduced to p.

Observation 3 Optimized-addition does not guarantee that the belief added will be in the optimized base — it might get removed during reconsideration.

Example 2 Let B∪ = p, p→q, ¬q, p→r, ¬r, m→r, m. And assume that B = B∪! = p, p→q, p→r, m→r, m. ⟨B, B∪, ⪰⟩ +∪! ⟨¬p, ⪰¬p⟩ = ¬p, p→q, ¬q, p→r, ¬r, m→r, assuming that ⪰¬p indicates ¬p ≻ p. Notice the return of ¬q and ¬r to the base due to the removal of p, and the simultaneous removal of m to avoid a contradiction with ¬r and m→r. If, on the other hand, ⪰¬p indicated p ≻ ¬p, then the base would have remained unchanged.
3. Belief Liberation

3.1. Basic Notation

In this section, we summarize σ-liberation [2] and compare it to reconsideration. Like reconsideration, liberation assumes a linear sequence of beliefs σ = p1, . . . , pn. The sequence is ordered by recency, where p1 is the most recent information9 the agent has received (and has highest preference), and the set [[σ]] is the set of all the sentences appearing in σ. Since the ordering in this sequence is based on recency, for the remainder of this section, all comparisons between features of liberation and those of reconsideration are predicated on the assumption that both of their sequences are ordered by recency.10

3.2. A Belief Sequence Relative to K

In [2] the ordering of σ is used to form the maximal consistent subset of [[σ]] iteratively by defining the following: (1) B0(σ) = ∅. (2) For each i = 0, 1, . . . , n − 1: if Bi(σ) + p(i+1) ⊬ ⊥, then B(i+1)(σ) = Bi(σ) + p(i+1), otherwise B(i+1)(σ) = Bi(σ). That is, each belief — from most recent to least — is added to the base only if it does not raise an inconsistency.

Definition 1 [2] Let K be a belief theory and σ = p1, . . . , pn a belief sequence. We say σ is a belief sequence relative to K iff K = Cn(Bn(σ)).

3.3. Removing a Belief q from K

In [2] the operation of removing the belief q is defined using the following: (1) B0(σ, q) = ∅. (2) For each i = 0, 1, . . . , n − 1: if Bi(σ, q) + p(i+1) ⊬ q, then B(i+1)(σ, q) = Bi(σ, q) + p(i+1), otherwise B(i+1)(σ, q) = Bi(σ, q).

9 We have reversed the ordering from that presented in [2] to avoid superficial differences with the ordering for reconsideration. We have adjusted the definitions accordingly.
10 We discuss the effects of a recency-independent ordering in Section 5.2.
Note that “Bn(σ) = Bn(σ, ⊥) and Bn(σ, q) is the set-inclusion maximal amongst the subsets of [[σ]] that do not imply q."[2] Given a belief sequence σ relative to K, σ is used to define an operation ∼σ for K such that K ∼σ q represents the result of removing q from K [2]: K ∼σ q = Cn(Bn(σ, q)) if q ∉ Cn(∅); otherwise K ∼σ q = K.

Definition 2 [2] Let K be a belief theory and ∼ be an operator for K. Then ∼ is a σ-liberation operator (for K) iff ∼ = ∼σ for some belief sequence σ relative to K.
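A minimal sketch of the two iterative constructions above, under the simplifying assumption that beliefs are propositional literals; toy_consistent and toy_entails are stand-ins for a real entailment test, and the function names are ours, not from [2].

```python
def toy_consistent(beliefs):
    # Toy test for literal beliefs: inconsistent iff both x and ~x are present.
    return not any(("~" + b) in beliefs for b in beliefs if not b.startswith("~"))

def toy_entails(beliefs, q):
    # Toy test: S |- q iff q is a member of S or S is inconsistent.
    return (q in beliefs) or (not toy_consistent(beliefs))

def B_sigma(sigma, consistent=toy_consistent):
    """B_n(sigma): add each belief, most recent first, iff consistency is kept."""
    base = set()
    for p in sigma:
        if consistent(base | {p}):
            base.add(p)
    return base

def B_sigma_q(sigma, q, entails=toy_entails):
    """B_n(sigma, q): add each belief, most recent first, iff q is still not entailed."""
    base = set()
    for p in sigma:
        if not entails(base | {p}, q):
            base.add(p)
    return base

if __name__ == "__main__":
    sigma = ["r", "~r", "p", "~p"]        # most recent (strongest) first
    print(B_sigma(sigma))                 # == {'r', 'p'}
    print(B_sigma_q(sigma, "p"))          # == {'r', '~p'}: removing p liberates ~p
```

Note how the second call already shows the liberation effect discussed in Section 4.3: the previously excluded belief ~p re-enters the base once p is blocked.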
4. Comparing Reconsideration and Liberation

4.1. The Sequence σ is Used for Defining Liberation

The research in belief liberation focuses on defining liberation operators for some belief theory K relative to some arbitrary σ. The focus is on K and how it changes when a contraction is performed — whether there is any σ that indicates that a given contraction operation is an operation of σ-liberation. Liberation research does not advocate maintaining any one, specific σ. It is clearly stated that σ-liberation does not adhere to Recovery, but if a belief base is maintained as a recency-ordered sequence σ, then liberation terminology can be directly related to that of reconsideration and re-optimization.

4.2. Similarities

Assume B∪ = [[σ]] and ⪰ is ordered by recency. We refer to the belief theory associated with σ as Kσ. Bn(σ) is the maximal consistent subset of [[σ]] — i.e. Bn(σ) = [[σ]]! = B∪!. Similarly, Bn(σ, p) is the kernel contraction of [[σ]] by p. In other words, Bn(σ, p) = B∪ ∼ p.11 Thus, K ∼σ p = Cn(B∪ ∼ p). If B = B∪! = Bn(σ), then we can define σB to be a recency ordering of just the beliefs in Bn(σ), and Kσ = KσB. Now we can define contraction of an optimal knowledge state in terms of contraction for σ-liberation: B ∼ p = (Kσ ∼σB p) ∩ B and Cn(⟨B, B∪, ⪰⟩ ∼ p) = Kσ ∼σB p.
Let us define σ-addition (adding a belief to σ) as follows: σ + p is adding the belief p to the sequence σ = p1, . . . , pn to produce the new sequence σ1 = p, p1, . . . , pn.12 If σ is the sequence for B∪, then the optimized addition of p to any knowledge state for B∪ results in a base equivalent to the base for p added to σ: given B∪ +∪! p = ⟨B′, B∪ + p, ⪰′⟩, then B′ = Bn+1(σ + p).13 Likewise, σ-addition followed by recalculation of the belief theory is equivalent to optimized-addition followed by closure: Kσ+p = Cn(B∪ +∪! p).

11 Note: specifically not Bn(σ, p) = B∪! ∼ p.
12 This is also the technique described in [3].
13 This notation for the base associated with a σ-addition is not inconsistent with the notation in [2] for the base associated with a σ-liberation operation. Addition changes the sequence, so we are determining the base for the new sequence (σ + p): B(σ + p). The operation of σ-liberation changes the base used to determine the belief theory (from B(σ) to B(σ, p)), but the sequence σ remains unchanged.
Columns: (TR) K ⊆ Cn((B ∼ p) + p), ordered by recency; (LR) K ⊆ Cn((K ∼σ p) + p), ordered by recency; (OR) K ⊆ Cn((B ∼ p) +∪! p), but also B ⊆ (B ∼ p) +∪! p — split into (OR-i) ordered by recency, and (OR-ii) p ∈ B∪ and ⪰1 = ⪰.

Case                         | (TR)              | (LR)                       | (OR-i)       | (OR-ii)
1. p ∈ Cn(B); P = {{p}}      | YES, optimal      | YES, possibly inconsistent | YES, optimal | YES, optimal
2. p ∈ Cn(B); P \ {{p}} ≠ ∅  | NO, consistent    | NO, possibly inconsistent  | YES, optimal | YES, optimal
3. p ∉ Cn(B); B + p ⊬ ⊥      | YES, optimal      | YES, optimal               | YES, optimal | NA
4. p ∉ Cn(B); B + p ⊢ ⊥      | YES, inconsistent | YES, inconsistent          | NO, optimal  | YES, optimal

Table 1. This table indicates whether each of three recovery formulations (TR, LR and OR) always holds in each of four different cases (comprising all possible states of belief). K = Cn(B) and p = ⟨p, ⪰p⟩. See the text for a detailed description. If contraction is used for consistency maintenance only, a column for adherence to either B ⊆ (B +∪! ¬p) +∪! p (ordered by recency) or Kσ ⊆ K(σ+¬p)+p would match (OR-i).
4.3. Cascading Belief Status Effects

It is important to realize that there is a potential cascade of belief status changes (both liberations and retractions) as the belief theory resulting from a σ-liberation operation of retracting a belief p is determined; and these changes cannot be anticipated by looking at only the ⊥-kernels and the kernels for p. This is illustrated in the example below.

Example 3 Let σ = p→q, p, ¬p∧¬q, r→p∨q, r, ¬r. Then, B6(σ) = {p→q, p, r→p∨q, r}. Note that r ∈ Kσ and ¬r ∉ Kσ. K ∼σ p = Cn({p→q, ¬p∧¬q, r→p∨q, ¬r}). Even though r is not in a p-kernel in [[σ]], r ∉ K ∼σ p. Likewise, ¬r is liberated even though ∄N s.t. N is a ⊥-kernel in [[σ]] and {¬r, p} ⊆ N. Optimized-addition has a similar effect. If B∪ = [[σ]], and B = B∪! = B6(σ), then ⟨B, B∪, ⪰⟩ +∪! ⟨¬p, ⪰¬p⟩, where ⪰¬p indicates ¬p ≻ p, would result in the base B1 = {¬p, p→q, ¬p∧¬q, r→p∨q, ¬r}.
5. Improving Recovery for Belief Bases

5.1. Comparing Recovery-like Formulations

Let B = ⟨B, B∪, ⪰⟩, s.t. B∪ = [[σ]], B = B∪! = Bn(σ) and K = Kσ = Cn(B); let P = the set of p-kernels in B; p = ⟨p, ⪰p⟩; B1 = ⟨B1, B1∪, ⪰1⟩ = (B ∼ p) +∪! p; and X1 = B1∪ \ B1. The first element in any knowledge state triple is recognized as the currently believed base of that triple (e.g. B in B), and is the default set for any shorthand set notation formula using that triple (e.g. A ⊆ B, for the knowledge state B, means A ⊆ B for its base B).
Table 1 shows the cases where different recovery formulations hold — and where they do not hold. There is a column for each formulation and a row for each case. The traditional Recovery postulate for bases (Cn(B) ⊆ Cn((B ∼ p) + p)) is shown in column (TR). In column (LR), the recovery postulate for σ-liberation retraction followed by expansion (Liberation-recovery, LR) is: K ⊆ Cn((K ∼σ p) + p). In column (OR), the recovery-like formulation for kernel contraction followed by optimized-addition is: K ⊆ Cn((B ∼ p) +∪! p) (called Optimized-recovery, OR).
OR can also be written as B ⊆ ((B ∼ p) +∪! p), which is more strict than Recovery: base beliefs are actually recovered in the base itself, not just in its closure. Essentially, OR is an axiom about contraction followed by optimized-addition — as opposed to the regular Recovery axiom, which describes the results of contraction followed by expansion. In either case, recovering retracted beliefs is a desirable feature of contraction followed by either expansion or optimized-addition.
For column (OR-i), we assume that the ordering for B∪ and B1∪ is recency. For column (OR-ii), we assume that the ordering is not recency-based, p ∈ B∪ (not applicable for Case 3), and optimized-addition returns p to the sequence in its original place (i.e. ⪰1 = ⪰). Note that (OR) is not a true Recovery axiom for some contraction operation, because it can be rewritten as K ⊆ Cn(((B ∼ p) + p)!∪), where the re-optimizing operation of reconsideration is performed after the expansion but before the closure to form the new belief space.
YES means the formulation always holds for that given case; NO means it does not always hold; NA means the given case is not possible for that column's conditions. The second entry indicates whether the base/theory is optimal w.r.t. B1∪ (= B∪ + p = σ + p) and its linear order. If not optimal, then a designation for consistency is indicated. Recall that optimality requires consistency.

Theorem 2 Expansion of an optimal knowledge state by a belief that is consistent with the base (and is not being relocated to a lower position in the ordering) results in a new and optimal knowledge state: Given B = ⟨B, B∪, ⪰⟩, where B = B∪! and X = B∪ \ B, then (∀p s.t. B + p ⊬ ⊥): B + ⟨p, ⪰p⟩ = (⟨B + p, B∪ + p, ⪰′⟩)!∪ = ⟨B + p, B∪ + p, ⪰′⟩. (Provided: if p = pi ∈ B∪ and p = pj ∈ (B∪ + p), then j ≤ i; otherwise the reconsideration step of +∪! might remove p.)
Proof: B = B∪!. (∀x ∈ X): B + x ⊢ ⊥ and (∄B′ ⊆ B) s.t. both (B \ B′) + x ⊬ ⊥ and (∀b ∈ B′) x ≻ b. Since B + p ⊬ ⊥, then ∀B′′ ⊆ (B∪ + p): (B + p) ⪰B∪+p B′′.

Case 1 In this simple case, {p} is the sole p-kernel in B. For all formulations, p is removed then returned to the base, therefore all formulations hold.

Case 2 Since there are p-kernels in B that consist of beliefs other than p, some base beliefs other than p must be retracted during contraction by p. Returning these removed base beliefs is the recovery feature that is the central focus of this paper. For (TR), if B = {p ∧ q}, then B ∼ p = ∅ and (B ∼ p) + p = {p}. Therefore, K ⊈ Cn((B ∼ p) + p), and (TR) does not hold. For (LR), if σ = p ∧ q, then K ∼σ p = Cn(∅) and (K ∼σ p) + p = Cn({p}). So, (LR) also does not hold. For (OR), since p ∈ Cn(B), then B + p ⊬ ⊥. Thus B1 = B + p (from Theorem 2), so B ⊆ B1, and (OR) holds.

Case 3 Since p ∉ Cn(B) and B + p ⊬ ⊥, we know p ∉ B∪ — otherwise, (B + p) ≻B∪ B and B ≠ B∪! as it was defined. Column (OR-ii) has NA (for “Not Applicable") as its entry, because (OR-ii) assumes that p ∈ B∪. For the other columns, B ∼ p = B, K ∼σ p = K = Cn(B), and B ∼ p = B. Clearly, (TR) holds and (LR) holds. (OR-i) also holds (Theorem 2).
Case 4 Because p ∉ Cn(B), B ∼ p = B and K ∼σ p = K = Cn(B). Since B + p ⊢ ⊥, both (TR) and (LR) produce inconsistent spaces, so they both hold. For (OR), B ∼ p = B. For (OR-i), the optimized-addition puts p at the most preferred end of the new sequence (most recent), so p ∈ B1, forcing weaker elements of B to be retracted for consistency maintenance during reconsideration (recall B + p ⊢ ⊥).
Therefore (OR-i) does not hold.14 For (OR-ii), optimized-addition returns p to the same place in the sequence that it held in B∪ (recall B1∪ = B∪ and ⪰1 = ⪰). Therefore, B = B1 and (OR-ii) holds.

5.2. Discussion

When comparing the traditional base recovery adherence (in column TR) to optimized recovery adherence (shown in the OR columns), the latter results in improved adherence, because:
1. if ordering by recency (OR-i), B is recovered in all cases where p ∈ Cn(B);
2. any beliefs removed due to contraction by p are returned (OR,1; OR,2);
3. if expansion by p would make the final base inconsistent (TR,4), B is not recovered if recency ordered, but the final base is consistent and optimal (OR-i,4);
4. when the retraction of p is truly “undone" (column (OR-ii)), B is recovered in all applicable cases.
Reconsideration eliminates the results of any preceding contraction, because B∪ is unaffected by contraction: (B ∼ p)!∪ = B!∪. Likewise, optimized-addition also eliminates the results of any preceding contraction: ∀q: (B ∼ q) +∪! p = B +∪! p. If we consider contraction for consistency-maintenance only (assuming ordering by recency), the recovery-like formulation B ⊆ (B +∪! ¬p) +∪! p would have column entries identical to those in the column under (OR-i). Likewise, the entries in a column for Kσ ⊆ K(σ+¬p)+p would also be identical to the entries for column (OR-i). These results show adherence to (R3) in [4]: if ¬p ∉ K, then K ⊆ (K ∗ ¬p) ∗ p, where ∗ is prioritized revision (consistent addition of a belief requiring the belief to be in the resulting belief theory) [1]. In the case where the ordering is not by recency, R3 still holds provided (1) p ≻ ¬p in the final ordering and (2) if p was in the original ordering, it is not weaker in the new ordering.
We also note that the improved recovery compliance that reconsideration provides does not involve the addition of new beliefs to the belief base during contraction. Belief base contraction can adhere to Recovery if the contraction operation to remove p also inserts p → q into the base, for every belief q that is removed during that retraction of p. However, this deviates from our assumption of a foundations approach, where the base beliefs represent the base input information from which the system or agent should reason. Not only would this technique insert unfounded base beliefs,15 but the recovery of previously removed beliefs would only show up in the belief space; whereas reconsideration actually returns the removed beliefs to the belief base.
An additional benefit is that the belief removed (whether through contraction or revision by a contradicting belief) need not be reasserted in its original syntactic form. Any logically equivalent assertion will have the same effect: provided the newly asserted belief survives re-optimization, it and all those beliefs just retracted will be returned to the base. In fact, any belief that is inconsistent with the removed belief's negation will have this same effect (assuming it is consistent with the starting base).

14 Producing an optimal base is preferred to adhering to a recovery-like formulation by having an inconsistent base.
15 The new beliefs are not from some input source, but derived from the contraction operation. This violates the foundations approach as well as the Inclusion postulate (as discussed in Section 1.1).
Example 4 Given a knowledge state triple with B∪ = B = {s, d, s → q} and B ∼ (s ∨ d) = {s → q}, as described in Example 1, (B ∼ (s ∨ d)) +∪! (s ∨ d) = (B +∪! ¬(s ∨ d)) +∪! (s ∨ d) = {s ∨ d, s, d, s → q}, and q is derivable. Similarly, if the belief that is asserted last (and strongest) is merely inconsistent with ¬(s ∨ d), the recovery of retracted beliefs is performed just the same: (B +∪! ¬(s ∨ d)) +∪! (p ∧ (¬s → (d ∧ m))) results in a final base B2 = {p ∧ (¬s → (d ∧ m)), s, d, s → q}, and q is derivable (as is m).
If the linear ordering is not based on recency and ⪰1 ≠ ⪰, then there are cases where Optimized-recovery does not hold even though the resulting base will still be optimal — those cases where p does not survive the optimization process. For Case 1, if p is re-inserted into the ordering at a weaker spot, it might be retracted during reconsideration if it is re-asserted in a position that is weaker than the conflicting elements of one of its pre-existing ⊥-kernels and the decision function favors retracting p. This could also happen in Case 2, unless the elements of some p-kernel are all high enough in the order to force the retraction of the beliefs conflicting with p. In Case 3 all recovery formulations always hold. In Case 4, if p is inserted into the final ordering at a strong enough position, it could survive the reconsideration step of optimized-addition — in which case, (OR) would not hold. These exceptions are typical of any re-ordering of beliefs.
The benefits of reconsideration are not limited to linear orderings. A discussion of reconsideration on pre-orders is offered in [11] and [14] along with a table showing reconsideration using the six adjustment strategies implemented in SATEN [16], where five bases are improved — three to optimal, showing full recovery and adhering to (R3).16
Assuming an implemented TMS system retains its ⊥-kernels, reconsideration (and its recovery-like benefits) can be implemented using an efficient, anytime algorithm called dependency-directed reconsideration (DDR) [11,12]. Examining a small subset of B∪ in a series of steps, the process can be suspended whenever reasoning, acting or belief change need to be performed. The system performs these operations on the most credible base it has at that time. DDR can be re-called later to continue its optimization, which will be adjusted to take the interleaved operations into account.
6. Conclusions and Future Work

Optimized Recovery (OR) adds belief base optimization to the traditional Recovery postulate, allowing a system to recover the p-kernels in the base when re-asserting p, without sacrificing adherence to the other, more accepted postulates (such as Success and Inclusion) or to the foundations approach. Reconsideration (determining the base for a belief sequence) optimizes a base through consolidation of a chain of base beliefs. The effects match the iterated revision axiom (R3) and show benefits for total pre-orders as well. Any system that implements consolidation can produce these results. The anytime algorithm for DDR can be implemented in a TMS. Future work includes exploring how this research relates to other iterated belief change axioms and improving the current implementation of reconsideration in an existing ATMS so that it can handle non-linear orderings.

16 Consolidation (!) is called theory extraction in [16]. We assume a ≻ ¬a. SATEN website: http://magic.it.uts.edu.au/systems/saten.html
Acknowledgments The authors are grateful for the support, insights and feedback of William J. Rapaport, Carl Alphonce, Ken Regan, David R. Pierce, Jan Chomicki, Samir Chopra, Thomas Meyer, and the SNePS Research Group. Special thanks go out to Sven Ove Hansson, the outside reader for Fran’s dissertation [11].
References
[1] C. E. Alchourrón, P. Gärdenfors, and D. Makinson. On the logic of theory change: Partial meet contraction and revision functions. The Journal of Symbolic Logic, 50(2):510–530, June 1985.
[2] R. Booth, S. Chopra, A. Ghose, and T. Meyer. Belief liberation (and retraction). Studia Logica, 79(1):47–72, 2005.
[3] S. Chopra, K. Georgatos, and R. Parikh. Relevance sensitive non-monotonic inference on belief sequences. Journal of Applied Non-Classical Logics, 11(1-2):131–150, 2001.
[4] S. Chopra, A. Ghose, and T. Meyer. Iterated revision and recovery: a unified treatment via epistemic states. In F. van Harmelen, editor, ECAI 2002: 15th European Conference on Artificial Intelligence, number 77 in Frontiers in Artificial Intelligence and Applications, pages 541–545, Amsterdam, The Netherlands, 2002. IOS Press.
[5] J. de Kleer. An assumption-based truth maintenance system. Artificial Intelligence, 28(2):127–162, 1986.
[6] K. D. Forbus and J. de Kleer. Building Problem Solvers. MIT Press, Cambridge, MA, 1993.
[7] P. Gärdenfors. Belief Revision. Cambridge Computer Tracts. Cambridge University Press, Cambridge, 1992.
[8] S. O. Hansson. Kernel contraction. J. Symb. Logic, 59(3):845–859, 1994.
[9] S. O. Hansson. Semi-revision. Journal of Applied Non-Classical Logic, 7:151–175, 1997.
[10] S. O. Hansson. A Textbook of Belief Dynamics, volume 11 of Applied Logic. Kluwer, Dordrecht, The Netherlands, 1999.
[11] F. L. Johnson. Dependency-Directed Reconsideration: An Anytime Algorithm for Hindsight Knowledge-Base Optimization. PhD thesis, Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, January 2006.
[12] F. L. Johnson and S. C. Shapiro. Dependency-directed reconsideration: Belief base optimization for truth maintenance systems. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 313–320, Menlo Park, CA, 2005. AAAI Press.
[13] F. L. Johnson and S. C. Shapiro. Improving recovery for belief bases. In L. Morgenstern and M. Pagnucco, editors, IJCAI-05 Workshop on Nonmonotonic Reasoning, Action, and Change (NRAC'05): Working Notes, pages 65–70, Edinburgh, 2005. IJCAII.
[14] F. L. Johnson and S. C. Shapiro. Reconsideration on non-linear base orderings. In Proceedings of STAIRS'06 at ECAI'06, Amsterdam, 2006. IOS Press.
[15] M.-A. Williams. On the logic of theory base change. In C. MacNish, D. Pearce, and L. M. Pereira, editors, Logics in Artificial Intelligence (JELIA), pages 86–105. Springer, Berlin, Heidelberg, 1994.
[16] M.-A. Williams and A. Sims. SATEN: An object-oriented web-based revision and extraction engine. In C. Baral and M. Truszczyński, editors, Proceedings of the 8th International Workshop on Non-Monotonic Reasoning NMR'2000, 2000. CoRR article: cs.AI/0003059.
Unsupervised Word Sense Disambiguation Using The WWW

Ioannis P. KLAPAFTIS & Suresh MANANDHAR
Department of Computer Science, University of York, York, UK, YO10 5DD
{giannis,suresh}@cs.york.ac.uk

Abstract. This paper presents a novel unsupervised methodology for automatic disambiguation of nouns found in unrestricted corpora. The proposed method is based on extending the context of a target word by querying the web, and then measuring the overlap of the extended context with the topic signatures of the different senses by using Bayes rule. The algorithm is evaluated on SemCor 2.0. The evaluation showed that the web-based extension of the target word's local context increases the amount of contextual information to perform semantic interpretation, in effect producing a disambiguation methodology which achieves a result comparable to the performance of the best system in SENSEVAL-3.

Keywords. Unsupervised Learning, Word Sense Disambiguation, Natural Language Processing
1. Introduction

Word Sense Disambiguation (WSD) is the task of associating a given word in a text or discourse with a definition or meaning (sense) which is distinguishable from other meanings potentially attributable to that word [7]. WSD is a long-standing problem in the NLP community. The outcome of the last SENSEVAL-3 workshop [12] clearly shows that supervised systems [6,16,10] are able to achieve up to 72.9% precision and recall1 [6], outperforming unsupervised ones. However, supervised systems need to be trained on large quantities of high-quality annotated data in order to achieve reliable results, in effect suffering from the knowledge acquisition bottleneck [18].
Recent unsupervised systems [14,17,4] use semantic information (glosses) encoded in WordNet [13] to perform WSD. However, descriptive glosses of WordNet are very sparse and contain very few contextual clues for sense disambiguation [14]. This problem, as well as WordNet's well-known deficiencies, i.e. the lack of explicit links among semantic variant concepts with different parts of speech, and the lack of explicit relations between topically related concepts, were tackled by topic signatures (TS) [2].

1 This result refers to the precision and recall achieved by the best supervised system with fine-grained scoring in the English sample task.
A TS of the ith sense of a word w is a list of words that co-occur with the ith sense of w. Each word in a TS has a weight that measures its importance. Essentially, TS contain topically related words for nominal senses of WordNet [13]. TS are an important element of the proposed methodology2 and thus, our assumption is that TS contain enough contextual clues for WSD.
Another problem of WSD approaches (e.g. the approaches of Lesk [11]) is the criterion by which we determine the window size around the target word to create its local context. A large window size increases noise, while a small one decreases contextual information. The common method is to define the window size based on empirical and subjective evaluations, assuming that this size can capture a sufficient number of related words to perform WSD. To our knowledge, the idea of extending the local context of a target word has not been studied yet.
The proposed method suggests the novelty of extending the local context of the target word by querying the web to obtain more topically related words. Then, the overlap of the extended context with the topic signatures of the different senses is measured using Bayes rule.
The rest of the paper is structured as follows: section 2 provides the background work related to the proposed one, section 3 presents and discusses the disambiguation algorithm, section 4 contains the evaluation of the proposed methodology, section 5 identifies limitations and suggests improvements for future work and finally section 6 summarizes the paper.
2. Background

A topic signature (TS) of the ith sense of a word w is a list of the words that co-occur with the ith sense of w, together with their respective weights. It is a tool that has been applied to word-sense disambiguation with promising results [1]. Let si be the WordNet synset for the ith sense of a word w. Agirre's [2] method for the construction of WordNet's TS is the following.
1. Query generation. In this stage, a query is generated, which contains all the monosemous relatives of si as positive keywords, and the words in the other synsets of w as negative keywords. Monosemous relatives can be hypernyms, hyponyms and synonyms.
2. Web documents download. In the second stage, the generated query is submitted to an Internet search engine, and the first n documents are downloaded.
3. Frequency calculation. In the third stage, frequencies of words in the documents are calculated and stored in a vector vfi, excluding common closed-class words (determiners, pronouns, etc.). The vector contains pairs (wordj, freqi,j), where j is the jth word in the vector and i is the ith sense of w.
4. TS weighting. Finally, vfi is replaced with a vector vxi that contains pairs (wordj, wi,j), where wi,j is the weight of word j for the TS of the ith sense of word w.

2 In the next section we will provide the exact automatic process for the construction of TS.
Weighting measures applied so far to TS are the tf/idf measure [15], χ2, t-score and mutual information.
TS were applied to a WSD task, which showed that they are able to overcome WordNet's deficiencies [1]. According to this approach, given an occurrence of the target word in a text, a local context was formed using a window size of 100 words3. Then, for each word sense the weights for the context words appearing in the corresponding topic signature were retrieved and summed. The highest sum determined the predicted sense.
A similar WSD method was proposed by Yarowsky [19]. His method performed WSD in unrestricted text using Roget's Thesaurus and Grolier's Encyclopedia and involved 3 stages as summarised below.
1. Creation of context discrimination lists. Representative contexts are collected to create context discriminators for each one of the 1041 Roget's categories (sense categories). Let RCat be a Roget category. For each occurrence of a word w in a category RCat, concordances of 100 surrounding words in the encyclopedia are collected. At the end of this stage, each Roget category is represented by a list of topically related words. From this the conditional probability of each w given RCat, P(w|RCat), is calculated.
2. Salient word weighting. In the second stage, salient words of each list are identified and weighted. Salient words are detected according to their probabilities in a Roget category and in the encyclopedia. The following formula calculates the probability of a word appearing in the context of a Roget category, divided by the overall probability of the word in the encyclopedia corpus:

P(w|RCat) / P(w)
This formula, along with topical frequency sums, are multiplied to produce a score for each salient word, and the n highest-ranked salient words are selected. Each selected salient word is then assigned the following weight: log(P(w|RCat)/P(w)). At the end of this stage, each Roget category is represented by a list of topically related words, along with their respective weights. As can be observed, there is a conceptual similarity between TS and Yarowsky's [19] context discriminators.
3. Disambiguation of a target word. The process to disambiguate a target word was identical to the TS-based WSD. A local context was formed, and then for each Roget sense category, the weights for the context words appearing in the corresponding sense discriminator were retrieved and summed. The highest sum determined the predicted Roget sense.
Both of these approaches attempt to disambiguate a target word w by constructing sense discrimination lists, and then measuring the overlap between the local context and each sense discrimination list of w using Bayes rule. Both of these approaches calculate the window size of the local context empirically. In the proposed method, we take the same view of using sense discriminators, but we attempt to extend the local context, in order to provide more information, aiming for a more reliable and accurate WSD.

3 This window size was chosen by Agirre et al. [1] as the most appropriate one, after several experiments.
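As a concrete, simplified illustration of the tf/idf-style weighting used for topic signatures (step 4 of the construction described at the start of this section), the sketch below turns per-sense word-frequency vectors into weighted signatures. The counts, sense labels and the exact normalization are our own assumptions for illustration; the original signatures may use a different tf/idf variant.

```python
import math
from collections import Counter

def topic_signatures_tfidf(freq_vectors):
    """freq_vectors: {sense_id: Counter(word -> frequency in the documents
    retrieved for that sense)}.  Returns {sense_id: {word: weight}}, where the
    weight is tf * idf and idf is computed across the senses of the word."""
    n_senses = len(freq_vectors)
    # number of senses whose retrieved documents contain each word
    sense_df = Counter()
    for counts in freq_vectors.values():
        sense_df.update(counts.keys())
    signatures = {}
    for sense, counts in freq_vectors.items():
        total = sum(counts.values()) or 1
        signatures[sense] = {
            w: (c / total) * math.log(n_senses / sense_df[w])
            for w, c in counts.items()
        }
    return signatures

# Tiny illustration with made-up counts for two senses of "bank":
freqs = {
    "bank#1": Counter({"money": 12, "loan": 7, "river": 1}),
    "bank#2": Counter({"river": 9, "water": 6, "money": 1}),
}
signatures = topic_signatures_tfidf(freqs)
```

With only two senses, words that occur for both senses get an idf of zero; a smoothed idf (e.g. log(1 + N/df)) would merely dampen such words instead of removing them.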
3. Disambiguation Method

The proposed method consists of three steps which can be followed to perform disambiguation of a word w.
1. Collect external web corpus WC. In this stage, the sentence s containing the target word w is sent to Google and the first r documents are downloaded. Part-of-speech (POS) tagging is applied to the retrieved documents to identify nouns within a window of +/− n words around w. A final list of nouns is produced as a result, which is taken to represent the external web context WC of w.
2. Retrieve topic signatures TSi for each nominal sense i of w. In this stage, TSi for each sense i of w is retrieved or constructed. In our approach, we have used TS weighted by the tf/idf measure [15].
3. Use WC and TSi to predict the appropriate sense. When any of the words contained in TSi appears in WC, there is evidence that the ith sense of w might be the appropriate one. For this reason, we sum the weights of words appearing both in TSi and in WC for each sense i, and then we use Bayes's rule to determine the sense for which the sum is maximum:

arg maxTSi Σw∈WC P(w|TSi) · P(TSi) / P(w)
In case of lack of evidence4 , the proposed methodology makes a random choice of the predicted sense.
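A minimal sketch of this scoring step, assuming the topic signatures are available as word-to-weight dictionaries; the sense labels, weights and the fallback policy below are illustrative only.

```python
import random

def disambiguate(web_context, topic_signatures, prior=None):
    """Score each sense i by summing, over the nouns in the web-extended
    context WC, the topic-signature weight P(w|TS_i), times the prior P(TS_i)
    (uniform here), and return the argmax.
    web_context: list of nouns around the target word in the retrieved pages.
    topic_signatures: {sense_id: {word: weight}}.
    Falls back to a random sense when there is no evidence at all."""
    senses = list(topic_signatures)
    if prior is None:
        prior = {s: 1.0 / len(senses) for s in senses}   # uniform P(TS_i)
    scores = {
        s: prior[s] * sum(ts.get(w, 0.0) for w in web_context)
        for s, ts in topic_signatures.items()
    }
    if all(v == 0.0 for v in scores.values()):
        return random.choice(senses)     # no evidence: random choice
    return max(scores, key=scores.get)

# Hypothetical example: two senses of "bank" and an extended web context.
ts = {"bank#1": {"money": 0.4, "loan": 0.3}, "bank#2": {"river": 0.5, "water": 0.2}}
print(disambiguate(["river", "fishing", "water"], ts))   # -> "bank#2"
```

Because P(TSi) is uniform and P(w) is constant across senses, the maximization reduces to comparing the weight sums, as explained next.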
Essentially, the weight of a word w in a TSi is P(w|TSi), the probability of the word w appearing in topic signature TSi. Consequently, the sum of the weights of a word across all senses of a word is equal to 1. The probability P(TSi) is the a priori probability of TSi to hold, which is essentially the a priori probability of sense i of w to hold. Currently, we assume P(TSi) to be uniformly distributed. Note that P(w) may be omitted, since it does not change the results of the maximization.

4 We were unable to download any documents from the web.

3.1. Method Analysis

The first step of the proposed method attempts to extend the local context of the target word by sending queries to Google, to obtain more topically related words. In most WSD approaches, the local context of a target word is formed by an empirically calculated window size, which can introduce noise and hence reduce the WSD accuracy. This is a result of the fact that the appropriate size depends on many factors, such as the writing style of the author, the domain of discourse, the vocabulary used etc. Consequently, it is possibly impracticable to calculate the appropriate window size for every document, author etc. Even if this is empirically regarded or randomly chosen as the best possible, there is no guarantee that the target word's local context contains enough information to perform accurate WSD.
In our approach, we attempt to overcome this problem by downloading several web documents and choosing a small window size around the target word.
Our initial projections suggest that through this tactic, we will increase the amount of contextual clues around the target word, in effect increasing the accuracy of WSD.
The proposed method is heavily based on TS and its performance depends on their quality. It has already been mentioned that TS are conceptually similar to the Roget's sense discrimination (RSD) lists [19]. But Yarowsky's method [19] of constructing RSD lists suffers from noise. We believe that our proposed method is less likely to suffer from noise for the following reasons.
Firstly, TS are built by generating queries containing only monosemous relatives. In contrast, RSD lists were constructed by taking into account each word w appearing in a Roget category. However, many of these words used to collect examples from Grolier's encyclopedia were polysemous, in effect introducing noise.
Secondly, noise reduction is enhanced by the fact that TS are constructed by issuing queries that have other senses' keywords as negative keywords, in effect being able to exclude documents that contain words that are semantically related with senses of the target word other than the one we are interested in.
Finally, noise reduction is enhanced by the fact that in tf/idf-based TS [15], words occurring frequently with one sense, but not with the other senses of the target word, are assigned high weights for the associated word sense, and low values for the rest of the word senses. Furthermore, words occurring evenly among all word senses are also assigned low weights for all the word senses [3]. On the contrary, Yarowsky's measure [19] only takes into account the probability of a word appearing in a Roget category's representative context, divided by the overall probability of the word in the corpus.
Figure 1 shows the conceptual architecture of the aforementioned WSD methods and of the proposed one.
4. Evaluation

4.1. Preparation

In the first step of the disambiguation process (section 3), we mentioned two parameters that affect the outcome of the proposed disambiguation method. The first one was the number of documents to download and the second was the window size around the target word within the retrieved documents. Let n be the window size and r the number of downloaded documents.
Our purpose at this stage was to perform WSD on a large part of SemCor 2.0 [9] with different values of r and n, and then choose the values for which our WSD algorithm performed best. These values would then be used to perform the evaluation on the whole SemCor.
Two measures are used for this experiment, Prank1 and Prank2. Prank1 denotes the percentage of cases where the highest scoring sense is the correct sense, and is equal to recall and precision. Note that our recall measure is the same as the precision measure, because every word was assigned a sense tag5. Prank2 denotes the percentage of cases where one of the two highest scoring senses is the correct sense.
A part of our experiments is shown in Table 1.
5 In the rare case of having no evidence to output a sense, the predicted sense was randomly chosen.
Figure 1. WSD Methods Conceptual Architectures
We obtained the best results for r = 4 and n = 100. It seems that when r is above 4, the system retrieves inconsistent web documents, which increase noise. In contrast, when r is below 4, the amount of contextual clues decreases. Additionally, for all values of r, when n is above 100, the system receives noisy information, while when n is below 100 (not shown), performance is lowered due to the small window size.

4.2. Results on SemCor

Table 2 shows the results of our evaluation, when n is set to 100 and r to 4, for the first 10 SemCor files. Table 2 shows that our system achieved 67.4% on a large part of SemCor 2.0. In 81.2% of cases one of the two highest scoring senses was the correct one.
n   | r | Prank1 | Prank2
100 | 3 | 0.685  | 0.800
150 | 3 | 0.675  | 0.801
200 | 3 | 0.670  | 0.811
100 | 4 | 0.731  | 0.846
150 | 4 | 0.681  | 0.830
200 | 4 | 0.677  | 0.823
100 | 5 | 0.695  | 0.800
150 | 5 | 0.675  | 0.790
200 | 5 | 0.660  | 0.793

Table 1. Experiments on the br-a01 file of SemCor 2.0
File   | Nouns | Prank1 | Prank2
br-a01 | 573   | 0.731  | 0.846
br-a02 | 611   | 0.729  | 0.865
br-a11 | 582   | 0.692  | 0.812
br-a12 | 570   | 0.685  | 0.808
br-a13 | 575   | 0.600  | 0.760
br-a14 | 542   | 0.706  | 0.854
br-a15 | 535   | 0.691  | 0.818
br-b13 | 505   | 0.570  | 0.774
br-b20 | 458   | 0.641  | 0.779
br-c01 | 512   | 0.671  | 0.791
Total  | 5463  | 0.674  | 0.812

Table 2. Results from the first 10 files of Brown 1 Corpus
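For reference, the two ranking measures reported in Tables 1 and 2 can be computed as in the short sketch below; the variable names and the toy data are ours, not part of the evaluation setup.

```python
def prank_scores(predictions, gold):
    """Prank1: fraction of instances whose top-ranked sense is the gold sense.
    Prank2: fraction whose gold sense is among the two top-ranked senses.
    predictions: list of sense lists, each ordered by decreasing score;
    gold: list of correct senses, aligned with predictions."""
    n = len(gold)
    prank1 = sum(ranked[0] == g for ranked, g in zip(predictions, gold)) / n
    prank2 = sum(g in ranked[:2] for ranked, g in zip(predictions, gold)) / n
    return prank1, prank2

# Hypothetical toy run with three instances:
preds = [["s1", "s2"], ["s2", "s1"], ["s3", "s1", "s2"]]
gold = ["s1", "s1", "s2"]
print(prank_scores(preds, gold))   # (0.333..., 0.666...)
```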
Results on the whole SemCor did not change significantly. In particular, our approach achieved 69.4% Prank1 and 82.5% Prank2.
Our first baseline method was the performance of TS (without querying the web) using a window of 100 words, as Agirre et al. [1] did. The comparison between TS WSD and the proposed method would allow us to see if the web-based extension of the local context is useful for WSD.
Our second baseline method was a method similar to that of Lesk [11], which is based on extending the context of the target word by querying the web (as in our approach), and then measuring the overlap of the extended context with WordNet-based lists of the different senses as in [8]. Each sense list is constructed by taking into account all the hypernyms, hyponyms, meronyms, holonyms and synonyms of the particular sense. This method is used to show that TS overcome the WordNet deficiencies (section 1) and are useful for WSD. Table 3 shows the comparison between the proposed and the baseline methods.
Table 4 shows a comparison of our method's performance with other recent WSD approaches on the same evaluation data set6. We compare our method with a method similar
6 The compared systems' performance is the one mentioned in their literature.
Methods                 | Prank1 (%) | Prank2 (%)
Proposed                | 69.4       | 82.5
TS WSD                  | 60.79      | 76.34
Web based Lesk-like WSD | 59.90      | 69.91

Table 3. Comparison between the proposed and the baseline methods
to that of Lesk [11], which creates sense lists for a target word w using WordNet hypernym glosses [5]. Each word in a sense list is assigned a weight inversely proportional to its depth in the WordNet hierarchy. At the end, they measure the overlap of each sense list with the local context7. This system is referred to in Table 4 as WordNet-based Lesk-like. The second system is the best performing system in the last SENSEVAL-3 workshop [14] and is similar to the previous one, differing in the weighting of words in the sense lists. In particular, Ramakrishnan et al. [14] use a variation of the tf/idf measure [15], which they call tf/igf [14]. The inverse gloss frequency (igf) of a token is the inverse of the number of glosses which contain that token, and it captures the commonness of that particular token. This system is referred to in Table 4 as gloss-centered.

Methods                 | Prank1 (%) | Prank2 (%)
Gloss-centered          | 71.4       | 83.9
Proposed                | 69.4       | 82.5
WordNet-based Lesk-like | 49.5       | 62.4

Table 4. Comparison of modern approaches to WSD.

7 The local context is equal to the sentence containing the target word.
We intend to compare our approach with SENSEVAL workshop approaches in the near future. At this stage, this was infeasible due to different WordNet versions (SENSEVAL uses 1.7.1 while ours is 2.0). As can be observed, our method achieves better results than the WordNet-based Lesk-like method [5], and a performance comparable to the gloss-centered approach.
5. Identified Limitations & Further Work

The current work is ongoing research; we have identified two significant limitations through manual inspection of incorrect predictions.
The first identified limitation is the retrieval of inconsistent (noisy) web documents. This shortcoming is a result of the query which is sent to Google. The proposed methodology generates a string query, which is essentially the sentence containing the target word. If this sentence is not large enough, then Google will return irrelevant documents that will negatively affect the performance of the system.
We propose two solutions to this problem. The first one is to send n adjacent sentences to Google. That way, the particular search engine will be able to return more consistent web documents, possibly increasing the accuracy of WSD.
The second solution is to use NP chunking and enclosing in quotes. This technique will allow Google to search for the exact sequence of words enclosed in quotes.
As a result, returned web documents will be less noisy and the accuracy of WSD will possibly increase.
The second identified limitation is the noise included in topic signatures. There were cases in the evaluation in which the retrieved web documents were relevant, but we were unable to predict the correct sense, even when we were using all the possible combinations of the number of downloaded documents and the window size around the target word within the documents. This limitation arose from the fact that the word senses were similar, but still different. TS were unable to discriminate between these senses, which means that the TS of the corresponding senses have high similarity.
Experiments on calculating semantic distance between word senses using TS, and comparison with other distance metrics, have shown that topic signatures based on mutual information (MI) and t-score perform better than tf/idf-based TS [3]. This means that our WSD process would possibly achieve a higher performance using MI or t-score based TS. Unfortunately, this was infeasible to test at this stage, since these TS were not available to the public.
Future work involves experimentation with other kinds of TS and exploration of the parameters of their construction methodology, targeted at more accurate TS. Finally, verbs and pre-nominal modifiers are not considered in the particular approach. We intend to extend topic signatures by developing appropriate ones for verbs and pre-nominal modifiers. Thus, their disambiguation will also be feasible.
6. Conclusions

We have presented an unsupervised methodology for automatic disambiguation of noun terms found in unrestricted corpora. Our method attempts to extend the local context of a target word by issuing queries to the web, and then measuring the overlap with the topic signatures of the different senses using Bayes rule. Our method outperformed the TS-based WSD, indicating that the extension of the local context increases the amount of useful knowledge available to perform WSD. Our method achieved promising results, which are comparable to the result of the best performing system participating in the SENSEVAL-3 competition. Finally, we have identified two main limitations, which we intend to overcome in the future in order to provide a more reliable WSD.
Acknowledgments The first author is grateful to the General Michael Arnaoutis charitable foundation for its financial support. We are more than grateful to our colleague, George Despotou, and to Georgia Papadopoulou for proof reading this paper.
References
[1] E. Agirre, O. Ansa, E. Hovy, and D. Martinez, 'Enriching very large ontologies using the www', in ECAI Workshop on Ontology Learning, Berlin, Germany, (2000).
[2] E. Agirre, O. Ansa, E. Hovy, and D. Martinez, 'Enriching wordnet concepts with topic signatures', ArXiv Computer Science e-prints, (2001).
[3] Eneko Agirre, Enrique Alfonseca, and Oier Lopez de Lacalle, 'Approximating hierarchy-based similarity for wordnet nominal synsets using topic signatures', in Sojka et al. [SPS+03], pp. 15–22, (2004).
[4] Timothy Chklovski, Rada Mihalcea, Ted Pedersen, and Amruta Purandare, 'The senseval-3 multilingual english-hindi lexical sample task', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 5–8, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[5] K. Fragos, Y. Maistros, and C. Skourlas, 'Word sense disambiguation using wordnet relations', First Balkan Conference in Informatics, Thessaloniki, (2003).
[6] Cristian Grozea, 'Finding optimal parameter settings for high performance word sense disambiguation', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 125–128, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[7] N. Ide and J. Veronis, 'Introduction to the special issue on word sense disambiguation: The state of the art', Computational Linguistics, 24(1), 1–40, (1998).
[8] Ioannis P. Klapaftis and Suresh Manandhar, 'Google & WordNet based Word Sense Disambiguation', in Proceedings of the International Conference on Machine Learning (ICML-05) Workshop on Learning and Extending Ontologies by using Machine Learning Methods, Bonn, Germany, (August 2005).
[9] S. Lande, C. Leacock, and R. Tengi, 'Wordnet, an electronic lexical database', in MIT Press, Cambridge MA, 199–216, (1998).
[10] Yoong Keok Lee, Hwee Tou Ng, and Tee Kiah Chia, 'Supervised word sense disambiguation with support vector machines and multiple knowledge sources', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 137–140, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[11] Michael Lesk, 'Automated sense disambiguation using machine-readable dictionaries: How to tell a pine cone from an ice cream cone', in Proceedings of the AAAI Fall Symposium Series, pp. 98–107, (1986).
[12] Rada Mihalcea, Timothy Chklovski, and Adam Kilgarriff, 'The senseval-3 english lexical sample task', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 25–28, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[13] G. Miller, 'Wordnet: A lexical database for english', Communications of the ACM, 38(11), 39–41, (1995).
[14] Ganesh Ramakrishnan, B. Prithviraj, and Pushpak Bhattacharya, 'A gloss-centered algorithm for disambiguation', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 217–221, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[15] G. Salton and C. Buckley, 'Term weighting approaches in automatic text retrieval', Information Processing and Management, 24(5), 513–523, (1988).
[16] Carlo Strapparava, Alfio Gliozzo, and Claudiu Giuliano, 'Pattern abstraction and term similarity for word sense disambiguation: Irst at senseval-3', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 229–234, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[17] Sonia Vázquez, Rafael Romero, Armando Suárez, Andrés Montoyo, Iulia Nica, and Antonia Martí, 'The university of alicante systems at senseval-3', in Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, eds., Rada Mihalcea and Phil Edmonds, pp. 243–247, Barcelona, Spain, (July 2004). Association for Computational Linguistics.
[18] Xinglong Wang and John Carroll, 'Word sense disambiguation using sense examples automatically acquired from a second language', in Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 547–554, Vancouver, British Columbia, Canada, (October 2005). Association for Computational Linguistics.
[19] David Yarowsky, 'Word-sense disambiguation using statistical models of roget's categories trained on large corpora', in Proceedings of COLING-92, Nantes, France, (1992).
Relational Descriptive Analysis of Gene Expression Data

Igor Trajkovski a,1, Filip Zelezny b, Nada Lavrac a, Jakub Tolar c
a Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
b Department of Cybernetics, Czech Technical University in Prague, Prague, Czech Republic
c Department of Pediatrics, University of Minnesota Medical School, Minneapolis, USA

Abstract. This paper presents a method that uses gene ontologies, together with the paradigm of relational subgroup discovery, to help find descriptions of groups of genes differentially expressed in specific cancers. The descriptions are represented by means of relational features, extracted from publicly available gene ontology information, and are straightforwardly interpretable by the medical experts. We applied the proposed method to two known data sets: (i) acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML) and (ii) classification of fourteen types of cancer. A significant number of discovered groups of genes had a description, confirmed by the medical expert, which highlighted the underlying biological process that is responsible for distinguishing one class from the other classes. We view our methodology not just as a prototypical example of applying more sophisticated machine learning algorithms to gene expression analysis, but also as a motivation for developing increasingly more sophisticated functional annotations and ontologies that can be processed by such learning algorithms.

Keywords. Relational learning, Learning from structured data, Learning in bioinformatics, Scientific discovery, Inductive logic programming, Meta-learning

1 Correspondence to: Igor Trajkovski, Department of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. Tel.: +386 1 477 3125; Fax: +386 1 477 3315; E-mail: [email protected]
1. Introduction

Microarrays are at the center of a revolution in biotechnology, allowing researchers to simultaneously monitor the expression of tens of thousands of genes. Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of genes found to be differentially expressed. A common challenge faced by the researchers is to translate such gene lists into a better understanding of the underlying biological phenomena.
Manual or semi-automated analysis of large-scale biological data sets typically requires biological experts with vast knowledge of many genes to decipher the known biology accounting for genes with correlated experimental patterns. The goal is to identify the relevant "functions", or the global cellular activities, at work in the experiment. For example, experts routinely scan gene expression clusters to see if any of the clusters are explained by a known biological function. Efficient interpretation of these data is challenging because the number and diversity of genes exceed the ability of any single researcher to track the complex relationships hidden in the data sets. However, much of the information relevant to the data is contained in the publicly available gene ontologies. Including the ontologies as a direct knowledge source for any algorithmic strategy to approach such data may greatly facilitate the analysis.
Here we present a method to identify groups of genes that have a similar signature in gene expression data and that also have functional similarity in the background knowledge, formally represented with gene annotation terms from the gene ontology. More precisely, we present an algorithm that, for a given multi-dimensional numerical data set representing the expression of the genes under different conditions (that define the classes of examples) and an ontology used for producing background knowledge about these genes, is able to identify groups of genes, described by conjunctions of first-order features, whose expression is highly correlated with one of the classes. For example, one of the applications of this algorithm is to describe groups of genes that were selected as discriminative for some classification problem. Medical experts are usually not satisfied with having a separate description of every discriminative gene, but want to know the processes that are controlled by these genes. With our algorithm we are able to find these processes and the cellular components where they are "executed", indicating the genes from the preselected list of discriminative genes which are included in these processes. For doing this we use the methodology of Relational Subgroup Discovery (RSD) [10]. With RSD we were able to induce a set of discrimination rules between the different types (or subtypes) of cancers in terms of functional knowledge extracted from the gene ontology and information about gene interactions. In other words, we try to explain the differences between types of cancer in terms of the functions of the genes that are differentially expressed in these types.

1.1. Measuring gene expression

The process of transcribing a gene's DNA sequence into the RNA that serves as a template for protein production is known as gene expression. A gene's expression level indicates the approximate number of copies of that gene's RNA produced in a cell. This is considered to be correlated with the amount of corresponding protein made. While the traditional technique for measuring gene expression is labor-intensive and produces an approximate quantitative measure of expression, new technologies have greatly improved the resolution and the scalability of gene expression monitoring. "Expression chips", manufactured using technologies derived from computer-chip production, can now measure the expression of thousands of genes simultaneously, under different conditions. These conditions may be different time points during a biological process, such as the yeast cell cycle
or drosophila development; direct genetic manipulations on a population of cells such as gene deletions; or they can be different tissue samples with some common phenotype (such as different cancer specimens). A typical gene expression data set is a matrix, with each column representing a gene and each row representing a condition, e.g. a cancer type. The value at each position in the matrix represents the expression of a gene under some condition.

1.2. Analysis of gene expression data

Large-scale gene expression data sets include thousands of genes measured at dozens of conditions. The number and diversity of genes make manual analysis difficult and automatic analysis methods necessary. Initial efforts to analyze these data sets began with the application of unsupervised machine learning, or clustering, to group genes according to similarity in gene expression [4]. Clustering provides a tool to reduce the size of the dataset to a simpler one that can more easily be manually examined. In typical studies, researchers examine the clusters to find those containing genes with common biological properties, such as the presence of common upstream promoter regions or involvement in the same biological processes. After commonalities have been identified (often manually) it becomes possible to understand the global aspects of the biological phenomena studied. As the community developed an interest in this area, additional novel clustering methods were introduced and evaluated for gene expression data [1,6].
The analysis of microarray gene expression data for various tissue samples has enabled researchers to determine gene expression profiles characteristic of the disease subtypes. The groups of genes involved in these genetic profiles are rather large, and a deeper understanding of the functional distinction between the disease subtypes might help not only to select highly accurate "genetic signatures" of the various subtypes, but hopefully also to select potential targets for drug design.
Most current approaches to microarray data analysis use (supervised or unsupervised) clustering algorithms to deal with the numerical expression data. While a clustering method reduces the dimensionality of the data to a size that a scientist can tackle, it does not identify the critical background biological information that helps the researcher understand the significance of each cluster. However, that biological knowledge, in terms of functional annotation of the genes, is already available in public databases. Direct inclusion of this knowledge source can greatly improve the analysis, support (in terms of user confidence) and explain the obtained numerical results.

1.3. Gene Ontologies

One of the most important tools for the representation and processing of information about gene products and functions is the Gene Ontology (GO). GO is being developed in parallel with the work on a variety of other biological databases within the umbrella project OBO (Open Biological Ontologies). It provides a controlled vocabulary for the description of cellular components, molecular functions, and biological processes. As of January 2006 (www.geneontology.org), GO contains 1681 component, 7386 function and 10392 process terms.
organized in parent-child hierarchies, indicating either that one term is more specific than another (is a) or that the entity denoted by one term is part of the entity denoted by another (part of). Typically, such associations (or "annotations") are first established electronically and later validated by a process of manual verification which requires the annotator to have expertise both in the biology of the genes and gene products and in the structure and content of GO. The Gene Ontology, in spite of its name, is not an ontology in the sense accepted by computer scientists, in that it does not deal with axioms and definitions associated with terms. It is rather a taxonomy or, as the GO Consortium puts it, a "controlled vocabulary" providing a practically useful framework for keeping track of the biological annotations applied to gene products. Recently, an automatic ontological analysis approach using GO has been proposed to help solve the task of interpreting the results of gene expression data analysis [8]. From 2003 to 2005, 13 other tools were proposed for this type of analysis, and more tools continue to appear. Although these tools use the same general approach, identifying statistically significant GO terms that cover a selected list of genes, they differ greatly in many respects that essentially influence the results of the analysis. A general overview and comparison of those tools is presented in [9]. Another approach to descriptive analysis of gene expression data is presented in [14]: a method that uses text analysis to help find meaningful gene expression patterns that correlate with the underlying biology as described in the scientific literature.
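To make the parent-child structure concrete, the following is a minimal, hypothetical sketch (in Python, not code from the paper) of how the generalized annotations used later in the paper can be obtained by collecting all ancestors of a term along is-a/part-of links; the parent map shown is a made-up fragment of the hierarchy.

def go_ancestors(term, parents):
    """parents: dict mapping a GO term to the set of its direct parents
    (via is-a or part-of links). Returns the set of all ancestors of term."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        for p in parents.get(t, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Made-up fragment of the hierarchy, for illustration only:
parents = {
    "zinc ion binding": {"transition metal ion binding"},
    "transition metal ion binding": {"metal ion binding"},
    "metal ion binding": {"cation binding"},
    "cation binding": {"ion binding"},
    "ion binding": {"binding"},
}
print(sorted(go_ancestors("zinc ion binding", parents)))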
2. Descriptive analysis of gene expression data

The fundamental idea of this paper is as follows. First, we construct a set of discriminative genes, GC(c), for every class c ∈ C. These sets can be constructed in several ways. For example, GC(c) can be the set of the k (k > 0) genes most correlated with class c, computed by, for example, Pearson's correlation. GC(c) can also be the set of the k best single-gene predictors, using the call values from a microarray experiment (absent/present/marginal) as the expression value of the gene. These predictors can look like this: If gene_i = present Then class = c. In our experiments we used a measure of correlation, P(g, c), that emphasizes the "signal-to-noise" ratio in using gene g as a predictor for class c. The definition and analysis of P(g, c) is presented in Section 3.1.

The second step aims at improving the interpretability of GC. Informally, we do this by identifying groups of genes in GC(c) (for each c ∈ C) which can be summarized in a compact way. Put differently, for each c_i ∈ C we search for compact descriptions of groups of genes which correlate strongly with c_i and weakly with all c_j ∈ C, j ≠ i. Searching for these groups of genes, together with their descriptions, is defined as a separate supervised machine learning task. We refer to it as the secondary, or meta-mining, task, as it aims to mine the outputs of the primary learning process in which the genes with high predictive value are searched for. This secondary task is, in a way, orthogonal to the primary discovery process in that the original attributes (genes) now become training examples, each of which has
a class label c ∈ C. To apply a discovery algorithm, information about relevant features of the new examples is required. No such features (i.e., "attributes" of the original attributes, the genes) are usually present in the gene expression microarray data sets themselves. However, this information can be extracted from a public database of gene annotations (in this paper, we use the Entrez Gene database maintained at the US National Center for Biotechnology Information, ftp://ftp.ncbi.nlm.nih.gov/gene/). For each gene we extracted its molecular functions, its biological processes and the cellular components where its protein products are located. Next, using GO, we also included the genes' generalized annotations in the background knowledge. For example, if a gene is functionally annotated as zinc ion binding, we also included its more general functional annotations in the background knowledge: transition metal ion binding, metal ion binding, cation binding, ion binding and binding. The background knowledge also includes information about the interactions of the genes, in the form of pairs of genes for which there is evidence that they can interact.

In traditional machine learning, examples are expected to be described by a tuple of values corresponding to some predefined, fixed set of attributes. Note that a gene annotation does not straightforwardly correspond to a fixed attribute set, as it has an inherently relational character. For example, a gene may be related to a variable number of cell processes, can play a role in a variable number of regulatory pathways, etc. This imposes one-to-many relations which are hard to capture elegantly within an attribute set of fixed size. Furthermore, a useful piece of information about a gene g may, for instance, be expressed by the following feature: gene g interacts with another gene whose functions include protein binding. Going even further, the feature may not involve only a single interaction relation but rather consider entire chains of interactions. The difficulties of representing such features through attribute-value tuples are evident. In summary, we are approaching the task of subgroup discovery in a relational data domain. For this purpose we employ the methodology of relational subgroup discovery proposed in [10,16] and implemented in the RSD algorithm.2 Using RSD, we were able to discover knowledge such as: the expression of genes coding for proteins located in the integral-to-membrane cell component, whose functions include receptor activity, has a high correlation with the BCR class of acute lymphoblastic leukemia (ALL) and a low correlation with the other classes of ALL.

2 http://labe.felk.cvut.cz/~zelezny/rsd/rsd.pdf

The RSD algorithm proceeds in two steps. First, it constructs a set of relational features in the form of conjunctions of first-order logic atoms. The entire set of features is then viewed as an attribute set, where an attribute has the value true for a gene (example) if the gene has the feature corresponding to the attribute. As a result, by means of relational feature construction we achieve the conversion of relational data into attribute-value descriptions. In the second step, groups
of genes are searched for, such that each group is represented as a conjunction of selected features. The subgroup discovery algorithm employed in this second step is an adaptation of the popular propositional rule learning algorithm CN2 [3].

2.1. Relational feature construction

The feature construction component of RSD aims at generating a set of relational features in the form of conjunctions of relational logic atoms. For example, the feature exemplified informally in the previous section has the following relational logic form:

interaction(g,G), function(G,protein binding)

Here, upper-case letters denote existentially quantified variables and g is the key term that binds a feature to a specific example (here a gene). The user specifies a grammar declaration which constrains the resulting set of constructed features. RSD accepts feature language declarations similar to those used in the inductive logic programming system Progol [12]. The construction of features is implemented as a depth-first, general-to-specific search, where a refinement corresponds to adding a literal to the currently examined expression. During the search, each search node found to be a correct feature is listed in the output. A remark is needed concerning the way constants (such as protein binding) are employed in features. Rather than making the user responsible for declaring all possible constants that may occur in the features, RSD extracts them automatically from the training data. The user marks the types of variables which should be replaced by constants. For each constant-free feature, a number of different features are then generated, each corresponding to a possible replacement of the combination of the indicated variables with constants. RSD then only proceeds with those combinations of constants which make the feature true for at least a pre-specified number of examples. Finally, to evaluate the truth value of each feature for each example, and thus generate the attribute-value representation of the relational data, first-order logic resolution is used, as provided by a Prolog engine.

2.2. Subgroup Discovery

Subgroup discovery aims at finding population subgroups that are statistically "most interesting", e.g., that are as large as possible and have the most unusual statistical characteristics with respect to the property of interest [15]. Notice an important aspect of the above definition: there is a predefined property of interest, meaning that a subgroup discovery task aims at characterizing population subgroups of a given target class. This property indicates that standard classification rule learning algorithms could be used for solving the task. However, while the goal of classification rule learning is to generate models (sets of rules) inducing class descriptions in terms of properties occurring in the descriptions of training examples, subgroup discovery aims at discovering individual patterns of interest (individual rules describing the target class).
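As an illustration of the propositionalization step described in Section 2.1, the following is a minimal sketch (in Python, with hypothetical data structures rather than RSD's actual Prolog-based machinery) of how relational features can be evaluated on genes to produce the boolean attribute-value table used by the subgroup discovery step.

annotations = {            # hypothetical background knowledge per gene
    "g1": {"component": {"nucleus"},
           "function": {"protein binding"},
           "interacts": {"g2"}},
    "g2": {"component": {"membrane"},
           "function": {"receptor activity", "protein binding"},
           "interacts": {"g1"}},
}

def interacts_with_function(gene, func):
    """Feature: gene interacts with another gene whose functions include func."""
    return any(func in annotations[g]["function"]
               for g in annotations[gene]["interacts"])

features = {
    "component(G,nucleus)": lambda g: "nucleus" in annotations[g]["component"],
    "interaction(G,B),function(B,protein binding)":
        lambda g: interacts_with_function(g, "protein binding"),
}

# Propositionalized table: one boolean column per relational feature.
table = {g: {name: f(g) for name, f in features.items()} for g in annotations}
print(table)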
Figure 1. Descriptions of discovered subgroups can cover: a) individuals from only one class (1); b) some unclassified individuals (2); c) individuals from other classes (3); and d) individuals already covered by other subgroups (3, 4).
Rule learning, as implemented in RSD, involves two main procedures: the search procedure, which finds a single subgroup discovery rule, and the control procedure (the weighted covering algorithm), which repeatedly executes the search in order to induce a set of rules. Descriptions of the two procedures are given in [10] and are omitted here due to space constraints.
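The generic idea behind such a weighted covering loop can be sketched as follows (a simplified illustration of the principle, not the exact control procedure of [10]; the rule-search routine is passed in as a placeholder): after each induced rule, the weights of the examples it covers are decreased, so that later searches focus on genes not yet well described.

def weighted_covering(examples, weights, find_best_rule, gamma=0.5, max_rules=10):
    """examples: list of gene identifiers; weights: dict gene -> float;
    find_best_rule: callable returning (rule, covered_genes) for the current weights;
    gamma: multiplicative weight decay applied to covered examples."""
    rules = []
    for _ in range(max_rules):
        rule, covered = find_best_rule(examples, weights)
        if rule is None or not covered:
            break
        rules.append(rule)
        for g in covered:                 # decrease weights of covered examples
            weights[g] *= gamma
    return rules

# Trivial stand-in for the rule search, for illustration only:
def dummy_search(examples, weights):
    heavy = [g for g in examples if weights[g] > 0.4]
    return ("covers heavily weighted genes", heavy) if heavy else (None, [])

print(weighted_covering(["g1", "g2"], {"g1": 1.0, "g2": 0.3}, dummy_search, gamma=0.3))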
3. Experiments

This section presents a statistical validation of the proposed methodology. We do not assess here the accuracy of disease classification from the gene expression values itself, as this is a property of the particular method used for the primary mining task, which is not our main concern. We rather aim at evaluating the properties of the secondary, descriptive learning task. Namely, we wish to determine whether the high descriptive capacity pertaining to the incorporation of the expressive relational logic language incurs a risk of descriptive overfitting, i.e., a risk of discovering fluke subgroups. We thus aim at measuring the discrepancy between the quality of the discovered subgroups on the training data set on one hand and on an independent test set on the other hand. We do this through the standard 10-fold stratified cross-validation regime. The specific qualities measured for each set of subgroups produced for a given class are the average precision (PRE) and recall (REC) values among all subgroups in the subgroup set.

3.1. Materials and methods

We apply the proposed methodology to two problems of predictive classification from gene expression data. The first was introduced in [5] and aims at distinguishing between samples of ALL and AML from gene expression profiles obtained by the Affymetrix HU6800 microarray chip, containing probes for 6817 genes. The data contains 73 class-labeled samples of expression vectors. The second was defined in [13]. Here one tries to distinguish among 14 classes of cancer from gene expression profiles obtained by the Affymetrix Hu6800 and Hu35KsubA microarray chips, containing probes for 16,063 genes. The data set contains 198 class-labeled samples.
To access the annotation data for every gene considered, it was necessary to obtain unique gene identifiers from the microarray probe identifiers available in the original data. We achieved this by querying the Affymetrix site3 to translate probe IDs into unique gene IDs. Knowing the gene identifiers, information about gene annotations and gene interactions can be extracted from the Entrez Gene information database4. We developed a script5 in the Python language which extracts gene annotations and gene interactions from this database and produces their structured, relational logic representations, which can be used as input to RSD.

In both data sets, for each class c we first extracted a set of discriminative genes GC(c). In our experiments we used a measure of correlation, P(g, c), that emphasizes the "signal-to-noise" ratio in using gene g as a predictor for class c. P(g, c) is computed by the following procedure. Let [μ1(g), σ1(g)] and [μ2(g), σ2(g)] denote the means and standard deviations of the log of the expression levels of gene g for the samples in class c and the samples in all other classes, respectively. Let

P(g, c) = (μ1(g) − μ2(g)) / (σ1(g) + σ2(g)),

which reflects the difference between the classes relative to the standard deviation within the classes. Large values of |P(g, c)| indicate a strong correlation between the gene expression and the class distinction, while a positive or negative sign of P(g, c) corresponds to g being more highly expressed in class c or in the other classes, respectively. Unlike a standard Pearson correlation coefficient, P(g, c) is not confined to the range [−1, +1]. The set of informative genes for class c, GC(c), of size n, consists of the n genes having the highest |P(g, c)| values. If we have only two classes, then GC(c1) consists of the genes having the highest P(g, c1) values, and GC(c2) consists of the genes having the highest P(g, c2) values.

For the first problem we selected 50 discriminative genes for the ALL class and 50 for the AML class. In the second problem we selected 35 discriminative genes for each class. The average values of the correlation coefficient |P(g, c)| of the selected discriminatory genes for each class/problem are listed in Table 1. The usage of the gene correlation coefficient is twofold: in the first part of the analysis, for a given class, it is used for the selection of discriminative genes; in the second part it is used as the initial weight of the example-genes for the meta-mining procedure, where we try to describe these discriminative genes. In the second mining task RSD will prefer to group genes with large weights, so these genes will have enough weight to be grouped into several groups with different descriptions.

After the selection of the sets of discriminatory genes GC(c) for each c ∈ C, these sets were merged and every gene coming from GC(c) was class-labeled as c. RSD was then run on these data, with the aim of finding subgroups of this population of example-genes that are as large and as pure (in terms of class labels) as possible, described by relational features constructed from GO and gene interaction data.

3 www.affymetrix.com/analysis/netaffx/
4 ftp://ftp.ncbi.nlm.nih.gov/gene/
5 This script is available on request to the first author.
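For illustration, a minimal sketch of the signal-to-noise score P(g, c) defined above and of the top-n selection of GC(c) is given below (an illustrative example, not the Python script mentioned in footnote 5).

import math

def signal_to_noise(expr_in_class, expr_out_class):
    """expr_in_class / expr_out_class: positive expression values of one gene
    for the samples in class c and for the samples in all other classes."""
    logs_in = [math.log(x) for x in expr_in_class]
    logs_out = [math.log(x) for x in expr_out_class]
    mu1 = sum(logs_in) / len(logs_in)
    mu2 = sum(logs_out) / len(logs_out)
    sd1 = math.sqrt(sum((x - mu1) ** 2 for x in logs_in) / len(logs_in))
    sd2 = math.sqrt(sum((x - mu2) ** 2 for x in logs_out) / len(logs_out))
    return (mu1 - mu2) / (sd1 + sd2)   # assumes sd1 + sd2 > 0

def top_genes(scores, n):
    """scores: dict gene -> P(g, c); returns the n genes with largest |P(g, c)|."""
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:n]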
Table 1. Average (AVG), maximal (MAX) and minimal (MIN) values of {|P(g, c)| : g ∈ GC(c)} for each task and class c.

TASK          CLASS           AVG     MAX     MIN
ALL-AML       ALL             0.75    1.25    0.61
              AML             0.76    1.44    0.59
MULTI-CLASS   BREAST          1.06    1.30    0.98
              PROSTATE        0.91    1.23    0.80
              LUNG            0.70    0.99    0.60
              COLORECTAL      0.98    1.87    0.73
              LYMPHOMA        1.14    2.52    0.87
              BLADDER         0.88    1.15    0.81
              MELANOMA        1.00    2.60    0.73
              UTERUS          0.82    1.32    0.71
              LEUKEMIA        1.35    1.75    1.18
              RENAL           0.81    1.20    0.70
              PANCREAS        0.75    1.08    0.67
              OVARY           0.70    1.04    0.60
              MESOTHELIOMA    0.90    1.91    0.74
              CNS             1.38    2.17    1.21
3.2. Results

The discovered regularities have very interesting biological interpretations. In ALL, RSD identified a group of 23 genes, described as: component(G,'nucleus') AND interaction(G,B), process(B,'regulation of transcription, DNA-dependent'). The protein products of these genes are located in the nucleus of the cell, and they interact with genes that are involved in the process of regulation of transcription. In AML, RSD identified several groups of overexpressed genes, located in the membrane, that interact with genes having 'metal ion transport' among their functions.

In breast cancer, RSD identified a group of genes (described as process(G,'regulation of transcription'), function(G,'zinc ion binding')) containing five genes (Entrez Gene ids: 4297, 51592, 91612, 92379, 115426) whose underexpression is a good predictor for that class. These genes are simultaneously involved in regulation of transcription and in zinc ion binding. Zinc is a cofactor in protein-DNA binding via a "zinc finger" domain (id 92379). This property is shared by many transcription factors, which are major regulators of normal and abnormal (e.g., malignant) cell proliferation. Second, zinc is an essential growth factor, and a zinc transporter associated with the metastatic potential of estrogen-positive breast cancer, termed LIV-1, has been described [7]. Less than optimal expression of the factors involved in zinc metabolism can therefore represent either a cause or an effect (biomarker) of dysregulated cellular proliferation in breast cancer. A separate group of genes involved in the ubiquitin cycle (process(G,'ubiquitin cycle')) was identified in breast cancer (Entrez ids: 3093, 10910, 23014, 23032, 25831, 51592, 115426). The role of ubiquitin in a cell is to recycle proteins. This is of
Table 2. Precision-recall figures and average sizes of the found subgroups, for the ALL/AML and multi-class classification tasks, obtained through 10-fold cross-validation.

TASK          DATA    PRE (st. dev.)   REC (st. dev.)   AVG. SIZE
ALL-AML       Train   0.96 (0.01)      0.18 (0.02)      12.07
              Test    0.76 (0.06)      0.12 (0.04)
MULTI-CLASS   Train   0.51 (0.03)      0.15 (0.01)      8.35
              Test    0.42 (0.10)      0.10 (0.02)
paramount importance to overall cellular homeostasis, since inappropriately active proteins can cause cancer. Subnormal expression of ubiquitin components (ids 3093 and 23032), resulting in subnormal inactivation of proteins active in the cell cycle, could thus represent a functional equivalent of ectopic oncogene expression [11]. This is an example where one gene (id 115426) was included in two groups with different descriptions.

In CNS (central nervous system) cancer, we discovered two important groups, one concerning neurodevelopment (description: process(G,'nervous system development'); Entrez Gene ids: 333, 1400, 2173, 2596, 2824, 3785, 4440, 6664, 7545, 10439, 50861) and one concerning immune surveillance (Entrez Gene ids: 199, 1675, 3001, 3108, 3507, 3543, 3561, 3588, 3683, 4046, 5698, 5699, 5721, 6352, 9111, 28299, 50848, 59307). The genes in the first group are over-expressed in CNS and those in the second group are under-expressed. As for the former, reactivation of genes relevant to early development (i.e., ineffective recapitulation of embryonal or fetal neural growth at the wrong time) is a hallmark of the most rapidly growing tumors (ids 3785 and 10439 are specific to neuroblastoma). The latter illustrates the common clinical observation that immune deficiency (the subnormal expression of genes active in the immune response shown in this work) creates a permissive environment for cancer persistence. Thus, both major themes of malignant growth are represented in this example: active unregulated growth and passive inability to clear the abnormal cells.

In addition, we subjected the RSD algorithm to 10-fold stratified cross-validation on both classification tasks. Table 2 shows the PRE and REC values (with standard deviations) for the two classification tasks. Overall, the results show only a small drop from the training to the test set in terms of both PRE and REC, suggesting that the number of discriminative genes selected (Table 1) was sufficient to prevent overfitting. In terms of total coverage, RSD covered more than 2/3 of the preselected discriminative genes (in both problems), while 1/3 of the preselected genes were not included in any group. One interpretation is that they are not functionally connected with the other genes, but were selected by chance. This information can be used in the first phase of the classification problem, feature selection, by choosing only genes that are covered by some subgroup. That will be the next step in our future work: using the proposed methodology as a feature (gene) selection mechanism.
4. Discussion

In this paper we presented a method that uses gene ontologies, together with the paradigm of relational subgroup discovery, to help find patterns of expression for genes with a common biological function that correlate with the underlying biology responsible for class differentiation. Our methodology proposes to first select a set of important discriminative genes for all classes and then to find compact relational descriptions of subgroups among these genes. It is noteworthy that the 'post-processing' step is also a machine learning task, in which the curse of dimensionality (the number of attributes, i.e., measured gene expressions) usually ascribed to this type of classification problem actually turns into an advantage. The high number of attributes (important genes), which incurs a risk of overfitting, turns into a high number of examples, which on the contrary works against overfitting in the subsequent subgroup discovery task. Furthermore, the dimensionality of the secondary attributes (relational features of genes extracted from gene annotations) can be conveniently controlled via suitable constraints in the language grammar used for the automatic construction of the gene features.

Furthermore, since genes frequently have multiple functions, they may under some conditions exhibit the behavior of genes with one function and under other conditions the behavior of genes with a different function. Here subgroup discovery is effective at selecting a specific function. The same gene can be included in multiple subgroup descriptions (gene id 115426 in breast cancer), each emphasizing a different biological process critical to the explanation of the underlying biology responsible for the observed experimental results. Unlike other tools for analyzing gene expression data with gene ontologies, which report statistically significant single GO terms and do not use gene interaction data, we are able to find sets of GO terms (the first reported group of genes, for breast cancer, is described with two GO terms) that cover the same set of genes, and we use available gene interaction data to describe features of genes that cannot be represented with other approaches (the third reported group, for ALL).

However, this approach of translating a list of differentially expressed genes into subgroups of functional categories using annotation databases suffers from a few important limitations. The existing annotation databases are incomplete: only a subset of known genes is functionally annotated, and most annotation databases are built by curators who manually review the existing literature. Although unlikely, it is possible that certain known facts get temporarily overlooked. For instance, [9] found references in literature published in the early 1990s for 65 functional annotations that are still not included in the current functional annotation databases. Despite the current imperfection of the available ontological background knowledge, the presented methodology was able to discover and compactly describe several gene groups, associated with specific cancer types, with highly plausible biological interpretations. We thus strongly believe the presented approach will significantly contribute to the application of relational machine learning to gene expression analysis, given the expected increase in both the quality and quantity of gene/protein annotations in the near future.
Acknowledgment The research of I.T. and N.L. is supported by the Slovenian Ministry of Higher Education, Science and Technology. F.Z. is supported by the Czech Academy of Sciences through the project KJB201210501 Logic Based Machine Learning for Analysis of Genomic Data.
References
[1] Ben-Dor, A., Shamir, R. & Yakhini, Z. (1999). Clustering gene expression patterns. J. Comput. Biol., 6:3/4, 281-297.
[2] Camon, E.B. et al. (2005). An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics, 6, S17.
[3] Clark, P. & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, pages 261-283.
[4] Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95:25, 14863-14868.
[5] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M. et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:5439, 531-537.
[6] Heyer, L.J., Kruglyak, S. & Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Res., 9:11, 1106-1115.
[7] Kasper, G. et al. (2005). Expression levels of the putative zinc transporter LIV-1 are associated with a better outcome of breast cancer patients. Int. J. Cancer, 117(6):961-973.
[8] Khatri, P. et al. (2002). Profiling gene expression using Onto-Express. Genomics, 79, 266-270.
[9] Khatri, P. & Draghici, S. (2005). Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21(18):3587-3595.
[10] Lavrač, N., Železný, F. & Flach, P. (2002). RSD: Relational subgroup discovery through first-order feature construction. In Proceedings of the 12th International Conference on Inductive Logic Programming, pages 149-165.
[11] Mani, A. & Gelmann, E.P. (2005). The ubiquitin-proteasome pathway and its role in cancer. Journal of Clinical Oncology, 23:4776-4789.
[12] Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, Special issue on Inductive Logic Programming, 13(3-4):245-286.
[13] Ramaswamy, S., Tamayo, P., Rifkin, R. et al. (2001). Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, 98(26):15149-15154.
[14] Raychaudhuri, S., Schütze, H. & Altman, R.B. (2003). Inclusion of textual documentation in the analysis of multidimensional data sets: application to gene expression data. Machine Learning, 52, 119-145.
[15] Wrobel, S. (1997). An algorithm for multi-relational discovery of subgroups. In Jan Komorowski and Jan Zytkow, editors, Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD-97), pages 78-87.
[16] Železný, F. & Lavrač, N. (2006). Propositionalization-Based Relational Subgroup Discovery with RSD. Machine Learning, 62(1-2):33-63. Springer 2006.
Solving Fuzzy PERT Using Gradual Real Numbers

Jérôme FORTIN and Didier DUBOIS
IRIT/UPS, 118 route de Narbonne, 31062 Toulouse Cedex 4, France
e-mail: {fortin, dubois}@irit.fr

Abstract. From a set of partially ordered tasks, one goal of the project scheduling problem is to compute the earliest starting dates, latest starting dates and floats of the different tasks, and to identify the critical tasks. When the durations of the tasks are not precisely known, the problem is much trickier. Recently we provided polynomial algorithms to compute upper bounds of activity floats in the interval-valued model. The aim of this paper is to extend those algorithms to the fuzzy-valued problem. To this end, we use the new notion of gradual numbers [7], which represent the soft boundaries of a fuzzy interval. We show that the algorithms for the interval-valued case can be adapted to fuzzy intervals by considering them as crisp intervals of gradual numbers.

Keywords. Fuzzy PERT, Gradual Numbers, Floats
1. Introduction

Temporal Constraint Networks (TCN) represent relations between the dates of events and also allow one to express constraints on the possible durations of activities from intervals of values. TCN have been extended to take into account the uncertainty of the durations of some tasks in realistic applications, by making a distinction between so-called contingent constraints and controllable ones [11]. The resulting network becomes a decision-making problem under uncertainty. This paper reconsiders the most basic scheduling problem, that of minimizing the makespan of a partially ordered set of activities, in the context of incomplete knowledge. When the durations of the tasks of a project scheduling problem are ill-known and modeled by intervals, the problem can be viewed as a special kind of TCN where all tasks are modeled by contingent constraints, and controllable constraints only describe precedence between tasks. Of course, the resulting network is always controllable if the graph of precedence constraints is acyclic. This paper answers the question of optimizing the total duration of such a network when constraints are set by fuzzy intervals.

One goal of the project scheduling problem is to compute the earliest starting dates, latest starting dates and floats of the different tasks, and to identify the critical tasks. When task durations are precisely known, this problem is solved by the PERT/CPM algorithm (Critical Path Method). When task durations are ill-known and lie in an interval, this problem is much more difficult. In a recent paper [8], we give polynomial algorithms to
compute upper bounds of floats in the interval-valued model. Here we show that those algorithms can be adapted to the fuzzy-valued problem. For this purpose, we recall and use a recent notion introduced in fuzzy set theory, called gradual numbers [7]. This notion makes it possible to define and use gradual bounds of fuzzy intervals and then apply already known interval analysis ideas to the fuzzy counterparts of interval problems. The paper is organized as follows. Section 2 gives the background needed to construct our new algorithms: Subsection 2.1 recalls the basic PERT/CPM problem of project scheduling; Subsection 2.2 restates the recent results of [8] to solve the PERT/CPM problem when tasks are modeled by intervals; Subsection 2.3 then gives the general definition and intuitions of so-called gradual numbers. A subproblem is addressed in Section 3, namely the computation of earliest starting dates, latest starting dates and floats when durations of tasks are modeled by gradual numbers. Finally, Section 4 gives algorithms to compute the degrees of necessary criticality in polynomial time, and the least upper bound (LUB) of floats for each possibility degree λ ∈ (0, 1].

2. Preliminaries

2.1. Classical PERT Model

An activity network is classically defined as a set of activities (or tasks) with given duration times, related to each other by means of precedence constraints. When there are no resource constraints, it can be represented by a directed, connected and acyclic graph. A major concern is to minimize the ending time of the last task, also called the makespan of the network. For each task, three quantities have practical importance for the management of the activity network: the earliest starting time est_ij of an activity (i, j) is the date before which the activity cannot be started without violating a precedence constraint; the latest starting time lst_ij of an activity (i, j) is the date after which the activity cannot be started without delaying the end of the project; the float f_ij of an activity (i, j) is the difference between the latest starting time lst_ij and the earliest starting time est_ij. An activity is critical if and only if its float is equal to zero. Under the assumption of minimal makespan, critical tasks must be started and completed at prescribed time-points.

A directed, connected and acyclic graph G = (V, A) represents an activity network. We use the activity-on-arc convention. V is the set of nodes (events), |V| = n, and A is the set of arcs (activities), |A| = m. The set V = {1, 2, ..., n} is labeled in such a way that i < j for each activity (i, j) ∈ A. Activity durations d_ij (the weights of the arcs), (i, j) ∈ A, are well known. See Figure 1 for a simple example of a project scheduling problem; task durations are the values given on the arcs of the graph. Two nodes, 1 and n, are distinguished as the initial and final node, respectively.
Figure 1. Simple project scheduling problem
We also need some additional notation for the predecessors and successors of a task or a node, and for subgraphs of G:
Succ(i) (resp. Pred(i)) refers to the set of nodes that immediately follow (resp. precede) node i ∈ V. SUCC(i, j) (resp. PRED(i, j)) denotes the set of all arcs that come after (resp. before) (i, j) ∈ A, and SUCC(j) (resp. PRED(j)) stands for the set of all nodes that come after (resp. before) j ∈ V. G(i, j) is the subgraph of G composed of the nodes succeeding i and preceding j. G(d_ij = d) is the graph where the duration of task (i, j) is replaced by d.

With these notations, we can give precise formulas defining the earliest starting dates, latest starting dates and floats of tasks and events. The earliest starting date of an event k is the length of the longest path from node 1 (the beginning of the project) to node k. We arbitrarily fix the starting time of the project to est_1 = 0. Of course, the earliest starting date of a task (k, l) is equal to the earliest starting date of event k. The following recursive formulas give the key to computing the earliest starting dates of events and tasks:

    est_k = 0 if k = 1,  est_k = max_{j ∈ Pred(k)} (est_j + d_jk) otherwise;    est_kl = est_k.

The earliest ending time of the project is the earliest starting time of the last event n. In order to ensure a minimal duration of the project, the latest starting date of event n is equal to its earliest starting date (lst_n = est_n). The latest starting date of a task (k, l) is the time after which we cannot start its execution without delaying the end of the project; it is in fact the difference between the earliest ending date of the project and the length of the longest path between nodes l and n, minus the duration of the task (k, l). We can now define the latest starting dates of events and tasks by the following formulas:

    lst_k = est_n if k = n,  lst_k = min_{l ∈ Succ(k)} (lst_l − d_kl) otherwise;    lst_kl = lst_l − d_kl.

The float of a task (k, l), which represents the length of the time window for the beginning of the execution of the task, is the difference between the latest and the earliest starting date: f_kl = lst_kl − est_kl.

The well-known PERT/CPM algorithm computes the earliest starting dates, latest starting dates and floats in linear time complexity (O(m + n)). It executes a forward recursion from node 1 to node n to compute the longest path from node 1 to each node of the graph. It then runs a backward recursion to compute the latest starting dates and hence the floats. The PERT algorithm gives the following earliest starting dates, latest starting dates and floats for the project of Figure 1: est_12 = 0, est_23 = 2, est_13 = 0, lst_12 = 0, lst_23 = 2, lst_13 = 1, f_12 = 0, f_23 = 0, f_13 = 1.

2.2. Interval-Valued PERT Model

In this section we recall recent results published in the CP2005 conference proceedings [8]. Suppose the durations d_ij of the tasks (i, j) of the previous model are not precisely known, but are known to lie in an interval range D_ij = [d_ij^-, d_ij^+]. It means that the real exact duration of the task will be in D_ij, but it can neither be known nor chosen in this interval. This problem was first formulated by Buckley [1], and was completely solved very recently in [8].
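Before turning to intervals, the crisp recursions of Section 2.1 can be made concrete with a minimal Python sketch (not code from the paper); the durations used below, d_12 = 2, d_23 = 2, d_13 = 3, are chosen to be consistent with the earliest/latest starting dates and floats reported above for the Figure 1 example.

def pert(n, durations):
    """n: number of nodes (labelled 1..n); durations: dict (i, j) -> d_ij with i < j."""
    est = {1: 0}
    for k in range(2, n + 1):                      # forward recursion
        est[k] = max(est[i] + d for (i, j), d in durations.items() if j == k)
    lst = {n: est[n]}
    for k in range(n - 1, 0, -1):                  # backward recursion
        lst[k] = min(lst[j] - d for (i, j), d in durations.items() if i == k)
    floats = {(i, j): (lst[j] - d) - est[i] for (i, j), d in durations.items()}
    return est, lst, floats

est, lst, floats = pert(3, {(1, 2): 2, (2, 3): 2, (1, 3): 3})
print(floats)   # expected: f_12 = 0, f_23 = 0, f_13 = 1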
An instantiation of all task durations of the network is called a configuration, and an instantiation of all task durations in which each task duration is set to its minimal value d_ij^- or to its maximal value d_ij^+ is called an extreme configuration. The set of possible configurations is the Cartesian product of the intervals D_ij, (i, j) ∈ A. In this formulation, a task (i, j) is possibly critical if there exists a configuration for which (i, j) is critical in the usual sense. A task is not possibly critical if there exists no configuration for which (i, j) is critical in the usual sense. A task is necessarily critical if it is critical for each configuration, and a task is not necessarily critical if there exists at least one configuration for which the task is not critical in the usual sense. Finding possibly and necessarily critical tasks is very important for a project manager. Other useful pieces of information are the possible earliest starting dates, possible latest starting dates and possible floats, which become intervals:

    EST_kl = [min_Ω est_kl(Ω), max_Ω est_kl(Ω)],   LST_kl = [min_Ω lst_kl(Ω), max_Ω lst_kl(Ω)],   F_kl = [f_kl^-, f_kl^+] = [min_Ω f_kl(Ω), max_Ω f_kl(Ω)],

where est_kl(Ω) (resp. lst_kl(Ω) and f_kl(Ω)) represents the earliest starting date (resp. latest starting date and float) of (k, l) in configuration Ω.

2.2.1. Asserting Necessary Criticality

Asserting whether a task is possibly critical is an NP-hard problem [2], while asserting whether a task is necessarily critical is polynomially solvable [8]. As in the classical problem, criticality and floats are obviously connected:

Proposition 1. An activity (k, l) ∈ A is possibly (resp. necessarily) critical in G if and only if f_kl^- = 0 (resp. f_kl^+ = 0).
In this paper we focus on asserting the necessary criticality of a task and on computing the upper bound of its float, when durations are modeled by fuzzy intervals. The same approach can be applied for computing latest starting dates in polynomial time, or for asserting the possible criticality of a task and computing the lower bound of its float, but the latter can only be done with exponential time complexity. Because of the lack of space in these proceedings, we choose to deal with necessary criticality and the upper bound of the float. A first algorithm, which asserts the necessary criticality of a task (k, l) in the interval case when the predecessors of (k, l) have precisely known durations, is based on the following propositions [8]:

Proposition 2. An activity (k, l) ∈ A is necessarily critical in G if and only if (k, l) is critical in an extreme configuration in which the duration of (k, l) is at its lower bound and all activities from the set A \ (SUCC(k, l) ∪ PRED(k, l) ∪ {(k, l)}) have durations at their upper bounds.

Proposition 3. Let (k, l) ∈ A be a distinguished activity, and let (i, j) be an activity such that (i, j) ∈ SUCC(k, l). Assume that every activity (u, v) ∈ PRED(i, j) has a precise duration. If (k, l) is critical in G(1, i), then (k, l) is necessarily critical in G if and only if (k, l) is necessarily critical in G(d_ij = d_ij^-). If (k, l) is not critical in G(1, i), then (k, l) is necessarily critical in G if and only if (k, l) is necessarily critical in G(d_ij = d_ij^+).
Propositions 2 and 3 lead to an algorithm for asserting the necessary criticality of a given activity (k, l) in a network in which all activities that precede (k, l) have precise durations. The algorithm recursively assigns an exact duration to the tasks succeeding (k, l) without changing the necessary criticality of (k, l). The algorithm works as follows. At the initialization, we set the durations of all tasks not succeeding (k, l) to their upper bounds, and the duration of (k, l) to its lower bound (according to Proposition 2). Then, at the first iteration of the algorithm, we only consider the sub-network G(1, l). In this network all task durations are precisely fixed, so a classical PERT/CPM run decides the necessary criticality of (k, l) in G(1, l). Now, according to Proposition 3, we can set the tasks immediately succeeding node l to precise durations: if (k, l) is necessarily critical in G(1, l), the durations of the tasks (l, m) (m ∈ Succ(l)) are set to their minimal durations; otherwise, they are set to their maximal durations. The second iteration then considers the sub-network G(1, l + 1). At this step all task durations are precisely set in this sub-network, so we can check the criticality of (k, l) with the standard PERT/CPM algorithm, and so on. After the last iteration, all task durations are precisely instantiated, and (k, l) is critical in this instantiation if and only if task (k, l) is necessarily critical in the original network G(1, n) (where the durations of the tasks not preceding (k, l) are intervals).

The next proposition will lead to a polynomial algorithm which asserts the necessary criticality of a task (k, l) in the general case:

Proposition 4. Let (k, l) ∈ A be a distinguished activity, and let (i, j) be an activity such that (i, j) ∈ PRED(k, l). If (k, l) is necessarily critical in G(j, n), then (k, l) is necessarily critical in G if and only if (k, l) is necessarily critical in G(d_ij = d_ij^-). If (k, l) is not necessarily critical in G(j, n), then (k, l) is necessarily critical in G if and only if (k, l) is necessarily critical in G(d_ij = d_ij^+).

With the help of the previous algorithm, Proposition 4 leads to an algorithm which recursively assigns an exact duration to the tasks preceding (k, l) without changing the necessary criticality of (k, l). At the first iteration, we consider the sub-network G(k, n). In this network all tasks preceding (k, l) have precise durations (because there is no task preceding (k, l)). So we can invoke the previous algorithm to know whether (k, l) is necessarily critical in G(k, n). Now, with the help of Proposition 4, we can precisely set the durations of the tasks immediately preceding node k. At the second iteration, we consider the sub-network G(k − 1, n), where all tasks preceding (k, l) are now precisely fixed, so we can call the previous algorithm (with the original interval durations for the tasks succeeding (k, l)), and so on. At the end of this algorithm, all durations of tasks preceding (k, l) are precisely set, and (k, l) is necessarily critical in the original network if and only if it is necessarily critical in the instantiated network. So finally, putting together Propositions 3 and 4, we can assert whether a task (k, l) is necessarily critical in O(mn) time complexity.

The two previous algorithms, used conjointly, instantiate all task durations in the network without changing the necessary criticality of a given task (k, l). That means that (k, l) is necessarily critical if and only if its float is null in the so-built configuration. If (k, l) is necessarily critical, the LUB of the float has been computed and is f_kl^+ = 0.
But if this float is not null in the constructed configuration, then the computed value of the float is not the correct LUB of the float of (k, l). We call this computed value the potential float of (k, l).
2.2.2. Computing Least Upper Bound of Floats

The key idea for computing the least upper bound (LUB) of the float f_kl^+ of an activity (k, l) in G is to increase the minimal duration d_kl^- until (k, l) becomes necessarily critical. It is then proved that the LUB of the float is equal to the overall increment of d_kl^- [8]:

Proposition 5. Let Δ_kl be the minimal nonnegative real number such that (k, l) is necessarily critical in G(d_kl^- = d_kl^- + Δ_kl). Then f_kl^+ = Δ_kl.

Let us explain how to find this increment. One can note that if a task (k, l) is necessarily critical in G, then there exists a path p from node 1 to node k such that for every node j ∈ p, (k, l) is necessarily critical in G(j, n) [8]. So the minimal increment that can make (k, l) necessarily critical is given by the next lemma:

Lemma 1. Let Δ = min_j { f_kl(j, n) : f_kl(j, n) > 0 }, where f_kl(j, n) is the potential float of task (k, l) in G(j, n), j ∈ PRED(k) (i.e., f_kl(j, n) is the float in the configuration induced by the algorithm which asserts whether (k, l) is necessarily critical). Then, for all ε < Δ, activity (k, l) does not become necessarily critical in any G(j, n), j ∈ PRED(k), with duration d_kl^- = d_kl^- + ε (the configuration constructed for asserting the necessary criticality is the same). Moreover, there exists j ∈ PRED(k) such that (k, l) becomes necessarily critical in G(j, n) with duration d_kl^- = d_kl^- + Δ (the configuration constructed for asserting the necessary criticality is not the same).

The algorithm which computes the LUB of floats works as follows. For each node i preceding (k, l), it computes the potential float of (k, l) in G(i, n) with the algorithm which asserts the necessary criticality of (k, l). If the potential float of (k, l) in G(1, n) is not 0, then the minimal nonzero potential float computed is added to the minimal duration d_kl^- of (k, l), and the algorithm is run again from the beginning, until (k, l) becomes necessarily critical in G. The total value added to the minimal duration d_kl^- so far is then the LUB of the floats of (k, l). We are going to see that those algorithms can easily be adapted to the fuzzy-valued version of this problem using the new notion of gradual numbers, thus providing the degree of necessary criticality of a task in polynomial time.

2.3. Gradual Numbers

To solve the fuzzy-interval-valued PERT model, we use in this paper the recent notion of gradual numbers [7]. What is often called a fuzzy number [12,5] is the extension of an interval, not of a real number. Hence, fuzzy interval is a better name for a fuzzy set of reals whose cuts are intervals. In a fuzzy interval, it is the interval that models incomplete knowledge (we know that some parameter lies between two bounds), not the fuzziness per se. Intervals model uncertainty in a Boolean way: a value in the interval is possible; a value outside is impossible. What fuzziness brings is to make the boundaries of the interval softer, thus making uncertainty gradual. In order to model the essence of graduality without uncertainty, gradual numbers are defined as follows:
Definition 1 (Gradual real number [7]). A gradual real number (or gradual number for short) r̃ is defined by an assignment function A_r̃ from (0, 1] (the unit interval minus 0) to the reals.

A gradual number is a number parametrized by a value α ranging in (0, 1]. Intuitively, an element α ∈ (0, 1] is the degree of plausibility of a scenario, and A_r̃(α) is the value of some parameter in this scenario. Using the notion of gradual number, we can describe a fuzzy interval M by an ordered pair of specific gradual numbers (m̃^-, m̃^+). m̃^- is called the gradual lower bound of M and m̃^+ the gradual upper bound. To ensure the well-known shape of a fuzzy interval, several properties of m̃^- and m̃^+ must hold: the domains of A_{m̃^-} and A_{m̃^+} must be (0, 1]; A_{m̃^-} must be increasing; A_{m̃^+} must be decreasing; and m̃^- and m̃^+ must be well ordered (A_{m̃^-} ≤ A_{m̃^+}). Such a pair of gradual numbers intuitively describes a fuzzy interval with membership function

    μ_M(x) = sup{λ : A_{m̃^-}(λ) ≤ x}   if x ∈ A_{m̃^-}((0, 1]);
    μ_M(x) = 1                          if A_{m̃^-}(1) ≤ x ≤ A_{m̃^+}(1);
    μ_M(x) = sup{λ : A_{m̃^+}(λ) ≥ x}   if x ∈ A_{m̃^+}((0, 1]);
    μ_M(x) = 0                          otherwise.

Conversely, an upper semi-continuous membership function μ_M can be described by the pair of gradual reals (m̃^-, m̃^+) defined by A_{m̃^-}: (0, 1] → R, A_{m̃^-}(λ) = inf{x : μ_M(x) ≥ λ}, and A_{m̃^+}: (0, 1] → R, A_{m̃^+}(λ) = sup{x : μ_M(x) ≥ λ}. To know more about the general framework of gradual numbers, the reader should refer to the following papers: [7,3,6].

The sum of gradual numbers r̃ and s̃ is simply defined by summing their assignment functions: it is r̃ + s̃ such that ∀α ∈ (0, 1], A_{r̃+s̃}(α) = A_r̃(α) + A_s̃(α). Most algebraic properties of real numbers are preserved for gradual real numbers, contrary to the case of fuzzy intervals. For example, the set of gradual real numbers with the addition operation forms a commutative group with identity 0̃ (A_0̃(λ) = 0, ∀λ ∈ (0, 1]). Indeed, the gradual real number r̃ has an inverse −r̃ under addition: A_{−r̃}(α) = −A_r̃(α) and r̃ + (−r̃) = 0̃. Classical arithmetic operations are defined intuitively: if r̃ and s̃ are two gradual numbers,
    addition: A_{r̃+s̃}(α) = A_r̃(α) + A_s̃(α);
    subtraction: A_{r̃−s̃}(α) = A_r̃(α) − A_s̃(α);
    multiplication: A_{r̃·s̃}(α) = A_r̃(α) · A_s̃(α);
    division: A_{r̃/s̃}(α) = A_r̃(α) / A_s̃(α), provided A_s̃(α) ≠ 0 for all α ∈ (0, 1];
    maximum: A_{max(r̃,s̃)}(α) = max(A_r̃(α), A_s̃(α));
    minimum: A_{min(r̃,s̃)}(α) = min(A_r̃(α), A_s̃(α)).
Note that, contrary to real numbers, the maximum operation on gradual numbers is not selective: max(A_r̃, A_s̃) is in general neither A_r̃ nor A_s̃. There are sub-ranges of (0, 1] where max(A_r̃, A_s̃) = A_r̃, and it is A_s̃ in the complementary range. If A_r̃ and A_s̃ are linear, these ranges are of the form (0, θ] and (θ, 1] for a threshold θ.
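As an illustration of these definitions (a minimal Python sketch, not the authors' implementation), a gradual number can be represented by its assignment function, and the operations above become pointwise combinations; the two linear gradual numbers in the example are made up to show the non-selective maximum.

class Gradual:
    def __init__(self, f):
        self.f = f                      # assignment function: lambda in (0, 1] -> real
    def __call__(self, lam):
        return self.f(lam)
    def __add__(self, other):
        return Gradual(lambda lam: self(lam) + other(lam))
    def __sub__(self, other):
        return Gradual(lambda lam: self(lam) - other(lam))

def gmax(r, s):
    return Gradual(lambda lam: max(r(lam), s(lam)))

def gmin(r, s):
    return Gradual(lambda lam: min(r(lam), s(lam)))

# Two made-up linear gradual numbers whose maximum switches at a threshold:
r = Gradual(lambda lam: 1 + lam)
s = Gradual(lambda lam: 3 - 2 * lam)
m = gmax(r, s)
print(m(0.25), m(0.75))   # follows s below the threshold (2/3), r above it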
Figure 2. Gradual duration of tasks of the project of Figure 1
3. The Gradual-Valued Project Scheduling Problem

As in the previous section, an activity network is represented by a directed, connected and acyclic graph G = (V, A). But now, activity durations d̃_ij ((i, j) ∈ A) are defined by gradual numbers. It means that all durations depend on a parameter to be chosen (for instance indexing scenarios). This problem formulation is given to validate the PERT algorithm with gradual numbers, which is used in the next sections. However, we can see this problem as an optimization problem over a family of scenarios: for a given λ, d̃_ij(λ) is a precise duration, and so the configuration (set of durations) at degree λ is a particular scenario. This parametric approach can model dependencies between task durations, but only with one degree of freedom. Figure 2 provides gradual durations of the tasks of the project of Figure 1. d̃_13(λ) = 2 means that the duration of task (1, 3) is 2 in each considered scenario. We can note in this example that the durations of tasks (1, 2) and (2, 3) are correlated: the more time task (1, 2) requires, the faster task (2, 3) will be executed.

Computing the earliest starting dates, latest starting dates and floats of this problem is very simple: we just need to run the standard PERT algorithm, where +, −, max and min are the four operations on gradual numbers seen in the previous section. This is due to the fact that all algebraic properties needed to apply the PERT algorithm are preserved by gradual numbers; for example, the fact that (max, +) is an idempotent semi-ring validates the forward recursion of the PERT algorithm. The backward recursion is possible since the addition operation has an inverse (addition forms a commutative group), so, contrary to the interval-valued problem, we do not run the risk of counting any uncertainty twice. For more detail, the reader should refer to [7,9].

Algorithm 1 works as follows: line 1 initializes all the earliest starting dates of events (nodes of the graph) to 0̃; lines 2 to 5 are the standard forward recursion, which computes the earliest starting dates; line 6 initializes the latest starting dates; lines 7 to 11 are the standard backward recursion, which computes the latest starting dates and the floats. The main difference between this algorithm and the usual PERT/CPM algorithm is in the form of the results: because max and min are not selective, a characteristic of a task may not correspond to a single configuration. For instance, a task may be critical for some values of λ ∈ (0, 1] and not critical for other values. We present in Figure 3 the results of the computation of the earliest starting date, latest starting date and float of task (2, 3). It means that for all scenarios generated with λ on one side of the threshold 1/3 the task (2, 3) is critical, while it is not critical for the remaining values of λ. In practice, we must store this information, which describes a more complex situation than in the deterministic case.
Algorithm 1: GradualPERT
Input: A network G = (V, A); activity durations are known as gradual numbers.
Output: Gradual earliest starting date ẽst_ij, gradual latest starting date l̃st_ij and gradual float f̃_ij of each task (i, j) of the network.
1   foreach i ∈ {1, ..., n} do ẽst_i ← 0̃;
2   for i ← 1 to n − 1 do
3       foreach j ∈ Succ(i) do
4           ẽst_ij ← ẽst_i;
5           ẽst_j ← max(ẽst_j, ẽst_i + d̃_ij);
6   foreach i ∈ {1, ..., n} do l̃st_i ← ẽst_n;
7   for j ← n downto 2 do
8       foreach i ∈ Pred(j) do
9           l̃st_ij ← l̃st_j − d̃_ij;
10          f̃_ij ← l̃st_ij − ẽst_ij;
11          l̃st_i ← min(l̃st_i, l̃st_j − d̃_ij);
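Since a configuration at a fixed degree λ is an ordinary crisp scenario, the behaviour of Algorithm 1 can be illustrated by running the crisp PERT recursions pointwise in λ; the sketch below (Python, with hypothetical gradual durations that are not those of Figure 2) shows how the criticality of task (2, 3) can switch at a threshold value of λ.

def crisp_pert_floats(n, durations):
    est = {1: 0}
    for k in range(2, n + 1):
        est[k] = max(est[i] + d for (i, j), d in durations.items() if j == k)
    lst = {n: est[n]}
    for k in range(n - 1, 0, -1):
        lst[k] = min(lst[j] - d for (i, j), d in durations.items() if i == k)
    return {(i, j): (lst[j] - d) - est[i] for (i, j), d in durations.items()}

gradual_durations = {          # hypothetical assignment functions on (0, 1]
    (1, 2): lambda lam: 1 + lam,
    (2, 3): lambda lam: 2 - lam,
    (1, 3): lambda lam: 2.5 + lam,
}

for lam in (0.2, 0.5, 0.8):
    crisp = {arc: f(lam) for arc, f in gradual_durations.items()}
    print(lam, crisp_pert_floats(3, crisp))
# In this made-up data, task (2, 3) has zero float only for lambda <= 0.5.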
Figure 3. Gradual earliest starting date, latest starting date and float of task (2, 3)
4. Fuzzy Interval Valued PERT

In this section we suppose that the durations of the tasks of the project are given by fuzzy intervals. A simple model of a fuzzy task duration, for a regular PERT user, can be a triangular fuzzy interval whose core is the most plausible duration and whose support is the interval of all possible durations. We show that we can assert the degree of necessary criticality of a task with polynomial time complexity. We then give an algorithm to obtain the gradual upper bound of the float of a given task.

4.1. Asserting Necessary Criticality

We have seen in Section 2.2 that we can precisely set the durations of all tasks of the network without changing the criticality of a given task (k, l) when task durations are modeled by crisp intervals. To assert the degree of necessary criticality of a task (k, l) in the fuzzy case, we can use the same type of algorithm, using gradual numbers instead of real numbers. But, from Proposition 3 (resp. 4), to fix the precise gradual duration of a task (i, j), we have to test whether the task (k, l) is critical in the subnetwork G(1, i) (resp. necessarily critical in G(j, n)). We have seen in the previous section that in a gradual PERT, tasks can be critical for some possibility degrees and not critical for others (see the example of Figure 3).
This means that the gradual duration assigned to a task (i, j) may involve both the upper and the lower gradual bounds of the duration D̃_ij = [d̃_ij^-, d̃_ij^+].

Algorithm 2:
Input: A network G, an activity (k, l), fuzzy interval durations D̃_uv = [d̃_uv^-, d̃_uv^+], (u, v) ∈ A, and for every task in PRED(k, l) a precisely given (but gradual) duration.
Output: f̃_kl, the "potential" maximal float of task (k, l): f̃_kl(λ) = 0 for every possibility degree at which the task is necessarily critical, and f̃_kl(λ) > 0 for every possibility degree at which the task is not necessarily critical, but in this case f̃_kl(λ) is not the exact float of (k, l).
1   foreach (u, v) ∉ SUCC(k, l) do d̃_uv ← d̃_uv^+;
2   d̃_kl ← d̃_kl^-;
3   for i ← l to n − 1 such that i ∈ SUCC(l) ∪ {l} do
4       f̃_kl ← Algorithm 1 on G(1, i);
5       foreach λ ∈ (0, 1] such that f̃_kl(λ) = 0 do
6           foreach j ∈ Succ(i) do d̃_ij(λ) ← d̃_ij^-(λ);
7       foreach λ ∈ (0, 1] such that f̃_kl(λ) > 0 do
8           foreach j ∈ Succ(i) do d̃_ij(λ) ← d̃_ij^+(λ);
9   return f̃_kl;
Line 1 of Algorithm 2 initializes the gradual durations of all tasks of the network not succeeding (k, l) to their maximal gradual bounds, according to Proposition 2. Lines 3-8 assign gradual values to the tasks succeeding (k, l), depending on the criticality of (k, l), with respect to Proposition 3. The creation of the gradual durations d̃_ij (lines 6 and 8) may seem complex, but it is very simple to implement in practice, for example when durations are given by piecewise linear fuzzy intervals [10]: the gradual bounds of those intervals can simply be modeled by the lists of their kinks. All computed gradual durations are also piecewise linear, since the operations +, −, max and min preserve piecewise linearity. In this context, finding all λ ∈ (0, 1] such that f̃_kl(λ) = 0 amounts to finding the pairs of adjacent kinks whose abscissa is 0.

Note that the result f̃_kl of Algorithm 2 is not, in general, the correct maximal gradual float of (k, l): it is correct only where f̃_kl(λ) = 0. So this algorithm is useful for determining the degree of necessary criticality of a task when its predecessors are precisely (but gradually) fixed, and it will be invoked in the next algorithm. Algorithm 3 computes the degree of necessary criticality of a given task (k, l) when all task durations are fuzzy-interval valued. It recursively assigns precise (but gradual) durations to the tasks preceding node k, thanks to Proposition 4. The result is a gradual "potential" maximal float f̃_kl of task (k, l), which means that (k, l) is necessarily critical at degree at least λ if and only if f̃_kl(λ) = 0. But if f̃_kl(λ) > 0, then f̃_kl(λ) is not the correct maximal float of (k, l) at possibility degree λ. Computing the exact maximal float at possibility degree λ is the subject of the next subsection.
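To make the remark about piecewise linear representations concrete, here is a minimal sketch (Python, with a hypothetical representation rather than the authors' implementation) of how the degrees at which a piecewise linear gradual float vanishes can be read off from its list of kinks.

def critical_ranges(kinks, eps=1e-9):
    """kinks: list of (lam, value) pairs sorted by lam; returns the list of
    (lam_start, lam_end) sub-ranges on which the piecewise linear function is 0."""
    ranges = []
    for (l1, v1), (l2, v2) in zip(kinks, kinks[1:]):
        if abs(v1) < eps and abs(v2) < eps:      # both adjacent kinks at 0
            ranges.append((l1, l2))
    return ranges

print(critical_ranges([(0.1, 1.0), (1/3, 0.0), (1.0, 0.0)]))   # -> [(1/3, 1.0)]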
Algorithm 3:
Input: A network G, an activity (k, l), fuzzy interval durations D̃_uv = [d̃_uv^-, d̃_uv^+], (u, v) ∈ A.
Output: f̃_kl, the "potential" maximal float of task (k, l): f̃_kl(λ) = 0 for every possibility degree at which the task is necessarily critical, and f̃_kl(λ) > 0 for every possibility degree at which the task is not necessarily critical, but in this case f̃_kl(λ) is not the exact float of (k, l).
1   for j ← k downto 2 such that j ∈ PRED(k) ∪ {k} do
2       f̃_kl ← Algorithm 2 on G(j, n);
3       foreach λ ∈ (0, 1] such that f̃_kl(λ) = 0 do
4           foreach i ∈ Pred(j) do d̃_ij(λ) ← d̃_ij^-(λ);
5       foreach λ ∈ (0, 1] such that f̃_kl(λ) > 0 do
6           foreach i ∈ Pred(j) do d̃_ij(λ) ← d̃_ij^+(λ);
7   return f̃_kl;
Algorithm 4:
Input: A network G, an activity (k, l), fuzzy interval durations D̃_uv, (u, v) ∈ A.
Output: The gradual least upper bound f̃_kl of the floats of (k, l).
1: foreach i ∈ Pred(k) do
2:   f̃_kl(i, n) ← f̃_kl obtained with Algorithm 3 in network G(i, n);
3: f̃_kl ← 0̃; d̃_kl ← d̃−_kl;
4: while f̃_kl(1, n) ≠ 0̃ do
5:   foreach λ ∈ [0, 1] do
6:     ∆̃(λ) ← min { f̃_kl(j, n)(λ) | j ∈ PRED(k), f̃_kl(j, n)(λ) > 0 };
7:   d̃_kl ← d̃_kl + ∆̃; f̃_kl ← f̃_kl + ∆̃;
8:   foreach i ∈ Pred(k) do
9:     f̃_kl(i, n) ← f̃_kl obtained with Algorithm 3 in network G(i, n);
10: return f̃_kl
4.2. Computing Gradual Least Upper Bound of Floats
We can now compute the gradual upper bound of the floats of a task (k, l). The idea is to make (k, l) necessarily critical for all λ by adding a gradual duration ∆̃ to d̃_kl. This is done by Algorithm 4. The first step of the algorithm computes f̃_kl(i, n), the "potential float" of (k, l) in the network G(i, n) (lines 1 and 2). In practice, only one iteration of Algorithm 3 is needed to obtain f̃_kl(i, n) for all i ∈ Pred(k). The second step of the algorithm constructs a gradual duration ∆̃ to add to d̃_kl in order to make (k, l) necessarily critical in at least one more sub-network G(i, n), for every possibility degree (lines 5 and 6). In practice, constructing ∆̃ is easy: we just need to restrict each f̃_kl(i, n) to the domain on which f̃_kl(i, n)(λ) > 0 and f̃_kl(1, n)(λ) > 0, and take their minimum, as defined in Section 2.3. Algorithm 4 repeats this step until (k, l) becomes necessarily critical in G(1, n) for every possibility degree. The total gradual increase of d̃_kl is then the gradual float
f̃_kl of (k, l). By definition of gradual numbers, the obtained f̃_kl is such that f̃_kl(λ) is the exact LUB of the floats of task (k, l) at possibility degree λ, and the degree of criticality is 1 − inf{λ | f̃_kl(λ) = 0}.
Remark: In the interval-valued problem, asserting possible criticality and computing the lower bound of floats are NP-hard problems [8]. However, efficient algorithms exist and can be adapted to the fuzzy interval case with the help of gradual numbers. One of these algorithms consists in running a PERT/CPM on configurations where all task durations are set to their lower bounds except on a path from node 1 to node n [4]. The lower bound of the float of a task is then the minimal computed float over the configurations issued from all possible paths. This method is directly applicable to the fuzzy-valued problem, using gradual bounds of fuzzy intervals instead of the usual interval end-points.
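For illustration, each configuration mentioned in the remark is processed by a standard crisp PERT/CPM pass; the sketch below (our own, with names of our choosing) computes earliest node times, latest node times and the float of every task for fixed durations. In the gradual setting the same recursions are applied to gradual numbers instead of reals.

def pert_floats(n, arcs):
    # Standard forward/backward PERT pass on an activity-on-arc network.
    # n    : number of nodes, labelled 1..n (1 = source, n = sink); labels are
    #        assumed to respect a topological order (i < j for every arc)
    # arcs : dict mapping an arc (i, j) to its (crisp) duration
    # returns a dict mapping each arc to its float
    earliest = {v: 0.0 for v in range(1, n + 1)}          # forward pass
    for (i, j), d in sorted(arcs.items()):
        earliest[j] = max(earliest[j], earliest[i] + d)
    latest = {v: earliest[n] for v in range(1, n + 1)}    # backward pass
    for (i, j), d in sorted(arcs.items(), reverse=True):
        latest[i] = min(latest[i], latest[j] - d)
    # float of arc (i, j): how much it can be delayed without delaying node n
    return {(i, j): latest[j] - earliest[i] - d for (i, j), d in arcs.items()}

A task is critical in a configuration exactly when its computed float is 0.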
5. Conclusion
In this paper, we generalize some recent results of [8] for project scheduling problems under uncertainty, when task durations are modeled by fuzzy intervals. We give algorithms that assert the degree of necessary criticality of a given task in polynomial time, and we construct an algorithm that computes the least upper bound of floats in polynomial time for each possibility degree λ ∈ [0, 1]. These algorithms are based on the very recent notion of gradual number, which provides a new way of looking at fuzzy intervals as classical intervals of gradual numbers, thus allowing interval analysis methods to be applied directly to fuzzy intervals.
References
[1] J.J. Buckley. Fuzzy PERT. In Applications of Fuzzy Set Methodologies in Industrial Engineering, pages 103–114. Elsevier, 1989.
[2] S. Chanas and P. Zieliński. On the hardness of evaluating criticality of activities in a planar network with duration intervals. Operations Research Letters, 31:53–59, 2003.
[3] D. Dubois, H. Fargier, and J. Fortin. A generalized vertex method for computing with fuzzy intervals. In Proceedings of the IEEE International Conference on Fuzzy Systems (Budapest, Hungary), pages 541–546, 2004.
[4] D. Dubois, H. Fargier, and J. Fortin. Computational methods for determining the latest starting times and floats of tasks in interval-valued activity networks. Journal of Intelligent Manufacturing, 16(4-5):407–421, 2005.
[5] D. Dubois, E. Kerre, R. Mesiar, and H. Prade. Fuzzy interval analysis. In Fundamentals of Fuzzy Sets, pages 483–581. Kluwer, 2000.
[6] D. Dubois and H. Prade. Fuzzy elements in a fuzzy set. In Proc. 10th Inter. Fuzzy Systems Assoc. (IFSA) Congress, Beijing, pages 55–60, 2005.
[7] J. Fortin, D. Dubois, and H. Fargier. Gradual numbers and their application to fuzzy interval analysis. Submitted to IEEE Transactions on Fuzzy Systems, 2006.
[8] J. Fortin, P. Zieliński, D. Dubois, and H. Fargier. Interval analysis in scheduling. In Proc. of the Int. Conf. on Principles and Practice of Constraint Programming (CP'2005) (Sitges), pages 226–240, 2005.
[9] M. Gondran and M. Minoux. Graphs and Algorithms. John Wiley and Sons, 1995.
[10] E.E. Kerre, H. Steyaert, F. Van Parys, and R. Baekeland. Implementation of piecewise linear fuzzy quantities. International Journal of Intelligent Systems, 10:1049–1059, 1995.
[11] P. Morris, N. Muscettola, and T. Vidal. Dynamic control of plans with temporal uncertainty. In IJCAI, pages 494–502, 2001.
[12] L.A. Zadeh. Fuzzy sets. Journal of Information and Control, 8:338–353, 1965.
Approaches to Efficient Resource-Constrained Project Rescheduling
Jürgen KUSTER 1 and Dietmar JANNACH
Department of Business Informatics and Application Systems, University of Klagenfurt, Austria
Abstract. The intrinsic and pervasive uncertainty of the real world causes unforeseen disturbances during the execution of plans. The continuous adaptation of existing schedules is necessary in response to the corresponding disruptions. Affected Operations Rescheduling (AOR) and Matchup Scheduling (MUP) have proven to be efficient techniques for this purpose, as they take only a subset of future activities into account. However, these approaches have mainly been investigated for the job shop scheduling problem, which is too restrictive for many practical scheduling problems outside the shop floor. Thus, this paper analyzes and describes how the respective concepts can be extended for the more generic problem class of the Resource-Constrained Project Scheduling Problem (RCPSP). The conducted evaluation reveals that particularly our generalized version of AOR (G AOR) can yield significant performance improvements in comparison to the strategy of rescheduling all future activities. Keywords. Resource-Constrained Project Scheduling Problem, Rescheduling, Disruption Management, Decision Support Systems
1. Introduction Most work on project scheduling is based on the assumption of complete information and a static and fully deterministic environment [17]. However, a schedule is typically subject to the intrinsic and pervasive uncertainty of the real world, as soon as it is released for execution: Disruptions occur and elaborated plans become obsolete. The continuous adaptation and reparation of schedules is thus essential for efficient operation. Rescheduling (or reactive scheduling) is the process of updating an existing schedule in response to disruptions or other changes [16]. Basically, the options of (1) simply shifting all subsequent activities, (2) rescheduling only a subset of affected operations or (3) generating an entirely new schedule for all of the remaining activities can be distinguished. The first method is computationally inexpensive but can lead to poor results [9] whereas the last one maximizes schedule quality, typically requiring high computational effort and imposing a huge number of schedule modifications [17]. The option of partial rescheduling represents sort of a tradeoff: It aims at the identification of a schedule which provides the optimal combination of schedule efficiency and schedule stability at reasonable computational costs. 1 Corresponding Author: Jürgen Kuster, Universitätsstraße 65-67, 6020 Klagenfurt, Austria. E-mail: jkuster@ifit.uni-klu.ac.at.
The idea of considering only the operations affected by a disruption has first been articulated by Li et al. [14]: All activities succeeding a delayed operation within the regarded job or on the processing machine are recursively added to a binary tree if not sufficient slack time is available to compensate the disruption. The time effects associated with the nodes describe imposed schedule modifications. Abumaizar and Svestka [1] illustrate the superior performance of this approach – also known as Affected Operations Rescheduling (AOR) – if compared to right-shift rescheduling and full rescheduling; Huang et al. [10] recently introduced a distributed version of AOR. The matchup scheduling procedure (MUP) [3] represents another form of partial rescheduling: Instead of identifying the set of actually affected operations, Bean et al. propose to reschedule all activities executed before a heuristically determined matchup point. If rescheduling is possible with respect to the given constraints, all operations after the matchup point can be executed according to the original schedule - otherwise, the matchup point is postponed or jobs are reassigned to different machines. Akturk and Gorgulu [2] apply this procedure to flow shop scheduling problems where machine breakdowns occur. Partial rescheduling has mainly been studied for problems with unary and nonexchangeable resources: Particularly the production-specific job-shop scheduling problem has been investigated. More generic problem classes and other practical domains have almost completely been disregarded so far, even though the goal of returning to an already existing schedule as early as possible may be of equal importance. Consider for example the management of supply chains, traffic flows or airport operations: The respective domains are characterized by frequent disruptions, the requirement of fast intervention and high numbers of dependencies, which implies that single actors can not easily alter large portions of the collective schedule. Since triggering modifications is usually associated with penalties or costs, involved participants try to identify quick, minimal and local forms of problem resolution in the process of disruption management. This paper discusses the complexity of partial rescheduling in scenarios, in which discrete resource capacities and arbitrarily linked activities are important. It introduces and evaluates generalized approaches to AOR and MUP for the generic ResourceConstrained Project Scheduling Problem (RCPSP) and compares them to the strategy of rescheduling all future activities. The remainder of this document is structured as follows: In Section 2 the conceptual framework of the RCPSP is presented before the proposed extensions of the considered rescheduling techniques are described in detail. In Section 3 the results of a thorough analysis and comparison are discussed. Section 4 provides an outlook on future work and summarizes the contributions of this paper.
2. Rescheduling Approaches for the RCPSP The RCPSP [4,5] represents a generalization of various forms of production scheduling problems such as the job-shop, the flow-shop and the open-shop problem. It is all about scheduling the activities of a project according to some optimization criterion (the minimization of the project duration, for example), such that all precedence constraints are respected and resource requirements never exceed respective availabilities. A project in the RCPSP is usually defined by a set of activities A = {0, 1, ..., a, a+1}, where 0 and a + 1 denote abstract start and end activities with a duration of 0 and no resource requirements associated. Execution is based on a set of resource types
R = {1, ..., r}: Of each type k ∈ R, a constant amount of ck units is available. The following constructs can be used for the description of time and resource dependencies.
• Duration Value. For each activity i, a duration di describes how long its execution lasts. Note that the RCPSP typically consists of non-preemptive operations.
• Precedence Constraints. The order of activities can be defined by precedence constraints: The existence of pi,j in the respective set P states that activity i has to be finished at or before the start of activity j.
• Resource Requirements. The relationships between resource types and activities are defined in the set of resource requirements Q: Activity i requires qi,k ∈ Q units of type k ∈ R throughout its execution.
The goal of the scheduling process is the identification of a vector of scheduled starting times (β1, ..., βa), for which βi + di ≤ βj, ∀pi,j ∈ P, and Σi∈At qi,k ≤ ck for any resource type k ∈ R at any time t, where At corresponds to all activities concurrently executed at t. For the subsequent discussion of rescheduling approaches the following additional concepts are required: ρi is the planned starting time of activity i, forming the lower bound for βi. δi is the due date associated with an i ∈ A, defining when an activity shall be considered late. U = {1, ..., u} groups all periods of non-shiftable resource unavailability, which are used to describe fixed resource reservations, for example. Each l ∈ U is defined by an amount ql,k required on a resource type k, a starting time βl and a duration dl. Uk denotes the subset of unavailabilities of a single resource type k ∈ R.
The vast majority of research in the domain of the RCPSP has focused on the development of efficient algorithms for the generation of baseline schedules (i.e. schedules which do not consider another previously generated plan). Although rescheduling has been investigated for the specific problems of machine scheduling, the characteristics of the more general RCPSP formulation have almost completely been disregarded so far: It is particularly the potential existence of resources with capacities greater than one that makes existing techniques like AOR or MUP inapplicable for project scheduling. Since these approaches of partial rescheduling have proven to be highly efficient in the domain of machine scheduling [1,2], their extension for the RCPSP is proposed in this section: After the summary of the regarded problem and a brief description of full rescheduling, generalized versions of AOR and MUP are presented in detail.
2.1. Regarded Rescheduling Problem
Prior to the discussion of potential rescheduling strategies, let us describe the regarded problem more precisely: In the following, we consider a schedule during the execution of which a disruption occurs. Such a disruption might be of one of the following types, according to the classification scheme proposed by Zhu et al. [19] (cf. [15]): Additional activity, additional precedence constraint, duration modification, requirements modification, resource capacity reduction or milestone modification. Each type has a specific set of parameters associated; the only value all of them have in common is the time of disruption detection td. The objective of rescheduling is to identify a schedule which (1) considers the modified circumstances, (2) is as close as possible to the original one and (3) is optimal according to some predefined criterion.
The relevance of these aspects is formulated in an appropriate cost or quality function, which is used to evaluate candidate schedules during optimization.
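To make the preceding definitions concrete, the following Python sketch (our own illustration, not code from the paper; all names are ours, and integral durations and starting times are assumed) captures the RCPSP data introduced above and checks a vector of starting times against the precedence and resource constraints.

from dataclasses import dataclass, field

@dataclass
class RCPSP:
    # minimal data model for the RCPSP as defined above (illustrative field names)
    durations: dict        # activity i -> di
    precedences: set       # pairs (i, j): i must finish at or before the start of j
    requirements: dict     # (activity i, resource k) -> qi,k
    capacities: dict       # resource k -> ck
    unavailabilities: list = field(default_factory=list)   # tuples (k, q, beta, d)

def is_feasible(problem, start):
    # check a vector of starting times (dict: activity -> beta_i) against the
    # precedence constraints and the renewable resource capacities
    for i, j in problem.precedences:
        if start[i] + problem.durations[i] > start[j]:
            return False
    horizon = int(max(start[i] + problem.durations[i] for i in start))
    for t in range(horizon + 1):
        usage = {k: 0 for k in problem.capacities}
        for i, beta in start.items():                    # activities running at t
            if beta <= t < beta + problem.durations[i]:
                for (a, k), q in problem.requirements.items():
                    if a == i:
                        usage[k] += q
        for k, q, beta, d in problem.unavailabilities:   # fixed reservations at t
            if beta <= t < beta + d:
                usage[k] += q
        if any(usage[k] > problem.capacities[k] for k in usage):
            return False
    return True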
2.2. Full Rescheduling Full rescheduling is concerned with the generation of a new schedule for all activities with future starting points. For this purpose, a sub-problem of the original scheduling problem is generated according to the following rules: • Past Activities. All activities which are already finished when the disruption occurs (βi + di ≤ td , i ∈ A) are ignored. Note that also the precedence relations which make these activities predecessors of future elements can be omitted. • Running Activities. Since the activities running at the time the disruption is detected (βi ≤ td < βi + di , i ∈ A) must not be interrupted, it has to be made sure that they are positioned at their past and actual starting time within the new schedule. For this purpose, it is necessary to set the planned to the actual starting time and to replace all associated resource requirements (qi,k ∈ Q, ∀k ∈ R) by corresponding resource unavailabilities: The former imposes a lower bound on the newly assigned schedule time, the latter expresses the fact that resources are actually blocked and can not be reallocated anymore. Since the modified version of the activity requires no resources and is not precedence dependent on any past operation, it will definitively be scheduled at its real starting time. • Future Activities. All activities starting after the detection of the disruption (td < βi , i ∈ A) can be considered in their original version. • Resources. All resources required by running or future activities have to be regarded in the rescheduling sub-problem. As regards associated unavailabilities, merely future periods have to be considered. This scheduling problem can be solved by applying the same method as used for the generation of the baseline schedule. It is important to note that full rescheduling does not necessarily mean that scheduling is performed from scratch: Instead, the original (and disrupted) schedule can be regarded a starting point for incremental optimization based on appropriate heuristics (such as genetic algorithms, etc.). This way it is implicitly made sure that closely related solutions are evaluated prior to completely different ones. 2.3. Affected Operations Rescheduling The concept of Affected Operations Rescheduling is motivated by the idea of preserving as much as possible of the original schedule and maintaining a maximum level of stability [16]. AOR is therefore concerned with the identification of the set of minimal changes, which are necessary to adapt the existing schedule to a new situation. It has already been mentioned that in problems where resource capacities are restricted to one and where each activity has at most one successor associated, a binary tree can be applied for this purpose [1,14]. However, this approach is not feasible for the more generic RCPSP: On the one hand an activity might have more than one successor, and on the other hand the set of actually affected resource successors can not be identified unambiguously anymore. This is mainly due to the fact that several activities can be executed on one single resource simultaneously. Consider the simple example where an activity a requires 2 units of of a resource r with cr = 3 and is immediately succeeded by the activities b, c and d in the original schedule. If no precedence constraints link the single activities and if each successor of a requires exactly one unit of r (i.e. qi,r = 1, i ∈ {b, c, d}), a delay of a causes the shift of exactly 2 of its successors: either {b, c},
{b, d} or {c, d} might be affected by the disruption and each of them represents the basis for a set of minimal schedule modifications. For the identification of the corresponding set of minimally modified schedules in the context of the RCPSP we propose G AOR as a generalized version of AOR [14]. Even though it is motivated by the same underlying idea, the respective technique is significantly more complex than the existing approaches for the job-shop scheduling problem.
Algorithm 1 G AOR
Input: Schedule S with A and U; modifications D; time of last regarded conflict tx
Return: A set of minimally modified schedules S
1: S′ ← APPLYMODIFICATIONS(S, D)
2: S′ ← MAKEPRECEDENCEFEASIBLE(S′)
3: tc ← min(t | Σi∈A′t qi,k > ck, k ∈ R, t ∈ {βi, i ∈ A′ ∪ U′})
4: if tc is undefined then
5:   S ← {S′}
6: else
7:   if tc = tx then Ac ← A′tc \ Amod else Ac ← A′tc
8:   for all k ∈ R | Σi∈Atc qi,k − ck > 0 do
9:     Rc ← Rc ∪ {k}
10:    B ← B ⊗ GETMINIMALSUBSETS(Ac, k, Σi∈Atc qi,k − ck)
11:  B ← {Ai ∈ B | ∄Aj ⊂ Ai : Σl∈Aj ql,k ≥ Σm∈Atc qm,k − ck, ∀k ∈ Rc}
12:  for all A ∈ B do
13:    for all i ∈ A do
14:      tt ← min(t | t > tc, t ∈ {βj + dj | qj,k > 0, k ∈ Rc, j ∈ U′ ∪ A′ \ i})
15:      D′ ← D′ ∪ GETACTIVITYSHIFT(i, tt − βi)
16:    S ← S ∪ G AOR(S′, D′, tc)
17: end if
Algorithm 1 describes G AOR formally: Given the original schedule S and the modifications D implied by a disruption, first a potential new schedule S′ is generated by incorporating all D directly into S (line 1). The precedence conflicts potentially associated with the application of the schedule modifications are then resolved by shifting all of the respective activities to the end of their latest predecessor (line 2). It is then checked, if any resource conflicts remain in the new schedule S′: The time of the first conflict tc corresponds to the minimal t among the starting times of all activities and resource unavailabilities within S′ for which requirements exceed availabilities on a resource type k ∈ R (line 3). Note that the consideration of starting times is sufficient since activities and unavailabilities always block a constant amount of resource entities. If no conflict was found (line 4), S′ represents a feasible modification of S and is returned (line 5). Otherwise, the recursive reparation process is started:
• First (line 7) a set of activities to consider Ac is generated: It usually consists of all activities running at the time of the conflict less the elements Amod that have been modified earlier. Only if tc equals the previously regarded conflict time tx (i.e. the conflict could not be resolved in the previous step), also these already altered activities shall be considered: This distinction is necessary to make sure that neither one single activity is shifted to the end of the schedule, nor fixed resource unavailabilities can block the reparation process.
Figure 1. Exemplary application of G AOR: Original Schedule and Search Tree
• The next step (lines 8 to 11) is the identification of the minimal sets of activities, the shift of which is sufficient to resolve all resource conflicts at tc . First, involved resources are stored in Rc (lines 8, 9). For each of them, all subsets of Ac which can cover the excessive requirements and which are minimal (i.e. there exists no subset which is able to set enough units free by itself) are merged into B – a set of activity sets (line 10): The specific ⊗-operator, which joins each set Ai ∈ Bu with each set Aj ∈ Bv , is used to make sure that at any time the sets in B do consider all of the resource types regarded so far. Since non-minimal activity sets might result from this merge, they are removed from B in the final step (line 11). • In the last iteration (lines 12 to 16) all candidate activity sets are evaluated: For each A ∈ B a set of associated modifications D′ is generated. For this purpose, first the time to shift an activity i ∈ A to is identified (line 14): tt is the first point in time after tc , at which some units of any of the resources causing the conflict are set free (by ending unavailabilities or other activities). The modification, which describes the requirement of shifting i by tt − βi , is added to the set D′ (line 15). Finally, evaluation is performed by calling G AOR for the intermediary schedule S ′ , the set of modifications D′ and tc as the last regarded conflict (line 16). An example for the application of G AOR is depicted in Figure 1: On the left-hand side, the resource requirements associated with an original schedule are illustrated: r1 and r2 are required by the activities a to l, which are not linked by any precedence constraints. The dashed area visualizes a pending extension of activity a’s duration by 3 time units. Given this disruption, G AOR first tries to resolve the resource conflict at 4 by shifting d and either c or b. In the former case, no further conflicts exist and a minimal set of modifications has been identified. In the latter case, however, resource requirements exceed the availabilities of r1 at 12 and either f or e has to be shifted. This procedure is continued until feasible schedules have been identified on each branch of the thereby spanned search tree, which is illustrated on the right-hand side: Note that plus stands for the extension of durations and the arrow symbolizes the temporal shift of an activity. G AOR returns the leaves of the search tree, which correspond to valid and feasible solutions: Recursion ends only if no further conflicts exist. The associated schedules are then evaluated by use of a cost or quality function in search of the optimum. For problems where costs increase (or analogously: quality decreases) monotonously with additional tardiness, earliness and higher numbers of modifications, the following theorem holds: Theorem 1. The solution which is optimal according to some specified cost function is always contained in the set of schedules generated by G AOR, if no activity must start earlier than defined in the originally optimal preschedule (ρi = βi , ∀i ∈ A).
Figure 2. G AOR Suboptimality: Original Situation, G AOR Schedule and Actual Optimum
Proof (Sketch). Since the baseline schedule is assumed to be optimal, any temporal shift of activities is associated with costs. Given a monotonously increasing cost function, the respective value increases with the deviation from the original plan. For this reason, all adaptations have to be minimal in the optimal solution. The constraint that no activity must start earlier than defined in the original schedule makes sure that only the postponement of activities represents a feasible reaction to a disruption. In case of a shortening of process execution times, the existing schedule has therefore not be modified at all. In case of an extension of process execution times, however, a minimal set of activities has to be shifted to the right-hand side by a minimal amount of time: The content of this set of postponed elements depends on the activityrelated earliness/tardiness costs. The proposed generalized version of Affected Operations Rescheduling evaluates all existing possibilities for shifting activities to the righthand side: All subsets of all activities potentially causing a conflict are analyzed. By considering available slack and idle times, it makes sure that only minimal modifications are made to the existing schedule. The algorithm therefore definitively identifies the previously characterized optimal solution. The precondition ρi = βi , ∀i ∈ A implies that only shifts to the right-hand side are considered, which is often a practical and natural approach to efficient rescheduling: In many real world domains the number of processes and dependencies is so high, that the operative managers are not able to consider more complex shifts and reallocations within the short time available for intervention. Take, for example, the management of airport ground processes, which is characterized by short execution times, the involvement of many actors with tight individual schedules as well as the availability of only little time to react to disruptions [13,18]. However, it has to be stated that in such complex scenarios (where already modified schedules have to be rescheduled) the actual optimum might lie outside the set of solutions considered by G AOR. Take the exemplary situation depicted in Figure 2(1), where the upper plane visualizes precedence relations between the activities a, b and c as dashed lines and the two lower planes illustrate the resource usage of types r1 and r2 : The dashed area shows a pending extension of a’s duration. We assume that both b and c may start at the end of activity a at the earliest (ρb = ρc = βa + da ) and that the aim of optimization is a minimal makespan. If G AOR is applied in this situation, schedule (2) is generated, which is obviously worse than the optimal schedule (3). To overcome this limitation, we suggest to adapt the strategy of human process managers, who first try to find efficient solutions by shifting activities to the right-hand side before they spend the additionally available time in considering more extensive modifications. G AOR shall therefore particularly be considered in combination with a second
step of full rescheduling, based on incremental heuristic optimization. Since the evaluation of all potentially relevant combinations of schedule modifications soon becomes intractable for realistic problem sizes and short response times, we propose and analyze a hybrid rescheduling strategy, in which only a certain amount of time is spent in the constructive search for an initial set of (good) solutions, based on the generalized version of AOR. The respective schedules represent the starting point for the subsequent step of further optimization, which is based on some form of heuristics: Considering also the remaining solutions, the employed algorithm has to make sure that the theoretical optimum is identified at least within an infinite time horizon. 2.4. Matchup Scheduling Matchup Scheduling as proposed by Bean et al. is motivated by the idea of identifying a schedule that seeks to match up with an existing preschedule as early as possible[3]. If compared to AOR, the main difference lies in the fact that the point in time until which the disruption shall be compensated – the so-called matchup point – is determined heuristically. Scheduling is performed in a stepwise approach, where in each iteration the regarded time frame is extended until finally the entire future is considered. For the application of MUP to the Resource-Constrained Project Scheduling Problem, we introduce a generalized version named G MUP. A matchup point tm is identified by some form of heuristics, the proper choice of which depends mainly on the structure of activities and dependencies in the relevant problems. The further proceeding is similar to the method applied in full rescheduling (see Section 2.2): A sub-problem of the original scheduling problem is generated, starting at td and ending at tm . Its creation is based on the rules discussed above, only modified in the following one: • Future Activities. All activities starting after the detection of the disruption and ending before the matchup point (td < βi , βi +di ≤ tm , i ∈ A) can be considered in their original version. Future activities ending after the matchup point (td ≤ βi , tm < βi + di , i ∈ A) are handled similarly to the ones running at td : They are planned at their scheduled time, associated resource requirements are eliminated and replaced by unavailabilities: It shall thereby be guaranteed that their starting time is not modified. Moreover, their duration is shortened to tm − βi : This way it is made sure that none of them ends after the matchup point, which makes it possible to distinguish between valid and invalid solutions (see below). All remaining future activities starting only after the matchup point (tm < βi , i ∈ A) are omitted along with associated precedence constraints and resource requirements. The sub-schedules resulting from the resolution of this problem are valid if no contained activity ends after the matchup point. In that case, updating the starting times defined in the baseline schedule according to the G MUP timetable provides a feasible way to handle the given irregularity. As regards the quality of the respective solution, however, it is only made sure that disruptions are compensated as early as possible. Even though rescheduling problems can be resolved quickly based on Matchup Scheduling, it has thus to be considered that the first identified schedules might be of bad quality in terms of more complex objective functions. 
For this reason, we propose to continue the extension of the regarded search space (until finally all future activities are considered) instead of terminating after the identification of the first feasible solution: Only this way the theoretical optimum can be found.
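As an illustration of the stepwise approach just described, the following Python skeleton is our own sketch, not the authors' implementation: the helpers solve_window, cost and time_left are hypothetical. It tries successive matchup points, accepts a repaired window only if no rescheduled activity ends after the matchup point, and keeps extending the regarded search space as long as computation time remains.

def g_mup(baseline, t_d, matchup_points, solve_window, cost, time_left):
    # baseline       : disrupted baseline schedule, activity -> (start, duration)
    # t_d            : time at which the disruption was detected
    # matchup_points : candidate matchup points in increasing order; the last one
    #                  corresponds to full rescheduling of all future activities
    # solve_window   : builds and solves the sub-problem between t_d and t_m
    #                  (per Sections 2.2 and 2.4); returns a partial schedule or None
    best = None
    for t_m in matchup_points:
        repaired = solve_window(baseline, t_d, t_m)
        # a sub-schedule is valid only if nothing spills over the matchup point,
        # so the remainder of the baseline schedule can be matched up with unchanged
        if repaired is not None and all(
                s + baseline[i][1] <= t_m for i, s in repaired.items()):
            if best is None or cost(repaired) < cost(best):
                best = repaired
        if time_left() <= 0:
            break  # otherwise keep extending the regarded window
    return best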
3. Performance Evaluation This section describes the results of the analysis and comparison of the discussed approaches: After some remarks on the general characteristics of the regarded problem classes and the experimental setup, the generation of testsets, the considered performance indicators and the results of the performed measurements are discussed. 3.1. Characteristics and Setup of the Evaluation The rescheduling problems used for the evaluation of the proposed approaches are multiple-project tardiness problems [6]: They consist of a baseline schedule and a single disruption of type activity duration extension. Note, however, that any of the irregularities summarized in Section 2.1 could be handled by use of the above methods. The availability of only a limited amount of time for rescheduling is assumed: We are particularly interested in the potential application of the discussed approaches to (near) real-time decision support systems for operative process disruption management (cf. Section 4). For the evaluation of the proposed strategies, a generic rescheduling engine has been implemented in Java, in which schedule optimization is based on the genetic algorithm proposed by Hartmann [7]. As regards the implementation of G AOR and G MUP, the following annotations can be made: • According to the above argumentation, G AOR has been implemented and evaluated as a preprocessing technique of full rescheduling: In a heuristic approach, portions of the available time were assigned to the two steps the way that both of them can make significant contribution to the identification of an optimal solution. For the regarded problem sizes, a ratio of one to three turned out to (1) permit the identification of an appropriate number of minimally modified schedules by G AOR and to (2) leave sufficient time for further optimization. • The methods and parameters applied for the identification of matchup points in G MUP were also determined heuristically, targeting at the maximum effectiveness for the given problem structures (see Section 3.2). In a three-step approach, matchup is first tried at the time where twice the additional resource requirements are covered. The second matchup point is the point at which enough idle time has passed to cover four times the requirements imposed by the disruption. Finally, a third step performs full rescheduling. To make sure that each of these steps can contribute appropriately, one half of the overall available time was reserved for the last step and the other half was split among the two former steps in proportion to the number of activities contained in the respective sub-problems. 3.2. Testset Generation and Problem Classes Due to the currently existing lack of publicly available instances of reactive scheduling problems [15], a corresponding testset generator has been implemented: Relevant parameters include normalized versions of network complexity, resource factor and resource strength as proposed by Kolisch et al. [11] as well as specific extensions for the description of inter-process relations and exogenous events. Since the structure of the baseline schedule has significant impact on the results of rescheduling [1,9] it can also be parameterized. Accordingly, the following configurations have been used to generate eight different classes of problems:
• Low/High Process Complexity. Process complexity defines the density of the activity network in terms of precedence relations. Low complexity means that few precedence relations exist, high complexity that the activities are strongly linked.
• Low/High Resource Complexity. The resource complexity parameter combines the aspects of resource requirements and resource availability: Low complexity means that many entities are available to cover few requirements, high complexity means that only small amounts are available to cover high requirements.
• Tight/Wide Baseline Schedule. The tightness of the baseline schedule depends on two elements: The amount of incorporated slack time and the distribution of planned activity starting times. In a tight schedule, activities start immediately at the earliest possible time and many processes are executed synchronously. In a wide schedule, some amount of time is available between the end of a predecessor and the start of its successor and only few processes are executed in parallel.
For each thereby defined class, 100 cases were generated at random, forming a total of 800 problem instances. Each case consists of 10 processes (i.e. projects) containing 10 activities, which are executed on up to 3 different kinds of resources2.
3.3. Used Performance Indicators and Objective Function
As regards performance indicators for rescheduling approaches, particularly effectiveness and schedule stability can be distinguished [10]. The former describes the quality of the applied modifications and can be measured by evaluating the new schedule in terms of costs: Effectiveness corresponds to the relation between the respective value and the theoretically possible optimum. The latter describes the nervousness of the schedule and can be measured by comparing the modified schedule with the original one: Schedule stability corresponds to the portion of activities with unmodified starting times. For the performed evaluation, we have combined both performance indicators into one single cost function: The sum of all activities' tardiness, Σi∈A max(0, βi + di − δi), is used as the measure of effectiveness. It is related to schedule stability in such a way that each schedule modification causes three times the costs of one time unit of tardiness.
3.4. Results
For each problem instance, rescheduling was performed 10 times with each of the discussed strategies. The execution time was limited to only 3 seconds, which can be considered a pretty hard setting. Since it is not easily possible to determine the definite optimum for problems of the regarded size (cf. [12]), the respective results were compared to the best value identified within all of the performed runs instead. The figures listed in Table 1 summarize how much of the thereby defined optimization potential could be tapped by the regarded strategies3: If, for example, the disrupted schedule causes costs of 10 and full rescheduling, G AOR and G MUP bring this value down to 5, 2 or 3, respectively, the table would show (10−5)/(10−2) = 62.5% for the first, (10−2)/(10−2) = 100.0% for the second and (10−3)/(10−2) = 87.5% for the third approach. The fact that the listed values range from about 75 to 90% illustrates some sort of stability: Since only few outliers exist, it is quite likely that in many of the regarded cases the global optimum could be identified.
2 The XML description of the instances can be downloaded from http://rcpsp.serverside.at/rescheduling.html
3 On
http://rcpsp.serverside.at/rescheduling.html also the detailed results of this evaluation are provided.
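For concreteness, the combined objective of Section 3.3 can be written as a single routine; the sketch below is our own illustration with names of our choosing, charging one cost unit per time unit of tardiness and three units per modified starting time.

def rescheduling_costs(activities, new_start, old_start,
                       tardiness_weight=1.0, modification_weight=3.0):
    # activities : dict activity -> (duration di, due date delta_i)
    # new_start  : dict activity -> rescheduled start beta_i
    # old_start  : dict activity -> start in the baseline schedule
    tardiness = sum(max(0.0, new_start[i] + d - due)
                    for i, (d, due) in activities.items())
    modifications = sum(1 for i in activities if new_start[i] != old_start[i])
    return tardiness_weight * tardiness + modification_weight * modifications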
Table 1. Cumulated Performance Values for Full Rescheduling, G AOR and G MUP

          Process Complexity   Resource Complexity   Baseline Schedule
          low       high       low       high        tight     wide       Overall
Full      82.16%    77.32%     82.00%    77.48%      83.55%    75.93%     79.74%
G AOR     86.04%    85.04%     90.76%    80.32%      86.61%    84.47%     85.54%
G MUP     84.82%    78.06%     84.04%    78.85%      84.32%    78.57%     81.44%
Overall, the solutions identified by the proposed techniques of partial rescheduling are consistently and significantly better than the ones determined by full rescheduling. The best results are achieved by G AOR, which performs particularly well on wide baseline schedules. For the positive effects associated with the different levels of process and resource complexity, mainly two aspects are crucial: On the one hand, higher complexities cause a wider distribution of activities and work the same way as low schedule tightness does. On the other hand, low complexities ensure that more solutions can be analyzed within the available time and that it therefore is likely to identify the optimum already during preprocessing (i.e. the constructive search). Since process complexity has a rather direct influence on the wideness of the baseline schedule, and since particularly resource complexity extends the number of possibilities to consider by G AOR, best performance improvements can be achieved in scenarios with high process and low resource complexity. As regards G MUP, it can be observed that all of the respective performance values are about 1 to 2 % above full rescheduling: Again, significantly better results can be obtained for wide baseline schedules (with large amounts of slack times or high numbers of precedence constraints). A low level of resource complexity has smaller impact since G MUP is less dependent on the respectively defined combinatorial possibilities.
4. Conclusions and Future Work This paper described how two well-established techniques for partial rescheduling in the domain of machine scheduling can be adapted to and extended for the more generic problem classes of project scheduling. As generalized versions of AOR and MUP, G AOR and G MUP were proposed as approaches to efficient rescheduling in the context of the RCPSP. Their analysis and comparison to the strategy of rescheduling all future activities revealed that they can yield significant performance improvements. Particularly the use of G AOR in domains with sparse and wide baseline schedules can be suggested. It has therefore been illustrated how it is possible to perform efficient rescheduling in case of schedule disruptions: Adaptation and reparation focused on the mere temporal shift of activities. For efficient disruption management in real-world applications, however, also potential structural modifications have to be considered when it is possible to extend and shorten durations deliberately, to exchange or modify the planned order of operations or to parallelize or serialize activity execution. The only form in which at least some of the respective flexibility can be modeled in the domain of project scheduling is provided by the Multi-Mode RCPSP (MRCPSP, see [8]) which allows the alternation of activity execution modes. We have therefore proposed the x-RCPSP as a conceptual extension of the RCPSP, targeting at the support of more comprehensive forms of disruption management [13]. Apart from the further enhancement of the proposed methods and the conduction of additional comparative experiments, future work will be directed at the adaptation of the techniques proposed in this paper for such an extended framework.
Acknowledgements This work is part of the cdm@airports project, which is carried out in cooperation with FREQUENTIS GmbH (Austria) and is partly funded by grants from FFF (Austria).
References [1] R.J. Abumaizar and J.A. Svestka, Rescheduling job shops under random disruptions, International Journal of Production Research 35 (1997), 2065–2082. [2] M.S. Akturk and E. Gorgulu, Match-up scheduling under a machine breakdown, European Journal of Operational Research 112 (1999), 81–97. [3] J.C. Bean, J.R. Birge, J. Mittenthal and C.E. Noon, Matchup Scheduling with Multiple Resources, Release Dates and Disruptions, Operations Research 39 (1991), 470–483. [4] J. Błazewicz, J.K. Lenstra and A.H.G. Rinnooy Kan, Scheduling Projects to Resource Constraints: Classification and Complexity, Discrete Applied Mathematics 5 (1983), 11–24. [5] P. Brucker, A. Drexl, R. Möhring, K. Neumann and E. Pesch, Resource-constrained project scheduling: Notation, classification, models, and methods, European Journal of Operational Research 112 (1999), 3–41. [6] E.L. Demeulemeester, W.S. Herroelen, Project Scheduling: A Research Handbook, Kluwer Academic Publishers, Boston, 2002. [7] S. Hartmann, A competitive genetic algorithm for resource-constrained project scheduling, Naval Research Logistics 45 (1998), 733–750. [8] S. Hartmann, Project Scheduling with Multiple Modes: A Genetic Algorithm, Annals of Operations Research 102 (2001), 111–135. [9] W. Herroelen and R. Leus, Robust and reactive project scheduling: a review and classification of procedures, International Journal of Production Research 42 (2004), 1599–1620. [10] G.Q. Huang, J.S.K. Lau, K.L. Mak, L. Liang, Distributed supply-chain project rescheduling: part II distributed affected operations rescheduling algorithm, International Journal of Production Research 44 (2006), 1–25. [11] R. Kolisch, A. Sprecher, A. Drexl, Characterization and Generation of a General Class of ResourceConstrained Project Scheduling Problems, Management Science 41 (1995), 1693–1703. [12] R. Kolisch, A. Sprecher, PSPLIB - A project scheduling library, European Journal of Operational Research 96 (1996), 205–216. [13] J. Kuster, D. Jannach, Extending the Resource-Constrained Project Scheduling Problem for Disruption Management, IEEE Conference On Intelligent Systems (2006), to appear. [14] R.K. Li, Y.T. Shyu and A. Sadashiv, A heuristic rescheduling algorithm for computer-based production scheduling systems, International Journal of Production Research 31 (1993), 1815–1826. [15] N. Policella and R. Rasconi, Testsets Generation for Reactive Scheduling, Workshop on Experimental Analysis and Benchmarks for AI Algorithms (2005). [16] G.E. Vieira, J.W. Herrmann and E. Lin, Rescheduling manufacturing systems: a framework of strategies, policies, and methods, Journal of Scheduling 6 (2003), 39–62. [17] S. van de Vonder, E. Demeulemeester and W. Herroelen, An investigation of efficient and effective predictive-reactive project scheduling procedures, Journal of Scheduling (2006). [18] C.L. Wu, R.E. Caves, Modelling and Optimization of Aircraft Turnaround Time at an airport, Transportation Planning & Technology 27 (2004), 47–66. [19] G. Zhu, J.F. Bard and G. Yu, Disruption management for resource-constrained project scheduling, Journal of the Operational Research Society 56 (2005), 365–381.
A Comparison of Web Service Interface Similarity Measures
Natallia KOKASH
Department of Information and Communication Technology, University of Trento, Via Sommarive 14, 38050 Trento, Italy, email: [email protected]
Abstract. Web service technology allows access to advertised services regardless of their location and implementation platform. However, considerable differences on the structural, semantic and technical levels, along with the growing number of available web services, make their discovery a significant challenge. Keyword-based matchmaking methods can help users to quickly locate the set of potentially useful services, but they are insufficient for automatic retrieval. On the other hand, the high cost of formal ontology-based methods alienates service designers from their use in practice. Several information retrieval approaches to assess the similarity of web services have been proposed. In this paper we proceed with such a study. In particular, we examine the advantages of using the Vector-Space Model, WordNet and semantic similarity metrics for this purpose. A matching algorithm relying on the above techniques is presented and an experimental study to choose the most effective approach is provided.
Keywords. web service, service discovery, service matchmaking
Introduction
During the last years the idea of software composition and refinement, as opposed to building software from scratch, has been elaborated into the platform-independent, distributed and open-standards-based services paradigm. The state of the art in system integration is defined by the implementation of service-oriented architectures using web service technology. Web services are loosely coupled, distributed and independent software entities that can be described, published, discovered and invoked via the web infrastructure using a stack of standards such as SOAP, WSDL and UDDI [11]. Potentially, a large number of compatible services simplifies the building of new applications from existing components. However, the problem is very intricate due to the absence of service behavior specifications and of control over the service lifecycle. Adaptivity is a highly desired property for service-based systems: troublesome components should be automatically replaced by analogous but trouble-free ones. In this context the problem of service discovery acquires a significant importance. Garofalakis et al. [5] provide a survey of different perspectives in this area. Discovery can be carried out by developers at design time or by self-assembling applications at either design or run time.
These processes are referred to as manual and automated discovery. Under manual discovery, a human requester searches for a service description that meets the desired criteria. Under automated discovery, a requester agent performs and evaluates this task. The state of the art in automated and semi-automated web service discovery consists of many sound proposals. Simple keyword-based service search is traded against formal methods that require manual annotation of service specifications with semantic information. The latter do not fully bring the issue to a close and spawn additional problems such as multiple ontology mapping. As an effort to increase the precision of web service discovery without involving any additional level of semantic markup, several approaches based on Information Retrieval (IR) techniques have been proposed [7] [16] [17] [18]. All of them report improvements in the precision of automated service matchmaking. In this paper we provide a comparative analysis of the ideas underlying the nominated solutions in order to locate the most promising strategy. Further, we provide an implementation of a matching algorithm that combines similarity scores by searching for the maximum-score assignment between different specification elements. WSDL specifications contain several elements, some of which can be very similar whereas others can be completely different. This presumes a combination of lexical and structural matching.
The paper is organized as follows. In Section 1, we review the related work. In Section 2, web service description formats are discussed. Section 3 introduces the similarity assessment techniques used in our approach. Section 4 describes the proposed web service matching algorithm. Experimental results are presented in Section 5. Finally, Section 6 concludes the paper and outlines future work.
1. Related Work
Currently UDDI registries1 are the dominating technological basis for web service discovery. Alternatively, ebXML registries2 can be used to advertise services available on the web. They allow storing actual WSDL specifications in a repository. As a consequence, retrieval of WSDL using custom ad hoc queries is enabled. The question about the use of registries seems to be irrelevant due to the advantages they bring to the technology, but existing registries are still small and mostly private. The discovery supported by registry APIs is inaccurate, as retrieved services may be inadequate due to low precision and low recall. Users may need to examine different registries before they find an appropriate service. Approaches that reduce manual activities in service discovery and allow intelligent agents to identify useful services automatically are required. Below we analyze the existing methods targeted at improving automated service matchmaking. Generally, information matching can be accomplished on two levels:
• In syntactic matching we look for the similarity of data using syntax-driven techniques. Usually, the similarity of two concepts is a relation with values between 0 (completely dissimilar) and 1 (completely similar).
1 http://www.uddi.org
2 http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebrim.pdf
• In semantic matching the key intuition is the mapping of meanings. There are several semantic relations between two concepts: equivalence (≡), more general (⊇), less general (⊆), mismatch (⊥) and overlapping (∩). Nevertheless, they can be mapped into a relation with values between 0 and 1.
Among the areas closely related to service matching are:
Text document matching. These solutions rely on term frequency analysis and ignore document structure. Among the most popular methods are the Vector-Space Model (VSM), Latent Semantic Indexing (LSI) and Probabilistic Models [2]. However, on their own they are insufficient in a web service context.
Semi-structured document matching. The major part of information on the web today is represented in HTML/XML formats. This fact spawned research aiming to improve IR from semi-structured documents. Methods using plain text queries do not allow users to specify constraints on the document structure. On the other hand, the recall of exact matching algorithms using XPath3 or XQuery4 is often too low. Kamps et al. [10] noted that structure in XML retrieval is used only as a search hint, but not as a strict requirement.
3 http://www.w3.org/TR/xpath
4 http://www.w3.org/TR/xquery
Software component matching. Software components can be compared with various degrees of accuracy. Structural similarity reflects the degree to which the software specifications look alike, i.e., have similar design structures. Functional similarity reflects the degree to which the components act alike, i.e., capture similar functional properties [9]. Functional similarity assessment methods rely on matching of pre/post-conditions, which normally are not available for web services. There is ongoing research to support web service discovery by checking behavioral compatibility (e.g. [8]).
Schema matching. Schema matching methods [12] can be based on linguistic and structural analysis, domain knowledge and previous matching experience. However, the application of schema matching approaches is impeded by the fact that the existing works have mostly been done in the context of a particular application domain. In addition, service specifications have a much plainer structure than schemas.
In IR approaches to service discovery a query consists of keywords, which are matched against the stored descriptions in service catalogs. LSI, the prevailing method for small document collections, was applied to capture the semantic associations between service advertisements [13] in a UDDI registry. Bruno et al. [3] experimented with automated classification of WSDL descriptions using support vector machines. Stroulia et al. [16] developed a suite of algorithms for similarity assessment of service specifications. The WSDL format does not provide any special semantic information, but it contains the documentation tag with service documentation and elements with natural language descriptions of operations and data types. Identifiers of messages and operations are meaningful, and the XML syntax allows capturing domain-specific relations. The WordNet database was applied for semantic analysis. According to those experimental results, the methods are nei-
ther precise nor robust. The main drawback, in our opinion, is that poor heuristics in assigning weights for term similarity were used. Dong et al. [7] present a search engine Woogle focused on retrieval of WSDL operations. Their method is based on term associations analysis. The underlying idea can be expressed by the heuristic that parameters tend to reflect the same concept if they often occur together. The above approaches do not consider data types in a proper way. Carman and Serafini [15] designed an algorithm for semantic matching of complex types. Structure information is used to infer equal, more general or less general relations between type schemas. In [17] web service similarity is defined using a WordNet-based distance metric. Zhuang et al. [18] apply a similar approach. The future directions outlined in the papers include automated preprocessing of WSDL files with complex names handling and structural information analysis, provided in our approach. We also propose a new method to join structural, syntactic and semantic similarities of different elements in a single-number measure. Further, we compare matching algorithms with three different kernel functions.
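As a sketch of how such a single-number measure can be obtained in practice, the maximum-score one-to-one assignment between the elements of two specifications can be computed with the Hungarian method. The code below is our own illustration, not the exact procedure evaluated later in the paper; it assumes NumPy and SciPy are available and that some element-level similarity function sim is given.

import numpy as np
from scipy.optimize import linear_sum_assignment

def combined_similarity(query_elems, cand_elems, sim):
    # Combine element-level similarities (lexical, structural or a mix) into a
    # single score by matching the elements of two specifications, e.g. the
    # operations of two WSDL interfaces, one-to-one with maximum total score.
    if not query_elems or not cand_elems:
        return 0.0
    scores = np.array([[sim(q, c) for c in cand_elems] for q in query_elems])
    # the Hungarian algorithm minimizes cost, so negate the similarity scores
    rows, cols = linear_sum_assignment(-scores)
    # normalize by the number of query elements so unmatched ones count as 0
    return float(scores[rows, cols].sum()) / len(query_elems)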
2. Web Service Specification Formats

Table 1 shows an example of two compatible WSDL definitions. Both specifications represent web search services: GoogleSearch and WolframSearch. If a client asks some service registry for the service GoogleSearch, which is not published there, WolframSearch can be returned instead. Further, if both services are advertised in the same registry they should be classified in the same group to simplify service location. If one of the services fails, the other one can be invoked instead to satisfy the user request. These cases require establishing an exact correspondence between service operations, comparing input/output parameters and checking data type compatibility. We examine five logical concepts of WSDL files that are supposed to contain meaningful information: services, operations, messages, parts and data types. The semantic information extracted from a WSDL file is represented as shown in Figure 1. Each element has a description, i.e., a vector that contains semantic information about this element extracted from the specification. Data types can consist of several subelements. We do not consider their internal organization explicitly. However, names of the higher-level organizational tags (like the data type category (complexType, simpleType, group, element) or composers (all, sequence, choice, restriction, extension)) are included in the element description. While matching data types, we do not take into account parameter order constraints, since parser implementations often do not observe them. This does not harm well-behaved clients and offers some margin for errors. We rely on "relaxed" structural matching since a too strict comparison can significantly reduce recall. For example, for the concepts of the GoogleSearch web service in Table 1 the corresponding concepts of the service WolframSearch can be found despite their rather different organization. Striving for automated web service discovery and composition has led to the idea of manually annotating services with semantic information. The recently proposed WSDL-S [1] provides a way to associate semantics with web service specifications. It is assumed that there exist formal semantic models
Table 1. WSDL specifications of two web services, GoogleSearch and WolframSearch (XML listings not reproduced here)
Figure 1. WSDL data representation
relevant to web services. These models are maintained outside of WSDL documents and are referenced from them via extensibility elements. The specified semantic data include definitions of the preconditions, inputs, outputs and effects of service operations. The main advantage over similar approaches is that developers can
annotate web services with their own choice of ontology language. Ontologies, i.e., explicit and formal specifications of knowledge, are a key enabling technology for the semantic web. They interweave human understanding of symbols with their machine-processability. With respect to web service technology, ontologies can be used for describing service domain-specific capabilities, inputs/outputs, service resources, security parameters, platform characteristics, etc. The main difficulty in practice arises from the fact that the requester and the provider are unlikely to use the same ontology. A bridge between services with and without semantic annotations should be constructed. IR-based service matching algorithms can be extended to become uniform methods that allow matching of WSDL and WSDL-S specifications. In this case some element descriptions will contain references to the corresponding ontology concepts. The latter can be compared using specialized matchmaking algorithms.

3. Similarity Assessment

In this section, we cover the rationale of the proposed service matching method. The main idea of the algorithm is a combination of element-level lexical similarity matching and structure matching:
1. The goal of lexical matching is to calculate the linguistic similarity between concept descriptions.
2. Under structural matching we understand the process of similarity assessment between composite concepts (services, operations, messages, parts, data types) that include several subelements.

3.1. Lexical Similarity

Three different linguistic similarity measures were used to compare textual concept descriptions. Given a set of documents we can measure their similarity using the Term Frequency - Inverse Document Frequency (TF-IDF) heuristic. Formally it is defined as follows: Let D = {d1, ..., dn} be a document collection, and for each term wj let nij denote the number of occurrences of wj in di. Let also nj be the number of documents that contain wj at least once. The TF-IDF weight of wj in di is computed as xij = TFij · IDFj = (nij / |di|) · log(n / nj), where |di| is the total number of words in document di. The similarity measure between two documents is defined by the cosine coefficient: cos(xi, xk) = (xi^T · xk) / (√(xi^T · xi) · √(xk^T · xk)), where xi = (xi1, ..., xim), xk = (xk1, ..., xkm) are vectors of TF-IDF weights corresponding to the documents di and dk, and m is the number of different words in the collection. A more detailed description can be found in [2]. WordNet is a lexical database with words organized into synonym sets representing an underlying lexical concept. To address the shortcoming of VSM of considering words at the syntactic level only, we expanded the query and the WSDL concept descriptions with synonyms from WordNet. After that we compared the obtained word tuples using the TF-IDF measure.
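As a minimal sketch of the two formulas above (our own illustration, not the authors' implementation; class and method names are hypothetical, and every document is assumed to belong to the collection so that nj is at least 1):

import java.util.*;

public class TfIdfSketch {
    // TF-IDF weight of each term of doc with respect to a collection of token lists:
    // x_ij = (n_ij / |d_i|) * log(n / n_j).
    public static Map<String, Double> tfidf(List<String> doc, List<List<String>> collection) {
        Map<String, Double> weights = new HashMap<>();
        int n = collection.size();
        for (String term : new HashSet<>(doc)) {
            long tf = doc.stream().filter(t -> t.equals(term)).count();
            long df = collection.stream().filter(d -> d.contains(term)).count();
            weights.put(term, (tf / (double) doc.size()) * Math.log((double) n / df));
        }
        return weights;
    }

    // Cosine coefficient between two sparse TF-IDF weight vectors.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}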
Finally, element descriptions were compared using an approach more concerned with the meaning of words. Semantic similarity is a measure that reflects the semantic relation between two terms or word senses. Thus, after tokenization (splitting an input string into tokens, i.e. determining the word boundaries), word stemming (removing common morphological and inflectional endings from words) and stopword removal (eliminating very frequently and very rarely used words), which are common to all three methods, the following steps can be performed to compute the semantic similarity of two WSDL concept descriptions:
1. Part-of-speech tagging. Syntactic categories such as noun, verb, pronoun, preposition, adverb, adjective should be assigned to words.
2. Word sense disambiguation. Each word may have different lexical meanings that are fully understood only in a particular context. Disambiguation is a process of enumerating the likely senses of a word in a ranked order.
3. Semantic matching of word pairs. Given input strings X and Y, a relative similarity matrix M can be constructed as follows: each element M[i][j] denotes the semantic similarity between the word at position i of X and the word at position j of Y. If a word does not exist in the dictionary, edit-distance similarity and abbreviation dictionaries can be used.
4. Semantic matching of word tuples. The problem of capturing semantic similarity between word tuples (sentences) can be modelled as the problem of computing a total weight in a weighted bipartite graph, described in the next section. Other metrics can be used as well [6].

3.2. Structural Similarity

Each concept of a query should be confronted with one of the concepts in the documents from the collection. This task can be formulated as a Maximum Weight Bipartite Matching problem, where the input consists of an undirected graph G = (V, E), where V denotes the set of vertices and E is the set of edges. A matching M is a subset of the edges such that no two edges in M share a vertex. The vertices are partitioned into two parts, X and Y. An edge can only join vertices from different parts. Each edge (i, j) has an associated weight wij. The goal is to find a matching with the maximum total weight. The problem can be solved in polynomial time, for example, using Kuhn's Hungarian method [4]. We applied the above method on different levels of our matching algorithm: (1) to get the semantic similarity of two descriptions, (2) to calculate the similarity of complex WSDL concepts given similarity scores for their subelements. The weight wij of each edge is defined as the lexical similarity between elements i and j. The total weight of the maximum weight assignment depends on the set sizes. There are many strategies to acquire a single-number dimension-independent measure in order to compare sets of matching pairs, the simplest of which is the matching average (see Table 2). Here |X| is the number of entries in the first part, |Y| is the number of entries in the second part, and |X ∩ Y| denotes the number of entries that are common to both sets. Finally, |X \ Y| denotes the number of entries in the first set that are not in the second, and |Y \ X| the number of entries in the second set that are not in the first. Two elements i ∈ X, j ∈ Y are considered to be similar if wij > γ for some parameter γ ∈ [0, 1].
Table 2. Similarity coefficients
Matching average: 2 · Match(X, Y) / (|X| + |Y|)
Dice coefficient: 2 · |X ∩ Y| / (|X| + |Y|)
Simpson coefficient: |X ∩ Y| / min(|X|, |Y|)
Jaccard coefficient: |X ∩ Y| / (|X ∩ Y| + |X \ Y| + |Y \ X|)
First Kulczynski coefficient: |X ∩ Y| / (|X \ Y| + |Y \ X|)
Second Kulczynski coefficient: (|X ∩ Y| / |X| + |X ∩ Y| / |Y|) / 2
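To make the assignment step and the matching average of Table 2 concrete, the following sketch (our own illustration with hypothetical names, not the authors' code) computes the maximum-weight assignment by an exact search over subsets, which is only practical for the small element lists found in WSDL concepts (Kuhn's Hungarian method [4] is the polynomial-time alternative):

public class AssignmentScoreSketch {
    // Exact maximum-weight assignment over a similarity matrix w[i][j] between the
    // elements of X (rows) and Y (columns), by dynamic programming over subsets of Y.
    public static double maxWeightAssignment(double[][] w) {
        int n = w.length, m = w[0].length;
        double[] best = new double[1 << m];
        java.util.Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int i = 0; i < n; i++) {
            double[] next = best.clone();          // element i of X may stay unmatched
            for (int mask = 0; mask < (1 << m); mask++) {
                if (best[mask] == Double.NEGATIVE_INFINITY) continue;
                for (int j = 0; j < m; j++) {
                    if ((mask & (1 << j)) == 0) {  // match element i of X to free element j of Y
                        int withJ = mask | (1 << j);
                        next[withJ] = Math.max(next[withJ], best[mask] + w[i][j]);
                    }
                }
            }
            best = next;
        }
        double result = 0.0;
        for (double v : best) result = Math.max(result, v);
        return result;
    }

    // Matching average from Table 2: 2 * Match(X, Y) / (|X| + |Y|).
    public static double matchingAverage(double[][] w) {
        return 2.0 * maxWeightAssignment(w) / (w.length + w[0].length);
    }
}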
4. Web Service Matching Algorithm

To obtain WSDL concept descriptions we extracted: (1) sequences of two or more uppercase letters in a row, (2) sequences of an uppercase letter followed by lowercase letters, (3) sequences between two non-word symbols, from element names, namespaces, data types, documentation and organizational tags. Our experiments show that these simple heuristics work fairly well. For example, from "tns:GetDNSInfoByWebAddressResponse" we get the following word tuple: {tns, get, dns, info, by, web, address, response}. After extracting meaningful words from all WSDL specifications we built word indices, where a relative TF-IDF coefficient is assigned to each word. We must note that word stemming (accomplished by the classical Porter stemming algorithm) neither reduced the index dimension nor improved performance on our test bed (described in Section 5). Stopword removal also brought no effect. Frequently used words in WSDL specifications like get/set, in/out, request/response may distinguish conceptually different elements (e.g., GetDNSInfoByWebAddressSoapIn and GetDNSInfoByWebAddressSoapOut). To reduce the dimension of the word vectors to be compared we used three separate word indices: the first index for data types, the second one for operations, messages and parts, and the third one for service descriptions. The information extracted from WSDL specifications is short and rather different from natural language sentences. A clear semantic context is missing in the concept descriptions collected from several technical XML tags. For this reason, word sense disambiguation seems to be infeasible. To define the lexical similarity of all possible senses of two terms we used a WordNet-based metric designed by Seco et al. [14]. Its Java implementation is available on http://wordnet.princeton.edu/links.shtml. Our matching algorithm is presented in Table 3. The overall process starts by comparing service descriptions and the operations provided by the services, which are afterwards combined into a single-number measure. Operation similarity, in its turn, is assessed based on operation descriptions and their input/output messages. To compare message pairs we again evaluate the similarity of the message descriptions and compare their parts. Since one part with a complex data type or several parts with primitive data types can describe the same concept, we must compare message parts with subelements of complex data types as well. Function compareDescriptions(d1, d2) compares two concept descriptions d1 and d2, either using the TF-IDF heuristic with or without WordNet synonyms, or by applying the lexical semantic similarity measure. Function getAssignment(M) finds the maximum weight assignment considering matrix M as a bipartite graph where rows represent set X, columns represent set Y, and the edge weight wij is equal to M[i][j].
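A compact way to realize the three extraction heuristics above is a single regular expression over identifiers; this is our own illustrative sketch (class and method names are hypothetical), which reproduces the word tuple of the example above:

import java.util.*;
import java.util.regex.*;

public class IdentifierTokenizer {
    // Matches runs of uppercase letters not followed by a lowercase letter (e.g. "DNS"),
    // an uppercase letter followed by lowercase letters (e.g. "Get"), plain lowercase runs
    // and digit runs; everything else (":", "_", ...) acts as a separator.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|[0-9]+");

    public static List<String> tokens(String identifier) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(identifier);
        while (m.find()) result.add(m.group().toLowerCase());
        return result;
    }

    public static void main(String[] args) {
        // Prints [tns, get, dns, info, by, web, address, response]
        System.out.println(tokens("tns:GetDNSInfoByWebAddressResponse"));
    }
}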
Table 3. WSDL matching algorithm
double compareTypes(type1, type2)
  scoreList ← compareDescriptions(type1.description, type2.description)
  for (int i = 0; i < type1.elementList.length; i++)
    for (int j = 0; j < type2.elementList.length; j++)
      d1 = type1.elementList[i].description
      d2 = type2.elementList[j].description
      M[i][j] = compareDescriptions(d1, d2)
  scoreList ← getAssignment(M)
  return getScore(scoreList)

double compareElementLists(partList, elementList)
  for (int i = 0; i < partList.length; i++)
    for (int j = 0; j < elementList.length; j++)
      M[i][j] = compareDescriptions(partList[i].description, elementList[j].description)
  scoreList ← getAssignment(M)
  return getScore(scoreList)

double compareParts(part1, part2)
  scoreList ← compareDescriptions(part1.description, part2.description)
  scoreList ← compareTypes(part1.type, part2.type)
  return getScore(scoreList)

double compareMessages(msg1, msg2)
  scoreList ← compareDescriptions(msg1.description, msg2.description)
  for (int i = 0; i < msg1.partList.length; i++)
    elementList1 ← msg1.partList[i].type.elementList
  for (int j = 0; j < msg2.partList.length; j++)
    elementList2 ← msg2.partList[j].type.elementList
  for (int i = 0; i < msg1.partList.length; i++)
    for (int j = 0; j < msg2.partList.length; j++)
      M[i][j] = compareParts(msg1.partList[i], msg2.partList[j])
  score1 = getScore(getAssignment(M))
  score2 = compareElementLists(msg1.partList, elementList2)
  score3 = compareElementLists(msg2.partList, elementList1)
  return max(score1, score2, score3)

double compareOperations(op1, op2)
  scoreList ← compareDescriptions(op1.description, op2.description)
  scoreList ← compareMessages(op1.inputMessage, op2.inputMessage)
  scoreList ← compareMessages(op1.outputMessage, op2.outputMessage)
  return getScore(scoreList)

double compareServices(service1, service2)
  scoreList ← compareDescriptions(service1.description, service2.description)
  for (int i = 0; i < service1.operationList.length; i++)
    for (int j = 0; j < service2.operationList.length; j++)
      M[i][j] = compareOperations(service1.operationList[i], service2.operationList[j])
  scoreList ← getAssignment(M)
  return getScore(scoreList)
The total similarity score can be measured by any of the coefficients in Table 2. We considered the impact of each element within a complex concept to be proportional to its length, i.e., given a list scoreList of matching scores of different elements, getScore(scoreList) = (1/n) · Σ_{i=1}^{n} scoreList[i], where n is the list length.

5. Experimental Results

To evaluate the effectiveness of the different approaches we ran experiments using a collection of web services described in [17]. It consists of 40 XMethods service descriptions from five categories: ZIP code finder, Weather information finder, DNA information searcher, Currency rate converter and SMS sender. In Table 4, the collection characteristics and the preprocessing time performance are shown.

Table 4. Preprocessing performance
Services: 40   Operations: 628   Messages: 837   Parts: 1071   Types: 410
Parsing time (sec): 37   Indexing time (sec): 2
Since we did not use any additional information apart from that indicated in the WSDL specification (i.e., service documentation and quality parameters), our method can be compared with the interface similarity defined in [17]. Their precision varied from 42 to 62%. The effectiveness of our method was evaluated by calculating the average precision, which combines precision, relevance ranking, and overall recall. Formally, it is defined as the sum of the precision at each relevant document in the extracted list divided by the total number of relevant documents in the collection: v = (Σ_{j=1}^{n} relevant[j] · (Σ_{k=1}^{j} relevant[k]) / j) / r, where n is the number of documents, r is the total number of relevant documents for the query, and relevant[i] is 1 if the i-th document in the extracted list is relevant and 0 otherwise. Results are shown in Figure 2.
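For concreteness, the average precision measure defined above can be computed as follows (a minimal sketch of our own; the array relevant encodes the ranked result list for one query):

public class AveragePrecisionSketch {
    // v = (1/r) * sum over ranks j of relevant[j] * (relevant documents among the top j) / j,
    // where relevant[j] is 1 if the j-th retrieved document is relevant and r is the total
    // number of relevant documents in the collection for the query.
    public static double averagePrecision(int[] relevant, int r) {
        double sum = 0.0;
        int hits = 0;
        for (int j = 0; j < relevant.length; j++) {
            if (relevant[j] == 1) {
                hits++;
                sum += hits / (double) (j + 1);   // precision at rank j+1
            }
        }
        return r == 0 ? 0.0 : sum / r;
    }
}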
Figure 2. Average precision
Figure 3. Processing time
Methods using TF-IDF heuristic and synonyms from WordNet were quite fast, while the usage of lexical semantic similarity required a significant time even for such a small collection. In addition, semantic similarity does not bring any gain in matching precision. This conclusion is confirmed by experiments on the collection described in [16]. We compared 447 services divided into 68 groups.
Because of space restrictions the results are not published here; they can be found along with the algorithm sources at http://dit.unitn.it/∼kokash/sources. An interesting observation is that the groups with better precision in [17] correspond to the groups with worse average precision in our experiments. This may have happened due to different proportions of the structural vs. semantic similarity impact on the final similarity score. Enriching element descriptions with synonyms from the WordNet ontology leads to a significant increase in index size (see Table 5). As we can conclude from these statistics, data types are the most informative part of WSDL files. The exhaustive WordNet context differs essentially from the text corpus used in concise service descriptions. Yet, WordNet does not provide the multitude of associations that are required for service matching. Thus, the words "currency" and "country" are not recognized as related concepts. Nevertheless, it is clear that given a country name we can get its currency and use a web service accepting currency codes as input to exchange money. Consequently, the operations getRateRequest(country1, country2) and conversionRate(fromCurrency, toCurrency) had a significantly lower similarity score than they are expected to have. A repository of verified transformations should be created by clustering of lexically similar terms, terms in complex data types and explicit user experiences.

Table 5. Index size
           Type   Operation   Description   Total
Terms      1634   1336        177           3147
Synonyms   3227   1460        703           5390
Total      4861   2796        880           8537
6. Conclusions and Future Work

We proposed a consistent technique for lexical and structural similarity assessment of web service descriptions, which can be useful in discovery, service version comparison, estimation of the effort to adapt a new service, automated service categorization and blocking in service registries. Our approach can significantly reduce manual operations in these areas, provided that the advertised specifications contain feasible information. What we frequently observed in our test collections was an absence of any documentation and/or meaningful identifier names. Three different functions to measure specification lexical similarity were applied. The classical vector-space model showed the best performance. Surprisingly, the application of the semantic similarity metric did not help to improve the precision/recall of service interface matching. The reason for this may be the ambiguity of the terms used in service specifications. For some service classes, comparison of WordNet-empowered descriptions brought a slight improvement. However, the classical TF-IDF heuristic outperformed the other approaches in most cases. Due to the excessive generality of the WordNet ontology many false correlations were found. Particularly lacking from the literature was a comparative analysis of the existing IR techniques applied to web service matchmaking. Our experiments shed light on this situation and pose some relevant issues for future research. The
matching algorithms based on semantic similarity metric should be optimized. More careful study of different approaches is also desirable. We suppose that this work can be improved by using state-of-the-art IR approaches like classification learning or supervised service matching. Also, we are planning to investigate service behavioral compatibility in combination with matching of their structural, syntactic and semantic descriptions.
References
[1] Akkiraju, R., et al.: "Web Service Semantics - WSDL-S", April 2005, http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf.
[2] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, 1999.
[3] Bruno, M., Canfora, G., et al.: "An Approach to support Web Service Classification and Annotation", IEEE International Conference on e-Technology, e-Commerce and e-Service, 2005.
[4] Galil, Z.: "Efficient Algorithms for Finding Maximum Matching in Graphs", ACM Computing Surveys, Vol. 18, No. 1, 1986, pp. 23-38.
[5] Garofalakis, J., Panagis, Y., Sakkopoulos, E., Tsakalidis, A.: "Web Service Discovery Mechanisms: Looking for a Needle in a Haystack?", International Workshop on Web Engineering, 2004.
[6] Corley, C., Mihalcea, R.: "Measuring the Semantic Similarity of Texts", Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13-18, 2005.
[7] Dong, X.L., et al.: "Similarity Search for Web Services", Proceedings of VLDB, 2004.
[8] Hausmann, J.H., Heckel, R., Lohmann, M.: "Model-based Discovery of Web Services", Proceedings of the IEEE International Conference on Web Services, 2004.
[9] Jilani, L.L., Desharnais, J.: "Defining and Applying Measures of Distance Between Specifications", IEEE Transactions on Software Engineering, Vol. 27, No. 8, 2001, pp. 673-703.
[10] Kamps, J., Marx, M., Rijke, M., Sigurbjornsson, B.: "Structured Queries in XML Retrieval", Conference on Information and Knowledge Management, 2005.
[11] Papazoglou, M.P., Georgakopoulos, D.: "Service-oriented computing", Communications of the ACM, Vol. 46, No. 10, 2003, pp. 25-28.
[12] Rahm, E., Bernstein, P.: "A Survey of Approaches to Automatic Schema Matching", VLDB Journal, Vol. 10, No. 4, 2001, pp. 334-350.
[13] Sajjanhar, A., Hou, J., Zhang, Y.: "Algorithm for Web Services Matching", Proceedings of APWeb, 2004, pp. 665-670.
[14] Seco, N., Veale, T., Hayes, J.: "An Intrinsic Information Content Metric for Semantic Similarity in WordNet", European Conference on Artificial Intelligence, 2004.
[15] Carman, M., Serafini, L., Traverso, P.: "Web Service Composition as Planning", Workshop on Planning for Web Services, 2003.
[16] Stroulia, E., Wang, Y.: "Structural and Semantic Matching for Assessing Web Service Similarity", International Journal of Cooperative Information Systems, Vol. 14, No. 4, 2005, pp. 407-437.
[17] Wu, J., Wu, Z.: "Similarity-based Web Service Matchmaking", IEEE International Conference on Services Computing, 2005, pp. 287-294.
[18] Zhuang, Z., Mitra, P., Jaiswal, A.: "Corpus-based Web Services Matchmaking", AAAI Conference, 2005.
Finding Alternatives Web Services to Parry Breakdowns
Laure Bourgois 1
France Telecom R & D, LIPN

Abstract. The increasing number of Web services, and thus of possible combinations, is particularly hard to reconcile with the dynamic and versatile nature of the Web. Indeed, complex Web services may frequently fail to fit the situation encountered. To avoid composing a new Web service "from scratch", we search for a practical way of repairing failed ones. This paper presents a Web service repairing heuristic based on a semantic modeling approach. Web services are modeled with an OWL-S fragment. Repairing is based on our previous work, which allows organizing a collection of Web services into hierarchies. A NEXPTIME complexity upper bound is shown for a reparation which does not change the internal structure of the complex Web service.

Keywords. Web services, software components, breakdown, dynamic logic, subsumption, reconfiguration

1. Context and Problem

This work takes place in the semantic Web context. Web services are single components distributed on the Web [14]. They use XML-based languages. Web services are endowed with a communication system and they can use and modify data available on the Web. Various descriptive standards exist and nowadays several standards compete to normalize them, each one corresponding to a specific aspect of Web services: message exchange, concurrency, trust, security, orchestration, etc. Several tasks can be automatized, like composition, choreography, discovery, invocation, etc. Every day a multitude of Web services arise; it would be useful and a saving of time to turn existing Web services to good account instead of composing new ones. Moreover, as [17] summarized: "This problem of creation of new composite Web service is in principle equal to the old problem of generalized automatic programming. This problem is notoriously unsolved in general by any known techniques. There is no reason to believe that the Web service version of this problem will be any less resistant to a general solution." Web service repositories already exist, e.g. UDDI, which is an emerging standard. It is a Web service directory similar to the Yellow Pages. It can be expected that libraries of this kind will be widely used due to the abundance of Web services.
1 38 rue du Général Leclerc, 92794 Issy-les-Moulineaux Cedex 9, France, Tel: +33 1 45 29 66 56, [email protected]
Figure 1. Data and program breakdowns
Surely, that kind of library is a prerequisite to a repairing task. The focus here is on OWL-S [12], which describes Web service execution from a semantic and logical point of view. Suppose for instance that we want to buy a CD online. We first need an ontology to represent the CD database, and a semantic description of the different Web services representing the effects of their execution on the CD database 3. A complex Web service has a normal execution model on the Web, which lays down its viability conditions. More precisely, we see the Web as a huge informal non-transactional database. The top of figure 1 represents the normal execution of the complex Web service reservBuy in sequence with cancelBuying and then reserv1. Vertices in figure 1, labeled by numbers, represent constraints on the database evolution for the Web service to be executable. They do not lay constraints on the database itself. The normal execution model of a Web service fits the integrity requirements of the database. We suppose we have discrete and datable observations of real Web service executions (as in a diagnostic approach). If a breakdown arises, it means that the observations superimposed on the normal execution model generate inconsistencies. It reflects the inadequacy of the Web service with respect to the dynamic nature of the Web. Three main categories of problems can explain why the Web is strongly versatile:
- network breakdown: we will not treat that kind of problem here 4.
- resource breakdown: a CD that was available, for example, 10 seconds ago may have become unavailable. In other words, resources can disappear. It appears as a spontaneous database evolution, but it is just a consequence of parallelism, because data are not locked in a non-transactional database. It means that another Web service greedy for this resource has acted on the database at the same time. An example of a failing Web service execution is shown in figure 1, at the third step of the middle picture. The observation that creates an inconsistency with the normal execution model is represented in bold type in figure 1.
- program breakdown: several problems (failing QoS, removal by providers, . . . ) can generate breakdowns of a component, as we can see at the fourth step of the bottom drawing in figure 1 5. For some reason, program reserv1 does not act correctly anymore; the consequence is that it cannot change the status of ¬CDres into CDres.
3 To that end, we use their IOPE (input, output, precondition and effect), an OWL-S fragment.
4 Indeed, we want to examine Web services, rather than their environment, i.e. the network.
5 This argument already appears in [18] in favour of reconfiguring Web services.
Figure 2. An example of a repairing task
These reasons explain why Web services composed by machines or users may fail. They constitute a concrete obstacle for Web services, as composed Web service executions have to be controlled and some recovery may be needed. Moreover, a Web service may include a logical contradiction, which is more a software problem than a breakdown. Figure 2 represents a logically failing Web service: a complex Web service reservBuy in sequence with cancelBuying and then reserv2. The contradiction in node 3 comes from reserv2, which cannot make two reservations on the same CD. A possible repairing option is to replace it by reserv1, whose execution does not carry this constraint. For these three reasons (network breakdowns are brushed aside), a way to control composed Web service execution is needed. As we cannot act on resources, we decide to repair a composite Web service by replacing some Web service components by others. We point out that this does not necessarily mean that the replaced Web services are the "faulty" ones. We suppose that in the future the huge size of UDDI will be a concrete obstacle to composing another Web service that does more or less the same thing, and that it will be easier to find a simple Web service to replace some failing part. We bet, with [17], that after repairing "a few" parts, an initially defective Web service will become viable again and will fit the model again, so a new composition task will be avoided. The proposed solution relies on previous work of the same authors [3] to generate the constraints of a complex Web service execution from its definition, and to detect subsumption relations between Web services. For the present purpose, breakdowns can be represented by inconsistencies coming from the superimposition of the Web service normal execution model and the observations. A side effect of our constraint detection algorithm allows the detection of logical conflicts (we will henceforth call conflicts clashes) inside the possible executions of a Web service. With those results, we present a heuristic to replace the failing part of a failing Web service with another one, heuristically introducing as few changes in preconditions and effects as possible. We call this task a repair rather than a reconfiguration, to insist on this heuristic of minimal change. It is worth pointing out that this question lies at the meeting point of automatics and artificial intelligence (diagnosis and reconfiguration, as in [9], appear in both fields; indeed, in the research literature, a Web service is often seen as a plan). Some work on general reconfiguration exists which is not applied to Web services [9]. With regard to Web service reconfiguration, numerous works exist, such as [17] and [18]. In the first part, we present a side effect of our constraint detection algorithm which allows localizing clashes on the superimposition of the Web service normal execution model and the observations. The repairing task must raise all clashes. The second part presents a heuristic
to that aim. The third part exposes a complexity result for this task in general. The last section describes related works.
2. Modeling Web Services

A major hypothesis of our previous work is that the Web is a huge informal non-transactional database. This view is kept here, and only active Web services, those that carry out actions on this huge database as opposed to those that query information (informative Web services), are considered. The distinction has been considered in our previous work [3]. An informative Web service queries a Web database and the answer is a view, a new informative object according to Sahoo [15]. Informative and active Web services do not exhibit the same behavior. Actually, an informative Web service involves just what Web tradition calls data flow, while an active Web service is on the control flow side.

2.1. Atomic and Complex Actions

Complex Web services are seen as plans composed of atomic actions (atomic Web services). Generally, an atomic Web service is seen as an atomic action with preconditions and effects. Several ways of modeling active Web services are possible. McIlraith's work [10] uses situation calculus. Following our previous work [3], an atomic action is endowed with positive and negative precondition lists and positive and negative effect lists, similar to a STRIPS operator [5]. The determinism of actions conforms to active Web service behavior. An OWL-S fragment fits the proposed formalism.
A specific PDL: The PDL (Propositional Dynamic Logic) family is one of the first logical families for modeling programs; invented by Pratt, it has been developed by Harel [8]. A PDL logic provides a reasoning tool for dynamic systems and programs. Programs are modeled by means of state transformers and states are seen as sets of properties. Note that processes are not modeled by themselves as in process algebra. In the classical PDL version, the constructors for action formulas are: sequence, nondeterministic choice, Kleene star and test. Classical PDL is decidable and finitely axiomatizable [8]. However, we use the same particular PDL as in our previous work: with intersection and negation on atomic actions, and with neither test nor Kleene star (to keep a decidable version). This PDL version is quite close to the one in [1], was inspired by the works of [6], and is still decidable [7].
Syntax: Let Pf be a finite set of propositions and Pa a finite set of symbols for atomic actions. An action α and a state formula φ have the following form:
• α ::= a | α1 ∧ α2 | α1 ∨ α2 | α1 ; α2 | ¬a | any, with a ∈ Pa,
• φ ::= p | ⊤ | ⊥ | φ1 → φ2 | ¬φ | < α > φ | fun a, with p ∈ Pf, a ∈ Pa.
The symbol any represents any atomic action (any ≡ ⋁_{a∈Pa} a). fun is a meta-symbol constraining the semantic relation associated with an action to be functional. We adopt the standard
definitions for the remaining boolean operations on state formulas and for the box modality 6.
Basic Models of Specific PDL: A model of a PDL logic is a particular sort of Kripke structure with a family of relations for the semantic modalities. It is a triple M = (S, {Rα}, V), with: S a non-empty set of states, V a valuation function that maps each atomic formula to the subset of S where it is true, and {Rα} a family of relations on S × S. For non-atomic actions, {Rα} is inductively defined as follows:
Rany ⊆ S × S
Rα∨β = Rα ∪ Rβ, Rα∧β = Rα ∩ Rβ, R¬α = Rany \ Rα iff α is atomic, Rα;β = Rβ ◦ Rα
The satisfaction relation "formula φ holds at state s in structure M", denoted by M |=s φ, is classically defined as follows:
M |=s p iff s ∈ V(p)
M |=s ¬φ iff M ⊭s φ
M |=s φ → ψ iff (if M |=s φ then M |=s ψ)
M |=s < α > φ iff (for some s', (s, s') ∈ Rα and M |=s' φ)
M |=s fun α iff (there exists at most one s' such that (s, s') ∈ Rα).
A formula φ is said to be valid in a structure M, noted M |= φ, if and only if φ holds in all the states of M, i.e., ∀s ∈ S, M |=s φ.

2.2. Action Descriptions

Action Theory: Each α ∈ Pa is described according to its preconditions and effects:
( ⋀_{i∈I} pi ∧ ⋀_{j∈J} ¬pj ) → < α > ( ⋀_{k∈K} pk ∧ ⋀_{l∈L} ¬pl ) ∧ fun α    (1)

( ⋁_{i∈I} ¬pi ∨ ⋁_{j∈J} pj ) → [α]⊥    (2)
with I, J the sets of positive and negative preconditions and K, L the sets of positive and negative effects. The meta-symbol fun ensures that the execution of α necessarily yields α's effects. Atomic action descriptions are normalized: each fluent occurring in the precondition lists must occur either positively or negatively in one of the effect lists. Let APa be the set containing the formalization of each atomic action; it is associated to Pa and is called the action theory. For practical use, we present the action theory APa in a tableau. Figure 3 describes the atomic Web services encountered in the introductory examples.
Persistence Axioms: The PDL action formalization is not sufficient to provide the required determinism. All literals that do not occur positively or negatively in the effect lists are assumed to be invariant. As in [6], persistence axioms can be formulated in the logic (more details in [3]).
6 So φ ∧ ψ := ¬(φ → ¬ψ), φ ↔ ψ := (φ → ψ) ∧ (ψ → φ), φ ∨ ψ := ¬φ → ψ, [α]φ := ¬< α >¬φ. Having two different negation constructors, one for state formulas and one for action formulas, is customary.
Figure 3. An action theory APa representing a CD reservation and purchase system
The set of persistence axioms associated to Pa is called FPa . All these persistence axioms can be automatically generated. The set APa ∪ FPa completing the action theory is henceforth called the domain theory Γ.
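For illustration, a hypothetical atomic action buyCD over invented fluents CDres (the CD is reserved) and CDbought could be described by instantiating schemas (1) and (2) as follows; the fluent and action names are ours and are not taken from Figure 3:

(CDres ∧ ¬CDbought) → < buyCD > (CDbought ∧ ¬CDres) ∧ fun buyCD
(¬CDres ∨ CDbought) → [buyCD]⊥

The first formula states that if the CD is reserved and not yet bought, executing buyCD deterministically leads to a state where it is bought and no longer reserved; the second states that buyCD is not executable when its preconditions are violated.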
3. Computing Subsumption and Clashes

3.1. Plan Subsumption

Comparing plans in order to determine whether the first one can be an alternative to the second, being more or less applicable in the same circumstances and/or aiming at the same goal, is a form of subsumption. In a previous work, we presented two different subsumption notions. We presented a set-theoretical subsumption, noted α′ ⊆set α: an action α is more general than another action α′ if α's preconditions are included in α′'s and they have the same effects. With goal subsumption (as in [1]), classification bears on the goals that actions can reach. We could present other subsumption notions (e.g. subsumption by change: an action is more general than another if it affects more elements of the world). We select set-theoretical subsumption due to its operability for a repairing task. Indeed, it guarantees that no clashes will be generated during the replacing task. Detection of subsumption relies on the comparison of preconditions and effects of complex plans. These are obtained through a propagation of the constraints involved by elementary plans, which is described in the next subsection. Clashes and their detection are then considered.

3.2. Algorithm for Generating Active Web Service Constraints

Briefly, the constraint detection algorithm consists of a set of rules to apply. It was inspired by the tableau method. The difference here is that we do not want to check the validity of some formula but to generate formulas representing Web service preconditions and effects. Intuitively, our tableau procedure takes as input a complex action α and a domain Γ. It returns a set of structures which are aimed at describing concisely all the possible executions of α. The main notion is the constraint structure (c.s). A c.s. represents a family of similar executions of a plan. We will act on it to repair the failing plan.
Definition: A constraint structure (c.s) is a labeled graph (N, E, L) such that
• N is a set of vertices,
• E is a set of edges,
• L is a labeling function that associates to every vertex a set of literals and to every edge a set of (complex or atomic) actions.
The constraint detection algorithm begins with a constraint structure involving two nodes, xb and xf. Node xb is bound to node xf by an outgoing edge labeled by a plan α modeled in the PDL formalism, as we can see in figure 4. A set of rules is applied iteratively. These rules are of two kinds, and are schematically presented in figure 4:
- expansion rules: they correspond to action constructors. We first put them into effect.
- propagation rules: they propagate fluents in one or several c.s., corresponding to action application conditions. They are applied after expansion rules 7.
structFinal(α) is the set of c.s. obtained at the fixpoint. Each c.s. contained in structFinal(α) is called a final c.s; it may contain one or several clashes (to be defined below).
Definition: A c.s contains a clash iff there exists a vertex v in N and a literal l such that {l, ¬l} ⊆ L(v). A c.s is clash-free iff it does not contain any clash.
A side effect of the algorithm is to detect clashes on structFinal(α). struct(α) is the set of the clash-free structures obtained at the fixpoint. The method to decide subsumption is left aside here. A plan α is defective iff all final c.s. contain one or several clashes (in other words, iff struct(α) = ∅). Repairing a failing Web service execution is achieved by working on final c.s. To achieve the repairing task, all clashes must be raised from one final c.s.

3.3. Detecting Breakdowns

Several Kinds of Breakdowns: In the introduction, three kinds of breakdowns have been distinguished. These breakdowns are representable in our formalism: we can incorporate them into the nodes of final c.s 8. Resource breakdowns can be represented by adding a fluent to a node's label. In the middle of figure 1, the breakdown representation is achieved by adding ¬CDres to the third node's label. Program breakdowns can be modeled with the help of some knowledge of their effects. At the bottom of figure 1, the breakdown of reserv1 is represented in the label of node 4 by adding CDres. Each breakdown results in a clash on the final constraint structures. For all these kinds of breakdown, we treat severe failures of a Web service, those which render all possible executions defective 9. So the repairing task works on final c.s. which represent failing plan executions.
Several Kinds of Clashes: Actually, two types of clashes in a c.s. exist:
- immediate: the faulty action labels an edge (outgoing or ingoing) of the conflicting node.
- complex: due to persistence axioms, the faulty action is indirectly connected to the conflicting node.
7 Soundness and completeness of the algorithm is achieved in [3] with the help of a relation between c.s. and the canonical model of the action theory.
8 We unify breakdown representation, even if, for the last breakdown type, observations are not needed.
9 Otherwise, the Web service would still be viable.
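A direct reading of the clash definition above, as a small sketch in Java (our own illustration; the class and method names are hypothetical):

import java.util.*;

// A minimal constraint-structure sketch: vertices are labeled with sets of literals,
// and a clash is a vertex whose label contains both a literal and its negation.
class ConstraintStructureSketch {
    // Literals are stored as strings; "¬" + p denotes the negation of fluent p.
    private final Map<Integer, Set<String>> labels = new HashMap<>();

    void addLiteral(int vertex, String literal) {
        labels.computeIfAbsent(vertex, v -> new HashSet<>()).add(literal);
    }

    private static String negate(String literal) {
        return literal.startsWith("¬") ? literal.substring(1) : "¬" + literal;
    }

    // Returns the vertices where {l, ¬l} ⊆ L(v) for some literal l.
    List<Integer> clashes() {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<String>> e : labels.entrySet()) {
            for (String l : e.getValue()) {
                if (e.getValue().contains(negate(l))) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    boolean isClashFree() {
        return clashes().isEmpty();
    }
}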
Figure 4. Clash localization in a final c.s.
4. Repairing Task

The following problem is pointed out: one may not want the repairing task to change the plan's initial conditions. For example, if an agent is in Tokyo, it is less expensive for this agent to buy a ticket for Japan than to buy it from Paris. A plan could be composed that leads into the state we require (the agent first goes to Tokyo and then buys a ticket for Japan), but this plan is not very interesting from a practical point of view. In our c.s., the literals in xb are preconditions and not initial conditions. Not all precondition modifications (to get an executable plan) are necessarily acceptable. However, in our repairing task, we will not consider this kind of restriction.

4.1. Several Kinds of Repairing

Changing Actions: One way is to enable the creation of nodes and edges in the c.s. It is thus possible to replace a plan with n action formula constructors by a wider plan, with at least n + 1 action formula constructors. With this type of reconfiguration, we would get a more complex plan, and therefore a costlier one. To keep a realistic and efficient view, we impose that the repairing task will add neither node nor edge to the c.s.
Repairs Bounded by Plan Size: Wondering how many replacing operations leave the plan more or less identical, we can raise the following question: does it make sense to replace all the actions of a plan 10? Should we rather take another one? If the plan is composed of n actions, an arbitrary but reasonable bound may be set up with a repairing bound of m steps and m < n.
Repairing with Preferential Properties: Each clash can be solved by erasing a literal or its negation. Classically, it is useful to state a number of preferential properties or goals, to ensure one kind of reparation for one clash. Let G be this set.

4.2. A Heuristic

We need to find, in as little time as possible, another Web service whose execution with the current observations generates no clashes.
Selecting one Final c.s.: We remark that, for larger plans:
- if the execution contains few constraints, it will probably be easy to repair;
- fewer actions probably produce fewer clashes;
10 See the Ship of Theseus [14].
- fewer states in a c.s. probably imply fewer persistence axioms to apply.
We require selecting the final c.s. that contains fewer actions and states.
Heuristic Procedure: The heuristic depends on the subsumption criterion. It can take other subsumption criteria as parameters. Let λ be a defective plan and G a set of literals to preserve. The repairing task involves three distinct and successive steps:
- the constraint detection algorithm,
- selecting the best final c.s. to repair 11; let s be this c.s.,
- replacing operations, whose number is bounded by the number of plan actions 12.
Replacing Operation: For each clash c (on literal l) in the selected final c.s. s: if l ∈ G then eraseset(¬l) else eraseset(l), where the erase procedure is defined as follows:
eraseset(l): let x, y be vertices and α the action such that L(x, y) = α and l ∈ prec(α); if there exists α′ such that α ⊆set α′ then λ(α/α′)
with prec(α) the union of the positive and negative preconditions of α, and λ(α/α′) being λ with all occurrences of α replaced by α′.

4.3. Advantage and Drawback of the Replacing Task with Set-Theoretical Subsumption

This kind of subsumption has the beneficial particularity of adding no literal to the c.s. Figure 2 shows an example of such a repairing task. This method does not generate new clashes; indeed, all replacing actions contain fewer constraints. However, such actions may be hard to find in a UDDI-like semantic directory.
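The replacing operation can be summarized in Java as follows; this is a sketch of our own, assuming a directory lookup that returns the actions set-theoretically subsuming a given action (all type and method names are hypothetical):

import java.util.*;

// Sketch of the replacing operation of Section 4.2: for every clash, choose which literal
// to erase according to the preference set G, then replace an action whose preconditions
// contain that literal by a set-theoretically more general action from the directory.
interface Action { Set<String> preconditions(); }
interface Plan { Plan replaceAll(Action from, Action to); List<Action> actions(); }
interface Directory { List<Action> subsumingActions(Action a); }

class RepairSketch {
    static String negate(String l) { return l.startsWith("¬") ? l.substring(1) : "¬" + l; }

    // clashLiterals: one literal per detected clash; g: the literals to preserve (the set G).
    static Plan repair(Plan lambda, List<String> clashLiterals, Set<String> g, Directory dir) {
        for (String l : clashLiterals) {
            String toErase = g.contains(l) ? negate(l) : l;      // keep preferred literals
            for (Action alpha : lambda.actions()) {
                if (!alpha.preconditions().contains(toErase)) continue;
                for (Action alphaPrime : dir.subsumingActions(alpha)) {
                    lambda = lambda.replaceAll(alpha, alphaPrime);   // λ(α/α′)
                    break;
                }
                break;
            }
        }
        return lambda;
    }
}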
5. Free Repairing Task Complexity

The main idea of a free repairing task is to find a transformation of a final c.s. that makes the failing Web service viable.
Problem: Guess a c.s. transformation and then prove that each plan replacing the failing one is subsumed by or subsumes the failing one 13. Our intuition is that the upper bound is NEXPTIME, as will be proved. First, for each clash, there exist two different possibilities to raise it (erase the positive or the negative literal). For a language with k literals, at most k different clashes exist (several occurrences of the same clash are not counted here), i.e. there are at worst 2k ways to erase faulty literals. Once the literal to erase is fixed, there exist several possibilities for effectively raising it. Precisely, the number of possibilities corresponds to the number of actions containing the faulty literal in their preconditions or in their effects.
11 According to section 4.2, it is the one that contains fewer actions, states, and clashes.
12 Following the idea of section 4.1.
13 For α and β PDL plans, the subsumption decision method is defined only if α and β are not defective. Indeed, it is defined on struct(α) and struct(β) and not on structFinal(α) and structFinal(β).
5.1. Complexity

Classical graph representation: A graph may be represented by an adjacency matrix; if there are l = |V| vertices v1, . . . , vl, this is an l × l array whose (i, j)-th entry is aij = 1 if there is an edge from vi to vj, and 0 otherwise. For directed graphs the matrix takes up O(l^2) space, which is wasteful if the graph does not have many edges.
c.s. features: Here, all edges are labeled by at least one action, and all vertices are labeled 14. Vertices with their labeling must be represented, and a code must represent labelings containing clashes. If k is the language cardinality, the number of possible nodes with different labelings is higher than 2^k, the size of the canonical semantics. At the very worst, applying our algorithm to a defective Web service may generate a structFinal(α) containing u vertices, with u = (2^{k+1} − 1) × k! × Σ_{t=1}^{k−1} 1/t! 15.
Costless representation: A succinct representation of a graph with l nodes, where l = 2^b is a power of two, is a Boolean circuit C with 2b input gates. The graph represented by C, denoted GC, is defined as follows: the nodes of GC are {1, . . . , l}, and [i, j] is an edge of GC if and only if C accepts the binary representations of the b-bit integers i, j as inputs [13]. result(c) is the result column of the truth table representation of c 16. For a standard definition of circuits see for example [2].
Theorem: Let λ be a defective plan. Finding a λ′ such that there exist α1, . . . , αm and α′1, . . . , α′m with λ′ = λ(α1/α′1, . . . , αm/α′m) and, for all j ∈ {1, . . . , m}, αj ⊆set α′j, has a NEXPTIME upper bound.
Proof: Papadimitriou in [13], p. 493, has proved: "NEXP and EXP are nothing else but P and NP on exponentially more succinct input". A representation of the repairing task of exponential size must be found, the repairing task with that representation must be proved to be an NP problem, and another representation of the repairing task must be found with succinct circuits.
1. Final c.s. representation with an adjacency matrix: u is the maximal number of vertices in a final c.s. A final c.s. is represented by an adjacency matrix of size u × u.
2. NP: A transformation of the matrix must be guessed. This transformation corresponds to the execution of plan α′. Checking that for all j ∈ {1, . . . , m}, αj ⊆set α′j is a P problem 17.
3. Succinct circuit representation: Let x1, . . . , xk be the names associated to literals and xk+1, . . . , xk+m the names associated to c.s. nodes, and let there be a lexicographic order on x1, . . . , xk, xk+1, . . . , xk+m. Succinct circuits are used for testing whether two nodes ni and nj, with their respective labelings, represented by two boolean circuits ci and cj, are connected.
14 Maybe with the empty set.
15 Indeed, 2^k is the number of states with k constraints and without contradiction (canonical model). All models with fewer constraints must be added, so 2^{k+1} − 1 is the number of models without contradiction. We must add all models with one to k contradictions. It is easy to verify that this gives (2^{k+1} − 1) × k! × Σ_{t=1}^{k−1} 1/t!.
16 For example, for c(x1, x2) = x1 ∧ ¬x2, result(c) = 0010; for d(x1, x2, x3) = x1 ∨ (x2 ∧ ¬x3), result(d) = 00101111.
17 The number of executions of a plan in disjunctive normal form is n + 1, with n the number of disjunctions. The number of rules to apply to reach the fixpoint struct(λ) is bounded by the number of plan actions and by the language cardinality. Let nλ be the number of executions of the failing plan λ and nλ′ the number of executions of plan λ′; it is easy to check that there are at most m × nλ × nλ′ comparison operations.
L. Bourgois / Finding Alternatives Web Services to Parry Breakdowns
nodes, ni and nj and their respective labeling, represented with two boolean circuits, ci et cj , are connected. 6. Related Works Narayan and MacIlraith [11] works belong to the first category that do not model data flow as we do. They provide a DAML-S 18 modeling with Petri net paradigm. Some model checking tools provide some property reachability establishment. Their control flow modeling correspond to our way of modeling only active Web services. Their model checking task may be seen as a diagnosis task (checking the no contradiction property). Yet, most works model both data and control flow [17]. A complex Web service is seen as an abstract plan, a template to be instantiated. A set of abstract schema with different parameters (QoS, input, output, . . . ) exist. Web service template is instantiated using an external function which extracts information dynamically from the Web. The idea consists on component must be customizable to Web evolution. The aim is to detect bad interaction between data and control flow. They describe a family of customizable Web services. Detecting viable Web service is indicated as a research track. Classifying Web services according to their configurability degree could be combined to our subsumption criterion in order to obtain easy customizable Web service with predictable execution modalities and effects. Dynamical Web service adaptation works as [4] follows more or less the same directions. Yet, it is quite different because adaptation task is achieved by various solutions : components customization, insertion extraction, replacement. Mean adaptation solution is close to our repairing task. Indeed, one Web service can always be erased i.e. replaced by an action that does not change world states. This corresponds to a component extraction. No diagnose phase occurs in [10] and [16] works, only instantiation template with an external function called to gather data. When addressing the problem of Web services selection, our approach lays the same result than [10]. However, the task to find an alternative Web service to parry some breakdowns is not considered by McIlraith, and her approach does not seem to be appropriate to this task. Indeed, the idea of external function to model is widely shared is A.I. literature (in particularly planning) for composing and reconfiguring tasks. [16] works differ from [10] only from the way plans are instantiated, following a hierarchical planning principle. The main difference with our work resides in the presence of data flow modeling. All necessary information are supposed to be already here (with a monitoring task done before, for example) in our vision. Henceforth, there is not need to call an external function to diagnose. In fact, we believe our system is omniscient about Web evolution. Reason lays on informative Web services are not treated. If Informative Web services modeling is added to our formalism, it would return to an instantiation template. 7. Conclusions and Future Works Section 5 emphasizes that our task is not exactly a reconfiguration in the usual meaning. Template notion do not hold here. Each time observations do not coincide with the mo18 DAML-S
is a former 0WL-S version without dataflow.
The reparation of Web services aims at solving (raising) every clash. Another application of the heuristic is a logical verification task. A heuristic to repair failing Web services has been presented, and a NEXPTIME complexity upper bound has been shown. We have implemented the subsumption decision methods in Java; the implementation of clash localization is in progress. In the future, we could optimize and refine the heuristic. An open problem is to diagnose the faulty components of composite Web services. We should then model informative Web services to get closer to the reality of Web services. Indeed, we do not model message exchanges for the moment, which are however a major characteristic of Web services.
Acknowledgements
We would like to thank François Lévy and Alexandre Delteil for their helpful comments.
References
[1] P. Balbiani, D. Vakarelov, Iteration-free PDL with Intersection: a Complete Axiomatization, Fundamenta Informaticae, 2001.
[2] J.L. Balcázar, J. Díaz, J. Gabarró, Structural Complexity, Springer, Berlin, 1988.
[3] L. Bourgois, A. Delteil, F. Lévy, Web Services Subsumption with a Specific PDL, ICIW, 2006.
[4] M. Cremene, M. Riveill, C. Martel, C. Loghin, C. Miron, Adaptation dynamique de services, DECOR, 2004.
[5] M. Ghallab, D. Nau, P. Traverso, Automated Planning: Theory and Practice, Elsevier, 2004.
[6] G. De Giacomo, M. Lenzerini, PDL-based framework for reasoning about actions, Lecture Notes in Artificial Intelligence, 1995.
[7] D. Harel, Recurring dominoes: making the highly undecidable highly understandable, Foundations of Computation Theory, 1983.
[8] D. Harel, Dynamic Logic, Handbook of Philosophical Logic, 1984.
[9] S. Kogekar, S. Neema, X. Koutsoukos, Dynamic Software Reconfiguration in Sensor Networks, SENET, 2005.
[10] S. McIlraith, T.C. Son, Adapting Golog for Composition of Semantic Web Services, KRR, 2002.
[11] S. Narayan, S. McIlraith, Simulation, verification and automated composition of web services, KRR, 2002.
[12] OWL-S, http://www.w3.org/Submission/2004/SUBM-OWL-S-20041122/, 2004.
[13] C. Papadimitriou, Computational Complexity, Addison-Wesley, 1994.
[14] Plutarque, Les vies parallèles, La Pléïade, 2002.
[15] S. Sanket Sahoo, Web Services Composition (an AI-Based Semantic Approach), University of Georgia, 2004.
[16] E. Sirin, B. Parsia, Planning for Semantic Web Services, ISWC, 2004.
[17] A. Teije, F. Harmelen, B. Wielinga, Configuration of Web Services as Parametric Design, EKAW, 2004.
[18] S. van Splunter, P. van Langen, F. Brazier, The Role of Local Knowledge in Complex Web Service Reconfiguration, WIC, 2005.
Posters
Smart Ride Seeker Introductory Plan
Sameh Abdel-Naby 1 and Paolo Giorgini, University of Trento, Department of Informatics and Telecommunications (DIT) Abstract. Based on the use of locations and available car seats, car sharing systems have allowed a substantial number of people to share car rides. This paper proposes the initial phase of the Smart Ride Seeker (SRS), a car-pooling-like technique for distributing resources among a community. The SRS technique is developed through a mobile-based application that maps ride seekers' locations along with the locations of available cars on a graphical interface/map, giving the possibility to calculate the optimal path for both the ride giver and the ride seeker to fulfill their demands.
1. Research and Motivation The Car Pooling Problem is NP-Hard [1]; in our framework a vehicle routing algorithm is also applied for a unit customer demand. Therefore, SRS will involve an NP-Hard-like algorithm. For purposes of simplicity and practicality, the places where ride seekers meet ride givers are assumed to be fixed. Initially, the system will recognize only 3 stops, each of which acts as a pickup/dropping point (PDS). We assume that the system automatically assigns one of the pre-defined PDSs to ride givers (RG) or ride seekers (RS) according to their home or work locations. A common problem when routing requests within a car pooling system is handling multipoint trips. Calculating the distance between two points is mostly constrained by the time limit, so the system avoids assigning any RG whose trip would take more than the ride seeker's preferred time. Sometimes the opposite takes place, which lets us understand that the ride giver is taking a different and longer path, or perhaps stopping between the two points. In SRS, we introduce the notion of mapping all PDSs on a graphical interface that acts as a territory map, reflecting actual real-world distances. Accordingly, the system recognizes whether the ride giver is taking the appropriate path to reach the destination. In a second phase, the system recommends a certain request for a specific situation. At a later stage of the application, proper routing messages can be obtained by linking the system to maps or an address database, which would help to give up the use of fixed PDSs gradually and to rely completely on mapping users on the graphical interface and calculating the distances needed to accomplish more reliable routing requests. In the SRS initial phase, the system participants are well determined: two system actors are located at the centre of the scheme, the ride seeker and the ride giver. A car sharing system is basically concerned with fulfilling the demands of these two entities as well as processing their requests. System inputs and outputs will be exchanged between these two actors and, in
1 Correspondence to: Sameh Abdel-Naby, via Sommarive 14, I-38050 Povo, Italy. Tel.: +39 0461 8815082020; Fax: +39 0461 883964; E-mail: [email protected].
turn, the connection between the system, the seeker and the rider should be well established. Standard mobile phone interactions are taken as the method of communication, but autonomous agents negotiate the ride details and finally communicate the final agreement to the mobile user. A similar architecture was described in [2]. Our suggested scheme will use the anytime algorithm proposed in [3], which is expected to help us solve the problem of task allocation within a community of autonomous agents; as a result, agents are expected to form a coalition to better serve a certain ride seeker's request between multiple points.
2. SRS Scenario The RS goes through an initial phase to select the application interface language and the action to be taken. By selecting the "seeking ride" option, the system takes the RS to the next step: selection of the nearest PDS as well as the PDS at the final destination. The last step is selecting the ride date and time within a flexible time range. All these selections are saved in the mobile-based interface and then sent to a managing server, either by short message service (SMS) or by Bluetooth. The user then waits for a reply containing the ride details. The RG goes through two possible situations. In the first situation, upon entering his/her offer, the status of the RG is Pending during this time and s/he receives messages directly from the system. In the second situation, when a ride seeker requests a ride at a certain time and between two PDSs that a certain RG commonly offers, the system automatically sends this request to the RG regardless of his/her status in the system. To perform this function, the SRS has to maintain a logging and communication history, and save the users' destination and time records. At this point, a technique similar to agent cloning [4] can be used to facilitate recommendations of requests and referral routing. The SRS supervises four major tasks: 1) managing the request routing process and demand matching; 2) running a reputation system by asking ride takers for feedback; 3) managing a crediting system, in which a ride seeker or giver gets, presumably, five credits for free upon registering; to move from one place to another, the seeker donates two credits to the ride giver; finally, at a certain point the ride seeker has to choose either to start offering rides to others to collect credits or to buy them; and 4) supervising the methods of communication between system agents and actors.
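The matching and crediting behaviour described in this scenario can be illustrated with a small sketch. The code below is only an illustration of the ideas in this section, not the authors' implementation; the class and function names (User, RideOffer, match_request) and the exact matching rule are hypothetical.

```python
# Illustrative sketch (not the SRS implementation): matching a ride request to a
# pending offer between fixed pickup/dropping points (PDSs) and transferring credits.

INITIAL_CREDITS = 5   # credits granted on registration (as described in task 3)
RIDE_COST = 2         # credits donated by the seeker to the giver per ride

class User:
    def __init__(self, name):
        self.name = name
        self.credits = INITIAL_CREDITS

class RideOffer:
    def __init__(self, giver, origin_pds, destination_pds, time_slot):
        self.giver = giver
        self.origin_pds = origin_pds
        self.destination_pds = destination_pds
        self.time_slot = time_slot      # e.g. a (start_hour, end_hour) pair

def match_request(offers, seeker, origin_pds, destination_pds, preferred_time):
    """Return the first pending offer covering the requested PDS pair and time."""
    for offer in offers:
        start, end = offer.time_slot
        if (offer.origin_pds == origin_pds
                and offer.destination_pds == destination_pds
                and start <= preferred_time <= end
                and seeker.credits >= RIDE_COST):
            seeker.credits -= RIDE_COST          # seeker donates credits
            offer.giver.credits += RIDE_COST     # giver collects them
            return offer
    return None

# Example: one giver offering PDS1 -> PDS3 between 8:00 and 10:00.
giver, seeker = User("RG"), User("RS")
offers = [RideOffer(giver, "PDS1", "PDS3", (8, 10))]
print(match_request(offers, seeker, "PDS1", "PDS3", 9) is not None)  # True
print(seeker.credits, giver.credits)                                 # 3 7
```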
References [1] Araque, J.R., Morin, T.L., Pekny, J.F. A branch and cut algorithm for the vehicle routing problem. Ann. Oper. Res. 50, 37-59, 1994. [2] Bryl, V., Giorgini, P., Fante, S. ToothAgent: a Multi-Agent System for Virtual Communities Support. Department of Informatics and Telecommunication, University of Trento, Technical Report DIT-05-064, 2005. [3] Shehory, O., Kraus, S. Task Allocation via Coalition Formation Among Autonomous Agents. Proc. of IJCAI-95, pages 655-661, Montreal, August 1995. [4] Shehory, O., Sycara, K., Chalasani, P., and Jha, S. "Agent cloning". The Robotics Institute, Carnegie Mellon University, Pittsburgh.
Spam Filtering: the Influence of the Temporal Distribution of Training Data Anton Bryl 1 University of Trento, Italy; Create-Net, Italy Abstract. The great number and variety of learning-based spam filters proposed in recent years create the need for a many-sided evaluation of them. This paper is dedicated to evaluating the dependence of filtering accuracy on the temporal distribution of the training data. Such an evaluation may be useful for organizing effective training of the filter. Keywords. Spam filtering, machine learning, spam filter evaluation
1. Introduction The great variety of learning-based spam filters proposed in recent years results in the need for ways to evaluate and compare them thoroughly. Today the evaluation of filters mostly concentrates on measuring filtering accuracy, without considering the changeability of email and possible variations in the training process. In this paper we propose an additional feature of a spam filter to be evaluated, namely the influence of the temporal distribution of the training data on the filtering accuracy. By the temporal distribution we understand the following: how long ago, and over how long a period, the data was gathered. The evaluation of this influence is necessary because of the changeability of email. Spam is known to be changeable for several reasons, including the efforts of spammers to overcome existing filters [2]. Legitimate mail (also called ham) can also change seriously: e.g. a user may subscribe to a popular mailing list, or touch upon a hot topic in his blog and receive hundreds of comment notifications in one day. The proposed evaluation may be useful for organizing more effective filter training. The rest of the paper is organized as follows: in Section 2 we describe the experiments and discuss the results; Section 3 is a conclusion. 2. Experiments For lack of space we present here only a very brief description of the experiments; for more details see our technical report [1]. The data corpus used in this study contains 3170 messages received in the author's mailbox during seven months. The messages are in four different languages (Belarusian, Russian, English, and Italian). For this reason we chose to analyze the headers, not the bodies, of the messages. We can also mention that Lai and Tsai [3] showed that header analysis is likely 1 Correspondence to: Anton Bryl, ICT, University of Trento, via Sommarive 14, 38050 Povo (Trento), Italy. E-mail: [email protected].
to give better results than body analysis. During this study we have performed two experiments using the Naïve Bayes classifier [4]. The goal of Experiment 1 was to see how the performance changes with time after the last retraining of the filter. For this purpose a series of tests was performed. In each test the filter was trained on a set of 240 messages from one month, and then tested on all the messages from one of the following months. The goal of Experiment 2 was to see if an increase in the length of the training period, without changing the amount of training data, influences the filtering accuracy. In each test in this experiment the filter was trained on a set of 240 messages from a pair of subsequent months, and tested on all the messages from one of the following months. All the possible combinations of training and testing months were used in both experiments. In the results of Experiment 1 the accuracy is low, and changes from month to month in quite random jumps that are likely to depend on local features of the data rather than on some general rules. In Experiment 2 the filter shows clearly better results; often training on the combination of two months leads to higher accuracy than training on either of them alone. We can conclude that training Naïve Bayes on data gathered over a longer time period may improve performance in comparison with a shorter period, even if the amount of data is the same. The most probable reason for this is that during the longer time period a greater variety of both spam and ham appears in the mailbox, so the filter is trained on less specific data. A useful consequence is that an accuracy evaluation based on random splitting of a corpus into training and testing data may lead to misleading results: in such an evaluation the filter may show an accuracy different from the actual one, due to the unrealistic temporal distribution of the training data. 3. Conclusion In this paper we have presented an attempt to evaluate the dependence of the accuracy of a Naïve Bayes spam filter on the temporal distribution of the training data. The main conclusion is that the temporal distribution of the training data distinctly influences the filtering accuracy. A useful consequence of this is that random splitting of an experimental corpus into training and testing data may lead to misleading results. Possible future work includes: performing the experiments with other filters; performing tests on a corpus gathered during a longer time period; uncovering the events in the training period that influence the filter accuracy most seriously (it may be e.g. a holiday that causes a great number of greetings). In conclusion, I would like to thank my supervisors, Prof. Fabio Massacci and Prof. Enrico Blanzieri, for their support throughout my studies. References [1] A. Bryl. Learning-Based Spam Filters: the Influence of the Temporal Distribution of Training Data. Technical report DIT-06-030, 2006. Available at http://dit.unitn.it/~abryl. [2] T. Fawcett. "In vivo" spam filtering: a challenge problem for KDD, SIGKDD Explor. Newsl., v. 5, pp. 140-148, 2003. [3] C.-C. Lai, M.-C. Tsai. An empirical performance comparison of machine learning methods for spam e-mail categorization, in proc. of HIS 2004, pp. 44-48, 2004. [4] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian Approach to Filtering Junk E-Mail, Learning for Text Categorization, 1998.
An approach for evaluating User Model Data in an interoperability scenario
Francesca Carmagnola 1, Federica Cena
Department of Computer Science, University of Torino, Italy
Abstract. A common vision in the user-adaptation community states that user models should be interoperable and shareable across applications in different contexts and environments. The core idea of our approach is that the interoperability of user model data leads to more effective adaptation results only if the exchanged data are reliable. In this paper we illustrate how the use of Semantic Web techniques can support the evaluation of data reliability, and thus improve this sharing.
Introduction Nowadays the idea of personalisation is crucial in many areas, such as e-commerce, e-learning, tourism and cultural heritage, digital libraries, travel planning, etc. As a consequence, a large number of user-adapted systems (UASs) have been developed. Typically, UASs build a model of the user [1] and then implement some reasoning strategies (defined as heuristic rules, decision trees, Bayesian networks, production rules, inductive reasoning, etc.) to derive user knowledge, update the model and decide about adaptation strategies based on the model. Since users can interact with a great number of personalised systems, there is a great opportunity to share user knowledge across applications to obtain a higher understanding of the user. This is due to the "increased coverage", which means that more knowledge can be covered by the aggregated user model, because of the variety of the contributing systems [4]. The challenge of exchanging user model data among applications raises many issues, regarding (a) the modalities that make the exchange of knowledge among applications possible, and (b) the evaluation of the user model data exchanged by the applications [5]. Our approach starts from these considerations and focuses especially on the second issue. We aim at demonstrating that exchanging user model data across UASs is more useful if it comes with the possibility of evaluating i) the reputation of the provider system, and ii) the reliability of the exchanged value.
The User-Adapted System Meta-Description and the TER The core idea of our approach is that the interoperability of user model data leads to more effective adaptation results only if the exchanged data are reliable. We state that exchanging only the value of the requested user feature is not sufficient in a
1
Corresponding Author: Francesca Carmagnola, Department of Computer Science, Corso Svizzera 185, 10149, Torino, Italy; E-mail: [email protected]
scenario of interoperability, since it does not allow the requestor to evaluate i) the reliability of the exchanged value and ii) the reputation of the provider system. In order to provide the requestor with this possibility, we introduce a User-Adapted System Meta-Description [2], which identifies a user-adapted system enriched with a set of meta-information. We extended the set of metadata elements of Dublin Core2 with some elements necessary for our approach. This meta-information regards: i) the provider of the user feature, ii) the user feature itself and the corresponding value, iii) the reasoning strategies used to define the user feature value. The last one is motivated by the idea that the reasoning strategies used in deriving the value need to be known by the requestor for a complete evaluation of the trustworthiness of the value [2]. We consider all these aspects as endorsements, that is, reasons for believing or disbelieving the statement to which they are associated [3]. In the same way, in our approach we state that the evaluation of the final value depends on the evaluation of the intermediate values that lead to it. In our perspective, each requestor is furnished with a set of application-dependent heuristics to be applied to the meta-information in order to evaluate whether both the system reputation and the value reliability are high enough to use the imported datum for its adaptation goals. To support the evaluation task, we propose the use of the TER (Trustworthiness Evaluation Registry), a registry published in the network that contains a semantic representation of the user data furnished by all the systems that partake in the process of exchanging user knowledge. At the moment, all this information is represented in RDF3, since it makes it possible to specify semantics for data in a standardized, interoperable manner; however, we are working on representing it through more expressive Semantic Web formalisms, like SWRL4. The idea at the basis of the registry is that all the user-adapted systems that want to take part in the process of exchanging user knowledge should register themselves in the TER and provide all the required meta-information. The requestor that looks for the value of the feature x for the specific user X queries the registry. The TER gives as an answer not only the searched value, but also the full set of meta-information, which will be used by the requestor to evaluate the system reputation and the value reliability.
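The registry lookup and the requestor-side heuristic evaluation described here can be illustrated with a small sketch. This is only an illustration, not the authors' system: the structure of a TER entry, the field names and the thresholds are hypothetical, and the RDF representation mentioned above is replaced by plain Python dictionaries for brevity.

```python
# Illustrative sketch (not the TER implementation): a registry entry carrying the
# meta-information (provider, feature, value, reasoning strategy) and a requestor-side
# heuristic that decides whether the imported value is reliable enough to use.

TER = [
    {
        "provider": "MuseumGuideUAS",          # hypothetical provider system
        "user": "X",
        "feature": "interest_in_modern_art",
        "value": 0.8,
        "reasoning_strategy": "bayesian_network",
        "provider_reputation": 0.9,
    },
]

# Application-dependent heuristic: how much the requestor trusts each strategy.
STRATEGY_RELIABILITY = {"bayesian_network": 0.8, "heuristic_rules": 0.5}

def lookup(user, feature):
    """Query the registry for a user feature; return value plus meta-information."""
    return [e for e in TER if e["user"] == user and e["feature"] == feature]

def reliable_enough(entry, min_reputation=0.7, min_reliability=0.6):
    """Requestor-side evaluation of provider reputation and value reliability."""
    reliability = STRATEGY_RELIABILITY.get(entry["reasoning_strategy"], 0.0)
    return entry["provider_reputation"] >= min_reputation and reliability >= min_reliability

for entry in lookup("X", "interest_in_modern_art"):
    if reliable_enough(entry):
        print("use value", entry["value"], "from", entry["provider"])
```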
References [1] Brusilovsky, P., Maybury, M.T, From Adaptive Hypermedia to the Adaptive Web, Communications of the ACM, may 2002/vol. 45, no. 5 [2] Carmagnola, F., Cena, F. From Interoperable User Models to Interoperable User Modeling, to appear in the proceedings of AH 2006, Dublin, Ireland, June 2006. [3] Cohen, P., Heuristic Reasoning about Uncertainly: An artificial Intelligence Approach, Morgan Kaufmann Publishers, Inc., Los Altos, California, 1985 [4] Heckmann, D. Ubiquitous User Modeling. PhD thesis, Department of Computer Science, Saarland University, Germany, 2005 [5] Maximilien, E.M, Edmond D.MP. Conceptual Model for Web Service Reputation, SIGMOD 31, 2002
2 http://dublincore.org/documents/dc
3 http://www.w3.org/RDF/
4 http://www.daml.org/2003/11/swrl/
On the improvement of brain tumour data clustering using class information
Raúl Cruz a,b and Alfredo Vellido a,1
a Soft Computing Group, Univ. Politècnica de Catalunya, Barcelona, Spain
b Grup d'Aplicacions Biomèdiques de la RMN, Univ. Autònoma de Barcelona, Spain
Abstract. Cluster analysis can benefit from the use of class labels, if these are available, in a semi-supervised approach. In this study, we propose the integration of class information in the clustering of Magnetic Resonance Spectra (MRS) corresponding to human brain tumours using an extension of Generative Topographic Mapping (GTM) that behaves robustly in the presence of outliers.
1. Introduction The current study concerns the exploratory analysis of human brain tumour MRS data through clustering. In cluster analysis using mixture models, multivariate t-distributions are an alternative to Gaussians for their robust behaviour in the presence of outliers. The GTM [1] was redefined in [2] as a constrained mixture of t-distributions, termed t-GTM. In this paper, we extend the t-GTM to account for class information in a semi-supervised way (class-t-GTM). For the MRS data analyzed in this study, the available class labels describe different brain tumour types.
2. Adding class information to t-GTM The GTM, a probabilistic alternative to SOM, is a non-linear latent variable model defined as a mapping from a low dimensional latent space onto the multivariate data space, taking the form y = Φ(u)W, where Φ are basis functions. If these basis functions are defined as Gaussians, outliers are likely to negatively bias the estimation of the adaptive parameters. To overcome this limitation, the GTM was recently redefined [2] as a constrained mixture of t-distributions: the t-GTM. The MRS data described in section 3 correspond to different types of brain tumours that can be seen as class information. Class separability might be improved if the clustering model accounted for the available class information [3]. For t-GTM, this entails the calculation of the class-conditional responsibilities:

$$\hat{z}_{kn}^{c} = \frac{p(x_n, c_n \mid u_k)}{\sum_{k'=1}^{K} p(x_n, c_n \mid u_{k'})} = \frac{p(x_n \mid u_k)\, p(c_n \mid u_k)}{\sum_{k'=1}^{K} p(x_n \mid u_{k'})\, p(c_n \mid u_{k'})} = \frac{p(x_n \mid u_k)\, p(u_k \mid c_n)}{\sum_{k'=1}^{K} p(x_n \mid u_{k'})\, p(u_{k'} \mid c_n)} \qquad (1)$$

1 Corresponding author. E-mail: [email protected]
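Equation (1) can be read directly as code. The following is only an illustrative sketch of how the class-conditional responsibilities might be computed from given component likelihoods, not the authors' implementation; the array names and the use of numpy are assumptions.

```python
import numpy as np

def class_responsibilities(px_given_u, pc_given_u, c_n):
    """Class-conditional responsibilities of Eq. (1) for one data point x_n.

    px_given_u : array of shape (K,), p(x_n | u_k) for each latent centre u_k
    pc_given_u : array of shape (K, C), p(c | u_k) for each class c
    c_n        : index of the class label of x_n
    """
    joint = px_given_u * pc_given_u[:, c_n]   # p(x_n | u_k) p(c_n | u_k)
    return joint / joint.sum()                # normalise over k' = 1..K

# Tiny example with K = 3 latent centres and C = 2 classes.
px = np.array([0.2, 0.5, 0.1])
pc = np.array([[0.9, 0.1],
               [0.3, 0.7],
               [0.5, 0.5]])
print(class_responsibilities(px, pc, c_n=1))  # responsibilities sum to 1
```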
3. Human brain tumour data and experiments The available data consist of 98 MRS for five tumour types and for cystic regions (associated with tumours); the latter are likely to be outliers. The goal of the study is to find out whether the available class information can improve the class separation in the clustering results. Class separability is quantified using an entropy-like measure, with results summarized in Table 1. We first separate the data into cysts versus the five tumour types reduced to a single class (Brain Simplified). These data are visualized in Fig.1 (left plots). Overlapping is present in t-GTM, whereas cysts are completely isolated by class-t-GTM in the bottom-left corner. The full tumour typology (Brain) is visualized in Fig.1 (right plots). Class-t-GTM, again, is shown to reduce the level of tumour type overlapping. Fig.1 (right-right) shows that class-t-GTM neatly separates cysts from the rest of the tumours.

Table 1. Entropy for the models and data sets analyzed. Results are reported only for three values of ν.

                       Class-t-GTM                     t-GTM
Data set               ν = 1    ν = 2    ν = 3         ν = 1    ν = 2    ν = 3
Brain Simplified       0.330    0.299    0.312         0.329    0.098    0.119
Brain                  0.524    0.521    0.533         0.523    0.516    0.492
4. Conclusion In this study, we have clustered human brain tumour MRS data using a variant of t-GTM that accounts for class information. Experiments have shown that class-t-GTM improves tumour data class discrimination.
Figure 1. (Left plots) t-GTM latent space representation of Brain Simplified: Cysts are circles and tumours points. (left-left): t-GTM; (left-right): class-t-GTM. (Right plots) Representation of Brain: Cysts (circles); astrocytomas (black dots); glioblastomas (black rhombus); metastases (five-pointed stars); meningiomas (white rhombus); oligodendrogliomas (asterisks). (right-left): t-GTM; (right-right): class-t-GTM.
References
[1] C.M. Bishop, M. Svensén, C.K.I. Williams, GTM: The Generative Topographic Mapping, Neural Computation 10 (1998), 215-234.
[2] A. Vellido, Missing data imputation through GTM as a mixture of t-distributions, Neural Networks, in press.
[3] Y. Sun, P. Tiňo, I. Nabney, Visualization of incomplete data using class information constraints, in J. Winkler, M. Niranjan (eds.), Uncertainty in Geometric Computations, Kluwer Academic Publishers, The Netherlands, 165-174, 2002.
Endowing BDI Agents with Capability for Modularizing Karl Devooght France Telecom R&D, 2 avenue Pierre Marzin, 22300 Lannion (France) [email protected] (+33) (0)2.96.05.22.46 Keywords. BDI Agent, Modularity, Capability, Ability, Opportunity, Know-how
Intelligent agents are required to deal with an increasingly wide variety of applications (dialoguing agents [4], air-traffic control [3], etc.). Consequently, one endows agents with functionalities coping with these different applicative contexts. Such functionalities are characterized by the description of related domains underlying the actions that an agent can perform. In BDI logic-based approaches to reasoning about actions, the domain of a functionality is described by a set of logical formulas. Generally, the set of all domain descriptions is included in a unique BDI logical model [2]. So, at any moment while a particular agent is reasoning, all logical formulas describing the domains can potentially be taken into account although they are unsuitable for the current situation. In addition, technically speaking, when one defines a particular domain, one must adapt it to the other existing ones in order to keep a coherent and consistent logical model. In terms of agent development, this does not facilitate the implementation and the reuse of domains. We believe this motivates an agent modelling in which domains are arranged in a modular way. Furthermore, the elements of these domains should be included in the agent's reasoning when it is pertinent. First, we consider that a domain must be described regardless of the other existing domains and propose that it is a full BDI logical model. This notably enables domains to be defined independently. Secondly, we distinguish two kinds of logical models for agents. On the one hand, we have models describing what an agent does, i.e. traditional models. On the other hand, we have models describing what an agent imagines doing, i.e. counterfactual models. In particular, we believe that the models transcribing domains are of the second kind, i.e. they come from the imagination and/or the experience of the agent. Based on these two assumptions, we claim that a general structure of a BDI model of an agent (based on a Kripkean semantics) would consist of a traditional logical model M expressing the current mental activity of the agent and of a set of counterfactual logical models Mα representing how the agent imagines the different domains of his functionalities α. To explain the link between M and Mα we introduce a notion of capability, and in particular three subsequent notions: ability, opportunity and know-how. Let us define these notions informally here. An agent i has the ability to realize a functionality α, in some world w of M, if and only if there exists a related model Mα
and a particular event (or action) e in Mα going from an initial state of the world to a final state of the world. Given that Tom has the ability to paint a door, this means that he can imagine a state of the world from which there is a possible course of events (that he executes) enabling him to be in a situation where the door is painted. The role of opportunity is to express the compatibility of the current mental activity of the agent, in particular his current mental state, with the model related to the functionality. This compatibility is checked on the basis of some feasibility conditions specific to the functionality. We say that an agent has the opportunity to realize a functionality α, in some world w of M, if and only if the feasibility conditions for Mα are satisfied in w. So, if the only feasibility condition for painting a door is to have a brush and a paint pot, and if Tom has them in the current state of the world, we can say that Tom has the opportunity to paint a door. To fill the gap between M and Mα completely, we define the operator know-how and a related logical model M+α, which is the model resulting from the combination of elements of Mα and of M in order to integrate the functionality α into the current activity. In this way, we can dynamically activate a functionality α in the reasoning of an agent i. An agent i has the know-how to realize a functionality α, in some world w of M, if and only if there exists a world wα of Mα such that the couple applied to a function f returns a set of worlds including a world w+α of M+α, where f takes two worlds w1 and w2 as parameters and returns a set of worlds that are possible combinations of w1 and w2. So, in our perspective, knowing-how expresses how the content of the ability should be translated into the current model M. Imagine that Tom, in his imagination of painting, uses oil paint and that, in the current state of the world, he has a particular oil paint pot whose trademark is "Genius Paint". Know-how specifies in this case that the imagined oil paint will be subsumed by the "Genius Paint" oil paint. Finally, an agent i has the capability to realize a functionality α if he has the ability, the opportunity and the know-how to realize it. Lack of any of these three notions implies "cannot". There is often a mismatch between them due to the intuitive counterfactual aspect they induce [5]. The definitions introduced here deal with that. In addition, we propose a way to isolate the description of functionalities and to activate them dynamically, which we believe offers interesting perspectives, based on an intuitive notion, the capability, on the lack of a formal account of modularity in BDI agent systems [1].
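The three-part definition of capability can be illustrated with a small sketch. This is only an illustration of the informal definitions above, not a formalization of the author's logic: the data structures for worlds and models, and the predicate names, are hypothetical.

```python
# Illustrative sketch (not the author's formal semantics): capability as the
# conjunction of ability, opportunity and know-how for a functionality alpha.

def ability(alpha, counterfactual_models):
    """There is a model M_alpha with an event leading from an initial to a final world."""
    m = counterfactual_models.get(alpha)
    return m is not None and len(m["events"]) > 0

def opportunity(alpha, current_world, feasibility_conditions):
    """The feasibility conditions of alpha hold in the current world w of M."""
    return all(cond(current_world) for cond in feasibility_conditions.get(alpha, []))

def know_how(alpha, current_world, counterfactual_models, combine):
    """Some imagined world of M_alpha can be combined with w into a world of M+alpha."""
    m = counterfactual_models.get(alpha)
    if m is None:
        return False
    return any(combine(current_world, w_alpha) for w_alpha in m["worlds"])

def capability(alpha, current_world, models, conditions, combine):
    return (ability(alpha, models)
            and opportunity(alpha, current_world, conditions)
            and know_how(alpha, current_world, models, combine))

# Toy example: Tom and the "paint_door" functionality.
models = {"paint_door": {"worlds": [{"paint": "oil"}], "events": ["paint"]}}
conditions = {"paint_door": [lambda w: w.get("has_brush"), lambda w: w.get("has_paint_pot")]}
combine = lambda w, wa: dict(w, **wa)            # a non-empty combined world
w = {"has_brush": True, "has_paint_pot": True}
print(capability("paint_door", w, models, conditions, combine))  # True
```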
References [1] P. Busetta, N. Howden and A. Hodgson. Structuring BDI Agents in Functional Clusters, International Conference of Agent Theories, Architectures and Languages, 277–289, 1999. [2] P.R. Cohen, H.J. Levesque. Intention is Choice with Commitment, Artificial Intelligence 42, 213–26, 1990 [3] A. Rao and M.P. Georgeff. BDI Agents: From Theory to Practice, International Conference on Multiagent Systems, 312–319, 1995. [4] D. Sadek, P. Bretier and F. Panaget. ARTIMIS : Natural Dialogue Meets Rational Agency, Proceedings of International Joint Conference of Artificial Intelligence (2), 1030–1035, 1997. [5] M. Singh. A Logic of Situated Know-How, Proceedings of International Joint Conference of Artificial Intelligence, 1991.
Rational Agents under ASP in Games theory Fernando Zacarías Flores 1, Dionicio Zacarías Flores, José Arrazola Ramírez, Rosalba Cuapa Canto Universidad Autónoma de Puebla Abstract. An implementation of the game called "ONE", with imperfect information and chance outcomes, based on rational agents is presented. Keywords. Answer Set Programming, Rational Agents, Games theory
1. Introduction Agents are one of the most prominent and attractive technologies in computer science at the beginning of the new millennium. The technologies, methods, and theories of agents and multiagent systems are currently contributing to many diverse domains such as: electronic commerce, computer games, education, etc. They not only are a very promising technology, but are also emerging as a new way of thinking, a conceptual paradigm for analyzing problems and for designing systems, for dealing with complexity, distribution, and interactivity, while providing a new perspective on computing and intelligence [1].
2. The game of the ONE A popular game of the eights group, it consists of 108 cards including four suits of different cards plus wild cards and special cards for skipping the next player, reversing the direction of play and making the next player draw cards. Every player picks a card; the person who picks the highest number deals. Action Cards count as zero for this part of the game. Once the cards are shuffled, each player is dealt 7 cards. The remainder of the deck is placed face down to form a DRAW pile. The top card of the DRAW pile is turned over to begin a DISCARD pile. If an Action Card is the first one turned up from the DRAW pile, certain rules apply; we briefly describe the functions of the action cards. The person to the left of the dealer starts play. He/she has to match the card on the DISCARD pile, either by number, color or symbol. For example, if the card is a red 7, the player must put down a red card or a 7 of any color. Alternatively, the player can put down a Wild card. If the player does not have a card to match the one on the DISCARD pile, he/she must take a card from the DRAW pile. If the card picked up can be played, the player is
free to put it down in the same turn. Otherwise, play moves on to the next person in turn. Players may choose not to play a playable card from their hand. If so, the player must draw a card from the DRAW pile. If playable, that card can be put down in the same turn, but the player may not use a card from the hand after the draw. 3. General strategies of the game 1) Maximize the number of cards that we can play in a turn. For example: if we have in our hand the cards (5, blue), (3, yellow), (7, blue), (3, blue) and the top card is (3, blue), then our strategy tells us that we have more possibilities to play a card in the next turn if we now play a blue one. Our analysis tells us that it is better to play by color than by number. 2) Another strategy is to punish our opponent, that is, to play a card with a negative effect on the opponent's game. Again, this strategy is a set of many strategies. The best card for punishment is a skip card; considering that we are playing a two-player game, if we want to punish we have to start with a skip card, then use a reverse card, and finally use a draw-two card. This chain of actions must be analyzed carefully, because if we have many of these punishment cards we have to use them in a particular order that lets us play the largest number of cards without letting our opponent play any card. We do not use the Wild Draw 4 card as a punishment card; we reserve it as a card to save us when the opponent is about to win or when we do not have any card that matches the top card of the DISCARD pile. 3) Drawing from the deck is not an option, so we have to analyze our hand and select a play order that guarantees the largest number of future turns. This consideration slightly changes the way we punish our opponent, because if we start a punishment rampage and end up with no card to put on top of the DISCARD pile, that forces us to draw; we must create a strategy that does not lead us into a self block. 4) The Wild card is used before the Wild Draw 4 card, unless the opponent is about to win. 5) The last strategy is generated using the Answer Set Programming paradigm to decide which strategy to use or in which order to combine several strategies. 4. Conclusions The rational agent has an almost perfect performance in the development of its game, obtaining a high percentage of games won against human users. Of 100 games played between the agent and 10 different users, the agent won 95 games. On the other hand, of 50 games played between the agent and 3 different expert users (humans) of this game, the agent won 40 games, which shows that the agent has a rational component with a highly competitive performance. It is very important to point out that our agent has a rational component sufficiently wide to allow it to develop diverse strategies. This makes our agent versatile and gives it a high level of play. References [1] Fernando Zacarías, José Arrazola, Dionisio Zacarías, Rosalba Cuapa and Antonio Sánchez. Intelligent agents in the games theory using Answer Set Programming, accepted in International Journal of Information Technology and Intelligent Computing, IEEE Computational Intelligence Society - Poland Chapter, vol. 1, No. 1, Poland, 2006.
Automatic Generation of Natural Language Parsers from Declarative Specifications1 Carlos Gómez-Rodríguez a,b , Jesús Vilares b and Miguel A. Alonso b a
Escuela Superior de Ingeniería Informática, Universidade de Vigo (Spain) e-mail: [email protected] b Departamento de Computación, Universidade da Coruña (Spain) e-mail: {jvilares, alonso}@udc.es
1. Introduction Parsing schemata, described in [2], provide a formal, simple and uniform way to describe, analyze and compare different parsing algorithms. The notion of a parsing schema comes from considering parsing as a deduction process which generates intermediate results called items. An initial set of items is directly obtained from the input sentence, and the parsing process consists of the application of inference rules which produce new items from existing ones. Each item contains a piece of information about the sentence’s structure, and a successful parsing process will produce at least one final item containing a full parse tree for the sentence or guaranteeing its existence. Almost all known parsing algorithms may be described by a parsing schema. Parsing schemata are located at a higher abstraction level than algorithms. A schema specifies the steps that must be executed and the intermediate results that must be obtained in order to parse a given string, but it makes no claim about the order in which to execute the steps or the data structures to use for storing the results. Their abstraction of low-level details makes parsing schemata very useful, allowing us to define and study parsers in a simple and straightforward way. However, when we want to actually test a parser by running it on a computer, we need to implement it in a programming language, so we have to abandon the high level of abstraction and worry about implementation details that were irrelevant at the schema level. The technique presented in this paper automates this task, by compiling parsing schemata to Java language implementations of their corresponding parsers. The input to the compiler is a simple and declarative representation of a parsing schema, and the output is an efficient executable implementation of its associated parsing algorithm. 1 Partially supported by Ministerio de Educación y Ciencia and FEDER (Grants TIN2004-07246-C03-01, TIN2004-07246-C03-02), Xunta de Galicia (Grants PGIDIT05PXIC30501PN, PGIDIT05PXIC10501PN and PGIDIT05SIN044E), and Programa de becas FPU (Ministerio de Educación y Ciencia).
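The item-and-inference-rule view of parsing described above can be made concrete with a small sketch. The generic agenda-driven engine and the CYK-style deduction step below are only an illustration of the deductive-parsing idea (in the spirit of [1]), not the authors' compiler or its generated Java code; the function names and the grammar encoding are assumptions.

```python
# Illustrative sketch (not the schema compiler): a generic deductive engine plus a
# CYK-style deduction step. An item (A, i, j) states that symbol A derives words i..j-1.

def deduce(initial_items, steps):
    """Exhaustively apply inference steps until no new items can be generated."""
    chart, agenda = set(), list(initial_items)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        for step in steps:
            for new_item in step(item, chart):
                if new_item not in chart:
                    agenda.append(new_item)
    return chart

def make_cyk_step(binary_rules):
    """binary_rules maps (B, C) to the set of symbols A with a rule A -> B C."""
    def step(item, chart):
        A, i, k = item
        produced = []
        for (B, j, l) in chart:
            if j == k:   # item spans i..k, other spans k..l: combine as A B over i..l
                produced += [(X, i, l) for X in binary_rules.get((A, B), ())]
            if l == i:   # other spans j..i, item spans i..k: combine as B A over j..k
                produced += [(X, j, k) for X in binary_rules.get((B, A), ())]
        return produced
    return step

# Toy grammar in CNF: S -> NP VP, NP -> "we", VP -> "parse"
binary_rules = {("NP", "VP"): {"S"}}
lexicon = {"we": "NP", "parse": "VP"}
sentence = ["we", "parse"]
initial = {(lexicon[w], i, i + 1) for i, w in enumerate(sentence)}
chart = deduce(initial, [make_cyk_step(binary_rules)])
print(("S", 0, len(sentence)) in chart)   # True: a full parse item was deduced
```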
2. From declarative descriptions to program code These are the fundamental ideas behind our compilation process for parsing schemata: • Each deductive step is compiled to a class containing code to match and search for antecedent items and produce the corresponding conclusions from the consequent. • The step classes are coordinated by a deductive parsing engine, as the one described in [1]. This algorithm ensures a sound and complete deduction process, guaranteeing that all items that can be generated from the initial items will be obtained. • In order to attain efficiency, an automatic analysis of the schema is performed in order to create indexes allowing fast access to items. As each different parsing schema needs to perform different searches for antecedent items, the index structures we generate are schema-specific. In this way, we guarantee constant-time access to items so that the computational complexity of our generated implementations is never above the theoretical complexity of the parsers. • Since parsing schemata have an open notation, for any mathematical object can potentially appear inside items, the system includes an extensibility mechanism which can be used to define new kinds of objects to use in schemata. 3. Experimental results We have used our technique to generate implementations of three popular parsing algorithms for context-free grammars: CYK, Earley and Left-Corner2 , and tested all of them with sentences from three different natural language grammars from real corpora: Susanne, Alvey and Deltra. Since we are interested in measuring and comparing the performance of the parsers, not the coverage of the grammars; we have generated random input sentences of different lengths for each of these grammars. The obtained performance measurements show that the empirical computational complexity of the three algorithms is always below their theoretical worst-case complexity of O(n3 ), where n denotes the length of the input string. This empirical complexity is achieved thanks to the automatic indexing techniques used by the code generator, which guarantee constant-time access to items. Our results also show that not all algorithms are equally suitable for all grammars. CYK is the fastest algorithm for larger grammars thanks to its lower computational complexity with respect to grammar size when compared to Earley and Left-Corner; and the efficiency difference between the latter two heavily depends on the way the grammar has been designed. The compilation technique described in this paper is useful to prototype different natural language parsers and easily see which one is better suited for a given application. References [1] Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1–2):3–36, July-August 1995. [2] Klaas Sikkel. Parsing Schemata — A Framework for Specification and Analysis of Parsing Algorithms. Texts in Theoretical Computer Science — An EATCS Series. Springer-Verlag, Berlin/Heidelberg/New York, 1997. 2 However, we must remark that we are not limited to working with context-free grammars, since parsing schemata can be used to represent parsers for other grammar formalisms as well.
Reconsideration on Non-Linear Base Orderings
Frances Johnson a,1 and Stuart C. Shapiro a SUNY at Buffalo, CSE Dept., Buffalo, NY, USA flj|[email protected] Abstract. Reconsideration is a belief change operation that re-optimizes a finite belief base following a series of belief change operations—provided all base beliefs have a linear credibility ordering. This paper shows that linearity is not required for reconsideration to improve and possibly optimize a belief base. Keywords. Base belief change, knowledge base optimization, reconsideration
Reconsideration, as defined in [2] (and discussed in [3] in these proceedings), reoptimizes a finite belief base in an implemented system following a series of belief change operations, provided the base beliefs have a linear credibility ordering; but ordering all the base beliefs in a knowledge system is impractical. This paper shows that linearity is not required for reconsideration to improve and possibly optimize a belief base.2 A belief base, for implementation purposes, is a finite set of core (or base) beliefs that are input to the system. Any implemented system that can perform expansion (adding a new belief to the base) and consolidation (removing beliefs from the base to restore consistency [1]) can perform reconsideration. We define the minimally inconsistent subsets of a base B as NAND-sets; a NANDset that is a subset of the current base is called active and makes that base inconsistent. Consolidation of B (written B!) uses a decision function to select the base beliefs (called culprits) to be removed (unasserted). In addition to (and assumed to be consistent with) any pre-existing credibility ordering, the selected culprits are considered strictly weaker than other members of their NAND-sets that were not removed. The system must store all base beliefs (asserted and unasserted) in a set called B ∪ in order to perform reconsideration, which is the consolidation of all base beliefs (B ∪ !) and is independent of the current B. An unasserted culprit is JustifiedOut if its return raises an inconsistency that can be resolved only by removing either that culprit or some stronger belief. We define an optimal base by assuming a consistent base is preferred over any of its proper subsets and a belief p is preferred over multiple beliefs (e.g., q, v) that are strictly weaker than p: p ≻ q; p ≻ v; ∴ {p} ≻ {q, v}. If the pre-order defines a least element for all NAND-sets, the following algorithm yields an optimal base. Let B be the set of all non-culprit base beliefs in B ∪ . For each culprit p (in non-increasing order of credibility): if p is not JustifiedOut, reset B ← B ∪ p. After each pass through the for-loop: 1 Correspondence to: Frances Johnson, SUNY at Buffalo, CSE Department, 201 Bell Hall, Buffalo, NY 14260-2000, USA. Tel.: +1 716-998-8394; E-mail: fl[email protected]; Relocating to Cycorp, June 2006. 2 See [2] (or [3]) for a detailed (or brief) discussion of the benefits of reconsideration.
Table 1. Revision and reconsideration on a total pre-order of beliefs using six different adjustment strategies (Standard, Maxi-adjustment, Hybrid, Global, Linear, Quick), as implemented in SATEN [4]: the base B is revised by ¬a (.95), then by a (.98), and then reconsideration is performed; the strategies produce varied results for revision and reconsideration. For a full discussion, cf. [2].

Base B (identical for all strategies), by degree:
  .95: a∨b
  .90: a∨f
  .40: a∨d, ¬b∨¬d, d, e, f
  .20: ¬g∨¬b, ¬d∨g

For each strategy the table then lists the revised bases (B + ¬a)! and ((B + ¬a) + a)! and the base after reconsideration, ((B + ¬a)! + a)!; after reconsideration, three of the six bases are optimal, two are improved and one is unchanged.
1. if q is a culprit and q ≻ p, q was processed during an earlier pass; 2. all NAND-sets with p as a least element are not active and will remain so through the end of the algorithm; 3. if p is JustifiedOut, it will remain so through the end of the algorithm. When the loop exits, we know that: • all unasserted culprits are JustifiedOut; • the resultant base, B, is consistent (no NAND-set is active); • the resultant base, B, is optimal (∀B′ ⊆ B∪: B′ ≠ B ⇒ B ≻ B′).
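The reconsideration loop described above can be illustrated with a small sketch. This is only an illustration of the procedure as described in the text, not the authors' implementation: the representation of beliefs, NAND-sets and credibilities as Python sets and dictionaries is an assumption, and the JustifiedOut test is simplified (a culprit is kept out if its return would activate a NAND-set in which it is among the weakest members).

```python
# Illustrative sketch (not the authors' system): reconsideration over a finite set of
# base beliefs B_all, their minimally inconsistent subsets (NAND-sets) and a
# credibility pre-order given as numeric degrees.

def justified_out(p, B, nand_sets, degree):
    """Simplified test: returning p would activate a NAND-set in which p is weakest."""
    for N in nand_sets:
        if p in N and N - {p} <= B:                      # N would become active
            if all(degree[q] >= degree[p] for q in N):   # only p (or stronger) can fix it
                return True
    return False

def reconsider(B_all, culprits, nand_sets, degree):
    """Rebuild an improved base from all base beliefs, asserted or not."""
    B = set(B_all) - set(culprits)                       # start from the non-culprits
    for p in sorted(culprits, key=lambda q: -degree[q]): # non-increasing credibility
        if not justified_out(p, B, nand_sets, degree):
            B.add(p)                                     # return p to the base
    return B

# Toy example: y was removed because of x, and x was later removed because of z;
# reconsideration recovers y while keeping x JustifiedOut.
B_all = {"x", "y", "z"}
nand_sets = [frozenset({"x", "y"}), frozenset({"x", "z"})]
degree = {"x": 0.95, "y": 0.40, "z": 0.98}
print(reconsider(B_all, culprits={"x", "y"}, nand_sets=nand_sets, degree=degree))
# -> the base {'y', 'z'} (set order may vary)
```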
When the minimal beliefs of a NAND-set number more than one, base optimality is harder to define, but reconsideration can still help improve a base (possibly to a clearly optimal state). Table 1 shows reconsideration on a total pre-order for six different decision functions implemented in SATEN [4]. Five bases improved—three to optimal. Systems with non-linear credibility orderings can benefit from implementing reconsideration. We have implemented an anytime, interleavable algorithm for reconsideration in an existing reasoning system (cf. [2]).
References [1] S. O. Hansson. A Textbook of Belief Dynamics, volume 11 of Applied Logic. Kluwer, Dordrecht, The Netherlands, 1999. [2] F. L. Johnson. Dependency-Directed Reconsideration: An Anytime Algorithm for Hindsight Knowledge-Base Optimization. PhD thesis, Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, January 2006. [3] F. L. Johnson and S. C. Shapiro. Base belief change and optimized recovery. In Proceedings of STAIRS'06 at ECAI'06, Amsterdam, 2006. IOS Press. [4] M.-A. Williams and A. Sims. SATEN: An object-oriented web-based revision and extraction engine. In C. Baral and M. Truszczyński, editors, Proceedings of the 8th International Workshop on Non-Monotonic Reasoning NMR'2000, 2000. CoRR article: cs.AI/0003059.
Dynamic Abstraction for Hierarchical Problem Solving and Execution in Stochastic Dynamic Environments Per Nyblom Department of Computer and Information Science, Linköping university, Sweden, email: [email protected]
1. Introduction Most of today's autonomous problem-solving agents perform their task with the help of problem domain specifications that keep their abstractions fixed. Those abstractions are often selected by human users. We think that the approach with fixed-abstraction domain specifications is very inflexible, because it does not allow the agent to focus its limited computational resources on what may be most relevant at the moment. We would like to build agents that dynamically find suitable abstractions depending on relevance for their current task and situation. This idea of dynamic abstraction has recently been considered an important research problem within the area of hierarchical reinforcement learning [1]. 2. Algorithm We have developed an algorithm that is designed to use techniques for dynamic abstraction targeted at adaptive problem generation. It operates in a hierarchical manner and interleaves problem solving and execution. The algorithm is a template and needs to be augmented with domain-specific dynamic abstraction methods, solution techniques and subproblem generation. The algorithm uses a dynamic hierarchy of solutions which is called a Hierarchical Solution Node (HSN) structure, where each node represents an abstraction level with a corresponding problem model. An HSN structure is somewhat similar to the task graph used in the MAXQ value function decomposition [2]. Replanning and modifications to the problem model abstraction are performed continuously depending on whether the current abstractions are considered invalid or not, which makes the HSN structure change dynamically. State changes may trigger the creation of new subtasks. This results in a new problem model abstraction and replanning for that particular task. 3. Problem Domain and Implementation We have implemented the algorithm for a domain inspired by our unmanned aerial vehicle (UAV) research [3], where we show how dynamic abstraction can be done in practice.
The domain consists of a freely moving agent in a continuous 2D environment without obstacles. The agent's task is to maximize its total reward, which is increased by classifying moving targets and by finishing at so-called finish areas. The reward is decreased when the agent comes too close to any of the moving dangers in the environment. The movement of the targets and dangers is stochastic, and they can either be constrained to move on a road network or operate freely. A simple fixed abstraction scheme will eventually fail in this domain due to the curse of dimensionality when the number of objects increases. We have therefore developed a method to find suitable abstractions dynamically. The problem model abstractions are in this case the possible discretizations of the different features in the domain, such as the position features of the dangers and targets. The discretizations are limited by a maximum state space size. A utility measure for discretizations, based on the features' expected relevances in the current situation, is used to state the abstraction selection as an optimization problem. For example, the relevance of a danger's position feature depends on its distance from the agent and its ability to inflict negative reward. Each feature's utility increases with the number of discrete values it can take in the final discretization. The total utility, which is the measure that is maximized, is the sum of all the features' utility functions. The optimization problem is solved by hill climbing in our current implementation and results in a specification of how many states each feature should get in the final discretization. The state space is then divided by k-means clustering together with the agent's internal simulation model of the environment. The same simulation model is used to solve the problem that is defined by the discretization with the DynaQ [4] model-based reinforcement learning algorithm. When a problem is solved on the selected abstraction level, subtasking is performed by creating a new subproblem that corresponds to the first step in the solution. Replanning is performed either when the abstraction is considered too old or when the relevances of the features in the current state differ too much from the ones used in the abstraction. Experiments with our implementation indicate that the abstractions must be replaced frequently for good results in this particular domain. It might therefore be more efficient to use a forward search method instead of reinforcement learning for solving the problems.
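The relevance-based selection of discretizations under a state-space budget can be illustrated with a small sketch. This is only an illustration, not the author's implementation: the form of the per-feature utility (relevance times the logarithm of the number of values) and the greedy hill-climbing variant are assumptions.

```python
import math

# Illustrative sketch (not the author's system): allocate discrete values to features
# so that total utility is maximized while the joint state space stays within a budget.

def total_utility(levels, relevance):
    # Assumed utility form: each feature's utility grows with its number of values,
    # weighted by its expected relevance in the current situation.
    return sum(relevance[f] * math.log(levels[f]) for f in levels)

def state_space_size(levels):
    size = 1
    for n in levels.values():
        size *= n
    return size

def select_discretization(relevance, max_states):
    levels = {f: 1 for f in relevance}                 # coarsest possible abstraction
    improved = True
    while improved:
        improved = False
        best_f, best_gain = None, 0.0
        for f in levels:                               # hill climbing over single-feature
            candidate = dict(levels)                   # refinements
            candidate[f] += 1
            if state_space_size(candidate) > max_states:
                continue
            gain = total_utility(candidate, relevance) - total_utility(levels, relevance)
            if gain > best_gain:
                best_f, best_gain = f, gain
        if best_f is not None:
            levels[best_f] += 1
            improved = True
    return levels

# Example: a nearby danger is highly relevant, a distant target much less so.
relevance = {"danger_position": 0.9, "target_position": 0.3, "agent_fuel": 0.1}
print(select_discretization(relevance, max_states=200))
```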
References [1] A. G. Barto and S. Mahadevan, ‘Recent advances in hierarchical reinforcement learning.’, Discrete Event Dynamic Systems, 13(4), 341–379, (2003). [2] T. Dietterich, ‘Hierarchical reinforcement learning with the MAXQ value function decomposition’, in Proceedings of the 15th International Conference on Machine Learning, (1998). [3] P. Doherty, ‘Advanced research with autonomous unmanned aerial vehicles’, Proceedings on the 9th International Conference on Principles of Knowledge Representation and Reasoning, (2004). [4] R. S. Sutton, ‘Integrated architectures for learning, planning, and reacting based on approximating dynamic programming’, in Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, (1990).
A comparison of two machine-learning techniques to focus the diagnosis task 1 Oscar PRIETO a,2 , Aníbal BREGÓN a a Intelligent Systems Group (GSI), Department of Computer Science, University of Valladolid, Spain Abstract. This work considers a time series classification task: fault identification in dynamic systems. Two methods are compared: i) Boosting and ii) K-Nearest Neighbors with Dynamic Time Warping distance. Keywords. Machine-learning, Boosting, Dynamic Time Warping
1. Introduction Continuous processes are a good example of real systems where machine-learning techniques can be used to focus a model-based diagnostician. In these systems we can consider fault identification as a task of classifying the time series of the observed variables involved in the process, while the model-based diagnostician performs fault detection. We have used two machine-learning techniques to classify multivariate time series. First, we were able to deduce a set of literal-based classifiers through the Boosting technique [3]. Later on, we applied the K-Nearest Neighbors algorithm using Dynamic Time Warping as the distance measure [1]. In this work we compare both approaches and describe their characteristics and results. 2. Machine Learning Techniques for Fault Identification DTW [2] provides a dissimilarity measure between two time series which may not be aligned in time. It was combined with the K-Nearest Neighbor algorithm. Boosting [5] performs sequential learning from several classifiers. In this work we used boosting with very simple base classifiers: interval literals. A detailed description of the method and the predicates is available in [4]. 3. Results We have worked with a laboratory plant that resembles common features of industrial continuous processes [1]. The study was made on a data set made up of several examples obtained from simulations of the different classes of faults that could arise in the plant. We used 14 fault modes. Each fault is characterized by 11 time series (each one coming from the available measurements). 1 This work has been partially funded by Spanish Ministry of Education and Culture, through grant DPI2005-
08498, and Junta Castilla y León VA088A05. 2 Correspondence to: Oscar Prieto, Intelligent Systems Group (GSI), Department of Computer Science, E.T.S.I. Informática, University of Valladolid, Campus Miguel Delibes s/n, 47011 Valladolid, Spain. Tel.: +34 983423670; Fax: +34 983423671; E-mail: [email protected].
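The DTW-based nearest-neighbor technique compared in Section 2 can be illustrated with a small sketch. This is only an illustration, not one of the implementations evaluated in the paper: the univariate DTW recurrence and the way multivariate series are handled (summing per-variable distances) are assumptions, and the boosting side is omitted.

```python
import numpy as np

def dtw_distance(s, t):
    """Dynamic Time Warping distance between two univariate sequences."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def multivariate_dtw(a, b):
    """Assumed handling of the measured variables: sum of per-variable DTW distances."""
    return sum(dtw_distance(a[v], b[v]) for v in range(len(a)))

def knn_classify(query, train_series, train_labels, k=1):
    """k-Nearest-Neighbors fault identification with DTW as the dissimilarity measure."""
    dists = [multivariate_dtw(query, s) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy example: two fault classes, each described by a 2-variable time series.
f1 = [np.array([0, 0, 1, 2, 3]), np.array([1, 1, 1, 1, 1])]
f2 = [np.array([3, 2, 1, 0, 0]), np.array([0, 0, 0, 0, 0])]
query = [np.array([0, 1, 2, 3, 3]), np.array([1, 1, 1, 0, 1])]
print(knn_classify(query, [f1, f2], ["fault_1", "fault_2"], k=1))   # fault_1
```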
Table 1. Upper table: percentage of error (mean and standard deviation) for each method using 30%, 40%, 50% and 100% of the series length. Lower table: t-test matrix (* means that the column method is significantly better than the row method).

                               30%            40%            50%            100%
Technique                      Mean   Dev.    Mean   Dev.    Mean   Dev.    Mean   Dev.
1-neighbor, DTW distance       43.92  8.25    12.14  5.63    8.92   5.12    8.57   5.37
3-neighbors, DTW distance      42.14  6.69    15.71  6.11    12.85  4.82    11.42  5.27
5-neighbors, DTW distance      46.78  8.98    16.78  5.06    16.42  6.11    15     6.02
Boosting with literals         62.85  7.93    18.57  9.19    7.14   4.45    2.85   3.68

The lower table is a t-test matrix over the techniques a = 1-neighbor with DTW distance, b = 3-neighbors with DTW distance, c = 5-neighbors with DTW distance, and d = Boosting & literals, for the 30%, 40%, 50% and 100% settings; a * marks the pairs where the column method is significantly better than the row method.
4. Discussion and Conclusions

We have used 30%, 40% and 50% of the full time series because the classifiers need to be invoked as soon as possible. Each simulation started in a steady state and lasted 900 seconds, with faults arising between 180 and 300 seconds. Using 30% of the series, K-Nearest Neighbors is significantly better than Boosting; however, in both cases the percentage of error is too high. With 40%, the two techniques do not give significantly different results, and the results could be acceptable for the considered application. For 50% and the full time series, the results obtained with Boosting are significantly better than those obtained with 3 and 5 Nearest Neighbors. Boosting and 1 Nearest Neighbor have similar error percentages, so we have to consider the computational cost of each technique. Since early detection of the fault is what matters most in our application, the runtime cost is more important than the training cost. The runtime cost is O(n·l) for Boosting and O(n²·m·v) for K-Nearest Neighbors with DTW, where l is the number of literals, v is the number of variables, m is the number of training time series, and n is the time series length. The complexity is higher for K-Nearest Neighbors with DTW than for Boosting, so we conclude that Boosting is the better classification approach for this problem.

References
[1] A. Bregón, M.A. Simon, J.J. Rodriguez, C. Alonso, B. Pulido, and I. Moro. Early fault classification in dynamic systems using case-based reasoning. In Post-proceedings of CAEPIA'05, Santiago de Compostela, Spain, 2006.
[2] E. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3):358–386, 2005.
[3] B. Pulido, J.J. Rodriguez Diez, C. Alonso González, O. Prieto, E. Gelso, and F. Acebes. Diagnosis of continuous dynamic systems: integrating consistency-based diagnosis with machine-learning techniques. In XVI IFAC World Congress, Prague, Czech Republic, 2005.
[4] J. J. Rodríguez, C. J. Alonso, and H. Boström. Boosting interval based literals. Intelligent Data Analysis, 5(3):245–262, 2001.
[5] Robert E. Schapire. A brief introduction to boosting. In 16th IJCAI, 1999.
Argumentation Semantics for Temporal Defeasible Logic
Régis Riveret (a), Guido Governatori (b), and Antonino Rotolo (a)
(a) CIRSFID, University of Bologna; (b) School of ITEE, University of Queensland

Temporal Defeasible Logic extends Defeasible Logic (DL) [1] to deal with temporal aspects. This extension proved useful in modelling temporalised normative positions [3] and retroactive rules, which make it possible to obtain conclusions holding at a time instant that precedes the time of application of the rules themselves [4]. Time is added in two ways. First, a temporalised literal is a pair l:t where l is a literal and t is an instant of time belonging to a discrete, totally ordered set of instants T = {t1, t2, ...}. Intuitively, the meaning of a temporalised literal l:t is that l holds at time t. Second, rules are partitioned into persistent and transient rules according to whether the consequent persists until an interrupting event occurs or is co-occurrent with the premises. Hence, D = (T, F, Rp, Rt, ≻) is a temporal defeasible theory, where T is the set of instants, F a set of facts, ≻ a superiority relation over rules, and Rp and Rt the sets of persistent and transient rules. Given a rule r ∈ Rp such as a:t ⇒p b:t′, if r is applicable, we can derive b holding at t′ and at any t′ + n, until this inference is blocked, for example, by deriving ¬b at a certain time t′ + m; given a rule r′ ∈ Rt such as a:t ⇒t b:t, we derive b at a given t only if a holds at t as well. The proof tags of DL must be duplicated: ±∆p l:t and ±∂p l:t mean, respectively, that l:t is/is not definitely and defeasibly provable persistently; ±∆t l:t and ±∂t l:t mean, respectively, that l:t is/is not definitely and defeasibly provable transiently. On the other hand, DL can also be interpreted in terms of interacting arguments, which provides it with an argumentation semantics [2]. Argumentation systems are of particular interest in AI & Law, where notions like argument and counter-argument are very common. For example, a recent development of such semantics is represented by argumentation and mediation systems which assist users in expressing and organising their arguments, in assessing their impact on controversial legal issues, or in building up an effective interaction in dialectical contexts [5]. So far, the logic has only been formalised in a proof-theoretic setting in which arguments play no role. Our purpose is to provide an argumentation semantics for temporal DL. Note that we can dispense with the superiority relation of standard DL, since a modular transformation similar to that given in [1] can be defined that empties the superiority relation.

1 The full version of this paper is available at http://eprint.uq.edu.au/archive/00003954/01/Stairs06Long.pdf.
2 Corresponding Author: Régis Riveret, CIRSFID, University of Bologna, 40121 Bologna, Italy; E-mail: rriveret@cirsfid.unibo.it. The first and third authors were supported by the European project for Standardized Transparent Representations in order to Extend Legal Accessibility (ESTRELLA, IST-4-027655); the second author was supported by the Australian Research Council under Discovery Project No. DP0558854 on “A Formal Approach to Resource Allocation in Service Oriented Marketplaces”.
In line with [2], an argument for a temporalised literal p:t is a proof tree (or monotonic derivation) for that temporalised literal in DL. Nodes are temporalised literals; arcs connecting nodes correspond to rules. We introduce a new type of connection, represented by “dashed arcs”, which are meant to connect a permanent literal to its successors in time. Types of arguments are distinguished according to the rules used: supportive arguments are finite arguments in which no defeater is used, strict arguments use strict rules only, permanent arguments are arguments whose ending arc either is a dashed arc or corresponds to a persistent rule, while in transient arguments the ending arc corresponds to a transient rule. The notion of attack between arguments is defined as follows: a set of arguments S attacks a defeasible argument B if there is an argument A in S that attacks B, that is, such that (i) a:ta is a conclusion of A, (ii) a:ta is not the conclusion of a dashed arc, and (iii) b:tb is a conclusion of B, a is in conflict with b (typically a is the complement of b), and either ta = tb, or there exists a dashed arc in B with premise b:tb′ and conclusion b:tb such that tb′ < ta ≤ tb. This definition of attack allows us to reuse the standard definitions given in [2] of arguments being undercut and arguments being acceptable. Based on these concepts we proceed to define justified arguments, i.e., arguments that resist any refutation. Accordingly, a literal p:t is defined as transiently/permanently justified if it is the conclusion of a supportive and transient/permanent argument in the set of justified arguments JargsD of a theory D. That a literal p:t is justified means that it is provable (+∂). However, DL also makes it possible to express that a conclusion is not provable (−∂). This last notion is captured by assigning the status of rejected to arguments. Roughly, an argument is rejected if it has a rejected sub-argument or it cannot overcome an attack from a justified argument. More generally, a literal is transiently/permanently rejected if it is transiently/permanently rejected by JargsD. One of our results is that permanent and transient defeasible conclusions can be characterised as follows:

THEOREM 1 Given a theory D and its set of justified arguments JargsD,
• D ⊢ +∂p (p:t) iff p:t is permanently justified;
• D ⊢ −∂p (p:t) iff p:t is permanently rejected by JargsD;
• D ⊢ +∂t (p:t) iff p:t is transiently justified;
• D ⊢ −∂t (p:t) iff p:t is transiently rejected by JargsD.
Hence, Theorem 1 provides an ambiguity-blocking argumentation semantics for permanent and transient defeasible conclusions.
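To illustrate the temporal attack condition defined above, here is a small sketch that encodes temporalised literals, arcs (including dashed persistence arcs), and the check of the conditions on a:ta and b:tb. It is not the authors' formalisation; the data structures and the simple complement-based conflict test are assumptions made only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TLit:
    """Temporalised literal l:t."""
    lit: str   # e.g. "b" or "~b"
    t: int

@dataclass(frozen=True)
class Arc:
    """An arc of an argument: premise -> conclusion; dashed arcs model persistence in time."""
    premise: TLit
    conclusion: TLit
    dashed: bool = False

def conflicts(a: str, b: str) -> bool:
    # typical case: a is the complement of b
    return a == "~" + b or b == "~" + a

def attacks(concl_A: TLit, concl_A_dashed: bool, arg_B_arcs: list, concl_B: TLit) -> bool:
    """Check whether A, with conclusion a:ta, attacks B on its conclusion b:tb."""
    if concl_A_dashed:                       # (ii) a:ta must not be the conclusion of a dashed arc
        return False
    if not conflicts(concl_A.lit, concl_B.lit):
        return False
    if concl_A.t == concl_B.t:               # ta = tb
        return True
    # otherwise: a dashed arc in B with premise b:tb' and conclusion b:tb, and tb' < ta <= tb
    for arc in arg_B_arcs:
        if (arc.dashed and arc.conclusion == concl_B
                and arc.premise.lit == concl_B.lit
                and arc.premise.t < concl_A.t <= concl_B.t):
            return True
    return False
```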
References
[1] G. Antoniou, D. Billington, G. Governatori, and M.J. Maher. Representation results for defeasible logic. ACM Transactions on Computational Logic, 2, pages 255–287, 2001.
[2] G. Governatori, M.J. Maher, D. Billington, and G. Antoniou. Argumentation semantics for defeasible logics. Journal of Logic and Computation, 14, pages 675–702, 2004.
[3] G. Governatori, A. Rotolo, and G. Sartor. Temporalised normative positions in defeasible logic. In Proc. ICAIL05. ACM, New York, 2005.
[4] G. Governatori, M. Palmirani, R. Riveret, A. Rotolo, and G. Sartor. Norm modifications in defeasible logic. In M. Moens, editor, Proc. Jurix’05. IOS Press, Amsterdam, 2005.
[5] B. Verheij. Virtual Arguments: On the Design of Argument Assistants for Lawyers and Other Arguers. T.M.C. Asser Press, The Hague, 2005.
NEWPAR: An Optimized Feature Selection and Weighting Schema for Category Ranking
Fernando Ruiz-Rico and Jose-Luis Vicedo
University of Alicante, Spain
Abstract. This paper presents an automatic feature extraction method for category ranking. It has been evaluated using Reuters and OHSUMED data sets, outperforming some of the best known and most widely used approaches. Keywords. Category ranking, text classification, vectorial model, SVM
1. Extracting Expressions from Documents

Documents are processed to extract only relevant expressions. Words are reduced to their roots, and sentences are divided into expressions: single nouns, or adjectives followed by a noun. To decide how relevant each expression ej is for a particular category ci, the following values are computed: TLj (number of characters in ej), TFij (number of times ej occurs in all the training documents for ci), DFij (number of training documents for ci in which ej occurs) and CFj (number of categories in which ej occurs).
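As an illustration of how these values could be collected from a training set, the sketch below computes TL, TF, DF and CF; the dictionary-based layout of the training data and the function name are assumptions, not part of the paper.

```python
from collections import defaultdict

def collect_statistics(training_docs):
    """training_docs: dict mapping category ci -> list of documents,
    where each document is a list of already-extracted expressions ej."""
    TL = {}                    # TL[ej]: number of characters in ej
    TF = defaultdict(int)      # TF[(ci, ej)]: occurrences of ej in ci's training documents
    DF = defaultdict(int)      # DF[(ci, ej)]: training documents of ci in which ej occurs
    CF = defaultdict(set)      # categories in which ej occurs
    for ci, docs in training_docs.items():
        for doc in docs:
            for ej in set(doc):
                DF[(ci, ej)] += 1
            for ej in doc:
                TF[(ci, ej)] += 1
                TL[ej] = len(ej)
                CF[ej].add(ci)
    return TL, TF, DF, {ej: len(cats) for ej, cats in CF.items()}
```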
2. Building Category Prototype Vectors

A vector is built for each category using the expressions obtained from documents as dimensions. Some considerations taken into account are:
• Averaged TF and DF values over each category and over the whole set of categories are used as a threshold for dimensionality reduction.
• Expressions representing more than half of the categories are not considered discriminative enough and are discarded.
• A category descriptor is provided to identify the topic of each category. Thus, finding expressions that match these descriptors in a document enhances the relation between the document and the category.
• Since the amount of training data differs substantially among categories, TF, DF and TL are normalized to contain the proportion between the total number and the averaged value over all of the expressions in the category.

1 Fernando Ruiz Rico. University of Alicante, SPAIN. E-mail: [email protected]
2 Jose Luis Vicedo González. University of Alicante, SPAIN. E-mail: [email protected]
• A document’s title usually summarizes the contents of the full document in only one sentence. The frequency of an expression in a document is doubled for expressions that appear in the title of the document.

3. Weighting Expressions in New Documents

To get documents represented as vectors, the category prototypes obtained are used to choose the optimal weight wij of an expression ej for the category ci:

wij = (TFnormij + DFnormij) · TLnormij · TFnewj / CFj        (1)
where TFnewj stands for the frequency of the expression ej in the document being represented. The norm suffix means that the values are normalized.

4. Evaluation

Documents in the Reuters and OHSUMED collections have been ranked with two algorithms and compared to other high-performing feature selection criteria (see Tables 1 and 2).
• Sum of weights. From Eq. (1), each document d in the test set is represented as a set of vectors, one per category: d1 = {w11, ..., w1n}, d2 = {w21, ..., w2n}, ..., dm = {wm1, ..., wmn}. This way, categories can be ranked by using a simple sum of weights as a similarity measure: Σ_{j=1..n} wij. The higher this value is, the more related document d and category ci are.
• SVM. Every document d in both the training and test sets is represented as a single vector: d = {w1, ..., wn}. The weight wj of the expression ej is calculated from Eq. (1) as the sum of the individual weights over all categories ci: wj = Σ_{i=1..m} wij.
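The following sketch shows how Eq. (1) and the sum-of-weights ranking described above could be put together; the way prototype statistics are stored and looked up is an assumption made for the example, not the authors' code.

```python
def weight(ej, ci, stats, tf_new):
    """Eq. (1): w_ij = (TFnorm_ij + DFnorm_ij) * TLnorm_ij * TFnew_j / CF_j."""
    tf_norm, df_norm, tl_norm, cf = stats[(ci, ej)]   # precomputed, normalized prototype values
    return (tf_norm + df_norm) * tl_norm * tf_new / cf

def rank_categories(document, categories, stats):
    """Sum-of-weights ranking: score each category by the sum over expressions of w_ij."""
    scores = {}
    for ci in categories:
        scores[ci] = sum(
            weight(ej, ci, stats, tf_new)
            for ej, tf_new in document.items()        # document: expression -> TFnew_j
            if (ci, ej) in stats
        )
    return sorted(scores, key=scores.get, reverse=True)
```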
Table 1. Results for Reuters-21578 (ModApte split)

Algorithm            OneErr   AvgP    maxF1
Rocchio [1]          14.48    90.00   85.00
Perceptron [1]        9.59    91.00   89.00
MMP (1.1) [1]         8.65    94.00   90.00
Sum of weights        8.09    94.29   95.56
SVM [2]               8.82    93.53   94.87

Table 2. Results for OHSUMED (heart diseases)

Algorithm              OneErr   AvgP
BoosTexter.RV [3]      29.21    77.40
Centroid Booster [3]   27.31    77.90
SVM light [3]          26.14    79.00
Sum of weights         21.49    82.58
SVM [2]                22.82    80.96
References
[1] Koby Crammer and Yoram Singer. A new family of online algorithms for category ranking. In Proceedings of SIGIR-02, 25th ACM International Conference on Research and Development in Information Retrieval, 2002.
[2] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[3] Michael Granitzer. Hierarchical text classification using methods from machine learning. Master’s thesis, Graz University of Technology, 2003.
Challenges and Solutions for Hierarchical Task Network Planning in E-Learning
Carsten Ullrich (a) and Okhtay Ilghami (b)
(a) DFKI GmbH, Germany; (b) University of Maryland, USA
1. Motivation

This paper describes a collaboration between two PhD students. The first author is developing a course generator (CG) for Web-based e-learning environments (WBLE) [3]. A CG generates sequences of learning objects according to the learner’s goals and individual properties. The assembly process is knowledge intensive: assembling courses that implement modern pedagogical theories requires a framework that makes it possible to represent complex pedagogical strategies. The strategies are implemented using the Hierarchical Task Network (HTN) planner JSHOP2 [1], developed by the second author. In HTN planning, the goal of the planner is to achieve an ordered list of top tasks, where each task is a symbolic representation of an activity to be performed. The planner formulates a plan by applying methods that decompose these top tasks into smaller and smaller subtasks until primitive tasks are reached that can be carried out directly. When we applied the HTN framework to WBLE, we encountered several challenges, which we describe in this paper together with our solutions. We believe that this work is of general interest, as today’s WBLE are excellent examples of distributed information systems and illustrate the challenges that can arise in complex scenarios.
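To illustrate the decomposition loop just described, here is a minimal, generic HTN sketch; it is not JSHOP2 and not the authors' course-generation domain, and the task names, the dictionary encoding of methods and operators, and the absence of preconditions and backtracking are all simplifying assumptions.

```python
# methods map a compound task to possible decompositions (lists of subtasks);
# operators are the primitive tasks that can be carried out directly.
methods = {
    "teach(concept)": [["show_definition(concept)", "practice(concept)"]],
    "practice(concept)": [["present_exercise(concept)", "give_feedback(concept)"]],
}
operators = {"show_definition(concept)", "present_exercise(concept)", "give_feedback(concept)"}

def decompose(tasks):
    """Left-to-right decomposition of an ordered list of top tasks into a primitive plan."""
    plan = []
    agenda = list(tasks)
    while agenda:
        task = agenda.pop(0)
        if task in operators:          # primitive: goes straight into the plan
            plan.append(task)
        else:                          # compound: replace it by the subtasks of a method
            subtasks = methods[task][0]
            agenda = subtasks + agenda
    return plan

print(decompose(["teach(concept)"]))
# ['show_definition(concept)', 'present_exercise(concept)', 'give_feedback(concept)']
```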
2. Challenges and Solutions

Vast Amounts of Resources. Typically, a WBLE uses vast amounts of learning objects (LOs), potentially distributed over distinct repositories, each of which may contain thousands of LOs. This creates a problem because a CG needs to reason about the LOs, but traditional AI planning requires evaluating a method’s preconditions against the planner’s world state. In a naive approach, this would require mirroring in the world state all the information available about the resources in all the repositories. In real-world applications this is simply infeasible. Additionally, only a subset of all the stored resources may be relevant for the planning, but which subset is unknown beforehand. A possible solution is to access the information on demand by using external functions. An external function calculates information not directly available in the world state using procedures not native to the planning algorithm. Thus, instead of matching the operator’s (or method’s) preconditions against the logical atoms that make up the world state, the preconditions invoke function calls that return possible substitutions.

Distributed and Heterogeneous Resources. To make things even more difficult, despite standardization efforts, each repository often uses a (at least partly) different knowledge representation format. In Information Integration, these difficulties
are tackled using a mediator architecture: a mediator acts as a link between the application and resource layers, thus providing a uniform query interface to a multitude of autonomous data sources. It translates queries for a specified set of connected repositories and passes the translated queries to the repositories. The advantage of the mediating component is that the querying component, i.e., the planner, does not need to know the specific representations and query languages of the data sources, because the mediator translates the queries. The planner simply accesses the mediator using an external function. However, one needs an abstract representation of the resources (a mediated representation) used during planning, and mappings to the representations used in the repositories. For the CG, we developed such a representation [2].

Third-Party Services. In WBLE, a vast range of services that support the learning process in various ways has been developed. A course should integrate these services in a pedagogically sensible way: during the learning process, using a tool will be more beneficial at some specific times than at others. The problem is that in a Web-based environment the availability of services may vary, and one wants to avoid using a different domain description for each potential configuration. Therefore, the methods need to encode in their preconditions whether a service is available, which is easily realized by using external functions. In case a service is not available and plan generation should not fail, fallback methods that specify alternative actions should be provided. The advantage of adding fallback methods is that the domain description remains reusable, regardless of the actual configuration. In the case of course generation, several methods encode the knowledge of when in the course the learner should preferably use a learning-support tool, and insert corresponding calls to the tool, or simply text, at the appropriate place in the course.
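The sketch below illustrates, in a generic and JSHOP2-independent form, the two patterns described above: an external function that queries the mediator on demand, and a method with a fallback when a third-party service is unavailable. The endpoint format, the availability check, and all names are assumptions introduced for this example.

```python
import urllib.request

def query_repository(mediator_url, constraints):
    """External function: fetch candidate learning objects from the mediator on demand,
    instead of mirroring every repository in the planner's world state."""
    # hypothetical endpoint; in practice the mediator translates the query per repository
    with urllib.request.urlopen(f"{mediator_url}?q={constraints}") as resp:
        return resp.read().decode().splitlines()

def service_available(service_url):
    """External function used in a precondition: is the third-party tool reachable?"""
    try:
        urllib.request.urlopen(service_url, timeout=2)
        return True
    except OSError:
        return False

def insert_exercise_step(learner, mediator_url, tool_url):
    """Method with a fallback: prefer the interactive tool, otherwise fall back to plain text."""
    if service_available(tool_url):
        return [("call_tool", tool_url, learner)]
    # fallback keeps the domain description reusable when the tool is unavailable
    los = query_repository(mediator_url, "type=exercise")
    return [("present_text", los[0] if los else "static_exercise")]
```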
3. Conclusion

The HTN framework was used to implement pedagogical strategies formulated by didactical experts. The experts were very comfortable with expressing their knowledge in a hierarchical manner, which eased the formalization into planning operators and methods. The current implementation consists of about 250 methods, 20 operators, and 40 axioms. A technical analysis yielded satisfactory results: generating a course takes between several seconds and a minute, which is acceptable. The CG is already used in several schools and universities, and evaluations assessing its pedagogical effectiveness are underway. To conclude, a principal characteristic of our solutions to the described challenges is that they do not require extending the planning algorithm, as they are built “on top” of it. The extensive use of external functions allows us to access information stored in large and heterogeneous resources whenever necessary, and to flexibly integrate learning-support tools.
References
[1] O. Ilghami. Documentation for JSHOP2. Technical Report CS-TR-4694, Department of Computer Science, University of Maryland, February 2005.
[2] C. Ullrich. The learning-resource-type is dead, long live the learning-resource-type! Learning Objects and Learning Designs, 1(1):7–15, 2005.
[3] C. Ullrich. Course generation based on HTN planning. In Proc. of the 13th Workshop of the SIG Adaptivity and User Modeling in Interactive Systems, pages 74–79, 2005.
Invited Talks
Artificial Intelligence and Unmanned Aerial Vehicles
Patrick DOHERTY
Linköping University, Sweden

Abstract. The emerging area of intelligent unmanned aerial vehicle (UAV) research has shown rapid development in recent years and offers a great number of research challenges within the artificial intelligence and knowledge representation disciplines. In my talk I will present some of the research currently being pursued and results achieved in our group at Linköping University, Sweden. The talk will focus on artificial intelligence techniques used in our UAV systems and the support for these techniques provided by the software architecture developed for our UAV platform, a Yamaha RMAX helicopter. Additional focus will be placed on some of the planning and execution monitoring functionality developed for our applications in the areas of photogrammetry and emergency services assistance. The talk will include video demonstrations of both single and multi-platform missions and a possible live demonstration of a micro-UAV, the LINKMAV, developed in our group.
Writing a Good Grant Proposal
C. GHIDINI
SRA, ITC-irst, Trento, Italy
Abstract. Writing a good research grant proposal is not easy. In this talk I will try to collect together information and suggestions on how to deal with this difficult task, and give an overview of the sources of funding available. The talk is based on guidelines issued by different funding agencies as well as on my experience and personal view on the subject.
Author Index

Abdel-Naby, S. 247
Alonso, M.A. 259
Antoniou, G. 132
Antonucci, A. 120
Bourgois, L. 232
Bregón, A. 265
Bryl, A. 249
Carmagnola, F. 251
Cena, F. 251
Conte, R. 38
Cozman, F.G. 120
Cruz, R. 253
Cuapa Canto, R. 257
Devooght, K. 255
Di Nuovo, A.G. 50, 84
Doherty, P. 275
Dubois, D. 196
Evrim, V. 72
Flouris, G. 132
Fortin, J. 196
Ghidini, C. 276
Giorgini, P. 247
Gómez-Rodríguez, C. 259
Governatori, G. 26, 267
Gu, Y. 144
Hatzilygeroudis, I. 96
Ide, J.S. 120
Ilghami, O. 271
Jannach, D. 208
Johnson, F. 162, 261
Kittler, J. 3
Klapaftis, I.P. 174
Kokash, N. 220
Konjović, Z. 62
Koutsojannis, C. 96
Kovačević, A. 62
Kuster, J. 208
Lavrac, N. 184
Manandhar, S. 174
McLeod, D. 72
Milosavljević, B. 62
Nixon, P. 72
Nyblom, P. 263
O’Donovan, J. 72
Pagliarecci, F. 14
Palesi, M. 84
Paolucci, M. 38
Patti, D. 84
Penserini, L. v
Peppas, P. v
Perini, A. v
Plexousakis, D. 132
Prieto, O. 265
Pulina, L. 108
Ramírez, J.A. 257
Riveret, R. 267
Rotolo, A. 267
Ruiz-Rico, F. 269
Shapiro, S.C. 162, 261
Shevchenko, M. 3
Smyth, B. 72
Song, I. 26
Soutchanski, M. 144
Tolar, J. 184
Trajkovski, I. 184
Turrini, P. 38
Ullrich, C. 271
Vellido, A. 253
Vicedo, J.-L. 269
Vilares, J. 259
Windridge, D. 3
Zacarías Flores, D. 257
Zacarías Flores, F. 257
Zaffalon, M. 120
Zelezny, F. 184