European Robotics Symposium 2006 [1 ed.] 9783540326885, 3-540-32688-X

This unique reference represents a cross-section of forefront robotics research, ranging from robotics and systems to le


English Pages 209 [208] Year 2006


Table of contents:
An Ontology of Robotics Science....Pages 1-14
A Multi-agent System Architecture for Modular Robotic Mobility Aids....Pages 15-26
How to Construct Dense Objects with Self-Reconfigurable Robots....Pages 27-37
In Situ Autonomous Biomechanical Characterization....Pages 39-50
Incremental Learning of Task Sequences with Information-Theoretic Metrics....Pages 51-63
Representation, Recognition and Generation of Actions in the Context of Imitation Learning....Pages 65-77
Reduction of Learning Time for Robots Using Automatic State Abstraction....Pages 79-92
Efficient Failure Detection for Mobile Robots Using Mixed-Abstraction Particle Filters....Pages 93-107
Optimization and Fail-Safety Analysis of Antagonistic Actuation for pHRI....Pages 109-118
Bilateral Control of Different Order Teleoperators....Pages 119-127
Navigation and Planning in an Unknown Environment Using Vision and a Cognitive Map....Pages 129-142
Exploiting Distinguishable Image Features in Robotic Mapping and Localization....Pages 143-157
Image-Based Visual Servoing for Mobile Robots with Catadioptric Camera....Pages 159-170
Multi-knowledge Approach for Mobile Robot Identification of a Changing Environment....Pages 171-180
Robust Monte-Carlo Localization Using Adaptive Likelihood Models....Pages 181-194
Metric Localization with Scale-Invariant Visual Features Using a Single Perspective Camera....Pages 195-209

Springer Tracts in Advanced Robotics Volume 22 Editors: Bruno Siciliano · Oussama Khatib · Frans Groen

Springer Tracts in Advanced Robotics Edited by B. Siciliano, O. Khatib, and F. Groen

Vol. 21: Ang Jr., H.; Khatib, O. (Eds.): Experimental Robotics IX. 618 p. 2006 [3-540-28816-3]
Vol. 20: Xu, Y.; Ou, Y.: Control of Single Wheel Robots. 188 p. 2005 [3-540-28184-3]
Vol. 19: Lefebvre, T.; Bruyninckx, H.; De Schutter, J.: Nonlinear Kalman Filtering for Force-Controlled Robot Tasks. 280 p. 2005 [3-540-28023-5]
Vol. 18: Barbagli, F.; Prattichizzo, D.; Salisbury, K. (Eds.): Multi-point Interaction with Real and Virtual Objects. 281 p. 2005 [3-540-26036-6]
Vol. 17: Erdmann, M.; Hsu, D.; Overmars, M.; van der Stappen, F.A. (Eds.): Algorithmic Foundations of Robotics VI. 472 p. 2005 [3-540-25728-4]
Vol. 16: Cuesta, F.; Ollero, A.: Intelligent Mobile Robot Navigation. 224 p. 2005 [3-540-23956-1]
Vol. 15: Dario, P.; Chatila, R. (Eds.): Robotics Research – The Eleventh International Symposium. 595 p. 2005 [3-540-23214-1]
Vol. 14: Prassler, E.; Lawitzky, G.; Stopp, A.; Grunwald, G.; Hägele, M.; Dillmann, R.; Iossifidis, I. (Eds.): Advances in Human-Robot Interaction. 414 p. 2005 [3-540-23211-7]
Vol. 13: Chung, W.: Nonholonomic Manipulators. 115 p. 2004 [3-540-22108-5]
Vol. 12: Iagnemma, K.; Dubowsky, S.: Mobile Robots in Rough Terrain – Estimation, Motion Planning, and Control with Application to Planetary Rovers. 123 p. 2004 [3-540-21968-4]
Vol. 11: Kim, J.-H.; Kim, D.-H.; Kim, Y.-J.; Seow, K.-T.: Soccer Robotics. 353 p. 2004 [3-540-21859-9]
Vol. 10: Siciliano, B.; De Luca, A.; Melchiorri, C.; Casalino, G. (Eds.): Advances in Control of Articulated and Mobile Robots. 259 p. 2004 [3-540-20783-X]
Vol. 9: Yamane, K.: Simulating and Generating Motions of Human Figures. 176 p. 2004 [3-540-20317-6]
Vol. 8: Baeten, J.; De Schutter, J.: Integrated Visual Servoing and Force Control. 198 p. 2004 [3-540-40475-9]
Vol. 7: Boissonnat, J.-D.; Burdick, J.; Goldberg, K.; Hutchinson, S. (Eds.): Algorithmic Foundations of Robotics V. 577 p. 2004 [3-540-40476-7]
Vol. 6: Jarvis, R.A.; Zelinsky, A. (Eds.): Robotics Research – The Tenth International Symposium. 580 p. 2003 [3-540-00550-1]
Vol. 5: Siciliano, B.; Dario, P. (Eds.): Experimental Robotics VIII. 685 p. 2003 [3-540-00305-3]
Vol. 4: Bicchi, A.; Christensen, H.I.; Prattichizzo, D. (Eds.): Control Problems in Robotics. 296 p. 2003 [3-540-00251-0]
Vol. 3: Natale, C.: Interaction Control of Robot Manipulators – Six-degrees-of-freedom Tasks. 120 p. 2003 [3-540-00159-X]
Vol. 2: Antonelli, G.: Underwater Robots – Motion and Force Control of Vehicle-Manipulator Systems. 209 p. 2003 [3-540-00054-2]
Vol. 1: Caccavale, F.; Villani, L. (Eds.): Fault Diagnosis and Fault Tolerance for Mechatronic Systems – Recent Advances. 191 p. 2002 [3-540-44159-X]

Henrik I. Christensen (Ed.)

European Robotics Symposium 2006 With 89 Figures

Professor Bruno Siciliano, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy, email: [email protected]
Professor Oussama Khatib, Robotics Laboratory, Department of Computer Science, Stanford University, Stanford, CA 94305-9010, USA, email: [email protected]
Professor Frans Groen, Department of Computer Science, Universiteit van Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands, email: [email protected]

Editor Professor Dr. Henrik I. Christensen Center for Autonomous Systems CAS/CVAP/NADA Kungliga Tekniska Hoegskolan (KTH) 10044 Stockholm Sweden

ISSN print edition: 1610-7438
ISSN electronic edition: 1610-742X
ISBN-10 3-540-32688-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-32688-5 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2006921135

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under German Copyright Law.

Springer is a part of Springer Science+Business Media
springeronline.com

© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Digital data supplied by editor.
Data-conversion and production: PTP-Berlin Protago-TEX-Production GmbH, Germany
Cover-Design: design & production GmbH, Heidelberg
Printed on acid-free paper 89/3141/Yu - 5 4 3 2 1 0

Editorial Advisory Board

EUROPE
Herman Bruyninckx, KU Leuven, Belgium
Raja Chatila, LAAS, France
Henrik Christensen, KTH, Sweden
Paolo Dario, Scuola Superiore Sant'Anna Pisa, Italy
Rüdiger Dillmann, Universität Karlsruhe, Germany

AMERICA
Ken Goldberg, UC Berkeley, USA
John Hollerbach, University of Utah, USA
Lydia Kavraki, Rice University, USA
Tim Salcudean, University of British Columbia, Canada
Sebastian Thrun, Stanford University, USA

ASIA/OCEANIA
Peter Corke, CSIRO, Australia
Makoto Kaneko, Hiroshima University, Japan
Sukhan Lee, Sungkyunkwan University, Korea
Yangsheng Xu, Chinese University of Hong Kong, PRC
Shin'ichi Yuta, Tsukuba University, Japan

STAR (Springer Tracts in Advanced Robotics) has been promoted under the auspices of EURON (European Robotics Research Network).

An Ontology of Robotics Science

John Hallam¹ and Herman Bruyninckx²

¹ Mærsk Institute, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark, [email protected]
² Dept. Mechanical Engineering, Katholieke Universiteit Leuven, Celestijnenlaan 300B, B-3001 Leuven, Belgium, [email protected]

Summary. This paper describes ground-breaking work on the creation of an ontology for the domain of robotics as a science. An ontology is a collection of terms, concepts and their inter-relationships, represented in a machine-usable form. An ontology in a particular domain is useful if the structure it adds to the domain is simple enough to be understood quickly and intuitively, and rich enough to increase insight into the whole domain to a level where this increased insight can lead to innovation and increased efficiency in scientific and practical developments. This paper presents an ontology for the science of robotics, and not for robots as objects: the latter ontology describes the physical and technical semantics and properties of individual robots and robot components, while the ontology of the science of robotics encodes the semantics of the meta-level concepts and domains of robotics. For example, surgical robotics and industrial automation are two concepts in the ontology of the science of robotics, while the semantics of robot kinematics and dynamics, or of a particular robot control algorithm, belong to the ontology of robots as objects. The structure in the presented ontology for the science of robotics consists of two complementary sub-structures: (i) the robot agent and robot system models (i.e., what components are required in a robot device, and in a robotic application, respectively), and (ii) the Context Space (i.e., ordinal and categorical relationships between physical and computational aspects of robot agents and systems in which the sub-domains of robotics can be mapped out). The implementation of the Context Space concept using standard ontology tools is explained. The paper illustrates its expected usefulness with examples of sub-domains of robotics expressed as contexts in the context space, and with two use cases for the ontology: (i) the classification of conference or journal paper submissions, and (ii) the guidance of new researchers into the domain of robotics.

1 Introduction 1.1 Robotics as a Science Robotics is to a large extent a science of integration, constructing (models of) robotic systems using concepts, algorithms and components borrowed from various more fundamental sciences, such as physics and mathematics, control theory, artificial intelligence, mechanism design, sensor and actuator technology.


The function and properties of a robotic system depend on the components from which it is made — the specific sensors, actuators, algorithms, mechanism — but, beyond that, they depend on the way those components are integrated. Furthermore, a full description of a robotic system must include information about the task it is to perform and the environment in which the task is to be undertaken. As we shall see, this view of robotics (i.e., the system formed by the robot agent, its task and the environment in which it has to perform that task) is crucial in developing a full ontology for robotics. 1.2 Ontologies for Robotics An ontology is a formal definition of concepts, terms and relationships appropriate to some domain of knowledge, generally expressed in a formalism that allows machine-usability of the encoded knowledge. (Wikipedia contains a good introduction to the concept of an ontology, [3].) Ontologies typically serve two purposes:
1. providing agreed and unambiguous terminology for a domain, with the goal of helping humans express, transform and transfer their knowledge more effectively and accurately;
2. allowing automatic use of that knowledge, for instance in exchange or linking of data between processes, or translation of terms between languages.
The latter purpose is sometimes better known under the name Semantic Web [1], and the W3C has defined standards for computer-readable representation of ontologies [2]. Robotics, like other sciences, has need of suitable ontologies. For instance, an agreed ontology for sensor data would make possible much greater interoperability between sensing and other modules in a robotic system. Individual modules could label the data they generate using terms from the agreed ontology; other processes would then have available a definite semantics for the data. Similar applications can be envisaged in the domains of mechanical/electronic and control/software engineering in robotics. Such ontological knowledge would also be of use for designers of robotic systems when selecting and matching components. We describe this kind of ontology as an ontology of robots as objects. While a complete ontology of robotics requires such knowledge, the source of that knowledge is largely in the sciences and technologies on which robotics builds, viz. physics, control engineering, and so on. Such an ontology may be extremely valuable, but is not the focus or purpose of this paper. However, the ontology presented in this paper could be connected to these object ontologies without problem.


Robotics, in its guise as a science of integration, implies a second kind of ontology. Consider a typical robotics conference, at which one may find sessions or talks on “field robotics” or (perhaps) “factory automation”. Roboticists use such labels as a taxonomy of their field, and have a more-or-less clear, but informal, understanding of the significant problems, opportunities and special requirements of each sub-field: of the context for the particular sub-domain. So, in the context of papers presented at conferences, even the short labels “field robotics” or “factory automation” are sufficient to help reviewers and readers focus their attention on the contributions of the paper in the topics that are supposed to be difficult and relevant in the state of the art in those domains. It takes students and practitioners in robotics quite some time to learn the links between the short labels on the one hand, and the scientific challenges and contributions they can expect in papers labelled in that way on the other. This paper makes a first contribution to making these links more explicit and structured, in order not only to speed up the learning phase of human practitioners but also to facilitate (semi-)automatic support for selection of reviewers, categorization of papers or research proposals, web or library searches, etc. The starting point of the presented research is thus that there is knowledge about the structure of robotics science itself — what sub-domains are there; how are they inter-related; how can we express the relationships between sub-domains in a principled way? Once again, the idea of an ontology can help. We call an ontology dealing with this knowledge an ontology of robotics science. The existence and formalisation of that ontology is the key insight and focus of the present paper. 1.3 Organisation of the Paper In the following sections we motivate, illustrate and explore the idea of an ontology of robotics science. Section 2 considers robotic systems in general: comprising the agent, the task and the environment. A model of the components expected in a robotic system is presented. Section 3 then presents the key concepts in the ontology of robotics science, particularly the notion of scaling laws, which structure the domain of robotics. Section 4 investigates how the proposed ontology might be applied, and demonstrates its potential usefulness. Section 5 illustrates how the key concept of Context Space can be implemented using standard ontology tools. The final section summarises the paper's conclusions and indicates planned future work.

2 A Model of Robotics Representing human knowledge about technological domains, such as robotics, is equivalent to defining sets of models with which humans can analyse the real world, make calculations about it, make predictions, etc. In other words,


a model is a mathematical simplification of the real world, representing only those aspects of the real world that are relevant to a particular human purpose. The purpose of this paper is to bring structure to human knowledge of robotics, but not to give detailed technological descriptions of the workings of robotics devices or algorithms. We believe that the graphical models in figures 1 and 2 provide us with the appropriate balance between detail and generality for the purpose of this paper. Figure 1 gives a “robot-centric” model of robotics: it depicts a robot “agent” as an integration of mechanical hardware, sensing hardware and actuation hardware, with planning software, control software, sensor processing software and actuator drive software. A successful robot interconnects knowledge and hardware at “appropriate” scales and with “appropriate” interfaces. Later sections of the paper will indicate more clearly what “appropriate” really means, and make clear that terms such as “field robotics” or “surgical robotics” indicate particular sub-manifolds of the “robot-centric” continuum (and the “application-centric” continuum of figure 2, see below) that have proven sufficiently successful or interesting to attract the intense research efforts from large communities.

Fig. 1. The “robot-centric,” or agent-model, of robotics.

Fig. 2. Robotic systems as the manifold of robot agents, tasks and environments.

Figure 2 gives an “application-centric” or “systems-centric” view of robotics, complementary to the previous figure: it makes clear that the success of a robot agent is not a property of the agent in itself, but is determined also by (i) the task the robot is expected to perform, and (ii) the environment in which that task is to be performed. Each successful robotic systems domain covers a certain subspace of the triangular continuum depicted in figure 2.


In some cases, it is the robot device that contributes most to the success of the system, and its subspace is situated close to the agent vertex of the triangle. In other systems, it is the environment (“hardware”) that is most important, for example in sub-sea or airborne robots where water or air provide the necessary support to the robot. In yet other systems, the task is the determining factor.

3 An Ontology of Robotics Science The key to defining relationships and structure between the various subfields of robotics is an understanding of how the subfields come about. Recall that a particular subfield expresses similarities between a class of robotic systems, and implies relevant background knowledge, significant problems, useful properties and so on. What accounts for much of this variation is scaling laws. Consider the difference in the flight of an airplane and a bumblebee. Each exploits different aerodynamic properties accessible to it because of the physical scale (tens of metres vs. millimetres) in which it operates: air behaves quite differently in different physical circumstances. This difference is an expression of a physical scaling phenomenon wherein physical system properties depend on the characteristic length of the system. Analogues of this principle can be constructed in other physical scales, such as time or mass, and in more abstract scales such as computational complexity. It is the placement with respect to a set of such scales that determines the properties, problems and possibilities of a given subclass of robotic system. 3.1 Context Space The Context Space is a conceptual space spanned by the various scaling dimensions available to a robotic system (or to its designer). Since, as we have argued in section 2, such a system comprises a device or agent, an environment, and a task, each of these essential components contributes scale dimensions and constraints to the system. In this section we examine some of these dimensions. The example dimensions we have chosen are intended to be illustrative rather than complete — space precludes an exhaustive listing of all possibilities, and such a listing is not necessary to make our point. Each essential component of a complete robotic system contributes scaling dimensions. Device scales are the most straightforward to specify: the characteristic length and time constant of the robotic system, its mass, the number of components it comprises and the extent to which they interact with each other are important characteristics, as are the number of controllable degrees of freedom and the total degrees of freedom of the system.


environment scales define the type and properties of the environment in which the task is carried out. There are two classes of environment: bulk and interface. Examples of the former are vacuum (as in space robotics) or liquid (as in underwater robotics). Bulk environments can be ordered by their mechanical impedance. Examples of the latter, interface, type include solid-gas (e.g. a table top, or the outdoor land surface) or liquid-gas (e.g. the surface of the sea). Salient properties here include the size and spatial frequency of surface variation — how smooth the interface is. The size of the environment (its characteristic length); the speed with which events occur (its characteristic time constant); and whether the environment is abiotic or biotic (or on the interface of the two) constitute other examples of relevant dimensions of variability. task scales include the spatiotemporal properties of the task, for instance its characteristic precision and the lifetime of the robotic system when working on the task, and the risk cost, i.e. the cost associated with failure or error in the task. The latter can informally be defined as the cost of fixing a disaster caused by system failure. For laboratory robotics it is, relatively speaking, very small; for surgical or space robotics it is relatively large. Many of these scales have an ordinal structure, that is, the axis representing the dimension has a 'small value' end and a 'large value' end, with intermediate values that may be arranged in order of size. As we shall see below in section 3.3, these ordering relationships along individual axes contribute to the rich structure of the context space. 3.2 Contexts Contexts are the basic mechanism for using the ontology. They provide a means to distinguish groups of robotic systems — and so different subfields of robotics as a whole — whose properties are restricted to sub-ranges of certain scale axes. For example, nanorobots have a device length scale in the range 10^-9–10^-7 metres. More generally, we can specify sets of systems whose properties lie within specific ranges on several axes simultaneously; most generally, a context may be a union of such convex regions in the space. Conceptually, a context is a sub-class of the class of all possible robotic systems. Such classes will usually (but not necessarily) have intuitive names, for example “field robotics,” “medical robots,” “swarm robots,” and each context implies a set of relevant technology, background knowledge, scientific problems and solutions following from the particular choices of scale which define the context. 3.3 Relationships Between Contexts Contexts, as defined above, can be thought of as sub-classes of the class of robotic systems; sub-classes generated by restricting the values of certain scaling properties (e.g. device length, risk cost, number of components, and so on) to particular values or sets of values from their respective scales (e.g. millimetre–metre, EUR 100–10000, 5–10, etc.).


As such, contexts can take part in the standard class relationships of set theory: they can be super- or sub-classes of other classes; intersections, unions and complements can be defined; and membership can be checked. However, more interesting are the relationships implied by the ordinal structure of the scale axes. For instance, robot devices whose physical size (characteristic length) is in the centimetre-decimetre range are neighbours of those in the decimetre-metre range: they will have many properties in common. On the other hand, nanometre scale robotic devices will be quite different from centimetre ones, in some respects, since the dominant physical effects differ between those characteristic sizes. Neighbouring contexts are often contexts between which there exist loose analogies or trade-offs. For instance, one could describe a collective of 100 mobile robots either as a group of 100 6-dof systems or as a single 600-dof system: physical dimensions are exchanged for the complexity of the interconnected system. Neighbourhood implies quasi-invariance between aspects of the relationships that describe the same system in different contexts. A change in one scaling dimension is compensated by a corresponding change in another dimension. Sufficiently sophisticated reasoning mechanisms could exploit neighbourhood relationships to propose novel connections between robotic systems, or identify possible similarities between hitherto disjoint subfields of the discipline. The neighbourhood relationships provided by the context scales also force human students in robotics to think more explicitly about which properties of specific robotics domains are really unique and which are shared.
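To make the Context Space and its neighbourhood relationships concrete, here is a minimal Python sketch (our own illustration, not part of the original paper; axis names and bin values are hypothetical). It models each ordinal scale axis as an ordered list of bins, a context as a set of per-axis ranges, and supplies a membership test and a simple ordinal distance between contexts.

from dataclasses import dataclass

# Ordinal scale axes: each axis is an ordered list of named bins (coarse, illustrative values).
SCALE_AXES = {
    "device_length":      ["nanometre", "micrometre", "millimetre", "centimetre", "decimetre", "metre"],
    "environment_length": ["millimetre", "metre", "decametre", "kilometre", "megametre"],
    "lifetime":           ["seconds", "minutes", "hours", "days", "years"],
    "risk_cost":          ["low", "medium", "high"],
}

@dataclass
class Context:
    """A sub-field of robotics: a convex range of bins on some of the scale axes."""
    name: str
    ranges: dict  # axis name -> (lowest bin, highest bin); axes not listed are unconstrained

    def contains(self, system: dict) -> bool:
        """True if a concrete system (axis name -> bin) lies inside every constrained range."""
        for axis, (lo, hi) in self.ranges.items():
            order = SCALE_AXES[axis]
            if axis not in system:
                return False
            idx = order.index(system[axis])
            if not order.index(lo) <= idx <= order.index(hi):
                return False
        return True

def ordinal_distance(a: Context, b: Context) -> int:
    """Total gap, in bins, between the two contexts' ranges over their shared axes;
    0 means the ranges overlap or touch on every shared axis, i.e. the contexts are neighbours."""
    total = 0
    for axis in set(a.ranges) & set(b.ranges):
        order = SCALE_AXES[axis]
        a_lo, a_hi = (order.index(x) for x in a.ranges[axis])
        b_lo, b_hi = (order.index(x) for x in b.ranges[axis])
        if a_hi < b_lo:
            total += b_lo - a_hi
        elif b_hi < a_lo:
            total += a_lo - b_hi
    return total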

4 Using the Ontology The proposed ontology is unusual in that it focuses on the structure of a field of science rather than on the objects with which that field is concerned. Nevertheless, such an ontology has considerable practical use. In addition, it is easier to start with the ontology of robotics as a science because, as presented in this paper, this ontology can be explained and understood (almost) completely in a relatively short amount of time. In this section we give three illustrative examples of the practical usefulness of the ontology. 4.1 Structuring the Domain of Robotics Science To illustrate the ideas presented so far, consider the various sub-fields of robotics typically found in the contents listings of conference proceedings: can we define them using context space ideas and axes? Suppose we start with Field Robotics.


We can say that the device scale (robot size) is in the metre range; the environment length scale is 10^2 to 10^10 metres; the device/mission lifetime ranges between 10^3 and 10^10 seconds; and the environment may be vacuum, gas, liquid or an interface. So far so good; now consider underwater robotics. This is Field Robotics in a bulk liquid environment. Space Robotics is Field Robotics in a bulk vacuum environment, with environment length scale from 10^5 to 10^10 metres, device/mission lifetime 10^6–10^10 seconds, say (for spacecraft), and a high risk cost. Consider Research Lab Robotics. In this case we have device lengths of 10^-3–10^0 metres; environment length scale 10^1 metres, say; gas or a smooth gas-solid interface; lifetime in the 10^2–10^5 second range; and low risk cost. Table-top robotics, for instance using the popular Khepera robot, is a specialisation of the Research Lab context, with 10^-2 metre device scale, 10^0 metre environment scale, a very smooth solid-gas interface environment, very low risk factor, low environmental interaction and lifetime in the 10^2–10^3 second range. As a final example, consider factory automation or industrial robotics. Here we have device scales in the 10^-3–10^0 metre range, environments in the 10^0–10^1 metre range, a gas-solid interface, lifetimes from 10^5 to 10^7 seconds, medium-high risk cost, moderate system complexity and low environment interaction. Although these examples are still somewhat vague — recall, for instance, 'medium-high risk cost' or 'moderate system complexity' — they nevertheless illustrate that the different sub-fields of robotics occupy specifiable parts of the context space and that related sub-fields occupy neighbouring or subset regions of the context space. Thus it is possible to define the subfields of robotics by labelling the appropriate regions of the context space and thereby infer relationships between subfields that might not immediately be apparent from just their names. In addition, being confronted with the different ordinal scales that are relevant in a particular robotics domain will stimulate students and researchers to think about how the fundamental physical properties of a particular context could change when “moving” to a neighbouring context, or to draw inspiration from successful solutions implemented in neighbouring domains that they had not thought of before.
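Continuing the Python sketch given at the end of section 3 (bin values are illustrative only and follow the rough ranges in the text), some of these sub-fields can be written down as contexts and compared:

field_robotics = Context("Field Robotics", {
    "device_length": ("decimetre", "metre"),
    "environment_length": ("decametre", "megametre"),
    "risk_cost": ("medium", "high"),
})
space_robotics = Context("Space Robotics", {
    "device_length": ("decimetre", "metre"),
    "environment_length": ("kilometre", "megametre"),
    "risk_cost": ("high", "high"),
})
lab_robotics = Context("Research Lab Robotics", {
    "device_length": ("millimetre", "metre"),
    "environment_length": ("metre", "decametre"),
    "risk_cost": ("low", "low"),
})

# A hypothetical planetary rover falls inside both Field and Space Robotics,
# reflecting the subset relationship described above.
rover = {"device_length": "metre", "environment_length": "megametre", "risk_cost": "high"}
print(field_robotics.contains(rover))                  # True
print(space_robotics.contains(rover))                  # True
print(ordinal_distance(field_robotics, lab_robotics))  # 1: the only gap is one step on the risk_cost axis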


4.2 Use Case: Guided Reading for New Roboticists All teachers in robotics have experienced the following evolution in their students: during the first couple of weeks or months of their study of robotics, students read tons of papers, working hard to see the forest for the trees, but are mostly unable to evaluate the essential contributions (or lack thereof) of every individual paper they read. Good teachers also know that they must organise regular reading sessions with their students, in order to guide this process of getting acquainted with a particular robotics research domain. We believe that a key role of the teachers in this coaching phase is to indicate to their students the relevant relationships between the details of the science explained in particular papers and the goal of making a better robot system (i.e., the combination of agent, task and environment) in the particular domain of interest to the student. We believe that teachers (most often) implicitly convey to their students the kind of ontology that this paper presents, and that making these ontological relationships explicit will help the students in their learning. In addition, by using the ontology of this paper in real-world reading classes, the ontology will become more complete and refined. It could possibly also become the basis for a series of textbooks that cover the whole domain of robotics in the most efficient way. 4.3 Use Case: Classification of Papers and Reviewers A second application of these ideas is to paper classification. Imagine that authors submitting papers to a robotics journal or conference (or, indeed, a funding agency) are invited to indicate approximately where, on figure 2 and figure 1, their paper content falls. Then, depending on their answer, they are invited to specify where it lies on various appropriate scale axes. For example, the author of a paper on inertial guidance for underwater robots would indicate 'agent-centred', 'sensors', and choose 'bulk liquid environment', 'environment size 10^2–10^3 metre', 'lifetime 10^3–10^4 second', and so on. Reviewers could similarly indicate their areas of expertise. Software based on the ontology could then match reviewers to papers depending on their distance in the contextual space. For most papers, this would produce the expected results: a paper on space robotics, for instance, would go to space robotics referees. But interesting, desirable effects are possible. A paper on collective robotics might be offered to a reviewer interested in the control of very highly redundant systems, on the arguably sensible grounds that 100 6-dof robots have some similarity with one 600-dof one — the two subjects are neighbours in (some dimensions of) the context space.
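As a sketch of the reviewer-matching idea (our own illustration; the greedy pairing and all data are hypothetical), papers and reviewer interests can both be expressed as contexts from the earlier sketch and paired by ordinal distance:

def match_reviewers(papers, reviewers):
    """Assign to each paper the reviewer context at the smallest ordinal distance.
    Both arguments are lists of Context objects (see the sketch in section 3)."""
    assignment = {}
    for paper in papers:
        best = min(reviewers, key=lambda r: ordinal_distance(paper, r))
        assignment[paper.name] = best.name
    return assignment

# With this measure, a collective-robotics paper can end up with a reviewer of highly
# redundant systems whenever their contexts are close on most axes, as argued above.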

5 Implementing the Ontology of the Science of Robotics To be useful, it must be possible to implement the ontology of the science of robotics using the standard ontology tools of the semantic web. The World Wide Web Consortium has selected OWL as its standard for the representation of ontological knowledge [4]. OWL is built on top of standard infrastructure (RDF and XML) and provides mechanisms for representing concepts (classes), instances, properties and relationships. It comes in three flavours — OWL-Lite, OWL-DL and OWL-Full — which represent different compromises between representational power and computational tractability. The ontology described here is implemented using OWL-DL.


5.1 Implementing the Context Space The key concept to implement is the Context Space. To do that, we identify a context with the set of all robotic systems falling into that context; i.e. 'Space Robotics,' in terms of the ontology, is defined as the class of robotic systems studied by that field. The class of such systems is actually defined using more fundamental properties, such as the environment, the spatial and temporal scales in which the systems operate, and so on. The basis for implementing the Context Space is then to define the class of all robotic systems, sub-classes of which represent individual contexts. Contexts are defined using standard class operations of union, intersection, complement and property value restriction (restricting the legal values of a specific instance property to a known class). In line with the discussion of section 2, we therefore assert that each robotic system (an instance of the robotic System Class) comprises a device (the agent), an environment and a task. We further define three classes, which we call Aspect Classes, consisting respectively of all devices, all environments and all tasks, as the value ranges of the three instance properties of a system. Associated with the instances of each aspect class are scaling properties, for example device-characteristic-length or environment-impedance, the values for which are drawn from Scale Classes representing the various scaling axes. Notice that different aspects may have properties relating to the same scale: device, environment and task characteristic lengths or times would be an obvious example. Figure 3 illustrates this general scheme. Technically, each instance of the robotic Systems class has three properties, whose values are restricted to lie each in one of the three aspect classes; and each instance of an aspect class (for example, a particular environment) has scaling properties whose values are restricted to particular sub-classes from given scale axes. This construction has the crucial advantage that the class structure is defined intensionally — we are not required to enumerate all robotic systems, all environments, etc. — while also allowing specific systems, devices and so on to be represented as instances. To illustrate the implementation strategy, consider systems using nano-scale robotic devices, which can be characterised by the fact that their device-length scale is in the nanometre range. Figure 4 illustrates how this is represented: a sub-class of the length scale class is defined to represent the nanometre range, and a sub-class of the devices aspect class is defined such that the device-length property is restricted to take values from the nanometre range class. Finally, a sub-class of the systems class with device property values restricted to the just-defined nano-device class completes the representation of nano-scale robotic systems. Notice that we have not restricted the environments in which these systems may operate nor the tasks to which they may be applied. The red (dashed) components in the figure illustrate the newly defined sub-classes; the standard OWL reasoners are able to infer that these are indeed sub-classes of their parents.


Fig. 3. Class hierarchy in the Robotics-as-Science Ontology: the universe is represented as the class of all robotic Systems, where each system (instance) has three Aspects — device, environment and task. Instances of the Aspect Classes have scaling properties, e.g. characteristic length, with values from the Scale Classes.

Fig. 4. Nano-Robotic Systems

Fig. 5. Medical Nano-Robotics

Figure 5 takes the example one stage further. Consider medical nano-robotics. The principal distinction between this and nano-systems robotics, for our illustrative purposes, is the environment in which the systems must work — on a macro-molecular scale and in interaction with living matter. This is represented by the construction of a sub-class of biotic nanoscale environments to which the environment property of the system is restricted. The red (dashed) components in the figure illustrate these new definitions. Once again, standard OWL reasoners can infer that the medical nano-systems class is a sub-class of the nano-robotic systems class or, equivalently, that medical nano-robotics is a sub-field of nano-scale robotics.
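A minimal sketch of this construction, assuming the owlready2 Python package (the ontology IRI, class and property names below are ours, not the paper's actual ontology):

from owlready2 import Thing, get_ontology

onto = get_ontology("http://example.org/robotics-science.owl")

with onto:
    # Universe and the three Aspect Classes of section 5.1.
    class RoboticSystem(Thing): pass
    class Device(Thing): pass
    class Environment(Thing): pass
    class Task(Thing): pass

    # Scale Classes: a length-scale axis with a nanometre sub-range,
    # plus a biotic nanoscale environment for the medical example.
    class LengthScale(Thing): pass
    class NanometreRange(LengthScale): pass
    class BioticNanoEnvironment(Environment): pass

    # Properties linking systems to aspects, and the device aspect to a scale.
    class has_device(RoboticSystem >> Device): pass
    class has_environment(RoboticSystem >> Environment): pass
    class has_task(RoboticSystem >> Task): pass
    class device_length(Device >> LengthScale): pass

    # Contexts defined intensionally through property-value restrictions.
    class NanoDevice(Device):
        equivalent_to = [Device & device_length.some(NanometreRange)]
    class NanoRoboticSystem(RoboticSystem):
        equivalent_to = [RoboticSystem & has_device.some(NanoDevice)]
    class MedicalNanoRoboticSystem(RoboticSystem):
        equivalent_to = [NanoRoboticSystem & has_environment.some(BioticNanoEnvironment)]

# Running a DL reasoner over this ontology (e.g. owlready2's sync_reasoner(), which needs a
# Java runtime) should then classify MedicalNanoRoboticSystem under NanoRoboticSystem,
# mirroring the inferences described for figures 4 and 5.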


5.2 Context Relationships The ability to represent contexts is crucial to the ontology, but so is the ability to represent or infer relationships. As we have seen, certain relationships such as class inclusion can be inferred by the standard OWL reasoners. Not so with neighbourhood relationships, however. The problem is that OWL reasoners contain no machinery for inferring 'distance' between classes. To illustrate this, consider the length scale. We can construct in OWL a partition of this scale into (amongst others) millimetre, centimetre, decimetre and metre lengths with the appropriate ordering relationships (e.g., centimetres are longer than millimetres and shorter than metres). But it is not possible with standard reasoners to infer that the metre sub-class is an indirect neighbour of the centimetre class, two steps up the ordinal scale. What this means is that neighbourhood relationships, which depend on the 'distances' between context properties on the ordinal scale axes, cannot be inferred by standard reasoners, since the ordinal 'distances' themselves cannot be inferred by those reasoners. However, more specialist reasoners can be built to make such inferences, and one can imagine such reasoners working on the ontology, enumerating neighbourhood relationships between the represented contexts and automatically adding to the ontology explicit representations of the discovered neighbourhood relationships. 5.3 Summary The ontology of the science of robotics depends on the key concept of contexts, which we can straightforwardly and intuitively represent using the standard ontology language OWL by defining a context (a sub-field of robotics) to be represented by the class of all systems belonging to that context (sub-field). Standard reasoners can determine simple class relationships between contexts. Neighbourhood relationships between contexts cannot be inferred using standard reasoners, but specialist discovery processes could be implemented to annotate the ontology with such relationships.
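To illustrate the kind of specialist discovery process mentioned in section 5.2 (a plain-Python sketch under our own assumptions, reusing the Context and ordinal_distance helpers from the sketch in section 3 rather than OWL), a post-processing pass could enumerate context pairs whose total ordinal gap is small and hand them back as explicit neighbour annotations:

def discover_neighbourhoods(contexts, max_gap=1):
    """Return (a, 'neighbour_of', b) assertions for every pair of contexts whose total
    ordinal gap over shared scale axes is at most max_gap bins; a later step could add
    these assertions to the ontology as explicit relationships."""
    assertions = []
    for i, a in enumerate(contexts):
        for b in contexts[i + 1:]:
            if ordinal_distance(a, b) <= max_gap:
                assertions.append((a.name, "neighbour_of", b.name))
    return assertions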

6 Conclusions and Future Work 6.1 Conclusions The present paper has motivated the notion of an ontology of the science of robotics as opposed to an ontology for the objects of which robotic systems are composed. An ontology of robotics science allows us to define terms and concepts such as “surgical robotics,” “field robotics,” or “nanorobotics” in an


objective way in terms of more primitive technological concepts. This allows us to give formal (machine-usable) meaning to these various terms and to explore the relationships between them. These meanings and relationships provide a meta-level structure in the scientific discipline of robotics that could be exploited in various ways. The formal setting in which such definitions are made is a Context Space established by considering the physical, environmental and task-related scaling laws and relationships that apply to robotic systems. New sub-fields of robotics, and new relationships between them and between existing subfields, may be discovered by considering the relationships between them in context space. The ontology has been implemented in the standard ontology language OWL-DL using the Protégé suite. The proposed ontology is believed to be simple enough to be understood by humans within the span of a couple of hours, while at the same time being rich enough to bring non-trivial and hence useful structure to the scientific domain of robotics. The ontology may be used, inter alia, as a tool for the classification of robotic material, the matching of reviewers to conference, journal or funding agency submissions, or in teaching robotics to new students. Although the description of the ontology in the paper is somewhat incomplete, this is natural: to our knowledge there is no state of the art at all addressing this kind of ontology for robotics, and also no truly comparable work in other (technological or non-technological) domains. Hence the goal of the paper is to present in outline a novel but, we believe, valuable tool for roboticists. 6.2 Future Work Before an ontology of robotics science is complete and fully usable, an amount of future work remains — both conceptual and practical. Conceptual work includes elaborating the set of scale axes to identify a complete, minimal and consistent set; identification of the kinds of questions and use cases for which the ontology provides a useful resource; and wide consultation with the robotics community on the clarity, utility and completeness of the conceptual framework. Practical work comprises completing the implementation of the concepts and relationships of the ontology in tools conforming to the standards and recommendations established by the World-Wide Web Consortium, and the formal definition and implementation of suitable query and reasoning interfaces to the implemented ontology. The ontology must also be tested more intensively on the “educational workfloor” of teaching robotics to masters- and PhD-level students.


Acknowledgement. The authors gratefully acknowledge the support of the EURON Research Atelier Rose: a robotics ontology for the semantic web, and the many stimulating discussions with members of EURON's Education and Training committee, which have, to a large extent, helped shape the ideas of the ontology presented in this paper. Wei Xu contributed many useful ideas to the described implementation of the ontology.

References
1. Antoniou, G. and van Harmelen, F. A Semantic Web Primer, MIT Press, 2004.
2. W3C. Semantic Web, http://www.w3.org/2001/sw/, accessed on-line on September 30th, 2005.
3. Wikipedia. Ontology (computer science), http://en.wikipedia.org/wiki/Ontology_(computer_science), accessed on-line on September 30th, 2005.
4. W3C. OWL Web Ontology Language Overview, http://www.w3.org/TR/owl-features/, accessed on-line on January 12th, 2006.

A Multi-agent System Architecture for Modular Robotic Mobility Aids

Georgios Lidoris and Martin Buss

Institute of Automatic Control Engineering, Technische Universität München, D-80290 Munich, Germany
[email protected]

Abstract. In this paper a multi-agent system architecture for a modular mobility enhancement system is presented. The system consists of one or multiple mobile robotic platforms and a set of user-defined application modules, such as chairs, tables, multifunctional chairs, etc. All these modules can be inexpensive everyday-life items that become functional when a mobile platform is properly attached to them. This way the system can act as a wheelchair, as a carrier of objects, or even as a walker.∗

1 Introduction With the increase in the number of senior citizens, there is a growing demand for mobility aids. Older people have problems moving themselves, handling or moving objects, and at the same time operating complicated devices. Therefore mobility aids with intelligent capabilities become a necessity. Several robotic, intelligent wheelchairs have been proposed, demonstrating navigation, manipulation and transportation capabilities. Examples include Rolland [1], TAO [2], OMNI [3], NavChair [4] and many others. However, existing systems are still difficult to operate, hard to adjust to the needs of the individual user, and very costly. At the same time, research on multi-agent systems has provided both principles for the construction of complex systems involving multiple agents and mechanisms for the coordination of independent agent behaviours. In this paper we intend to apply well-analysed concepts from multi-agent and multi-robot systems research to create a system architecture for a modular mobility enhancement aid. This is a new and very demanding application domain for multi-agent systems and, besides presenting the proposed approach, we intend to expose its specific requirements.

∗ This work was partially supported by the Specific Targeted Research Project MOVEMENT co-funded by INFSO DG of the European Commission (contract number 511670).


The system can consist of one or multiple mobile robotic platforms and a set of user-defined application modules, such as chairs, tables, multifunctional chairs, etc. All these modules can be inexpensive everyday-life items that become functional when a robotic mobile platform is properly attached to them. This way the system can act as a wheelchair, as a carrier of objects, or even as a walker. Multiple levels of autonomy are also supported by the system architecture, from manual operation to autonomous functionality. Such a modular system addresses many diverse needs and can be used by people with different mobility problems. Its scalable multi-agent architecture allows it to be deployed for the home care of an individual, as well as in large health care institutions.

2 Related Work Several approaches have been proposed in the multi-agent community to coordinate agent actions. Distributed behaviour-based architectures have been developed, like ALLIANCE [5] and [6], which were capable of dealing with complex tasks but lacked scalability and robustness. Other approaches address multi-robot cooperation through negotiation, such as M+ [7], Traderbots [8], MURDOCH [9] and numerous others. Market-based architectures are gaining popularity and a large amount of literature relevant to the field already exists. An interesting survey of market-based multi-robot coordination can be found in [10]. Our approach uses a negotiation mechanism for task allocation, as well as a behaviour-based layer for the execution of complex tasks. This way the robustness of market-based mechanisms is combined with the ability of behaviour-based approaches to deal with tasks of increased complexity. A contribution to the challenges of multi-agent systems research as described in [10] is made by introducing a new real-world application domain where long-term, reliable and robust operation of human-robot teams can be demonstrated.

3 Concept of System Architecture As mentioned above, our application domain introduces several special requirements. We are dealing with a safety-critical domain, where we have to ensure the continuous functionality of our system. Direct task allocation and supervision of the system's decisions by the user must be supported. Tasks arrive in an unpredictable manner, from several user interfaces. Finally, new modules can be inserted into and removed from the system at any time.


The system architecture is responsible for the coordination of the modules constituting the system. It shall enable the user-dependent usage of the modules and has the task of controlling the interconnection of modules and their behaviour. More specifically, we present a Multi-Agent System architecture which is responsible for receiving high-level commands from the user interface, decomposing them and delegating sub-tasks to the appropriate application modules. These modules are dynamically selected according to utility functions, which are created using performance indicators and task requirements. In the formal multi-robot task allocation (MRTA) framework introduced in [11], our system addresses “Single-task robots, single-robot tasks, instantaneous assignment” (ST-SR-IA). Furthermore, our system continuously runs background tasks such as managing module power levels, performing conflict resolution between system modules during task execution and, finally, enabling the system's autonomous functionalities by making use of a behaviour-based task model. Component selection for task allocation is performed by using a simple distributed first-price, one-round auction protocol, similar to the Contract Net protocol [12]. The progress of task execution is monitored by the winner and, if no progress is made, the task is re-assigned through a new negotiation. This is a very important fault-prevention functionality of the system. 3.1 Agent Model Taking into account the distributed nature and complexity of the application domain, we decided to split the control of our system amongst a number of deliberative agents. Each agent is capable of reasoning, according to the classical hybrid architecture paradigm [13]. At the same time, by distributing control to a number of agents, we achieve robustness and scalability gains. Therefore our system can be considered as an extension of the layered approach [14], [15], [16] to a multi-agent domain, exploiting advantages of both centralised and distributed approaches. The general internal structure of a system agent is illustrated in Figure 1. It consists of three parts. The first one is responsible for communicating with other agents and handling communication requests. The second part is a scheduler, which receives task requests, creates a queue and schedules their execution by the third layer. The task execution layer is responsible for the agent's actions, which, according to each agent's role, can vary from simple communication acts to movements for our physical agents. By using such an internal organization, we are able to deal with coordination and cooperation issues that arise between our agents. Our system becomes capable of having physical agents that respond quickly to dynamic events, e.g. collisions, while at the same time producing and executing complex strategies for achieving multiple goals. This is due to the fact that each system component is able to schedule its own actions according to its internal capabilities and state.
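The three-part internal structure just described can be rendered as a small skeleton (our own illustration; the class names and FIFO queue discipline are assumptions, not the project's actual code):

import queue
from dataclasses import dataclass

@dataclass
class Task:
    kind: str
    payload: dict

class SystemAgent:
    """Skeleton of a system agent: a communication part that accepts requests,
    a scheduler that queues them, and a task-execution part that acts on them."""

    def __init__(self, name: str):
        self.name = name
        self.inbox = queue.Queue()   # scheduling layer: FIFO queue of pending tasks

    # communication layer
    def receive(self, message: dict) -> None:
        """Handle a communication request by turning it into a scheduled task."""
        self.inbox.put(Task(kind=message["type"], payload=message.get("params", {})))

    # task-execution layer
    def execute(self, task: Task) -> None:
        """Agent-specific action; for a physical agent this would issue movements."""
        print(f"{self.name}: executing {task.kind} with {task.payload}")

    # scheduling layer
    def run_once(self) -> None:
        """Pop and execute the next scheduled task, if any."""
        if not self.inbox.empty():
            self.execute(self.inbox.get())

agent = SystemAgent("platform_1")
agent.receive({"type": "goto", "params": {"goal": (2.0, 3.5)}})
agent.run_once()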

Fig. 1. Internal structure of a system agent: communication, scheduling and task-execution layers

3.2 Overview of the Proposed Approach The proposed system architecture consists of several software agents that are responsible for performing task decomposition, task allocation through negotiations, as well as system monitoring and module management. It also contains physical agents which represent the modules that are available to the system at each point in time. No limitations exist on the number of possible modules that can be inserted into the system, enabling it to be used in small setups with only one mobile base and a few application modules, as well as in large care institutions with an arbitrary number of mobile bases and application modules. In Figure 2 we present an overview of our architectural concept, for a setup where the available modules that can be docked with the platforms are chairs, tables, multifunctional chairs and walker modules. When the user issues a command via the input hardware of the User Interface, this command is integrated and preprocessed and then sent to a link layer. This link layer between the Multi-Agent System and the User Interface is responsible for transforming the issued command into an appropriate form in order to be transmitted to the control architecture. Important information being encapsulated is (one possible message format is sketched below):

• Level of autonomy in which the system is currently operating
• Type of operation to be performed by the system
• Which application module issues the command
• Command-specific parameters, e.g. goal position coordinates in the autonomous functionality, or speeds to be executed by the mobile platform, read by the input device, for the manual functionality

This high-level command is transmitted from the user interface link layer to the system. In order to make clear how a command is handled by the system, we have to describe each agent's role in more detail.
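The items listed above suggest a simple message format; the following dataclass is a hypothetical rendering of what the link layer might pass on, not the project's actual interface:

from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict

class Autonomy(Enum):
    MANUAL = "manual"
    SEMI_AUTONOMOUS = "semi_autonomous"
    AUTONOMOUS = "autonomous"

@dataclass
class UserCommand:
    autonomy: Autonomy          # level of autonomy the system is currently operating in
    operation: str              # type of operation to be performed, e.g. "goto", "dock", "fetch"
    source_module: str          # which application module issued the command
    parameters: Dict[str, Any]  # goal coordinates for autonomous mode, or raw speeds for manual mode

# Example: UserCommand(Autonomy.AUTONOMOUS, "goto", "wheelchair_ui", {"goal": (2.5, 4.0)})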


Fig. 2. Overview of the proposed approach (User Interface Link Layer; high-level commands / notifications; System State, Mediator, Task Allocation and Background Tasks Agents; component selection over Platforms, Chairs, Tables, MF-Chairs and Lifter/Walkers; Sensors and Hardware/Actuators)

Mediator Agent. This multi-purpose agent is responsible for accepting commands from the user interface's Link Layer, initiating task execution procedures and querying for specific module information. Finally, after the task is executed, it informs the interface of changes. It also receives safety notifications from the Background Tasks Agent and initiates actions accordingly, giving them priority. System State Agent. It is responsible for providing management and naming services for the rest of the system and monitoring the state of the system's application modules. It informs, upon request, the rest of the system about the available modules and their state. The possible states of each application module depend on its functionality. For each robotic platform, its state is a tuple <d, t> showing whether it is docked with an application module and which task it is currently undertaking. For an application module, its state shows whether it is docked with a platform. Task Allocation Agent. If it is assigned a task by the Mediator Agent, it decomposes the task so that the necessary components are chosen, and initiates a negotiation procedure similar to the Contract Net Protocol in order to find the best module to perform the task. After it receives bids from the Component Agents it chooses a winner (the highest bidder) and assigns it the task. When it receives information from the Component Agent that the task was performed, it informs the Mediator.
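A compact sketch of the allocation step (our own simplification; agent names and utility functions are illustrative, not the project's implementation): the Task Allocation Agent announces a task, collects one bid per capable Component Agent, and awards the task to the highest bidder.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ComponentAgent:
    name: str
    capabilities: set
    utility_fn: Callable[[dict], float]   # the agent's own estimate of task utility

    def bid(self, task: dict) -> Optional[float]:
        """Return a bid (the estimated utility) or None if the task type is unsupported."""
        if task["type"] not in self.capabilities:
            return None
        return self.utility_fn(task)

def allocate(task: dict, agents: list) -> Optional[str]:
    """Single-round, first-price auction: the highest positive bid wins the task."""
    bids = {a.name: a.bid(task) for a in agents}
    bids = {name: b for name, b in bids.items() if b is not None and b > 0}
    if not bids:
        return None        # no capable module; the Mediator would be informed of the failure
    return max(bids, key=bids.get)

# Example: two platforms bid on a fetch task; the one with the shorter path (higher utility) wins.
platforms = [
    ComponentAgent("platform_1", {"fetch"}, lambda t: 10.0 - t["path_length_1"]),
    ComponentAgent("platform_2", {"fetch"}, lambda t: 10.0 - t["path_length_2"]),
]
print(allocate({"type": "fetch", "path_length_1": 3.0, "path_length_2": 6.0}, platforms))  # platform_1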


In the case of passive modules like tables and chairs we do not have a task assignment, but the winner is the module that is most appropriate for a given task. For example, if we want a given platform to go and bring a table, the negotiation finds the most suitable one (e.g. the one that is closer) and the Mediator is informed, so that it can assign the task to the platform. Background Tasks Agent. Monitors system components and issues commands to the Mediator Agent in order to deal with safety issues. For example, we want to avoid the risk of power failure of our mobile robotic platforms during task execution. Therefore the Background Tasks Agent constantly checks the power level of all mobile platforms. If a platform's power level is found to be less than an acceptable minimum, a Charge command is issued to the Mediator and the platform in question charges its batteries. It can also be used for setting out alerts for pre-programmed system maintenance operations, module failures, etc. Generally, the commands issued by this agent to the Mediator Agent have higher priority than normal commands. Component Agents. Each of these agents is created when a new hardware module is inserted into the system, and it represents this module to the MAS. They are involved in the component selection negotiation procedure. If they are assigned a task they initiate a pre-programmed set of behaviours in order to accomplish it. The task description must contain information about its type and also other specific information regarding possible parts of the task. These agents have control over the hardware they embody and pass appropriate commands to it, by sequencing high-level movement descriptions into low-level commands. When the level of autonomy is set to manual, they just pass low-level commands that are received from the UI to the hardware. Passive modules such as chairs, tables, etc., do not perform tasks themselves but need to be docked with a platform. They take part in the negotiations initiated by the Task Allocation Agent when we need to find the most suitable module, in order to minimize resource usage. For example, let us suppose that the user wants a given platform to bring a table. Table Component Agents compute their utilities based on their distance from the platform. This is useful especially for home-use scenarios, where only one platform exists but many tables and chairs, which are really inexpensive. Platform Component Agents rely on a behaviour-based system. Behaviours are seen as condition/action pairs where conditions are logical expressions over the inputs and actions are themselves behaviours. In both cases, a behaviour is a directed acyclic graph of arbitrary depth. The leaves of the graphs are the behaviour type's respective outputs. This notion of behaviour is also consistent with that laid out in [17] and [18]. Behaviours are nested at different levels: selection among lower-level behaviours can be considered a higher-level behaviour, with the overall agent behaviour considered a single “do-the-task” behaviour. These resulting high-level tasks are also interconnected with each other and together constitute our architecture's task model.


our architecture's task model. Figure 3 illustrates the interconnection between these tasks and how they are decomposed into simpler behaviours.

Fig. 3. Task model
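The winner-selection step of this Contract Net-style negotiation can be illustrated with a minimal Java sketch; the Bid record and class name are ours and not part of the authors' implementation.

```java
import java.util.List;

// Illustrative bid record: which Component Agent offers which utility for a task.
class Bid {
    final String componentAgent;
    final double utility;
    Bid(String componentAgent, double utility) {
        this.componentAgent = componentAgent;
        this.utility = utility;
    }
}

class TaskAllocationSketch {
    // Choose the highest bidder among the Component Agents that answered the
    // call for proposals; the winner is then assigned the task (or, for passive
    // modules, reported to the Mediator as the most suitable module).
    static Bid selectWinner(List<Bid> bids) {
        Bid best = null;
        for (Bid b : bids) {
            if (best == null || b.utility > best.utility) {
                best = b;
            }
        }
        return best;
    }
}
```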

When a robotic platform is assigned a task, it becomes unavailable for further tasks. This means that it does not take part in negotiations. Instead, it monitors task execution and, if no progress is made, it contacts the Task Allocation Agent to re-assign the task. Monitoring of task execution can simply be done by recalculating the utility of the task at a given time interval. If the utility remains unchanged or drops below a given threshold, task execution must be terminated and the task has to be re-allocated. In the following section we give a definition of task utility.
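A minimal sketch of this progress-monitoring rule, assuming an illustrative threshold and tolerance; the class and field names are ours, not part of the architecture.

```java
// Periodically recompute the utility of the running task; if it stalls or falls
// below a threshold, the task should be handed back to the Task Allocation Agent
// for re-assignment. The threshold and tolerance values are assumed.
class TaskMonitorSketch {
    private static final double MIN_UTILITY = 0.1;   // assumed threshold
    private static final double TOLERANCE = 1e-6;    // "unchanged" tolerance
    private double lastUtility = Double.POSITIVE_INFINITY;

    // Returns true when the task should be terminated and re-allocated.
    boolean shouldReallocate(double currentUtility) {
        boolean unchanged = Math.abs(currentUtility - lastUtility) < TOLERANCE;
        boolean tooLow = currentUtility < MIN_UTILITY;
        lastUtility = currentUtility;
        return unchanged || tooLow;
    }
}
```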

4 Utility for Task Assignment

Component selection based on performance measures and task requirements is of great importance for our system. Imagine use cases where multiple modules of the same functionality exist; for example, a user may have more than one table module, which are really inexpensive, in his house. The system must then be in a position to choose the best module for a given task. For example, in a fetch-and-carry scenario where the mobile platform has to dock with a table, reach a goal location and return to the user, the best module in terms of resource usage would be the one closest to the goal location. We are dealing here with an instance of the Optimal Assignment Problem (OAP) [19] that was


originally studied in game theory. A formulation of this problem for robotics can be found in [20].
To address this problem we use the concept of utility. The idea is that each individual can internally estimate the value of executing an action. This estimation results in a utility function which is highly domain and task dependent. An instructive definition of utility follows [11]. It is assumed that each agent / hardware module is capable of estimating its fitness for every task of which it is capable. This estimation takes into account two task- and robot-dependent factors:
• expected quality of task execution, given the method and equipment to be used
• expected resource cost, given the spatio-temporal requirements of the task (e.g. the length of the path to a target)
So given a module M and a task T, if M is capable of executing T, then we can define Q_MT(p) and C_MT(d) as the quality and cost, respectively, expected to result from the execution of T by M. Quality of task execution can be seen as a function of application-module-specific parameters, denoted by p; an example is the power level of the battery of a robotic mobile platform: if the power level is low, the platform will soon have to charge and therefore becomes temporarily unavailable to execute tasks. Cost can be seen as a function of the path length to the goal position. A non-negative utility measure can now be defined:

$$U_{MT} = \begin{cases} Q_{MT}(p) - C_{MT}(d) & \text{if } M \text{ is capable of executing } T \text{ and } Q_{MT}(p) > C_{MT}(d) \\ 0 & \text{otherwise} \end{cases}$$

Each agent's objective is to maximize utility. Utility maximization is equivalent, in our case, to effective resource usage and therefore better system performance. We can incorporate learning into utility estimation, thereby adapting the system's performance to the user's behaviour and needs.
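As a concrete illustration of this measure, a minimal Java sketch; the quality and cost arguments are placeholders for module-specific estimates (e.g. battery level, path length), and the names are ours.

```java
// Non-negative utility of module M for task T, following the definition above:
// quality q depends on module-specific parameters p (e.g. battery level), cost c
// on the spatio-temporal requirements d (e.g. path length to the goal).
class UtilitySketch {
    static double utility(boolean capableOfTask, double q, double c) {
        if (capableOfTask && q > c) {
            return q - c;
        }
        return 0.0;   // not capable, or cost exceeds quality
    }
}
```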

5 Experimental Design

To validate our system's functionality and obtain initial simulation results, we have created an experimental testbed. For the implementation of our agents we decided to use JADE (Java Agent Development Framework) as a middleware. JADE is a software development framework aimed at developing multi-agent systems and applications conforming to FIPA (Foundation for Intelligent Physical Agents) standards for intelligent agents. Using JADE as a middleware basically means that we take advantage of its communication structures and naming and directory services, in order to develop our experimental set-ups effectively and in accordance with the standards of the agent research community. This way we are able to analyse interactions between agents, as well as decision-making procedures. For the simulations we have used the


Player/Stage simulator. The structure of our experimental implementation is illustrated in Figure 4.
From the discussion above it is evident that a large number of possible system states and setups can exist, making it almost impossible to simulate all of them. For that reason we chose scenarios to be simulated which would test the task allocation and fault prevention mechanisms of our architecture. The following three were chosen:
• one mobile robotic platform, two different types of application modules and multiple tasks to be performed autonomously
• multiple mobile robotic platforms, two different types of application modules and one single task to be performed autonomously, with robot malfunction and task reallocation
• multiple mobile robotic platforms, two different types of application modules and multiple tasks to be performed autonomously
Results from the simulations are very encouraging, with the system assigning tasks to modules in an efficient manner. Even when task execution was interrupted on purpose by us, tasks were re-allocated successfully. Agent communication is lightweight, posing almost no communication overhead.


Fig. 4. Experimental implementation


More specifically, agent interactions are shown in Figure 5 for the task of driving an initially undocked chair (ch1) to a given destination (target). FIPA Agent Communication Language performatives are used.

Fig. 5. Agent interactions while performing a complex task

First, the Mediator Agent checks with the System State Agent whether the requested chair is docked with a mobile platform or not. The answer is negative, so a module selection procedure is initiated by the Task Allocation Agent in order to find the most suitable platform (pl1 in the figure). When it is found, it is ordered to dock with the chair and then reach the defined destination. The User Interface is appropriately updated by the Mediator while task execution progresses. We can see that only a few messages are used to accomplish a complicated task.
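For illustration, a minimal JADE Component Agent answering a call for proposals could look like the following sketch; it uses standard JADE classes (Agent, CyclicBehaviour, ACLMessage), but the utility estimate and content format are placeholders rather than our actual agents.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;
import jade.lang.acl.MessageTemplate;

// Illustrative Component Agent: replies to every CFP with a PROPOSE carrying
// its utility for the announced task. Real agents also handle ACCEPT_PROPOSAL,
// task execution and completion reports.
public class ComponentAgentSketch extends Agent {
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            public void action() {
                ACLMessage cfp =
                    myAgent.receive(MessageTemplate.MatchPerformative(ACLMessage.CFP));
                if (cfp == null) { block(); return; }
                ACLMessage reply = cfp.createReply();
                reply.setPerformative(ACLMessage.PROPOSE);
                reply.setContent(Double.toString(estimateUtility(cfp.getContent())));
                myAgent.send(reply);
            }
        });
    }

    // Placeholder: in practice this would combine battery level, distance, etc.
    private double estimateUtility(String taskDescription) {
        return 1.0;
    }
}
```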

6 Conclusions We have defined a Multi-Agent System architecture for modular mobility aids in terms of software architecture, modelling and requirements. We have shown how components interact with each other in order to achieve effective functionality of the whole system. A quantification of each module’s ability to undertake a task was also presented and used in a negotiation based module selection mechanism. Throughout this effort, our main concern was simplicity


and functionality, given the safety-critical nature of the application domain. Finally, a complete experimental testbed has been presented which has been used to validate our concepts.
Since this is a new and demanding application domain, much work remains on the architecture. We need to develop complete performance measures and enhance the system with learning mechanisms that will allow it to adapt its performance to the user's behaviour. We also have to look deeper into mechanisms for fault prevention and system recovery. Finally, tests on physically embodied robots will be performed.

References

1. Mandel C, Huebner K, Vierhuff T (2005) Towards an Autonomous Wheelchair: Cognitive Aspects in Service Robotics. In: Proceedings of Towards Autonomous Robotic Systems (TAROS 2005): 165-172
2. Gomi T, Griffith A (1998) Developing intelligent wheelchairs for the handicapped. In: Assistive Technology and Artificial Intelligence: Applications in Robotics, User Interfaces and Natural Language Processing
3. Hoyer H (1999) The OMNI-Wheelchair - State of the Art. In: Proceedings of Conference on Technology and Persons with Disabilities, Los Angeles
4. Simpson R et al (1998) Navchair: An assistive wheelchair navigation system with automatic adaptation. In: Assistive Technology and Artificial Intelligence: Applications in Robotics, User Interfaces and Natural Language Processing
5. Parker L (1998) Alliance: An architecture for fault tolerant multirobot cooperation. In: IEEE Transactions on Robotics and Automation, 14(2): 220-240
6. Simmons R, Smith T, Dias M B, Goldberg D, Hershberger D, Stentz A, Zlot R (2002) A layered architecture for coordination of mobile robots. In: Multi-Robot Systems: From Swarms to Intelligent Automata
7. Botelho S, Alami R (1999) M+: A schema for multi-robot cooperation through negotiated task allocation and achievement. In: Proceedings of the IEEE International Conference on Robotics and Automation: 1234-1239
8. Zlot R, Stentz A, Dias M B, Thayer S (2002) Multi-robot exploration controlled by a market economy. In: Proceedings of the IEEE International Conference on Robotics and Automation: 3016-3023
9. Gerkey B, Mataric M J (2002) Sold!: Auction methods for multi-robot coordination. In: IEEE Transactions on Robotics and Automation, 16(5): 758-768
10. Dias M B, Zlot R M, Kalra N, Stentz A (2005) Market-Based Multirobot Coordination: A Survey and Analysis. Tech report CMU-RI-TR-05-14, Robotics Institute, Carnegie Mellon University
11. Gerkey B, Mataric M J (2004) A formal analysis and taxonomy of task allocation in multi-robot systems. In: Intl. Journal of Robotics Research, 23(9): 939-954
12. Smith R G (1980) The Contract Net Protocol: high level communication and control in a distributed problem solver. In: IEEE Transactions on Computers, C-29(12)
13. Gat E (1997) On three-layer architectures. In: Artificial Intelligence and Mobile Robots. MIT AAAI Press


14. Bonasso P, Kortenkamp D (1995) Characterizing an architecture for intelligent, reactive agents. In: Working Notes, AAAI Spring Symposium on Lessons Learned from Implemented Software Architectures for Physical Agents: 29-34
15. Gat E (1991) Integrating planning and reacting in a heterogeneous asynchronous architecture for mobile robots. In: SIGART Bulletin 2
16. Connell J (1992) A hybrid architecture applied to robot navigation. In: Proceedings of IEEE ICRA '92, Nice, France
17. Stone P (1998) Layered learning in multi-agent systems. PhD Thesis, Carnegie Mellon University
18. Mataric M (1994) Interaction and intelligent behavior. PhD Thesis, MIT AI Lab
19. Gale D (1960) The theory of linear economic models. McGraw-Hill Book Company, New York
20. Gerkey B (2003) On multi-robot task allocation. PhD Thesis, University of Southern California, Los Angeles

How to Construct Dense Objects with Self-Reconfigurable Robots

Kasper Stoy

University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
[email protected]

Summary. Self-reconfigurable robots can change their shape by rearranging the modules from which they are built. This self-reconfiguration process has proven difficult to control, because it involves control of a distributed system of autonomous, but mechanically coupled modules connected in time-varying ways. In this work we address this control problem and specifically look at how to construct dense configurations. The basic idea behind our solution is that holes inside the desired configuration and modules outside attract each other. The effect is that holes propagate to the surface of the desired configuration and modules outside move in to fill them. This basic solution is augmented with a scaffold and gradient-based algorithm which ensures that the robot does not get stuck in local minima or get disconnected during self-reconfiguration. The approach is evaluated in simulation and we find that the self-reconfiguration process always converges and the time to complete a configuration scales approximately linearly with the number of modules. This is, to the best of our knowledge, the first time this result has been obtained for dense configuration.

1 Introduction Reconfigurable robots are robots built from modules. These robots can be reconfigured by changing the way these modules are connected. If a robot is able to change the way the modules are connected autonomously, the robot is a self-reconfigurable robot. Self-reconfigurable robots are versatile because they can adapt their shape to fit the task. Self-reconfigurable robots are also robust, because if a module fails, it can be ejected from the system and be replaced by a spare module. Potential applications for such robots include search and rescue missions, planetary exploration, building and maintaining structures in space, and entertainment. In order to realize these applications, research has to focus both on the development of the hardware of self-reconfigurable robots and on their control. The latter is the focus of this paper.



Fig. 1. This figure shows a simulated selfreconfigurable robot transforming from an initial random configuration to a configuration resembling a rhinoceros.

Fig. 2. This shows how a CAD model is represented using a set of overlapping bricks. This representation is used to drive the self-reconfiguration algorithm.

One of the fundamental control problems of self-reconfigurable robots is control of the self-reconfiguration process: the process by which the robot changes from one shape to another. An example of a self-reconfiguration process is shown in Figure 1, where a simulated self-reconfigurable robot reconfigures from a random initial configuration to a configuration resembling a rhinoceros. The method proposed here consists of two steps. In the first step a 3D CAD model, representing the desired configuration, is transformed into a set of overlapping bricks as shown in Figure 2. The set of overlapping bricks is just a space efficient way to represent the volume described by the CAD model. The second step is the actual self-reconfiguration process. The desired configuration is grown from a seed using the volume described by a set of overlapping bricks to guide the growth process. The growth process consists of two parallel-running processes: the scaffold and the fill process. The scaffold process is responsible for growing a porous scaffold which guarantees convergence and connectivity of the system. The fill process is responsible for filling in the pores of the scaffold and in this way create a dense configuration. The solution to the self-reconfiguration problem presented here is, to the best of our knowledge, the first self-reconfiguration algorithm which can be used to build dense configurations while only mildly restricting the class of configurations that can be constructed. At a more general level the algorithm we present provides a scalable, systematic, and convergent solution to the self-reconfiguration problem.


The paper is structured as follows. In Section 2, the related work is described. In Section 3, we describe the algorithm we use to transform the 3D CAD model into a set of overlapping bricks. Section 4 describes how the representation of the desired configuration interacts with the other components of the system to realize the self-reconfiguration algorithm. Finally, in Section 6, the system is evaluated using a simulated self-reconfigurable robot where scalability and convergence properties are documented.

2 Related Work

The self-reconfiguration problem is: given a start configuration, possibly a random one, how to move the modules in order to arrive at the desired final configuration. It is computationally intractable to find the optimal solution (see [4] for a discussion). Therefore, self-reconfiguration planning and control are open to heuristic-based methods. Chirikjian, Pamecha, and Chiang propose iterative improvement as a way around the computational explosion [4, 9, 3]. The idea is to find a suboptimal sequence of moves which leads from the initial configuration to the final configuration. This suboptimal sequence of moves can then be optimized using local searches. Rus et al. simplify the planning problem by using a specific intermediate chain configuration that is easy to configure into and out of [11]. Kotay, Unsal, and Prevas propose a hierarchical planner [7, 15, 10]. At the highest level, some of the motion constraints of the underlying hardware are abstracted away to facilitate efficient planning. Based on these high-level plans, the lower level then produces the detailed sequence of actions. Another approach, suggested by Kotay [7], is to use meta-modules consisting of a small number of modules. The idea is that by planning at the meta-module level there are no or few motion constraints. This makes the planning tractable. On the other hand, meta-modules consist of several modules, making the granularity of the robot larger. A related approach is to make sure that the robot maintains a uniform scaffolding structure, facilitating planning [14]. Butler implemented the distributed Pacman algorithm on the Crystalline robot [2]. The Pacman algorithm was improved and implemented on the Telecubes by Vassilvitskii [16]. These two robots have few motion constraints, making the planning problem easier. The approaches reviewed so far are deterministic and planning-based. The following approaches are not deterministic, but are much simpler, because a planning process is not needed. In early work, the idea was to use local rules as far as possible and then add randomness to deal with the problems that could not be solved using local rules. This is, for instance, true for the work on Fracta [8] and also later work on other robots [18]. The problem tended to be that even though the robot often ended up in the desired configuration, it was not always so. We refer to this as the problem of convergence. This problem was also present in the work of Yim et al. [17]. However, the chance of reaching


the final configuration was higher in this work, because local communication was used to get some idea about the global shape at the local level. There are two solutions to the problem of convergence. One solution, proposed by Bojinov et al. [1], is not to focus on a specific configuration. Instead, the idea is to build something with the right functionality. Using this approach it is acceptable if a few modules are stuck as long as the structure maintains its functionality. Alternatively, Jones et al. insist on a specific configuration, but achieve convergence by enforcing a specific sequence of construction [6]. This paper is based on the work described in [12, 13], but extends this work by handling dense configuration.

3 From CAD Model to Representation

It is difficult and time-consuming to hand-code local rules that result in a desired configuration being assembled. Therefore, we need an automatic way of transforming a human-understandable description of a desired configuration into a representation we can use for control. In our system, the desired configuration is specified using a connected three-dimensional volume in the VRML 1997 or Wavefront .obj file format, which are industry standards produced by most CAD programs. We transform the model into a representation whose size scales with the complexity of the three-dimensional model. The representation consists of a set of overlapping bricks of different sizes which approximates the input shape. The choice to use bricks is fairly arbitrary and other basic geometric shapes, such as spheres or cones, could be used as well.
The representation is automatically generated from a CAD model. In the first step, the algorithm builds an approximation of the CAD model using unit-bricks. A unit-brick is placed at a given coordinate if that coordinate is contained in the model. This is true if a line from this coordinate to infinity intersects the CAD model an odd number of times. Afterwards, the algorithm optimizes the representation by repeatedly combining side-sharing bricks and by eliminating bricks which are contained in other bricks. Figure 3 shows a simple example of two overlapping bricks and their representation.
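A minimal sketch of the optimization step under these assumptions; the Brick type is ours, only merging along the x-axis is shown, and the point-in-model parity test of the first step is omitted.

```java
// Illustrative axis-aligned brick given by two opposite corners (x0<=x1, etc.).
// Containment elimination and face-sharing merges as described above; the
// analogous y- and z-axis merge cases are omitted for brevity.
class Brick {
    final int x0, y0, z0, x1, y1, z1;

    Brick(int x0, int y0, int z0, int x1, int y1, int z1) {
        this.x0 = x0; this.y0 = y0; this.z0 = z0;
        this.x1 = x1; this.y1 = y1; this.z1 = z1;
    }

    // True if this brick completely contains the other, so the other can be dropped.
    boolean contains(Brick o) {
        return x0 <= o.x0 && y0 <= o.y0 && z0 <= o.z0
            && x1 >= o.x1 && y1 >= o.y1 && z1 >= o.z1;
    }

    // Two bricks sharing a whole face along x (same y/z extents, touching in x)
    // can be combined into one larger brick; returns null if they cannot.
    Brick mergeAlongX(Brick o) {
        boolean sameYZ = y0 == o.y0 && y1 == o.y1 && z0 == o.z0 && z1 == o.z1;
        if (sameYZ && (x1 == o.x0 || o.x1 == x0)) {
            return new Brick(Math.min(x0, o.x0), y0, z0,
                             Math.max(x1, o.x1), y1, z1);
        }
        return null;
    }
}
```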

4 From Representation to Self-Reconfiguration Algorithm The challenge is to develop a distributed control algorithm which can reconfigure the robot from an initial random configuration to the desired configuration as described by the representation. The basic idea behind our solution is that holes inside the desired configuration and modules outside attract each other. The effect is that holes propagate to the surface of the desired configuration and modules outside move in to fill them. The end result is that the volume


Fig. 3. This figure shows two overlapping bricks and their representation (left). In this example eight unit-cubes are placed in the box labeled B and one in the box labeled A. These unit-cubes merge until only two cubes are left: one cube containing B with corner coordinates (0,0,0) and (2,2,2), and one containing A and overlapping B with corner coordinates (0,0,1) and (3,1,2). This figure also shows a basic unit of scaffold consisting of four modules (middle) and a lattice consisting of 8 scaffold elements (right). A lattice built from these scaffold elements is porous, allowing modules to pass through it.

of the desired configuration is filled, implying that the desired configuration has been reached.

4.1 Gradients

Holes and modules use gradients to attract each other. Gradients provide modules with information about relative distances and directions in the configuration. A gradient is created by a source module which sends out an integer, representing the concentration, to all neighbours. A non-source module calculates the concentration at its position by taking the minimum received value and adding one. This concentration is then propagated to all neighbours and so on. For example, if the source module sends out 100, neighbours of this module will have a concentration of 101, their neighbours 102 and so on.
A module can locate the source by descending the gradient of concentration. However, if a module has to rely on the basic integer-based gradient to locate the source, it would have to move around randomly for a while to detect the direction of the gradient. In order to eliminate these unnecessary moves, we introduce a vector gradient which makes direction information available locally. The basic gradient implementation is extended with a vector indicating the local direction of the gradient. This vector is propagated by copying the vector from the neighbour with the lowest concentration, adding a unit vector in the direction of this neighbour and renormalizing the result.
We use local message passing to implement the gradient. A message takes one time step to travel between neighbours and therefore it can take many time steps for gradients to be propagated in the system.
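A sketch of the integer-gradient update run by a non-source module each time step; message handling and the vector-gradient extension are omitted, and the class name is ours.

```java
// Non-source module rule: concentration = (minimum received value) + 1.
// Returns true when the value changed and must be re-sent to all neighbours.
class GradientModuleSketch {
    static final int UNKNOWN = Integer.MAX_VALUE;
    int concentration = UNKNOWN;

    boolean update(int[] receivedConcentrations) {
        int min = UNKNOWN;
        for (int c : receivedConcentrations) {
            min = Math.min(min, c);
        }
        int updated = (min == UNKNOWN) ? UNKNOWN : min + 1;
        if (updated != concentration) {
            concentration = updated;
            return true;   // propagate the new concentration
        }
        return false;
    }
}
```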


4.2 Building a Porous Scaffold

Unfortunately, the basic gradient-based algorithm alone does not work, because the algorithm also has to ensure that the robot does not get stuck in local minima or get disconnected during self-reconfiguration. In order to address these problems we introduce a porous scaffold. In general, the set of positions contained in the scaffold, Ps, can be a subset of the set of positions in the desired configuration, Pdc. However, in our implementation we use a fixed scaffold as shown in Figure 3.
The algorithm that constructs the scaffold works as follows. All modules are initially connected in a random configuration and have a copy of the representation of the desired configuration. An arbitrary module is given an arbitrary coordinate contained in the desired configuration and the scaffold. This ensures that the seed module is not placed in a hollow part of the configuration. The seed module defines the coordinate system of the robot. Throughout the self-reconfiguration process all modules keep track of their position in this coordinate system using local communication. Note that internal localization is a simple problem in self-reconfigurable robots, because all modules are physically connected throughout the self-reconfiguration process, as opposed to multi-robot systems where localization is a major challenge.
If a module at random moves into a position next to a scaffold module and finds that this position is also part of the scaffold and the desired configuration, it changes to the scaffold state. In this way the scaffold is grown from the first scaffold module.
The porous scaffold simplifies the reconfiguration problem, because it allows modules to move through the structure and thereby removes local minima from the configuration. The scaffold is also used to prevent modules from disconnecting from the robot. Modules in the system cannot move independently of each other, because they depend on each other for connection to the robot. The problem is then to keep the system connected while allowing wandering modules to move. In our solution scaffold modules emit a connection gradient and modules only move if they do not perturb the gradient. This is enough to guarantee that the system stays connected. Detailed rules and proofs were presented in [12].
The scaffold puts a mild constraint on the class of configurations for which convergence is guaranteed. In our implementation the constraint is that the desired configuration can be covered with a set of overlapping 2x2x2 cubes. Any cube needs an overlap of at least four modules on the same face with neighbouring cubes. This constraint ensures that there is one connected scaffold in the desired configuration. It prevents the algorithm from working with desired configurations containing sub-configurations that are one module thick. This constraint may be reduced or might even be removed if the scaffold is generated specifically for each configuration.


4.3 Self-Reconfiguration Algorithm

We can now implement the self-reconfiguration algorithm on top of these support algorithms. The idea here is, as mentioned, to 1) propagate holes toward modules outside the desired configuration and 2) move modules outside the desired configuration towards holes in the desired configuration.
A module can be in one of three states: wander, fill, and scaffold. Wander modules are outside the desired configuration. Fill modules are inside the desired configuration, but are not part of the scaffold. Finally, scaffold modules are inside the desired configuration and belong to the scaffold.
Wander modules descend a hole gradient created by scaffold or fill modules to reach holes in the desired configuration. Wander modules also emit a wander gradient. Wander modules change to fill when their positions are contained in the desired configuration and they are positioned next to a scaffold module. If, in addition, their positions are included in the scaffold, they change to scaffold. Fill modules climb the wander gradient, if and only if it does not take them out of the desired configuration. This means that filling modules move away from wandering modules and in effect propagate holes towards the surface and the wandering modules. Fill modules change to scaffold if they move into a position contained in the scaffold. Both fill and scaffold modules emit the hole gradient if a neighbour position is unfilled and part of the desired configuration.
The self-reconfiguration algorithm runs in parallel with the scaffold building algorithm. The two algorithms interact implicitly with each other: 1) the scaffold is porous and therefore allows the self-reconfiguration algorithm to run; 2) the self-reconfiguration algorithm moves modules to scaffold positions, making the scaffold algorithm efficient.
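The state transitions above can be summarized in a small sketch; the boolean arguments stand for locally sensed facts, and the names are ours rather than the paper's.

```java
// Wander/fill/scaffold transitions from Section 4.3. Gradient following and
// hole-gradient emission are not shown here.
enum ModuleState { WANDER, FILL, SCAFFOLD }

class StateRulesSketch {
    static ModuleState next(ModuleState state,
                            boolean inDesiredConfiguration,
                            boolean inScaffoldPositions,
                            boolean nextToScaffoldModule) {
        switch (state) {
            case WANDER:
                if (inDesiredConfiguration && nextToScaffoldModule) {
                    // Becomes scaffold directly if the position is a scaffold position.
                    return inScaffoldPositions ? ModuleState.SCAFFOLD : ModuleState.FILL;
                }
                return ModuleState.WANDER;
            case FILL:
                // Fill modules turn to scaffold when they move into a scaffold position.
                return inScaffoldPositions ? ModuleState.SCAFFOLD : ModuleState.FILL;
            default:
                return ModuleState.SCAFFOLD;   // scaffold modules keep their state
        }
    }
}
```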

5 Simulated System In our simulation, we use modules, which are more powerful than any existing hardware platforms, but do fall within the definition of a Proteo module put forward by Yim et al. [17]. The modules are assumed to be cubical and when connected they form a lattice structure. They have six hermaphrodite connectors and can connect to six other modules in the directions: east, west, north, south, up, and down. Modules directly connected to a module are referred to as neighbours. A module can sense whether there are modules in neighbouring lattice cells. In this implementation, we do not control the actuator of the connection mechanism, but assume that neighbour modules are connected and disconnected appropriately. A module can only communicate with its neighbours. It is able to rotate around neighbours and to slide along the surface of a layer of modules. Finally, we assume that direction vectors can be uniquely transformed from


the coordinate system of a module to the coordinate systems of its neighbours. This is necessary to propagate the vector gradient and use the brick representation. The simulator is programmed in Java3D. The simulation uses time steps. In each time step all the modules are picked in a random sequence and are allowed: 1) to process the messages they have received since last time step, 2) to send messages to neighbours (but not wait for reply), and 3) to move if possible. Note that this implies that it can take many time steps for gradients to be propagated in the system. In order for a module to move it has to satisfy the motion constraints of the system. This means that in order to move to a new position the new position has to be free. If using the rotate primitive, the position that the module has to rotate through to get to the new position has to be free as well. This is all assumed to be sensed locally. In our implementation the problem of two modules moving into the same cell at the same time is not addressed, because in the simulation only one module is allowed to move at a time. However, in a real parallel-running system this problem might be solved by rolling the module back to its original position in case of collision or through local right-of-way rules.
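A sketch of one such simulation time step under these assumptions; the Module interface is illustrative, not the simulator's actual API.

```java
import java.util.Collections;
import java.util.List;

// One time step: modules are visited in a random sequence and may process
// messages, send messages, and move if the motion constraints are satisfied.
interface Module {
    void processMessages();   // handle messages received since the last step
    void sendMessages();      // send to neighbours without waiting for replies
    boolean canMove();        // target (and swept) lattice cells must be free
    void move();
}

class SimulatorSketch {
    static void timeStep(List<Module> modules) {
        Collections.shuffle(modules);   // random order each step
        for (Module m : modules) {
            m.processMessages();
            m.sendMessages();
            if (m.canMove()) {
                m.move();
            }
        }
    }
}
```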

6 Experiments The task we pick for our experiment is to reconfigure the self-reconfigurable robot from a random initial configuration to another random final configuration satisfying the constraint on desired configurations. The representations of the desired configurations are automatically generated. The representation is then downloaded into the modules of the simulation and the assembly process is started. We repeated the experiments ten times with changing numbers of modules. For each experiment we recorded the time steps needed, the total number of moves, and calculated the theoretical lower bound on the needed number of moves. The lower bound is the minimum number of moves needed to reconfigure from the initial configuration, to the final configuration assuming that interference with other modules can be ignored [5]. The lower bound was only calculated for configurations with 800 modules or below, because with a higher number of modules it became computationally infeasible. The time steps needed to reconfigure grow sub-linearly with the number of modules (see Figure 4) which is comparable to our earlier results with porous configurations [12] and the results reported by Yim et al. [17]. Figure 4 shows that the number of moves grows approximately linearly with the number of modules. These results do of course depend on the shape of the desired configurations which, even though they are randomly generated, tend to be spherical for larger numbers of modules. In all experiments the system converges to the desired configuration.


Fig. 4. This figure shows how the number of time steps (left) and moves (right) needed to reconfigure depends on the configuration and the number of modules. The theoretical lower bound on the number of moves is also shown.

Fig. 5. This figure shows the total number of messages used for communication as a function of the number of modules. It is shown how the messages are divided between different tasks (the order of items in the legend is the same as in the column stacks).

The number of moves and the time to complete a reconfiguration are important characteristics of an algorithm. However, another aspect is the amount of communication needed. This information is summarized in Figure 5. The system relies heavily on communication: in the case of the 3200-module experiments, on average 8.9 ± 2.3 messages are sent per module per time step. These messages originate from propagating three changing gradients, propagating position information, and the message exchanges needed to run the distributed connection algorithm. In our current implementation, little is done to minimize the number of messages, so there is room to improve the performance.


7 Conclusion and Future Work

We have presented an approach to self-reconfiguration working in two steps. First, a representation of a desired configuration is generated automatically based on a 3D CAD model. Second, a self-reconfigurable robot uses this representation to transform itself from a random initial configuration to the desired configuration. The transformation is controlled by an algorithm which propagates holes in the desired configuration towards modules outside. This algorithm is extended with a scaffold building algorithm that removes local minima from the desired configuration and thereby guarantees convergence of the algorithm. Furthermore, a connection algorithm ensures that the robot stays connected during the transformation process by guaranteeing that modules at all times are connected to the scaffold.
In experiments with a simulated self-reconfigurable robot we have demonstrated that the system represents an efficient, systematic, and convergent solution to the self-reconfiguration problem, while only restricting the class of possible configurations mildly. The algorithm represents, to the best of our knowledge, the first algorithm able to maintain these characteristics while handling dense configurations.

Acknowledgments

The author would like to thank George Konidaris, University of Massachusetts at Amherst, for the fruitful discussions resulting in this paper.

References

1. H. Bojinov, A. Casal, and T. Hogg. Emergent structures in modular self-reconfigurable robots. In Proc., IEEE Int. Conf. on Robotics & Automation (ICRA'00), volume 2, pages 1734–1741, San Francisco, California, USA, 2000.
2. Z. Butler, S. Byrnes, and D. Rus. Distributed motion planning for modular robots with unit-compressible modules. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'01), volume 2, pages 790–796, Maui, Hawaii, USA, 2001.
3. C.-J. Chiang and G.S. Chirikjian. Modular robot motion planning using similarity metrics. Autonomous Robots, 10(1):91–106, 2001.
4. G. Chirikjian, A. Pamecha, and I. Ebert-Uphoff. Evaluating efficiency of self-reconfiguration in a class of modular robots. Robotics Systems, 13:317–338, 1996.
5. G.S. Chirikjian and J.W. Burdick. The kinematics of hyper-redundant robot locomotion. IEEE Transactions on Robotics and Automation, 11(6):781–793, 1995.
6. C. Jones and Maja J. Matarić. From local to global behavior in intelligent self-assembly. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA'03), pages 721–726, Taipei, Taiwan, 2003.


7. K. Kotay and D. Rus. Algorithms for self-reconfiguring molecule motion planning. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'00), volume 3, pages 2184–2193, Maui, Hawaii, USA, 2000.
8. S. Murata, H. Kurokawa, and S. Kokaji. Self-assembling machine. In Proc., IEEE Int. Conf. on Robotics & Automation (ICRA'94), pages 441–448, San Diego, California, USA, 1994.
9. A. Pamecha, I. Ebert-Uphoff, and G.S. Chirikjian. Useful metrics for modular robot motion planning. IEEE Transactions on Robotics and Automation, 13(4):531–545, 1997.
10. K.C. Prevas, C. Unsal, M.O. Efe, and P.K. Khosla. A hierarchical motion planning strategy for a uniform self-reconfigurable modular robotic system. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'02), volume 1, pages 787–792, Washington, DC, USA, 2002.
11. D. Rus and M. Vona. Self-reconfiguration planning with compressible unit modules. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'99), volume 4, pages 2513–2530, Detroit, Michigan, USA, 1999.
12. K. Støy. Controlling self-reconfiguration using cellular automata and gradients. In Proc., 8th int. conf. on intelligent autonomous systems (IAS-8), pages 693–702, Amsterdam, The Netherlands, 2004.
13. K. Støy and R. Nagpal. Self-reconfiguration using directed growth. In Proc., int. conf. on distributed autonomous robot systems (DARS-04), pages 1–10, Toulouse, France, 2004.
14. C. Ünsal and P.K. Khosla. A multi-layered planner for self-reconfiguration of a uniform group of i-cube modules. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'01), volume 1, pages 598–605, Maui, Hawaii, USA, 2001.
15. C. Ünsal, H. Kiliccote, and P.K. Khosla. A modular self-reconfigurable bipartite robotic system: Implementation and motion planning. Autonomous Robots, 10(1):23–40, 2001.
16. S. Vassilvitskii, M. Yim, and J. Suh. A complete, local and parallel reconfiguration algorithm for cube style modular robots. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'02), volume 1, pages 117–122, Washington, DC, USA, 2002.
17. M. Yim, Y. Zhang, J. Lamping, and E. Mao. Distributed control for 3d metamorphosis. Autonomous Robots, 10(1):41–56, 2001.
18. E. Yoshida, S. Murata, H. Kurokawa, K. Tomita, and S. Kokaji. A distributed reconfiguration method for 3-d homogeneous structure. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'98), volume 2, pages 852–859, Victoria, B.C., Canada, 1998.

In Situ Autonomous Biomechanical Characterization

Mehdi Boukallel, Maxime Girot, and Stéphane Régnier

Laboratoire de Robotique de Paris
Université Pierre et Marie Curie - Paris 6
CNRS - FRE 2507
18 route du Panorama - 92265 Fontenay Aux Roses - France
boukallel, girot, [email protected]

Summary. This paper presents a fully automated microrobotic system based on force/vision referenced control designed for cell mechanical characterization. The design of the prototype combines Scanning Probe Microscopy (SPM) techniques with advanced robotics approaches. As a result, accurate and non-destructive mechanical characterization based on soft contact mechanisms is achieved. In vitro working conditions are supported by the experimental setup, so that mechanical characterizations can be performed under biological environmental requirements as well as in cyclical operating mode over several hours. The design of the different modules which compose the experimental setup is detailed. Mechanical cell characterization experiments under in vitro conditions on human adherent cervix Epithelial Hela cells are presented to demonstrate the viability and effectiveness of the proposed setup.

Keywords: In vitro mechanical cell characterization; Scanning Probe Microscopy (SPM) techniques; human adherent cervix Epithelial Hela cells mechanical characterization.

1 Introduction

Nowadays, robotics and microrobotics techniques play an important role in exploring mechanical cell behaviour. The ability to study accurately the mechanical behaviour of individual cells is a key component in understanding the elementary biological functions that various biological cells exhibit. Furthermore, individual mechanical cell characterization is a major step towards understanding the behaviour of complex mechanical tissues. To date, several experimental setups have been developed to identify the control mechanisms of the cell mechanical response [1]-[5]. Among these robotics and microrobotics systems, the most promising ones



involve Scanning Probe Microscopy (SPM) techniques at the nanoscale. These techniques have the potential to give accurate quantitative information about local forces and contact mechanisms. The Atomic Force Microscope (AFM) has become a commonly used tool to measure mechanical properties of biological samples [6]-[10]. In these studies, a flexible cantilever with a low spring constant and an atomically sharp tip is brought into the vicinity of the biological sample. Deflection of the cantilever as a result of the mechanical interaction between the microindenter and the sample is monitored by a split photodiode and a laser beam reflected on the back of the cantilever. The significant research involving AFM makes possible the measurement of relevant cell mechanical properties (Young's modulus, bulk modulus, surface roughness, ...). However, most of these studies have not been performed in a biological clean-room environment. Since elementary biological functions and mechanical properties of biological cells are widely affected by the experimental conditions, the identified properties may not be relevant. Furthermore, as the reaction of biological samples to stress varies greatly over a given lapse of time, it is important to monitor the characterization process continuously in an in vitro environment.
Moreover, some problems are commonly associated with the use of standard commercial cantilevers with a sharp tip for soft-sample mechanical characterization. The nanometre-scale dimensions of the tips can cause large local strains which exceed the elastic domain. Furthermore, depending on the magnitude of the force applied to the soft samples, cantilever tips as well as the samples can easily be damaged, so that the local strain applied in the indented area is altered. To overcome these problems, Mahaffy et al. [11] suggest attaching a micro-sphere to the cantilever tip. However, besides the necessity of dexterous manipulations for accurate placement of the micro-sphere, this method requires well-known materials (geometry, mechanical properties, ...).
Another difficulty is associated with the mechanical characterization of biological samples using a sharp cantilever. Usually, the spectroscopy curves collected with the AFM are used in conjunction with an appropriate analytical model to estimate Young's modulus, friction, wear and other material properties. According to the literature, the Hertzian model, which describes the relationship between force and indentation, is the approach commonly used for fitting the experimental data. Two major assumptions are made: linear elasticity and infinite sample thickness. Some authors have shown that in the case of soft contact mechanisms, models derived from linear elasticity can lead to significant errors [12]. Moreover, due to the imperfections of the tip radius of curvature, an ill-defined contact region results between the probe and the sample. Consequently, uncertainties are introduced in choosing the appropriate fitting analytical model. It has also been shown, depending on the applied force and on the sample thickness, that large errors may result when using infinite thickness models [13][14]. The authors compute force-displacement


curves for finite sample thickness to show that, for soft and thin samples, the error in the estimated elasticity modulus can be an order of magnitude.
In our opinion, cell mechanical characterization in an in vitro environment using a tipless cantilever is a promising solution. As studies involving such cantilevers are less prone to the problems associated with sharp-tip cantilevers, enhanced non-destructive cell mechanical characterization should be achieved. For this purpose, a Force Bio-Microscope (FBM) system has been developed which combines SPM techniques and advanced robotics approaches. A tipless, chemically inert cantilever is used for this study. Contact mechanisms resulting from the biomechanical process are modeled with the DMT (Derjaguin, Muller and Toporov) contact theory, which takes into account both adhesion forces and finite sample thickness. In order to demonstrate the accuracy of the DMT model in the case of soft contact mechanisms, the estimated force-deformation curve is compared to the one predicted by the Hertz theory.

2 Experimental Setup Overview

The automated Force Bio-Microscope (FBM) device is a hybrid AFM microscope combining a scanning microscopy approach with biological environment constraints. The FBM consists mainly of three units: the mechanical sensing unit, which performs detection, positioning and sensing features; the imaging/grabbing unit for imaging and cell tracking features; and the clean room in vitro unit, which allows experiments to be conducted in a biological environment.
As the FBM experimental setup provides suitable conditions for study in a controlled environment, the mechanical measurement process can be carried out on the biological sample over an extended period of time. A master computer is used to drive the FBM in an automatic operating mode based on force/vision referenced control. A user-definable graphical interface has been developed in order to make configuration of the automatic cell mechanical characterization process easier. To avoid undesired mechanical vibrations during the cell characterization tasks, the FBM is installed on an anti-vibration table. The overall configuration of the FBM and the different working components are shown in figure 1(A).

2.1 Mechanical Sensing Unit

The mechanical sensing unit is based on the detection of the deflection of a cantilever by an optical technique. A four-quadrant photodiode with internal amplifiers, associated with a low-power collimated laser diode (wavelength 650 nm), is used in order to perform both axial and lateral nanoNewton force measurements. The optical path of the Gaussian laser beam is optimized using a pair of mirrors and a glass aspheric condenser lens. The total


sensing area of the photodiode is 7 mm² with a spectral response from 400 to 1100 nm. The sensitivity of the optical detection device is 5 mV/µm.
A low spring constant (0.2 N/m) uncoated tipless silicon cantilever is used as the probe for the cell mechanical characterization. The lever is 450 µm long, 90 µm wide and 2 µm thick. The biological samples are accurately positioned below the cantilever by a 3-DOF (x, y and z) encoded micropositioning stage with a submicrometre resolution (0.1 µm). The kinematic features of the micropositioning stage allow accurate mechanical measurements in a workspace of 25 x 25 x 25 mm³ with good repeatability. The configuration of the mechanical sensing unit, including the optical detection device, is presented in figure 1(B). A magnified picture of the cantilever with the focused laser beam on its reflective surface is shown in the same figure.


Fig. 1. (A) The FBM experimental setup overview. (B) The mechanical sensing unit.

For the preliminary study, we focused on force feedback control of the cantilever flexural deflection. Thus, only the vertical micropositioning stage is servoed. Knowing the vertical position of the micromotors as well as the deflection of the cantilever from the optical detection device, an optimized Proportional and Derivative (PD) controller was designed to ensure optimal control performance. The PD terms are tuned using the Ziegler-Nichols method. Figure 2(A) shows experimental results of the force referenced control approach for different desired forces.

2.2 Imaging/Grabbing Unit

The imaging/grabbing unit consists of an inverted microscope (Olympus IMT2) with Nikon 10x and 20x objectives. A phase contrast device is mounted on


the microscope for precise contrast operation. The inverted microscope is fitted with a CCD camera (754x488 pixel resolution). Using a frame grabber and a specialized imaging library package (Matrox Imaging) associated with the CCD camera, automatic mechanical characterization based on image feature tracking is achieved (cf. figure 2(B)). The pixel-to-real-world calibration of the CCD camera is achieved by means of a calibrated glass micro-array as well as calibrated micro-spheres (cf. figure 3).


Fig. 2. (A) The force feedback control approach. (B) The vision feedback control approach.

Fig. 3. Pixel to real world calibration of the CCD camera using the calibrated glass micro-array.


2.3 Clean Room in vitro Unit

The incubating system is formed by a controlled heating module which maintains the temperature of the cage incubator at 37 °C using a single thermocouple probe. The cage incubator ensures a temperature stability within 0.1 °C. A mixed stream composed of 5% CO2 and humidified air is fed into a small incubating chamber containing the biological samples, thereby avoiding condensation on the cage walls that could damage the mechanical parts of the microscope and the micropositioning stage. The whole system including the FBM is placed in a positive pressure clean room to protect the biological environment.

3 In vitro Mechanical Characterization Experiments

The experiments presented in this paper are focused on the mechanical stress response of human cervix Epithelial Hela (EpH) cells under a controlled environment. In order to be valid, some assumptions are made in this regard: (i) the biological membrane is assimilated to a perfectly elastic spherical material; (ii) the liquid enclosed by the biological membrane of the EpH cells is considered incompressible; (iii) the surfaces are assumed frictionless so that only normal forces are transmitted; (iv) the osmotic influence on volume modification is neglected.
The EpH cells can be assimilated morphologically to elliptical cells with a thin surrounding biomembrane. In the present study, the average dimensions of the biological sample are 10 µm long, 9 µm wide and 6 µm high (cf. figure 4).
The Epithelial Hela (EpH) cells are prepared on Petri dishes with a specific culture medium formed by Dulbecco's Modified Eagle's Medium (DMEM) with high glucose and L-glutamine components and 10 % of foetal bovine serum.

Fig. 4. (A) Magnified image of the cervix Epithelial Hela cells obtained with a 63x objective. (B) The cervix Epithelial Hela cell morphology observed by fluorescence techniques.


3.1 Cell's Mechanical Response Characterization

Figure 5(A) shows the experimental curves of the photodiode output as a function of the sample displacement (Δz) performed on both a single EpH cell and a hard surface. The single step of the sample displacement is 200 nm and the total displacement is 8 µm. Deformations δ of the EpH cell are monitored by calculating the difference between the sample displacement Δz and the cantilever deflection Δd. The non-linear elastic behaviour of the EpH cell can be seen in figure 5(B), which presents the sample deformation δ as a function of the load force applied by the cantilever.


Fig. 5. (A) Experimental data of the photodiode output as function of the sample displacement performed on both single EpH cell and hard surface. (B) Experimental curve of the sample deformation δ as function of the applied load by the cantilever.
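A sketch of the conversion chain behind these curves: photodiode voltage to cantilever deflection (5 mV/µm sensitivity), deflection to force (0.2 N/m spring constant), and sample deformation as stage displacement minus deflection. The constants come from Section 2.1, but the class and method names are ours.

```java
// Illustrative conversion from photodiode voltage to deflection, force and
// sample deformation, using the constants given for the FBM sensing unit.
class CantileverConversionSketch {
    static final double SENSITIVITY_V_PER_M = 5e-3 / 1e-6;  // 5 mV/um = 5000 V/m
    static final double SPRING_CONSTANT_N_PER_M = 0.2;      // cantilever stiffness

    static double deflection(double photodiodeVolts) {
        return photodiodeVolts / SENSITIVITY_V_PER_M;        // metres
    }

    static double force(double deflectionMetres) {
        return SPRING_CONSTANT_N_PER_M * deflectionMetres;   // newtons
    }

    // delta = dz - dd: sample deformation from stage displacement and deflection.
    static double deformation(double stageDisplacementMetres, double deflectionMetres) {
        return stageDisplacementMetres - deflectionMetres;
    }
}
```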

3.2 Viscoelastic Behaviour Characterization

The viscoelastic behaviour of the EpH cells was also investigated with the FBM device. Cyclical, automatic approach and retract experiments were conducted on the same biological sample for 2 hours at 3-minute intervals. For this study, the motion amplitude and the single step of the vertical microstage are fixed to 8 µm and 200 nm, respectively. In order to reduce the cantilever damping oscillations during the mechanical characterization process, the velocity of the sample positioning stage is kept small (0.5 µm/s). Figure 6(A) shows three approach and retract curves monitored at different time intervals (t=0 mn, 40 mn and 80 mn) of the cyclical experiments. A single reference approach and retract curve performed on a hard surface is given in figure 6(B). According to the collected data, the EpH sample exhibits the same viscoelastic behaviour throughout the experiment.
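A sketch of the cyclical approach-retract procedure described above (8 µm travel in 200 nm steps); the stage and photodiode interfaces are assumed placeholders, not the FBM's actual drivers, and data storage is only hinted at.

```java
// Illustrative automation of one force-spectroscopy cycle (approach + retract).
interface VerticalStage { void moveBy(double metres); }
interface Photodiode   { double readVolts(); }

class SpectroscopyCycleSketch {
    static final double STEP = 200e-9;    // single step: 200 nm
    static final double TRAVEL = 8e-6;    // total displacement: 8 um

    static void approachAndRetract(VerticalStage stage, Photodiode pd) {
        int steps = (int) Math.round(TRAVEL / STEP);
        double[] approach = new double[steps];
        double[] retract = new double[steps];
        for (int i = 0; i < steps; i++) {   // approach: record one point per step
            stage.moveBy(STEP);
            approach[i] = pd.readVolts();
        }
        for (int i = 0; i < steps; i++) {   // retract
            stage.moveBy(-STEP);
            retract[i] = pd.readVolts();
        }
        // The (displacement, voltage) pairs would then be converted to force curves.
    }
}
```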


Fig. 6. (A) Experimental spectroscopy curves (approach and retract) performed on a single EpH cell at different time intervals (t=0 mn, 40 mn and 80 mn). (B) Single referenced approach and retract curves performed on hard surface.


Fig. 7. Dimensionless strain per module length ε as function of the sample displacement Δz.


Compared to the approach and retract curves performed on a hard surface (cf. figure 6(B)), the retract data collected on the EpH sample differ from the approach data. This suggests adhesion forces between the cantilever probe and the biological sample, which modify the total strain energy. Figure 7 shows the strain ε as a function of the sample displacement Δz. This figure highlights the adhesion interaction between the cantilever and the biological sample.

3.3 In vitro Efficiency Approach for Cell Mechanical Characterization

In order to study the correlation between the mechanical cell properties and the environmental culture conditions, we performed automatic and cyclical spectroscopy tasks on a single EpH cell during several minutes without the use of the incubating system. As in the previous study, the sample displacement and the single step of the vertical micropositioning stage are fixed to 8 µm and 200 nm, respectively. Figure 8(A) shows the evolution of the EpH cell mechanical behaviour over the cyclical spectroscopy operation with and without the use of the incubating system. More specifically, curve (A) shows the approach and retract curves using the cage incubator. Curves (B), (C) and (D) show the mechanical behaviour of the studied EpH cell for different elapsed times t0 once the cage incubator is turned off.
These mechanical characterization experiments clearly reveal that the mechanical properties of the studied sample are affected by the temperature conditions. This difference suggests that the intra- or extra-cellular matrix reacts to the variation of temperature.

3.4 Analytical Model for Sample Deformation Estimation

The deformation δ resulting from the mechanical cell characterization process is estimated using an appropriate analytical model. The DMT (Derjaguin, Muller and Toporov) theory is chosen for this purpose.
Denoting by R the radius of the biological sample (R = 10 µm), w the adhesion work, P the load force applied by the cantilever and a0 the radius of the contact area when P = 0, both the contact area radius a and the deformation δ can be expressed according to DMT [15] by

$$a^3 = a_0^3\left(\frac{P}{2\pi R w} + 1\right) \quad (1)$$

$$\delta = \frac{a^2}{R} \quad (2)$$


The DMT model suggests that the adhesion work w can be expressed according to the pull-off force P_off needed to overcome the adhesion forces as [15]

P_off = 2πRw     (3)

As the pull-off force P_off and the contact area a_0 are accurately measured using the FBM (P_off ≈ 20 nN, a_0 ≈ 2 µm), the values of w and a_0 are introduced in equation 1 for the determination of a. Equation 2 then gives the estimation of δ.
Figure 8(B) shows the estimation of the biological sample deformation δ as a function of the simulated load force P using the Hertz and the DMT theories. These analytical results are compared to the experimental measurements performed with the FBM and presented in section 3.1. As adhesion forces are not considered, large errors are observed between the experimental data and the predicted force-deformation curves (on the order of 0.2 µm) in the case of the Hertz model.
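To make the estimation procedure concrete, the following sketch (not the authors' software) evaluates the DMT force-deformation relation from the measured quantities quoted above; the load range is chosen only to roughly cover the interval shown in figure 8(B).

```python
# Sketch: DMT deformation estimate from the measured pull-off force and contact radius.
import numpy as np

R = 10e-6       # sample radius (m), as stated in the text
P_off = 20e-9   # measured pull-off force (N)
a0 = 2e-6       # contact radius at zero load (m)

w = P_off / (2.0 * np.pi * R)            # adhesion work from eq. (3): P_off = 2*pi*R*w

def dmt_deformation(P):
    """Deformation delta for an applied load P (N), combining eqs. (1) and (2)."""
    a_cubed = a0**3 * (P / (2.0 * np.pi * R * w) + 1.0)   # eq. (1)
    a = a_cubed ** (1.0 / 3.0)
    return a**2 / R                                        # eq. (2)

loads = np.linspace(0.0, 0.35e-6, 50)    # 0 .. 0.35 uN, roughly the range of Fig. 8(B)
print(dmt_deformation(loads)[-1])        # deformation at the largest simulated load
```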

Fig. 8. (A) Evolution of the measured force as a function of the sample displacement for different elapsed times t0 = 0, 5, 9 and 13 mn. (B) Estimation of the biological sample deformation δ as a function of the simulated load force P using the Hertz and the DMT theories compared to the experimental data.

4 Conclusion
This paper has presented the development of a micro-force sensing system for in vitro mechanical cell characterization. The experimental setup combines Scanning Probe Microscopy (SPM) techniques with advanced robotics approaches. As the developed system operates in a fully automatic mode based on visual and force tracking control, effective mechanical characterization and reliable data acquisition are achieved. The Force Bio-Microscope


device (FBM) consists of three units with autonomous force sensing and measurement capabilities. Each unit is designed, calibrated or configured towards an effective in vitro mechanical cell characterization. Mechanical characterization is conducted using the FBM on human adherent cervix Epithelial Hela cells. These experiments demonstrate the efficiency of the experimental setup developed to explore the mechanical properties of adherent biological samples in in vitro conditions. The contact mechanisms resulting from the cell mechanical characterization process are predicted using an appropriate model (DMT) taking into account both adhesion forces and finite sample thickness.

5 Future work
Current work involves modelling the soft contact mechanisms using a tipless cantilever, taking into account friction, osmotic influence and anisotropic non-linear elastic properties. Future work will be focused on exploring the mechanical transduction of living cells in in vitro environment conditions. One of the most promising results of this study will be the description of both mechanosensitivity and mechanical transduction mechanisms at the microscale for biological cells or tissue configurations.

Acknowledgment
The authors are grateful to J. Angulo-Mora from the Genetic of the Radiosensitivity Laboratory (LGR) of the French Atomic Energy Commission (CEA), for cell preparation and for his enthusiastic help.

References
1. Sun Y, Nelson B J (2004) Jrnl. of Information Acquisition 1:23–32
2. Van Vliet K J, Bao G, Suresh S (2003) Acta Materialia 51:5881–5905
3. Kim D H, Yun S, Kim B (2004) In Proc. of the Int. Conference on Robotics and Automation 5013–5018
4. Guck J, Ananthakrishnan R, Cunningham C C, Kas J (2002) J. Physics: Condens. Matter 14:4843–4856
5. Sharp J M, Clapp A R, Dickinson R B (2003) Colloids and Surfaces B: Biointerfaces 27:355–364
6. Radmacher M (1997) IEEE Engineering in Medicine and Biology 47–57
7. Sahin O, Yaralioglu G, Grow R, Zappe S F, Atalar A, Quate C, Solgaard O (2004) Sensors and Actuators A 114:183–190
8. Park S, Costa K, Ateshian G (2004) Journal of Biomechanics 37:1687–1697
9. Dimitriadis E K, Horkay F, Maresca J, Kachar B, Chadwick R S (2002) Biophysical Journal 82:2798–2810


10. Yao X, Walter J, Burke S, Stewart S, Jericho M H, Pink D, Hunter R, Beveridge T J (2002) Colloids and Surfaces B: Biointerfaces 23:213–230
11. Mahaffy R, Shih C K, Mackintosh F C, Kas J (2000) Phys. Rev. Lett. 85:880–883
12. Costa K, Yin F C (1999) J. Biomech. Engr. Trans. ASME 121:462–471
13. Domke J, Radmacher M (1998) Langmuir 14:3320–3325
14. Akhremitchev B, Walker G (1999) Langmuir 15:5630–5634
15. Maugis D (2000) Contact, Adhesion and Rupture of Elastic Solids, Springer-Verlag

Incremental Learning of Task Sequences with Information-Theoretic Metrics∗

Michael Pardowitz, Raoul Zöllner, and Rüdiger Dillmann
Institute of Computer Science and Engineering, Universität Karlsruhe (TH), Karlsruhe, D-76138, Germany, P.O. Box 6980
Email: {pardow, zoellner, dillmann}@ira.uka.de

Summary. Learning tasks from human demonstration is a core feature for household service robots. Task knowledge should at the same time encode the constraints of a task while leaving as much flexibility for optimized reproduction at execution time as possible. This raises the question which features of a task are the constraining or relevant ones, both for execution of and reasoning over the task knowledge. In this paper, a system to record and interpret demonstrations of household tasks is presented. A measure for the assessment of the information content of task features is introduced. This measure for the relevance of certain features relies both on general background knowledge and on task-specific knowledge gathered from the user demonstrations. The results of the incremental growth of the task knowledge as more task demonstrations become available are demonstrated within the task of laying a table.

1 Introduction
During the last years, human-centered robotics became a major trend in the robotics community. One crucial point in HCR is to design systems that adapt to unstructured and new situations the human user comes up with. This can be achieved by giving the robot the ability to learn, transforming the knowledge the user has into knowledge utilizable by a mobile service robot. One of the most intuitive ways to acquire new task knowledge is to learn it from the human user via demonstration and interaction. This approach to task learning is widely known as Programming by Demonstration (PbD).
PbD-systems should be capable of learning a task from a single demonstration sufficiently for execution while being able to generalize it when further

∗ This work has been partially conducted within the German collaborative research center "SFB Humanoide Roboter" granted by the Deutsche Forschungsgemeinschaft, and within the EU Integrated Project COGNIRON ("The cognitive robot companion"), funded by the European Commission Division FP6-IST Future and Emerging Technologies under Contract FP6-002020.

H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 51–63, 2006. © Springer-Verlag Berlin Heidelberg 2006


demonstrations of the same task become available. Incremental learning approaches that gradually refine task knowledge pave the way towards suitable PbD-systems for humanoid robots.
One aspect that can only be learned incompletely from a single user demonstration is the sequence in which the subparts of a certain task can be scheduled. So the task knowledge must explicitly encode those constraints in order to ensure a reliable, faultless and safe execution of the task. After seeing a single demonstration, PbD-systems can state multiple valid hypotheses on the sequential constraints a task must obey. When more demonstrations become available, the incorrect hypotheses conflicting with the new demonstrations can be pruned.
Identifying the corresponding subtasks performed in different demonstrations of the same task is not an easy job. Many features of the task are exposed to noise. Additionally, some features possess higher relevance to a certain task, while being negligible for others. A method for identifying the features that characterize a task and basing the recognition of corresponding subtasks on this selection is presented in this paper.
The remainder of this paper is organized as follows: The next section gives an overview of related work concerning programming by demonstration and task learning from user demonstrations. Section 3 describes the system for the acquisition of task knowledge from a single user demonstration. Section 4 introduces task precedence graphs and proposes a method for learning the underlying precedence graph of a task when two or more sequential demonstrations are given. The methods for identifying corresponding subtasks within different task demonstrations described in section 5 are an important preliminary for this computation. Finally, the methods described in the earlier sections are evaluated in section 6.

2 Related Work
Since [7] proposed a programming system for industrial robots using visual observation of a human user, several programming by demonstration (PbD) systems and approaches based on human demonstrations have been proposed during the past years. Many of them address special problems or a special subset of objects only. An overview and classification of the approaches can be found in [5, 14]. The mentioned systems can be discriminated by the learning paradigm applied: Imitation learning systems are used to re-identify and execute human motions [3, 1, 2], aiming at a high similarity of the robot's movements to the demonstrated trajectories. Examples for the entities learned by imitation are skills like grasp or release actions ([15]) or moves of manipulated objects with constraints like "move along a table surface" etc. The classical "peg in hole" problem can be learned by imitation, for example using a force controller which minimizes a cost function, or as a task consisting of a sequence of skills like "find the hole", "set perpendicular position" and "insert". These systems


require a large set of task demonstrations for generalizing the trajectory before the learning process can start. Background knowledge based or deductive PbD-systems, as presented in [13, 16], usually require much less or even only a single user demonstration to generate executable task descriptions. Often, the analysis of a demonstration takes place by observing the changes in the scene, described by using relational expressions or contact relations [7, 10]. When mapping the learned task to the executing system, background knowledge based approaches are used in order to replicate the effects in the environment [12]. Sequential analysis of tasks is presented in [4, 9]. The first records multiple user demonstrations of a complex manipulation skill. Unnecessary actions that do not appear in all demonstrations are pruned and only the sequence of essential actions is retained. The latter stresses the role of interaction with the human user to facilitate learning of the sequential arrangement of behaviors in a navigation task. In this paper a system is proposed that combines background knowledge based learning techniques with subsymbolic, data-driven approaches into a single framework for learning of and reasoning on user task demonstrations. This gives the system the advantages of both learning paradigms. It overcomes the data-driven approaches' need for many demonstrations to be available before the learning process can start, while being able to incrementally refine the task knowledge that is generated using background knowledge based methods. It is applied to analyse the sequential structure of a task. Going beyond the method of [4], it can not only prune the unnecessary operations that do not contribute to the effects of the demonstration, but also reorder the remaining manipulation segments. This is done by applying symbolic reasoning methods on the different demonstrations of a single task in an autonomous way, in contrast to the approach in [9].

3 A System to Generate Hierarchical Task Descriptions
User demonstrations featuring a task to be learned by the system are observed and analysed by a Programming by Demonstration system developed at our institute in recent years. This section gives a short overview of the task acquisition process of our system.
During the performance of a task in a so called training-center, the user is observed using different sensors. These sensors include two data gloves for gathering the finger joint angles, magnetic trackers for obtaining the hands' position, and two active stereo cameras mounted on pan-tilt-units for object localisation and tracking. Further details on the hardware used can be found in [6].
From the sensor input data, so called elementary operators are extracted. Elementary operators (EO's) are actions that abstract a sensorimotor coupling, like primitive movement types (linear moves etc.) and grasping and ungrasping actions.


Fig. 1. Hierarchical representation of manipulation segments

These EO's abstract from the specific execution hardware and are robot-independent from a system point of view, although they have to be implemented on the specific target robot system. From the elementary operators, compound actions are constructed in a bottom-up manner. These are called macro-operators (MO's) and contain the full task description. At the base level, elementary move operators are aggregated to approach, depart and transport trajectories. Together with the grasp and ungrasp EO's, grabbing and releasing of objects can be constructed. Between those gripper operations, various basic manipulation operations are detected (like transport operations, pouring operations and tool handling, for instance). On the highest level, a sequence of manipulation segments is subsumed under a whole task demonstration (see figure 1). For further details, please refer to [16]. On each level in this hierarchical task representation, the changes to the environment are induced from lower levels of the hierarchy. The pre- and postconditions describing the environmental states before and after the execution of each macro-operator are propagated from the EO-level to the manipulation segment level and are computed from the environmental changes during the manipulation. The state of the environment is described in terms of first-order predicate logic, based on a set of semantical variables (see figure 2). These are basically geometrical inter-object relations (like "on", "under", "to the right of..." or "contained in ...") and intra-object predicates like object class ("fork", "plate", "table") and internal state descriptors ("opening angle", "oven temperature", "liquid level", etc., depending on the object class). Each of these semantical variables is computed at every segmentation point that starts or ends a macro-operator. When the semantic variables change during the demonstration of a task, these changes are accumulated to the overall effects of the demonstration. From these effects the pre- and postconditions of each node in the macro-operator tree, including the root, are computed. These are the central features of a task, used for the re-identification of tasks and subtasks to be described in section 5.
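As a toy illustration of this bookkeeping (the relation names are examples taken from the text, not the system's actual vocabulary), the effects of one manipulation segment can be obtained as the set difference between the semantical variables evaluated at its start and end points:

```python
# Hypothetical pre/post states of one manipulation segment, written as
# (predicate, subject, object) relations; not the system's internal representation.
pre_state = {("on", "fork", "table"), ("on", "plate", "table")}
post_state = {("on", "fork", "plate"), ("on", "plate", "table")}

effects = {
    "added":   post_state - pre_state,   # relations created by the manipulation
    "removed": pre_state - post_state,   # relations destroyed by it
}
print(effects)
```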


Fig. 2. Some semantical variables describing the relative positions of objects.

The system covers a broad range of manipulation classes drawn from the operations needed in a household environment: transport operations (Pick&Place), device handling (operating household devices, opening a fridge) and tool handling (pouring liquids from a bottle into a glass, unscrewing a bottle etc.). It relies on the condition that all changes in the environment are made by the user (closed-world assumption). The result of the task acquisition process outlined in this section is a task description, containing a sequence of manipulation segments together with their pre- and post-conditions and the hierarchical decomposition into elementary operators.

4 Learning Task Precedence Graphs
This section is concerned with the representation of the sequential features of a task, the task precedence graph (TPG), and how it can be learned incrementally from multiple demonstrations of this task.


When a user performs a task, he performs its subparts (the manipulation segments) in a sequential ordering that he has chosen, at random or by intent, from all possible task execution orders. For a system recording his actions these appear as a simple sequence of operations. A learning system that builds knowledge about a task has to hypothesize on the underlying sequential task structure. These hypotheses can be represented by task precedence graphs, introduced by [11].
Definition 1. A task precedence graph (TPG) for a task T is a directed graph P = (N, R) with N being the set of subtasks o1, o2, ..., on, and R ⊂ N × N being the set of precedence relations a faultless task execution must comply with. A precedence relation (o1, o2) ∈ R with o1, o2 ∈ N implies that the operation o1 must be complete before the execution of o2 can start. This is abbreviated as o1 → o2.
For a system that is supplied with only a single user demonstration D = (o1, o2, ..., on) of a task, it is hard to guess the inherent task precedence graph. The learning system can state different equally valid hypotheses, ranging from the most restricting precedence graph P^D = (N, R^D) that only allows the execution order demonstrated, to the most liberal TPG that possesses no sequential dependencies at all and allows the robot to choose the sequence of operations without any constraints. [11] states that the set of valid hypotheses is a version space, partially ordered by the subset-predicate on the sets of precedence relations R.
While the learning system cannot decide which task precedence graph from the set of consistent TPG's fits the task best after seeing only a single example, it seems a viable approach to supply it with more sample demonstrations applying different task execution orders. In order to learn task knowledge sufficient for execution from even a single demonstration, while improving the learned task when more knowledge in the form of task demonstrations is available, an incremental approach is chosen.
Assume that, after processing m demonstrations {D1, D2, ..., Dm} of the same task, the system has learned the most restrictive TPG Pm. Under observation of a new demonstration Dm+1, the learned task precedence graph can be adapted in a way that incorporates the new knowledge. [8] suggests that this can be done by further generalising the previous hypothesis to a new one covering the additional example. In order not to generalize too far, the minimal generalization of Pm that is consistent with Dm+1 must be chosen. The resulting new set of task precedence relations Rm+1 can be expressed as a function of the previously learned hypothesis Pm = (N, Rm) and the most restrictive hypothesis for the newly observed demonstration P^Dm+1 = (N, R^Dm+1):

Rm+1 = Rm ∩ R^Dm+1     (1)

Now, one can state the process of incremental learning of task precedence graphs:


1. For the first training example D1, initialize the task precedence graph to the most specific TPG P1 = P^D1.
2. For each additional demonstration Dm+1 of the same task, compute P^Dm+1 and update the hypothesis Pm+1 according to equation 1.
One issue not addressed until now is the question of when the additional task demonstrations arrive. In the application domain of household service robots one cannot assume that the user gives all demonstrations sufficient to learn the correct task precedence graph at once. Instead, it might take a long time between two successive demonstrations of the same task. Moreover, the task knowledge learned during past demonstrations has to be utilised, because it is likely that the user will require the robot to execute the learned task without having taken into account all task demonstrations that might appear in the future. Therefore, the task precedence graph learned by the system should be reliable enough to ensure a faultless and, above all, secure task execution.
With the mechanisms presented in this section it is possible to incrementally learn task precedence graphs from user demonstrations. The process of refining task precedence graphs starts with the most restrictive hypothesis (= the order seen in the first demonstrations) and continues to become more and more general as more demonstrations with different sequences of actions are taken into account.
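A minimal sketch of this two-step procedure, assuming each demonstration is given as an ordered tuple of subtask identifiers (the data structures and the example orderings are illustrative, not the authors' implementation):

```python
# Incremental TPG learning: start from the most restrictive graph of the first
# demonstration and intersect with each new one (eq. 1).
from itertools import combinations

def tpg_from_demo(demo):
    """Most restrictive TPG P^D: every earlier subtask precedes every later one."""
    return set(combinations(demo, 2))      # pairs (o_i, o_j) meaning o_i -> o_j

def update_tpg(relations, new_demo):
    """Equation 1: R_{m+1} = R_m intersected with R^{D_{m+1}}."""
    return relations & tpg_from_demo(new_demo)

# Hypothetical subtask orderings, mirroring the kind of demonstrations used in section 6.
demos = [("fork", "plate", "mug"),
         ("mug", "fork", "plate"),
         ("mug", "plate", "fork")]

relations = tpg_from_demo(demos[0])        # step 1: initialise with P^{D_1}
for demo in demos[1:]:                     # step 2: generalise with every new demonstration
    relations = update_tpg(relations, demo)

print(relations)                           # empty set: no ordering constraint survives
```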

5 Subtask Similarity Measures Based on Information-Theoretic Metrics
While the last section presented a method for learning the sequential constraints of a task when exactly the same subtasks are used in every task demonstration, this is not likely to be true for every task. Usually the human demonstrator will only use similar but partly different manipulation segments in order to fulfill the same task. This section deals with the topic of identifying the corresponding manipulation segments in two or more demonstrations of the same task.
In order to recognize the matching manipulation segments for two different task demonstrations, the ordering of the manipulations cannot be a reliable measure for subtask correspondence, as the sequence of subtasks performed is potentially permuted. Though matching the segments that manipulate the same class of objects seems to be a good idea at first, this method fails as soon as there are multiple objects of the same class present in the scene. So more features should be taken into account to determine the similarity of subtasks and establish sufficiently robust subtask correspondences in order to identify operations of equal impact on the scene. In order to recognize the matching manipulation segments, two sets of features are taken into account:


• Object features: These contain the class of the objects manipulated or referenced in the certain subtask (cup, plate, table etc.) as well as their possible functional roles (liquid container, object container etc.).
• Pre- and Postcondition features: These contain the geometrical relations that exist between the objects before or after the performance of the subtask, respectively.
The base operation to assess feature conformance is the measure of similarity sF for two sets of features A, B, defined as the portion of common features in both feature sets:

sF(A, B) = |A ∩ B| / |A ∪ B|.     (2)

It turned out that this straightforward approach has one major drawback: as the number of features is relatively high and many of the features are irrelevant to the certain task to be learned, the features carrying the relevant information will be predominated by the irrelevant ones. The solution is to weight each feature f depending on its relevance to the task to be learned with a weight w(f). This turns equation 2 into

sF(A, B) = Σ_{f ∈ A∩B} w(f) / Σ_{f ∈ A∪B} w(f).     (3)

When choosing a useful weight function w(f) one has to consider two points: Features of high relevance to the specific task should gain high weights. One way to assess the relevance of a feature is to test its occurrence over several different demonstrations of the same task. Features with a high relevance to the task to be performed have a great probability of occurrence, while features of lower relevance will occur less frequently. According to Shannon, the information content I(f) of a feature f with probability p(f) is I(f) = −log2 p(f). As features with low information content (= high probability of occurrence) with respect to the specific task class T should be favored, w(f) = −log2(1 − p(f|T)) seems a reasonable choice for the weight function. On the other hand, when a completely new task is learned or only few demonstrations of the same task are available, one has to take into account that only little information about the distribution of features in the specific task class is known. The idea is to introduce global background knowledge on the feature distribution. When several demonstrations of different tasks have been observed by the robot and incorporated into the task knowledge base, it can be assumed that the features that uniquely discriminate the task are those that have low occurrence rates across the whole task knowledge base. Thus, the information content of feature f with respect to all other tasks T̄ is −log2 p(f|T̄). An incremental learning system should be able to cope with both situations. Therefore a combination has to be found that favors the global information content measure when no or few task demonstrations are available, and the task-specific relevance measure as more demonstrations of the specific task become available. The weight function that fulfills those requirements is

w(f) = −[α log2(1 − p(f|T)) + (1 − α) log2 p(f|T̄)]     (4)

with the relative weighting α of the task-specific knowledge given by α = 1 − e^(−k·|T|). In our experiments we chose k = −(ln 0.25)/5, such that after observing 5 demonstrations of the same task the proportion α of task-specific to prior knowledge is 0.75 : 0.25, converging towards 1 for more demonstrations. This choice is motivated by [1], stating that the important features of a task can be sufficiently learned with 4-5 demonstrations.
With the weighted proportion of features sF as in eq. 3 it is possible to compute the similarities s_pre and s_post of the pre- and postcondition sets of two subtasks M1 and M2, as well as the average s_sub of their subtask similarities. They are set to

s_pre = sF(Precond(M1), Precond(M2))
s_post = sF(Postcond(M1), Postcond(M2))
s_sub = (1 / |M1 ∪ M2|) Σ_{m1 ∈ M1, m2 ∈ M2} s_M(m1, m2).
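A compact sketch of equations 3 and 4, under the assumption that the feature statistics are stored as plain occurrence probabilities (function and variable names are illustrative, not taken from the system):

```python
import math

def weight(f, p_task, p_global, n_demos, k=math.log(4) / 5):
    """w(f) of eq. 4: task-specific and global information content, mixed by alpha."""
    alpha = 1.0 - math.exp(-k * n_demos)                      # weight of task-specific knowledge
    task_term = -math.log2(max(1.0 - p_task.get(f, 0.0), 1e-9))   # frequent features of this task
    global_term = -math.log2(max(p_global.get(f, 1e-9), 1e-9))    # features rare across all tasks
    return alpha * task_term + (1.0 - alpha) * global_term

def weighted_similarity(A, B, w):
    """s_F(A, B) of eq. 3: weighted portion of features common to both sets."""
    denom = sum(w[f] for f in A | B)
    return sum(w[f] for f in A & B) / denom if denom > 0 else 0.0
```

With k = ln(4)/5 = -(ln 0.25)/5, alpha evaluates to 0.75 after five demonstrations, matching the proportion quoted in the text.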

The (recursive) overall similarity s_M of two manipulation segments M1, M2 is defined as the simple average of the pre- and postcondition similarities and the average of all subtask similarities: s_M(M1, M2) = (1/3)·(s_pre + s_post + s_sub). Once the subtask similarity s_M is computed for every pairing of the task's subtasks, the subtask correspondence p_ST is computed as the permutation that maximizes the sum of similarities:

p_ST = argmax_{p ∈ perm(subtasks)} Σ_{(M1, M2) ∈ p} s_M(M1, M2).

With the subtask permutation p_ST, the most restrictive hypothesis P^Di on the underlying task precedence structure can easily be constructed. This hypothesis can then be utilised to learn sequential task constraints in the way described in section 4.
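For small numbers of manipulation segments the correspondence p_ST can be found by brute force, as in the following sketch (it assumes both demonstrations contain the same number of segments; a practical system would prune this search):

```python
from itertools import permutations

def best_correspondence(segments1, segments2, s_m):
    """Pairing p_ST that maximises the summed segment similarity s_M."""
    best_pairs, best_score = None, float("-inf")
    for perm in permutations(segments2):
        pairs = list(zip(segments1, perm))
        score = sum(s_m(m1, m2) for m1, m2 in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs
```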

6 Experimental Results
In this section the experiments validating the method for incremental acquisition of task knowledge using heuristic relevance estimates for task features are reported. The task to be learned was chosen from the household domain; it consisted of laying the table for a main dish. The manipulated objects were: a plate, a fork and a mug. The main features that should be learned by the system are the spatial arrangement of the objects (the fork should be placed to the right of the plate and the mug placed on the distant side of the fork) as well as the sequential independence of all subtasks.


Eight different demonstrations of that task were given (T = {D1, D2, ..., D8}). In the first four of them, the fork, the plate and the mug were manipulated in this order. In the fifth and sixth demonstration the mug was placed first, followed by the fork and the plate. In the final two demonstrations the mug, the plate and the fork were manipulated in this ordering.
After taking into account the first example, the initial task precedence graph P^D1 = P1 was correctly obtained, which can be seen in the first column of table 1. After observing a second example, D2, the relevance estimates for the task similarity measure were continuously taken into account. The relevance estimates for several features depending on the number of task demonstrations are depicted in figure 3(a).


Fig. 3. (a) Relevance estimates for several features, depending on the number of task demonstrations taken into account. (b) & (c) resp.: Precondition-, Subtask-, Postcondition- and Overall-Similarity for the demonstrations D1 and D5, calculated according to formula 3 (heuristic similarity) and 2 (naive similarity).

One can see that features 1-4 (the Relations mug ON table, fork ALIGNED EAST mug, plate ON table, fork ON plate) increase their relevance estimate as more task demonstrations become available. Meanwhile features 5-7 (the Relations fork TOUCHES SOUTH plate, mug TOUCHES NORTH


plate and mug INTERSECTING fork), occurring unintentionally in some task demonstrations, receive lower ratings as they tend to occur less frequently across all task demonstrations.
With the assessment of feature relevance, the pre- and postcondition similarities s_pre and s_post are calculated, as well as the subtask similarity s_sub and, finally, the overall similarity measure s_M. In figures 3(b) & (c), the results of the comparison for the task demonstrations D1 and D5 are shown as a function of the number of task demonstrations used for the heuristic relevance estimation. The figures suggest the following: First, the heuristic measures start with a far better similarity than the naive approach. This is due to the second term in formula 4, which favours features that discriminate the new task from all other previously seen tasks. Second, the similarity measure gains higher values as more task-specific data becomes available. With more demonstrations of the same task, the increasing weight α favors the first part of equation 4. As the task feature statistics also become more pronounced with more demonstrations, this part ensures a task-specific adaptation of the relevance measures.
After the system has seen four examples with the same order of actions, the learned task precedence graph is unchanged. After that, the system observes a demonstration featuring a different execution order. The two different sequences and the recognized subtask permutation are shown in figure 4. The task precedence graph learned after five demonstrations is shown in the second column of table 1. When the last two user demonstrations are observed, the system can eliminate the last unnecessary precedence relation between the fork and the plate and is free to schedule every subtask as is most convenient during execution.

Fig. 4. Recognition of equal subtasks. Subtask correspondences are drawn with stippled lines. The sequential ordering of subtasks is represented by bold arrows.

7 Conclusion
This paper presents an approach to combining learning and reasoning on tasks. Tasks are recorded from user demonstrations, segmented, interpreted


Table 1. Task precedence graphs learned incrementally during the performance of the experiment described in section 6. Columns correspond to the demonstrations i = 1-4, 5-6 and 7-8; the rows show the demonstration-induced graph P^Di and the learned hypothesis Pi (the graph diagrams are figures in the original and are not reproduced here).

and stored in a data representation called macro-operators. Reasoning methods are applied to discover the task's underlying sequential structure and reordering possibilities. Methods based on information-theoretic approaches are presented that estimate the relevance of task features. These heuristics combine pre-existing background knowledge on feature distributions across different tasks with learned knowledge about the task itself. The methods and models outlined are demonstrated and validated in the context of an everyday household task to be learned by a robot servant. The results proved the utility of the relevance estimates, the task similarity measures and the algorithms for learning sequential constraints and opportunities of tasks.
This approach presents a step towards robots that learn task knowledge from human demonstrations in an incremental manner that allows them to refine and complete their learned knowledge as additional data becomes available, and to act as real 'intelligent' robot servants in future household applications.

References
1. Aude Billard, Yann Epars, Sylvain Calinon, Stefan Schaal, and Gordon Cheng. Discovering optimal imitation strategies. Robotics and Autonomous Systems, 47(2-3):69–77, 2004.
2. Cynthia Breazeal, Daphna Buchsbaum, Jesse Gray, David Gatenby, and Bruce Blumberg. Learning from and about others: Towards using imitation to bootstrap the social understanding of others by robots. Artificial Life, 11(1):31–62, 2005.
3. S. Calinon and A. Billard. Learning of Gestures by Imitation in a Humanoid Robot, chapter To appear, page In Press. Cambridge University Press, 2005.
4. Jason Chen and Alex Zelinsky. Programming by demonstration: Coping with suboptimal teaching actions. The International Journal of Robotics Research, 22(5):299–319, May 2003.
5. R. Dillmann, O. Rogalla, M. Ehrenmann, R. Zöllner, and M. Bordegoni. Learning Robot Behaviour and Skills based on Human Demonstration and Advice:

the Machine Learning Paradigm. In 9th International Symposium of Robotics Research (ISRR 1999), pages 229–238, Snowbird, Utah, USA, 9.–12. October 1999.
6. M. Ehrenmann, R. Zöllner, O. Rogalla, S. Vacek, and R. Dillmann. Observation in programming by demonstration: Training and execution environment. In Humanoids 2003, Karlsruhe/Munich, Germany, October 2003.
7. Y. Kuniyoshi, M. Inaba, and H. Inoue. Learning by Watching: Extracting Reusable Task Knowledge from Visual Observation of Human Performance. IEEE Transactions on Robotics and Automation, 10(6):799–822, 1994.
8. Tom M. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1982.
9. M. Nicolescu and M. Mataric. Natural methods for robot task learning: instructive demonstrations, generalization and practice. In 2nd International Joint Conference on Autonomous Agents and Multiagent Systems, pages 241–248, Melbourne, Australia, July 2003.
10. H. Onda, H. Hirukawa, F. Tomita, T. Suehiro, and K. Takase. Assembly Motion Teaching System using Position/Force Simulator—Generating Control Program. In 10th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 389–396, Grenoble, France, 7.–11. September 1997.
11. M. Pardowitz, R. Zöllner, and R. Dillmann. Learning sequential constraints of tasks from user demonstrations. In IEEE-RAS Intl. Conf. on Humanoid Robots (HUMANOIDS), Tsukuba, Japan, 2005.
12. O. Rogalla. Abbildung von Benutzerdemonstrationen auf variable Roboterkonfigurationen. PhD thesis, University of Karlsruhe, Department of Computer Science, 2002.
13. Y. Sato, K. Bernardin, H. Kimura, and K. Ikeuchi. Task analysis based on observing hands and objects by vision. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Lausanne, pages 1208–1213, 2002.
14. S. Schaal. Is imitation learning the route to humanoid robots? In Trends in Cognitive Sciences, volume 3, pages 233–242, 1999.
15. Qin Wang, Ruqing Yang, and Weijun Zhang. Skill acquisition from human demonstrations using FCM clustering of qualitative contact states. In Proc. 2003 IEEE Intl. Symp. on Computational Intelligence in Robotics and Automation, pages 937–942, 2003.
16. R. Zoellner, M. Pardowitz, S. Knoop, and R. Dillmann. Towards cognitive robots: Building hierarchical task representations of manipulations from human demonstration. Intl. Conf. on Robotics and Automation (ICRA) 2005, Barcelona, Spain, 2005.

Representation, Recognition and Generation of Actions in the Context of Imitation Learning

Haris Dindo¹ and Ignazio Infantino²

¹ DINFO, Dipartimento Ingegneria Informatica, Università degli Studi di Palermo, Viale delle Scienze Ed. 6, 90128, Palermo, Italy, [email protected]
² ICAR, Istituto di Calcolo e Reti ad Alte Prestazioni, Consiglio Nazionale delle Ricerche, Viale delle Scienze Ed. 11, 90128 Palermo, Italy, [email protected]

The paper deals with the development of a cognitive architecture for learning by imitation in which a rich conceptual representation of the observed actions is built. We adopt the paradigm of conceptual spaces, in which static and dynamic entities are employed to efficiently organize perceptual data, to recognize positional relations, to learn movements from human demonstration and to generate complex actions by combining and sequencing simpler ones. The aim is to have a robotic system able to effectively learn by imitation and which has the capabilities of deeply understanding the perceived actions to be imitated. Experimentation has been performed on a robotic system composed of a PUMA 200 industrial manipulator and an anthropomorphic robotic hand.

1 Introduction
Among the modern robotic control architectures, imitation systems have gained an important role, aiming to reduce the complexity of robot programming by endowing the system with the capability to understand the actions of a human demonstrator. For a review of the learning by imitation paradigm and different aspects of imitation see [15]. The learning by imitation approach involves several difficult research problems, as outlined in [4], [7] and [14]. In the past years, different aspects of the problem were addressed in many working systems ([1], [2], [3], [10], [12], [17], [18]). However, the main emphasis is given to the problem of how to map the observed movement of a teacher onto the movement apparatus of the robot. While this is a fundamental aspect of any robotic system which copes with imitation, we claim that in order to effectively learn by imitation, the system itself should have the capabilities to effectively "understand" its environment and the perceived actions to be imitated. In our approach, understanding involves the generation of a high-level, declarative description of the perceived world. In other words, understanding

H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 65–77, 2006. © Springer-Verlag Berlin Heidelberg 2006


copes with building an internal model of the external world. It endows the robot with the capability to perform reasoning and predictions about what surrounds it.
As proposed by Chella et al. [6], conceptual spaces (see [8] for an introduction) allow the generation of a high-level, declarative description of the perceived world. Following this approach, in this paper we propose a robust architecture that allows us to extract useful information from each intermediate level of a conceptual space for the purposes of imitation learning (geometric and color properties of objects, spatial relations between objects and actions performed on different types of objects). Moreover, learning and imitation are treated in the same architecture as two different points of view of the same knowledge representation.
The rest of the paper is organized as follows: in the next Section the cognitive architecture is summarized and a detailed description of the conceptual representation is given. Section 3 shows an implementation of the architecture on a robotic platform. Finally, Section 4 gives short conclusions and outlines the future work.

2 The Cognitive Architecture
The proposed architecture is organized in three computational areas as shown in fig. 1. The Subconceptual Area is concerned with the low-level processing of perceptual data coming from vision sensors through a set of built-in visual mechanisms (e.g. color segmentation, feature extraction, teacher's posture reconstruction), as well as with the control of the robotic system. In the Conceptual Area, different properties of the perceived scene are captured: the Perceptual Space describes static properties of the entities in the scene (e.g. geometrical properties of the objects), the Situation Space encodes relations between various entities in a Perception (e.g. near, left, right, etc.), while the Action Space encodes the dynamic properties occurring in the world. In the Linguistic Area, representation and processing are based on a high-level symbolic language. The symbols in this area are anchored to sensory data by mapping them on appropriate representations in the different layers of the conceptual area.
The same architecture is used both to analyze the observed task to be imitated (observation mode) and to perform the imitation of the given task (imitation mode). For the purposes of the present discussion, we consider a simplified bidimensional world populated with various objects in which observation/imitation takes place. During the observation phase, the user shows her/his hand while performing arbitrary tasks of manipulating objects in front of a single calibrated camera. The task is then segmented into meaningful units and its properties (objects' position and orientation, color and shape, relations between objects and action types) are represented, through the Conceptual Area, into the linguistic area in terms of high-level symbolic terms. In the


Fig. 1. The computational areas of the proposed architecture and their relation with the external world.

imitation phase, the robotic system performs the same tasks, triggered by known stimuli in the external world or by a user's query. In the following, a detailed description of each computational area is given.

2.1 The Subconceptual Area
In the proposed architecture, the sub-symbolic processing area is organized in two distinct modules performing low-level processing of sensory data. The Perceptuomotor Module continuously tracks the actions of the human demonstrator in the scene during the observation mode and reports any significant changes in its motion state. During the imitation mode it translates high-level descriptions of the task to be performed into low-level control commands to be sent to the robot's actuators. The Scene-Reconstruction Module detects all the objects in the scene and reconstructs them by means of parametric functions. It outputs positional, color and shape properties of objects in the scene.

The Perceptuomotor Module
The perceptuomotor module in the Subconceptual Area is concerned both with the human hand's posture reconstruction and with the mapping between human and robot movements. The system performs skin detection in order to detect the human hand and continuously tracks its position and orientation in the image plane. The homography relationship between the image and the working plane of the user is computed during an off-line calibration phase. The module outputs the stimuli relative to the human hand's trajectory in the world coordinate system:


M^PM_hand(t) = [x_hand(t), y_hand(t), θ_hand(t)]^T.     (1)
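A small sketch of how the tracked hand centroid could be mapped from image coordinates to the working plane to produce the stimuli of eq. (1); the homography matrix below is a made-up placeholder for the one obtained during the off-line calibration.

```python
import numpy as np

# Hypothetical 3x3 image-to-plane homography; in the real system it comes from
# the off-line calibration of the camera against the user's working plane.
H = np.array([[1.2,  0.01, -30.0],
              [0.02, 1.1,  -12.0],
              [1e-5, 2e-5,   1.0]])

def image_to_plane(u, v):
    """Map an image point (u, v) of the hand centroid to working-plane coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

x_hand, y_hand = image_to_plane(320.0, 240.0)
```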

As the scene evolves, changes occur in the world because objects are being


Fig. 2. The upper figure shows the sequence of actions performed by the human user. The human hand is continuously tracked and its position and orientation are recorded. The zero-velocity crossing algorithm performs the detection of key-frames and objects' properties (position, color and shape). The lower figure depicts the computed trajectories of the human hand during the observation phase (Cartesian position of the hand's centroid). Vertical lines correspond to extracted key-frames.

grasped by the hand and moved to another location. In order to detect changes in a dynamic scene, we perform motion segmentation on the human hand data by using zero-velocity crossing detection. Zero velocity occurs either when the direction of movement changes or when no motion is present. In order to accurately detect the zero-crossings in the movement data, we first apply a low-pass filter to eliminate high-frequency noise components, and then we search for zero velocities within a threshold. The result is a list of frame indexes, named key-frames, at which object properties (shape and color) are to be processed. Fig. 2 shows a typical task performed by the user and the process of extracting the key-frames from the human hand trajectory data.
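A possible implementation of this key-frame extraction step is sketched below; the filter length and velocity threshold are illustrative values, not those used by the authors.

```python
import numpy as np

def extract_keyframes(trajectory, win=9, threshold=1e-3):
    """trajectory: (T, 3) array of [x, y, theta] samples; returns candidate key-frame indices."""
    kernel = np.ones(win) / win                                   # crude low-pass filter
    smoothed = np.column_stack([np.convolve(trajectory[:, i], kernel, mode="same")
                                for i in range(trajectory.shape[1])])
    vel = np.diff(smoothed, axis=0)                               # per-frame velocity
    still = np.linalg.norm(vel, axis=1) < threshold               # no motion
    reversal = np.any(np.sign(vel[1:]) != np.sign(vel[:-1]), axis=1)   # direction change
    # Consecutive indices would be merged into single key-frames in practice.
    return np.union1d(np.where(still)[0], np.where(reversal)[0] + 1)
```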


This module is also responsible for the issues related to the mapping of the teacher's internal/external space, e.g. joint angles or Cartesian coordinates of the end-effector, to that of the robot. In the current implementation we convert the Cartesian coordinates (in terms of x-y position and orientation) into robot joint angles by means of standard inverse kinematics methods ([16]).

Scene-Reconstruction Module
For the purpose of the present discussion, we represent the objects in a scene as superellipses [11]. A superellipse is a closed curve defined by the following equation:

(x/a)^(2/n) + (y/b)^(2/n) = 1     (2)

where a and b are the sizes (positive real numbers) of the major and minor axes and n is a rational number which controls the shape of the curve. For example, if n = 1 and a = b, we get the equation of a circle. For smaller n we get gradually more rectangular shapes, until for n → 0 the curve takes on a rectangular shape (fig. 3). A superellipse in general position requires three additional parameters for expressing the rotation and translation relative to the center of the world coordinate system (x, y, θ). Therefore, in order to completely describe a superellipse, we need six parameters [a, b, n, x, y, θ].
For each detected object in the scene, we fit a superellipse to it. First, we obtain the position and orientation of the objects in the scene by using the Hotelling transform (see [9]), and then we obtain the minimum bounding rectangle (i.e., the rectangle with minimum area that contains all the data). At this point, we are given all the superellipse parameters (axis lengths, center, and orientation) except the shape parameter, n, which can be approximated as

n ≈ 1 − t ± √(t^2 − 6t + 5)     (3)

Fig. 3. Superellipses with different values of exponent n. Size parameters a and b are kept constant.


where t = A/(4ab), and A is the area of the image object (see [13]). An example of the fitting process is given in fig. 4.
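The recovery procedure can be sketched as follows, assuming a binary mask of one segmented object is available; this is a simplified reading of the steps above, not the authors' code.

```python
import numpy as np

def fit_superellipse(mask):
    """mask: 2D boolean array of one segmented object; returns (a, b, n, x, y, theta)."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    centre = pts.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov((pts - centre).T))   # Hotelling transform
    major = eigvecs[:, np.argmax(eigvals)]
    theta = np.arctan2(major[1], major[0])
    c, s = np.cos(-theta), np.sin(-theta)
    local = (pts - centre) @ np.array([[c, -s], [s, c]]).T        # object-aligned frame
    a = np.ptp(local[:, 0]) / 2.0                                 # half-extents of the
    b = np.ptp(local[:, 1]) / 2.0                                 # bounding rectangle
    t = mask.sum() / (4.0 * a * b)
    n = 1.0 - t + np.sqrt(max(t * t - 6.0 * t + 5.0, 0.0))        # eq. (3), taking the '+' root
    return a, b, n, centre[0], centre[1], theta
```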


Fig. 4. Results of the superellipse fitting algorithm.

However, since the process of superellipse fitting is time-consuming, the detection of scene objects is performed only at the key-frames extracted in the Perceptuomotor Module.

2.2 The Conceptual Area
The conceptual spaces are based on a geometric treatment of concepts in which concepts are not independent of each other but structured into domains [8], e.g. color, shape, kinematics and dynamics quantities, and so on. Each domain is defined in terms of a set of quality dimensions (e.g. the color domain may be described in terms of hue/saturation/value quantities while the kinematics domain is described by joint angles). In a conceptual space, objects and their temporal and spatial evolution are represented as points describing the quality dimensions of each domain. It is possible to measure the similarity between different individuals in the space by introducing suitable metrics. In the


context of the present architecture we propose three conceptual spaces capturing different temporal and spatial aspects of the scene (see fig. 5), as described in the following.

Fig. 5. The structure of the Conceptual Area: Perceptual Space PS describes static properties of the entities in the scene (e.g. geometrical properties of the objects), the Situation Space SS encodes relations between various entities in a Perception (e.g. near, left, right, etc.), while the Action Space AS encodes the dynamical properties of the world.

The Perceptual Space
Points in the Perceptual Space are computed starting from the stimuli obtained during the subconceptual processing stage. In this work, the quality dimensions of our Perceptual Space are given by the shape and color properties of the objects perceived:

PS_obj = [a_obj, b_obj, m_obj, h_obj, s_obj, v_obj]^T.     (4)

Hence, this space captures the shape and color properties of objects. This is useful in assigning high-level terms to objects, such as square, circle, red, green, etc. Each point in the Perceptual Space is assigned a unique identifier, ps_id, together with the spatial information (x, y, θ) and the key-frame at which it has been recorded.

The Situation Space
The Situation Space encodes spatial relations between objects in a scene, such as left, right, near, etc. Suppose we have two objects in the Perceptual Space recorded at the same key-frame and described by their labels: (ps_id1, x1, y1, θ1, kf) and (ps_id2, x2, y2, θ2, kf). The situation between the objects is then described by using the following equation:

Δx = x1 − x2,    Δy = y1 − y2,    Δθ = θ1 − θ2.     (5)


Hence, the quality dimensions of the space are given by (Δx, Δy, Δθ), and the space is symmetric with respect to the origin. Each point in the Situation Space is assigned a unique identifier, ss_id, together with the ordered set of Perceptual Space labels {ps_id1, ps_id2}. A particular ss_id is used to describe the absolute position/orientation of objects at each key-frame. It is straightforward to introduce suitable metrics in order to measure spatial relations between objects in a scene.

The Action Space
In order to account for the dynamic aspects of the perceived scene, we propose an Action Space in which each point represents the simple motion of entities. An action corresponds to a "scattering" from one situation to another in the conceptual space. However, the decision of which kind of motion can be considered simple is not straightforward, and it is strictly related to the problem of motion segmentation, as described in Sect. 2.1. An immediate choice is to consider "simple" any motion of the human hand occurring between key-frames. In our world there are two possible types of actions which can be performed by the hand: reach (reaches an object for the purposes of grasping it) and transfer (moves the hand to an absolute position and orientation in the world coordinate system, or relative to other objects in the world). In the first case, the trajectory of the hand is expressed relative to the object to be grasped, while in the second, it is represented relative to the origin of the world reference system. In both cases, we will refer to the target configuration, x_target.
In order to represent simple motions, it is possible to choose a set of basis functions in terms of which any simple motion can be expressed. Such functions can be associated to the axes of the dynamic conceptual space as its dimensions. Therefore, from the mathematical point of view, the resulting Action Space is a functional space. A single point in the space therefore describes a simple motion of the human hand. In the context of the present work we choose the technique of Principal Component Analysis (PCA) in order to obtain the set of basis functions. We first obtain a set of training trajectories performed by different users, ϕ(t) = [x(t), y(t), θ(t)]^T, and represent each trajectory relative to the target configuration, ϕ'(t) = ϕ(t) − x_target. In order to account for different temporal durations of simple motions, we interpolate the data to a fixed length. We noted that generally the first six principal components account for most of the data variance, and these are used to encode the simple human hand motions. The principal components form the basis functions, ϕ_i(t), in terms of which each hand trajectory can be represented as a linear combination:

ϕ'(t) = Σ_{i=1}^{k} c_i ϕ_i(t)     (6)


where c_i are the projection coefficients, and the index k refers to the number of principal components (in our case, k = 6). Hence, the quality dimensions of the Action Space are given by the PCA coefficients:

AS = [c_1, c_2, ..., c_6]^T.     (7)

However, the principal components are also useful as the basis for generating novel trajectories that resemble those seen during the training phase. This is done as a process of data interpolation: given the starting, ϕ'(t_0), and ending, ϕ'(t_f), hand configurations, it is possible to determine the coefficients [c_1, ..., c_6]^T which satisfy the boundary conditions by solving the set of six (linear) equations in six unknowns:

ϕ'(t_0) = Σ_{i=1}^{k} c_i ϕ_i(t_0);     ϕ'(t_f) = Σ_{i=1}^{k} c_i ϕ_i(t_f)     (8)
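A sketch of this interpolation step, assuming the principal-component trajectories are stored as an array of sampled curves (array layout and names are assumptions, not the authors' implementation):

```python
import numpy as np

def generate_trajectory(basis, start, goal):
    """
    basis: (k, T, 3) principal-component trajectories phi_i(t) over [x, y, theta]
    start, goal: phi'(t0) and phi'(tf), both expressed relative to the target configuration
    Returns the (T, 3) trajectory phi'(t) = sum_i c_i * phi_i(t) satisfying eq. 8.
    """
    A = np.vstack([basis[:, 0, :].T, basis[:, -1, :].T])   # six boundary-condition equations
    rhs = np.concatenate([start, goal])
    if A.shape[0] == A.shape[1]:                           # k = 6: solve exactly
        c = np.linalg.solve(A, rhs)
    else:                                                  # otherwise fall back to least squares
        c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.tensordot(c, basis, axes=(0, 0))             # linear combination of eq. 6
```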

2.3 The Linguistic Area
The more "abstract" forms of reasoning, which are less perceptually constrained, are likely to be performed mainly within the Linguistic Area. The elements of the Linguistic Area are symbolic terms that have the role of summarizing perceptions and actions represented in the conceptual spaces previously described, i.e., linguistic terms are anchored to the structures in the conceptual spaces [5]. We adopted a first-order logic language in order to implement the Linguistic Area. Perceptions and actions are represented as first-order terms. The semantics of the language is given by structures in conceptual spaces according to Gärdenfors [8]. In particular, terms related to actions are interpreted in the action space, terms related to the spatial properties of the objects in the scene are interpreted in the situation space, while terms related to objects are interpreted in the perceptual space. In the present work, we adopt a finite number of primitive actions the robot can perform:
• reach_and_grasp(ps_id, as_id), adopted to describe an action of reaching and grasping the object described by the identifier ps_id by executing the action as_id, and
• transfer_and_release(ss_id, as_id), adopted to describe an action, described by the identifier as_id, of transferring and releasing a grasped object to an absolute position/orientation or relative to other objects in the workspace.
Each complex task may be described as a sequence of primitive actions:
task :- reach_and_grasp(ps_id, as_id), transfer_and_release(ss_id, as_id), ...


3 Experimental Results
Our robotic setup is composed of a conventional six degrees-of-freedom PUMA 200 manipulator and an anthropomorphic four-fingered robotic hand. The overall system has 22 degrees of freedom, and it is endowed with the ability to navigate in free space and to grasp objects which can be described as superellipses. The experiments we performed are concerned with the problem of teaching the robotic system several manipulative tasks.
In order to facilitate the process of teaching the robot a task, we have developed the Visual Blackboard System. It allows a user to create a workspace with the same dimensions as the real one, to populate it with various objects, and to execute several tasks. It also allows the user to observe the results of the imitation phase. The Visual Blackboard System is shown in figure 6.

Fig. 6. Visual Blackboard System for teaching tasks.

As an example, consider the task of placing the ”red square” to the right of the ”green circle”. Having seen this task performed by a user, the system extracts the key-frames and encodes all scene objects, their relations and human hand motion in the Conceptual Area. Suppose that ”red square” is assigned the perceptual space label ps id1 and green circle is assigned the label ps id2 . Their final situation is assigned the situation space label ss idgoal .


This task is automatically assigned the following description: reach(ps_id1), grasp(ps_id1), transfer_relative_to(ss_id_goal), and could be labelled according to the goal performed. Since the perceptual space points hold the whole information about the objects involved in a task, it is possible to refer the previous description to another scene in which similar objects are present (e.g. a red square and a green circle) and perform exactly the same task. However, the main strength of our architecture lies in the fact that it is also possible to anchor the same task to more general instances by exploiting the conceptual representations. Hence, it is possible to obtain the task of "placing squares left to circles", or "placing red objects left to the green ones", and their combinations. However, each task description is given a priority: less general tasks have higher priority with respect to more general tasks (a sketch of this priority-ordered matching is given below). In the Imitation Mode, the system observes the scene and searches for an already seen task according to the priority (in our example it suffices to look for squares, circles, red or green objects). If it is not able to match the task, it prompts the user, who eventually demonstrates the task in the observation mode. Otherwise, the matched description is tailored to the particular situation, and the absolute positions and orientations to be reached by the robotic system are produced and sent to the Action Space for the trajectory computation. The whole trajectory is then sent to the Robotic Control Module and mapped to our robot's actuators. An example of the robot imitating the task (depicted in fig. 2) is shown in fig. 7. Note that in this example, the control of the robotic hand fingers is performed as a separate behavior and is not involved in the imitation process.
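A minimal sketch of how such priority-ordered task descriptions could be stored and matched is given below; the predicate names, object attributes, and priority values are illustrative assumptions and not the exact representation used in the system.

```python
# Task descriptions ordered from least to most general; a lower priority
# value means the description is tried first (less general tasks win).
task_library = [
    {"priority": 0, "match": lambda obj: obj["color"] == "red" and obj["shape"] == "square",
     "label": "place the red square right of the green circle"},
    {"priority": 1, "match": lambda obj: obj["shape"] == "square",
     "label": "place squares left to circles"},
    {"priority": 2, "match": lambda obj: obj["color"] == "red",
     "label": "place red objects left to the green ones"},
]

def select_task(scene_objects):
    """Return the highest-priority task description matched by the scene."""
    for task in sorted(task_library, key=lambda t: t["priority"]):
        if any(task["match"](obj) for obj in scene_objects):
            return task["label"]
    return None  # nothing matched: prompt the user for a new demonstration

scene = [{"color": "red", "shape": "square"}, {"color": "green", "shape": "circle"}]
print(select_task(scene))
```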


Fig. 7. (a) Top view of the world as seen by the robot. (b)-(f) Sequence of actions performed by the robot in the imitation mode: the robot executes the same task performed by the user.


4 Conclusions

The main contribution of the paper is the proposal of an effective cognitive architecture for imitation learning in which three different representation and processing areas coexist and are integrated in a theoretically grounded way: the subconceptual, the conceptual and the linguistic areas. The resulting system is able to learn natural movements and to generate complex action plans by manipulating symbols in the linguistic area.

It should be noted that the conceptual spaces may be employed as a simulation structure, i.e. to simulate movements and actions before effectively executing them. Current research concerns how to enhance the proposed architecture by stressing these simulation capabilities in the context of imitation learning.

References

1. C. G. Atkeson, S. Schaal (1997) Learning Tasks From A Single Demonstration. In: Proceedings of IEEE-ICRA, pp. 1706-1712, Albuquerque, New Mexico.
2. A. Billard, M.J. Mataric (2001) Robotics and Autonomous Systems, no. 37, pp. 145-160.
3. A. Billard (2002) Imitation: a means to enhance learning of a synthetic protolanguage in autonomous robots. In: K. Dautenhahn and C.L. Nehaniv (eds), Imitation in Animals and Artifacts, pp. 281-310, MIT Press.
4. C. Breazeal, B. Scassellati (2002) Challenges in Building Robots That Imitate People. In: K. Dautenhahn and C.L. Nehaniv (eds), Imitation in Animals and Artifacts, pp. 363-390, MIT Press.
5. A. Chella, M. Frixione, S. Gaglio (2003) Robotics and Autonomous Systems, Special Issue on Perceptual Anchoring, vol. 43, 2-3, pp. 175-188.
6. A. Chella, H. Dindo, I. Infantino (2005) A Cognitive Framework for Learning by Imitation. ICRA05 workshop on the Social Mechanisms of Robot Programming by Demonstration, Barcelona, Spain.
7. K. Dautenhahn, C.L. Nehaniv (2002) The agent-based perspective on imitation. In: K. Dautenhahn and C.L. Nehaniv (eds), Imitation in Animals and Artifacts, pp. 1-40, MIT Press.
8. P. Gärdenfors (2000) Conceptual Spaces. MIT Press-Bradford Books, Cambridge, MA.
9. R. C. Gonzalez and P. Wintz (1987) Digital Image Processing (2nd ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
10. A.J. Ijspeert, J. Nakanishi, S. Schaal (2002) Movement imitation with nonlinear dynamical systems in humanoid robots. In: Proceedings of Intl. Conf. on Robotics and Automation, Washington.
11. A. Jaklic, A. Leonardis, F. Solina (2000) Segmentation and Recovery of Superquadrics. Computational Imaging and Vision, Kluwer, Dordrecht.
12. Y. Kuniyoshi, M. Inaba, and H. Inoue (1994) IEEE Trans. Robot. Automat., 10:799-822, November.
13. P. Rosin (2000) IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 7, pp. 726-732.
14. S. Schaal (1999) Trends in Cognitive Sciences 3, pp. 233-242.


15. S. Schaal, A. J. Ijspeert, A. Billard (2003) Computational Approaches to Motor Learning by Imitation. In: Philosophical Transactions: Biological Sciences (The Royal Society), no. 358, pp. 537-547.
16. L. Sciavicco and B. Siciliano (2000) Modelling and Control of Robot Manipulators. 2nd ed. Springer-Verlag, New York.
17. A. Ude, T. Shibata, C.G. Atkeson (2001) Robotics and Autonomous Systems, vol. 37, pp. 115-126.
18. D. M. Wolpert, K. Doya, M. Kawato (2003) A unifying computational framework for motor control and social interaction. In: Philosophical Transactions: Biological Sciences (The Royal Society), no. 358, pp. 593-602.

Reduction of Learning Time for Robots Using Automatic State Abstraction

Masoud Asadpour¹, Majid Nili Ahmadabadi², and Roland Siegwart¹

¹ Autonomous Systems Lab, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland {masoud.asadpour,roland.siegwart}@epfl.ch
² Control and Intelligent Processing Center of Excellence, ECE Dept., University of Tehran, Iran [email protected]

Summary. The required learning time and the curse of dimensionality restrict the applicability of Reinforcement Learning (RL) on real robots. Difficulty in the inclusion of initial knowledge and in understanding the learned rules must be added to the mentioned problems. In this paper we address automatic state abstraction and the creation of hierarchies in the RL agent's mind as two major approaches for reducing the number of learning trials, simplifying the inclusion of prior knowledge, and making the learned rules more abstract and understandable. We formalize automatic state abstraction and hierarchy creation as an optimization problem and derive a new algorithm that adapts decision tree learning techniques to state abstraction. The claim of improved performance is supported by strong evidence from simulation results in nondeterministic environments. Simulation results show encouraging enhancements in the required number of learning trials, the agent's performance, the size of the learned trees, and the computation time of the algorithm.

Keywords: State Abstraction, Hierarchical Reinforcement Learning

1 Introduction

It is well accepted that utilization of proper learning methods reduces robots' after-design problems, which arise naturally from the use of inaccurate and incomplete models and inadequate performance measures at design time, in addition to confronting unseen situations and changes in the environment and the robot. Existing learning methods that are capable of handling dynamic and complicated situations are slow and produce rules – mostly numerical – that are not abstract, modular, or comprehensible by humans. Therefore, the designer's intuition cannot simply be incorporated in the learning method and in the learned rules. In addition, the existing learning methods suffer from the curse of dimensionality, require a large number of learning trials, and lack abstraction capability. RL [1], despite its strength in handling dynamic and nondeterministic environments, is an example of such learning methods.



Fig. 1. State abstraction concept

In this research we try to solve some of the mentioned problems in the RL domain because of its simplicity and the variety of its applications in robotics, in addition to its mentioned advantages.

It is believed that some of the discussed drawbacks can be lessened to a great extent by the implementation of state abstraction methods and hierarchical architectures. Having abstract states and hierarchies in the agent's mind, the learned rules are partitioned, the definition of performance measures for future learning is simplified, and the number of learning trials can be reduced by a careful design of the learning method. In addition, incremental improvement of the agent's performance becomes much simpler.

Different approaches have been proposed so far for state abstraction, creation of hierarchies, and implementation of RL methods in those architectures. One group of these methods assumes a known structure and uses RL methods to learn both the rules in each layer and the hierarchy. Another group of researchers proposes hand-design of subtasks and learning them in a defined hierarchy. These two approaches require much design effort for explicit formulation of the hierarchy and state abstraction. Moreover, the designer should to some extent know how to solve the problem before s/he designs the hierarchy and subtasks. The third approach uses decision-tree learning methods to automatically abstract states, detect subtasks and speed up learning (Fig. 1). The devised approaches, however, do not take the exploratory nature of RL into account, which results in non-optimal solutions. In this paper, we take a mathematical approach to develop new criteria for the utilization of decision trees in state abstraction. Our method outperforms the existing solutions in performance, number of learning trials, and size of trees.

Related works are reviewed in the next section. The U-Tree algorithm and its drawbacks are briefly described in the third section. In the fourth section we formalize state abstraction as an optimization problem and derive new split criteria. The fifth section describes the simulation results. Conclusions and future works are discussed finally.

Reduction฀of฀Learning฀Time฀for฀Robots฀Using฀Automatic฀State฀Abstraction

81

2 Related Works

The simplest idea to gain abstract knowledge is the detection of subtasks, especially repeating ones [2], and discovering the hierarchy among them. Once the agent has learned a subtask, which is faster and more efficient due to confronting a smaller state space, it would be able to use it afterwards. Subtasks in Hierarchical RL (HRL) are closed-loop policies that are generally defined for a subset of the state set [3]. These partial policies are sometimes called temporally-extended-actions, options [4], skills [5], behaviors [6], etc. They must have well-defined termination conditions.

Although these methods give more understanding about the underlying aspects of Hierarchical RL, particular features are designed manually based on the task domain, so it is hard to apply them to complex tasks. The general problem that should be solved first, either when designing manually or when extracting the structures automatically, is state abstraction. (The inverse of state abstraction, State Aggregation [15], clusters "similar" states and assigns them the same values. It effectively reduces the state space size, but requires a large number of trials and huge initial memory.) In many situations significant portions of a large state space may be irrelevant to a specific goal and can be aggregated into a few, relevant, states. As a consequence, the learning agent can learn the subtask reasonably fast. Our work addresses this problem.

Techniques for non-uniform state-discretization are already known, e.g. Parti-game [8], the G algorithm [9], and U-Tree [10]. They start with the world as a single state and recursively split it when necessary. Continuous U-Tree [11] extends U-Tree to work with continuous state variables. TTree [12] applies Continuous U-Tree to SMDPs. Jonsson and Barto [13] applied U-Tree to options, but they use one tree per option and build the hierarchy of options manually. We adopt U-Tree as our basic method because the reported results illustrate it as a promising approach to automatic state abstraction.

In this paper we show that the employed U-Tree methods ignore RL's explorative nature. This imposes a bias in the distribution of samples that are saved for introducing new splits in U-Tree. As a result, finding a good split becomes more and more difficult and the introduced splits can be far from optimal. Moreover, U-Tree-based techniques have been excerpted in essence from decision tree learning methods. Therefore the used split criteria are very general and can work with any sort of data. Here we devise some specialized split criteria for state abstraction in RL and show that they are more efficient than the general ones both in learning performance and computation time.

Dean and Givan [14] have developed a minimization model, known as ε-reduction, to construct a partition space that has a fewer number of states than the original MDP, while ensuring that the utility of the policy learned in the reduced state space is within a fixed bound of the optimal policy. Our work is different because ε-reduction does not extract any hierarchy among state partitions, while building a hierarchy can reduce the search time in partition lists from O(n) to O(log n), n being the number of partitions. Also, their theory is developed based on the immediate return of actions instead of the long-term return. We can argue that ε-reduction is a special case of our method.

3 Formalism

We model the world as an MDP, which is a 6-tuple (S, A, T, γ, D, R), where S is a set of states (here it can be infinite), A = {a_1, ..., a_n} is a set of actions, T = {P^a_{ss'}} is a transition model that maps S × A × S into probabilities in [0, 1], γ ∈ [0, 1) is a discount factor, D is the initial-state distribution from which the start state is drawn (shown by s_0 ∼ D), and R is a reward function that maps S × A × S into real-valued rewards. A policy π maps from S to A; a real-valued value function V is defined on states, and a real-valued action-value function (also called Q-function) on state-action pairs. The aim is to find an optimal policy π* (or, equivalently, V* or Q*) that maximizes the expected discounted total rewards of the agent. We assume that each state s is a sensory-input vector (x_1, ..., x_n), where x_i is a state-variable (also called feature, or attribute).

3.1 U-Tree

A U-Tree [10] [11] abstracts the state space incrementally. Each leaf L_j of the tree corresponds to an abstract state S̄_j. Each leaf keeps action-values Q(S̄_j, a) for all available actions. The tree is initialized with a single leaf, assuming the whole world is one abstract state. New abstract states are added if necessary. Sub-trees of the tree represent subtasks of the whole task. Each sub-tree can have other sub-sub-trees that correspond to its sub-sub-tasks. The hierarchy breaks down to leaves that specify primitive sub-tasks.

Continuous U-Tree [11] loops through a two-phase process: a sampling and a processing phase. In the sampling phase, a history of the transition steps T_i = (s_i, a_i, r_i, s'_i), composed of the current state, the selected action, the received immediate reward, and the next state, is saved. The sample is assigned to a unique leaf of the tree, based on the value of the current state-variables. During the sampling phase the algorithm behaves as a standard RL algorithm, with the added step of using the tree to translate sensory input to an abstract state. After some number of learning episodes the processing phase starts. In the processing phase a value is assigned to each sample:

$$V(T_t) = r_t + \gamma V(\bar{s}_{t+1}), \qquad V(\bar{s}_{t+1}) = \max_a Q(\bar{s}_{t+1}, a) \qquad (1)$$

where $\bar{s}_{t+1}$ is the abstract state that $s_{t+1}$ belongs to. Then, if a significant difference among the distributions of sample-values within a leaf is found, that leaf is split in two leaves.
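A minimal sketch of this sample-valuation step is given below; the leaf lookup, Q-table layout, and names are simplified assumptions rather than the authors' implementation.

```python
from collections import defaultdict

GAMMA = 0.95

# Q-values per abstract state (leaf), e.g. q_values[leaf_id][action]
q_values = defaultdict(lambda: defaultdict(float))

def leaf_of(state):
    """Placeholder: map a raw sensory-input vector to the leaf (abstract
    state) it falls into by descending the current U-Tree."""
    return 0  # a single-leaf tree, as at initialization

def sample_value(transition):
    """Assign a value to a logged transition (s, a, r, s') following Eq. (1)."""
    s, a, r, s_next = transition
    next_leaf = leaf_of(s_next)
    v_next = max(q_values[next_leaf].values(), default=0.0)
    return r + GAMMA * v_next

# During the processing phase every stored sample of a leaf gets such a
# value, and the distribution of these values decides whether to split.
history = [((0.2, 0.1), 1, -0.01, (0.3, 0.1))]
values = [sample_value(t) for t in history]
```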


To find the best split point for each leaf, the algorithm loops over the state-variables. The samples within a leaf are sorted according to a state-variable and a trial split is virtually added between each consecutive pair of state-variable values. This split divides the abstract state in two sets. A splitting criterion, like the Kolmogorov-Smirnov test (KS), compares the two sets and returns a number indicating the difference between the two distributions. If the largest difference among all state-variables is bigger than a confidence threshold, then a split is introduced. This procedure is repeated for all leaves.

Although the original algorithm for U-Tree [10] can function in partially observable domains, for the sake of simplification, we make the standard assumption that the agent has access to a complete description as well. Also, since the structure of the tree is not revised once a split is introduced, we assume the environment is stationary.

If we assume the processing phase is done after each learning episode, the number of leaves at episode t is L(t) (the total number of nodes is 2L(t) − 1), and the sample size in each leaf is m, the best sorting algorithms, e.g. Quicksort, can sort the samples based on a state-variable in O(m log m). There can be at most (m − 1) split points. To find the cumulative distribution, the KS-test needs the samples in the two virtual splits to be sorted based on their values, which needs at least $O(2\,\frac{m}{2}\log\frac{m}{2})$. Then a pass through all sorted samples needs O(m). In total, the order of the algorithm using the KS-test, with n being the dimension of the state vectors, is:

$$O\left(L(t)\, n\, m^4 \log m \log\frac{m}{2}\right) \qquad (2)$$

3.2 Splitting Criteria

In the literature two types of split criteria have been used for state abstraction: rank-based tests, e.g. the non-parametric KS-test [10] [11], and tests inspired by Information Theory, e.g. the Information Gain Ratio (IGR) [17]. The non-parametric KS-test looks for violations of the Markov Property by computing the statistical difference between the distributions of samples on either side of a split, using the difference in the cumulative distributions of the two datasets. The IGR-test measures the amount of reduction in the uncertainty of sample values and is computed as follows; given a set of samples $T = \{T_i = (s_i, a_i, r_i, s'_i)\},\ i = 1, \ldots, m$ from abstract state $\bar{S}$, labeled with $V(T_i)$ – computed according to (1) – the entropy of sample-values is:

$$Entropy_{\bar{S}}(V) = -\sum_{v} P(V(T_i) = v \mid \bar{S}) \log_2 P(V(T_i) = v \mid \bar{S}) \qquad (3)$$

where $P(V(T_i) = v \mid \bar{S})$ is the probability that samples in T have value v. The Information Gain of splitting the abstract state $\bar{S}$ into $\bar{S}_1$ and $\bar{S}_2$ is defined as the amount of reduction in the entropy of sample-values, i.e.:

$$InfoGain(\bar{S}, \bar{S}_1, \bar{S}_2) = Entropy_{\bar{S}}(V) - \sum_{i=1}^{2} \frac{m_i}{m}\, Entropy_{\bar{S}_i}(V) \qquad (4)$$

where $m_i$ is the number of samples from T that fall in $\bar{S}_i$. IGR is defined as:

$$IGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \frac{InfoGain(\bar{S}, \bar{S}_1, \bar{S}_2)}{-\sum_{i=1}^{2} \frac{m_i}{m} \log_2 \frac{m_i}{m}} \qquad (5)$$
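The following sketch evaluates a candidate split with the IGR criterion of Eqs. (3)-(5); the discretization of sample-values into bins is an assumption made here to obtain the probabilities, not a detail taken from the paper.

```python
import math
from collections import Counter

def entropy(values, n_bins=10):
    """Entropy of sample-values, Eq. (3), using a simple binning assumption."""
    if not values:
        return 0.0
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(int((v - lo) / width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def igr(values_left, values_right):
    """Information Gain Ratio of splitting a leaf's samples, Eqs. (4)-(5)."""
    parent = values_left + values_right
    m, m1, m2 = len(parent), len(values_left), len(values_right)
    info_gain = entropy(parent) - (m1 / m) * entropy(values_left) \
                                - (m2 / m) * entropy(values_right)
    split_info = -sum((mi / m) * math.log2(mi / m) for mi in (m1, m2) if mi)
    return info_gain / split_info if split_info else 0.0

# Example: sample-values on either side of a virtual split.
print(igr([0.1, 0.12, 0.11, 0.5], [0.9, 0.95, 1.0]))
```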

3.3 Biased Sampling Problem

Each leaf of a U-Tree can, in an implementation, hold only a limited number of samples. If the number of samples exceeds an upper limit, an old sample is selected randomly and is replaced by the new sample. However, the selection probability of an action in a state depends on its action-value function. This fact affects the distribution of samples in a leaf. We call this biased sampling.

The biased sampling problem has a negative effect: RL algorithms escape from local optima by making a trade-off between exploration and exploitation. If the size of a sample list is not big enough, the number of samples with explorative character will not be statistically significant enough to introduce a split, unless the confidence threshold for the split criterion is reduced. Reducing the threshold increases the occurrence of non-optimal splits. One can therefore expect the size of the generated tree to be large while the performance stays low.

This problem can be solved by having in each leaf, instead of one big list with capacity k, |A| smaller lists with capacity k/|A|, one per action. The algorithm is then modified so that when a new sample arrives and the corresponding sample list is full, an old sample from the list that corresponds to the action of the new sample is deleted randomly. Sorting the new sample lists is also faster:

$$|A| \cdot O\!\left(\frac{k}{|A|} \log \frac{k}{|A|}\right) = O\!\left(k \log \frac{k}{|A|}\right) < O(k \log k) \qquad (6)$$
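A small sketch of such per-action sample lists with random replacement is shown below; the class layout and names are illustrative assumptions.

```python
import random

class PerActionSampleLists:
    """One bounded sample list per action, so exploitative actions cannot
    crowd out the rarer explorative samples (unbiased sampling)."""

    def __init__(self, actions, capacity_per_action):
        self.lists = {a: [] for a in actions}
        self.capacity = capacity_per_action

    def add(self, sample):
        s, a, r, s_next = sample
        bucket = self.lists[a]
        if len(bucket) < self.capacity:
            bucket.append(sample)
        else:
            # Replace a randomly chosen old sample of the *same* action.
            bucket[random.randrange(self.capacity)] = sample

    def all_samples(self):
        return [t for bucket in self.lists.values() for t in bucket]

# Example: 360 samples per leaf and 6 actions -> 60 samples per action.
leaf_samples = PerActionSampleLists(actions=range(6), capacity_per_action=60)
leaf_samples.add(((0, 0), 2, -0.01, (0, 1)))
```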

4 SANDS

In this section we formalize the state abstraction problem and provide a mathematical description of a new algorithm that is specially derived for HRL. We call it SANDS, which stands for State Abstraction in Nondeterministic Systems. We also show how we can combine SANDS with other heuristics.

4.1 State Abstraction

An abstract state S̄ is a subset of the state space S such that all states within it have "close" values (see below). The value of an abstract state S̄ is defined as the expected discounted reward return if an agent starts from a state in S̄ and follows the policy π afterwards, i.e.:

$$V^{\pi}(\bar{S}) = \sum_{s \in \bar{S}} P(s \mid \bar{S}) \sum_{a \in A} \pi(s, a) \sum_{s'} P^{a}_{ss'}\left(R^{a}_{ss'} + \gamma V^{\pi}(s')\right) \qquad (7)$$

where π(s, a) is the selection probability of action a in state s under policy π, $P^{a}_{ss'}$ is the probability that the environment goes to state s' after doing action a in state s, and $R^{a}_{ss'}$ is the expected immediate reward after doing action a in state s and going to state s'. Similarly, $Q^{\pi}(\bar{S}, a)$ is defined as:

$$Q^{\pi}(\bar{S}, a) = \sum_{s \in \bar{S}} P(s \mid \bar{S}) \sum_{s'} P^{a}_{ss'}\left(R^{a}_{ss'} + \gamma V^{\pi}(s')\right) \qquad (8)$$

From set theory, a set partition of a set S is defined as a collection of disjoint subsets of S whose union is S. The number of possible partitions of the set is called the Bell number. So U = {S̄_1, ..., S̄_p} is a set partition of S iff:

$$\bar{S}_i \subseteq S,\ \forall i \in \{1, \ldots, p\}; \qquad S = \bigcup_{i=1}^{p} \bar{S}_i; \qquad \bar{S}_i \cap \bar{S}_j = \emptyset,\ \forall i, j \in \{1, \ldots, p\},\ i \neq j \qquad (9)$$

State abstraction is then an optimization problem which aims to find a set partition and a policy π on the state space S with p, the number of partitions, minimized, while ensuring that the value of the learned policy π is within a fixed bound of the optimal policy π*, i.e.:

$$|V^{\pi^*}(s) - V^{\pi}(\bar{S}_i)| < \epsilon, \quad \forall s \in \bar{S}_i,\ \forall \bar{S}_i \in U \qquad (10)$$

4.2 SANDS Splitting Criterion

It is easy to prove that the leaves of a U-Tree generate a set-partition of the state space. It is clear from the definition of state abstraction in (10) that we are minimizing the difference between the optimal policy and the policy that the U-Tree represents. Here, instead of walking toward the optimal policy, which is unknown, we try to walk away from the current policy toward the optimal policy. Now, assume that we have a U-Tree that defines a policy π. We would like to enhance it to a policy π̃ by splitting the partition S̄ (one of the leaves in the tree) into unknown partitions S̄_1 and S̄_2, so that the expected return of the resulting tree is maximized, i.e. we maximize:

$$G = \sum_{s \sim D} P(s)\left[V^{\tilde{\pi}}(s) - V^{\pi}(s)\right] \qquad (11)$$

where D is the initial-state distribution. Using the Bellman equations we can write:

$$G = \sum_{s \sim D} P(s) \left\{ \sum_{a} \left[\tilde{\pi}(s,a) \sum_{s'} P^{a}_{ss'}\left(R^{a}_{ss'} + \gamma V^{\tilde{\pi}}(s')\right)\right] - \sum_{a} \left[\pi(s,a) \sum_{s'} P^{a}_{ss'}\left(R^{a}_{ss'} + \gamma V^{\pi}(s')\right)\right] \right\} \qquad (12)$$

$$G = \sum_{s \sim D} \sum_{a} \sum_{s'} P(s) P^{a}_{ss'} \left[ R^{a}_{ss'}\left(\tilde{\pi}(s,a) - \pi(s,a)\right) + \gamma\left(\tilde{\pi}(s,a) V^{\tilde{\pi}}(s') - \pi(s,a) V^{\pi}(s')\right) \right] \qquad (13)$$

$V^{\tilde{\pi}}(s)$ is unknown; we can use $V^{\pi}(s)$ as an estimate. For the states that fall outside of S̄ the policy remains unchanged after the split, i.e. $\forall s \notin \bar{S},\ \tilde{\pi}(s,a) = \pi(s,a)$. For the states within S̄, or within the new abstract states S̄_1 and S̄_2, we define $\pi(a) = \pi(s,a),\ s \in \bar{S}$ and $\tilde{\pi}_i(a) = \tilde{\pi}(s,a),\ s \in \bar{S}_i,\ i = 1, 2$. Hence:

$$G = \sum_{a} \sum_{i=1}^{2} \left(\tilde{\pi}_i(a) - \pi(a)\right) \sum_{s \in \bar{S}_i} \sum_{s'} P(s) P^{a}_{ss'}\left(R^{a}_{ss'} + \gamma V^{\pi}(s')\right) \qquad (14)$$

From the properties of conditional probabilities, $P(s) = P(s \mid \bar{S}_i) \times P(\bar{S}_i \mid \bar{S}) \times P(\bar{S}),\ \forall s \in \bar{S}_i,\ i = 1, 2$. Considering the definition in (8), we can write:

$$G = P(\bar{S}) \sum_{a} \sum_{i=1}^{2} \left(\tilde{\pi}_i(a) - \pi(a)\right) Q^{\pi}(\bar{S}_i, a) P(\bar{S}_i \mid \bar{S}) \qquad (15)$$

This way the learner finds the split that maximizes the performance of the tree over the "whole action set", provided that actions are selected according to a Softmax distribution. An important point here is that the learner cannot revise the tree after initiating the split. This means that it is, in a sense, fixing a part of the policy at this level. Therefore, it is more valuable to judge the splits based on the performance of the greedy action instead of Softmax. The optimization term should then change from a summation to a maximum. Moreover, $P(\bar{S})$ is the same for all states in the abstract state S̄. So, the final formulation of the SANDS criterion is simplified to:

$$SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_{a} \sum_{i=1}^{2} \left(\tilde{\pi}_i(a) - \pi(a)\right) Q^{\pi}(\bar{S}_i, a) P(\bar{S}_i \mid \bar{S}) \qquad (16)$$

U-Tree generates a set-partition of the state space. However, optimality of the number of partitions is not guaranteed, although the algorithm starts with the minimum number of partitions and adds more partitions if required. In fact, the splits here are univariate [16]: they shrink the state space parallel to one of its axes. The necessary (but not sufficient) condition to have a minimum-size tree is to have multivariate [16] splits, which can split the space in any direction. Multivariate splits are not covered in the current version of the algorithm; however, we discuss the size of the trees in the results section.

If we assume the processing phase starts after one learning episode, the number of leaves in episode t is L(t), the sample size in each leaf is m, and the state dimension is n, the time order of the new algorithm would be $O\!\left(L(t)\, n\, m\,(m^2 - |A|^2) \log\frac{m}{|A|}\right)$, which is smaller than (2) by a rough factor of $m \log m$.

4.3 Using the Monte Carlo Method to Estimate SANDS

We now show how to compute each term by the Monte Carlo method, using the samples recorded during the learning episodes. Each logged sample is a 4-tuple $(s_i, a_i, r_i, s'_i)$ corresponding to the current state, the selected action, the received reward, and the next state.


If among $m^a$ samples for action a in S̄, $m^a_1$ and $m^a_2 = m^a - m^a_1$ samples correspond to S̄_1 and S̄_2 respectively, we can compute:

$$\hat{\rho}^a_i = \frac{m^a_i}{m^a}, \quad i = 1, 2 \qquad\qquad \hat{\rho}_i = \frac{\sum_a m^a_i}{\sum_a m^a}, \quad i = 1, 2$$

$$\hat{\mu}^a_i = \frac{1}{m^a_i} \sum_{j=1}^{m^a_i} \left(r_j + \gamma \max_b Q(s'_j, b)\right), \quad i = 1, 2 \qquad\qquad \hat{\mu}^a = \hat{\rho}^a_1 \hat{\mu}^a_1 + \hat{\rho}^a_2 \hat{\mu}^a_2$$

$$\hat{\pi}_i(a) = \frac{\exp(\hat{\mu}^a_i/\tau)}{\sum_b \exp(\hat{\mu}^b_i/\tau)}, \quad i = 1, 2 \qquad\qquad \hat{\pi}(a) = \frac{\exp(\hat{\mu}^a/\tau)}{\sum_b \exp(\hat{\mu}^b/\tau)}$$

$$SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_a \sum_{i=1}^{2} \left(\hat{\pi}_i(a) - \hat{\pi}(a)\right) \hat{\mu}^a_i\, \hat{\rho}_i \qquad (17)$$
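A compact sketch of these Monte Carlo estimates is given below; the data layout (per-action sample buckets for each side of the candidate split) and the temperature value are assumptions made for illustration.

```python
import math

GAMMA, TAU = 0.95, 0.1

def softmax(values, tau=TAU):
    exps = [math.exp(v / tau) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def sands(side_samples, q_next):
    """Estimate the SANDS criterion (Eq. 17) for one candidate split.

    side_samples[i][a] is the list of (reward, next_state) samples of
    action a that fall on side i (i = 0, 1) of the virtual split;
    q_next(s) returns max_b Q(s, b) for the abstract next state."""
    actions = sorted(side_samples[0].keys() | side_samples[1].keys())
    m_a = {a: sum(len(side_samples[i].get(a, [])) for i in (0, 1)) for a in actions}
    rho_a = [[len(side_samples[i].get(a, [])) / max(m_a[a], 1) for a in actions] for i in (0, 1)]
    rho = [sum(len(side_samples[i].get(a, [])) for a in actions) / max(sum(m_a.values()), 1)
           for i in (0, 1)]
    mu = [[(sum(r + GAMMA * q_next(s) for r, s in side_samples[i].get(a, [])) /
            max(len(side_samples[i].get(a, [])), 1)) for a in actions] for i in (0, 1)]
    mu_parent = [rho_a[0][j] * mu[0][j] + rho_a[1][j] * mu[1][j] for j in range(len(actions))]
    pi = [softmax(mu[i]) for i in (0, 1)]
    pi_parent = softmax(mu_parent)
    return max(sum((pi[i][j] - pi_parent[j]) * mu[i][j] * rho[i] for i in (0, 1))
               for j in range(len(actions)))

# Example with two actions and a trivial Q-estimate of zero for next states.
split = [{0: [(1.0, None)], 1: [(-0.01, None)]}, {0: [(-0.01, None)], 1: [(0.5, None)]}]
print(sands(split, q_next=lambda s: 0.0))
```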

4.4 SoftMax Gain Ratio (SMGR)

New criteria can be derived by combining SANDS with different heuristics. One of them is what we call the SoftMax Gain Ratio:

$$SMGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_a \frac{-\hat{\pi}(a) \log \hat{\pi}(a) + \sum_{i=1}^{2} \hat{\rho}_i\, \hat{\pi}_i(a) \log \hat{\pi}_i(a)}{-\sum_{i=1}^{2} \hat{\rho}_i \log \hat{\rho}_i} \qquad (18)$$

In this criterion, instead of defining the entropy function on sample-values, we define it on action-probabilities and try to find the splits that maximize the reduction of this entropy, i.e. the split that results in more certainty about the selection probability of actions. We can prove that SMGR scales and combines SANDS with a penalty term as follows:

$$SMGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \frac{\frac{1}{\tau}\, SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) + \log \sum_{i=1}^{2} \hat{\pi}_i(a)^{\hat{\rho}_i}}{-\sum_{i=1}^{2} \hat{\rho}_i \log \hat{\rho}_i} \qquad (19)$$

We can show that the penalty term punishes the splits that result in: (1) giving high probability to a specific action without the support of enough evidence (samples), and (2) contradicting estimations of the selection probabilities of a specific action on either side of the virtual split, in the sense that an action is the best action in one split but the same action is the worst action in the other one.
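Under the same assumptions as the SANDS sketch above, the SMGR score of Eq. (18) can be estimated from the softmax action probabilities and the split proportions; the helper below is illustrative and reuses the hypothetical quantities π̂, π̂ᵢ and ρ̂ computed there.

```python
import math

def smgr(pi_parent, pi_sides, rho):
    """SoftMax Gain Ratio (Eq. 18) from estimated action probabilities.

    pi_parent[a]   -- softmax probability of action a before the split
    pi_sides[i][a] -- softmax probability of action a on side i (i = 0, 1)
    rho[i]         -- fraction of samples falling on side i
    """
    split_info = -sum(r * math.log(r) for r in rho if r > 0)
    if split_info == 0.0:
        return 0.0
    n_actions = len(pi_parent)
    gains = []
    for a in range(n_actions):
        gain = -pi_parent[a] * math.log(pi_parent[a])
        gain += sum(rho[i] * pi_sides[i][a] * math.log(pi_sides[i][a]) for i in (0, 1))
        gains.append(gain / split_info)
    return max(gains)

# Example with two actions: the split makes action selection more certain.
print(smgr([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5]))
```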

5 Simulation Results

We have tested our algorithms on a simplified football task. It is very similar to the classic Taxi problem, with the possibility of having moving players with different movement patterns so as to create environments with different levels of nondeterminism. The playing field is an 8 × 6 grid (Fig. 2). The x axis is horizontal and the y axis is vertical. There are two goals located at (0, 3) and (8, 3). A learning agent plays against two "passive" opponents and learns to deliver the ball to the left goal. By passive opponents we mean that they move in the field only to restrict the movements of the learner and do not perform any action on the ball. The learner can choose among 6 actions: Left, Right, Up, Down, Pick, and Put.


Fig. 2. A screen-shot from the simulator

If any player selects an action that results in touching the boundaries or going to a position pre-occupied by another player, that action is ignored. In a special case, when the learner has picked the ball, if it hits the other players or a boundary then the ball falls down. We test the learning simulations on 3 types of opponents: Static, Offender, and Midfielder. Static players do not move at all. Offenders move randomly in the x direction (i.e. they select randomly between the Left and Right actions). Midfielders move randomly in all directions. The state of the learner consists of the absolute x and y positions of the ball and the players, the status of the ball from the {Free, Picked} states, plus two states for the left or right goal. This sums up to 3,001,252 (= 2 × (7 × 5)^4 + 2) states. Players act in parallel; so the next state of the learner when playing against offenders or midfielders would be a stochastically selected state among 4 or 16 different states, respectively.

Each experiment consists of 100,000 episodes. Each episode starts by positioning the learner and the ball in a random (and unoccupied) place. Each episode consists of a maximum of 1000 steps. Each step includes one movement by each player. An episode is finished if the ball enters any goal or the maximum number of steps has passed. Upon delivering the ball to the left goal the learner receives a +1 reward. In case of a wrong goal it receives a -1 punishment. Otherwise it is punished with -0.01.

All methods use SARSA with a decaying ε-greedy policy. The learning rate is 0.1 and the discount factor γ is 0.95. The exploration rate is initialized with 0.2 and decays by multiplying with the decay factor 0.99999 after each episode. The leaf sample size is 360 (60 samples per action). For each state-abstraction method a range of confidence thresholds is tested to find the best setting: the one that minimizes the number of actions each learner executes from a random initial position to the goal. The number of actions is averaged over the last 10,000 episodes. This average shows how effectively the agent has learned to make a goal in fewer moves. The results are averaged over 30 runs.

Table 1 shows the number of actions in each episode averaged over the last 10,000 episodes. UKS and UIGR are the new versions of KS and IGR with unbiased sampling.


Fig. 3. Number of actions (left) and leaves of the tree (right) during learning episodes in the midfielder case, averaged per every 1000 episodes (non-accumulated)

Table 1. Number of moves in each episode averaged over the last 10,000 episodes

Player      SARSA       KS         UKS        IGR         UIGR       SMGR       SANDS
Static      10.85±.01   11.23±.2   11.35±.22  11.21±.05   11.23±.04  11.12±.03  11.1±.04
Offender    19.63±.06*  14.97±.57  14.91±.14  14.55±.15   14.58±.5   14.28±.07  14.39±.09
Midfielder  668.36**    18.41±1.2  17.39±.21  18.65±1.73  17.39±.24  16.84±.06  16.95±.2
* 14.72 after 1 million episodes    ** 18.92 after 15 million episodes

Table 2. Number of leaves in the learned trees

Player      SARSA     KS        UKS       IGR        UIGR      SMGR      SANDS
Static      2452      692±119   596±92    703±35     666±28    516±10    491±7
Offender    120052    1476±404  1331±78   1536±302   1242±23   1259±26   1282±22
Midfielder  3001252   1857±253  1651±141  2263±504   1533±48   1436±78   1313±77

Table 3. Computation time of the algorithms in minutes

Player      SARSA   KS   IGR   SMGR   SANDS
Static      1       14   15    13     10
Offender    11*     25   27    18     12
Midfielder  50**    35   40    27     15
* 16 min for 1 million episodes    ** 370 min for 15 million episodes

Table 2 shows the number of leaves in the learned trees (for SARSA it is the number of states). Confidence intervals are computed with significance level 0.05. Table 3 shows the computation time of the algorithms in minutes (tests were done on a Dell Inspiron 5150 laptop). Fig. 3 shows the number of actions and leaves recorded during the learning episodes in the midfielder case.



5.1 Hierarchical Versus Flat RL

From Table 1 it is clear that, although SARSA shows better performance in the static-player case, it has quite poor performance in the other cases. We even let the agent continue its learning for 10 and 150 times more episodes (1 and 15 million) for the offender and midfielder cases, respectively. The performance is still poor compared to the hierarchical methods.

By looking at the time order of the HRL algorithms and flat RL, we observe that SARSA takes only one minute to learn the static-player case, with higher efficiency than all hierarchical algorithms. But when the environment becomes more nondeterministic the required computation time increases exponentially and the efficiency of the results is far from that of the hierarchical approaches. An interesting result for the HRL algorithms is that the required time increases in logarithmic order while the number of states increases exponentially. These results show that HRL has a great potential for decreasing learning time and the number of trials in online robot learning tasks.

5.2 Unbiased Sampling

Table 1 shows that as the environment becomes more and more nondeterministic, the positive effect of unbiased sampling becomes clearer. Although the averages of UKS and UIGR are a little bit bigger than those of KS and IGR for static players, they are much smaller for the midfielder case. Also, the number of leaves in the unbiased versions is always smaller than in the regular versions; e.g. the difference between IGR and UIGR in the midfielder case is around 700 partitions. Confidence intervals are always smaller for the unbiased versions, which means that the number of leaves in their generated trees does not vary a lot.

5.3 Efficiency of SANDS and SMGR

Table 1 and Table 2 show that in all cases SANDS and SMGR create trees with better performance than all other methods (even the unbiased versions). The difference becomes clearer as the environment becomes more nondeterministic. In the midfielder case the difference between the new criteria and the old ones is about 2 actions per episode. The same is true for the size of the generated trees. In almost all cases (except for UIGR in the offender case, with a negligible difference) SANDS and SMGR generate smaller trees.

Figure 3 shows that SANDS and SMGR converge more slowly than the others. Although this can be problematic when the state space is very big, in the sense that their performance does not increase very much for a long time, it is expected that after some time their performance overtakes the other criteria. Table 1 and Fig. 3 (left) confirm that splits in SANDS and SMGR are selected more carefully, because not only is their tree the smallest, but their performance is also the best.


SMGR shows a small enhancement over SANDS in the offender and midfielder cases. This is due to the penalty term that SMGR adds to the splitting criterion, which rejects some inefficient splits. In the static and offender cases it is not evident which one generates fewer partitions. But in the midfielder case SANDS generates trees with around 100 fewer leaves.

5.4 Computation Time

As explained in Sect. 4.2, the time order of SANDS and SMGR is less than that of KS and IGR. In the static-player case the difference is not clear, but in the other cases SANDS cuts the computation time in half compared to KS, and even more compared to IGR. A part of the reduction in computation time is due to the reduction in the order of the sorting algorithm for the sample lists. Another part is due to creating smaller trees. This is important because it also affects the speed of knowledge retrieval and the required memory of the trees. Another part is due to the reduction in the number of actions per episode.

6 Conclusion

In this paper we formalized state abstraction as an optimization problem and, through mathematical formulation, derived two new criteria for split selection. The new criteria adapt decision tree learning techniques to state abstraction. We showed in simulation that the new criteria outperform the existing criteria not only in efficiency and size of the learned trees but also in computation time. We believe that criteria like KS and IGR judge splits based only on the rank or probability of sample-values without taking their magnitude into account, although the magnitudes are very informative for state abstraction. SANDS and SMGR mix the probability and the Q-value of actions.

We have also tested other criteria like Information Gain [19], the Gini Index [18] and Student's T-test [10], but the results were of limited success, as reported in [10]. Student's T-test has poor performance when the variance over the splits gets close to zero, because of a variance term in its divisor. Information Gain and the Gini Index have poor performance in nondeterministic environments.

The inability to revise the tree in non-stationary environments is a drawback of univariate methods. Linear multivariate [16] splits are perhaps more suitable candidates, as they use a weighted sum of state-variables at the split point. Adjusting these weights might be easier than reordering the nodes of the tree.

Acknowledgment

This research is funded by the Future and Emerging Technologies programme (IST-FET) of the European Community, under grant IST-2001-35506. The information provided is the sole responsibility of the authors and does not reflect the Community's opinion. The Community is not responsible for any use that might be made of data appearing in this publication. The Swiss participants to the project are supported under grant 01.0573 by the Swiss Government.


References

1. Sutton R.S. and Barto A.: Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (1996)
2. Uther W. and Veloso M.: The Lumberjack Algorithm for Learning Linked Decision Forests, In Proc. of PRICAI-2000 (2000)
3. Barto A.G. and Mahadevan S.: Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems, 13(4):41–77 (2003)
4. Sutton R.S., Precup D., and Singh S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, 112:181–211 (1999)
5. Thrun S.B. and Schwartz A.: Finding structure in reinforcement learning, In Tesauro G., Touretzky D.S., and Leen T. (eds), Advances in Neural Information Processing Systems, pp. 385–392, MIT Press, Cambridge, MA (1995)
6. Brooks R.A.: Achieving artificial intelligence through building robots, Technical Report A.I. Memo 899, AI Lab., MIT, Cambridge, MA (1986)
7. Dayan P. and Hinton G.E.: Feudal reinforcement learning, In Hanson S.J., Cowan J.D., and Giles C.L. (eds), Neural Information Processing Systems 5, San Mateo, CA, Morgan Kaufmann (1993)
8. Moore A.W.: The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, In Cowan J.D., Tesauro G., and Alspector J. (eds), Advances in Neural Information Processing Systems, 6:711–718, Morgan Kaufmann (1994)
9. Chapman D. and Kaelbling L.P.: Input generalization in delayed reinforcement learning: An algorithm and performance comparisons, In Proc. of the 12th International Joint Conf. on Artificial Intelligence (IJCAI-91), pp. 726–731 (1991)
10. McCallum A.: Reinforcement Learning with Selective Perception and Hidden State, PhD thesis, Computer Science Dept., Univ. of Rochester (1995)
11. Uther W.T.B. and Veloso M.M.: Tree based discretization for continuous state space reinforcement learning, In Proc. of the 15th National Conf. on Artificial Intelligence, AAAI-Press/MIT-Press, pp. 769–774 (1998)
12. Uther W.T.B.: Tree Based Hierarchical Reinforcement Learning. PhD thesis, Dept. of Computer Science, Carnegie Mellon Univ., Pittsburgh, PA, USA (2002)
13. Jonsson A. and Barto A.G.: Automated state abstraction for options using the U-tree algorithm, In Advances in Neural Information Processing Systems: Proc. of the 2000 Conf., pp. 1054–1060, MIT Press, Cambridge, MA (2001)
14. Dean T. and Givan R.: Model minimization in Markov decision processes, In Proc. AAAI-97, p. 76 (1997)
15. Singh S.P., Jaakkola T., and Jordan M.I.: Reinforcement learning with soft state aggregation, In Tesauro G., Touretzky D.S., and Leen T.K. (eds), Neural Information Processing Systems 7, MIT Press, Cambridge, MA (1995)
16. Yildiz O.T., Alpaydin E.: Omnivariate Decision Trees, IEEE Trans. on Neural Networks, 12(6):1539–1546 (2001)
17. Au M., Maire F.: Automatic State Construction using Decision Tree for Reinforcement Learning Agents, International Conf. on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA), Gold Coast, Australia (2004)
18. Breiman L., Friedman J., Olshen R., Stone C.: Classification and Regression Trees, Wadsworth, Pacific Grove, CA (1984)
19. Quinlan R.: Induction of decision trees, Machine Learning, 1:81–106 (1986)

Efficient Failure Detection for Mobile Robots Using Mixed-Abstraction Particle Filters

Christian Plagemann, Cyrill Stachniss, and Wolfram Burgard

University of Freiburg, Georges-Koehler-Allee, 79110 Freiburg, Germany
{plagem,stachnis,burgard}@informatik.uni-freiburg.de

Summary. In this paper, we consider the problem of online failure detection and isolation for mobile robots. The goal is to enable a mobile robot to determine whether the system is running free of faults or to identify the cause for faulty behavior. In general, failures cannot be detected by solely monitoring the process model for the error-free mode, because if certain model assumptions are violated the observation likelihood might not indicate a defect. Existing approaches therefore use comparably complex system models to cover all possible system behaviors. In this paper, we propose the mixed-abstraction particle filter as an efficient way of dealing with potential failures of mobile robots. It uses a hierarchy of process models to actively validate the model assumptions and distribute the computational resources between the models adaptively. We present an implementation of our algorithm and discuss results obtained from simulated and real-robot experiments.

1 Introduction

Whenever mobile robots act in the real world, they are affected by faults and abnormal conditions. Detecting such situations and allowing the robot to react appropriately is a major precondition for truly autonomous vehicles. While the applied techniques need to be able to reliably detect rare faults, the overall estimation process under error-free conditions should not be substantially more complex compared to systems that are optimized for the normal operational mode. Separate monitoring processes that use more complex models to cover all possible system behaviors introduce an unnecessarily high computational load. In this paper, we introduce mixed-abstraction particle filters as an effective means for adaptively distributing the computational resources between different system models based on the estimated validity of their specific model assumptions.

The term "fault detection" commonly refers to the detection of an abnormal condition that may prevent a functional unit from performing a required function [11].



Fig. 1. The left image depicts a simulated robot before colliding with an obstacle which is not detected by its sensors. The right photograph shows a real mobile robot that collides with an undetected glass door while moving on its planned trajectory to the neighboring room.

Most works in the fault detection and isolation literature deal with internal faults such as defects in hardware or software. For the mobile robot domain, we apply the same nomenclature to external influences like collisions or wheel slip, since their effects are similar to those of internal defects and the resulting models have the same structure.

As an illustrating scenario, consider a mobile robot equipped with wheel encoders, a laser range finder, and a sufficiently accurate map of the environment. In the fault-free case, the position of the robot can be tracked using a standard tracking algorithm such as a Kalman filter or a particle filter with a simplistic odometry-based motion model, which is formally given in Section 3.3. In odometry-based models, the next system state x_t is directly predicted from the odometry, which is the measurement o_t obtained from the wheel encoders.

Although such models allow us to evaluate different state hypotheses by weighting them using exteroceptive measurements, e.g., using a laser range measurement l_t, they do not directly allow us to detect collisions with obstacles that cannot be perceived by the sensors of the robot. This is due to the fact that when the robot stops moving, its wheel encoders do not record any motion, which is perfectly consistent with the recorded laser measurements. Therefore, no filter degradation occurs and there is no possibility to detect such faults inside the filter. One typical solution to overcome such problems is to compare the estimated trajectory with the planned one on a higher system level. As major drawbacks of such an approach, one cannot infer the actual cause for the deviation from the planned trajectory, and the system architecture is complicated by the stronger connection between the planning and tracking modules.

An alternative solution is to consider the actual motion commands that have been sent to the motors instead of just using the wheel encoder readings. However, this makes the system model substantially more complex and the predictions, which are now based on motor currents and accelerations, less accurate.


In our experiments, we observed that such a model is around 32 times slower to compute than the odometry-based model. It is important to note that the odometry-based model makes the implicit assumption that the wheel encoder measurements reflect the intended motion. If this assumption is violated, the standard estimation technique cannot be used to estimate the joint probability p(x, f) anymore, where x stands for the pose of the robot and f indicates the failure state. In this paper, we propose to make the model assumptions explicit and to build a model abstraction hierarchy. We present the mixed-abstraction particle filter algorithm that uses such a hierarchy to direct computational resources to the most efficient model whose assumptions are met. In this way, it minimizes the computational load while maximizing the robustness of the system.


Fig. 2. Arbitrary model hierarchy (left) with an unrestricted model M0 , several restricted models M1 , M2 , M3 as well as the specific assumptions Ai→j that restrict the state spaces respectively. Two models for the same physical process: the standard odometry-based model (e.g. M1 ) that uses the odometry measurements ot as controls (middle) and the laser measurements lt as sensor inputs. A less restricted model (e.g. M0 , on the right) that includes the actual motion commands ut as controls.

The paper is organized as follows. After the discussion of related work, we present our mixed-abstraction particle filter algorithm and its application to monitoring mobile robots in Section 3. Finally, Section 4 presents experimental results.

2 Related Work

Model-based diagnosis has been approached from within the AI community using symbolic reasoning, with a focus on large systems with many interacting components, and from the control theory community, concentrating on fewer components with complex dynamics and higher noise levels [8]. With symbolic approaches, the system is typically assumed to move between discrete steady states [17].


Here, diagnosis is often based on system snapshots without a history. Krysander [8] proposed a hybrid model consisting of discrete fault modes that switch between differential equations to describe the system behavior. The diagnosis system is designed for large systems with low noise levels, where instantaneous statistical tests are sufficient to identify a faulty component.

As Dearden and Clancy [3] pointed out, the close coupling between a mobile system and its environment makes it hard to apply discrete diagnosis models directly, because extremely complex monitoring components would have to be used. A more robust and efficient way is to reason directly on the continuous sensor readings. As a result, probabilistic state tracking techniques have been applied to this problem. Adopted paradigms range from Kalman filters [16] to particle filters in various modelings [1, 3, 12]. Particle filters represent the belief about the state of the system by a set of state samples, which are moved when actions are performed and re-weighted when sensor measurements are integrated (see Dellaert et al. [4]). In particle filter based approaches to failure diagnosis, the system is typically modeled by a dynamic mixture of linear processes [2] or a non-linear Markov jump process [5]. Benanzera et al. [1] combine consistency-based approaches, i.e. the Livingstone system, with particle filter based state estimation techniques.

Verma et al. [15] introduce the variable resolution particle filter for failure detection. Their approach is similar to ours in that they build an abstraction hierarchy of system models. The models under consideration form a partition of the complete state space and the hierarchy is defined in terms of behavioral similarity. Our focus, in contrast, lies on switching between overlapping system models for certain parts of the state space. Our model hierarchy is therefore based on efficiency differences and explicit model assumptions about the system state. The two approaches should therefore be seen as complementary rather than as alternatives.

Other approaches that deal with the time efficiency of particle filters include Kwok et al. [10], in which real-time constraints are considered for single system models, or techniques in which a Rao-Blackwellized particle filter is used to coordinate multiple models for tracking moving objects [9].

3 Particle Filters for Sequential State Estimation

A mobile robot can be modeled as a dynamic system under the influence of issued control commands u_t and received sensor measurements z_t. The temporal evolution of the system state x_t can be described recursively using the formalism of the so-called Bayes filter


Algorithm 1 Particle_filter(X_{t−1}, u_{t−1}, z_t)
1:  X̄_t = X_t = ∅
2:  for m = 1 to M do
3:      sample x_t^[m] ∼ p(x_t | u_{t−1}, x_{t−1}^[m])
4:      w_t^[m] = p(z_t | x_t^[m])
5:      X̄_t = X̄_t + ⟨x_t^[m], w_t^[m]⟩
6:  end for
7:  for m = 1 to M do
8:      draw particle i with probability ∝ w_t^[i]
9:      add x_t^[i] to X_t
10: end for
11: return X_t

$$p(x_t \mid z_{0:t}, u_{0:t-1}) = \int p(x_t \mid z_t, u_{t-1}, x_{t-1})\, p(x_{t-1} \mid z_{0:t-1}, u_{0:t-2})\, dx_{t-1} \qquad (1)$$

$$= \eta_t \underbrace{p(z_t \mid x_t)}_{\text{observation model}} \int \underbrace{p(x_t \mid u_{t-1}, x_{t-1})}_{\text{motion model}}\, \underbrace{p(x_{t-1} \mid z_{0:t-1}, u_{1:t-2})}_{\text{recursive term}}\, dx_{t-1}. \qquad (2)$$

The term η_t is a normalizing constant ensuring that the left-hand side sums up to one over all x_t. With Equation 1, we assume Markovian dependencies, namely that x_t only depends on the most recent measurement z_t and control command u_{t−1} given knowledge about the preceding state of the system x_{t−1}.

Particle filters are an implementation of the Bayes filter and can be used to efficiently estimate the posterior p(x_t) in a sequential manner. Here, the posterior is represented by a discrete set of weighted samples X = {⟨x_t^[m], w_t^[m]⟩}. With the sampled representation, the integral in Equation 2 simplifies to a finite sum over the samples resulting from the previous iteration. The motion model and the observation model can be applied directly to move and weight the individual samples respectively. Algorithm 1 formulates the standard particle filtering approach [14].

In Algorithm 1, the state of the system at time t is represented by a set X_t of state samples x_t^[m]. In Line 3, we perform a state prediction step using the external motion command u_{t−1} and the motion model p(x_t | u_{t−1}, x_{t−1}). Line 4 incorporates the current sensor measurement z_t by re-weighting the state samples according to the measurement model p(z_t | x_t^[m]). From Line 7 to 10, a resampling step is performed to concentrate the samples on high-probability regions of the state space. We refer to Thrun et al. [14] for details about the resampling step and its efficient implementation.

For the odometry-based model that treats the odometry measurements as controls, we have u_{t−1} = o_t and z_t = l_t, where o_t is the odometry measurement and l_t is a perception of the environment. For the dynamic model depicted in the right diagram of Figure 2, u_{t−1} is the actual control command and z_t = ⟨o_t, l_t⟩. Both models are described in more detail in Section 3.3.
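A minimal sketch of the loop in Algorithm 1 is shown below; the Gaussian motion and measurement models are placeholder assumptions standing in for the odometry-based model and the laser likelihood.

```python
import random, math

def motion_model_sample(x_prev, u):
    """Placeholder odometry-style prediction: move by u plus Gaussian noise."""
    return x_prev + u + random.gauss(0.0, 0.05)

def measurement_likelihood(z, x):
    """Placeholder observation model p(z | x): Gaussian around the state."""
    return math.exp(-0.5 * ((z - x) / 0.1) ** 2)

def particle_filter(particles, u, z):
    # Prediction and weighting (Lines 2-6 of Algorithm 1).
    predicted = [motion_model_sample(x, u) for x in particles]
    weights = [measurement_likelihood(z, x) for x in predicted]
    # Resampling proportional to the weights (Lines 7-10).
    return random.choices(predicted, weights=weights, k=len(particles))

particles = [0.0] * 100
for u, z in [(0.1, 0.1), (0.1, 0.22), (0.1, 0.31)]:
    particles = particle_filter(particles, u, z)
print(sum(particles) / len(particles))
```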


3.1 Process Model Hierarchy

A fundamental problem in science and engineering is choosing the right level of abstraction for a modeled system. While complex and high-dimensional models promise high estimation accuracy, models of lower complexity are often significantly more efficient and easier to construct. In the area of mobile robotics, the accuracy-efficiency trade-off is an important issue, since on the one hand, computational resources are strictly limited in online problems and, on the other hand, estimation errors have to be avoided to prevent serious malfunctioning. We therefore propose an online model selection algorithm with adaptive resource allocation based on the Bayesian framework.

An abstraction hierarchy for process models is given by the specific assumptions that the different models make about the world (compare Figure 2). We define the model abstraction hierarchy as an acyclic directed graph with the different system models M_i as nodes and their model assumptions A_{i→j} as edges, leading from the more general model i to a more restricted one j. A model assumption A_{i→j} is defined as a binary function on the state space of the unrestricted model M_i.

As an example, consider the process model for a mobile robot that should be able to continuously localize itself in a given map. A general model M_0 would include the pose of the robot x and additionally take physical factors like ground friction, tire pressure, load balance, motor characteristics, etc. into account and treat those as additional state variables in a state vector f. In most situations, however, it is quite common and reasonable to assume a simpler model M_1, where these additional variables are constant and do not need to be estimated during operation. Formally, the state space of M_1 is therefore {x, f | f = const}, which is a projection of the more general space {x, f} of model M_0. The assumption A_{0→1} would in this case be defined as

$$A_{0 \to 1}(x, f) := \begin{cases} \text{true} & : f = const \\ \text{false} & : f \neq const. \end{cases}$$

It is important to note that the validity of an assumption can only be tested in a less restricted state space, where this assumption is not made. In practice, this means that we have to test, for every edge in the model abstraction graph, the associated assumption using the more general model. As a measure for the validity of an assumption Ã at time t, we use the ratio

$$v_t(\tilde{A}) := \frac{\frac{1}{|A|} \sum_{x^{[m]} \in A} p(z_t \mid x^{[m]})}{\frac{1}{|\mathcal{X}|} \sum_{x^{[m]} \in \mathcal{X}} p(z_t \mid x^{[m]})}, \qquad (3)$$

where A is defined as the subset {x^[m] | Ã(x^[m])} of all particles X for which Ã is valid.
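The sketch below shows how this validity ratio could be computed from a weighted particle set; the particle representation and the assumption predicate are simplified assumptions for illustration.

```python
def assumption_validity(particles, likelihoods, assumption):
    """Estimate v_t(A~) of Eq. (3): mean observation likelihood of the
    particles satisfying the assumption, relative to the mean over all.

    particles   -- list of particle states (each holding, e.g., pose and f)
    likelihoods -- p(z_t | x^[m]) for each particle, same order
    assumption  -- predicate returning True if the particle satisfies A~
    """
    selected = [l for x, l in zip(particles, likelihoods) if assumption(x)]
    if not selected or not likelihoods:
        return 0.0
    mean_selected = sum(selected) / len(selected)
    mean_all = sum(likelihoods) / len(likelihoods)
    return mean_selected / mean_all if mean_all > 0 else 0.0

# Example: particles carry a hypothetical wheel-slip factor f; A_{0->1}
# assumes f stays (approximately) constant at 1.0.
particles = [{"pose": 0.0, "f": 1.0}, {"pose": 0.1, "f": 1.0}, {"pose": 0.0, "f": 0.2}]
likelihoods = [0.8, 0.7, 0.05]
print(assumption_validity(particles, likelihoods, lambda x: abs(x["f"] - 1.0) < 0.1))
```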


Algorithm 2 Mixed-Abstraction Particle filter
1: Calculate samples for the unrestricted model M_i until the assumption A_{i→j}(x̃_t) is valid for a minimal number of samples x̃_t
2: Build a first estimate of v_t(A_{i→j}) according to Equation 3
3: repeat
4:    if v_t(A_{i→j}) >= Θ then
5:        Calculate samples for M_j
6:    else
7:        Calculate samples for M_i
8:    end if
9: until either M_i received enough samples or (v_t(A_{i→j}) >= Θ and M_j received enough samples)

The quantity v_t(Ã) is based on the current approximation of the posterior distribution by the particle filter.

3.2 Adaptive Model Selection

To adaptively switch between alternative system models, the validity of the model assumptions has to be estimated online and computational resources have to be distributed among the appropriate models. The distribution of resources is done by increasing or decreasing the number of particles for the different models. To achieve this, we apply the following algorithm that takes as input the model abstraction hierarchy graph defined in the previous section.

When a new measurement z_t is obtained, the mixed-abstraction particle filter algorithm draws samples from the particle sets representing the current posteriors X^i_{t−1} for all system models i, incorporates the measurement, and builds new posterior distributions X^i_t. The key question in this update step is which model posterior X^i_t should receive how many samples. We base this decision on the estimated validity of the model assumptions A_{i→j}. If the estimated quantity v_t(A_{i→j}) drops below a predefined threshold Θ, we sample into the more complex model M_i and otherwise prefer the more efficient one M_j. This decision is made repeatedly on a per-particle basis until a model has received enough samples and all its assumptions are validated. In each iteration, we start with the most unrestricted model M_i and perform the following steps for each of its outgoing edges A_{i→j}. In the update steps mentioned above, the samples are taken from the previous posterior distribution X^j_{t−1} if assumption A_{i→j} was valid in the previous step, and from X^i_{t−1} otherwise. When all outgoing edges A_{i→j} of model M_i have been processed in the described manner, the same update is applied to models M_j further down in the hierarchy until either the leaf nodes have been processed or one of the assumptions did not receive enough evidence to justify further model simplifications.
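As a rough illustration (not taken from the paper), the validity ratio of Equation 3 and the threshold test of Algorithm 2 can be computed from the current particle set as follows; here assumption is a placeholder for the binary predicate Ã, obs_likelihoods holds the values p(z_t | x^[m]), and the threshold theta = 0.9 is an arbitrary example value:

```python
import numpy as np

def validity_ratio(particles, obs_likelihoods, assumption):
    """Estimate v_t(A~) of Equation 3: mean observation likelihood over the particles
    for which the assumption holds, relative to the mean over all particles."""
    holds = np.array([assumption(x) for x in particles], dtype=bool)
    if not holds.any():
        return 0.0  # no particle supports the restricted state space
    return obs_likelihoods[holds].mean() / obs_likelihoods.mean()

def prefer_simple_model(particles, obs_likelihoods, assumption, theta=0.9):
    """Decision rule of Algorithm 2 (Line 4): prefer the restricted model M_j as long
    as v_t(A_i->j) >= Theta, otherwise keep sampling into the general model M_i."""
    return validity_ratio(particles, obs_likelihoods, assumption) >= theta
```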


Several quantities, such as the number of samples necessary for each model (Line 9) or the validity threshold Θ (Line 4), have to be determined in an offline learning step. For the experimental results reported below, we optimized these values on a set of representative trajectories recorded from real and simulated robots.

To recapitulate, the mixed-abstraction particle filter estimates the system state by running several particle filters in parallel, each using a different system model. Samples are assigned by applying the following rule: for each pair of alternative system models, the simpler one is preferred as long as there is positive evidence for the validity of the corresponding model assumption.

3.3 Motion Models for Mobile Robots

The standard odometry-based motion model for a wheeled robot estimates the posterior p(x_t | x_{t−1}, o_t) about the current pose x_t based on the previous pose x_{t−1} and the wheel encoder measurement o_t. A popular approach [6] to represent the relative movement is to use three parameters: an initial rotation α, a translation d, and a second rotation β, as illustrated in Figure 3. Typically, one uses a Gaussian distribution for each of these parameters to model the noise.

Fig. 3. Parameters of the standard odometry-based motion model: an initial rotation α, a translation d, and a final rotation β (primed quantities refer to the measured pose).
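A small sketch of how a successor pose could be sampled under this three-parameter model is given below; the noise parameterization and the numerical noise scales are illustrative choices, not values used in the paper:

```python
import numpy as np

def sample_odometry_motion(pose, odom, noise=(0.05, 0.05, 0.01, 0.01)):
    """Draw x_t ~ p(x_t | o_t, x_{t-1}) for the alpha/d/beta odometry model of Fig. 3.

    pose  : (x, y, theta) previous pose
    odom  : (alpha, d, beta) relative motion reported by the wheel encoders
    noise : scale factors of the Gaussian noise on the rotations and the translation
    """
    alpha, d, beta = odom
    a1, a2, a3, a4 = noise
    # Perturb the three motion parameters with zero-mean Gaussian noise.
    alpha_n = alpha + np.random.normal(0.0, a1 * abs(alpha) + a2 * abs(d))
    d_n = d + np.random.normal(0.0, a3 * abs(d) + a4 * (abs(alpha) + abs(beta)))
    beta_n = beta + np.random.normal(0.0, a1 * abs(beta) + a2 * abs(d))
    # Apply the perturbed motion to the previous pose.
    x, y, theta = pose
    x += d_n * np.cos(theta + alpha_n)
    y += d_n * np.sin(theta + alpha_n)
    theta += alpha_n + beta_n
    return (x, y, theta)
```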

Under the influence of events like the collision with an obstacle or wheel slippage, the odometry-based model is not applicable anymore since the wheel encoder measurements do not provide useful information about the actual motion. To handle such situations, we construct an alternative model, termed dynamic motion model, that depends on the actual motion commands that were sent to the motors of the robot. This model includes the geometry of the robot and its physical attributes like mass and moment of inertia. For each wheel, we compute the influence of the velocity command on the translational and rotational energy of the robot. In this representation, we can directly incorporate the effects of collisions, slippage, and deflating tires. We then convert the energy state of the system to its speed and obtain a state prediction for the next time step. It is important to note that the dynamic motion model is not designed for the failure states only. Rather it is able to deal with normal conditions


as well. It is therefore considered as more general than the standard odometry-based model in our model abstraction hierarchy. The assumption placed on the system state by the odometry-based model is that there are no external influences like collisions, slippage, etc.

4 Experiments

4.1 Quantitative Evaluation Using a Simulator

To quantitatively evaluate our system, we performed several simulation experiments and compared the results to the known ground truth. We used the high-fidelity simulator Gazebo [7], in which physics and motion dynamics are simulated accurately using the Open Dynamics Engine [13]. In several practical experiments carried out with real robots, we found the Gazebo simulator to be well suited for studying the motion dynamics of robots even under highly dynamic events like collisions. For example, we did not have to change any system parameters when we ported our system from the simulator to the real robot.

To demonstrate how the proposed algorithm coordinates multiple particle filters that have been designed independently, we confronted the system with two different faults within one scenario. A simulated ActivMedia Pioneer 2DX robot (see the left image of Figure 1) was placed in the corridor of a 3D office environment. We manually steered the robot through this environment. On its path, it encountered a collision with an undetectable object. After that, its left tire started to deflate. Four filters were used independently to track the state of the system. The first filter was based on the standard odometry-based motion model described in Section 3.3. The second filter used the dynamic motion model described in the previous section and also included a model for collision faults. The third model was also based on the dynamic motion model but was capable of dealing with deflating tires. Finally, the fourth filter was the proposed mixed-abstraction filter that combined all of the filters described above.

Figure 4 depicts the true trajectory as well as the tracking results obtained with the individual filters overlaid on a map built from laser measurements in a real building. The three arrows in the diagram mark the following three events: a collision with an undetected glass door (1), a point where the robot stopped backing off and turned around (2), and a point where the left tire of the robot started losing air (3). As can be seen from the figure, the filter that is able to deal with deflating tires diverges immediately after the collision at point 1. Since the filter able to deal with collisions cannot deal with deflating tires, it diverges at point 3. The odometry-based model keeps track of the robot despite the collision at point 1; however, it is not aware of any fault at this point. The combined filter, in contrast, succeeds in tracking the robot despite both incidents.


The middle and lower images of Figure 4 plot the internal states of the specialized detectors within the mixed-abstraction filter. These values reflect the belief of the system about the presence of certain faults. The middle image plots the relative number of particles in the fault mode of the collision detector over time. As can be seen, this number rises significantly shortly after the collision as well as after the full stop at point 2. After the robot had been intentionally stopped there, the system cannot know whether an obstacle is in its way or not. The lower image of Figure 4 shows the evolution of the relative number of particles in the fault mode of the deflation detector. Since the collision at point 1 has been handled by the collision detection within the mixed-abstraction filter, this filter does not switch to a failure mode until point 3. At that point, the filter switches into its failure mode and in this way enables the mixed-abstraction filter to keep track of the pose of the robot.

4.2 Analyzing the Gain in Efficiency

In this experiment, we quantitatively evaluate the gain in efficiency that we achieve by dynamically distributing samples between the individual filters. In the modeled scenario, a simulated robot follows a trajectory and collides twice with an obstacle, which is too small to be detected by its laser range finder. After the collisions, the robot backs off and continues its task. The top diagram of Figure 5 shows the trajectory of the vehicle and the locations where the algorithm correctly detected a collision. The other two diagrams illustrate the failure detection process of the same simulation run. The bar at the bottom indicates the true time stamps of the faults. Whereas the plot in the second row depicts the relative likelihood for a collision as defined in Equation 3, the curve plotted in the third row gives the times needed for the individual iterations of the particle filter.

Table 1 gives the results of a comparison of our adaptive model-switching approach to three other implementations, where only single models were used for state estimation and fault detection. The results are averaged over the full trajectories of 100 runs per implementation. The implementations considered here are realized on the basis of two models. Whereas model M0 is the complex model that considers the actual motion commands and therefore is able to track the position as well as the failure state, model M1 is the standard odometry-based system model, which is able to track the position of the robot reliably with low time requirements, but cannot detect the collisions. The first implementation is based on model M1 with 20 particles, the second is model M1 with 200 particles. While the third implementation is based on model M0 with 300 particles, the fourth one is the mixed-abstraction particle filter combining implementations one and three. The common task of all implementations was to track the position of the robot along a trajectory on which the robot encountered two undetected collisions after 8.2 and 28.9 seconds. A typical estimate of the trajectory generated by the mixed-abstraction filter, including markings for detected collisions, is depicted in the left diagram of


Fig. 4. Results of an experiment with two fault events (top). The collision of the robot with a glass door is marked with the arrow "1", the point where it stopped backing off and turned is marked with "2", and at point "3" the left tire of the robot started losing air. The diagram in the middle plots the relative number of the particles in the fault mode of the collision detector. The lower diagram shows the corresponding number for the deflation detector.

Figure 5. The lower right image of Figure 5 plots the value of v_t(A) over time. As can be seen from the figure, the evidence for a fault substantially increases at the time of the incidents. The upper right image in the same figure plots the CPU time used by the mixed-abstraction filter. It nicely shows that the computation time is only high when the evidence of a failure has increased. Table 1 shows average values obtained from the 100 test runs for each implementation. Whereas model M1 was never able to detect a failure, M0

Fig. 5. A trajectory followed by a simulated robot (first row) with marks at positions where the evidence for a collision was high. The plot in the second row depicts the relative likelihood for collisions. The plot also shows the ground truth (the bar at the bottom). The last plot shows the time needed for each iteration (third row).

as well as our adaptive switching algorithm detected all failures equally well. However, our adaptive model required substantially less computation time compared to M0 alone using 300 particles.

4.3 Evaluation on a Real Robot

We also tested our system on a real ActivMedia Pioneer 2DX robot in an office environment. The right image of Figure 1 depicts the experimental setup. Three positions of the robot were manually cut from a recorded video and overlaid on one image to illustrate the process. The robot planned a path to the neighboring room on the right-hand side of the corridor. While executing the planned trajectory, the robot could not detect the glass door that


Table 1. Results of a series of simulation runs using different system models for state estimation. The results are averaged over the complete trajectories of 100 runs per model.

System model                                 | Failure detection rate | Avg. time per iteration | Avg. position error | Avg. orientation error
M1: standard odometry, 20 particles          | 0 %                    | 0.67 ms                 | 0.13 m              | 3.6°
M1: standard odometry, 200 particles         | 0 %                    | 5.83 ms                 | 0.13 m              | 3.4°
M0: dynamic, 300 particles                   | 100 %                  | 10.10 ms                | 0.11 m              | 5.8°
adaptive switching (M1: 20, M0: 300 particles) | 100 %                | 3.42 ms                 | 0.12 m              | 3.9°

blocked its path and thus collided with the wooden part of the door. In this situation, the standard odometry-based system model used for localization does not indicate a defect, because the wheel encoders report no motion, which is perfectly consistent with the laser measurements. The left diagram of Figure 6 gives the evolution of the observation likelihood for the standard model, which stays nearly constant. In contrast to this, the proposed mixed-abstraction particle filter detects that the model assumption of the standard model is no longer valid after the collision with the door and switches to the more complex system model. The right diagram of Figure 6 visualizes this process. For the sake of clarity, we plotted the estimated validity of the negated model assumption, which can be interpreted as the evidence against the assumption that no collision has occurred. The upper curve corresponds to the time needed per iteration. Note that the required computational resources only slightly exceed those of the standard odometry-based model (see the left diagram for a comparison). Only in the failure case does the runtime increase significantly, since the more complex model requires substantially more particles. Note also that the runtime goes back to the original value after the robot has backed off and the model assumption of the simplified model is valid again.

5 Conclusion

This paper presents an efficient approach to estimate the state of a dynamic system including its failures. Complex models with high computational requirements are needed in order to detect and track unusual behaviors. We therefore proposed a mixed-abstraction particle filter which distributes the computational resources in a way that failure states can be detected and tracked, but which at the same time allows for an efficient estimation process in case the system runs free of faults.

Fig. 6. In an experiment with a real robot, the observation likelihood of the standard system model (lower left) does not indicate the collision with a glass door at time t = 22 seconds. The mixed-abstraction particle filter (right) detects the fault without needing substantially more computational resources in the fault-free case (upper diagrams).

To achieve this, we apply a process model hierarchy which allows us to model assumptions that hold for the fault-free case but not in general.

In several experiments carried out in simulation and with real robots, we demonstrated that our technique is well suited to track dynamic systems affected by errors. Our approach allows us to accurately track different failure states and at the same time is only marginally slower in case the system is running free of faults. We believe that our approach is not limited to the failure detection problem and can also be advantageous for various state estimation tasks in which different system models have to be used to correctly predict the behavior of the system under varying conditions.

Acknowledgments

This work has partly been supported by the EC under contract number FP6-004250-CoSy, by the German Science Foundation (DFG) under contract number SFB/TR-8 (A3) and the German Federal Ministry of Education and Research (BMBF) under contract number 01IMEO1F.

References

1. E. Benazera, R. Dearden, and S. Narasimhan. Combining particle filters and consistency-based. In 15th International Workshop on Principles of Diagnosis, Carcassonne, France, 2004.
2. N. de Freitas, R. Dearden, F. Hutter, R. Morales-Menendez, J. Mutch, and D. Poole. Diagnosis by a waiter and a Mars explorer. In Invited paper for Proceedings of the IEEE, special issue on sequential state estimation, 2003.


3. R. Dearden and D. Clancy. Particle filters for real-time fault detection in planetary rovers. In Proceedings of the Thirteenth International Workshop on Principles of Diagnosis, pages 1–6, 2002.
4. F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), Leuven, Belgium, 1998.
5. J.N. Driessen and Y. Boers. An efficient particle filter for nonlinear jump Markov systems. In IEEE Sem. Target Tracking: Algorithms and Applications, Sussex, UK, 2004.
6. J.-S. Gutmann, W. Burgard, D. Fox, and K. Konolige. An experimental comparison of localization methods. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1998.
7. N. Koenig and A. Howard. Design and use paradigms for Gazebo, an open-source multi-robot simulator. Technical report, USC Center for Robotics and Embedded Systems, CRES-04-002, 2004.
8. M. Krysander. Design and Analysis of Diagnostic Systems Utilizing Structural Methods. PhD thesis, Linköping University, Sweden, 2003.
9. C. Kwok and D. Fox. Map-based multiple model tracking of a moving object. In RoboCup 2004: Robot Soccer World Cup VIII, pages 18–33, 2004.
10. C. Kwok, D. Fox, and M. Meila. Real-time particle filters. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15 (NIPS), pages 1057–1064. MIT Press, 2002.
11. N. Leveson. Safeware: System Safety and Computers. Addison-Wesley, Reading, Mass., 1995.
12. B. Ng, A. Pfeffer, and R. Dearden. Continuous time particle filtering. In Proceedings of the 19th IJCAI, Edinburgh, 2005.
13. R. Smith. Open Dynamics Engine. http://www.q12.org/ode/ode.html, 2002.
14. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.
15. V. Verma, S. Thrun, and R. Simmons. Variable resolution particle filter. In Georg Gottlob and Toby Walsh, editors, IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003, pages 976–984. Morgan Kaufmann, 2003.
16. R. Washington. On-board real-time state and fault identification for rovers. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 1175–1181, 2000.
17. B.C. Williams and P.P. Nayak. A model-based approach to reactive self-configuring systems. In Jack Minker, editor, Workshop on Logic-Based Artificial Intelligence, Washington, DC, College Park, Maryland, 1999. University of Maryland.

Optimization and Fail–Safety Analysis of Antagonistic Actuation for pHRI Gianluca Boccadamo, Riccardo Schiavi, Soumen Sen, Giovanni Tonietti, and Antonio Bicchi Interdepartmental Research Centre “E. Piaggio” via Diotisalvi 2, Faculty of Engineering University of Pisa, Italy @ing.unipi.it Summary. In this paper we consider some questions in the design of actuators for physical Human-Robot Interaction (pHRI) under strict safety requirements in all circumstances, including unexpected impacts and HW/SW failures. We present the design and optimization of agonistic-antagonistic actuation systems realizing the concept of variable impedance actuation (VIA). With respect to previous results in the literature, in this paper we consider a realistic physical model of antagonistic systems, and include the analysis of the effects of cross–coupling between actuators. We show that antagonistic systems compare well with other possible approaches in terms of the achievable performance while guaranteeing limited risks of impacts. Antagonistic actuation systems however are more complex in both hardware and software than other schemes. Issues are therefore raised, as to fault tolerance and fail safety of different actuation schemes. In this paper, we analyze these issues and show that the antagonistic implementation of the VIA concept fares very well under these regards also.

1 Introduction

One of the goals of contemporary robotics research is to realize systems which operate with delicacy in environments they share with humans, ensuring their safety despite any adverse circumstance [4]. These may include unexpected impacts, faults of the mechanical structure, sensors, or actuators, or crashes and malfunctions of the control software [6, 7, 1, 5]. A recent trend in robotics is to design intrinsically safe robot arms by introducing compliance at their joints. The basic idea of this approach is that compliant elements interposed between motors and moving links help prevent the (heavy) reflected inertia of actuators from contributing to damage in case of impacts. Introducing compliance, on the other hand, tends to reduce performance of the arm. Some approaches in the direction of minimizing the performance



loss while guaranteeing safety in case of impacts have been presented in the recent literature (see e.g. [2]). Among these, a method was proposed in [2] consisting in varying the compliance of the joint transmission mechanism while moving the arm. This so-called Variable Stiffness Transmission (VST) technique, and its generalization in the Variable Impedance Actuation (VIA) concept, have been shown to be capable in theory of delivering better performance than purely passive compliance and other techniques.

In its formulation, however, the VIA concept in [2] used a rather abstract model of actuator and transmission, whereby the impedance could be directly controlled to desired values in negligible time. In this paper, we consider a more realistic model of an actuation system implementing the idea, which is based on the use of two actuators and nonlinear elastic elements in antagonistic arrangement. The antagonistic solution has several advantages, and has been used in many robotic devices before (see e.g. [3, 11, 14, 13, 8]), in some cases because of biomorphic inspiration. However, to the best of our knowledge the introduction of nonlinear springs to achieve variable stiffness in real time (i.e., during different phases of each motion act) was not a motivation for earlier work with the purpose of guaranteeing safety.

In this paper, we consider the implementation of the VIA concept by means of antagonistic actuation, discuss the role of cross-coupling between antagonist actuators, and apply optimization methods to choose parameters which are crucial in its design. We show that antagonistic systems can implement the VIA concept effectively, and that their performance compares well with other possible approaches.

Antagonistic actuation systems, however, are more complex in both hardware and software than other schemes. Issues are therefore raised as to whether safety is guaranteed under different possible failure modes. In the paper, we also analyze these issues and show that the antagonistic implementation of the VIA concept fares very well in these respects as well.

2 Antagonistic Actuation as a VIA System

In [2] it was shown that an ideal VIA mechanism (depicted in fig. 1-a) can effectively recover performance of mechanisms designed to guarantee safety of humans in case of impact. The basic idea is that a VIA mechanism can be controlled according to a stiff-and-slow/fast-and-soft paradigm: namely, to be rather stiff in the initial and final phases of motion, when accuracy is needed and velocity is low, while choosing higher compliance in the intermediate, high-velocity phase, where accuracy is typically not important. Low stiffness implies that the inertia of the rotors does not immediately reflect on the link in case of impacts, thus allowing smoother and less damaging impacts. Such arguments were supported in [2] by a detailed mechanism/control co-design optimization analysis, based on the solution of the so-called safe brachistochrone


Fig. 1. The concept of Variable Impedance Actuation (a) and a possible implementation by means of antagonistic actuators (b). Effective rotor inertias are coupled to the link inertia through nonlinear springs.


Fig. 2. An experimental implementation of an antagonistic VIA actuator (a) and its conceptual scheme (b).

problem, i.e. a minimum time control problem with constraints on the maximum acceptable safety risk at impacts. The model considered in fig. 1-a, however, assumes direct variation of the impedance, which is not physically realizable. A possible implementation of the concept via an antagonistic mechanism is depicted in fig. 1-b. Practical implementations of antagonistic VIA systems may assume more general configurations than the one in fig. 1-b. For instance, in the prototype of an antagonistic VIA system depicted in fig. 2-a ([10]), the two actuators act through a nonlinear elastic element on the link, but they are also connected to a third elastic element cross-coupling the actuators. The questions we consider in this section are the following: is the stiff-and-slow/fast-and-soft control paradigm still valid, and are the good safety and performance properties of the ideal VIA device of fig. 1-a retained by an antagonistic implementation as in fig. 1-b? What is the role of cross-coupling elastic elements as in fig. 2-b in antagonistic VIA actuators? To answer these questions, we use again the analysis of solutions to the safe brachistochrone problem, which consists in finding the optimal motor torques τ_1, τ_2 which drive the link position x_mov between two given configurations in minimal time, subject to the mechanism's dynamics, motor torque limits, and safety constraints. This problem is formalized for the antagonistic mechanism of fig. 1-b as

  min_τ ∫_0^T 1 dt
  subject to
    M_rot1 ẍ_rot1 + φ_1(x_rot1, x_mov) = τ_1
    M_rot2 ẍ_rot2 − φ_2(x_rot2, x_mov) = τ_2
    M_mov ẍ_link − φ_2(x_rot2, x_mov) − φ_1(x_rot1, x_mov) = 0
    |τ_1| ≤ U_1,max,   |τ_2| ≤ U_2,max
    HIC(ẋ_mov, ẋ_rot1, ẋ_rot2, φ_1, φ_2) ≤ HIC_max,                                (1)

where M_mov, M_rot1, M_rot2 are the inertias of the link and the rotors (effective, i.e. multiplied by the squared gear ratio); U_i,max, i = 1, 2 is the maximum torque for motor i; and φ_i, i = 1, 2 represent the impedance of the deformable elements as functions of the positions of the rotors and the link. A polynomial nonlinear stiffness model is used, whereby the applied force as a function of end-point displacement is

  φ_i(x_j, x_k) = K_1 (x_j − x_k) + K_2 (x_j − x_k)^3.                               (2)
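To illustrate how co-contraction of the two nonlinear springs raises the joint stiffness under model (2), consider the following sketch (the coefficients K1 and K2 are arbitrary illustrative values, not the parameters identified for the prototype of fig. 2-a):

```python
K1, K2 = 1.0, 50.0  # illustrative spring coefficients [Nm/rad, Nm/rad^3]

def spring_torque(delta):
    """Torque of one nonlinear spring for a deflection delta, Equation (2)."""
    return K1 * delta + K2 * delta ** 3

def joint_stiffness(delta1, delta2):
    """Stiffness seen at the link: sum of the local spring stiffnesses d(phi_i)/d(delta_i)."""
    return (K1 + 3 * K2 * delta1 ** 2) + (K1 + 3 * K2 * delta2 ** 2)

# Symmetric co-contraction by +/- delta leaves the net torque on the link unchanged
# (spring_torque(delta) + spring_torque(-delta) == 0) but raises the joint stiffness.
for delta in (0.0, 0.1, 0.2):
    print(delta, joint_stiffness(delta, -delta))
```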

This model has been found to fit well the experimental data for the device in fig. 2-a. The safety constraint HIC(ẋ_mov, ẋ_rot1, ẋ_rot2, φ_1, φ_2) ≤ HIC_max describes the fact that the Head Injury Coefficient of a hypothetical impact at any instant during motion should be limited. The HIC is an empirical measure of biological damage used in the car crash analysis literature ([12]), and depends on the velocity of the impacting mass, its inertia, and the effective inertia of the rotors reflected through the transmission stiffness. The HIC function is rather complex, and can only be evaluated numerically for non-trivial cases. However, based on simulation studies, a conservative approximation of the HIC function for the antagonistic mechanisms was obtained which allows rewriting the safety constraint in the simpler form |ẋ_mov| ≤ v_safe(φ_1, φ_2, HIC_max) (cf. [2]).

Different solutions of problem (1) have been obtained numerically, setting the parameters to realistic values: M_mov = 0.1 kg m², M_rot1 = M_rot2 = 0.6 kg m², HIC_max = 100 m^2.5/s^4, U_1,max = U_2,max = 7.5 Nm. A first interesting set of results is reported in fig. 3. The optimal profiles of link velocity and joint stiffness are reported for the case where both the initial and final configurations are required to be stiff (σ_0 = σ_f = 16 Nm/rad, plots a and b) and when both are compliant (σ_0 = σ_f = 0.2 Nm/rad, plots c and d). Notice in fig. 3 that the stiff-and-slow/fast-and-soft paradigm applies also to antagonistic actuation approaches. The minimum time necessary in the two cases is 2.4 sec and 2.65 sec, respectively.

This level of performance should be compared with what can be achieved by a simpler actuation system, consisting of a single actuator connected to the link through a linear elastic element (this arrangement is sometimes referred to as SEA, Series Elastic Actuation [9]). A SEA system with a motor capable of torque U_max = 2U_1,max = 15 Nm and inertia M_rot = 2M_rot1 = 1.2 kg m², with the linear elasticity coefficient matched exactly with the required stiffness in the two cases σ = 16 and σ = 0.2, would



Fig. 3. Optimization results for antagonistic actuation without cross-coupling in a pick-and-place task. A stiff-to-stiff task is shown in a, b, while c, d, refer to a soft-to-soft task.

reach the desired configuration in 3.15 sec and 3.6 sec, respectively. Moreover, it should be pointed out that the SEA system cannot change its stiffness without modifying the mechanical hardware.

Focusing again on the antagonistic VIA system's results, it is also interesting to notice that, in the likely case that the task requires the manipulator to be stiff at the initial and final configurations (as would happen e.g. in a precision pick-and-place task), the actuators are required to use a significant portion of their maximum torque just to set such stiffness, by co-contracting the elastic elements. However, it is also in the initial and final phases that torque should be made available for achieving the fastest acceleration of the link. Based on this observation, it can be conjectured that some level of preloading of the nonlinear elastic elements in an antagonistic VIA system could be beneficial to performance. Elastic cross-coupling between actuators (fig. 2-b) can have a positive effect in that it can bias the link stiffness at rest, so that more torque is available in the slow phases, while torque is used for softening the link in fast motion. On the other hand, it is intuitive that very stiff cross-coupling elements would drastically reduce the capability of the mechanism to vary the link stiffness, thus imposing low velocities for safety and ultimately a performance loss.

It is therefore interesting to study the effect of cross-coupling, to determine if there is an intermediate value of stiffness which enhances performance with respect to the limit cases of fig. 1 (no cross-coupling) and constant stiffness (rigid cross-coupling). For this purpose, we study a modified formulation of the safe brachistochrone problem, namely


Fig. 4. a) Minimum time to reach the target configuration vs. cross-coupling stiffness K3 (L0 = 1). Solid: σ0 = σf = 16 Nm/rad; dashed: σ0 = σf = 0.2 Nm/rad. Optimization results for antagonistic actuation with cross-coupling in a pick-and-place, stiff-to-stiff task (solid) and soft-to-soft task (dashed): velocity (b) and joint stiffness (c).

  min_τ ∫_0^T 1 dt
  subject to
    M_rot1 ẍ_rot1 + φ_1(x_rot1, x_mov) − φ_3(x_rot1, x_rot2) = τ_1
    M_rot2 ẍ_rot2 + φ_2(x_rot2, x_mov) + φ_3(x_rot1, x_rot2) = τ_2
    M_mov ẍ_link + φ_2(x_rot2, x_mov) − φ_1(x_rot1, x_mov) = 0
    |τ_1| ≤ U_1,max,   |τ_2| ≤ U_2,max
    HIC(ẋ_mov, ẋ_rot1, ẋ_rot2, φ_1, φ_2) ≤ HIC_max,                                (3)

where φ_3(x_rot1, x_rot2) indicates the cross-coupling elasticity. In particular, we study how the optimal solutions vary in different instances of the problem with increasing cross-coupling stiffness. A linear stiffness model is assumed for the cross-coupling, with φ_3(x_rot1, x_rot2) = K_3 (L_0 + x_rot2 − x_rot1), where L_0 denotes the preload offset. In fig. 4 the numerical solutions obtained for problem (3) are reported, for the two stiff-to-stiff and soft-to-soft tasks already considered in fig. 3. It can be observed from fig. 4-a that an optimal value of the cross-coupling stiffness K_3 exists in both cases, but the two values do not coincide. Setting for instance K_3 = 0.04 Nm/rad, the optimal value for the stiff-to-stiff task, we obtain the optimal link velocity and stiffness plots reported in fig. 4-b and -c, and the optimal times 2.35 sec and 2.65 sec, respectively. Optimizing the design for one task can hence decrease performance in others.

3 Fail Safety

In this section we analyze how antagonistic VIA mechanisms behave under failure of some of their components, and compare their ability to remain safe in spite of failures with that of other possible actuation structures. A large variety of possible failure modes should be considered in order to assess the fail safety of a robot system. In our case, we focus on the following


possible events: mechanical failures, consisting in the breakage of one or more of the elastic elements, leading to K → 0, and HW/SW control failures, whereby one or more of the actuators delivers an uncontrolled torque. In the latter case, we consider that the torque can either remain "stuck" at some value, default to zero, or default to ±U_max. An extensive simulation campaign has been undertaken to study the effective HIC at impacts of an antagonistic VIA mechanism (fig. 1-b), evaluating all possible failure mode combinations. To be conservative, the HIC is evaluated assuming an impact occurs after some time from the instant of the fault event, during which time the mechanism could increase its momentum. The diagram in fig. 5 reports the worst-case HIC during the execution of a typical pick-and-place task, under a nominal optimal control. Labels in different regions indicate which failure mode determines the worst-case HIC of impacts. It can be noticed from fig. 5 that the worst-case impact has a maximum


Fig. 5. HIC values under different failure modes for VIA.

HIC value around 100. This has been obtained by suitably choosing the mechanism parameters, in particular by setting U_1,max = U_2,max = 1.5 Nm. The performance under this design choice is around 3 sec. Should different values of worst-case HIC be specified, the antagonistic VIA mechanism could be designed with different parameters to accommodate the specifications.

It is interesting to compare these results with other existing actuation systems for pHRI. Namely, we will make reference to SEA and to the so-called Distributed Macro-Mini (DM2) actuation scheme ([15]). The latter basically consists of a SEA with a small-inertia, low-torque, high-bandwidth actuator connected directly to the link inertia to actively damp oscillations. In order to obtain a meaningful comparison, we consider optimized SEA and DM2 implementations, with equal rotor and link inertia, which obtain


a similar performance of 3 sec in the same task. Fig. 6 describes the worst-case HIC values obtained for SEA and DM2, respectively. Failure modes have been considered which are analogous to those described above, as are all other parameters in the simulations. It can be observed that, in some modes (notably for control torques defaulting to their maximum value, τ → U_max), both SEA and DM2 are not as safe as VIA (HIC ≫ 100). The DM2 scheme exhibits slightly better fail-safety characteristics than SEA. An explanation of the apparently superior fail-safety characteristics of antagonistic VIA actuation is that such a scheme achieves comparable nominal performance by employing two motors, each of much smaller size than what is necessary in the SEA and DM2 schemes.


Fig. 6. Maximum of the HIC value in various failure cases for SEA (top) and DM2 (bottom).

4 Conclusion

We discussed the design of variable stiffness actuation systems based on antagonistic arrangements of nonlinear elastic elements and motors. The basic stiff-and-slow/fast-and-soft idea of the VIA approach was shown to apply and to be effective for realistic models of antagonistic actuation for a single joint. However, a VIA mechanism assembly can sometimes become too heavy for practical implementation in a serial-link robot. It may then be advantageous to mount the actuators at the base instead of at the respective joints, and to transfer motion to the joints via transmissions, where the transmission element itself


can be designed to have nonlinear stiffness, for example in a tendon-driven system.

In order for the system to guarantee safety of operations in the proximity of humans, its behaviour must remain safe in conditions where functionality is degraded by possible failures. A thorough examination of possible HW and SW failures of the system has been carried out, and the results have been compared with those of other possible solutions, with favourable results.

Acknowledgments

Work partially done with the support of the EU Network of Excellence EURON - European Robotic Network, under the Prospective Research Project PHRIDOM.

References

1. A. Albu-Schaffer, A. Bicchi, G. Boccadamo, R. Chatila, A. De Luca, A. De Santis, G. Giralt, G. Hirzinger, V. Lippiello, R. Mattone, R. Schiavi, B. Siciliano, G. Tonietti, and L. Villani. Physical human-robot interaction in anthropic domains: Safety and dependability. 4th IARP/IEEE-EURON Workshop on Technical Challenges for Dependable Robots in Human Environments, 2005.
2. A. Bicchi and G. Tonietti. Fast and soft arm tactics: Dealing with the safety-performance tradeoff in robot arms design and control. IEEE Robotics and Automation Magazine, Vol. 11, No. 2, June 2004.
3. D.G. Caldwell, G.A. Medrano-Cerda, and M.J. Goodwin. Braided pneumatic actuator control of a multi-jointed manipulator. In Systems Engineering in Service of Humans, Proceedings International Conference on, IEEE Systems Man and Cybernetics, volume 1, pages 423–428, 1993.
4. G. Giralt and P. Corke, editors. Technical Challenge for Dependable Robots in Human Environments, Seoul, Korea, 2001. IARP/IEEE Workshop.
5. J. Guiochet and A. Vilchis. Safety analysis of a medical robot for tele-echography. In Second IARP IEEE/RAS joint workshop on Technical Challenge for Dependable Robots in Human Environments, Toulouse, France, October 2002. LAAS-CNRS.
6. J. Heinzmann and A. Zelinsky. The safe control of human-friendly robots. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 1020–1025, 1999.
7. K. Ikuta, H. Ishii, and M. Nokata. Safety evaluation method of design and control for human-care robots. The International Journal of Robotics Research, 22(5):281–297, May 2003.
8. I. Mizuuchi, R. Tajima, T. Yoshikai, D. Sato, K. Nagashima, M. Inaba, Y. Kuniyoshi, and H. Inoue. Design and control of the flexible spine of a fully tendon-driven humanoid "Kenta". In Proceedings of the 2002 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, EPFL, Lausanne, Switzerland, pages 2527–2532, 2002.


9. G.A. Pratt and M. Williamson. Series elastic actuators. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 399–406, 1995.
10. G. Tonietti, R. Schiavi, and A. Bicchi. Design and control of a variable stiffness actuator for safe and fast physical human/robot interaction. In Proc. ICRA, pages 528–533, 2005.
11. R.W. van der Linde. Design, analysis, and control of a low power joint for walking robots, by phasic activation of McKibben muscles. IEEE Transactions on Robotics and Automation, 15(4):599–604, August 1999.
12. J. Versace. A review of the severity index. In Proc. of the Fifteenth Stapp Car Crash Conference, number SAE Paper No. 710881, pages 771–796. Society of Automotive Engineers, 1971.
13. J. Yamaguchi, S. Inoue, D. Nishino, and A. Takanishi. Development of a bipedal humanoid robot having antagonistic driven joints and three DOF trunk. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Victoria, B.C., Canada, pages 96–101, 1998.
14. J. Yamaguchi, D. Nishino, and A. Takanishi. Realization of dynamic biped walking varying joint stiffness using antagonistic driven joints. In Proceedings of the IEEE International Conference on Robotics and Automation, Leuven, Belgium, pages 2022–2029, 1998.
15. M. Zinn, O. Khatib, B. Roth, and J.K. Salisbury. A new actuation approach for human friendly robot design. In Proc. of Int. Symp. on Experimental Robotics - ISER'02, 2002.

Bilateral Control of Different Order Teleoperators

José M. Azorín¹, Rafael Aracil², José M. Sabater¹, Manuel Ferre², Nicolás M. García¹ and Carlos Pérez¹

¹ Universidad Miguel Hernández de Elche, Avda. de la Universidad s/n, 03202 Elche (Alicante), Spain, [email protected]
² DISAM, ETSII, Universidad Politécnica de Madrid, 28006 Madrid, Spain, [email protected]

This paper presents a bilateral control method for teleoperation systems where the master and the slave are modeled by different order transfer functions. The proposed methodology represents the teleoperation system in the state space and is based on state convergence between the master and the slave. The method ensures that the slave follows the master, and it is able to establish the dynamic behavior of the teleoperation system. The first results obtained by applying the method to a commercial teleoperation system in which the master and the slave are modeled by different order discrete transfer functions are also shown in this paper.

1 Introduction

Often, in a teleoperated system, the interaction force of the slave with the environment is reflected to the operator to improve the task performance. In this case, the teleoperator is bilaterally controlled [1]. The classical bilateral control architectures are the position–position [2] and force–position [3] architectures. Additional control schemes have been proposed in the literature, e.g. the bilateral control for ideal kinesthetic coupling [4] or the bilateral control based on passivity to overcome the time delay problem [5]. Usually the proposed bilateral control schemes consider simple master and slave models of the same order. They do not provide a design procedure to calculate the control system gains when the master and the slave are modeled by different order transfer functions.

In [6] we presented a bilateral control method for teleoperation systems with communication time delay. In this method, the teleoperation system is modeled in the continuous-time state space from nth-order linear differential equations that represent the master and the slave. Through the state convergence between the master and the slave, the control method achieves that



the slave follows the master, and it allows the desired dynamics of the teleoperation system to be established. However, this control method considers only teleoperation systems where the master and the slave are modeled by differential equations of the same order.

This paper explains how the state convergence methodology can be used to control teleoperation systems where the master and the slave are modeled by different order discrete transfer functions. In addition, unlike [6], the teleoperation system is modeled in the discrete-time domain, and the communication time delay is not considered in order to simplify the explanation. Clearly, the results presented in this paper could be directly applied to continuous-time teleoperation systems with communication time delay.

The paper is organized as follows. Section 2 describes the bilateral control methodology of teleoperation systems based on state convergence. The application of this methodology to control different order teleoperation systems is explained in Section 3. Section 4 shows the results obtained when a master and a slave with different order transfer functions have been controlled using the method proposed in the paper. Finally, Section 5 summarizes the conclusions of this paper.

2 Bilateral Control by State Convergence

This section describes the bilateral control method of teleoperation systems based on state convergence [6]. However, unlike [6], a teleoperation system without communication time delay, where the master and the slave are modeled by nth-order discrete transfer functions, is considered. The control method is based on the state space formulation and it achieves that the slave follows the master through state convergence. The method is also able to establish the desired dynamics of this convergence and the dynamics of the slave manipulator.

2.1 Modeling of the Teleoperation System

The modeling of the teleoperation system is based on the state space formulation, Fig. 1. The master system is represented in the state space as:

  x_m(k + 1) = G_m x_m(k) + H_m u_m(k)
  y_m(k) = C_m x_m(k)                                                               (1)

and the slave system is represented similarly. The matrices that appear in the model are: G_2, the influence of the operator force on the slave; K_m and K_s, the feedback matrices of the master and slave states; and R_m and R_s, the slave-to-master and master-to-slave interaction matrices, respectively.



Fig. 1. Modeling of the teleoperation system

From the model shown in Fig. 1, the following state equation of the teleoperation system is obtained:

  [ x_s(k+1) ]   [ G_s + H_s K_s    H_s R_s        ] [ x_s(k) ]   [ H_s G_2 ]
  [ x_m(k+1) ] = [ H_m R_m          G_m + H_m K_m  ] [ x_m(k) ] + [ H_m     ] F_m(k)        (2)

where F_m represents the force that the operator applies to the master.

The master and the slave are modeled by nth-order discrete transfer functions. The representation of the master is given by:

  G_m(z) = (b_m,n−1 z^(n−1) + · · · + b_m,1 z + b_m,0) / (z^n + a_m,n−1 z^(n−1) + · · · + a_m,1 z + a_m,0)        (3)

and the representation of the slave is analogous. The master and the slave are modeled in the state space using the controller canonical form. Considering this master and slave representation, the matrices K_m, K_s, R_m and R_s are row vectors of n components, and G_2 is a real number. If the environment is modeled by means of a stiffness (k_e), the matrix K_s can be used to incorporate into the teleoperation system model the interaction


of the slave with the environment. In the same way, the matrix R_m can be used to consider force feedback from the slave to the master, assuming a force feedback gain k_f.

2.2 Control Method Through State Convergence

In the teleoperation system model shown in Fig. 1, there are 3n + 1 control gains that have to be determined: K_m (n components), K_s (n components), R_s (n components) and G_2 (one component). To calculate these control gains it is necessary to obtain 3n + 1 design equations.

Applying the following linear transformation to the system (2), the error state equation between the master and the slave is obtained:

  [ x_s(k)                   ]   [ I      0    ] [ x_s(k) ]
  [ E_s x_s(k) − E_m x_m(k)  ] = [ E_s   −E_m  ] [ x_m(k) ]                              (4)

where E_s = diag{b_s,0, · · · , b_s,n−1} and E_m is defined analogously. Let x_e(k) be the error between the slave and the master, x_e(k) = E_s x_s(k) − E_m x_m(k). If the error converges to zero, the slave output will follow the master output. From the error state equation, n + 1 design equations can be obtained to achieve that the error evolves as an autonomous system, so that the slave output follows the master output. In addition, n − 1 conditions between the numerator coefficients of the master and slave discrete transfer functions can be derived. If these conditions are not satisfied, there will be an error between the master and slave outputs. When the error evolves like an autonomous system, the characteristic polynomial of the system can be calculated. From this polynomial, 2n design equations can be obtained to establish the slave and the slave-master error dynamics. These 2n equations plus the n + 1 previous equations form a system of 3n + 1 equations. Solving this system of equations, it is possible to obtain the 3n + 1 control gains. With these control gains, the slave manipulator will follow the master, and the dynamics of the error and the slave will be established.
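As an illustration (not part of the original formulation), the controller canonical matrices G, H and C used for the master and the slave can be assembled from the coefficients of an nth-order DTF as follows:

```python
import numpy as np

def controller_canonical_form(a, b):
    """Controller canonical realization of an nth-order DTF as in Equation 3.

    a : [a_0, ..., a_{n-1}] coefficients of the denominator z^n + a_{n-1} z^{n-1} + ... + a_0
    b : [b_0, ..., b_{n-1}] coefficients of the numerator b_{n-1} z^{n-1} + ... + b_0
    Returns (G, H, C) with x(k+1) = G x(k) + H u(k) and y(k) = C x(k).
    """
    n = len(a)
    G = np.zeros((n, n))
    G[:-1, 1:] = np.eye(n - 1)        # shift structure of the canonical form
    G[-1, :] = -np.asarray(a)         # last row holds the negated denominator coefficients
    H = np.zeros((n, 1))
    H[-1, 0] = 1.0
    C = np.asarray(b, dtype=float).reshape(1, n)
    return G, H, C

# Second-order example: G(z) = z / (z^2 - 1.5 z + 0.6)  ->  a = [0.6, -1.5], b = [0.0, 1.0]
G, H, C = controller_canonical_form([0.6, -1.5], [0.0, 1.0])
```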

3 Application to Teleoperation Systems of Different Order

The control methodology that considers teleoperation systems where the master and the slave are modeled by discrete transfer functions (DTFs) of the same order has been presented in the previous section. This section explains how to use the design equations to control teleoperation systems where the master and the slave are modeled by different order DTFs.

If the master and the slave are modeled by DTFs of the same order, the control method allows that the slave follows the master, and it is able also


to establish the dynamics of the slave and of the slave-master error (i.e. to fix the n poles of the slave and the n poles of the error). If the master and the slave are modeled by different order DTFs, there are two options to design the control system: to increase the order of the smaller DTF, or to reduce the order of the higher DTF. Next, each option is explained, describing its effects on the control system.

3.1 To Increase the Order of the Smaller DTF

In this option the order of the smaller DTF is increased so that it matches the order of the higher DTF. The order is increased by adding the necessary pole-zero pairs. Pole-zero pairs (rather than poles alone) are added to avoid increasing the delay of the system. The pole is placed in such a way that it does not affect the system dynamics. The zero is placed near the pole, but avoiding exact cancellation, so that the DTF order is actually increased. It must be checked that the increased DTF has the same static gain as the original DTF.

The effects of increasing the order of the smaller DTF on the control are described below. First, if a zero is added, the conditions required for the error to evolve as an autonomous system may no longer be satisfied, and a constant error between the master and the slave may exist. This is not a problem: e.g., in a teleoperation system where the master dimensions are smaller than the slave dimensions and the output of both is the Cartesian position, it would not be desirable for both positions to be identical.

The effects of increasing the master order are:
• The slave dynamics is completely established.
• All the desired error poles are fixed. However, since the error has the same number of poles as the slave order, and the master order is smaller, part of the error depending on the master would be artificially established.

On the other hand, the effects of increasing the slave order are:
• Additional poles of the slave dynamics are established because the slave order has been increased. This is not a problem because all the poles of the original slave dynamics are fixed.
• All the desired error poles are established. However, in a similar way to the previous case, part of the error depending on the slave would be artificially established.

3.2 To Reduce the Order of the Higher DTF

In this option the order of the higher DTF is reduced so that it matches the order of the smaller DTF. That is, only the same number of pole-zero pairs as in the smaller DTF will be considered to design the control


system. Therefore some pole-zero pairs of the higher DTF will be removed in the design. In order to remove a pole-zero pair, it is desirable to select the pole farthest from the dominant poles, and to select a zero near that pole so that a pole-zero cancellation can be assumed.

The effects of reducing the order of the higher DTF on the control are described below. First, if a dominant pole is removed, or if a zero that is not near the selected pole is removed, the reduced DTF will be very different from the original DTF and the control gains obtained may not be applicable to the real teleoperation system. In addition, when a zero is removed, the conditions required for the error to evolve as an autonomous system may not be satisfied.

The effects of reducing the slave order are:
• The slave dynamics is not completely established, because fewer poles have been considered in the design phase.
• Only the number of error poles fixed by the master order can be established. Therefore, part of the error depending on the slave will not be established.

On the other hand, the effects of reducing the master order are:
• The slave dynamics is completely established.
• Only the number of error poles fixed by the slave order can be established. Therefore, part of the error depending on the master will not be established.

3.3 Comparison of Options

If the order of the smaller DTF is increased, the control method can completely establish the dynamics of the teleoperation system, i.e. the slave and the error dynamics are fixed. On the other hand, if the order of the higher DTF is reduced, the dynamics of the teleoperation system cannot be completely established. In this case, part of the error dynamics depending on the reduced DTF cannot be fixed. In addition, if the slave order is reduced, the slave dynamics will not be completely established. In both options, when the DTF order is increased or reduced, the conditions required for the error to evolve as an autonomous system may not be satisfied, but this is not a problem. Therefore the best option to design the control system of a teleoperator where the master and the slave have different order is to increase the order of the smaller DTF.

The control gains that are obtained by modifying the master or the slave DTF in the design phase must then be used to control the real teleoperation system. The state convergence methodology allows the control gains to be applied directly to the real teleoperation system because of the robustness of the control method to uncertainty in the design parameters [7]. On the other hand, state observers must be designed to apply the control. In the case of the reduced DTF, the state observer must estimate the number of state variables fixed by the higher DTF, so it must be designed using the increased DTF.


4 Experimental Results

The EL (Elbow Pivot) joint of the Grips teleoperation system has been bilaterally controlled in simulation using the state convergence methodology. The Grips manipulator system from Kraft Telerobotics is a six DOF teleoperator with force feedback. The slave is a hydraulic manipulator, and the master is powered by electrical motors. The identified DTFs for the EL joint of the master and the slave, considering a sample time of T = 0.0005 s, are, respectively:

G_m(z) = \frac{3.79 \times 10^{-7} z^2}{z^3 - 2.974 z^2 + 2.948 z - 0.9741}    (5)

G_s(z) = \frac{1.357 \times 10^{-5} z}{z^2 - 1.9928 z + 0.9928} = \frac{1.357 \times 10^{-5} z}{(z - 1)(z - 0.9928)}    (6)

The identification has been performed using the corresponding Matlab toolbox. Both DTFs have been identified considering the Box-Jenkins model. As explained previously, when the orders of the master and slave DTFs are different, the best option to design the control system is to increase the order of the smaller DTF. Therefore the order of the slave DTF must be increased. A pole-zero pair has been added to the slave DTF. The pole has been placed far from the dominant poles so that it does not affect the system dynamics, and the zero has been placed near the pole:

G_s(z) = \frac{1.357 \times 10^{-5} z (z - 0.93)}{(z - 1)(z - 0.9928)(z - 0.9302)} = \frac{1.357 \times 10^{-5} z^2 - 1.262 \times 10^{-5} z}{z^3 - 2.923 z^2 + 2.8465 z - 0.9235}    (7)

In order to solve the design equations of the control method, none of the numerator coefficients of (5) and (7) may be null. In this case, the coefficients {b_m1, b_m0, b_s0} are null, i.e. there are zeros placed at z = 0. For this reason, the zeros placed at z = 0 have been slightly modified, and the DTFs considered for the design are the following:

G_m(z) = \frac{3.79 \times 10^{-7} z^2 + 10^{-8} z + 10^{-8}}{z^3 - 2.974 z^2 + 2.948 z - 0.9741}    (8)

G_s(z) = \frac{1.357 \times 10^{-5} z^2 - 1.262 \times 10^{-5} z + 10^{-8}}{z^3 - 2.923 z^2 + 2.8465 z - 0.9235}    (9)

In addition, to verify the conditions of the control method between the numerator coefficients of G_m(z) and G_s(z), and to achieve the evolution of the error as an autonomous system, the numerator coefficients of G_s(z) in (9) have been modified for the design in the following way:

G_s(z) = \frac{1.357 \times 10^{-5} z^2 + 3.58 \times 10^{-7} z + 3.58 \times 10^{-7}}{z^3 - 2.923 z^2 + 2.8465 z - 0.9235}    (10)

The control gains have been obtained considering that the force feedback gain is k_f = 0.1, that the slave interacts with a soft environment (k_e = 10 Nm/rad), and that the poles of the error and the slave are placed at z = 0.95.
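To make the order-increasing step concrete, the following Python sketch (an illustration added here, not part of the original method; it only assumes numpy) reproduces the augmentation of the slave DTF (6) into (7) by appending a pole-zero pair and checking that the static gain of the added pair stays close to one:

```python
import numpy as np

def add_pole_zero_pair(num, den, pole, zero):
    """Increase the order of a DTF (polynomial coefficients in descending
    powers of z) by one, appending the factor (z - zero)/(z - pole)."""
    new_num = np.polymul(num, [1.0, -zero])
    new_den = np.polymul(den, [1.0, -pole])
    # Gain of the added pair at z = 1; keeping it close to 1 keeps the
    # static gain of the increased DTF close to that of the original DTF.
    pair_dc_gain = (1.0 - zero) / (1.0 - pole)
    return new_num, new_den, pair_dc_gain

# Slave DTF (6): Gs(z) = 1.357e-5 z / ((z - 1)(z - 0.9928))
num_s = [1.357e-5, 0.0]
den_s = np.polymul([1.0, -1.0], [1.0, -0.9928])

# Pole placed away from the dominant poles, zero right next to it (cf. eq. (7))
num_s3, den_s3, g = add_pole_zero_pair(num_s, den_s, pole=0.9302, zero=0.93)
print(num_s3)   # ~ [1.357e-05, -1.262e-05, 0.0]
print(den_s3)   # ~ [1.0, -2.923, 2.8465, -0.9235]
print(g)        # ~ 1.003, so the static behaviour is essentially unchanged
```

The printed coefficients match those of equation (7); the same helper could be applied to the master DTF if the roles were reversed.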


If the control gains are applied to the teleoperation system considered for the design, DTFs (8) and (10), the slave follows the master without error (left part of Fig. 2). However, if they are applied to the increased original system, DTFs (5) and (7), the slave follows the master but there is a constant error, because the conditions required for the error to evolve as an autonomous system are not verified (right part of Fig. 2). In both cases, a unitary constant operator force has been considered.


Fig. 2. Master and slave position considering DTF (8) and (10) (left part), and DTF (5) and (7) (right part)

In order to verify the performance of the control over the original teleoperation system, two state observers have been designed for the identified DTFs of the EL joint (DTFs (5) and (6)). Both state observers must estimate three state variables. The design of the state observer for DTF (5) is straightforward. However, as the order of DTF (6) is two, the observer has been designed from DTF (7). The left part of Fig. 3 shows the position evolution of the original teleoperation system using the designed observers. It can be verified that the results obtained are the same as those shown in the right part of Fig. 2. The control signals applied to the master and the slave are shown in the right part of Fig. 3. In this figure, the slave control signal has initial values close to zero, but not null. These results verify that the control method by state convergence can be applied to teleoperation systems where the master and the slave are modeled by different order DTFs.

5 Conclusions This paper has presented a bilateral control method for teleoperation systems where the master and the slave are modeled by different order transfer functions. It has been verified that the order of the smaller DTF must be increased



Fig. 3. Master and slave position (left part), and control signal (right part)

to design the control system. Then, state observers that estimate the number of state variables fixed by the higher DTF must be designed in order to apply the control. The control method ensures that the slave follows the master, and it is able to establish the dynamic behavior of the teleoperation system. In this paper the performance of the control scheme has been tested in simulation over the EL joint of the Grips teleoperation system. The next step is to control the real joint. The final aim is to control all the joints of the system applying the state convergence methodology.

References

1. B. Hannaford. Stability and performance tradeoffs in bilateral telemanipulation. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 1764–1767, 1989.
2. R.C. Goertz et al. The ANL model 3 master-slave manipulator design and use in a cave. In Proc. 9th Conf. Hot Lab. Equip., volume 121, 1961.
3. C.R. Flatau. SM 229, a new compact servo master-slave manipulator. In Proc. 25th Remote Syst. Tech. Div. Conf., volume 169, 1977.
4. Y. Yokokohji and T. Yoshikawa. Bilateral control of master-slave manipulators for ideal kinesthetic coupling – formulation and experiment. IEEE Transactions on Robotics and Automation, 10(5):605–620, 1994.
5. R.J. Anderson and M.W. Spong. Bilateral control for teleoperators with time delay. IEEE Transactions on Automatic Control, 34(5):494–501, 1989.
6. J.M. Azorin, O. Reinoso, R. Aracil, and M. Ferre. Generalized control method by state convergence of teleoperation systems with time delay. Automatica, 40(9):1575–1582, 2004.
7. J.M. Azorin, O. Reinoso, R. Aracil, and M. Ferre. Control of teleoperators with communication time delay through state convergence. Journal of Robotic Systems, 21(4):167–182, April 2004.

Navigation and Planning in an Unknown Environment Using Vision and a Cognitive Map

Nicolas Cuperlier, Mathias Quoy, and Philippe Gaussier

ETIS-UMR 8051
Université de Cergy-Pontoise - ENSEA
6, Avenue du Ponceau
95014 Cergy-Pontoise France
[email protected]

Summary. We present a framework for simultaneous localization and map building of an unknown environment based on vision and dead-reckoning. An omnidirectional camera gives a panoramic image from which landmarks, not defined a priori, are extracted. The set of landmarks and their azimuths relative to the north given by a compass defines a particular location without any need for an external environment map. Transitions between two locations are explicitly coded. They are used simultaneously in two layers of our architecture: first to construct, during exploration (latent learning), a graph (our cognitive map) of the environment where the links are reinforced when the path is used; and second, to be associated, on another layer, with the integrated movement used for going from one place to the other. During the planning phase, the activity of the transition coding for the required goal spreads along the arcs of this graph in the cognitive map, giving a higher value to the transitions (nodes) closer to this goal. We will show that, when planning to reach a goal in this environment, the interaction of these two levels can lead to the selection of multiple transitions, corresponding to the most activated ones according to the current place. Those proposed transitions are finally exploited by a dynamical system (neural field) merging this information. The stable solution of this system gives a unique movement vector to apply. Experimental results underline the interest of such a soft competition over transition information, compared to a strict one, to get a more accurate generalization in movement selection.

1 Introduction

Path planning requires the agent or the robot to select the appropriate action to perform. This task may be complex when several actions are possible, and different approaches have been proposed to choose what to do next. Experiments carried out on rats have led to the definition of cognitive maps used for path planning [21]. Most cognitive map models are based on graphs showing how to go from one place to another [2, 3]. They mainly differ in the way they use the map in order to find the shortest path, in the way they react to dynamical environment changes, and in the way they achieve contradictory goal satisfactions.



One can refer to [9, 12] for a comparative review of localisation and mapping models. Many methods rely on the combination of different algorithms that have to be triggered appropriately (and concurrently) when necessary. For instance, localisation may involve different sensors (laser, ultra-sound, visual feature recognition ...) that have to be chosen appropriately. Some works use rule-based algorithms, a classical functional approach, that can exhibit the desired behaviors; we will not discuss them in this paper, but one can refer to [8].

Instead, other works look at what nature does by taking inspiration from neurobiology to design control architectures. There are at least two reasons for this:

• first, getting robust, adaptive, opportunistic and ready-made solutions for the control architecture.
• second, if robotic results can be compared to experimental results involving several parts of the brain, which are generally difficult to study due to their complexity, it can help neurobiologists to understand how a neurobiological model behaves.

Hence we propose here a unified neuronal framework based on a hippocampal and prefrontal model where vision, place recognition and dead-reckoning are fully integrated (see Fig. 4 for an overview of the architecture). This model relies on a topological map: the environment is coded via a set of distinctive nodes and by the way a robot can go from one node to another. In our work, those nodes are not directly places of the environment but rather the transitions between two of them. No Cartesian metric information and no occupancy grid are used to construct the map. Localisation is achieved using a biomimetic model designed to emulate the neural activity of particular neurons found in the rat hippocampus, named place cells. Those cells learn the direction and identity (recognition) of punctual landmarks, leading to a place definition. A key point for the understanding of this model is the distinction between a transition coding for the succession of two place cells at the recognition (visual) level and a motor transition encoding, on a motor level, the integrated movement performed to go from one place to the other. Whereas these two kinds of transition are strongly dependent and linked in a unique way, they do not have the same modality: one is related to vision coding and the other is related to motor coding. Keeping this basic distinction in mind, we can list the assets of this model:

• autonomous landmark extraction based on characteristic points (section 2)
• autonomous place building via place-cell-like neurons: there are no a priori predefined squares, or world model (section 3)
• autonomous creation of transitions at the recognition level (recognition transitions). A neuron codes the succession of two recognized place cells without any combinatorial explosion (section 4).


• autonomous cognitive map building based on those recognition transitions between places, giving topological information such as the adjacency of two linked transitions (i.e. if transitions AB and BC are linked: adjacency of the destination place cell of AB with the place cell of origin of transition cell BC). But this map can also give a kind of metric via the value of the arcs of this graph (section 5).
• autonomous association of recognition transitions with the integrated movement, giving motor transitions which can be used for planning (section 6).
• autonomous planning using both the cognitive map (graph of recognition transitions) and the corresponding motor transitions (section 7)
• stable movement given by the stable fixed point solution of a dynamical system (section 8)

Drawbacks will be left for the conclusion.

2 Autonomous Landmark Extraction Based on Characteristic Points

Images are taken by a panoramic camera at a low resolution. This allows us to handle lighter images that may be processed in real time. In order to eliminate problems induced by luminance variability, we only use the gradient image as input to the system (a 1500 × 240 pixel image extracted from the 640 × 480 pixel panoramic image, which is originally circular). Two processes then occur in parallel:

• First, curvature points are extracted from this gradient image by Difference Of Gaussian (D.O.G) filtering. Those curvature points are robust focal points due to the low resolution. Each focal point is the center of a 32 × 32 pixel small image giving the local visual area extracted around it (cf. circles on Fig. 1). This image is binarized through a log-polar transform [19, 11] and is then learned by a neuron coding for this landmark. A soft competition between landmark neurons, allowing several interpretations of a given local snapshot, is then computed.
• Second, each landmark is linked with its angular position relative to the north given by a compass [20, 14].

In a panoramic image, 32 (landmark, azimuth) pairs are extracted (see Fig. 1). Thus, this visual system provides both a what and a where information: the recognition of a 32 × 32 pixel snapshot in log-polar coordinates, and the azimuth of the focal point. What and where information are then merged in a product space that memorizes the incoming inputs during a given time. The number of landmarks needed is a balance between the robustness of the algorithm and the speed of the process. If all landmarks were fully recognized, only three of them would be needed. But as some of them may not be recognized in case of changing conditions (luminance, occlusion), taking a greater number


Fig. 1. Image taken from a panoramic camera. Below are 15 examples of 32 × 32 log-polar transforms taken as landmarks and their corresponding position in the image.

is enough to guarantee the robustness. Moreover the log-polar transform gives some rotation and depth robustness.
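For illustration, a minimal Python sketch of the kind of log-polar sampling described above is given below (a simplification added here, not the paper's exact binarized transform; the grid size, radius and binarization rule are assumptions):

```python
import numpy as np

def log_polar_patch(gradient_image, cx, cy, n_r=16, n_theta=16, r_max=16.0):
    """Sample the 32x32 neighbourhood of a focal point (cx, cy) on a
    log-polar grid: rows are log-spaced radii, columns are angles."""
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    patch = np.zeros((n_r, n_theta))
    h, w = gradient_image.shape
    for i, r in enumerate(radii):
        for j, t in enumerate(thetas):
            x = int(round(cx + r * np.cos(t)))
            y = int(round(cy + r * np.sin(t)))
            if 0 <= y < h and 0 <= x < w:
                patch[i, j] = gradient_image[y, x]
    # Crude binarization of the local view before learning it on a neuron
    return (patch > patch.mean()).astype(np.uint8)
```

In this representation a rotation of the patch around the focal point becomes a column shift and a scale change becomes a row shift, which is what gives the rotation and depth robustness mentioned above.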

3 Autonomous Place Building

Each set of (landmark, azimuth) pairs, merged in the product space, is learned and thus characterizes one location. The neuron coding for this location is called a "place cell" (PC), like the ones found in the rat's hippocampus [14]. A place cell's activity is the result of a matching function computing the distance between the learned set and the current set (the distance is computed only on the recruited neurons). Thus, the activity of the k-th PC can be expressed as follows:

P_k = \frac{1}{l_k} \sum_{i=1}^{N_L} \omega_{ik} \, f_s(L_i) \, g_d(\theta_{ik}^{L} - \theta_i)    (1)

with l_k = \sum_{i=1}^{N_L} \omega_{ik} the number of landmarks used for the k-th PC, where ω_ik ∈ {0, 1} expresses the fact that landmark i has been used to encode PC k, N_L the number of learned landmarks, L_i the activity of landmark i, f_s(x) the activation function of the neurons in the landmark recognition group, θ_ik^L the learned azimuth of the i-th landmark for the k-th PC, and θ_i the azimuth of the current local view interpreted as landmark i. d is the angular diffusion parameter which defines the shape of the function g_d(x). The purpose of f_s(x) and g_d(x) is to adapt the dynamics of the what and where groups of neurons respectively. They are defined as follows:

g_d(x) = \left[ 1 - \frac{|x|}{d \cdot \pi} \right]^{+}

f_s(x) = \frac{1}{1 - s} \left[ x - s \right]^{+}

where [x]^+ = x if x > 0, and 0 otherwise. The s parameter rescales the activity of the landmark neuron above s to the range between 0 and 1. The d parameter modulates the weight of the angular displacement.

More information on the neural model for place cell coding may be found in [10, 7].
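As a concrete reading of eq. (1), the following Python sketch (an illustration added here; the values of d and s are arbitrary assumptions, and the angular wrap-around is an added practical detail not spelled out in the text) computes the activity of one place cell from its learned (landmark, azimuth) set and the current observations:

```python
import numpy as np

def g_d(x, d):
    """Angular similarity [1 - |x|/(d*pi)]^+ for an azimuth difference x."""
    return max(0.0, 1.0 - abs(x) / (d * np.pi))

def f_s(x, s):
    """Rescale a landmark recognition activity above the threshold s to [0, 1]."""
    return max(0.0, (x - s) / (1.0 - s))

def place_cell_activity(learned, current, d=0.2, s=0.3):
    """Eq. (1): learned maps landmark id -> learned azimuth theta_ik^L;
    current maps landmark id -> (recognition activity L_i, azimuth theta_i)."""
    total = 0.0
    for lm, theta_learned in learned.items():
        if lm in current:
            activity, theta = current[lm]
            diff = (theta_learned - theta + np.pi) % (2.0 * np.pi) - np.pi
            total += f_s(activity, s) * g_d(diff, d)
    return total / len(learned) if learned else 0.0
```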


Fig. 2. Idealised shape of a place cell (An ) response, according to the distance to the exact location where it has been learned. This cell has its maximal response in this place.

Recruitment of a new place cell for encoding a new location is performed autonomously, without any external signal. If the activities of all previously learned place cells are below a given recognition threshold (R.T), then a new neuron is recruited for coding this recently discovered location (see Fig. 2). Hence, the density of locations learned depends on the level of this threshold. But it also depends on the position of the robot in the environment. Namely, more locations are learned near walls or doors, because a fast change in the angular position of near landmarks, or in the (dis)appearance of landmarks, may occur there. In other locations, small changes produce a small variation in the place cell activity (see Fig. 2). If, at a given place, several cells respond with an activity greater than the R.T, a competition takes place so that the most activated cell wins and codes this location. These cells are created during the exploration of the unknown environment. We use random exploration, but naturally other kinds of algorithms can be used. At the end of this task, the environment is fully covered by place cells, so that in any part of it a place cell responds specifically for it (see Fig. 3).
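A minimal sketch of this winner-take-all recognition with autonomous recruitment is given below (again an added illustration, reusing the place_cell_activity helper sketched above; the threshold value follows the R.T = 0.97 of Table 1 but is otherwise an assumption):

```python
def recognize_or_recruit(place_cells, current_obs, rt=0.97):
    """place_cells: list of learned {landmark: azimuth} dicts (one per PC).
    Returns the index of the winning place cell, recruiting a new one when
    no existing cell responds above the recognition threshold R.T."""
    activities = [place_cell_activity(pc, current_obs) for pc in place_cells]
    if activities and max(activities) >= rt:
        # competition between cells above threshold: the most active one wins
        return activities.index(max(activities))
    # otherwise recruit a new place cell storing the current (landmark, azimuth) set
    place_cells.append({lm: az for lm, (act, az) in current_obs.items()})
    return len(place_cells) - 1
```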


Fig. 3. A simulated environment fully explored. Each region represents the response domain for which a particular place cell wins the competition for recognition. After a full exploration the entire environment is covered by the place cell population.

4 Autonomous Building of Transitions at Recognition Level

Two successively reached places (A, then B) are coded by a transition cell (AB) (see Fig. 4). A relevant question concerns the growth of the number of these cells. Before showing the experimental results, we can already make two important remarks about this growth.

• First, this growth is intimately linked with the number of place cells, which mainly depends on two parameters:
  – the value of R.T: the higher R.T is, the larger the number of cells created.
  – the complexity of the environment: the number and location of its landmarks and the number of obstacles found inside.
• Second, we do not create all possible transitions but only physically feasible transitions between "adjacent" place cells. And since the number of neighbours of a place cell is necessarily limited (see Fig. 3), the number of transitions created is also limited.

Hence, we have studied the ratio between created transition cells and created place cells for three environments of increasing complexity according to their obstacle configuration (a single-, a two- and a four-room environment). The tests have been performed in simulation using a virtual robot (or animat). For each simulation, 10 animats have explored their environment for 50000 cycles. This number has been chosen high enough to be sure that the animat has learned a complete cognitive map. The results shown here are the average over these 10 animats. The ratio remains stable around the mean value 5.45 for all environments (see Table 1).


Table 1. Ratio of the number of transitions created (nbt) to the number of place cells created (nbp), for R.T = 0.97, according to the number of rooms in the environment. Standard deviations are given in brackets. The ratio remains stable: there are roughly five times more transition cells than place cells.

Environment   nbp            nbt              ratio
one room      133.8 (2.85)   735.8 (19.80)    5.49 (0.06)
two rooms     606.2 (6.89)   3389.2 (56.38)   5.59 (0.08)
four rooms    643.7 (9.88)   3281.2 (48.80)   5.09 (0.04)

Indeed, only a few transitions can be created from a given place cell, since a transition is a link between "adjacent" place cells and since the number of neighbours of a place cell is necessarily limited. So there is no combinatorial explosion in the number of created transitions, and we do not need a "full matrix" to create them: a matrix of only M*N is enough, with M the number of place cells and N the maximal number of possible neighbours.


Fig. 4. Sketch of the model. From left to the right: merging landmarks and their azimuth in a product space, then learning of the corresponding set on a place cell. Two successive place cells (the one at time t and the previous one at t − 1) define a transition cell. They are used to build up the cognitive map and are also linked with the corresponding motor transition that integrated the movement performed.


5 Autonomous Cognitive Map Building

Since our robotic model is inspired by the animat approach [13], we use three contradictory animal-like motivations (eating, drinking, and resting). Each one is associated with a satisfaction level that decreases over time and increases when the robot is on the proper source. When a level of satisfaction falls below a given threshold, the corresponding motivation is triggered so that the robot has to reach a place allowing it to satisfy this need. Hence this place becomes the goal to reach. More sources can be added, and one can increase the number of sources associated with a given motivation. Each time a new transition is created, a new node is recruited in the cognitive map. This node is then linked with the previous transition used. When a transition leads to a place cell where a source can be found, a link between the corresponding motivation and the most active node on the map is created and set to one (latent learning); otherwise this link is set to zero. After some time, exploring the environment leads to the creation of the cognitive map (see Fig. 5). This map may be seen as a graph where each node is a transition and each arc the fact that the path between these two transitions was used. We can give a fixed value (lower than one) to each link at creation time. This value is increased if the link is used, and decreased if it is not. After some time spent in the environment, some links are reinforced. These links correspond to paths that are often used. In particular, this is the case when some particular locations have to be reached more often than others (see section 7) [15].

Fig. 5. Cognitive map (in red) built by exploration of the environment. The triangles give the successive robot positions starting from the right to the goal (on the left). Landmarks are represented by blue crosses. The blue rectangles are obstacles.


6 Autonomous Motor Transitions Creation

Each of these transition cells is linked with the direction used to go from the starting location to the ending location. For instance, going from place A to place B creates a recognition transition cell AB and the corresponding node on the map. At the same time another transition cell is created at the motor level. This motor transition associates the direction (relative to the north) for going from A to B with the node AB on the map. This direction integrates all direction changes performed between A and the creation of B, using the robot wheel encoders to compute elementary displacement vectors. Direction changes are induced by a new movement vector generated by the exploration mechanism (random exploration) or by the obstacle avoidance mechanism. Hence the integrated vector (I.V.) takes all these movement changes into account. Each time this transition is performed, the I.V. is averaged with the corresponding learned direction. The I.V. is then reset when entering a different place cell. The norm of this vector is also computed in the same way.

7 Autonomous Planning Using the Cognitive Map and Motor Transitions

When a goal has to be reached, the transitions leading to it are activated via the links learned during exploration between those transitions and the corresponding motivation. This activation is then diffused over the cognitive map graph, each node taking the maximal incoming value, which is the product between the weight on the link (lower than one) and the activity of the node sending the link. After stabilization, this diffusion process gives the shortest path between all nodes and the goal nodes. This is a neural version of the Bellman-Ford algorithm [5, 17] (see Fig. 6).

When the robot is in a particular location A, all possible transitions beginning with A are possible. The top-down effect of the cognitive map is to bias the possible transitions such that the ones chosen by the cognitive map have a higher value. This value reflects a topological distance measure: the number of intermediate nodes to pass through in order to actually reach the goal. This small bias is enough to select/filter the appropriate transitions via a competition mechanism (however see section 8). This mechanism realizes a soft competition: several motor transitions are proposed at this level. They allow the computation of a more accurate direction than a strict competition, since transitions starting from places really close to the current one and topologically close to the goal (on the graph) are selected. The final selection of the motor action results from merging these global decisions with local constraints such as obstacle avoidance, robot inertia, etc.
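A minimal sketch of this diffusion, written in plain Python for illustration (the graph encoding and the termination test are assumptions made here), could look as follows:

```python
def diffuse_goal_activity(links, goal_nodes, n_nodes, tol=1e-9):
    """Neural Bellman-Ford on the cognitive map.
    links: list of (a, b, w) meaning transition node a can be followed by
    transition node b, with link weight w < 1.
    goal_nodes: indices of the transitions leading to the motivated goal."""
    act = [0.0] * n_nodes
    for g in goal_nodes:
        act[g] = 1.0                      # goal transitions receive activity 1
    changed = True
    while changed:                        # iterate until the activities stabilize
        changed = False
        for a, b, w in links:
            v = w * act[b]                # product of link value and activity of the
            if v > act[a] + tol:          # node nearer the goal; keep the maximum
                act[a] = v
                changed = True
    return act
```

At the current place A, the candidate transitions AB, AC, ... can then be ranked by their diffused activity, implementing the soft competition described above.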


Fig. 6. Diffusion of the activity in the graph corresponding to the cognitive map. Diffusion is starting from the goal. Each node keeps the maximal activity coming from its neighbours. The activity is the product between the value of the link and the activity of the sending node.

8 Movement and neural field dynamics

As seen in section 7, after planning, different movements are proposed. A simple competition mechanism would select the neuron with the highest value, but how should this information be used? We first tested a strict competition between the proposed transitions; however, the other transitions also contain interesting information about the agent's location context, so several transitions are taken into account for the movement. The solution used for obtaining a stable continuous direction to follow is to define a dynamical system whose stable fixed point solution is the direction to follow. This is achieved using a neural field [1, 18, 16].

\tau \cdot \frac{df(x,t)}{dt} = -f(x,t) + I(x,t) + h + \int_{z \in V_x} w(z) \, f(x - z, t) \, dz    (2)

where f(x, t) is the activity of neuron x at time t, I(x, t) is the input to the system, h is a negative constant, and τ is the relaxation rate of the system. w is the interaction kernel of the neural field activation; a difference of Gaussians (DOG) models these lateral interactions, which can be excitatory or inhibitory. V_x is the lateral interaction interval that defines the neighbourhood. Without inputs, the constant h ensures the stability of the homogeneous pattern of the neural field since f(x, t) = h. In the following, the x dimension is an angle (the direction to follow with respect to the north). This equation allows the computation of attractors corresponding to fixed points of the dynamics and to local maxima of the neural field activity. A stable direction to follow is reached when the system is on any of the attractors. The angle of a candidate transition is used as input. The intensity of this input depends on the corresponding goal transition activity, but also on the recognition activity of its origin place cell (see Fig. 7). If only one transition is proposed, there will be only one input, with angle x_targ, and it gives rise to only one



Fig. 7. The merging mechanism allows a better direction (global movement) to be obtained than using only the single piece of information from the current transition (BD). It takes into account the previous movement performed and the transitions predicted from sufficiently close place cells (C).

attractor x* = x_targ on the neural field. If x_c is the current orientation of the robot, its rotation speed will be proportional to w = \dot{x} = \frac{df(x,t)}{dt}\big|_{x_c}.

Merging several transition inputs depends on the distance between them. Indeed, if the inputs are spatially close, the dynamics give rise to a single attractor corresponding to their average. Otherwise, if we progressively increase the distance between the inputs, a bifurcation point appears for a critical distance: the previous attractor becomes a repellor and two new attractors emerge.

Obstacles are detected by 12 infra-red sensors. A reflex behavior is triggered by a Braitenberg-like architecture [6]. When an obstacle is detected in a given direction, the reflex system generates a negative input in this orientation. Hence the bubble of neural field activity moves; consequently the computed direction reflects this change and allows the obstacle to be avoided (see Fig. 8). Oscillations between two possible directions are avoided by the hysteresis property of this input competition/cooperation mechanism. It is possible to adjust this distance to a correct value by calibrating the two elements responsible for this effect: spatial filtering is obtained by convolving the Dirac-like signal coming from the motor transition information with a Gaussian and taking the result as the input to the system. This, combined with the lateral interactions, allows distinct inputs to be merged into the same attractor. The larger the curves, the larger the merging will be.
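The following Python sketch (a discretization of eq. (2) on a ring of headings written for this edition; the kernel widths, gains, h, τ and the input profile are illustrative assumptions) shows how such a field can be iterated until a stable direction emerges:

```python
import numpy as np

N = 180                                   # discretized headings over [-pi, pi)
angles = np.linspace(-np.pi, np.pi, N, endpoint=False)

def dog(x, s_exc=0.3, s_inh=1.0, a_exc=1.0, a_inh=0.5):
    """Difference-of-Gaussians lateral interaction kernel w(z)."""
    return a_exc * np.exp(-x**2 / (2 * s_exc**2)) - a_inh * np.exp(-x**2 / (2 * s_inh**2))

# Kernel indexed by circular bin distance so that FFT convolution is correct;
# the factor 2*pi/N plays the role of dz in the integral of eq. (2).
bins = np.arange(N)
w_kernel = dog(np.minimum(bins, N - bins) * (2 * np.pi / N)) * (2 * np.pi / N)

def field_step(f, I, h=-0.2, tau=10.0, dt=1.0):
    """One Euler step of eq. (2); the integral becomes a circular convolution."""
    lateral = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(w_kernel)))
    return f + dt * (-f + I + h + lateral) / tau

# Two candidate transition directions, Gaussian-smoothed, weighted by activity
I = 0.9 * np.exp(-(angles - 0.3)**2 / 0.05) + 0.7 * np.exp(-(angles - 0.7)**2 / 0.05)

f = np.zeros(N)
for _ in range(300):
    f = field_step(f, I)
direction = angles[np.argmax(f)]          # stable direction to follow
```

Close inputs merge into a single attractor near their weighted average, while inputs further apart than the kernel width keep separate attractors, which is the bifurcation behaviour described above; an obstacle can be injected as a negative Gaussian added to I.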

9 Conclusion Our model currently running on robots (Koala robots and Labo3 robots) has interesting properties in terms of autonomous behavior. However, this autonomy has some drawbacks:



Fig. 8. Top: Trajectory of a Labo3 robot in an open environment with obstacles. The movement direction is given by a joystick input. We focus on the final action selection at two points in this environment: A and B. During the whole trajectory, the movement ordered by the joystick is to go straight; turning movements in the robot trajectory are only due to obstacle avoidance. Middle (A): Neural field activity without any obstacle. The direction taken corresponds to the joystick input. Bottom (B): Neural field activity with an obstacle. The obstacle shifts the neural field maximal activity, leading to a turning move.


• we are not able to build a Cartesian map of the environment, because all learned locations are robot centered. However, the places in the cognitive map and the directions used give a skeleton of the environment.
• we have no information about the exact size of the rooms or corridors. Again, the cognitive map only gives a sketch of the environment.
• some parameters have to be set: the recognition threshold (section 3) and the diffusion size of the interaction kernel of the neural field (section 8). The first one determines the density of built places: the higher the threshold, the more places are created. The second parameter has to be tuned for each robot depending on its size and on the position of the infra-red sensors used for obstacle avoidance. For instance, a too high diffusion value prevents the robot from going through doors.

The transitions used in this model may also be the elementary blocks of a sequence learning process. Thus, we are able to propose a unified vision of the spatial (navigation) and temporal (memory) functions of the hippocampus [4].

Acknowledgments

This work is supported by two French ACI programs. The first one, on the modeling of the interactions between hippocampus, prefrontal cortex and basal ganglia, is in collaboration with B. Poucet (CRNC, Marseille), J.P. Banquet (INSERM U483) and R. Chatila (LAAS, Toulouse). The second one (neurosciences integratives et computationnelles), on the dynamics of biologically plausible neural networks, is in collaboration with M. Samuelides (SupAero, Toulouse), G. Beslon (INSA, Lyon), and E. Dauce (Perception et mouvement, Marseille).

References

1. S. Amari. Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27:77–87, 1977.
2. M. Arbib and I. Lieblich. Motivational learning of spatial behavior. In J. Metzler, editor, Systems Neuroscience, pages 221–239. Academic Press, 1977.
3. I. Bachelder and A. Waxman. Mobile robot visual mapping and localization: A view-based neurocomputational architecture that emulates hippocampal place learning. Neural Networks, 7(6/7):1083–1099, 1994.
4. J. Banquet, P. Gaussier, M. Quoy, A. Revel, and Y. Burnod. A hierarchy of associations in hippocampo-cortical systems: cognitive maps and navigation strategies. Neural Computation, 17(6), 2005.
5. R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87–90, 1958.
6. V. Braitenberg. Vehicles: Experiments in Synthetic Psychology. Bradford Books, Cambridge, 1984.
7. C. Giovannangeli, P. Gaussier, and J. Banquet. Robot as a tool to study the robustness of visual place cells. In I3M'2005: International Conference on Conceptual Modeling and Simulation, Marseille, 2005.


8. J. Donnart and J. Meyer. Learning reactive and planning rules in a motivationally autonomous animat. IEEE Transactions on Systems, Man and Cybernetics-Part B, 26(3):381–395, 1996.
9. D. Filliat and J.-A. Meyer. Map-based navigation in mobile robots - I. A review of localisation strategies. Journal of Cognitive Systems Research, 4(4):243–282, 2003.
10. P. Gaussier, A. Revel, J. Banquet, and V. Babeau. From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological Cybernetics, 86:15–28, 2002.
11. C. Joulain, P. Gaussier, and A. Revel. Learning to build categories from perception-action associations. In International Conference on Intelligent Robots and Systems - IROS'97, pages 857–864, Grenoble, France, September 1997. IEEE/RSJ.
12. J.-A. Meyer and D. Filliat. Map-based navigation in mobile robots - II. A review of map-learning and path-planning strategies. Journal of Cognitive Systems Research, 4(4):283–317, 2003.
13. J.-A. Meyer and S. Wilson. From animals to animats. In First International Conference on Simulation of Adaptive Behavior. Bradford Books, 1991.
14. J. O'Keefe and N. Nadel. The hippocampus as a cognitive map. Clarendon Press, Oxford, 1978.
15. M. Quoy, P. Gaussier, S. Leprêtre, A. Revel, C. Joulain, and J. Banquet. Lecture Notes in Artificial Intelligence Series, 1812, chapter A planning map for mobile robots: speed control and paths finding in a changing environment, pages 103–119. Springer, ISBN 3-540-41162-3, 2000.
16. M. Quoy, S. Moga, and P. Gaussier. Dynamical neural networks for top-down robot control. IEEE Transactions on Man, Systems and Cybernetics, Part A, 33(4):523–532, 2003.
17. A. Revel, P. Gaussier, S. Leprêtre, and J. Banquet. Planification versus sensory-motor conditioning: what are the issues? In From Animals to Animats: Simulation of Adaptive Behavior SAB'98, pages 129–138, 1998.
18. G. Schöner, M. Dose, and C. Engels. Dynamics of behavior: theory and applications for autonomous robot architectures. Robotics and Autonomous Systems, 16(2-4):213–245, December 1995.
19. L. Schwartz. Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Res., 20:645–669, 1980.
20. N. Tinbergen. The study of instinct. Oxford University Press, London, 1951.
21. E. Tolman. Cognitive maps in rats and men. The Psychological Review, 55(4), 1948.

Exploiting Distinguishable Image Features in Robotic Mapping and Localization

Patric Jensfelt, John Folkesson, Danica Kragic, and Henrik I. Christensen

Centre for Autonomous Systems
Royal Institute of Technology
SE-100 44 Stockholm, SWEDEN
[patric,johnf,danik,hic]@nada.kth.se

Summary. Simultaneous localization and mapping (SLAM) is an important research area in robotics. Lately, systems that use a single bearing-only sensor have received significant attention, and the use of visual sensors has been strongly advocated. In this paper, we present a framework for 3D bearing-only SLAM using a single camera. We concentrate on image feature selection in order to achieve precise localization and thus good reconstruction in 3D. In addition, we demonstrate how these features can be managed to provide real-time performance and fast matching to detect loop-closing situations. The proposed vision system has been combined with an extended Kalman filter (EKF) based SLAM method. A number of experiments have been performed in indoor environments which demonstrate the validity and effectiveness of the approach. We also show how the SLAM-generated map can be used for robot localization. The use of vision features which are distinguishable allows a straightforward solution to the "kidnapped-robot" scenario.

1 Introduction

It is widely recognized that an autonomous robot needs the ability to build maps of the environment using natural landmarks and to use them for localization [19, 2, 4, 18, 20]. One of the current research topics related to SLAM is the use of vision as the only exteroceptive sensor [3, 5, 6, 17, 15], due to its low cost. In this paper, we present a SLAM system that builds 3D maps of visual features using a single camera. We describe how we deal with a set of open research issues such as producing stable and well-localized landmarks, a robust matching procedure, landmark reconstruction and detection of loop closing. These issues are of extreme importance when working, for example, in an EKF setting where the computational complexity grows quadratically with the number of features. Robust matching is required for most recursive formulations of SLAM where decisions are final.



Besides low cost, another advantage of using cameras for SLAM is the much greater richness of the sensor information compared to that from, for example, a range sensor. Using a camera it is possible to recognize features based on their appearance. This can simplify one of the most difficult problems in SLAM, namely data association. We demonstrate just how powerful an advantage this is by using the SLAM map to perform robot localization without any initial pose estimate.

The main contributions of this work are i) a method for the initialization of visual landmarks for SLAM, ii) a robust and precise feature detector, iii) the management of the measurements to make on-line estimation possible, and iv) the demonstration of how this framework can facilitate loop closing and localization. Due to the low complexity, we believe that our approach scales to larger environments.

2 Related Work Bearing only/single camera SLAM suffers from the problem that a single observation of a landmark does not provide an estimate of its full pose. This problem is typically addressed by combining the observations from multiple views as in the structure-from-motion (SFM) approaches in computer vision. The biggest difference between SLAM and SFM is that SFM considers mostly batch processing while SLAM typically requires on-line real-time performance. The most important problem that has to be addressed in bearing only SLAM is landmark initialization, again because a single observation does not allow all degrees of freedom to be determined. A particle filter used to represent the unknown initial depth of features has been proposed in [3]. The drawback of the approach is that the initial distribution of particles has to cover all possible depth values for a landmark which makes it difficult to use when the number of detected features is large. A similar approach has been presented in [9] where the initial state is approximated using a Gaussian Sum Filter for which the computational load grows exponentially with number of landmarks. The work in [10] proposes an approximation with additive growth. Similarly to our work, a multiple view approach has been presented in [6]. This work demonstrates the difficulties related to landmark reconstruction when the robot performs only translational motion along the optical axis. To cope with the reconstruction problem, a stereo based SLAM method was presented in [17] where Difference-of-Gaussians (DoG) is used to detect distinctive features which are then matched using SIFT descriptors. One of the important issues mentioned is that their particle filter based approach is inappropriate for large-scale and textured environments. The contribution of our work is that we deal with this problem using a feature detector that produces fewer features (presented in more detail in Section 4). In our work we use only a few high quality features for performing SLAM.


We have also considered another problem raised in [17], related to time-consuming feature matching, and use KD-trees to make our matching process very fast. The visual features used in our work are Harris corner features across different scales, represented by a Laplacian pyramid for feature detection. For feature matching, we use a combination of a modified SIFT descriptor and KD-trees.

In man-made environments, there are many repetitive features and a single SIFT descriptor is not discriminative enough in itself to solve the data association problem. To deal with this problem, an approach using "chunks" of SIFT points to represent landmarks in an outdoor environment has been presented in [12]. This is motivated by the success that SIFT has had in recognition applications where an object/scene is represented as a set of SIFT points. In our approach, the position of a landmark is defined by a series of single modified SIFT points representing different views of the landmark. Each such point is accompanied by a chunk of descriptors that make the matching/recognition of landmarks more robust. Our experimental evaluation also shows that our approach performs successful matching even with a narrow field of view, which was mentioned as a problem in [6], [17].

One of the more challenging problems in SLAM is loop closing. In [15], visually salient so-called "maximally stable extremal regions" or MSERs, encoded using SIFT descriptors, are used to detect when the robot is revisiting an area. A number of approaches have been presented for other sensory modalities [7]. We also show how our framework can be used for this purpose.

In our work, a distinction is made between recognition and location features. A single location feature will be associated with several recognition features. The recognition features' descriptors then give robustness to the match between the location features in the map and the features in the current image. The key idea is to use a few high quality features to define the location of landmarks and then use the other features for recognition.

3 Landmark Initialization In this work, we use feature matching across N frames to determine which points make good landmarks. Features that are successfully matched over a number of frames are candidates for landmarks in the map. Having a buffer of N frames also allows us to calculate an estimate of the 3D position of the corresponding landmark. The SLAM process is fed with data from the output side of the buffer, i.e. with the data from frames that are delayed N steps with respect to the input side of the buffer. Fig. 1 illustrates the idea. Since an estimate of the 3D position of landmarks can be supplied with the first observation of a landmark presented to the SLAM process (see Fig. 1), the landmarks can immediately be fully initialized in the SLAM process. This allows immediate linearization without the need to apply multiple hypotheses [10] or particle filtering [3] techniques to estimate the depth. It is important


Fig. 1. Features are tracked over N frames in a buffer to allow for quality assessment and feature selection. The 3D position is estimated and landmarks are formed from features that are tracked over many frames and are of high quality. The input to the SLAM process is delayed by N frames, but in return an estimate of the 3D position of the point landmarks can be supplied.

to point out that the approximate 3D position found from the buffer of frames is only used to initialize the point landmark at the correct depth with respect to the camera at the first observation. In a sense it tells us right away which of the multiple hypotheses about the depth is correct, which of course saves computations. Having the correct depth allows us, as said before, to handle the linearization errors that would result from having a completely wrong estimate of the depth.

In our experience, having the SLAM output lag N frames is not a problem. For cases where the current position of the robot is needed, such as when performing robot control, a prediction can be made using dead-reckoning forward in time from the robot pose given by the SLAM process. For typical values of N, the addition to the robot position error caused by the dead-reckoning is small, and we believe that the benefits of being able to initialize landmarks using bearing-only information and perform feature quality checks are more significant. It is also important to understand that we predict forward the short distance from the SLAM time to the current time in every iteration, so the errors from the prediction do not accumulate.

When selecting which features to use as landmarks in the map, several criteria are considered: i) the feature is detected in more than a predefined number of frames, ii) the image positions of the feature allow good triangulation, and iii) the resulting 3D point is stable over time in the image. The first requirement removes noise and moving targets. The second removes features that have a high triangulation uncertainty (small baseline, features with bearings near the direction of motion). The third requirement removes features that lack sharp positions in all images due to parallax or the lack of a strong maximum in scale space. Differences in the scales of the images can also cause apparent motion of features, for example at the corner of a non-textured object.

In our implementation the length of the buffer, i.e. the number N, is fixed. We have tested values between 10 and 50. Since one of the key purposes of the buffer is to allow for 3D estimation, we require that the camera has moved before adding a new frame to the buffer. This way, it is likely that there is a baseline for estimating the location. The value of N depends very much on the motion of the robot/camera and the camera parameters. For a narrow field


of view camera mounted in the direction of motion of the robot, as in our case, the effective baseline will be quite small. As part of our future work we plan to compare these results with those obtained using an omnidirectional camera.
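A rough sketch of the kind of 3D estimate that can be extracted from the buffer is given below (a least-squares ray-intersection illustration under the stated criteria, added here; it is not the paper's exact procedure, and the thresholds are assumptions):

```python
import numpy as np

def triangulate_bearings(camera_positions, bearings):
    """Least-squares 3D point from several camera centers and unit bearing
    vectors: minimizes the sum of squared distances to the bearing rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(camera_positions, bearings):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)      # projector orthogonal to the ray
        A += P
        b += P @ np.asarray(c, dtype=float)
    return np.linalg.solve(A, b)

def is_landmark_candidate(camera_positions, min_frames=10, min_baseline=0.05):
    """Crude version of criteria i) and ii): tracked long enough and with
    enough baseline between the first and last observation."""
    if len(camera_positions) < min_frames:
        return False
    baseline = np.linalg.norm(np.asarray(camera_positions[-1]) - np.asarray(camera_positions[0]))
    return baseline >= min_baseline
```

Criterion iii), the stability of the point over the buffer, would then be checked by reprojecting the triangulated point into each frame and thresholding the residuals.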

4 Feature Description

It has been shown in [14] that the SIFT descriptor [11] is the most robust with respect to scale and illumination changes. The original version of the SIFT descriptor uses feature points determined by the peaks of a series of Difference of Gaussians (DoG) at varying scales. In our system, peaks are found using Harris-Laplace features [13], since they respond to regions of high curvature instead of the blob-like image structures obtained by a series of DoG. This leads to features that are accurately localized spatially, which is essential when features are used for reconstruction and localization instead of just recognition. In addition, Harris-Laplace generates on average one fourth as many points [8] as the regular DoG approach, which is an important benefit for SLAM where we strive to keep the number of features low for computational reasons.

The original SIFT descriptor assigns a canonical orientation at the peak of the smoothed gradient histogram. This means that similar corners with a significant rotation difference can have similar descriptors. In a sparse indoor environment many of the detected features come from corners, and there may potentially be many false matches. For example, the four corners of the waste bin in Fig. 2 might all match if rotated. Therefore, we use a rotationally 'variant' SIFT descriptor: we do not apply the canonical orientation at the peak of the smoothed gradient histogram and leave the gradient histogram as it is.

Fig. 2. Example image from an indoor environment showing that many corner structures are very similar if rotated. Examples are the corners of the waste bin and the window sections on the door.


5 Feature Tracking

As outlined in Section 3, a buffer with the last N frames is stored in memory. The image data itself does not need to be stored; it is the feature points extracted from the frames that are stored. The feature points are tracked in consecutive image frames as the robot moves. The similarity between two feature points is the distance between their descriptors, which are vectors in a 128-dimensional space. To manage the matching between frames, lists with associations/points are maintained. Fig. 3 shows the basic organization of this frame memory. On the left we have the buffer with the N frames drawn vertically, with the input side at the top and the output side, which is fed into the SLAM process, at the bottom. On the right side of the buffer in the frame memory we show the association lists that keep track of which feature points in the different frames have been matched. Each such association list item corresponds to one landmark in the world.

The SIFT descriptor of a feature changes when the camera moves. However, given that the feature is continuously tracked, this change in descriptor is typically small between consecutive frames and can be handled. Each association/point in the world, p_i, will have a set of descriptors d_j corresponding to different vantage points, one for each of the tracked features over the N frames. These lists can be analyzed to judge the quality of the corresponding landmark candidate, as described in Section 3. The output from the tracking module is a selection of the points in the oldest frame in the buffer. These points correspond either to already initialized landmarks or to new landmarks of high quality that can be initialized. Along with new landmarks, an estimate of the 3D position is provided. This estimate is only used in the initialization step and thus only has to be computed for new landmarks. The number of points that are output is only a small fraction of all points detected in a frame.

To make the matching procedure faster and more robust, we predict the approximate image location of the features from previous frames using odometry and optical flow estimates. The buffer allows us to predict the position of features detected not only in the previous frame but also in frames before that. We make one prediction for each potential landmark in the old frames, i.e., we only use the last observation of features that have been matched between frames. One important observation in our investigation was that even with very small changes between frames the same features would typically not be detected in every frame, which would mean losing track of most features if only matching against the previous frame. This observation is true also when using the DoG detector for the features.

Using the prediction allows us to reduce the search window when looking for matches between frames. Features that do not match the predicted positions of points currently in the frame memory are matched to landmarks in the database. This also allows for fast detection of potential loop-closing situations. The first time the location of a landmark has been established through triangulation and it has met the other criteria listed in Section 3, it is added


to the database as a new location feature. This is discussed further in the next section.
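The sketch below illustrates, in simplified Python, the gated frame-to-frame matching described above (the descriptor threshold and window size are assumptions; the real system also uses odometry and optical flow for the prediction):

```python
import numpy as np

def match_to_frame_memory(new_points, predicted, max_pixel_dist=20.0, max_desc_dist=0.4):
    """new_points: list of (xy, descriptor) from the current frame.
    predicted: dict landmark_id -> (predicted_xy, last_descriptor).
    Returns {index of new point -> landmark_id}; unmatched points are later
    tried against the landmark database."""
    matches = {}
    for i, (xy, desc) in enumerate(new_points):
        best_id, best_dist = None, max_desc_dist
        for lm_id, (pred_xy, pred_desc) in predicted.items():
            if np.linalg.norm(np.asarray(xy) - np.asarray(pred_xy)) > max_pixel_dist:
                continue                      # outside the reduced search window
            d = np.linalg.norm(np.asarray(desc) - np.asarray(pred_desc))
            if d < best_dist:
                best_id, best_dist = lm_id, d
        if best_id is not None:
            matches[i] = best_id
    return matches
```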

... ...

... ... ...

... Output frame

Landmarks

FN

... ...

.......

.......

...

F2

...

F1

Recognition

... ...

N

.......

...

Input frame

Location

Associations/ Points

...

FRAMEMEMORY Frames

Fig. 3. A schematic view of the frame memory that stores the points and the associations between points in the N last frames and the database. The points in the frame memory that have been matched to or added to the database have a reference to the corresponding landmark there. Each landmark in the database has a set of descriptors that correspond to location features seen from different vantage points. To validate a match, each of these descriptors keeps a list of the other descriptors found in the same frame. We refer to these as recognition descriptors. They provide the ability to "recognize" a landmark again.

6 Database Management As the robot moves along in the environment, features will leave its field of view and thus eventually also the frame memory described in Section 5. In SLAM, it is important for the robot to detect if it is revisiting an area and find the correspondence between the new features in the image and existing landmarks. In fact, this is an issue not only when revisiting an area after closing a large loop but also when turning the camera back to an area that has not been looked at for a while. To handle this we use a database that stores information about the appearance of the landmarks from different view points. That is, we let the SLAM process take care of estimating the position of the landmarks and leave it to the data base to deal with the appearance and matching of them.


In our database, each landmark has a set of SIFT descriptors that correspond to different vantage points. These descriptors are provided by the frame memory, which matches the image points between frames and stores this in the association/point lists. A new descriptor is added to a landmark when it differs by more than a certain threshold from all other descriptors attached to that landmark. Fig. 3 shows the structure of the database, where the landmarks are denoted F1, F2, ..., FN. The dashed box contains the descriptors for each of the landmarks. A KD-tree representation and a Best-Bin-First [1] search allow for real-time matching between new image feature descriptors and those in the database.

There may potentially be many similar descriptors corresponding to different features. Looking once again at the image in Fig. 2, the four corresponding corner points of the glass windows on the left-hand door section will all be very similar. Trying to find a correspondence between image features in a new frame and the map landmarks, with a bit of uncertainty induced by a loop, can be hard based only on a single SIFT descriptor. Therefore, we require multiple matches to increase the robustness [6]. We refer to these multiple descriptors as recognition descriptors and to the corresponding image features as recognition features. That is, when matching a feature to the database we first look for matches between its location descriptor¹ and the descriptors in the database. Then, we verify the match using the corresponding two sets of recognition descriptors. As an additional test, it is required that the displacement in image coordinates for the descriptor is consistent with the transformation between the two frames estimated from the matched recognition descriptors. Currently, the calculation is simplified by checking the 2D image point displacement. This final confirmation eliminates matches that are close in the environment and thus share recognition descriptors, such as would be the case with the glass windows in Fig. 2.

To summarize, the matching of new features with the database proceeds as follows:

1. Find matching candidates by matching against the set of location descriptors in the database (dashed box in the database in Fig. 3). The KD-tree representation allows for fast matching and an effective way to eliminate all but a few of the potential location feature matches in the database.
2. Validate the candidates using all extracted descriptors from the current frame, i.e. the recognition features, and the recognition features associated with the matches from step 1.
3. Confirm the candidate by checking that the motion implied by all the matches is consistent.

Each feature in a new image is a potential location feature and the rest of the features in that frame will be its recognition features. A feature can thus be both a location feature and act as recognition feature for one or more other location features in that frame.
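The structure of this matching procedure can be prototyped in a few lines of Python. The sketch below is illustrative only: it assumes NumPy descriptor arrays, uses a scipy KD-tree in place of the Best-Bin-First search, and the names and thresholds (Landmark, match_to_database, dist_thresh) are invented and do not appear in the paper. Step 3, the 2D displacement consistency check, is only indicated by a comment.

```python
import numpy as np
from scipy.spatial import cKDTree   # stands in for the paper's KD-tree / Best-Bin-First search

class Landmark:
    def __init__(self, lid, descriptors, recognition_sets):
        self.lid = lid
        # location descriptors, one per vantage point (e.g. 128-dim SIFT vectors)
        self.descriptors = descriptors
        # recognition_sets[k]: descriptors seen in the same frame as descriptors[k]
        self.recognition_sets = recognition_sets

def build_index(landmarks):
    """Stack all location descriptors into one KD-tree and remember their owners."""
    rows, owners = [], []
    for lm in landmarks:
        for k, d in enumerate(lm.descriptors):
            rows.append(d)
            owners.append((lm, k))
    return cKDTree(np.asarray(rows)), owners

def match_to_database(feature, frame_descriptors, tree, owners,
                      dist_thresh=0.3, min_recognition_matches=3):
    # Step 1: candidate location matches via the KD-tree.
    dists, idxs = tree.query(feature, k=5)
    candidates = [owners[i] for d, i in zip(dists, idxs) if d < dist_thresh]
    # Step 2: validate each candidate against the other descriptors extracted
    # from the current frame (its recognition features).
    for lm, k in candidates:
        votes = sum(
            np.min(np.linalg.norm(frame_descriptors - rec, axis=1)) < dist_thresh
            for rec in lm.recognition_sets[k])
        if votes >= min_recognition_matches:
            # Step 3 (2D image displacement consistency) would be checked here.
            return lm
    return None
```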


7 Using the Location Features for SLAM and Localization

We have seen in the previous sections how features are tracked between frames using a buffer of N frames and how keeping this buffer allows for judging the quality of potential landmarks and for finding an estimate of the 3D position of the landmarks before they are initialized in SLAM. We use an EKF based implementation for SLAM, but the output from the frame memory and database can be fed into any number of SLAM algorithms. The main difference between the EKF implementation used here for SLAM and the standard formulation is that we supplement the first bearing-only measurement of a new landmark with additional information about the approximate distance as determined through the triangulation in the frame buffer. The distance information is not given in the form of an EKF measurement; these are always only bearings. It is used more as if an oracle told the SLAM filter at what depth to initialize a new landmark. As a result an accurate linearization can be made in the EKF as of the first sighting, without the need for any special arrangement to account for an unknown depth as in e.g. [3, 10].

Once the map is built, it can be used for localization. Since the landmarks in the database are distinguishable, the robot can recognize areas that are part of its map using a single landmark. Thus the robot can automatically initialize itself in the map, after which the map can be used to track the robot pose. This is similar to the situation when closing a loop and having to make the association between new measurements and old landmarks in the map. In the latter case one has a rough idea about where to look for such a match. When performing global localization the uncertainty is the entire space and the search for matches can thus become quite expensive. The framework presented in this work allows for fast matching against the database, which we will further investigate in the experimental section.

One of the benefits of our representation with multiple descriptors for the landmarks, each of them encoding the appearance from a certain vantage point, is that this information can be used to deduce the approximate position of the robot from a single point observation. To allow for this, the pose from which each descriptor was first observed is stored along with the descriptor. The idea is not to get an exact position from an observation but to narrow down the area in which the robot is likely to be.

To find the pose of the robot in the map we initialize a particle filter as soon as a match to the database is found. The particles are initially distributed around the pose indicated by the database match. The orientation for each particle in the initial sample set is given by the bearing angle of the observation. This cuts the unconstrained degrees of freedom down to two. Given that the 3D position of the landmark is known, that the floor is flat, and that the bearing to the landmark is measured not only in the xy-plane but also in the vertical direction, the distance to the landmark can also be estimated from a single measurement.


In our current implementation we have not incorporated this last piece of information. After the first observation, only a few further database matches are then required to reduce the spread of the particles enough to say that the position is known. In our current implementation we initialize a normal EKF localization algorithm at this point.
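As a rough illustration of how a single database match can seed the particle filter described above, the sketch below spreads particles around the pose stored with the matched descriptor and fixes each particle's heading from the measured bearing to the landmark. The particle count and position spread are invented for the example; the values used on the robot are not given in the text.

```python
import numpy as np

def init_particles_from_match(stored_pose_xy, landmark_xy, measured_bearing,
                              n_particles=500, pos_sigma=0.5):
    """stored_pose_xy: (x, y) from which the matched descriptor was first observed.
    landmark_xy: 2D map position of the matched landmark.
    measured_bearing: bearing to the landmark in the robot frame (rad)."""
    xy = np.random.normal(stored_pose_xy, pos_sigma, size=(n_particles, 2))
    # Choose each particle's heading so that the landmark is seen under the
    # measured bearing; this removes one of the three degrees of freedom.
    world_angle = np.arctan2(landmark_xy[1] - xy[:, 1], landmark_xy[0] - xy[:, 0])
    theta = world_angle - measured_bearing
    weights = np.full(n_particles, 1.0 / n_particles)
    return np.column_stack([xy, theta]), weights
```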

8 Experimental Evaluation

The experimental evaluation has been carried out on a PowerBot platform from ActivMedia. It has a differential drive base with two rear caster wheels. The camera used in the experiments is a Canon VC-C4 CCD camera. This camera has built-in pan-tilt-zoom capabilities, but in the experiments these were all fixed, with a slight tilt upwards to reduce the amount of floor in the image. The non-holonomic motion constraints of the base make it hard to generate large baselines for triangulation, as the motion is mostly along the optical axis. This, in combination with the relatively narrow field of view, contributes to the difficulty of the problem.

Fig. 4. The experimental PowerBot platform with the Canon VC-C4 camera mounted on the laser scanner.

The experimental evaluation is carried out in two steps. In the first step we let the robot build up a database of visual landmarks while feeding them into SLAM to build a map with the 3D positions of these landmarks. This test will show that (i) our framework produces few but stable landmarks well suited for map building, and (ii) we can match new observations to the database when closing the loop.

The field of view of the Canon VC-C4 is about 45◦ in the horizontal plane and 35◦ vertically when the camera is in “wide-angle mode” as in our experiments.


In the second step the robot is given the task of finding its position in this map, given the database and the map. This test will underscore the ability to match observations to the database without false matches and highlight the strength our representation provides to a localization system. The setting for the experiments is an area around an atrium that consists of loops of varying sizes.

In the map building experiment we let the robot drive 3 laps around a part of the environment, each lap being approximately 30 m long. This trajectory is shown along with a map built using a laser scanner in Fig. 5 (dark “circular” path). The experiment took 8 minutes and 40 seconds and the time to process the data was 7 min 7 s on a 1.8 GHz laptop computer, which shows that the system can run in real-time even if all unmatched features are matched to the database in every processed frame. After the first loop the database/map consisted of 98 landmarks. The landmarks are shown as red/dark squares in Fig. 5. After the 3 loops were completed the map consisted of 113 landmarks. This shows that the robot was successfully matching most of the observations to the database during the last two laps. Otherwise one would expect the map to be roughly 3 times the size it had after the first loop.

There are two important characteristics to note about the map with 3D visual landmarks. Firstly, there are far fewer features than are typically seen in other works where SIFT-like features are used in the map [17, 16]. This can be attributed to using only the most stable point features as SLAM landmarks and the rest for recognition/matching of those landmarks. Real-time performance was not demonstrated in [17, 16].

Secondly, the landmarks are well localized, which can be seen by comparing with the walls from the superimposed map built using the laser scanner. Some of the landmarks are lamps hanging from the ceiling, such as the one at the upper right corner of the robot path in Fig. 5. This lamp is also visible in the upper image on the right side of the same figure. The area to the right in the map with a large number of landmarks is shown in the lower right image in Fig. 5. This area has many objects at differing depths. In the back against the walls there are five heavily textured paintings. The line in the laser based map comes from the benches that are in front of the wall, which accounts for the line of visual landmarks behind the line made using the laser. Part of the spread is also due to uncertainty in the depths of the landmarks. The robot moved close to orthogonal to the wall, creating very little baseline in the data fed into SLAM. It is worthwhile repeating that the depth estimate from the frame memory only serves to get the initialization of the depth roughly right to reduce the linearization errors; the real estimate of the 3D position is calculated through the SLAM process using the bearing-only measurements.

Depending on the scene, a typical frame has between 40 and 100 point features. The time to perform the tracking over frames has constant complexity.

Note that the laser-based map is only shown for reference. Only vision was used in the experiments.


Fig. 5. The SLAM map is shown as red squares at the locations of landmarks. To help visualize the environment, a separately made laser scanner map is superimposed on this map. Also shown are the trajectories from when the map was built (3 loops) and when the robot was trying to localize. The feature match between the evening database image (top) and the daytime localization experiment (middle) is indicated. The bottom image corresponds to the area of the map where the density of landmarks is greatest.

Out of the features in each new frame, as many as half typically do not match any of the old features in the frame memory and are thus matched to the database. The matching to the database uses the KD-tree in the first step, which makes this first step fast. This often results in only a few possible matching candidates. A typical landmark in the database has around 10 descriptors acquired from different viewing angles. As we noted while building this map, the landmarks inserted into the database during the first loop were then matched and updated during the two subsequent loops with no difficulty.

In order to demonstrate the usefulness of this result we then used the map and database in the second experiment to localize the robot at a different time of day, when there was sunlight streaming through the windows and students milling about. When performing localization we do not need to estimate the 3D location of the points and thus the output from the frame memory is shifted to be at the input side, i.e., without the delay. The


frame memory is still used to track the points over time, but not to verify their quality, which is most important when building the map. The robot was given no initial pose information relative to the database and map. In Fig. 5 the path of the robot in this second experiment is overlaid. The robot initially moves around in the lower left corner of the figure before it moves to the right for some time and then eventually moves up into regions that were mapped in the first experiment. The robot moves in unmapped areas for almost 6 minutes. During this time it is constantly trying to find matches between unmatched features in the image and the database. More than 1,000 images are added to the frame memory and matched without any false matches to the database. When the robot enters an area where it can observe landmarks from the database/map it almost immediately recognized a landmark and a particle filter consistent with the observation was initialized. Within seconds, after repeated observations of several landmarks, the particles had converged enough to initialize the EKF localization. The evolution of the particle filter from initialization to convergence is shown in Fig. 6 in four frames from left to right. In the last frame in this figure, the precision of the localization is indicated by the alignment of the laser scan (not used for localization, only for comparison) and the walls in the map.

Fig. 6. Here we show the evolution of the localization particle filter. On the left we see the initial distribution after the first match to the database. On the right the robot is shown after initialization in the map. The laser scan is also shown in the right image to help confirm that the localization is indeed correct.

The problems of a narrow field of view for an ordinary camera were overcome in these experiments partly by carefully driving the robot. Better results were observed when the robot was driven so that features remained in view for as long as possible (notice the rather crooked paths in Fig. 5).


9 Conclusion and Future Work

The contributions of the framework presented here are the feature selection and matching that allow for real-time vision based bearing-only SLAM. We distinguish between location and recognition features. The location features correspond to points in 3D for which the robot motion allows good triangulation and which are used as landmarks in the map. The matching is made robust by the inclusion of many recognition features from the image for each location feature.

The use of Harris-Laplace corner detection combined with a scale space maximization gives rotationally variant features, which are more appropriate for the camera motions generated by a camera mounted on a mobile robot. We use three criteria to select features for SLAM: persistence, range estimate quality and image stability. The persistence criterion eliminates spurious and dynamic features by requiring that the feature be observed many times. The quality of the range estimate depends on the motion of the robot relative to the 3D point. If the motion does not produce a sufficient baseline there is no reason to use the associated vision feature. The stability in the image depends on how well the feature is localized in the image.

The evaluation of our vision based landmarks was done on data collected from a robot moving through an indoor environment. We were able to report no false matches and the creation of accurate 3D maps. We also showed that those maps could be used to localize the robot automatically in the environment.

One topic for future research is to actively control the pan-tilt degrees of freedom of the camera on the robot. This would allow the robot to focus on a landmark for a longer time and create a better baseline and thus a more accurate map. Another way to address the problem of the limited field of view is to use an omnidirectional camera, which is also part of the planned future work.

References

1. J. S. Beis and D. G. Lowe. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In IEEE CVPR, pages 1000–1006, 1997.
2. J. A. Castellanos and J. D. Tardós. Mobile Robot Localization and Map Building: A Multisensor Fusion Approach. Kluwer Academic Publishers, 1999.
3. A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In ICCV, 2003.
4. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE TRA, 17(3):229–241, 2001.
5. J. Folkesson, P. Jensfelt, and H. I. Christensen. Vision SLAM in the measurement subspace. In IEEE ICRA05, 2005.
6. L. Goncalves, E. di Bernardo, D. Benson, M. Svedman, J. Ostrovski, N. Karlsson, and P. Pirjanian. A visual front-end for simultaneous localization and mapping. In IEEE ICRA, pages 44–49, 2005.


7. J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. In IEEE Int. Symposium on Computational Intelligence in Robotics and Automation, volume 1, pages 318–325, 1999.
8. Patric Jensfelt, Danica Kragic, John Folkesson, and Mårten Björkman. A framework for vision based bearing only 3D SLAM. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA'06), Orlando, FL, May 2006.
9. N. M. Kwok, G. Dissanayake, and Q. P. Ha. Bearing only SLAM using a SPRT based gaussian sum filter. In IEEE ICRA05, 2005.
10. T. Lemaire, S. Lacroix, and J. Solà. A practical 3D bearing-only SLAM algorithm. In IEEE/RSJ IROS, pages 2757–2762, 2005.
11. David G. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–57, 1999.
12. R. H. Luke, J. M. Keller, M. Skubic, and S. Senger. Acquiring and maintaining abstract landmark chunks for cognitive robot navigation. In IEEE/RSJ IROS, 2005.
13. K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In IEEE ICCV, pages 525–531, 2001.
14. K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In IEEE CVPR, pages 257–263, 2003.
15. P. Newman and K. Ho. SLAM-loop closing with visually salient features. In IEEE ICRA, pages 644–651, 2005.
16. S. Se, D. G. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. IJRR, 21(8):735–58, 2002.
17. R. Sim, P. Elinas, M. Griffin, and J. J. Little. Vision-based SLAM using the Rao-Blackwellised particle filter. In IJCAI Workshop on Reasoning with Uncertainty in Robotics, July 2005.
18. J. D. Tardós, J. Neira, P. M. Newman, and J. J. Leonard. Robust mapping and localization in indoor environments using sonar data. IJRR, 4, 2002.
19. S. Thrun, D. Fox, and W. Burgard. A probabilistic approach to concurrent mapping and localization for mobile robots. Autonomous Robots, 5:253–271, 1998.
20. S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte. SLAM with sparse extended information filters. IJRR, 23(8):690–717, 2004.

Image-Based Visual Servoing for Mobile Robots with Catadioptric Camera

Gian Luca Mariottini, Domenico Prattichizzo, and Andrea Cerbella

Dipartimento di Ingegneria dell'Informazione, Università di Siena, Via Roma 56, 53100 Siena, Italy. {gmariottini,prattichizzo}@ing.unisi.it

Summary. This paper presents an image-based visual servoing strategy for holonomic mobile robots equipped with a catadioptric camera. This kind of vision sensor combines lenses and mirrors to enlarge the field of view. The proposed visual servoing is based on the auto-epipolar property, a special configuration which occurs when the desired and the current views undergo a pure translation. This occurrence is detectable from a set of image points, by observing when the so-called bi-conics have a common intersection. The proposed visual servoing is divided into two sequential steps. A Lyapunov-based stability analysis demonstrates the parametric robustness of the proposed method. Experimental results are presented to show the applicability of our strategy in a real context.

1 Introduction

In this paper we propose an image-based visual servoing (IBVS) strategy for the autonomous navigation of a holonomic mobile robot. Both the current and the desired robot configurations are specified only through a set of feature points acquired by the on-board omnidirectional camera. Differently from position-based visual servoing (PBVS), in IBVS both the control objective and the control laws are directly expressed in the image domain. As a consequence, such schemes do not need any a priori knowledge of the 3D structure of the observed scene points. Moreover, designing the feedback at the sensor level improves the system performance, especially when uncertainties and disturbances affect the model of the robot and the camera calibration [8]. However, system convergence can typically be guaranteed only in a neighborhood of the desired configuration. In [10] a hybrid (2-1/2 D) visual servoing approach for articulated manipulators has been presented. It combines IBVS and PBVS features and guarantees convergence in the whole task space. In particular, camera rotation and translation between the current and the desired views are directly estimated from the observation of planar objects and then used in the control law.



While most of these works deal with articulated manipulators, there has been an increasing interest in the PBVS control of mobile robots, as in [16, 7, 2]. One of the main problems in visual servoing is that of keeping the features in the camera field of view (FOV). This problem is naturally solved if a catadioptric camera is used in the servo loop, since it provides a panoramic FOV. Catadioptric cameras consist of a coupling between mirrors (catoptric elements) and conventional cameras (dioptric elements) [13]. Some PBVS strategies using catadioptric cameras have been presented [4, 3, 17].

In recent years there has been a great interest in vision-based control with central catadioptric cameras. However, some of these works assume the camera-feature distance (in meters) to be known [12, 1]. The original contribution of this work is the design of an IBVS algorithm for holonomic mobile robots with an on-board catadioptric camera that does not need any 3-D metrical information about the scene and that does not need any multiple-view estimation process.

Our control algorithm is based on the execution of two sequential steps. The first one compensates the orientation error, i.e., aligns the robot to the desired configuration by exploiting the auto-epipolar property for panoramic cameras (see Sect. 2). The second step zeroes the translational displacement using feature points to drive the robot to the target.

The paper is organized as follows. In Section 2, the auto-epipolar property is presented. Section 3 introduces the image-based control law. Section 4 describes some experimental results. Conclusions are presented in Section 5.

2 Auto-epipolar Configurations

This section presents the auto-epipolar property for panoramic cameras. The auto-epipolar property [6] was first applied in [15] for vision-based control with pinhole cameras.

Consider the case in Fig. 1(a) with a catadioptric camera made of a parabolic mirror centered at f and an orthographic camera. By construction, every scene point p ∈ IR³ is projected onto the mirror surface at x ∈ IR³ through the focus f. The image point m (pixels) is then obtained via orthographic projection of x. For a more detailed description of the imaging model for catadioptric cameras the reader can refer to [5]. Consider now two panoramic views as in Fig. 1(b), acquired in fc and fd by the same camera and referred to as {c} current and {d} desired, respectively. Without loss of generality, we choose the world reference frame centered at the desired camera frame in fd. This allows us to express all image entities directly in the mirror reference frame, on the mirror surface, if not otherwise specified. Given a pair of views of a scene point pi, there exists a matrix E ∈ IR^{3×3}, called the essential matrix [6], such that

    xdi^T E xci = 0   where   E = t̂ R,   i = 1, ..., N        (1)


Fig. 1. (a) Imaging model of central catadioptric camera. (b) Basic epipolar geometry setup for central catadioptric cameras.

for all N corresponding mirror points xci and xdi obtained via back-projection of the corresponding image points mdi and mci. Here t̂ is the skew-symmetric matrix obtained from t. It is an easy matter to show that when R = I, i.e., for pure translations, the essential matrix is skew symmetric, i.e., E = t̂. Other configurations exist (such as rotations of π around t) providing a skew symmetric essential matrix, as pointed out in [9], but they will not occur in our planar motion case.

Definition 1 (Auto-epipolar configurations). The relative configurations between two cameras yielding a skew-symmetric essential matrix are referred to as auto-epipolar configurations.

Remark 1. We now define the new concept of bi-conic, which will allow us to detect the auto-epipolar configurations for catadioptric cameras directly in the image plane and without any estimation.

Definition 2 (Bi-conics). Consider two projections of the same scene point p onto two corresponding image points md and mc. Overlap the two images. Let xd and xc be their back-projections to their respective mirror surfaces. The bi-conic is the intersection between the plane ψ (with normal vector n = xd × xc) and the desired camera mirror (see Fig. 2).

Proposition 1 (Common intersection line). Consider a set of N corresponding points mdi and mci, i = 1, ..., N, and let xdi and xci be their back-projections onto the mirror. Consider the matrix M ∈ IR^{N×3} containing all the normal vectors ni = xdi × xci of the planes ψi, namely M = [n1, ..., nN]^T. If the matrix M is rank 2, then all planes ψi intersect at a common line ℓ ∈ IR³ or, equivalently, xdi^T ℓ̂ xci = 0, ∀ i. In other terms, ℓ is perpendicular to ni, ∀i.

Note that xc and xd belong to the bi-conic.


Fig. 2. Construction of a bi-conic through the superimposition of xc and xd . xc and xd define a plane with normal n that intersects the mirror at the bi-conic.

Proof. If M is rank 2, then there exists a right null vector ℓ such that Mℓ = 0, namely (xdi × xci)^T ℓ = 0 ∀i. This means that each plane ψi contains the vector ℓ, and since all planes pass through the origin of the desired mirror frame, there exists a unique common intersection line between all planes. The last equivalence comes directly from the triple scalar product property: the expression (xdi × xci)^T ℓ = 0 is equivalent to xdi^T ℓ̂ xci = 0 ∀i.

In conclusion, if N different planes ψi all share a common intersection line, then the bi-conics intersect at two common points. The relationship between bi-conics and auto-epipolar configurations is stated in the following theorem.

Theorem 1. Consider N corresponding point pairs (mdi, mci) in two views and let xdi and xci be their back-projections onto the mirror surfaces. Assume that the matrix A = [a1, a2, ..., aN]^T is of rank 8, where ai = (xci ⊗ xdi)^T, the symbol ⊗ being the Kronecker product and the resulting vector ai ∈ IR^{9×1}. The N planes ψi with normal ni = xdi × xci all intersect at only one line if and only if the essential matrix E is skew symmetric.

Proof. (⇒) If all planes ψi with normal vector ni = xdi × xci intersect at a single line then the matrix M = [n1, ..., nN]^T is rank 2. This is equivalent to saying that there exists a subspace S of dimension equal to one (i.e. a line) that belongs to all planes. In other terms,

    ∃ ℓ ∈ S : (ni)^T ℓ = 0   ∀ i = 1, 2, ..., N.


Fig. 3. The visual servoing algorithm steers the holonomic camera-robot from the starting configuration centered at fc (no common intersection between bi-conics exists), through an intermediate configuration when both views have the same orientation (common intersection exists), towards the position fd (s(t) = 0).

From the triple scalar product property, this is equivalent to xdi^T ℓ̂ xci = 0 ∀i. Since E is unique under the full rank hypothesis on A [9], then E = ℓ̂, i.e., E is skew symmetric.

(⇐) Since rank(E) = 2, then ∃ w ≠ 0 s.t. Ew = 0. If E is skew symmetric, then we distinguish two cases [9]: E1 = t̂ and E2 = t̂ e^{ûπ} (where u = t/||t||). If we consider E1 then the epipolar constraint reads as:

    xdi^T E1 xci = (xdi × xci)^T t = 0   ∀ i = 1, ..., N.        (2)

It is an easy matter to show that E2 = −t̂, and then the same epipolar constraint of (2) is verified also for E2. Equation (2) is equivalent to imposing that all the planes ψi with normal vector ni intersect at a common line t, i.e., the baseline.

Fig. 3 (bottom-left) shows the image plane bi-conics when the planes ψi do not share a common intersection line, i.e., the cameras are not in auto-epipolar configuration (Fig. 3 (upper-left)). On the other hand, Fig. 3 (bottom-center) shows the image plane bi-conics when all planes ψi have a common intersection line and the cameras are in auto-epipolar configuration (Fig. 3 (upper-center)). The previous theorem and proposition provide an important tool to recognize when the cameras are in auto-epipolar configuration. It is worthwhile to mention that to build the matrix M we only need to know the back-projected mirror points xci and xdi, which can be retrieved from the image points


Fig. 4. Singular value ratio ξ = σ3²/σ2 with respect to the current camera orientation variation. ξ turned out to be convex in many simulation results.

mci and mdi without any time-consuming estimation process. It is paramount to note that in the case of planar camera motion, the auto-epipolar condition can be checked when at least N = 3 corresponding points are observed [9], also in the case of coplanar 3D scene points.

To test if M is rank deficient we exploit the smallest singular value σ3 obtained by means of the SVD of M. As the displacement approaches a pure translation, the smallest singular value approaches zero. In practical cases, however, we experimentally verified that better results are obtained if the smallest singular value is squared and normalized with respect to the second singular value:

    ξ = σ3² / σ2        (3)

If the two cameras are purely translated then ξ is exactly zero, as shown in Fig. 4.

Remark 2. Function ξ will be used as a control variable for the rotational part of the proposed visual servoing. Note that the derivative of ξ is an odd function in a very large neighborhood of θ = 0° (Fig. 4).
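A minimal numerical sketch of this rank test, under the assumption that the back-projected mirror points are available as NumPy arrays, is given below; the function name is ours and the normalization follows Eq. (3).

```python
import numpy as np

def auto_epipolar_ratio(x_d, x_c):
    """x_d, x_c: (N, 3) arrays of corresponding back-projected mirror points
    (N >= 3) in the desired and current view. Returns xi = sigma_3**2 / sigma_2."""
    M = np.cross(x_d, x_c)                        # rows are the normals n_i = x_di x x_ci
    sigma = np.linalg.svd(M, compute_uv=False)    # singular values in descending order
    return sigma[2] ** 2 / sigma[1]

# xi is (close to) zero exactly when all bi-conic planes share a common
# intersection line, i.e. when the two views differ by a pure translation.
```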

3 The Image-Based Control Law

In this section we present an image-based visual servoing control law for holonomic mobile robots equipped with a catadioptric camera. This control law (1st step) is based on the auto-epipolar property for panoramic cameras presented in the previous section.

Let q(t) = [x(t) y(t) θ(t)]^T be the robot configuration vector with respect to the world frame ⟨Oxyz⟩w. The holonomic robot kinematic model can be written as

    ẋ = vx,   ẏ = vy,   θ̇ = ω        (4)

where v = [vx vy]^T is the translational velocity and ω is the rotational velocity. Suppose that a catadioptric camera is fixed on the mobile robot such that its optical axis is perpendicular to the xy-plane.

We suppose that a target image, referred to as desired, has been previously acquired in the desired configuration qd = [0 0 0]^T. The current image is available at each time instant from the camera-robot in the current position. As shown in the initial setup of Fig. 3 (upper-left), the robot disparity between the current and the desired poses is characterized by a rotation of an angle θ and a translation t ∈ IR². The control law will drive this disparity to zero using only visual information (i.e., θ → 0 and ||t|| → 0). In particular, we will regulate separately the rotational disparity (first step) and the translational displacement (second step).

3.1 First Step: Rotation Compensation

The goal of the first step is to drive the current camera-robot to the same orientation as the desired one. This condition is simply detectable when ξ = 0 in (3). During this step, the translational velocities are set to zero, vx = vy = 0, while the angular velocity is set to

    ω = −αω σs(t) ξ        (5)

where αω is a positive scalar, σs(t) = sign(ξ̇) and ξ is the control variable introduced in (3). The local asymptotic convergence of this control law has been proven in [11] by means of the Lyapunov function V = ½ θ². Although the convergence is local, the domain of attraction for θ is wide. Many simulative and experimental results have shown that convergence is guaranteed with respect to θ in a very large set of angles ([−150°, 150°]) around the equilibrium θ = 0°.

3.2 Second Step: Translation Compensation

Let us now address the second step, which will start when ||ξ(t)|| = 0, i.e., when only a pure translation is needed to reach the target configuration. Actually, in the experiments, a user-defined threshold ε is considered. During this translational motion to the desired position, all bi-conics Cdi intersect and the corresponding feature points mci will slide onto the conic Cdi toward mdi. The translation will stop when all features match, i.e., when s(t) = Σi si(t) is equal to zero, si(t) being the arc-length of the conic Cdi from mci(t) to mdi(t) (see Fig. 3 bottom-right). The direction of translation to the desired configuration fd can be retrieved from the knowledge of the epipole cd pointing


toward fc, chosen as the intersection between bi-conics. In [11] it has been shown that

    v = −s(t) [cdx cdy]^T        (6)

guarantees that the robot converges to the target configuration.
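Putting the two steps together, a schematic control loop might look like the following sketch. It assumes that the vision front-end supplies ξ, the sign of its derivative, the summed arc-length s(t) and the epipole direction; the gains, the threshold, and the extra scaling factor k_v that converts the pixel quantity s(t) into a velocity (Eq. (6) uses the raw product) are placeholders and are not the values used in the experiments.

```python
import numpy as np

def visual_servoing_step(meas, state, alpha_omega=0.5, k_v=0.01, eps=1e-6):
    """meas supplies xi, the sign of its derivative, the summed arc-length s
    (pixels) and the epipole direction (c_dx, c_dy); returns ((vx, vy, omega), state)."""
    if state == "rotate":
        # First step, Eq. (5): rotate until xi ~ 0 (auto-epipolar configuration).
        omega = -alpha_omega * meas["xi_dot_sign"] * meas["xi"]
        if abs(meas["xi"]) < eps:
            state = "translate"
        return (0.0, 0.0, omega), state
    # Second step, Eq. (6): translate along the epipole direction; the speed
    # shrinks as the remaining arc-length s(t) goes to zero.
    c = np.asarray(meas["epipole_dir"], dtype=float)
    c = c / (np.linalg.norm(c) + 1e-12)
    v = -k_v * meas["s"] * c
    return (float(v[0]), float(v[1]), 0.0), state
```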

4 Experiments

In order to validate the proposed image-based visual servoing, a series of experiments have been carried out at the SIena Robotics and Systems lab (SIRSLab). The experimental setup consists of a NOMAD XR4000 holonomic robot with a catadioptric camera mounted on it. The catadioptric camera is made of a CCD color camera LU175C by Lumenera screwed on a folded catadioptric mirror NetVision360 by Remote Reality. The whole catadioptric camera can be modeled by the equation zm + a/2 = (xm² + ym²)/(2a) (Fig. 1(a)) where a = 33.4 mm. The orthographic camera parameters are fx = 13 mm, fy = 14 mm, u0 = 616.3 and v0 = 628.2 pixels. All image acquisition and processing algorithms are realized using Intel's OpenCV libraries on a notebook endowed with a 2 GHz Pentium 4 processor.

A set of n = 3 feature points has been selected in the image in order to show the applicability of our technique with the theoretically minimum number of points. This implies that we have three bi-conics on the image plane, as shown in Fig. 5. The initial configuration of the camera-robot is set to qini = [1 m 1 m π/4 rad] while the desired one is set to qdes = [0 0 0], as shown in Fig. 5(a), where the desired and the initial robot configurations are superimposed. In the initial configuration, since the orientation error is not zero, the bi-conics do not intersect at the same two points (Fig. 5(b)).

First, the rotational controller of Sect. 3.1, based on the auto-epipolar property, is used to compensate the rotation (see Fig. 5(c)) with respect to the desired configuration, until all bi-conics intersect, as in Fig. 5(d). Note that in the real experiment, notwithstanding image noise and unmodeled dynamics, the variable ξ is very close to zero at the end of the first step, as shown in Fig. 6(a). After the first step, the translation discussed in Sect. 3.2 is performed. The robot moves along the line fc fd and its velocity decreases as the distance si(t) gets close to zero (see Fig. 6(b)).

The overall accuracy of the visual servoing strategy was good: the final configuration differs from the desired one by only 1.96 cm, corresponding to an average distance of about 5 pixels between final and target feature points. The complete motion of the feature points is shown in Fig. 7. Note that the trajectory discontinuity of the features is due to the change of the control law between the first and the second step. Note also that the perfect bi-conic intersection (i.e., equal orientation to the desired position) is lost during the second step (compare Fig. 5(f) with


Fig. 5. Image-based visual servoing: 2-steps algorithm. (a) Initial and desired robot positions; (b) When a rotational disparity occurs, the bi-conics do not all intersect; (c) Rotation compensation after the 1st step; (d) After the 1st step all bi-conics intersect at the same two points; (e) Translational motion to the target position; (f) Current feature points (dot) reach the desired ones (cross).

Fig. 5(d)). The main reason for this error is the slippage of the wheels during the pure translation. In order to improve performance and to increase robustness with respect to wheel slippage and other unmodeled dynamics and errors, we implemented an iterative procedure as in [14] to robustly perform the proposed visual servoing. Experiments have also been executed with this iterative implementation and more details are given in [11]. Globally, the iterated 2-steps algorithm exhibits good performance for the robot motion, and the positioning error between the final and the desired camera-robot configurations decreases from 2 cm to 0.5 cm (iterative).


Fig. 6. Visual servoing (2-steps). (a) First step: Parameter ξ(t) decreases (rotation compensation); (b) Second step: Distance s(t) between corresponding feature points measured along bi-conics decreases.

Fig. 7. Visual servoing. Motion of the feature points in the image plane from the initial (circle) to the desired image (cross).

5 Conclusions and Future Work

In this paper we presented an image-based visual servoing scheme for holonomic mobile robots equipped with a catadioptric camera. This vision sensor combines


lenses and mirrors to enlarge the field of view. The main feature of our technique consists in exploiting the auto-epipolar property, a special configuration which occurs when the desired and the current views undergo a pure translation. This condition can be detected directly in the image plane, without requiring any estimation, and it has been used to compensate the rotational disparity between the initial and the desired image (1st step). Then a feature-based translational step (2nd step) is needed to drive the robot to the target position. Experimental results on a real robot validate the applicability of our image-based strategy. A current hypothesis of this work is that the camera calibration parameters are known. Future work will deal with the extension to uncalibrated catadioptric cameras.

References

1. J. P. Barreto, F. Martin, and R. Horaud. Visual servoing/tracking using central catadioptric images. In 8th International Symposium on Experimental Robotics, pages 863–869, 2002.
2. F. Conticelli, B. Allotta, and P. K. Khosla. Image-based visual servoing of nonholonomic mobile robots. In 38th IEEE Conference on Decision and Control, 1999.
3. N. Cowan, O. Shakernia, R. Vidal, and S. Sastry. Vision-based follow the leader. In IEEE Intelligent Robots and Systems, pages 1797–1801, 2003.
4. J. Gaspar, N. Winters, and J. Santos-Victor. Vision-based navigation and environmental representations with an omnidirectional camera. IEEE Transactions on Robotics and Automation, volume 16, pages 890–898, 2000.
5. C. Geyer and K. Daniilidis. A unifying theory for central panoramic systems. In 6th European Conference on Computer Vision, pages 445–462, 2000.
6. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
7. K. Hashimoto and T. Noritsugu. Visual servoing of nonholonomic cart. In IEEE International Conference on Robotics and Automation, pages 1719–1724, 1997.
8. S. A. Hutchinson, G. D. Hager, and P. I. Corke. A tutorial on visual servo control. IEEE Transactions on Robotics and Automation, 12(5):651–670, 1996.
9. Y. Ma, S. Soatto, J. Košecká, and S. Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer Verlag, New York, 2003.
10. E. Malis, F. Chaumette, and S. Boudet. 2-1/2-D visual servoing. IEEE Transactions on Robotics and Automation, 15(2):238–250, 1999.
11. G. L. Mariottini and D. Prattichizzo. Image-based visual servoing for holonomic robots with central catadioptric camera. Internal Report, http://www.dii.unisi.it/~sirslab/MaPr_IR05.pdf, September 2005. University of Siena.
12. Y. Mezouar, H. Haj Abdelkader, P. Martinet, and F. Chaumette. Central catadioptric visual servoing from 3d straight lines. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 343–349, 2004.
13. S. K. Nayar. Catadioptric omnidirectional camera. In International Conference on Computer Vision and Pattern Recognition, pages 482–488, 1997.


14. G. Oriolo and M. Vendittelli. A framework for the stabilization of general nonholonomic systems with an application to the plate-ball mechanism. IEEE Transactions on Robotics, 21(2):162–175, 2005.
15. J. Piazzi, N. Cowan, and D. Prattichizzo. Auto-epipolar visual servoing. In Proc. IEEE Conference on Intelligent Robots and Systems, 2004.
16. R. Pissard-Gibollet and P. Rives. Applying visual servoing techniques to control a mobile hand-eye system. In IEEE Transactions on Robotics and Automation, pages 166–171, 1995.
17. K. Usher, P. Ridley, and P. Corke. Visual servoing for a car-like vehicle - An application of omnidirectional vision. In 2003 IEEE International Conference on Robotics and Automation, pages 4288–4293, 2003.

Multi-knowledge Approach for Mobile Robot Identification of a Changing Environment

QingXiang Wu¹,², T. M. McGinnity¹, Girijesh Prasad¹, David Bell² and Brendan Glackin¹

¹ School of Computing and Intelligent Systems, University of Ulster at Magee, Londonderry, BT48 7JL, N. Ireland, UK [email protected]
² Department of Computer Science, Queens University, Belfast, UK [email protected]

Summary. In this paper, a large environment is divided into sub-areas to enable a robot to apply precise localization technology efficiently in real time. Sub-area features are represented in a feature information system so that conventional machine learning or data mining approaches can be applied to identify the sub-areas. However, conventional representations with a single body of knowledge encounter many problems when the sub-area features are changed. In order to deal with changing environments, the multi-knowledge approach is applied to the identification of environments. Multi-knowledge is extracted from a feature information system by means of multiple reducts (feature sets) so that a robot with multi-knowledge is capable of identifying an environment with some changing features. A case-study demonstrates that a robot with multi-knowledge can cope better with the identification of an environment with changing features than a conventional single body of knowledge.

1 Introduction

Current robot localization techniques include position tracking techniques [1, 2, 3], global localization techniques [4], Markov localization [5, 6, 7], landmark localization [8, 9, 10], and vision-based localization [11, 12, 13, 14]. These techniques aim to estimate the position of a robot within a given environment by means of sensor data and a map of the environment. In order to estimate a robot's position with high precision, a fine-grained discretization of the state space is usually applied. The spatial resolution is usually between 5 cm and 50 cm and the angular resolution is usually 2 to 5 degrees. For a medium-sized environment of 40 x 40 m², an angular grid resolution of 2°, and a cell size of 10 x 10 cm², the state space consists of 28,800,000 states.
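As a quick check, the quoted number of states follows directly from the stated resolutions:

```python
# State-space size for a 40 x 40 m environment with 10 x 10 cm cells
# and a 2-degree angular grid, as quoted above.
cells_per_axis = round(40 / 0.1)        # 400
headings = round(360 / 2)               # 180
print(cells_per_axis ** 2 * headings)   # 28800000 states
```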



If a map of an environment is very large, localization techniques incur a high computational cost and are very difficult to use in real time. In this paper, a strategy is proposed to address this problem. A large map may be divided into area maps, and a robot can apply conventional machine learning approaches to recognize area environment features, for example, in a building, in a room, or in a car park. The robot then carries out precise localization in a small area using specific information on that area environment, such as an electronic map. However, a conventional single body of knowledge encounters problems in a changing environment. A novel multi-knowledge approach is proposed to solve this problem. In Section 2, an environment feature decision system is used to divide a large map into sub-maps so that conventional machine learning [15] and data mining methods [16, 17] can be applied to identify the environment. The problems with conventional approaches are analyzed in Section 3. In Section 4 the multi-knowledge representation is defined and it is demonstrated that a robot with multi-knowledge has the ability to identify its environment even if some features of the environment have changed. The conclusion is given in Section 5.

2 Representation of Environment Features

2.1 Environment Features

In order to illustrate the application of multi-knowledge to robot identification of a changing environment, suppose that there is a robot in a building with six rooms, as shown in Fig. 1. Rooms No. 1 to No. 3 are offices. The area marked 4 is the corridor, and rooms No. 5 and No. 6 are laboratories. Features of the environment are selected from those that can be detected by the sensors in the robot. In this case, the features of the environment are Area, GroundColour, WallColour, CeilingHeight, and Illumination. After the robot has gone through all the rooms in the building, the environment feature information system is obtained. The colours are obtained from a colour camera. Illumination is detected by photosensitive sensors. Distances are detected by distance detectors.

Fig. 1. A robot in a building with six rooms


Fig. 2. A robot detects a room area

Area is calculated from the distance data. As shown in Fig. 2, let r(θ) represent the distance from the robot to a wall along the direction θ. The distance r(θ) can be detected by distance detectors in the robot. The area can be calculated as follows:

    s = (1/2) ∫₀^{2π} r²(θ) dθ        (1)

For simplicity, r(θ) is represented by a set of distances r0, r1, r2, ..., rN with angular interval Δθ = 2π/N. The area between rn and rn+1 can be calculated using a small triangle if Δθ is very small:

    sn = (1/2) rn rn+1 Δθ        (2)

The room area can then be calculated by

    s = Σ_{n=0}^{N−1} sn = Σ_{n=0}^{N−1} (1/2) rn rn+1 Δθ        (3)

where r0 = rN = r(θ = 0). Note that the robot can compute the area with this formula at any point in the room. This is an approximate formula; the error depends on how small the angular interval Δθ is. The other features can be detected by the corresponding sensors in the robot. An example environment information system is shown in Table 1.
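Equation (3) is straightforward to evaluate from a full 360° range scan; the short sketch below does so, wrapping the last beam onto the first as required by r0 = rN. The function name and the example values are ours.

```python
import numpy as np

def room_area(ranges):
    """ranges: N distances r_0..r_{N-1} measured at equal angular steps
    dtheta = 2*pi/N over a full revolution; implements Eq. (3)."""
    r = np.asarray(ranges, dtype=float)
    dtheta = 2.0 * np.pi / len(r)
    r_next = np.roll(r, -1)            # r_{n+1}, with r_N wrapped back to r_0
    return 0.5 * np.sum(r * r_next) * dtheta

# A circular room of radius 3 m sampled with 360 beams gives roughly
# pi * 3**2 = 28.27 m^2.
print(room_area(np.full(360, 3.0)))
```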


Table 1. Room Feature Decision System. S: Area, GC: GroundColor, WC: WallColor, CH: CeilingHeight, B: Illumination, R: Room

U   S    GC      WC      CH    B   R
1   9    yellow  yellow  2.6   1   1
2   12   yellow  white   2.6   2   2
3   9    blue    white   2.6   1   3
4   13   gray    white   2.8   2   4
5   15   gray    yellow  3.0   3   5
6   15   yellow  white   3.0   3   6

2.2 Environment Feature Instance System

Following [17], let I = ⟨U, A ∪ D⟩ represent an instance system, where U = {u1, u2, ..., ui, ..., u|U|} is a finite non-empty set, called an instance space or universe, and where ui is called an instance in U. A = {a1, a2, a3, ..., ai, ..., a|A|}, also a finite non-empty set, is a set of attributes of the instances, where ai is an attribute of a given instance. D is a non-empty set of decision attributes, and A ∩ D = ∅.

For every a ∈ A there is a domain, represented by Va, and there is a mapping a(u) : U → Va from U into the domain Va, where a(u) represents the value of attribute a of instance u and is a value in the set Va:

    Va = a(U) = {a(u) : u ∈ U}   for a ∈ A        (4)

For a decision system, the domain of a decision attribute is represented by

    Vd = d(U) = {d(u) : u ∈ U}   for d ∈ D        (5)

For example, Table 1 is regarded as an environment feature decision system in which the information is detected by a robot that has gone through all the rooms in the building shown in Fig. 1. The attribute set is A = {Area, GroundColor, WallColor, CeilingHeight, Illumination}. The decision attribute is D = {Room}. According to Equation (4), VArea = {9, 12, 13, 15}, so |VArea| = 4. By analogy, we have |VGround| = 3, |VWall| = 2, |VCeiling| = 2, |VLight| = 3, and the size of the decision attribute domain is |VRoom| = 6.

2.3 Condition Vector Space

The condition vector space, which is generated from the attribute domains Va, is denoted by

    V×A = ×_{a∈A} Va = Va1 × Va2 × ... × Va|A|        (6)

and the size of the space is

    |V×A| = Π_{i=1}^{|A|} |Vai|        (7)


For Table 1,

    |V×A| = Π_{i=1}^{|A|} |Vai| = |VArea| × |VGround| × |VWall| × |VCeiling| × |VLight|        (8)

    |V×A| = 4 × 3 × 2 × 2 = 48        (9)

This means that there are 48 condition vectors in the condition vector space V×A. Every condition vector corresponds to a combination of attribute values. Every instance corresponds to a vector in the vector space through its attribute values:

    A(u) = (a1(u), a2(u), ..., a|A|(u))        (10)

For example, A(Room1) = (9, yellow, yellow, 2.6, 1). If A(u) = A(v) for u ∈ U and v ∈ U, instance u and instance v have the same condition vector; instance u and instance v are then indiscernible. A(U) represents the set of vectors which exist in the decision system:

    A(U) = {A(u) : u ∈ U}        (11)

If |A(U)| = |V×A|, the system is called a complete instance system or complete system. In the real world, training sets for decision-making or feature classification are rarely complete systems. Table 1 has only 6 existing condition vectors, while the size of the condition space is |V×A| = 48. Thus Table 1 is not a complete instance system.

The decision vector space is represented by

    V×D = ×_{d∈D} Vd = Vd1 × Vd2 × ... × Vd|D|        (12)

Vd is equivalent to V×D in cases in which there is only one decision attribute.

2.4 Conventional Knowledge Representations

Knowledge can be represented in many forms such as rules, decision trees, neural networks, Bayesian belief networks and other methods [15]. In general, knowledge can be defined as a mapping from the condition vector space to the decision space:

    ϕA : V×A → V×D        (13)

Let B ⊂ A. We have

    V×B = ×_{a∈B} Va = Va1 × Va2 × ... × Va|B|        (14)

and

    |V×B| = Π_{i=1}^{|B|} |Vbi|        (15)

|V×B| is called a subspace of |V×A|. The conventional approach in machine learning or data mining is to select a “best” subset B to get a mapping ϕB based on the subset B:

    ϕB : V×B → V×D        (16)

ϕB is then applied instead of ϕA to make decisions or in feature classification. Different mappings can be obtained because the training set is an incomplete system. For example, the decision tree, which is shown in Figure 3, is obtained from Table 1 by means of information entropy.

Fig. 3. Knowledge ϕB represented by a decision tree

The advantage of this approach is that it enables a robot to identify the rooms in the building with only two attributes – Area and Ground.
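Fig. 3 itself is not reproduced here, but the behaviour of this single body of knowledge can be read off from the traversal results quoted in the next section; the sketch below is a hypothetical reconstruction of ϕB as nested tests on Area and GroundColor (the exact split order inside the original tree is not stated in the text).

```python
def single_knowledge_room(area, ground):
    """One body of knowledge phi_B with B = {Area, Ground}, reconstructed
    from Table 1; it answers 'Unknown' for any unseen attribute combination."""
    if area == 9:
        return {"yellow": "Room1", "blue": "Room3"}.get(ground, "Unknown")
    if area == 12:
        return "Room2"
    if area == 13:
        return "Room4"
    if area == 15:
        return {"gray": "Room5", "yellow": "Room6"}.get(ground, "Unknown")
    return "Unknown"

# The changed room 3 of Table 2 (Area 9, Ground brown) is no longer recognized:
print(single_knowledge_room(9, "brown"))   # -> Unknown
```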

3 Problems with Single Knowledge

A single body of knowledge encounters problems when dealing with changing attributes, incomplete attributes, and unseen instances. Issues arise when using a single body of knowledge in a changing environment. For example, suppose that the features of the rooms in Table 1 have been changed; the new features are shown in Table 2.

Using the decision tree in Fig. 3, consider a robot that goes through the rooms with the changed features as shown in Table 2. When the robot enters room 1, it obtains the condition vector (9, yellow, yellow, 2.6, 2). As the decision tree only tests Area and GroundColor, it does not care that Illumination has changed.


Table 2. Changed Room Feature Values (original values in parentheses). S: Area, GC: GroundColor, WC: WallColor, CH: CeilingHeight, B: Illumination, R: Room

U   S        GC            WC              CH          B      R
1   9        yellow        yellow          2.6         2(1)   1
2   12       yellow        white           2.8(2.6)    2      2
3   9        brown(blue)   white           2.9(2.6)    3(1)   3
4   13       blue(gray)    yellow(white)   2.8         2      4
5   14(15)   gray          yellow          3.0         3      5
6   13(15)   yellow        white           3.0         3      6

By using the decision tree, the robot can determine that it is in room 1. For the other instances, U2 to U6, the results are listed as follows.

U2: (12, yellow, white, 2.8, 2) → decision tree → Room2.
U3: (9, brown, white, 2.9, 3) → decision tree → Unknown.
U4: (13, blue, yellow, 2.8, 2) → decision tree → Room4.
U5: (14, gray, yellow, 3.0, 3) → decision tree → Unknown.
U6: (13, yellow, white, 3.0, 3) → decision tree → Room4.

In many cases, the decision tree gives a wrong answer or cannot give any answer. This problem is not only encountered by decision trees, but by other methods as well when single body representations are used.

4 Environment Identification Based on Multi-knowledge

A multi-knowledge approach can be applied in the machine-learning domain to solve different problems. For example, a combination of multi-knowledge and the Bayes classifier has been used successfully to improve the accuracy of a classification system [18]. Here we apply the concept of multi-knowledge to enable a robot to identify a changed environment. A single body of knowledge representation usually does not need all attributes in an instance system. For example, the decision tree in Fig. 3 contains only two attributes, Area and GroundColor. This attribute set is called a reduct. In general, there are many reducts in a decision system. In a conventional data mining method, first a good reduct (or a set of good features) is selected, and then a single body of knowledge based on this reduct is extracted. The multi-knowledge approach, however, encourages finding as many reducts as possible. Every reduct can contribute a single body of knowledge. A set of these single bodies of knowledge is called the multi-knowledge.

Definition: Given a decision system I = ⟨U, A ∪ D⟩, multi-knowledge is defined as

    Φ = {ϕB | B ∈ RED}        (17)


where ϕB is a mapping from the condition vector space V×B to the decision space V×D, and RED is a set of reducts of the decision system. The reducts RED can be found by the algorithm in [18]. For Table 1, there are 5 reducts:

RED = {{Area, Wall}, {Area, Ground}, {Ground, Illumination}, {Ground, Ceiling, Wall}, {Wall, Illumination, Ceiling}}

Applying these 5 reducts, 5 single bodies of knowledge can be obtained. For example, let the reduct B = {Area, Wall}. The existing condition vector space is

    A_{Area,Wall}(U) = {(9, yellow), (12, white), (9, white), (13, white), (15, yellow), (15, white)}        (18)

Extracting rules in this condition vector space from Table 1 and generalizing the rules, we have

    ϕ(Area,Wall) = Room1     if (Area, Wall) = (9, yellow)
                   Room2     if (Area) = (12)
                   Room3     if (Area, Wall) = (9, white)
                   Room4     if (Area) = (13)
                   Room5     if (Area, Wall) = (15, yellow)
                   Room6     if (Area, Wall) = (15, white)
                   Unknown   for other cases                      (19)

where ϕ(Area,Wall) is presented using the existing condition vector space A_(Area,Wall) instead of V×(Area,Wall) because the decision system is not a complete system. ϕ(Area,Ground), ϕ(Ground,Illumination), ϕ(Ground,Ceiling,Wall) and ϕ(Illumination,Ceiling,Wall) can be obtained by analogy.

Every instance may get multiple decisions from the multi-knowledge. In order to merge these decisions, a decision support degree, denoted by Sup(di), is defined as a probability as follows:

    Sup(di) = P(di | di = ϕB)   for ϕB ∈ Φ        (20)

where di ∈ Vd is a decision in the decision space and ϕB is a single body of knowledge among the multi-knowledge Φ. The final decision is made by

    dFinal = arg max_{di ∈ Vd} Sup(di)        (21)
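The fusion rule (20)-(21) amounts to letting each single body of knowledge cast a vote and choosing the most supported room; 'Unknown' outcomes simply do not vote. The sketch below illustrates this with each ϕB stored as a lookup table. The tables shown are abbreviated and hypothetical (only three of the five reducts, three rooms each), and Sup is computed as a vote count rather than a normalized probability, which leaves the arg max unchanged.

```python
from collections import Counter

# Condition vector order: (Area, Ground, Wall, Ceiling, Illum).
# Each body of knowledge: (indices of its reduct into the condition vector,
# lookup table built from Table 1). Only three abbreviated bodies are shown.
knowledge = [
    ((0, 2), {(9, "yellow"): "Room1", (12, "white"): "Room2", (9, "white"): "Room3"}),
    ((0, 1), {(9, "yellow"): "Room1", (12, "yellow"): "Room2", (9, "blue"): "Room3"}),
    ((1, 4), {("yellow", 1): "Room1", ("yellow", 2): "Room2", ("blue", 1): "Room3"}),
]

def multi_knowledge_decision(condition):
    """Counts the votes of the single bodies (proportional to Sup) and
    applies Eq. (21)."""
    votes = Counter()
    for idxs, table in knowledge:
        room = table.get(tuple(condition[i] for i in idxs))   # None = 'Unknown'
        if room is not None:
            votes[room] += 1
    return votes.most_common(1)[0][0] if votes else "Unknown"

# Changed room 3 from Table 2: only the (Area, Wall) body still matches.
print(multi_knowledge_decision((9, "brown", "white", 2.9, 3)))   # -> Room3
```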

Now consider a robot that has the multi-knowledge Φ extracted from Table 1 and that enters the changed environment shown in Table 2. For instance U3 in Table 2,

    ϕ(Area,Wall)                 = ϕ(9, white)           = Room3
    ϕ(Area,Ground)               = ϕ(9, brown)           = Unknown
    ϕ(Ground,Illumination)       = ϕ(brown, 3)           = Unknown
    ϕ(Ground,Ceiling,Wall)       = ϕ(brown, 2.9, white)  = Unknown
    ϕ(Illumination,Ceiling,Wall) = ϕ(3, 2.9, white)      = Unknown        (22)


So Sup(Room1) = 0, Sup(Room2) = 0, Sup(Room3) = 1, Sup(Room4) = 0, Sup(Room5) = 0, Sup(Room6) = 0. The final decision is as follows:

    dFinal = arg max_{di ∈ Vd} Sup(di) = Room3        (23)

It is possible, according to this algorithm, to calculate dFinal for all the instances in Table 2; the results are shown in Table 3. The results show that multi-knowledge can cope with a changing environment much better than any single body of knowledge.

Table 3. Results for multi-knowledge to identify the changed environment

Φ                         U1       U2       U3       U4       U5       U6
ϕ(Area,Wall)              Room1    Room2    Room3    Room4    Unknown  Unknown
ϕ(Area,Ground)            Room1    Room2    Unknown  Room4    Unknown  Unknown
ϕ(Ground,Illum)           Room2    Room2    Unknown  Unknown  Room5    Room6
ϕ(Ground,Ceiling,Wall)    Room1    Unknown  Unknown  Unknown  Room5    Room6
ϕ(Illum,Ceiling,Wall)     Unknown  Unknown  Unknown  Unknown  Room5    Room6
dFinal                    Room1    Room2    Room3    Room4    Room5    Room6

5 Conclusion In this paper the problem of a robot identifying its location in a changing environment has been considered. A large environment is divided into small sub-area environments. An instance information system is applied to represent the environment features of these sub-area environments, and conventional machine learning approaches are applied to identify them. The problem of identifying a changing environment was analyzed; a robot using conventional machine learning mechanisms finds this problem difficult to solve. In contrast, a multi-knowledge approach was proposed to solve it. A case study was presented to demonstrate that a robot with multi-knowledge copes with a changing environment much better than a conventional single-body knowledge representation.

References 1. Leonard JJ, Durrant-Whyte HF (1991) Mobile robot localization by tracking geometric beacons, IEEE Transactions on Robotics and Automation, 7(3): 376−382. 2. Feng L, Borenstein J, Everett HR (1994) ”Where am I?” Sensors and methods for autonomous mobile robot positioning. Technical Report UMMEAM9412, University of Michigan, Ann Arbor, MI.


3. Jensfelt P, Kristensen S (2001) Active global localization for a mobile robot using multiple hypothesis tracking, IEEE Transactions on Robotics and Automation, 17(5): 748-760.
4. Dissanayake MW, Newman MG, Clark S, Durrant-Whyte HF, Csorba M (2001) A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3): 229-241.
5. Fox D, Burgard W, and Thrun S (1999) Markov Localization for Mobile Robots in Dynamic Environments. Journal of Artificial Intelligence Research, 11: 391-427.
6. Fox D, Burgard W, Kruppa H, and Thrun S (2000) A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, Special Issue on Heterogeneous Multi-Robot Systems, 8(3): 325-344.
7. Thrun S (2001) A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5): 335-363.
8. Thrun S (1998) Bayesian Landmark Learning for Mobile Robot Localization. Machine Learning, 33(1): 41-76.
9. Shimshoni I (2002) On mobile robot localization from landmark bearings, IEEE Transactions on Robotics and Automation, 18(6): 971-976.
10. Briechle K, Hanebeck UD (2004) Localization of a mobile robot using relative bearing measurements, IEEE Transactions on Robotics and Automation, 20(1): 36-44.
11. Se S, Lowe D, Little J (2001) Vision-based mobile robot localization and mapping using scale-invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 2051-2058, Seoul, Korea.
12. Niall W, Jos SV (2002) Information Sampling for vision-based robot navigation, Robotics and Autonomous Systems, 41(2-3): 145-159.
13. Ayache N (1991) Artificial Vision for Mobile Robots - Stereo-vision and Multisensor Perception, MIT-Press, page 342.
14. Kiyosumi K, Jun M, Yoshiaki S (2002) Autonomous visual navigation of a mobile robot using a human-guided experience, Robotics and Autonomous Systems, 40(2-3): 121-130.
15. Mitchell MT (1997) Machine Learning, McGraw Hill, Co-published by the MIT Press Companies, Inc.
16. Lin TY, Cercone N (eds) (1997) Rough Sets and Data Mining: Analysis for Imprecise Data, Boston, Mass; London: Kluwer Academic.
17. Polkowski L, Tsumoto S, Lin TY (2000) Rough Set Methods and Applications, New Developments in Knowledge Discovery in Information Systems, Physica-Verlag, A Springer-Verlag Company.
18. Wu QX, Bell DA, McGinnity TM (2005) Multi-knowledge for Decision Making, International Journal of Knowledge and Information Systems, Springer-Verlag London Ltd. 7(2): 246-266.

Robust Monte-Carlo Localization Using Adaptive Likelihood Models
Patrick Pfaff1, Wolfram Burgard1, and Dieter Fox2
1 Department of Computer Science, University of Freiburg, Germany, {pfaff,burgard}@informatik.uni-freiburg.de
2 Department of Computer Science & Engineering, University of Washington, USA, [email protected]

Summary. In probabilistic mobile robot localization, the development of the sensor model plays a crucial role as it directly influences the efficiency and the robustness of the localization process. Sensor models developed for particle filters compute the likelihood of a sensor measurement by assuming that one of the particles accurately represents the true location of the robot. In practice, however, this assumption is often strongly violated, especially when using small sample sets or during global localization. In this paper we introduce a novel, adaptive sensor model that explicitly takes the limited representational power of particle filters into account. As a result, our approach uses smooth likelihood functions during global localization and more peaked functions during position tracking. Experiments show that our technique significantly outperforms existing, static sensor models.

1 Introduction Probabilistic mobile robot localization estimates a robot’s pose using a map of the environment, information about the robot’s motion, and sensor measurements that relate the robot’s pose to objects in the map. The sensor model plays a crucial role as it directly influences the efficiency and the robustness of the localization process. It specifies how to compute the likelihood p(z | x, m) or short p(z | x) of a measurement z given the vehicle is at position x in the environment m. A proper likelihood function has to take various sources of noise into account, including sensor uncertainty, inaccuracy of the map, and uncertainty in the robot’s location. An improperly designed likelihood function can make the vehicle overly confident in its position and in this way might lead to a divergence of the filter. On the other hand, the model might be defined in a too conservative fashion and this way prevent the robot from localizing as the sensor information cannot compensate the uncertainty introduced by the motion of the vehicle. Monte Carlo localization (MCL) is a particle-filter based implementation of recursive Bayesian filtering for robot localization [2, 6]. In each iteration of



MCL, the likelihood function p(z | x) is evaluated at sample points that are randomly distributed according to the posterior estimate of the robot location. Sensor models developed for MCL typically assume that the vehicle location x is known exactly; that is, they assume that one of the sampling points corresponds to the true location of the robot. However, this assumption is only valid in the limit of infinitely many particles. Otherwise, the probability that the pose of a particle exactly corresponds to the true location of the robot is virtually zero. As a consequence, these likelihood functions do not adequately model the uncertainty due to the finite, sample-based representation of the posterior. In practice, a typical solution to this problem is to artificially inflate the noise of the sensor model. Such a solution is not satisfying, however, since the additional uncertainty due to the sample-based representation is not known in advance; it strongly varies with the number of samples and the uncertainty in the location estimate.

In this paper we introduce a novel, adaptive sensor model that explicitly takes location uncertainty due to the sample-based representation into account. In order to compute an estimate of this uncertainty, our approach borrows ideas from techniques developed for density estimation. The goal of density estimation is to extract an estimate of the probability density underlying a set of samples. Most approaches to density estimation estimate the density at a point x by considering a region around x, where the size of the region typically depends on the number and local density of the samples (for instance, see Parzen window or k-nearest neighbor approaches [4]).

The density estimation view suggests a way for computing the additional uncertainty that is due to the sample-based representation: When evaluating the likelihood function at a sample location, we consider the region a density estimation technique would take into account when estimating the density at that location. The likelihood function applied to the sample is then computed by integrating the point-wise likelihood over the corresponding region. As a result, the likelihood function automatically adapts to the local density of samples, being smooth during global localization and highly "peaked" during position tracking. Our approach raises two questions.

1. How large should the region considered for a sample be?
2. How can we efficiently determine this region and integrate the observation likelihood over it?

While our idea can be applied to arbitrary particle filter approaches, this paper focuses on how to address these questions in the context of mobile robot localization. In particular, we estimate the region associated to a particle using a measure applied in k-nearest neighbor density estimation, in which the region of a point grows until a sufficient number of particles lies within it. We show that by considering the simple case of k = 1 and learning the appropriate smoothness of the likelihood function, we can effectively improve the speed required for global localization and at the same time achieve high accuracy during position tracking.


This paper is organized as follows. After discussing related work in the next section, we briefly describe Monte Carlo localization in Section 3. Our approach to dynamically adapt the likelihood model is presented in Section 4. Finally, in Section 5, we present experimental results illustrating the advantages of our model.

2 Related Work In the literature, various techniques for computing the likelihood function for Monte Carlo localization can be found. They depend mainly on the sensors used for localization and the underlying representation of the map m. For example, several authors used features extracted from camera images to calculate the likelihood of observations. Typical features are average grey values [2], color values [8], color transitions [10], feature histograms [17], or specific objects in the environment of the robot [9, 12, 13]. Additionally, several likelihood models for Monte-Carlo localization with proximity sensors have been introduced [1, 15]. These approaches either approximate the physical behavior of the sensor [7] or try to provide smooth likelihood models to increase the robustness of the localization process [14]. Whereas these likelihood functions allow a mobile robot to be reliably localized in its environment, they have the major disadvantage that they are static and basically independent of the state of the localization process. In the past, it has been observed that the likelihood function can have a major influence on the performance of Monte-Carlo localization. For example, Pitt and Shepard [11] as well as Thrun et al. [16] observed that more particles are required if the likelihood function is peaked. In the limit, i.e., for a perfect sensor, the number of required particles becomes infinite. To deal with this problem, Lenser and Veloso [9] and Thrun et al. [16] introduced techniques to directly sample from the observation model and in this way ensure that there is a critical mass of samples at the important parts of the state space. Unfortunately, this approach depends on the ability to sample from observations, which can often only be done in an approximate, inaccurate way. Alternatively, Pitt and Shepard [11] apply a Kalman filter lookahead step for each particle in order to generate a better proposal distribution. While this technique yields superior results for highly accurate sensors, it is still limited in that the particles are updated independently of each other. Hence, the likelihood function does not consider the number and density of particles. Another way of dealing with the limitations of the sample-based representation is to dynamically adapt the number of particles, as done in KLD sampling [5]. However, for highly accurate sensors, even such an adaptive technique might require a huge number of samples in order to achieve a sufficiently high particle density during global localization. The reader may notice that Kalman filter based techniques do consider the location uncertainty when integrating a sensor measurement. This is done


by mapping the state uncertainty into observation space via a linear approximation. However, Kalman filters have limitations in highly non-linear and multi-modal systems, and the focus of this paper is to add such a capability to particle filters. More specifically, in this paper we propose an approach that dynamically adapts the likelihood function during localization. We dynamically calculate for each particle the variance of the Gaussian governing the likelihood function depending on the area covered by that particle.

3 Monte Carlo Localization To estimate the pose x of the robot in its environment, we consider probabilistic localization, which follows the recursive Bayesian filtering scheme. The key idea of this approach is to maintain a probability density p(x) of the robot's own location, which is updated according to the following rule:

p(xt | z1:t, u0:t−1) = α · p(zt | xt) · ∫ p(xt | ut−1, xt−1) · p(xt−1) dxt−1     (1)

Here α is a normalization constant ensuring that p(xt | z1:t , u0:t−1 ) sums up to one over all xt . The term p(xt | ut−1 , xt−1 ) describes the probability that the robot is at position xt given it executed the movement ut−1 at position xt−1 . Furthermore, the quantity p(zt | xt ) denotes the probability of making observation zt given the robot’s current location is xt . The appropriate computation of this quantity is the subject of this paper. Throughout this paper we use a sample-based implementation of this filtering scheme also denoted as Monte Carlo localization [2, 6]. In Monte-Carlo localization, which is a variant of particle filtering [3], the update of the belief generally is realized by the following two alternating steps: 1. In the prediction step, we draw for each sample a new sample according to the weight of the sample and according to the model p(xt | ut−1 , xt−1 ) of the robot’s dynamics given the action ut−1 executed since the previous update. 2. In the correction step, the new observation zt is integrated into the sample set. This is done by bootstrap re-sampling, where each sample is weighted according to the likelihood p(zt | xt ) of sensing zt given by its position xt .
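As a rough illustration of these two alternating steps, the following Python sketch performs one prediction/correction cycle of MCL; the motion and measurement models (sample_motion_model, measurement_likelihood) are placeholders standing in for the models described above, not implementations from the paper.

```python
import random

def mcl_update(particles, weights, u, z, sample_motion_model, measurement_likelihood):
    """One MCL iteration: predict with the motion model, then weight and resample."""
    # Prediction step: propagate each particle through the motion model p(x_t | u_{t-1}, x_{t-1}).
    predicted = [sample_motion_model(x, u) for x in particles]

    # Correction step: weight each particle by the observation likelihood p(z_t | x_t).
    new_weights = [w * measurement_likelihood(z, x) for w, x in zip(weights, predicted)]
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]

    # Bootstrap resampling: draw particles proportional to their importance weights.
    resampled = random.choices(predicted, weights=new_weights, k=len(predicted))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform
```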

4 Dynamic Sensor Models The likelihood model p(z | x) plays a crucial role in the correction step of the particle filter. Typically, very peaked models require a huge number of particles. This is because even when the particles populate the state space very densely, the likelihoods of an observation might differ by several orders of


magnitude. As the particles are drawn proportional to the importance weights, which themselves are calculated as the likelihood of zt given the pose xt of the corresponding particle, a minor difference in xt will result in a difference of one order of magnitude in the likelihood and thus will result in a depletion of such a particle in the re-sampling step. Accordingly, an extremely high density of particles is needed when the sensor is highly accurate. On the other hand, the sheer size of the state space prevents us from using a sufficiently large number of particles during global localization in the case that the sensor is highly accurate. Accordingly, the sensor model needs to be less peaked in the case of global localization, when the particles are sparsely distributed over the state space. This is essentially the idea of the approach proposed in this paper.

4.1 Likelihood Model for Range Sensors
The likelihood model used throughout this paper calculates p(z | x) based on the distance d to the closest obstacle in the direction of the measurement given x (and the map m). Accordingly, p(z | x) is calculated as

p(z | x) = p(z | d).     (2)

To determine this quantity, we follow the approach described in Thrun et al. [15] and Choset et al. [1]. The key idea of this sensor model is to use a mixture of four different distributions to capture the noise and error characteristics of range sensors. It uses a Gaussian phit(z | d) to model the likelihood in situations in which the beam hits the next object in the direction of the measurement. Additionally, it uses a uniform distribution prand(z | d) to represent random measurements. It furthermore models objects not contained in the map as well as the effects of cross-talk between different sensors by an exponential distribution pshort(z | d). Finally, it deals with detection errors, in which the sensor reports a maximum range measurement, using a uniform distribution pmax(z | d). These four different distributions are mixed by a weighted average, defined by the parameters αhit, αshort, αmax, and αrand:

p(z | d) = (αhit, αshort, αmax, αrand) · (phit(z | d), pshort(z | d), pmax(z | d), prand(z | d))^T     (3)

Note that these parameters need to satisfy the constraints that none of them should be smaller than zero and that p(z | d) integrates to 1 over all z for a fixed d. A plot of this sensor model for a given set of parameters is shown in Figure 1. Also note that the exponential distribution is only used to model measurements that are shorter than expected, i.e., for measurements z with z < d. Therefore, there is a discontinuity at z = d (see Thrun et al. [15] for details).
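A minimal sketch of such a mixture is given below; the Gaussian and exponential shapes follow the description above, but the mixing weights, the maximum range, and the remaining parameter values are purely illustrative and the normalization is simplified.

```python
import math

def beam_likelihood(z, d, z_max=8.0,
                    a_hit=0.7, a_short=0.1, a_max=0.1, a_rand=0.1,
                    sigma=0.2, lam=1.0):
    """Mixture likelihood p(z | d) for a single range measurement (illustrative parameters)."""
    # Gaussian around the expected distance d (beam hits the mapped obstacle).
    p_hit = math.exp(-(z - d) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    # Exponential for unexpectedly short readings (unmapped objects, cross-talk); zero for z >= d.
    p_short = lam * math.exp(-lam * z) if z < d else 0.0
    # Point mass for max-range readings, here smeared over a small interval.
    p_max = 1.0 if abs(z - z_max) < 1e-2 else 0.0
    # Uniform over the measurement range for random readings.
    p_rand = 1.0 / z_max if 0.0 <= z <= z_max else 0.0
    return a_hit * p_hit + a_short * p_short + a_max * p_max + a_rand * p_rand
```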


Fig. 1. Sensor model given by a mixture of different distributions (left image) and piecewise linear function used to calculate the standard deviation based on the distance d = 2r to the closest particle.

4.2 Adapting the Variance As already mentioned above, the particle set should approximate the true posterior as closely as possible. Since we only have a limited number of particles, which in practice is often chosen as small as possible to minimize the computational demands, we need to take into account potential approximation errors. The key idea of our approach is to adjust the variance of the Gaussian governing phit(z | d), which models the likelihood of measuring z given that the sensor detects the closest obstacle in the map, such that the particle set yields an accurate approximation of the true posterior. To achieve this, we approximatively calculate for each particle i the space it covers by adopting the measure used in k-nearest neighbor density estimation [4]. For efficiency reasons we rely on the case of k = 1, in which the spatial region covered by a particle is given by the minimum circle that is centered at the particle and contains at least one neighboring particle in the current set. To calculate the radius ri of this circle, we have to take both the Euclidean distance of the positions and the angular difference in orientation into account. In our current implementation we calculate ri based on a weighted mixture of the Euclidean distance and the angular difference

d(x, x') = (1 − ξ)((x1 − x1')² + (x2 − x2')²) + ξ δ(x3 − x3')²,     (4)

where x1 and x1' are the x-coordinates, x2 and x2' are the y-coordinates, and δ(x3 − x3') is the difference in the orientation angles of x and x'. Additionally, ξ is a weighing factor that was set to 0.8 in all our experiments. Figure 2 shows a fraction of a map and a particle distribution. The circle around each particle represents the radius r = ½ d(x, x') to the closest particle x'. The next step is to adjust for each particle the standard deviation σ of the Gaussian in phit(z | d) based on the distance r = ½ d(x, x*), where x* is the particle closest to x with respect to Equation 4. In our current implementation we use a piecewise linear function σ(r) to compute the standard deviation of phit(z | d):

σ(r) = σ1   if αr < σ1
       σ2   if αr > σ2
       αr   otherwise.     (5)

Fig. 2. Fraction of a particle set consisting of 10,000 particles during a global localization run. The circles around the individual particles represent the radius r = ½ d(x, x') calculated from the distance to the particle x' closest to x.

Fig. 3. Distribution of 1,000,000 particles after 0, 1 and 2 integrations of measurements with the sensor model according to the specifications of the SICK LMS sensor.

To learn the values of the parameters σ1 , σ2 , and α of this function, we performed a series of localization experiments on recorded data, in which we determined the optimal values by minimizing the average distance of the obtained distributions from the true posterior. Since the true posterior is unknown, we determined a close approximation of it by increasing the number of particles until the entropy of a three-dimensional histogram calculated from the particles did not change anymore. In our experiment this turned out to be the case for 1,000,000 particles. Throughout this experiment, the sensor model corresponded to the error values provided in the specifications of the laser range finder used for localization. In the remainder of this section, we will denote the particle set representing the true posterior by S ∗ . Figure 3 shows the set S ∗ at different points in time for the data set considered in this paper.
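To illustrate the per-particle adaptation, the following Python sketch computes the nearest-neighbor radius with the weighted distance of Equation (4) and maps it through the piecewise linear function of Equation (5); the values for sigma1, sigma2, and alpha are placeholders, not the learned parameters.

```python
import math

XI = 0.8  # weighing factor between translational and angular distance, as in the text

def weighted_distance(p, q):
    """Weighted mixture of Euclidean and angular difference for poses (x, y, theta), cf. Eq. (4)."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    dtheta = math.atan2(math.sin(p[2] - q[2]), math.cos(p[2] - q[2]))  # wrapped angle difference
    return (1.0 - XI) * (dx ** 2 + dy ** 2) + XI * dtheta ** 2

def adaptive_sigma(particle, particles, sigma1=0.05, sigma2=0.5, alpha=1.0):
    """Per-particle standard deviation of p_hit, clamped as in Equation (5)."""
    # Radius r = 1/2 of the distance to the closest other particle (the k = 1 case).
    r = 0.5 * min(weighted_distance(particle, q) for q in particles if q is not particle)
    return min(max(alpha * r, sigma1), sigma2)
```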


To calculate the deviation of the current particle set S from the true posterior represented by S*, we evaluate the KL-distance between the distributions obtained by computing a histogram from S and S*. Whereas the spatial resolution of this histogram is 40cm, the angular resolution is 5 degrees. For discrete probability distributions, p = p1, ..., pn and q = q1, ..., qn, the KL-distance is defined as

KL(p, q) = Σ_i p_i log2 (p_i / q_i).     (6)
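A direct transcription of this measure, assuming two normalized histograms over the same bins and skipping empty bins of p for numerical safety, might look as follows.

```python
import math

def kl_distance(p, q, eps=1e-12):
    """KL-distance between two discrete distributions given as equal-length lists (Eq. 6)."""
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)
```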

Whenever we choose a new standard deviation for phit (z | d), we adapt the weights αhit , αshort , αmax , and αrand to ensure that the integral of the resulting density is 1. Note that in principle phit (z | d) should encode also several other aspects to allow for a robust localization. One such aspect, for example, is the dependency between the individual measurements. For example, a SICK LMS laser range scanner typically provides 361 individual distance measurements. Since a particle never corresponds to the true location of the vehicle, the error in the pose of the particle renders the individual measurements as dependent. This dependency, for example, should also be encoded in a sensor model to avoid the filter becoming overly confident about the pose of the robot. To reduce potential effects of the dependency between the individual beams of a scan, we only used 10 beams at angular displacements of 18 degrees from each scan.

5 Experimental Results The sensor model described above has been implemented and evaluated using real data acquired with a Pioneer PII DX8+ robot equipped with a laser range scanner in a typical office environment. The experiments described in this section are designed to investigate if our dynamic sensor model outperforms static models. Throughout the experiments we compared our dynamic sensor model with different types of alternative sensor models:

1. Best static sensor model. This model has been obtained by evaluating a series of global localization experiments, in which we determined the optimal variance of the Gaussian by maximizing the utility function

U(I, N) = Σ_{i=1}^{I} (I − i + 1) · Pi / N,     (7)

where I is the number of integrations of measurements into the belief during the individual experiment, N is the number of particles, and Pi is the number of particles lying in a 1m range around the true position of the robot.


2. Best tracking model. We determined this model in the same way as the best static sensor model. The only difference is that we have learned it from situations in which the filter was tracking the pose of the vehicle.
3. SICK LMS model. This model has been obtained from the hardware specifications of the laser range scanner.
4. Uniform dynamic model. In our dynamic sensor model the standard deviation of the likelihood function is computed on a per-particle basis. We also analyzed the performance of a model in which a uniform standard deviation is used for all particles. The corresponding value is computed by taking the average of the individual standard deviations.

Fig. 4. Distribution of 10,000 particles after 1, 2, 3, 5, and 11 integrations of measurements with our dynamic sensor model (left column) and with the best static sensor model (right column).

5.1 Global Localization Experiments The first set of experiments is designed to evaluate the performance of our dynamic sensor model in the context of a global localization task. In the particular data set used to carry out the experiment, the robot started in the kitchen of our laboratory environment (lower left room in the maps depicted in Figure 3). The evolution of a set of 10,000 particles during a typical global localization run with our dynamically adapted likelihood model and with the


Fig. 5. Evolution of the distance d(x, x') introduced in Equation (4). Distribution of 10,000 particles after 1, 3, 4 and 5 integrations of measurements with our dynamic sensor model.


Fig. 6. Percentage of particles within a 1m circle around the true position of the robot with our dynamic sensor model, the uniform dynamic model, and the best static model. The left image shows the evolution depending on the number of iterations for 10,000 particles. The right plot was obtained with 2,500 particles.

best static sensor model is depicted in Figure 4. As can be seen, our dynamic model improves the convergence rate as the particles quickly focus on the true pose of the robot. Due to the dynamic adaptation of the variance, the likelihood function becomes more peaked so that unlikely particles are depleted earlier. Figure 5 shows the evolution of the distance d(x, x') introduced in Equation (4). The individual images illustrate the distribution of 10,000 particles after 1, 2, 3, and 5 integrations of measurements with our dynamic sensor model. The circle around each particle represents the distance r = ½ d(x, x') to the next particle x'. Figure 6 shows the convergence of the particles to the true position of the robot. Whereas the x-axis corresponds to the time step, the y-axis shows the number of particles in percent that are closer than 1m to the true position. Shown are the evolutions of these numbers for our dynamic sensor model,


Fig. 7. Success rates for different types of sensor models and sizes of particle sets. The shown results are computed in extensive global localization experiments.

a uniform dynamic model, and the best fixed model for 10,000 and 2,500 particles. Note that the best static model does not reach 100%. This is due to the fact that the static sensor model relies on a highly smoothed likelihood function, which is good for global localization but does not achieve maximal accuracy during tracking. In the case of 10,000 particles, the variances in the distance between the individual particles are typically so small that the uniform model achieves a similar performance. Still, a t-test showed that both models are significantly better than the best fixed model. In the case of 2,500 particles, however, the model that adjusts the variance on a per-particle basis performs better than the uniform model. Here, the differences are significant whenever they exceed 20. Figure 7 shows the number of successful localizations after 35 integrations of measurements for a variety of sensor models and for different numbers of particles. Here, we assumed that the localization was achieved when the mean of the particles differed by at most 30cm from the true location of the robot. First, it shows that our dynamic model achieves the same performance as the best static model for global localization. Additionally, the figure shows that the static model that yields the best tracking performance has a substantially smaller success rate. Additionally, we evaluated a model, denoted as the SICK LMS model, in which the standard deviation was chosen according to the specifications of the SICK LMS sensor, i.e., under the assumption that the particles in fact represent the true position of the vehicle. As can be seen, this model yields the worst performance. Furthermore, we evaluated how the models perform when only one beam is used per range scan. With this experiment we wanted to analyze whether the dynamic model also yields better results in situations in which there is no dependency between the individual beams of a scan. Again, the plots show that our sensor model outperforms


Fig. 8. Evolution of the average standard deviation during global localization with different numbers of particles and our dynamic sensor model.


Fig. 9. Average localization error of 10,000 (left image) and 2,500 (right image) particles during a position tracking task. Our dynamic sensor model shows a similar performance as the best tracking model.

the model, for which the standard deviation corresponds to the measuring accuracy of the SICK LMS scanner. Finally, Figure 8 plots the evolution of the average standard deviations of several global localization experiments with different numbers of particles. As can be seen from the figure, our approach uses more smoothed likelihood functions when operating with few particles (2,500). The more particles are used in the filter, the faster the standard deviation converges to the minimum value. 5.2 Tracking Performance We also carried out experiments, in which we analyzed the accuracy of our model when the system is tracking the pose of the vehicle. We compared our


dynamic sensor model to the best tracking model and evaluated the average localization error of the individual particles. Figure 9 depicts the average localization error for two tracking experiments with 10,000 and 2,500 particles. As can be seen from the figure, our dynamic model shows the same performance as the tracking model whose parameters have been optimized for minimum localization error. This shows that our dynamic sensor model yields faster convergence rates in the context of global localization and at the same time achieves the best possible tracking performance.

6 Conclusions In this paper we presented a new approach for dynamically adapting the sensor model for particle filter based implementations of probabilistic localization. Our approach learns a function that outputs the appropriate variance for each particle based on the estimated area in the state space represented by this particle. To estimate the size of this area, we adopt a measure developed for density estimation. The approach has been implemented and evaluated in extensive experiments using laser range data acquired with a real robot in a typical office environment. The results demonstrate that our sensor model significantly outperforms static and optimized sensor models with respect to robustness and efficiency of the global localization process.

Acknowledgment The authors would like to thank Dirk Zitterell for fruitful discussions on early versions of this work. This work has partly been supported by the German Research Foundation (DFG) within the Research Training Group 1103 and under research grant SFB/TR-8, and by the National Science Foundation under grant number IIS-0093406.

References 1. H. Choset, K.M. Lynch, S. Hutchinson, G. Kantor, W. Burgard, L.E. Kavraki, and S. Thrun. Principles of Robot Motion Planning. MIT-Press, 2005. 2. F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 1999. 3. A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York, 2001. 4. R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2001.


5. D. Fox. Adapting the sample size in particle filters through KLD-sampling. International Journal of Robotics Research, 22(12):985-1003, 2003.
6. D. Fox, W. Burgard, F. Dellaert, and S. Thrun. Monte Carlo localization: Efficient position estimation for mobile robots. In Proc. of the National Conference on Artificial Intelligence (AAAI), 1999.
7. D. Fox, W. Burgard, and S. Thrun. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research, 11, 1999.
8. H.-M. Gross, A. König, Ch. Schröter, and H.-J. Böhme. Omnivision-based probabilistic self-localization for a mobile shopping assistant continued. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 1505-1511, 2003.
9. S. Lenser and M. Veloso. Sensor resetting localization for poorly modelled mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2000.
10. E. Menegatti, A. Pretto, and E. Pagello. Testing omnidirectional vision-based Monte-Carlo localization under occlusion. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2487-2494, 2004.
11. M. K. Pitt and N. Shephard. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association, 94(446), 1999.
12. T. Röfer and M. Jüngel. Vision-based fast and reactive Monte-Carlo localization. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 856-861, 2003.
13. M. Sridharan, G. Kuhlmann, and P. Stone. Practical vision-based Monte Carlo localization on a legged robot. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2005.
14. S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5):335-363, 2001.
15. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT-Press, 2005.
16. S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1-2), 2001.
17. J. Wolf, W. Burgard, and H. Burkhardt. Robust vision-based localization by combining an image retrieval system with Monte Carlo localization. IEEE Transactions on Robotics, 21(2):208-216, 2005.

Metric Localization with Scale-Invariant Visual Features Using a Single Perspective Camera
Maren Bennewitz, Cyrill Stachniss, Wolfram Burgard, and Sven Behnke
University of Freiburg, Computer Science Institute, D-79110 Freiburg, Germany

Abstract. The Scale Invariant Feature Transform (SIFT) has become a popular feature extractor for vision-based applications. It has been successfully applied to metric localization and mapping using stereo vision and omnivision. In this paper, we present an approach to Monte-Carlo localization using SIFT features for mobile robots equipped with a single perspective camera. First, we acquire a 2D grid map of the environment that contains the visual features. To come up with a compact environmental model, we appropriately down-sample the number of features in the final map. During localization, we cluster close-by particles and estimate for each cluster the set of potentially visible features in the map using ray-casting. These relevant map features are then compared to the features extracted from the current image. The observation model used to evaluate the individual particles considers the difference between the measured and the expected angle of similar features. In real-world experiments, we demonstrate that our technique is able to accurately track the position of a mobile robot. Moreover, we present experiments illustrating that a robot equipped with a different type of camera can use the same map of SIFT features for localization.

1 Introduction Self-localization is one of the fundamental problems in mobile robotics. The topic was studied intensively in the past. Many approaches exist that use distance information provided by a proximity sensor for localizing a robot in the environment. However, for some types of robots, proximity sensors are not the appropriate choice because they do not agree with their design principle. Humanoid robots, for example, which are constructed to resemble a human, are typically equipped with vision sensors and lack proximity sensors like laser scanners. Therefore, it is natural to equip these robots with the ability of vision-based localization.



In this paper, we present an approach to vision-based mobile robot localization that uses a single perspective camera. We apply the well-known Monte-Carlo localization (MCL) technique [5] to estimate the robot's position. MCL uses a set of random samples, also called particles, to represent the belief of the robot about its pose. To locate features in the camera images, we use the Scale Invariant Feature Transform (SIFT) developed by Lowe [15]. SIFT features are invariant to image translation, scale, and rotation. Additionally, they are partially invariant to illumination changes and affine or 3D projection. These properties make SIFT features particularly suitable for mobile robots since, as the robots move around, they typically observe landmarks from different angles and distances, and with a different illumination.

Whereas existing systems that perform metric localization and mapping using SIFT features apply stereo vision in order to compute the 3D position of the features [20, 7, 21, 2], we rely on a single camera only during localization. Since we want to concentrate on the localization aspect, we facilitate the map acquisition process by using a robot equipped with a camera and a proximity sensor. During mapping, we create a 2D grid model of the environment. In each cell of the grid, we store those features that are supposed to be at that 2D grid position. Since the number of observed SIFT features is typically high, we appropriately down-sample the number of features in the final map. During MCL, we then rely on a single perspective camera and do not use any proximity information. Our approach estimates for clusters of particles the set of potentially visible features using ray-casting on the 2D grid. We then compare those features to the features extracted from the current image. In the observation model of the particle filter, we consider the difference between the measured and the expected angle of similar features. By applying the ray-casting technique, we avoid comparing the features extracted out of the current image to the whole database of features (as the above mentioned approaches do), which can lead to serious errors in the data association. As we demonstrate in practical experiments with a mobile robot in an office environment, our technique is able to reliably track the position of the robot. We also present experiments illustrating that the same map of SIFT features can be used for self-localization by different types of robots equipped with a single camera only and without proximity sensors.

This paper is organized as follows. After discussing related work in the following section, we describe the Monte-Carlo localization technique that is applied to estimate the robot's position. In Section 4, we explain how we acquire 2D grid maps of SIFT features. In Section 5, we present the observation model used for MCL. Finally, in Section 6, we show experimental results illustrating the accuracy of our approach to estimate the robot's position.


2 Related Work Monte-Carlo methods are widely used for vision-based localization and have been shown to yield quite robust estimates of the robot's position. Several localization approaches are image-based, which means that they store a set of reference images taken at various locations that are used for localization. Some of the image-based methods rely on an omnidirectional camera in order to localize a mobile robot. The advantages of omnidirectional images are the circular field of view and thus, the knowledge about the appearance of the environment in all possible gaze directions. Recent techniques were for example presented by Andreasson et al. [1] who developed a method to match SIFT features extracted from local interest points in panoramic images, by Menegatti et al. [16] who use Fourier coefficients for feature matching in omnidirectional images, and by Gross et al. [9] who compare the panoramic images using color histograms. Wolf et al. [23] apply a combination of MCL and an image retrieval system in order to localize a robot equipped with a perspective camera. The systems presented by Ledwich and Williams [12] and by Košecká and Li [11] perform Markov localization within a topological map. They use the SIFT feature descriptor to match the current view to the reference images. Whenever using those image-based methods, care has to be taken in deciding at which positions to collect the reference images in order to ensure a complete coverage of the space the robot can travel in. In contrast to this, our approach stores features at the positions where they are located in the environment and not for all possible poses the robot can be in.

Additionally, localization techniques have been presented that use a database of observed visual landmarks. SIFT features have become very popular for metric localization as well as for SLAM (simultaneous localization and mapping, [21, 2]). Se et al. [20] were the first who performed localization using SIFT features in a restricted area. They did not apply a technique to track the position of the robot over time. Recently, Elinas and Little [7] presented a system that uses MCL in combination with a database of SIFT features learned in the same restricted environment. All these approaches use stereo vision to compute the 3D position of a landmark and match the visual features in the current view to all those in the database to find correspondences. To avoid matching the observations to the whole database of features, we present a system that determines the sets of visible features for clusters of particles. These relevant features are then matched to the features in the current image. This way, the number of ambiguities, which can occur in larger environments, is reduced. The relevant features are determined by applying a ray-casting technique in the map of features. The main difference to existing metric localization systems using SIFT features is however that our approach is applicable to robots that are equipped with a single perspective camera only, whereas the other approaches require stereo vision or omnivision.

Note that Davison et al. [3] and Lemaire et al. [13] presented approaches to feature-based SLAM using a single camera. These authors use extended


Kalman filters for state estimation. Both approaches have only been applied to robots moving within a relatively small operational range.

Vision-based MCL was first introduced by Dellaert et al. [4]. The authors constructed a global ceiling mosaic and use simple features extracted out of images obtained with a camera pointing to the ceiling for localization. Systems that apply vision-based MCL are also popular in the RoboCup domain. In this scenario, the robots use environment-specific objects as features [19, 22].

3 Monte-Carlo Localization To estimate the pose xt (position and orientation) of the robot at time t, we apply the well-known Monte-Carlo localization (MCL) technique [5], which is a variant of Markov localization. MCL recursively estimates the posterior about the robot's pose:

p(xt | z1:t, u0:t−1) = η · p(zt | xt) · ∫ p(xt | xt−1, ut−1) · p(xt−1 | z1:t−1, u0:t−2) dxt−1     (1)

Here, η is a normalization constant resulting from Bayes' rule, u0:t−1 denotes the sequence of all motion commands executed by the robot up to time t − 1, and z0:t is the sequence of all observations. The term p(xt | xt−1, ut−1) is called motion model and denotes the probability that the robot ends up in state xt given it executes the motion command ut−1 in state xt−1. The observation model p(zt | xt) denotes the likelihood of making the observation zt given the robot's current pose is xt. To determine the observation likelihood, our approach compares SIFT features in the current view to those SIFT features in the map that are supposed to be visible (see Section 5). MCL uses a set of random samples to represent the belief of the robot about its state at time t. Each sample consists of the state vector xt(i) and a weighting factor ωt(i) that is proportional to the likelihood that the robot is in the corresponding state. The update of the belief, also called particle filtering, is typically carried out as follows. First, the particle states are predicted according to the motion model. For each particle a new pose is drawn given the executed motion command since the previous update. In the second step, new individual importance weights are assigned to the particles. Particle i is weighted according to the likelihood p(zt | xt(i)). Finally, a new particle set is created by resampling from the old set according to the particle weights. Each particle survives with a probability proportional to its importance weight. Due to spurious observations it is possible that good particles vanish because they have temporarily a low likelihood. Therefore, we follow the approach proposed by Doucet [6] that uses the so-called number of effective particles [14] to decide when to perform a resampling step

Neff = 1 / ( Σ_{i=1}^{N} (w(i))² ),     (2)

where N is the number of particles. Neff estimates how well the current particle set represents the true posterior. Whenever Neff is close to its maximum value N, the particle set is a good approximation of the true posterior. Its minimal value 1 is obtained in the situation in which a single particle has all the probability mass contained in its state. We do not resample in each iteration; instead, we only resample each time Neff drops below a given threshold (here set to N/2). In this way, the risk of replacing good particles is drastically reduced.
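As a small illustration of this selective resampling rule, the following Python snippet computes Neff from normalized importance weights and triggers resampling only when it drops below N/2, as described above.

```python
def effective_sample_size(weights):
    """Number of effective particles N_eff for normalized importance weights (Eq. 2)."""
    return 1.0 / sum(w * w for w in weights)

def needs_resampling(weights):
    """Resample only when N_eff drops below half the number of particles."""
    return effective_sample_size(weights) < 0.5 * len(weights)
```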

4 Acquiring 2D Maps of Scale-Invariant Features We use maps of visual landmarks for localization. To detect features, we use the Scale Invariant Feature Transform (SIFT). Each image feature is described by a vector p, s, r, f where p is the subpixel location, s is the scale, r is the orientation, and f is a descriptor vector, generated from local image gradients. The SIFT descriptor is invariant to image translation, scaling, and rotation and also partially invariant to illumination changes and affine or 3D projection. Lowe presented results illustrating robust matching of SIFT descriptors under various image transformations [15]. Mikolajczyk and Schmid compared SIFT and other image descriptors and showed that SIFT yields the highest matching accuracy [17]. Ke and Sukthankar [10] presented an approach to compute a more compact representation for SIFT features, called PCA-SIFT. They apply principal components analysis (PCA) to determine the most distinctive components of the feature vector. As shown in their work, the PCA-based descriptor is more distinctive and more robust than the standard SIFT descriptor. We therefore use that representation in our current approach. As suggested by Ke and Sukthankar, we apply a 36 dimensional descriptor vector resulting from PCA. To acquire a 2D map of SIFT features, we used a B21r robot equipped with a perspective camera and a SICK laser range finder. We steered the robot through the environment to obtain image data as well as proximity and odometry measurements. The robot was moving with a speed of 40cm/s and collected images at a rate of 3Hz . To be able to compute the positions of features and to obtain ground truth data, we used an approach to grid-based SLAM with Rao-Blackwellized particle filters [8]. Using the information about the robot’s pose and extracted SIFT features out of the current camera image, we can estimate the positions of the features in the map. More specifically, we use the distance measurement of the laser beam that corresponds to the horizontal angle of the detected feature and the robot’s pose to calculate the 2D position of the feature. Thus, we assume that the features are located on the obstacles detected by the laser range finder. In the office environment in


which we performed our experiments, this assumption leads to quite robust estimates even if there certainly exist features that are not correctly mapped. In each 2D grid cell, we store the set of features that are supposed to be at that 2D grid position. Currently, we use a grid resolution of 10 by 10cm. In the first stage of mapping, we store all observed features.

After the robot moved through the environment, the number of observed SIFT features is extremely high. Typically, we have 150-500 features extracted per image with a resolution of 320 by 240 pixels. This results in around 600,000 observed features after the robot traveled for 212m in a typical office environment. After map acquisition, we down-sample a reduced set of features that is used for localization. For each grid cell, we randomly draw features. A drawn feature is rejected if there is already a similar feature within the cell. We determine similar features by comparing their PCA-SIFT vectors (see below). We sample a maximum of 20 features for each grid cell. Using the sampling process, features that were observed more often have a higher chance to be selected and features that were detected only once (due to failure observations or noise) are eliminated. The goal of this sampling process is to reduce computational resources and at the same time obtain a representative subset of features. To choose good representatives for the features, a clustering based on the descriptor vectors can be carried out.

The left image of Figure 3 shows a 2D grid map of SIFT features of an office environment that was acquired by the described method. The final map contains approximately 100,000 features. Note that also a stereo camera system, which was not available in our case, would be an appropriate solution for map building. The presented map acquisition approach is not restricted to robots equipped with a laser range finder.
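A rough sketch of this per-cell down-sampling step is given below; the similarity threshold and the descriptor-distance function are placeholders standing in for the PCA-SIFT comparison described in the next section.

```python
import random

MAX_FEATURES_PER_CELL = 20  # as in the text

def downsample_cell(features, descriptor_distance, similarity_threshold=0.3):
    """Randomly draw features for one grid cell, rejecting near-duplicates."""
    kept = []
    for feature in random.sample(features, k=len(features)):  # random draw order
        if len(kept) >= MAX_FEATURES_PER_CELL:
            break
        # Reject the drawn feature if a similar one is already stored in this cell.
        if all(descriptor_distance(feature, other) > similarity_threshold for other in kept):
            kept.append(feature)
    return kept
```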

5 Observation Model for SIFT Features In the previous section, we described how to build a map of SIFT features using a robot equipped with a camera and a proximity sensor. In this section, we describe how a robot without a proximity sensor can use this environmental model for localization with a single perspective camera. Sensor observations are used to compute the weight of each particle by estimating the likelihood of the observation given the pose of the particle in the map. Thus, we have to specify how to compute p(zt | xt). In our case, an observation zt consists of the SIFT features in the current image: zt = {o1, ..., oM} where M is the number of features in the current image. To determine the likelihood of an observation given a pose in the map, we compare the observed features with the features in the map by computing the Euclidean distance of their PCA-SIFT descriptor vectors. In order to avoid comparing the features in the current image to the whole set of features contained in the map, we determine the potentially visible


features. This helps to cope with an environment that contains similar landmarks at different locations (e.g. several similar rooms). In case one matches the current observation against the whole set of features, this leads to serious errors in the data association.

To compute the relevant features, we group close-by particles to a cluster. We determine for each particle cluster the set of features that are potentially visible from these locations. This is done using ray-casting on the feature grid map. To speed up the process of finding relevant features, one could also precompute for each grid cell the set of features that are visible. However, in our experiments, computing the similarity of the feature vectors took substantially longer than the ray-casting operations. Typically, we have 150-500 features per image.

In order to compare two SIFT vectors, we use a distance function based on the Euclidian distance. The likelihood that the two PCA-SIFT vectors f and f' belong to the same feature is computed as

p(f = f') = exp( −‖f − f'‖ / (2 · σ1²) ),     (3)

where σ1 is the variance of the Gaussian. In general, one could use Eq. (3) to determine the most likely correspondence between an observed feature and the map features. However, since it is possible that different landmarks exist that have a similar descriptor vector, we do not determine a unique corresponding map feature for each observed feature. In order to avoid misassignments, we instead consider all pairs of observed features and relevant map features. This set of pairs of features is denoted as C. For each pair of features in C we use Eq. (3) to compute the likelihood that the corresponding PCA-SIFT vectors belong to the same feature. This information is then used to compute the likelihood p(zt | xt(i)) of an observation given the pose xt(i) of particle i, which is required for MCL. Since a single perspective camera does not provide depth information, we can use only the angular information to compute this likelihood. We therefore consider the difference between the horizontal angles of the currently observed features in the image and the features in the map to compute p(zt | xt(i)). More specifically, we compute the distribution over the angular displacement of a particle given the observation and the map. For each particle, we compute a histogram over the angular differences between the observed features and the map features. The x-values in that histogram represent the angular displacement and the y-values its likelihood. The histogram is computed using the pairs of features in C evaluated using Eq. (3). In particular, we compute for each pair (o, l) ∈ C the difference between the horizontal angle at which the feature was observed and the angle at which the feature should be located according to the map and the particle pose. We add the likelihood that these features are equal, which is given by Eq. (3), to


the corresponding bin of the histogram. As a result, we obtain a distribution about the angular error of the particle. In mathematical terms, the value h(b) of a bin b (representing the interval of angular differences from α−(b) to α+(b)) in the histogram is given by

h(b) = β + Σ_{(o,l) ∈ C, α−(b) ≤ α(o) − α(l) < α+(b)} p(fo = fl),     (4)
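To make this angular observation model concrete, the following Python sketch builds such a histogram from candidate feature pairs; the bin width, the value of σ1, and the smoothing constant beta are illustrative placeholders, and each pair is assumed to carry the observed and expected horizontal angles in addition to the two descriptors.

```python
import math

def feature_similarity(f_obs, f_map, sigma1=0.4):
    """Likelihood that two PCA-SIFT descriptors belong to the same feature (cf. Eq. 3)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(f_obs, f_map)))
    return math.exp(-dist / (2.0 * sigma1 ** 2))

def angular_histogram(pairs, bin_width=math.radians(5.0), beta=1e-3):
    """Distribution over the angular displacement of a particle (cf. Eq. 4).

    'pairs' contains tuples (descriptor_obs, descriptor_map, angle_obs, angle_map),
    where the angles are the observed and the expected horizontal angles."""
    histogram = {}
    for f_obs, f_map, angle_obs, angle_map in pairs:
        diff = angle_obs - angle_map
        bin_index = int(math.floor(diff / bin_width))
        # Initialize a bin with the smoothing constant beta, then accumulate similarities.
        histogram[bin_index] = histogram.get(bin_index, beta) + feature_similarity(f_obs, f_map)
    return histogram
```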