Springer Tracts in Advanced Robotics Volume 22 Editors: Bruno Siciliano · Oussama Khatib · Frans Groen
Springer Tracts in Advanced Robotics Edited by B. Siciliano, O. Khatib, and F. Groen
Vol. 21: Ang Jr., H.; Khatib, O. (Eds.) Experimental Robotics IX 618 p. 2006 [3-540-28816-3] Vol. 20: Xu, Y.; Ou, Y. Control of Single Wheel Robots 188 p. 2005 [3-540-28184-3] Vol. 19: Lefebvre, T.; Bruyninckx, H.; De Schutter, J. Nonlinear Kalman Filtering for Force-Controlled Robot Tasks 280 p. 2005 [3-540-28023-5] Vol. 18: Barbagli, F.; Prattichizzo, D.; Salisbury, K. (Eds.) Multi-point Interaction with Real and Virtual Objects 281 p. 2005 [3-540-26036-6]
Vol. 10: Siciliano, B.; De Luca, A.; Melchiorri, C.; Casalino, G. (Eds.) Advances in Control of Articulated and Mobile Robots 259 p. 2004 [3-540-20783-X] Vol. 9: Yamane, K. Simulating and Generating Motions of Human Figures 176 p. 2004 [3-540-20317-6] Vol. 8: Baeten, J.; De Schutter, J. Integrated Visual Servoing and Force Control 198 p. 2004 [3-540-40475-9] Vol. 7: Boissonnat, J.-D.; Burdick, J.; Goldberg, K.; Hutchinson, S. (Eds.) Algorithmic Foundations of Robotics V 577 p. 2004 [3-540-40476-7]
Vol. 17: Erdmann, M.; Hsu, D.; Overmars, M.; van der Stappen, F.A. (Eds.) Algorithmic Foundations of Robotics VI 472 p. 2005 [3-540-25728-4]
Vol. 6: Jarvis, R.A.; Zelinsky, A. (Eds.) Robotics Research – The Tenth International Symposium 580 p. 2003 [3-540-00550-1]
Vol. 16: Cuesta, F.; Ollero, A. Intelligent Mobile Robot Navigation 224 p. 2005 [3-540-23956-1]
Vol. 5: Siciliano, B.; Dario, P. (Eds.) Experimental Robotics VIII 685 p. 2003 [3-540-00305-3]
Vol. 15: Dario, P.; Chatila, R. (Eds.) Robotics Research – The Eleventh International Symposium 595 p. 2005 [3-540-23214-1]
Vol. 4: Bicchi, A.; Christensen, H.I.; Prattichizzo, D. (Eds.) Control Problems in Robotics 296 p. 2003 [3-540-00251-0]
Vol. 14: Prassler, E.; Lawitzky, G.; Stopp, A.; Grunwald, G.; Hägele, M.; Dillmann, R.; Iossifidis, I. (Eds.) Advances in Human-Robot Interaction 414 p. 2005 [3-540-23211-7]
Vol. 3: Natale, C. Interaction Control of Robot Manipulators – Six-degrees-of-freedom Tasks 120 p. 2003 [3-540-00159-X]
Vol. 13: Chung, W. Nonholonomic Manipulators 115 p. 2004 [3-540-22108-5] Vol. 12: Iagnemma, K.; Dubowsky, S. Mobile Robots in Rough Terrain – Estimation, Motion Planning, and Control with Application to Planetary Rovers 123 p. 2004 [3-540-21968-4] Vol. 11: Kim, J.-H.; Kim, D.-H.; Kim, Y.-J.; Seow, K.-T. Soccer Robotics 353 p. 2004 [3-540-21859-9]
Vol. 2: Antonelli, G. Underwater Robots – Motion and Force Control of Vehicle-Manipulator Systems 209 p. 2003 [3-540-00054-2] Vol. 1: Caccavale, F.; Villani, L. (Eds.) Fault Diagnosis and Fault Tolerance for Mechatronic Systems – Recent Advances 191 p. 2002 [3-540-44159-X]
Henrik I. Christensen (Ed.)
European Robotics Symposium 2006 With 89 Figures
Professor Bruno Siciliano, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy, email: [email protected] Professor Oussama Khatib, Robotics Laboratory, Department of Computer Science, Stanford University, Stanford, CA 94305-9010, USA, email: [email protected] Professor Frans Groen, Department of Computer Science, Universiteit van Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands, email: [email protected]
Editor Professor Dr. Henrik I. Christensen Center for Autonomous Systems CAS/CVAP/NADA Kungliga Tekniska Hoegskolan (KTH) 10044 Stockholm Sweden
ISSN print edition: 1610-7438 ISSN electronic edition: 1610-742X ISBN-10 3-540-32688-X Springer Berlin Heidelberg New York ISBN-13 978-3-540-32688-5 Springer Berlin Heidelberg New York Library of Congress Control Number: 2006921135 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under German Copyright Law. Springer is a part of Springer Science+Business Media springeronline.com © Springer-Verlag Berlin Heidelberg 2006 Printed in Germany The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Digital data supplied by editor. Data-conversion and production: PTP-Berlin Protago-TEX-Production GmbH, Germany Cover-Design: design & production GmbH, Heidelberg Printed on acid-free paper 89/3141/Yu - 5 4 3 2 1 0
Editorial Advisory Board EUROPE Herman Bruyninckx, KU Leuven, Belgium Raja Chatila, LAAS, France Henrik Christensen, KTH, Sweden Paolo Dario, Scuola Superiore Sant'Anna Pisa, Italy Rüdiger Dillmann, Universität Karlsruhe, Germany AMERICA Ken Goldberg, UC Berkeley, USA John Hollerbach, University of Utah, USA Lydia Kavraki, Rice University, USA Tim Salcudean, University of British Columbia, Canada Sebastian Thrun, Stanford University, USA
ASIA/OCEANIA Peter Corke, CSIRO, Australia Makoto Kaneko, Hiroshima University, Japan Sukhan Lee, Sungkyunkwan University, Korea Yangsheng Xu, Chinese University of Hong Kong, PRC Shin’ichi Yuta, Tsukuba University, Japan
STAR (Springer Tracts in Advanced Robotics) has been promoted under the auspices of EURON (European Robotics Research Network)
An Ontology of Robotics Science

John Hallam¹ and Herman Bruyninckx²

¹ Mærsk Institute, University of Southern Denmark, Campusvej 55, DK–5230 Odense M, Denmark [email protected]
² Dept. Mechanical Engineering, Katholieke Universiteit Leuven, Celestijnenlaan 300B, B-3001 Leuven, Belgium. [email protected]
Summary. This paper describes ground-breaking work on the creation of an ontology for the domain of robotics as a science. An ontology is a collection of terms, concepts and their inter-relationships, represented in a machine-usable form. An ontology in a particular domain is useful if the structure it adds to the domain is simple enough to be understood quickly and intuitively, and rich enough to increase insight into the whole domain to a level where this increased insight can lead to innovation and increased efficiency in scientific and practical developments.
This paper presents an ontology for the science of robotics, and not for robots as objects: the latter ontology describes the physical and technical semantics and properties of individual robots and robot components, while the ontology of the science of robotics encodes the semantics of the meta-level concepts and domains of robotics. For example, surgical robotics and industrial automation are two concepts in the ontology of the science of robotics, while the semantics of robot kinematics and dynamics, or of a particular robot control algorithm, belong to the ontology of robots as objects.
The structure in the presented ontology for the science of robotics consists of two complementary sub-structures: (i) the robot agent and robot system models (i.e., what components are required in a robot device, and in a robotic application, respectively), and (ii) the Context Space (i.e., ordinal and categorical relationships between physical and computational aspects of robot agents and systems in which the sub-domains of robotics can be mapped out). The implementation of the Context Space concept using standard ontology tools is explained.
The paper illustrates its expected usefulness with examples of sub-domains of robotics expressed as contexts in the context space, and with two use cases for the ontology: (i) the classification of conference or journal paper submissions, and (ii) the guidance of new researchers into the domain of robotics.
1 Introduction

1.1 Robotics as a Science

Robotics is to a large extent a science of integration, constructing (models of) robotic systems using concepts, algorithms and components borrowed from
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 1–14, 2006. © Springer-Verlag Berlin Heidelberg 2006
various more fundamental sciences, such as physics and mathematics, control theory, artificial intelligence, mechanism design, sensor and actuator technology. The function and properties of a robotic system depend on the components from which it is made — the specific sensors, actuators, algorithms, mechanism — but, beyond that, they depend on the way those components are integrated. Furthermore, a full description of a robotic system must include information about the task it is to perform and the environment in which the task is to be undertaken. As we shall see, this view of robotics (i.e., the system formed by the robot agent, its task and the environment in which it has to perform that task) is crucial in developing a full ontology for robotics.

1.2 Ontologies for Robotics

An ontology is a formal definition of concepts, terms and relationships appropriate to some domain of knowledge, generally expressed in a formalism that allows machine-usability of the encoded knowledge. (The Wikipedia contains a good introduction to the concept of an ontology, [3].) Ontologies typically serve two purposes:

1. providing agreed and unambiguous terminology for a domain, with the goal of helping humans express, transform and transfer their knowledge more effectively and accurately.
2. allowing automatic use of that knowledge, for instance in exchange or linking of data between processes, or translation of terms between languages.

The latter purpose is sometimes better known under the name Semantic Web, [1], and the W3C have defined standards for computer-readable representation of ontologies, [2].
Robotics, like other sciences, has need of suitable ontologies. For instance, an agreed ontology for sensor data would make possible much greater interoperability between sensing and other modules in a robotic system. Individual modules could label the data they generate using terms from the agreed ontology; other processes would then have available a definite semantics for the data. Similar applications can be envisaged in the domains of mechanical/electronic and control/software engineering in robotics. Such ontological knowledge would also be of use for designers of robotic systems when selecting and matching components.
We describe this kind of ontology as an ontology of robots as objects. While a complete ontology of robotics requires such knowledge, the source of that knowledge is largely in the sciences and technologies on which robotics builds, viz. physics, control engineering, and so on. Such an ontology may be extremely valuable, but is not the focus or purpose of this paper. However, the ontology presented in this paper could be connected to these object ontologies without problem.
Robotics, in its guise as a science of integration, implies a second kind of ontology. Consider a typical robotics conference, at which one may find sessions or talks on "field robotics" or (perhaps) "factory automation". Roboticists use such labels as a taxonomy of their field, and have a more-or-less clear, but informal, understanding of the significant problems, opportunities and special requirements of each sub-field: of the context for the particular sub-domain. So, in the context of papers presented at conferences, even the short labels "field robotics" or "factory automation" are sufficient to help reviewers and readers focus their attention on the contributions of the paper in the topics that are supposed to be difficult and relevant in the state of the art in those domains.
It takes students and practitioners in robotics quite some time to learn the links between the short labels on the one hand, and the scientific challenges and contributions they can expect in papers labelled in that way, on the other. This paper makes a first contribution in making these links more explicit and structured, in order not only to speed up the learning phase of human practitioners but also to facilitate (semi-)automatic support for selection of reviewers, categorization of papers or research proposals, web or library searches, etc.
The starting point of the presented research is thus that there is knowledge about the structure of the robotics science itself — what sub-domains are there; how are they inter-related; how can we express the relationships between sub-domains in a principled way? Once again, the idea of an ontology can help. We call an ontology dealing with this knowledge an ontology of robotics science. The existence and formalisation of that ontology is the key insight and focus of the present paper.

1.3 Organisation of the Paper

In the following sections we motivate, illustrate and explore the idea of an ontology of robotics science.
Section 2 considers robotic systems in general: comprising the agent, the task and the environment. A model of the components expected in a robotic system is presented. Section 3 then presents the key concepts in the ontology of robotics science, particularly the notion of scaling laws, which structure the domain of robotics. Section 4 investigates how the proposed ontology might be applied, and demonstrates its potential usefulness. Section 5 illustrates how the key concept of Context Space can be implemented using standard ontology tools. The final section summarises the paper’s conclusions and indicates planned future work.
2 A Model of Robotics

Representing human knowledge about technological domains, such as robotics, is equivalent to defining sets of models with which humans can analyse the real world, make calculations about it, make predictions, etc. In other words,
a model is a mathematical simplification of the real world, representing only those aspects of the real world that are relevant to a particular human purpose. The purpose of this paper is to bring structure to human knowledge of robotics, but not to give detailed technological descriptions of the workings of robotics devices or algorithms. We believe that the graphical models in figures 1 and 2 provide us with the appropriate balance between detail and generality for the purpose of this paper.
Figure 1 gives a "robot-centric" model of robotics: it depicts a robot "agent" as an integration of mechanical hardware, sensing hardware and actuation hardware, with planning software, control software, sensor processing software and actuator drive software. A successful robot interconnects knowledge and hardware at "appropriate" scales and with "appropriate" interfaces. Later sections of the paper will indicate more clearly what "appropriate" really means, and make clear that terms such as "field robotics" or "surgical robotics" indicate particular sub-manifolds of the "robot-centric" continuum (and the "application-centric" continuum of figure 2, see below) that have proven sufficiently successful or interesting to attract the intense research efforts of large communities.
Fig. 1. The “robot-centric,” or agent-model, of robotics.
Fig. 2. Robotic systems as the manifold of robot agents, tasks and environments.
Figure 2 gives an “application-centric” or “systems-centric” view of robotics complementary to the previous figure: it makes clear that success of a robot agent is not a property of the agent in itself, but is determined also by (i) the task the robot is expected to perform, and (ii) the environment in which that task is to be performed. Each successful robotic systems domain covers a certain subspace of the triangular continuum depicted in figure 2. In some
cases, it is the robot device that contributes most to the success of the system, and its subspace is situated close to the agent vertex of the triangle. In other systems, it is the environment (“hardware”) that is most important, for example in sub-sea or airborne robots where water or air provide the necessary support to the robot. In yet other systems, the task is the determining factor.
3 An Ontology of Robotics Science

The key to defining relationships and structure between the various subfields of robotics is an understanding of how the subfields come about. Recall that a particular subfield expresses similarities between a class of robotic systems, and implies relevant background knowledge, significant problems, useful properties and so on. What accounts for much of this variation is scaling laws.
Consider the difference in the flight of an air-plane and a bumblebee. Each exploits different aerodynamic properties accessible to it because of the physical scale (tens of metres vs. millimetres) in which it operates: air behaves quite differently in different physical circumstances. This difference is an expression of a physical scaling phenomenon wherein physical system properties depend on the characteristic length of the system. Analogues of this principle can be constructed in other physical scales, such as time or mass, and in more abstract scales such as computational complexity. It is the placement with respect to a set of such scales that determines the properties, problems and possibilities of a given subclass of robotic system.

3.1 Context Space

The Context Space is a conceptual space spanned by the various scaling dimensions available to a robotic system (or to its designer). Since, as we have argued in section 2, such a system comprises a device or agent, an environment, and a task, each of these essential components contributes scale dimensions and constraints to the system. In this section we examine some of these dimensions. The example dimensions we have chosen are intended to be illustrative rather than complete — space precludes an exhaustive listing of all possibilities, and such is not necessary to make our point. Each essential component of a complete robotic system contributes scaling dimensions.

device scales are the most straightforward to specify. The characteristic length and time constant of the robotic system, its mass, the number of components it comprises and the extent to which they interact with each other are important characteristics, as are the number of controllable degrees of freedom and the total degrees of freedom of the system.
environment scales define the type and properties of the environment in which the task is carried out. There are two classes of environment: bulk and interface. Examples of the former are vacuum (as in space robotics) or liquid (as in underwater robotics). Bulk environments can be ordered by their mechanical impedance. Examples of the latter, interface, type include solid-gas (e.g. a table top, or the outdoor land surface) or liquid-gas (e.g. the surface of the sea). Salient properties here include the size and spatial frequency of surface variation — how smooth the interface is. The size of the environment (its characteristic length); the speed with which events occur (its characteristic time constant); and whether the environment is abiotic or biotic (or on the interface of the two) constitute other examples of relevant dimensions of variability.

task scales include the spatiotemporal properties of the task, for instance its characteristic precision and the lifetime of the robotic system when working on the task, and the risk cost, i.e. the cost associated with failure or error in the task. The latter can informally be defined as the cost of fixing a disaster caused by system failure. For laboratory robotics it is, relatively speaking, very small; for surgical or space robotics it is relatively large.

Many of these scales have an ordinal structure, that is, the axis representing the dimension has a 'small value' end and a 'large value' end, with intermediate values that may be arranged in order of size. As we shall see below in section 3.3, these ordering relationships along individual axes contribute to the rich structure of the context space.

3.2 Contexts

Contexts are the basic mechanism for using the ontology. They provide a means to distinguish groups of robotic systems — and so different subfields of robotics as a whole — whose properties are restricted to sub-ranges of certain scale axes. For example, nanorobots have a device length scale in the range 10⁻⁹–10⁻⁷ metres. More generally, we can specify sets of systems whose properties lie within specific ranges on several axes simultaneously; most generally, a context may be a union of such convex regions in the space.
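As a minimal illustration of these ideas, a context can be sketched as a set of inclusive sub-ranges on ordinal scale axes, with membership tested against a system's scale values. The axis names, units, and the `Context` class below are our own assumptions for demonstration; the paper itself defines contexts only abstractly.

```python
# Illustrative sketch only: axis names, units and this Context class are
# assumptions for demonstration; the paper defines contexts abstractly.

class Context:
    """A context: a convex region of the Context Space, expressed as
    inclusive (lo, hi) sub-ranges on a subset of the ordinal scale axes.
    Axes the context does not mention are left unconstrained."""

    def __init__(self, name, ranges):
        self.name = name
        self.ranges = ranges          # axis name -> (lo, hi)

    def contains(self, system):
        """True if the system's scale values fall inside every range
        this context constrains (missing values count as outside)."""
        return all(axis in system and lo <= system[axis] <= hi
                   for axis, (lo, hi) in self.ranges.items())


# Nanorobots: device length scale in the 1e-9 to 1e-7 metre range (sec. 3.2).
nanorobotics = Context("nanorobotics", {"device_length_m": (1e-9, 1e-7)})

tabletop_bot = {"device_length_m": 5.5e-2}   # a centimetre-scale device
nano_bot = {"device_length_m": 5e-8}

print(nanorobotics.contains(tabletop_bot))   # False
print(nanorobotics.contains(nano_bot))       # True
```

A context that is a union of several convex regions, as the text allows, would simply combine a list of such `Context` objects with `any(...)`.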
Conceptually, a context is a sub-class of the class of all possible robotic systems. Such classes will usually (but not necessarily) have intuitive names, for example "field robotics," "medical robots," "swarm robots," and each context implies a set of relevant technology, background knowledge, scientific problems and solutions following from the particular choices of scale which define the context.

3.3 Relationships Between Contexts

Contexts, as defined above, can be thought of as sub-classes of the class of robotic systems; sub-classes generated by restricting the values of certain scaling properties (e.g. device length, risk cost, number of components, and so
on) to particular values or sets of values from their respective scales (e.g. millimetre-metre, EUR 100–10000, 5–10, etc.). As such, contexts can take part in the standard class relationships of set theory: they can be super- or sub-classes of other classes; intersections, unions and complements can be defined; and membership can be checked.
However, more interesting are the relationships implied by the ordinal structure of the scale axes. For instance, robot devices whose physical size (characteristic length) is in the centimetre-decimetre range are neighbours of those in the decimetre-metre range: they will have many properties in common. On the other hand, nanometre scale robotic devices will be quite different from centimetre ones, in some respects, since the dominant physical effects differ between those characteristic sizes.
Neighbouring contexts are often contexts between which there exist loose analogies or trade-offs. For instance, one could describe a collective of 100 mobile robots as a group of 100 6-dof systems or as a single 600-dof system: physical dimension is exchanged for interconnection complexity. Neighbourhood implies quasi-invariance between aspects of the relationships that describe the same system in different contexts: change in one scaling dimension is compensated by a corresponding change in another dimension. Sufficiently sophisticated reasoning mechanisms could exploit neighbourhood relationships to propose novel connections between robotic systems, or identify possible similarities between hitherto disjoint subfields of the discipline. The neighbourhood relationships provided by the context scales also force human students in robotics to think more explicitly about the really unique, respectively shared, properties of specific robotics domains.
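A toy formalisation of this neighbourhood relation (ours, not the paper's) partitions each ordinal axis into named bands in increasing order and calls two contexts neighbours when they differ by exactly one band on exactly one axis:

```python
# Sketch of the neighbourhood idea; the band names and the one-step
# definition of "neighbour" are our own illustrative assumptions.

LENGTH_BANDS = ["nanometre", "micrometre", "millimetre",
                "centimetre", "decimetre", "metre"]

def band_distance(axis_bands, a, b):
    """Ordinal distance between two bands on one axis."""
    return abs(axis_bands.index(a) - axis_bands.index(b))

def are_neighbours(ctx_a, ctx_b, axes):
    """ctx_* map axis name -> band name; axes maps axis name -> band list.
    Neighbours: identical on all axes except one, where they differ by 1."""
    diffs = [band_distance(axes[ax], ctx_a[ax], ctx_b[ax]) for ax in axes]
    return sorted(diffs) == [0] * (len(diffs) - 1) + [1]

axes = {"device_length": LENGTH_BANDS}
print(are_neighbours({"device_length": "centimetre"},
                     {"device_length": "decimetre"}, axes))   # True
print(are_neighbours({"device_length": "nanometre"},
                     {"device_length": "centimetre"}, axes))  # False: 3 steps
```

Relaxing the single-step rule (e.g. allowing distance k, or simultaneous compensating changes on two axes) would capture the quasi-invariance trade-offs the text describes, such as exchanging robot count for degrees of freedom.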
4 Using the Ontology

The proposed ontology is unusual in that it focusses on the structure of a field of science rather than on the objects with which that field is concerned. Nevertheless, such an ontology has considerable practical use. In addition, it is easier to start with the ontology of robotics as a science because, as presented in this paper, this ontology can be explained and understood (almost) completely in a relatively short amount of time. In this section we give three illustrative examples of the practical usefulness of the ontology.

4.1 Structuring the Domain of Robotics Science

To illustrate the ideas presented so far, consider the various sub-fields of robotics typically found in the contents listings of conference proceedings: can we define them using context space ideas and axes?
Suppose we start with Field Robotics. We can say that the device scale (robot size) is in the metre range; the environment length scale is 10² to 10¹⁰
metres; the device/mission lifetime ranges between 10³ and 10¹⁰ seconds; the environment may be vacuum, gas, liquid or an interface.
So far so good; now consider underwater robotics. This is Field Robotics in a bulk liquid environment. Space Robotics is Field Robotics in a bulk vacuum environment, with environment length scale from 10⁵ to 10¹⁰ metres, device/mission lifetime 10⁶–10¹⁰ seconds, say (for spacecraft), and a high risk cost.
Consider Research Lab Robotics. In this case we have device lengths of 10⁻³–10⁰ metres; environment length scale 10¹ metres, say; gas or smooth gas-solid interface; lifetime in the 10²–10⁵ second range; and low risk cost.
Table-top robotics, for instance using the popular Khepera robot, is a specialisation of the Research Lab context, with 10⁻² metre device scale, 10⁰ metre environment scale, very smooth solid-gas interface environment, very low risk factor, low environmental interaction and lifetime in the 10²–10³ second range.
As a final example, consider factory automation or industrial robotics. Here we have device scales in the 10⁻³–10⁰ range, environments in the 10⁰–10¹ metre range, gas-solid interface, lifetimes from 10⁵–10⁷ seconds; medium-high risk cost; moderate system complexity and low environment interaction.
Although these examples are still somewhat vague — recall, for instance, 'medium-high risk cost' or 'moderate system complexity' — they nevertheless illustrate that the different sub-fields of robotics occupy specifiable parts of the context space and that related sub-fields occupy neighbouring or subset regions of the context space. Thus it is possible to define the subfields of robotics by labelling the appropriate regions of the context space and thereby infer relationships between subfields that might not immediately be apparent from just their names. In addition, being confronted with the different ordinal scales that are relevant in a particular robotics domain will stimulate students and researchers to think about how the fundamental physical properties of a particular context could change when "moving" to a neighbouring context, or they could get inspiration from successful solutions implemented in neighbouring domains that they had not thought of before.

4.2 Use Case: Guided Reading for New Roboticists

All teachers in robotics have experienced the following evolution in their students: in the first couple of weeks or months in their study of robotics, students are reading tons of papers, working hard to see the forest for the trees, but are mostly unable to evaluate the essential contributions (or lack thereof) of every individual paper they read. Good teachers also know that they must organise regular reading sessions with their students, in order to guide this process of getting acquainted with a particular robotics research domain.
We believe that a key role of the teachers in this coaching phase is to indicate to their students what are the relevant relationships between the details of the science explained in particular papers, and the goals of making
a better robot system (i.e., the combination of agent, task and environment) in the particular domain of interest to the student. We believe that teachers (most often) implicitly convey to their students the kind of ontology that this paper presents, and that making these ontological relationships explicit will help the students in their learning. In addition, by using the ontology of this paper in real-world reading classes, the ontology will become more complete and refined. Indeed, it could possibly become the basis for a series of textbooks that cover the whole domain of robotics in the most efficient way.

4.3 Use Case: Classification of Papers and Reviewers

A second application of these ideas is to paper classification. Imagine that authors submitting papers to a robotics journal or conference (or, indeed, a funding agency) are invited to indicate approximately where, on figure 2 and figure 1, their paper content falls. Then, depending on their answer, they are invited to specify where it lies on various appropriate scale axes. For example, the author of a paper on inertial guidance for underwater robots would indicate 'agent-centred', 'sensors', and choose 'bulk liquid environment', 'environment size 10²–10³ metre', 'lifetime 10³–10⁴ second', and so on. Reviewers could similarly indicate their areas of expertise. Software based on the ontology could then match reviewers to papers depending on their distance in the contextual space.
For most papers, this would produce the expected results: a paper on space robotics, for instance, would go to space robotics referees. But interesting desirable effects are possible. A paper on collective robotics might be offered to a reviewer interested in the control of very highly redundant systems, on the arguably sensible grounds that 100 6-dof robots have some similarity with one 600-dof one — the two subjects are neighbours in (some dimensions of) the context space.
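The matching procedure suggested above can be sketched as follows; the distance metric, the encoding of profiles as band indices, and the example data are hypothetical illustrations, not a specification from the paper.

```python
# Hypothetical reviewer-matching sketch: sum of ordinal gaps over shared
# context-space axes; all axis names and profiles are invented examples.

def context_distance(paper, reviewer):
    """Sum of ordinal gaps over the axes both profiles specify
    (each value is an index into an ordered band list)."""
    shared = set(paper) & set(reviewer)
    return sum(abs(paper[ax] - reviewer[ax]) for ax in shared)

def rank_reviewers(paper, reviewers):
    """Reviewers sorted by closeness to the paper's context profile."""
    return sorted(reviewers, key=lambda entry: context_distance(paper, entry[1]))

# Band indices, e.g. on a 'controllable dof' axis: 0 = low ... 3 = very high.
paper = {"dof": 3, "env": 1}            # a collective-robotics paper, say
reviewers = [("space robotics",   {"dof": 1, "env": 3}),
             ("redundant control", {"dof": 3, "env": 1}),
             ("tabletop robotics", {"dof": 0, "env": 0})]

best, _ = rank_reviewers(paper, reviewers)[0]
print(best)   # 'redundant control': closest in context space
```

This reproduces the effect described in the text: the collective-robotics paper lands with the high-redundancy control expert rather than with a nominally labelled sub-field.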
5 Implementing the Ontology of the Science of Robotics

To be useful, it must be possible to implement the ontology of the science of robotics using the standard ontology tools of the semantic web. The World Wide Web Consortium has selected OWL as its standard for the representation of ontological knowledge [4]. OWL is built on top of standard infrastructure (RDF and XML) and provides mechanisms for representing concepts (classes), instances, properties and relationships. It comes in three flavours — OWL Lite, OWL DL and OWL Full — which represent different compromises between representational power and computational tractability. The ontology described here is implemented using OWL-DL.
5.1 Implementing the Context Space

The key concept to implement is the Context Space. To do that, we identify a context with the set of all robotic systems falling into that context; i.e. 'Space Robotics,' in terms of the ontology, is defined as the class of robotic systems studied by that field. The class of such systems is actually defined using more fundamental properties, such as the environment, spatial and temporal scales in which the systems operate, and so on.
The basis for implementing the Context Space is then to define the class of all robotic systems, sub-classes of which represent individual contexts. Contexts are defined using standard class operations of union, intersection, complement and property value restriction (restricting the legal values of a specific instance property to a known class).
In line with the discussion of section 2, we therefore assert that each robotic system (an instance of the robotic System Class) comprises a device (the agent), an environment and a task. We further define three classes, which we call Aspect Classes, consisting respectively of all devices, all environments and all tasks, as the value ranges of the three instance properties of a system. Associated with the instances of each aspect class are scaling properties, for example device-characteristic-length or environment-impedance, the values for which are drawn from Scale Classes representing the various scaling axes. Notice that different aspects may have properties relating to the same scale: device, environment and task characteristic lengths or times would be an obvious example.
Figure 3 illustrates this general scheme. Technically, each instance of the robotic Systems class has three properties, whose values are restricted to lie each in one of the three aspect classes; and each instance of an aspect class (for example, a particular environment) has scaling properties whose values are restricted to particular sub-classes from given scale axes. This construction has the crucial advantage that the class structure is defined intensionally — we are not required to enumerate all robotic systems, all environments, etc. — while also allowing specific systems, devices and so on to be represented as instances.
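The paper's implementation uses OWL-DL; as a library-free analogue, the following plain-Python sketch mimics the same pattern — contexts as sets of property-value restrictions, with a toy subsumption check standing in for the OWL reasoner. All class names and ranges here are our own assumptions for illustration.

```python
# Toy analogue of the OWL-DL construction: a context class is a set of
# restrictions on aspect properties; subsumption is checked by testing
# whether one class's restrictions are entailed by another's.

NANO = ("device_length_m", 1e-9, 1e-7)            # nanometre range class
BIOTIC_NANO_ENV = ("env_type", "biotic-nano")     # a categorical restriction

class ContextClass:
    """Numeric restriction: (property, lo, hi); categorical: (property, value)."""

    def __init__(self, name, restrictions):
        self.name, self.restrictions = name, list(restrictions)

    def subsumes(self, other):
        """self is a super-class of other if every restriction of self is
        implied by some restriction of other (a toy 'reasoner')."""
        def implies(r_sub, r_sup):
            if r_sub[0] != r_sup[0]:
                return False
            if len(r_sub) == 2 or len(r_sup) == 2:   # categorical case
                return r_sub == r_sup
            return r_sup[1] <= r_sub[1] and r_sub[2] <= r_sup[2]
        return all(any(implies(r_sub, r_sup) for r_sub in other.restrictions)
                   for r_sup in self.restrictions)

nano_robots = ContextClass("nano-robotic systems", [NANO])
medical_nano = ContextClass("medical nano-robotics", [NANO, BIOTIC_NANO_ENV])

print(nano_robots.subsumes(medical_nano))   # True: a sub-field, as in Fig. 5
print(medical_nano.subsumes(nano_robots))   # False
```

In OWL-DL the same effect falls out of property-value restrictions and a standard reasoner (as the figures 4 and 5 discussion below describes); the sketch only makes the subsumption logic explicit.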
To illustrate the implementation strategy, consider systems using nano-scale robotic devices, which can be characterised by the fact that their device length scale is in the nanometre range. Figure 4 illustrates how this is represented: a sub-class of the length scale class is defined to represent the nanometre range, and a sub-class of the devices aspect class is defined such that the device-length property is restricted to take values from the nanometre range class. Finally, a sub-class of the systems class with device property values restricted to the just-defined nano-device class completes the representation of nano-scale robotic systems. Notice that we have not restricted the environments in which these systems may operate nor the tasks to which they may be applied. The red (dashed) components in the figure illustrate the newly
Fig. 3. Class hierarchy in the Robotics-as-Science Ontology: the universe is represented as the class of all robotic Systems, where each system (instance) has three Aspects — device, environment and task. Instances of the Aspect Classes have scaling properties, e.g. characteristic length, with values from the Scale Classes.
Fig. 4. Nano-Robotic Systems
Fig. 5. Medical Nano-Robotics
defined sub-classes; the standard OWL reasoners are able to infer that these are indeed sub-classes of their parents.
Figure 5 takes the example one stage further. Consider medical nano-robotics. The principal distinction between this and nano-systems robotics, for our illustrative purposes, is the environment in which the systems must work — on a macro-molecular scale and in interaction with living matter. This is represented by the construction of a sub-class of biotic nanoscale environments to which the environment property of the system is restricted. The red (dashed) components in the figure illustrate these new definitions. Once again, standard OWL reasoners can infer that the medical nano-systems class
J. Hallam and H. Bruyninckx
is a sub-class of the nano-robotic systems or, equivalently, that medical nano-robotics is a sub-field of nano-scale robotics.

5.2 Context Relationships

The ability to represent contexts is crucial to the ontology, but so is the ability to represent or infer relationships. As we have seen, certain relationships such as class inclusion can be inferred by the standard OWL reasoners.

Not so with neighbourhood relationships, however. The problem is that OWL reasoners contain no machinery for inferring 'distance' between classes. To illustrate this, consider the length scale. We can construct in OWL a partition of this scale into (amongst others) millimetre, centimetre, decimetre and metre lengths with the appropriate ordering relationships (e.g., centimetres are longer than millimetres and shorter than metres). But it is not possible with standard reasoners to infer that the metre sub-class is an indirect neighbour of the centimetre class two steps up the ordinal scale.

What this means is that neighbourhood relationships, which depend on the 'distances' between context properties in the ordinal scale axes, cannot be inferred by standard reasoners since the ordinal 'distances' themselves cannot be inferred by those reasoners. However, more specialist reasoners can be built to make such inferences, and one can imagine such reasoners working on the ontology, enumerating neighbourhood relationships between the represented contexts and automatically adding to the ontology explicit representations of discovered neighbourhood relationships.

5.3 Summary

The ontology of the science of robotics depends on the key concept of contexts, which we can straightforwardly and intuitively represent using the standard ontology language OWL by defining a context (a sub-field of robotics) to be represented by the class of all systems belonging to that context (sub-field). Standard reasoners can determine simple class relationships between contexts. Neighbourhood relationships between contexts cannot be inferred using standard reasoners, but specialist discovery processes could be implemented to annotate the ontology with such relationships.
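The specialist neighbourhood reasoner suggested in section 5.2 could be sketched as follows (purely illustrative, and external to OWL): given an explicitly ordered scale axis, an ordinal distance is trivial to compute, and k-step neighbour relations between scale sub-classes can be enumerated and written back into the ontology.

```python
# Hypothetical sketch of a 'specialist reasoner': standard OWL reasoners
# cannot infer ordinal distance, but a small external process can, given
# an explicitly ordered scale axis.

# One ordinal scale axis: sub-classes of the length scale, in order.
LENGTH_AXIS = ["millimetre", "centimetre", "decimetre", "metre"]

def ordinal_distance(axis, a, b):
    """Number of steps between two sub-classes on an ordered scale axis."""
    return abs(axis.index(a) - axis.index(b))

def neighbourhoods(axis, max_steps=2):
    """Enumerate neighbour relationships up to max_steps apart, which a
    discovery process could add to the ontology explicitly."""
    pairs = []
    for i, a in enumerate(axis):
        for b in axis[i + 1:]:
            d = ordinal_distance(axis, a, b)
            if d <= max_steps:
                pairs.append((a, b, d))
    return pairs

# 'metre' is an indirect neighbour of 'centimetre', two steps up the scale:
print(ordinal_distance(LENGTH_AXIS, "centimetre", "metre"))  # 2
print(("centimetre", "metre", 2) in neighbourhoods(LENGTH_AXIS))  # True
```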
6 Conclusions and Future Work 6.1 Conclusions The present paper has motivated the notion of an ontology of the science of robotics as opposed to an ontology for the objects of which robotic systems are composed. An ontology of robotics science allows us to define terms and concepts such as “surgical robotics,” “field robotics,” or “nanorobotics” in an
objective way in terms of more primitive technological concepts. This allows us to give formal (machine-usable) meaning to these various terms and to explore the relationships between them. These meanings and relationships provide a meta-level structure in the scientific discipline of robotics that could be exploited in various ways.

The formal setting in which such definitions are made is a Context Space established by considering the physical, environmental and task-related scaling laws and relationships that apply to robotic systems. New sub-fields of robotics, and new relationships between them and between existing sub-fields, may be discovered by considering the relationships between them in context space.

The ontology has been implemented in the standard ontology language OWL-DL using the Protégé suite. The proposed ontology is believed to be simple enough to be understood by humans within the span of a couple of hours, while at the same time it is rich enough to bring non-trivial and hence useful structure to the scientific domain of robotics. The ontology may be used, inter alia, as a tool for the classification of robotic material, for matching reviewers to conference, journal or funding agency submissions, or in teaching robotics to new students.

Although the description of the ontology in the paper is somewhat incomplete, this is natural: to our knowledge there is no state of the art at all addressing this kind of ontology for robotics, and no truly comparable work in other (technological or non-technological) domains. Hence the goal of the paper is to present in outline a novel but, we believe, valuable tool for roboticists.

6.2 Future Work

Before an ontology of robotics science is complete and fully usable, an amount of future work remains — both conceptual and practical.
Conceptual work includes elaborating the set of scale axes to identify a complete, minimal and consistent set; identification of the kinds of questions and use cases for which the ontology provides a useful resource; and wide consultation with the robotics community on the clarity, utility and completeness of the conceptual framework. Practical work comprises completing the implementation of the concepts and relationships of the ontology, using tools conforming to the standards and recommendations established by the World Wide Web Consortium, and the formal definition and implementation of suitable query and reasoning interfaces to the implemented ontology. The ontology must also be tested more intensively on the "educational workfloor" of teaching robotics to masters and PhD students.
Acknowledgement. The authors gratefully acknowledge the support of the EURON Research Atelier Rose: a robotics ontology for the semantic web, and the many stimulating discussions with members of EURON's Education and Training committee, which have, to a large extent, helped shape the ideas of the ontology presented in this paper. Wei Xu contributed many useful ideas to the described implementation of the ontology.
A Multi-agent System Architecture for Modular Robotic Mobility Aids

Georgios Lidoris and Martin Buss
Institute of Automatic Control Engineering
Technische Universität München
D-80290 Munich, Germany
[email protected]
Abstract. In this paper a multi-agent system architecture for a modular mobility enhancement system is presented. The system consists of one or multiple mobile robotic platforms and a set of user-defined application modules, such as chairs, tables, multifunctional chairs, etc. All these modules can be inexpensive everyday-life items that become functional when a mobile platform is properly attached to them. This way the system can act as a wheelchair, as a carrier of objects, or even as a walker.∗
1 Introduction

With the increase in the number of senior citizens, there is a growing demand for mobility aids. Older people have problems moving themselves, handling or moving objects, and at the same time operating complicated devices. Therefore mobility aids with intelligent capabilities become a necessity. Several robotic, intelligent wheelchairs have been proposed, demonstrating navigation, manipulation and transportation capabilities. Examples include Rolland [1], TAO [2], OMNI [3], NavChair [4] and many others. However, existing systems are still difficult to operate and to adjust to the needs of the individual user, and are very costly.

At the same time, research on multi-agent systems has provided both principles for the construction of complex systems involving multiple agents and mechanisms for the coordination of independent agent behaviours. In this paper we intend to apply well-analysed concepts from multi-agent and multi-robot systems research to create a system architecture for a modular mobility enhancement aid. This is a new and very demanding application∗
This work was partially supported by the Specific Targeted Research Project MOVEMENT co-funded by INFSO DG of the European Commission (contract number 511670)
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 15–26, 2006. © Springer-Verlag Berlin Heidelberg 2006
domain for multi-agent systems, and besides presenting the proposed approach we intend to expose its specific requirements.

The system can consist of one or multiple mobile robotic platforms and a set of user-defined application modules, such as chairs, tables, multifunctional chairs etc. All these modules can be inexpensive everyday-life items that become functional when a robotic mobile platform is properly attached to them. This way the system can act as a wheelchair, as a carrier of objects, or even as a walker. Also, multiple levels of autonomy are supported by the system architecture, from manual operation to autonomous functionality. Such a modular system addresses many diverse needs and can be used by people with different mobility problems. Its scalable multi-agent architecture allows it to be deployed for the home care of an individual, as well as in large health care institutions.
2 Related Work

Several approaches have been proposed in the multi-agent community to coordinate agent actions. Distributed behaviour-based architectures have been developed, like ALLIANCE [5] and [6], which were capable of dealing with complex tasks but lacked scalability and robustness. Other approaches address multi-robot cooperation through negotiation, such as M+ [7], Traderbots [8], MURDOCH [9] and numerous others. Market-based architectures are gaining popularity, and a large amount of literature relevant to the field already exists. An interesting survey of market-based multi-robot coordination can be found in [10].

Our approach uses a negotiation mechanism for task allocation, as well as a behaviour-based layer for the execution of complex tasks. This way the robustness of market-based mechanisms is combined with the ability of behaviour-based approaches to deal with tasks of increased complexity. A contribution to the challenges of multi-agent systems research as described in [10] is made by introducing a new real-world application domain where long-term, reliable and robust operation of human-robot teams can be demonstrated.
3 Concept of System Architecture

As mentioned above, our application domain introduces several special requirements. We are dealing with a safety-critical domain, where we have to ensure the continuous functionality of our system. Direct task allocation and supervision of the system's decisions by the user must be supported. Tasks arrive in an unpredictable manner, from several user interfaces. Finally, new modules can be inserted into and removed from the system at any time.
The system architecture is responsible for the coordination of the modules constituting the system. It shall enable the user-dependent usage of the modules and has the task of controlling the interconnection of modules and their behaviour. More specifically, we present a Multi-Agent System architecture which is responsible for receiving high-level commands from the user interface, decomposes them and delegates sub-tasks to the appropriate application modules. These modules are dynamically selected according to utility functions, which are created using performance indicators and task requirements. In the formal multi-robot task allocation (MRTA) framework introduced in [11], our system addresses "Single-task robots, single-robot tasks, instantaneous assignment" (ST-SR-IA). Furthermore, our system continuously runs background tasks such as managing module power levels, performing conflict resolution between system modules during task execution, and finally enabling the system's autonomous functionalities by making use of a behaviour-based task model.

Component selection for task allocation is performed by using a simple distributed first-price one-round auction protocol, similar to the Contract Net protocol [12]. The progress of task execution is monitored by the winner and, if no progress is made, the task is re-assigned through a new negotiation. This is a very important fault-prevention functionality of the system.

3.1 Agent Model

Taking into account the distributed nature and complexity of the application domain, we decided to split the control of our system amongst a number of deliberative agents. Each agent is capable of reasoning, according to the classical hybrid architecture paradigm [13]. At the same time, by distributing control to a number of agents, we achieve robustness and scalability gains. Therefore our system can be considered as an extension of the layered approach [14], [15], [16] to a multi-agent domain, exploiting advantages of both centralised and distributed approaches.
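The first-price one-round auction described above can be illustrated with a minimal sketch (names and the utility formula are our own; the actual system is implemented on JADE, not shown here): the auctioneer issues a call for proposals, each capable module bids its utility for the task, and the highest bidder wins.

```python
def run_auction(task, agents):
    """Distributed first-price one-round auction (Contract-Net style):
    collect one bid per capable agent, award the task to the highest bidder."""
    bids = {}
    for agent in agents:
        utility = agent.bid(task)      # each agent estimates its own utility
        if utility > 0:                # zero utility: not capable / declines
            bids[agent.name] = utility
    if not bids:
        return None                    # no capable module; task cannot be assigned
    return max(bids, key=bids.get)     # first-price: highest bidder wins

class PlatformAgent:
    """Toy component agent bidding a distance-based utility."""
    def __init__(self, name, distance_to_goal):
        self.name = name
        self.distance_to_goal = distance_to_goal

    def bid(self, task):
        # Shorter path -> lower cost -> higher utility (toy formula).
        return max(0.0, 10.0 - self.distance_to_goal)

agents = [PlatformAgent("pl1", 2.0), PlatformAgent("pl2", 7.5)]
print(run_auction("goto(target)", agents))  # pl1 (the closer platform wins)
```

A lost or stalled task would simply trigger a new call to the same routine, which is what makes the one-round protocol a natural fault-prevention mechanism.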
The general internal structure of a system agent is illustrated in Figure 1. It consists of three parts. The first one is responsible for communicating with other agents and handling communication requests. The second part is a scheduler, which receives task requests, creates a queue and schedules their execution by the third layer. The task execution layer is responsible for the agent's actions, which according to each agent's role can vary from simple communication acts to movements, for our physical agents.

By using such an internal organization, we are able to deal with coordination and cooperation issues that arise between our agents. Our system becomes capable of having physical agents that respond quickly to dynamic events, e.g. collisions, while at the same time producing and executing complex strategies for achieving multiple goals. This is due to the fact that each system component is able to schedule its own actions according to its internal capabilities and state.
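A skeleton of this three-layer internal structure might look as follows (an illustrative sketch only; class and method names are our own, not the system's actual interfaces):

```python
from collections import deque

class SystemAgent:
    """Three-layer internal structure: communication -> scheduling -> execution."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()              # the scheduler's task queue

    # Layer 1: communication -- handle requests arriving from other agents.
    def receive(self, message):
        if message["type"] == "task-request":
            self.schedule(message["task"])

    # Layer 2: scheduling -- queue tasks and hand them to the executor.
    def schedule(self, task):
        self.queue.append(task)

    def run_once(self):
        if self.queue:
            self.execute(self.queue.popleft())

    # Layer 3: task execution -- role-specific; for physical agents this
    # would sequence movements rather than print.
    def execute(self, task):
        print(f"{self.name} executing {task}")

agent = SystemAgent("pl1")
agent.receive({"type": "task-request", "task": "dock(ch1)"})
agent.run_once()  # prints: pl1 executing dock(ch1)
```

Because each agent owns its queue, it can interleave reactive responses (e.g. to collisions) with scheduled deliberative tasks, which is the point of the hybrid layering.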
Fig. 1. Internal structure of a system agent (three layers: communication, scheduling, task execution).
3.2 Overview of the Proposed Approach

The proposed system architecture consists of several software agents that are responsible for performing task decomposition, task allocation through negotiations, as well as system monitoring and module management. It also contains physical agents which represent the modules that are available to the system at each time point. No limitations exist on the number of possible modules that can be inserted into the system, enabling it to be used in small setups with only one mobile base and a few application modules, as well as in large care institutions with an arbitrary number of mobile bases and application modules. In Figure 2 we present an overview of our architectural concept, for a setup where the available modules that can be docked with the platforms are chairs, tables, multifunctional chairs and walker modules.

When the user issues a command via the input hardware of the User Interface, this command is integrated and preprocessed and then sent to a link layer. This link layer between the Multi-Agent System and the User Interface is responsible for transforming the issued command into an appropriate form in order to be transmitted to the control architecture. Important information being encapsulated is:

• Level of autonomy in which the system is currently operating
• Type of operation to be performed by the system
• Which application module issues the command
• Command-specific parameters, e.g. goal position coordinates for the autonomous functionality, or speeds to be executed by the mobile platform, read by the input device, for the manual functionality
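The information items listed above could be encapsulated, for illustration, in a structure like the following (field names are hypothetical; the paper does not specify the actual link-layer format):

```python
from dataclasses import dataclass, field

@dataclass
class HighLevelCommand:
    """Illustrative encoding of a link-layer command sent to the MAS."""
    autonomy_level: str    # e.g. "manual" or "autonomous"
    operation: str         # type of operation to be performed, e.g. "goto"
    source_module: str     # which application module issues the command
    parameters: dict = field(default_factory=dict)  # goal coordinates or speeds

# Autonomous command: drive chair ch1 to a goal position.
cmd = HighLevelCommand(autonomy_level="autonomous",
                       operation="goto",
                       source_module="ch1",
                       parameters={"goal": (3.0, 1.5)})
print(cmd.operation, cmd.parameters["goal"])
```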
This high-level command is transmitted from the user interface link layer to the system. In order to make clear how a command is handled by the system, we have to describe each agent's role in more detail.
Fig. 2. Overview of the proposed approach (the User Interface link layer exchanges high-level commands and notifications with the System State, Mediator, Task Allocation and Background Tasks Agents; component selection operates over platforms, chairs, tables, MF-chairs and lifter/walkers, each with its own sensors and hardware/actuators).
Mediator Agent. This multi-purpose agent is responsible for accepting commands from the user interface's Link Layer, initiating task execution procedures and querying for specific module information. Finally, after the task is executed, it informs the interface about changes. It also receives safety notifications from the Background Tasks Agent and initiates actions accordingly, giving them priority.

System State Agent. It is responsible for providing management and naming services for the rest of the system and for monitoring the state of the system's application modules. It informs, upon request, the rest of the system about the available modules and their state. The possible states of each application module depend on its functionality. For each robotic platform, its state is a tuple <d, t> showing whether it is docked with an application module and which task it is currently undertaking. For an application module, its state shows whether it is docked with a platform.

Task Allocation Agent. If it is assigned a task by the Mediator Agent, it decomposes it so that the necessary components are chosen, and initiates a negotiation procedure similar to the Contract Net Protocol in order to find the best module to perform the task. After it receives bids from the Component Agents, it chooses a winner (the highest bidder) and assigns it the task. When it receives information from the Component Agent that the task was performed
it informs the Mediator. In the case of passive modules like tables and chairs, we don't have a task assignment, but the winner is the module that is most appropriate for a given task. For example, if we want a given platform to go and bring a table, then the negotiation finds the most suitable one (e.g. the one that is closer) and the Mediator is informed, so that it can assign the task to the platform.

Background Tasks Agent. Monitors system components and issues commands to the Mediator Agent in order to deal with safety issues. For example, we want to avoid the risk of power failure of our mobile robotic platforms during task execution. Therefore the Background Tasks Agent constantly checks the power level of all mobile platforms. If a platform's power level is found to be less than an acceptable minimum, then a Charge command is issued to the Mediator and the platform in question charges its batteries. It can also be used for setting out alerts for pre-programmed system maintenance operations, module failures etc. Generally, the commands issued by this agent to the Mediator Agent have higher priority than normal commands.

Component Agents. Each of these agents is created when a new hardware module is inserted into the system, and it represents this module to the MAS. They are involved in the component selection negotiation procedure. If they are assigned a task, they initiate a pre-programmed set of behaviours in order to accomplish it. The task description must contain information about its type and also other specific information regarding possible parts of the task. These agents have control over the hardware they embody and pass appropriate commands to it, by sequencing high-level movement descriptions into low-level commands. When the level of autonomy is set to manual, they just pass low-level commands that are received from the UI to the hardware.

Passive modules such as chairs, tables, etc., do not perform tasks themselves but need to be docked with a platform. They take part in the negotiations initiated by the Task Allocation Agent when we need to find the most suitable module, in order to minimize resource usage. For example, let's suppose that the user wants a given platform to bring a table. Table Component Agents compute their utilities based on their distance from the platform. This is useful especially for home-use scenarios, where only one platform exists but many tables and chairs, which are really inexpensive.

Platform Component Agents rely on a behaviour-based system. Behaviours are seen as condition/action pairs, where conditions are logical expressions over the inputs and actions are themselves behaviours. In both cases, a behaviour is a directed acyclic graph of arbitrary depth. The leaves of the graphs are the behaviour type's respective outputs.

This notion of behaviour is also consistent with that laid out in [17] and [18]. Behaviours are nested at different levels: selection among lower-level behaviours can be considered a higher-level behaviour, with the overall agent behaviour considered a single "do-the-task" behaviour. These resulting high-level tasks are also interconnected with each other and all together constitute our architecture's task model. Figure 3 illustrates the interconnection between these tasks and how they are decomposed into simpler behaviours.

Fig. 3. Task model (e.g. "robot + module to target" decomposes into docking with a module, platform localization, path planning, and path execution with obstacle detection/avoidance and path tracking, until the target is reached and the platform stops).
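The condition/action view of behaviours described above might be sketched like this (illustrative only; names such as `Behaviour` and `tick` are our own, and the tree loosely follows Figure 3):

```python
class Behaviour:
    """A behaviour node: a condition over the inputs selects which
    sub-behaviour (action) runs; leaves are the primitive outputs."""
    def __init__(self, name, condition=None, if_true=None, if_false=None,
                 output=None):
        self.name = name
        self.condition = condition   # callable over sensor inputs; None for a leaf
        self.if_true = if_true
        self.if_false = if_false
        self.output = output         # leaf output command

    def tick(self, inputs):
        if self.condition is None:   # leaf: emit its output
            return self.output
        branch = self.if_true if self.condition(inputs) else self.if_false
        return branch.tick(inputs)   # selection among lower-level behaviours

# Leaves (primitive outputs).
stop = Behaviour("stop", output="stop")
avoid = Behaviour("obstacle avoidance", output="avoid")
track = Behaviour("path tracking", output="track")

# Nested selection, loosely following Figure 3's "robot to target" task.
drive = Behaviour("path execution",
                  condition=lambda s: s["obstacle"],
                  if_true=avoid, if_false=track)
goto = Behaviour("robot to target",
                 condition=lambda s: s["target_reached"],
                 if_true=stop, if_false=drive)

print(goto.tick({"target_reached": False, "obstacle": True}))   # avoid
print(goto.tick({"target_reached": True, "obstacle": False}))   # stop
```

Because every node is itself a behaviour, the same `tick` interface works at any nesting depth, which is what lets the overall agent behaviour be treated as a single "do-the-task" behaviour.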
When a robotic platform is assigned a task, it becomes unavailable for further tasks. This means that it does not take part in negotiations. Instead it monitors task execution and, if no progress is made, it contacts the Task Allocation Agent to re-assign the task. Monitoring of task execution can simply be done by recalculating the utility of the task over a given time interval. If the utility remains unchanged or decreases below a given threshold, task execution must be terminated and the task has to be re-allocated. In the following section we will give a definition of task utility.
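This monitoring rule can be sketched as follows (a hedged illustration; `recompute_utility` is a caller-supplied function, and the exact stall criterion is our reading of the text):

```python
def monitor_task(recompute_utility, threshold, checks=5):
    """Progress monitoring as described above: periodically recompute the
    task's utility; signal re-allocation if it stalls or drops below the
    threshold, otherwise report the task as still in progress."""
    previous = recompute_utility()
    for _ in range(checks):
        current = recompute_utility()
        if current <= previous or current < threshold:
            return "re-allocate"     # contact the Task Allocation Agent
        previous = current
    return "in-progress"

# Toy example: utility stalls immediately, so the task is re-allocated.
readings = iter([4.0, 4.0, 4.0])
print(monitor_task(lambda: next(readings), threshold=1.0))  # re-allocate
```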
4 Utility for Task Assignment

Component selection based on performance measures and task requirements is of great importance for our system. Imagine use cases where multiple modules of the same functionality exist. For example, a user may have more than one table module, which are really inexpensive, in his house. The system must then be in a position to choose the best module for a given task; e.g. in a fetch-and-carry scenario, where the mobile platform has to dock with a table, reach a goal location and return to the user, the best module in order to minimize resource usage would be the one closer to the goal location. We are dealing here with an instance of the Optimal Assignment Problem (OAP) [19] that was
originally studied in game theory. A formulation of this problem for robotics can be found in [20].

To address this problem we use the concept of utility. The idea is that each individual can internally estimate the value of executing an action. This estimation results in a utility function which is highly domain- and task-dependent. An instructive definition of utility follows [11]. It is assumed that each agent/hardware module is capable of estimating its fitness for every task of which it is capable. This estimation takes into account two task- and robot-dependent factors:

• expected quality of task execution, given the method and equipment to be used
• expected resource cost, given the spatio-temporal requirements of the task (e.g. the length of the path to a target, etc.)

So given a module M and a task T, if M is capable of executing T, then we can define Q_MT(p) and C_MT(d) as the quality and cost, respectively, expected to result from the execution of T by M. Quality of task execution can be seen as a function of application-module-specific parameters, denoted by p. An example can be the power level of the battery of a robotic mobile platform. If the power levels are low then the platform will have to charge soon, therefore becoming temporarily unavailable to execute tasks. Cost can be seen as a function of the path length to the goal position. A non-negative utility measure can now be defined:

    U_MT = Q_MT(p) − C_MT(d)   if M is capable of executing T and Q_MT(p) > C_MT(d)
    U_MT = 0                   otherwise
Eachagent’sobjectiveistomaximizeutility.Utilitymaximizationisequivalent in our case with effective resource usage and therefore better system performance. We can incorporate learning into utility estimation, therefore adaptingthesystem’sperformancetotheuserbehaviourandneeds.
5 Experimental Design

To validate our system's functionality and obtain first simulation results, we have created an experimental testbed. For the implementation of our agents we decided to use JADE (Java Agent Development Framework) as middleware. JADE is a software development framework aimed at developing multi-agent systems and applications conforming to FIPA (Foundation for Intelligent Physical Agents) standards for intelligent agents. Using JADE as middleware basically means that we take advantage of its communication structures and naming and directory services, in order to develop our experimental set-ups effectively and in line with the standards of the agent research community. This way we are able to analyse interactions between agents, as well as decision-making procedures. For the simulations we have used the
Player/Stage simulator. The structure of our experimental implementation is illustrated in Figure 4.

From the discussion above it has been made evident that a large number of possible system states and setups can exist, making it almost impossible to simulate all of them. For that reason we had to choose some scenarios to be simulated which would test the task allocation and fault prevention mechanisms of our architecture. The following three were chosen:

• one mobile robotic platform, two different types of application modules and multiple tasks to be performed autonomously
• multiple mobile robotic platforms, two different types of application modules and one single task to be performed autonomously, with robot malfunction and task re-allocation
• multiple mobile robotic platforms, two different types of application modules and multiple tasks to be performed autonomously

Results from the simulations are very encouraging, with the system assigning tasks to modules in an efficient manner. Even when task execution was interrupted on purpose by us, tasks were re-allocated successfully. Agent communication is optimal, posing almost no communication overhead.

Fig. 4. Experimental implementation (a JADE platform hosting the UI, System State, Mediator, Task Decomposition and Background Tasks Agents; platform and chair agents act as Player clients with position and laser devices, simulated in Player/Stage).

More specifically, agent interactions are shown in Figure 5 for the task of driving an initially undocked chair (ch1) to a given destination (target). FIPA Agent Communication Language performatives are used.
Fig. 5. Agent interactions while performing a complex task. Message sequence (FIPA ACL performatives) between the UI link layer, Mediator, System State, Task Allocation, Background Tasks and Component Agents: inform(ch1, auto, goto(target)); query-if(ch1, "docked"); inform(ch1, null); inform("performing docking"); request(ch1, dock); cfp(ch1, dock); propose(bid); accept(pl1); inform("done"); inform("done", pl1); inform("done docking"); request(pl1, goto(target)); inform("done"); inform("done").
At first the Mediator Agent checks with the System State Agent whether the requested chair is docked with a mobile platform or not. The answer is negative, so a module selection procedure is initiated by the Task Allocation Agent in order to find the most suitable platform (pl1 in the figure). When it is found, it is ordered to dock with the chair and then reach the defined destination. The User Interface is appropriately updated by the Mediator while task execution is progressing. We can see that only a few messages are used to accomplish a complicated task.
6 Conclusions We have defined a Multi-Agent System architecture for modular mobility aids in terms of software architecture, modelling and requirements. We have shown how components interact with each other in order to achieve effective functionality of the whole system. A quantification of each module’s ability to undertake a task was also presented and used in a negotiation based module selection mechanism. Throughout this effort, our main concern was simplicity
and functionality, given the safety-critical nature of the application domain. Finally, a complete experimental testbed has been presented which has been used to validate our concepts.

Since this is a new and demanding application domain, much work remains on the architecture. We need to develop complete performance measures and enhance the system with learning mechanisms that will allow it to adapt its performance to the user's behaviour. We also have to look deeper into mechanisms for fault prevention and system recovery. Finally, tests on physically embodied robots will be made.
References 1. Mandel C, Huebner K, Vierhuff T (2005) Towards an Autonomous Wheelchair: Cognitive Aspects in Service Robotics. In Proceedings of Towards Autonomous Robotic Systems (TAROS 2005): 165-172 2. Gomi T, Griffith A (1998) Developing intelligent wheelchairs for the handicapped. In: Assistive Technology and Artificial Intelligence: Applications in Robotics, User Interfaces and Natural Language Processing 3. Hoyer H (1999) The OMNI-Wheelchair - State of the Art. In: Proceedings of Conference on Technology and Persons with Disabilities, Los Angeles 4. Simpson R et al (1998) Navchair: An assistive wheelchair navigation system with automatic adaptation. In: Assistive Technology and Artificial Intelligence: Applications in Robotics, User Interfaces and Natural Language Processing 5. Parker L (1998) Alliance: An architecture for fault tolerant multirobot cooperation. In: IEEE Transactions on Robotics and Automation, 14(2): 220-240 6. Simmons R, Smith T, Dias M B, Goldberg D, Hershberger D, Stentz A, Zlot R (2002) A layered architecture for coordination of mobile robots. In: Multi-Robot Systems: From Swarms to Intelligent Automata 7. Botelho S, Alami R (1999) M+: A schema for multi-robot cooperation through negotiated task allocation and achievement. In: Proceedings of the IEEE International Conference on Robotics and Automation: 1234-1239 8. Zlot R, Stentz A, Dias M B, Thayer S (2002) Multi-robot exploration controlled by a market economy. In: Proceedings of the IEEE International Conference on Robotics and Automation: 3016-3023 9. Gerkey B, Mataric M J (2002) Sold!: Auction methods for multi-robot coordination. In: IEEE Transactions on Robotics and Automation, 16(5):758-768 10. Dias M B, Zlot R M, Kalra N, Stentz A (2005) Market-Based Multirobot Coordination: A Survey and Analysis. Tech report CMU-RI-TR-05-14, Robotics Institute, Carnegie Mellon University 11. Gerkey B, Mataric M J (2004) A formal analysis and taxonomy of task allocation in multi-robot systems. 
In: Intl. Journal of Robotics Research, 23(9):939-954 12. Smith R G (1980) The Contract Net Protocol: high level communication and control in a distributed problem solver. In: IEEE Transactions on Computers, C-29(12) 13. Gat E (1997) On three-layer architectures. In: Artificial Intelligence and Mobile Robots. MIT AAAI Press
14. Bonasso P, Kortenkamp D (1995) Characterizing an architecture for intelligent, reactive agents. In: Working Notes, AAAI Spring Symposium on Lessons Learned from Implemented Software Architectures for Physical Agents: 29-34 15. Gat E (1991) Integrating planning and reacting in a heterogeneous asynchronous architecture for mobile robots. In: SIGART Bulletin 2 16. Connell J (1992) A hybrid architecture applied to robot navigation. In: Proceedings of IEEE ICRA '92, Nice, France 17. Stone P (1998) Layered learning in multi-agent systems. PhD Thesis, Carnegie Mellon University 18. Mataric M (1994) Interaction and intelligent behavior. PhD Thesis, MIT AI Lab 19. Gale D (1960) The theory of linear economic models. McGraw-Hill Book Company, New York 20. Gerkey B (2003) On multi-robot task allocation. PhD Thesis, University of Southern California, Los Angeles
How to Construct Dense Objects with Self-Reconfigurable Robots Kasper Stoy University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark [email protected]
Summary. Self-reconfigurable robots can change their shape by rearranging the modules from which they are built. This self-reconfiguration process has proven difficult to control, because it involves control of a distributed system of autonomous, but mechanically coupled modules connected in time-varying ways. In this work we address this control problem and specifically look at how to construct dense configurations. The basic idea behind our solution is that holes inside the desired configuration and modules outside attract each other. The effect is that holes propagate to the surface of the desired configuration and modules outside move in to fill them. This basic solution is augmented with a scaffold and gradient-based algorithm which ensures that the robot does not get stuck in local minima or get disconnected during self-reconfiguration. The approach is evaluated in simulation and we find that the self-reconfiguration process always converges and the time to complete a configuration scales approximately linearly with the number of modules. This is, to the best of our knowledge, the first time this result has been obtained for dense configurations.
1 Introduction Reconfigurable robots are robots built from modules. These robots can be reconfigured by changing the way these modules are connected. If a robot is able to change the way the modules are connected autonomously, the robot is a self-reconfigurable robot. Self-reconfigurable robots are versatile because they can adapt their shape to fit the task. Self-reconfigurable robots are also robust, because if a module fails, it can be ejected from the system and be replaced by a spare module. Potential applications for such robots include search and rescue missions, planetary exploration, building and maintaining structures in space, and entertainment. In order to realize these applications, research has to focus both on the development of the hardware of self-reconfigurable robots and on their control. The latter is the focus of this paper.
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 27–37, 2006. © Springer-Verlag Berlin Heidelberg 2006
K. Stoy
Fig. 1. This figure shows a simulated self-reconfigurable robot transforming from an initial random configuration to a configuration resembling a rhinoceros.
Fig. 2. This shows how a CAD model is represented using a set of overlapping bricks. This representation is used to drive the self-reconfiguration algorithm.
One of the fundamental control problems of self-reconfigurable robots is control of the self-reconfiguration process: the process by which the robot changes from one shape to another. An example of a self-reconfiguration process is shown in Figure 1, where a simulated self-reconfigurable robot reconfigures from a random initial configuration to a configuration resembling a rhinoceros. The method proposed here consists of two steps. In the first step a 3D CAD model, representing the desired configuration, is transformed into a set of overlapping bricks as shown in Figure 2. The set of overlapping bricks is just a space-efficient way to represent the volume described by the CAD model. The second step is the actual self-reconfiguration process. The desired configuration is grown from a seed using the volume described by a set of overlapping bricks to guide the growth process. The growth process consists of two parallel-running processes: the scaffold process and the fill process. The scaffold process is responsible for growing a porous scaffold which guarantees convergence and connectivity of the system. The fill process is responsible for filling in the pores of the scaffold, in this way creating a dense configuration. The solution to the self-reconfiguration problem presented here is, to the best of our knowledge, the first self-reconfiguration algorithm which can be used to build dense configurations while only mildly restricting the class of configurations that can be constructed. At a more general level the algorithm we present provides a scalable, systematic, and convergent solution to the self-reconfiguration problem.
How to Construct Dense Objects
The paper is structured as follows. In Section 2, the related work is described. In Section 3, we describe the algorithm we use to transform the 3D CAD model into a set of overlapping bricks. Section 4 describes how the representation of the desired configuration interacts with the other components of the system to realize the self-reconfiguration algorithm. Finally, in Section 6, the system is evaluated using a simulated self-reconfigurable robot where scalability and convergence properties are documented.
2 Related Work

The self-reconfiguration problem is: given a start configuration, possibly a random one, how to move the modules in order to arrive at the desired final configuration. It is computationally intractable to find the optimal solution (see [4] for a discussion). Therefore, self-reconfiguration planning and control are open to heuristic-based methods. Chirikjian, Pamecha, and Chiang propose iterative improvement as a way around the computational explosion [4, 9, 3]. The idea is to find a suboptimal sequence of moves that leads from the initial configuration to the final configuration. This suboptimal sequence of moves can then be optimized using local searches. Rus et al. simplify the planning problem by using a specific intermediate chain configuration that is easy to configure into and out of [11]. Kotay, Unsal, and Prevas propose a hierarchical planner [7, 15, 10]. At the highest level, some of the motion constraints of the underlying hardware are abstracted away to facilitate efficient planning. Based on these high-level plans, the lower level then produces the detailed sequence of actions. Another approach, suggested by Kotay [7], is to use meta-modules consisting of a small number of modules. The idea is that by planning at the meta-module level there are no or few motion constraints. This makes the planning tractable. On the other hand, meta-modules consist of several modules, making the granularity of the robot larger. A related approach is to make sure that the robot maintains a uniform scaffolding structure, facilitating planning [14]. Butler implemented the distributed Pacman algorithm on the Crystalline robot [2]. The Pacman algorithm was improved and implemented on the Telecubes by Vassilvitskii [16]. These two robots have few motion constraints, making the planning problem easier. The approaches reviewed so far are deterministic and planning-based.
The following approaches are not deterministic, but are much simpler, because a planning process is not needed. In early work, the idea was to use local rules as far as possible and then add randomness to deal with the problems that could not be solved using local rules. This is, for instance, true for the work on Fracta [8] and also later work on other robots [18]. The problem tended to be that even though the robot often ended up in the desired configuration, it was not always so. We refer to this as the problem of convergence. This problem was also present in the work of Yim et al [17]. However, the chance of reaching
the final configuration was higher in this work, because local communication was used to get some idea about the global shape at the local level. There are two solutions to the problem of convergence. One solution, proposed by Bojinov et al. [1], is not to focus on a specific configuration. Instead, the idea is to build something with the right functionality. Using this approach it is acceptable if a few modules are stuck as long as the structure maintains its functionality. Alternatively, Jones et al. insist on a specific configuration, but achieve convergence by enforcing a specific sequence of construction [6]. This paper is based on the work described in [12, 13], but extends this work by handling dense configurations.
3 From CAD Model to Representation

It is difficult and time-consuming to hand-code local rules that result in a desired configuration being assembled. Therefore, we need an automatic way of transforming a human-understandable description of a desired configuration into a representation we can use for control. In our system, the desired configuration is specified using a connected three-dimensional volume in the VRML 1997 or Wavefront .obj file format, which are industry standards produced by most CAD programs. We transform the model into a representation whose size scales with the complexity of the three-dimensional model. The representation consists of a set of overlapping bricks of different sizes which approximates the input shape. The choice to use bricks is fairly arbitrary and other basic geometric shapes, such as spheres or cones, could be used as well. The representation is automatically generated from a CAD model. In the first step, the algorithm builds an approximation of the CAD model using unit-bricks. A unit-brick is placed at a given coordinate if its coordinate is contained in the model. This is true if a line from this coordinate to infinity intersects the CAD model an odd number of times. Afterwards, the algorithm optimizes the representation by repeatedly combining side-sharing bricks and by eliminating bricks which are contained in other bricks. Figure 3 shows a simple example of two overlapping bricks and their representation.
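The brick-combination step just described can be sketched as a greedy merge of side-sharing unit-bricks into larger axis-aligned bricks. This is a simplification of the paper's optimizer (it produces non-overlapping bricks only), and all names are hypothetical:

```python
# Sketch: greedily grow each unit voxel into a maximal brick along x, y, z.
# A brick is ((x0, y0, z0), (x1, y1, z1)) with half-open corner coordinates,
# matching the corner notation of Fig. 3.

def voxels_to_bricks(voxels):
    """voxels: set of integer (x, y, z) unit-brick coordinates."""
    remaining = set(voxels)
    bricks = []
    while remaining:
        x0, y0, z0 = min(remaining)              # deterministic seed choice
        x1, y1, z1 = x0 + 1, y0 + 1, z0 + 1

        def slab_filled(xr, yr, zr):
            # True if every cell of the candidate extension slab is present
            return all((x, y, z) in remaining
                       for x in xr for y in yr for z in zr)

        # extend along each axis while the next slab is fully filled
        while slab_filled([x1], range(y0, y1), range(z0, z1)):
            x1 += 1
        while slab_filled(range(x0, x1), [y1], range(z0, z1)):
            y1 += 1
        while slab_filled(range(x0, x1), range(y0, y1), [z1]):
            z1 += 1
        bricks.append(((x0, y0, z0), (x1, y1, z1)))
        remaining -= {(x, y, z) for x in range(x0, x1)
                      for y in range(y0, y1) for z in range(z0, z1)}
    return bricks
```

Applied to the eight unit-bricks of box B in Fig. 3, this yields the single brick with corners (0,0,0) and (2,2,2).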
4 From Representation to Self-Reconfiguration Algorithm

The challenge is to develop a distributed control algorithm which can reconfigure the robot from an initial random configuration to the desired configuration as described by the representation. The basic idea behind our solution is that holes inside the desired configuration and modules outside attract each other. The effect is that holes propagate to the surface of the desired configuration and modules outside move in to fill them. The end result is that the volume
A: (0,0,1) → (3,1,2)
B: (0,0,0) → (2,2,2)

Fig. 3. This figure shows two overlapping bricks and their representation (left). In this example eight unit-cubes are placed in the box labeled B and one in the box labeled A. These unit-cubes merge until only two cubes are left: one containing B with corner coordinates (0,0,0) and (2,2,2), and one containing A and overlapping B with corner coordinates (0,0,1) and (3,1,2). This figure also shows a basic unit of scaffold consisting of four modules (middle) and a lattice consisting of 8 scaffold elements (right). A lattice built from these scaffold elements is porous, allowing modules to pass through it.
of the desired configuration is filled, implying that the desired configuration has been reached.

4.1 Gradients

Holes and modules use gradients to attract each other. Gradients provide modules with information about relative distances and directions in the configuration. A gradient is created by a source module which sends out an integer, representing the concentration, to all neighbours. A non-source module calculates the concentration at its position by taking the minimum received value and adding one. This concentration is then propagated to all neighbours and so on. For example, if the source module sends out 100, neighbours of this module will have a concentration of 101, their neighbours 102, and so on.

A module can locate the source by descending the gradient of concentration. However, if a module has to rely on the basic integer-based gradient to locate the source, it would have to move around randomly for a while to detect the direction of the gradient. In order to eliminate these unnecessary moves, we introduce a vector gradient which makes direction information available locally. The basic gradient implementation is extended with a vector indicating the local direction of the gradient. This vector is propagated by copying the vector from the neighbour with the lowest concentration, adding a unit vector in the direction of this neighbour, and renormalizing the result.

We use local message passing to implement the gradient. A message takes one time step to travel between neighbours and therefore it can take many time steps for gradients to be propagated in the system.
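As a concrete illustration, the integer gradient can be emulated with synchronous rounds in which every module takes the minimum value received from its six lattice neighbours and adds one. This is a centralized sketch of what is in fact a distributed, message-passing process; the function name and interface are hypothetical:

```python
# Sketch: synchronous propagation of an integer gradient over a lattice of
# modules, one message hop per time step as in the paper.

def propagate_gradient(modules, source, base=100, steps=50):
    """modules: set of (x, y, z) coords; returns {coord: concentration}."""
    conc = {source: base}
    for _ in range(steps):
        new = dict(conc)
        for m in modules:
            # the six lattice neighbours (east/west/north/south/up/down)
            nbrs = [(m[0] + dx, m[1] + dy, m[2] + dz)
                    for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                                       (0, -1, 0), (0, 0, 1), (0, 0, -1)]]
            vals = [conc[n] for n in nbrs if n in conc]
            if vals and m != source:
                new[m] = min(vals) + 1    # minimum received value plus one
        conc = new
    return conc
```

With a source value of 100, a module two hops from the source ends up with concentration 102, matching the example in the text.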
4.2 Building a Porous Scaffold

Unfortunately, the basic gradient-based algorithm alone does not work, because the algorithm also has to ensure that the robot does not get stuck in local minima or get disconnected during self-reconfiguration. In order to address these problems we introduce a porous scaffold. In general, the set of positions contained in the scaffold, Ps, can be a subset of the set of positions in the desired configuration, Pdc. However, in our implementation we use a fixed scaffold as shown in Figure 3. The algorithm that constructs the scaffold works as follows. All modules are initially connected in a random configuration and have a copy of the representation of the desired configuration. An arbitrary module is given an arbitrary coordinate contained in the desired configuration and the scaffold. This ensures that the seed module is not placed in a hollow part of the configuration. The seed module defines the coordinate system of the robot. Throughout the self-reconfiguration process all modules keep track of their position in this coordinate system using local communication. Note that internal localization is a simple problem in self-reconfigurable robots, because all modules are physically connected throughout the self-reconfiguration process, as opposed to multi-robot systems where localization is a major challenge. If a module randomly moves into a position next to a scaffold module and finds that this position is also part of the scaffold and the desired configuration, it changes to the scaffold state. In this way the scaffold is grown from the first scaffold module. The porous scaffold simplifies the reconfiguration problem, because it allows modules to move through the structure and thereby removes local minima from the configuration. The scaffold is also used to prevent modules from disconnecting from the robot.
Modules in the system cannot move independently of each other, because they depend on each other for connection to the robot. The problem is then to keep the system connected while allowing wandering modules to move. In our solution scaffold modules emit a connection gradient and modules only move if they do not perturb the gradient. This is enough to guarantee that the system stays connected. Detailed rules and proofs were presented in [12]. The scaffold puts a mild constraint on the class of configurations for which convergence is guaranteed. In our implementation the constraint is that the desired configuration can be covered with a set of overlapping 2x2x2 cubes, where any cube needs an overlap of at least four modules on the same face with neighbouring cubes. This constraint ensures that there is one connected scaffold in the desired configuration. It also prevents the algorithm from working with desired configurations containing sub-configurations that are one module thick. The constraint may be reduced or might even be removed if the scaffold is generated specifically for each configuration.
4.3 Self-Reconfiguration Algorithm

We can now implement the self-reconfiguration algorithm on top of these support algorithms. The idea here is, as mentioned, to 1) propagate holes toward modules outside the desired configuration and 2) move modules outside the desired configuration towards holes in the desired configuration.

A module can be in one of three states: wander, fill, and scaffold. Wander modules are outside the desired configuration. Fill modules are inside the desired configuration, but are not part of the scaffold. Finally, scaffold modules are inside the desired configuration and belong to the scaffold.

Wander modules descend a hole gradient created by scaffold or fill modules to reach holes in the desired configuration. Wander modules also emit a wander gradient. Wander modules change to fill when their positions are contained in the desired configuration and are positioned next to a scaffold module. If, in addition, their positions are included in the scaffold, they change to scaffold. Fill modules climb the wander gradient, if and only if it does not take them out of the desired configuration. This means that filling modules move away from wandering modules and in effect propagate holes towards the surface and the wandering modules. Fill modules change to scaffold if they move into a position contained in the scaffold. Both fill and scaffold modules emit the hole gradient if a neighbour position is unfilled and part of the desired configuration.

The self-reconfiguration algorithm runs in parallel with the scaffold building algorithm. The two algorithms interact implicitly with each other: 1) the scaffold is porous and therefore allows the self-reconfiguration algorithm to run; 2) the self-reconfiguration algorithm moves modules to scaffold positions, making the scaffold algorithm efficient.
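The state transitions above can be condensed into a small rule. In this sketch the membership tests and the neighbour query are hypothetical stand-ins for a module's local checks against the brick representation and its lattice neighbourhood:

```python
# Sketch: the wander/fill/scaffold state-transition rule for one module.

def next_state(state, pos, in_config, in_scaffold, next_to_scaffold):
    """in_config/in_scaffold: predicates over positions;
    next_to_scaffold: whether a neighbouring module is in scaffold state."""
    if state == "wander" and in_config(pos) and next_to_scaffold:
        # wander -> scaffold if the position is in the scaffold, else fill
        return "scaffold" if in_scaffold(pos) else "fill"
    if state == "fill" and in_scaffold(pos):
        # fill modules become scaffold when they reach a scaffold position
        return "scaffold"
    return state
```

Movement (descending the hole gradient, climbing the wander gradient) happens separately; only the membership and adjacency tests decide the state.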
5 Simulated System

In our simulation, we use modules which are more powerful than any existing hardware platform, but do fall within the definition of a Proteo module put forward by Yim et al. [17]. The modules are assumed to be cubical and when connected they form a lattice structure. They have six hermaphrodite connectors and can connect to six other modules in the directions: east, west, north, south, up, and down. Modules directly connected to a module are referred to as neighbours. A module can sense whether there are modules in neighbouring lattice cells. In this implementation, we do not control the actuator of the connection mechanism, but assume that neighbour modules are connected and disconnected appropriately. A module can only communicate with its neighbours. It is able to rotate around neighbours and to slide along the surface of a layer of modules. Finally, we assume that direction vectors can be uniquely transformed from
the coordinate system of a module to the coordinate systems of its neighbours. This is necessary to propagate the vector gradient and use the brick representation. The simulator is programmed in Java3D. The simulation uses time steps. In each time step all the modules are picked in a random sequence and are allowed: 1) to process the messages they have received since last time step, 2) to send messages to neighbours (but not wait for reply), and 3) to move if possible. Note that this implies that it can take many time steps for gradients to be propagated in the system. In order for a module to move it has to satisfy the motion constraints of the system. This means that in order to move to a new position the new position has to be free. If using the rotate primitive, the position that the module has to rotate through to get to the new position has to be free as well. This is all assumed to be sensed locally. In our implementation the problem of two modules moving into the same cell at the same time is not addressed, because in the simulation only one module is allowed to move at a time. However, in a real parallel-running system this problem might be solved by rolling the module back to its original position in case of collision or through local right-of-way rules.
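One simulated time step as described above might be sketched as follows. The module interface here is a hypothetical stand-in for the simulator's actual classes:

```python
# Sketch of one simulated time step: modules are visited in random order
# and may process messages, send messages, and move if the motion
# constraints allow it.

import random

class ModuleStub:
    """Minimal stand-in for a module; real modules run the gradient code."""
    def __init__(self, pos, target=None, via=None):
        self.pos, self._target, self._via = pos, target, via
    def process_messages(self): pass     # handle messages from last step
    def send_messages(self): pass        # send without waiting for replies
    def desired_move(self):
        return self._target, self._via   # via: cell swept during a rotation

def time_step(modules, occupied):
    order = list(modules)
    random.shuffle(order)                # modules picked in a random sequence
    for m in order:
        m.process_messages()
        m.send_messages()
        target, via = m.desired_move()
        # motion constraints: the target cell must be free, and for a
        # rotation the swept cell must be free as well
        if target is not None and target not in occupied and \
                (via is None or via not in occupied):
            occupied.discard(m.pos)
            m.pos = target
            occupied.add(target)
```

Because moves are executed one module at a time within the step, two modules can never claim the same cell, matching the serialization described in the text.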
6 Experiments

The task we pick for our experiment is to reconfigure the self-reconfigurable robot from a random initial configuration to another random final configuration satisfying the constraint on desired configurations. The representations of the desired configurations are automatically generated. The representation is then downloaded into the modules of the simulation and the assembly process is started. We repeated the experiments ten times with varying numbers of modules. For each experiment we recorded the time steps needed and the total number of moves, and calculated the theoretical lower bound on the needed number of moves. The lower bound is the minimum number of moves needed to reconfigure from the initial configuration to the final configuration, assuming that interference with other modules can be ignored [5]. The lower bound was only calculated for configurations with 800 modules or fewer, because with a higher number of modules it became computationally infeasible.

The time steps needed to reconfigure grow sub-linearly with the number of modules (see Figure 4), which is comparable to our earlier results with porous configurations [12] and the results reported by Yim et al. [17]. Figure 4 shows that the number of moves grows approximately linearly with the number of modules. These results do of course depend on the shape of the desired configurations which, even though they are randomly generated, tend to be spherical for larger numbers of modules. In all experiments the system converges to the desired configuration.
Fig. 4. This figure shows how the number of time steps (left) and moves (right) needed to reconfigure depends on the configuration and the number of modules. The theoretical lower bound on the number of moves is also shown.
Fig. 5. This figure shows the total number of messages used for communication as a function of the number of modules. It is shown how the messages are divided between different tasks (the order of items in the legend is the same as in the column stacks).
The number of moves and the time to complete a reconfiguration are important characteristics of an algorithm. However, another aspect is the amount of communication needed. This information is summarized in Figure 5. The system relies heavily on communication and in case of the 3200-module experiments on average 8.9 ± 2.3 messages are sent per module per time step. These messages originate from propagating three changing gradients, propagating position information, and the message exchanges needed to run the distributed connection algorithm. In our current implementation, little is done to minimize the number of messages, so there is room to improve the performance.
7 Conclusion and Future Work

We have presented an approach to self-reconfiguration working in two steps. First, a representation of a desired configuration is generated automatically based on a 3D CAD model. Second, a self-reconfigurable robot uses this representation to transform itself from a random initial configuration to the desired configuration. The transformation is controlled by an algorithm which propagates holes in the desired configuration towards modules outside. This algorithm is extended with a scaffold building algorithm that removes local minima from the desired configuration and thereby guarantees convergence of the algorithm. Furthermore, a connection algorithm ensures that the robot stays connected during the transformation process by guaranteeing that modules at all times are connected to the scaffold.

In experiments with a simulated self-reconfigurable robot we have demonstrated that the system represents an efficient, systematic, and convergent solution to the self-reconfiguration problem, while only mildly restricting the class of possible configurations. The algorithm represents, to the best of our knowledge, the first algorithm able to maintain these characteristics while handling dense configurations.
Acknowledgments

The author would like to thank George Konidaris, University of Massachusetts at Amherst, for the fruitful discussions resulting in this paper.
References

1. H. Bojinov, A. Casal, and T. Hogg. Emergent structures in modular self-reconfigurable robots. In Proc., IEEE Int. Conf. on Robotics & Automation (ICRA'00), volume 2, pages 1734–1741, San Francisco, California, USA, 2000.
2. Z. Butler, S. Byrnes, and D. Rus. Distributed motion planning for modular robots with unit-compressible modules. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'01), volume 2, pages 790–796, Maui, Hawaii, USA, 2001.
3. C.-J. Chiang and G.S. Chirikjian. Modular robot motion planning using similarity metrics. Autonomous Robots, 10(1):91–106, 2001.
4. G. Chirikjian, A. Pamecha, and I. Ebert-Uphoff. Evaluating efficiency of self-reconfiguration in a class of modular robots. Robotics Systems, 13:317–338, 1996.
5. G.S. Chirikjian and J.W. Burdick. The kinematics of hyper-redundant robot locomotion. IEEE Transactions on Robotics and Automation, 11(6):781–793, 1995.
6. C. Jones and M.J. Matarić. From local to global behavior in intelligent self-assembly. In Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA'03), pages 721–726, Taipei, Taiwan, 2003.
7. K. Kotay and D. Rus. Algorithms for self-reconfiguring molecule motion planning. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'00), volume 3, pages 2184–2193, Maui, Hawaii, USA, 2000.
8. S. Murata, H. Kurokawa, and S. Kokaji. Self-assembling machine. In Proc., IEEE Int. Conf. on Robotics & Automation (ICRA'94), pages 441–448, San Diego, California, USA, 1994.
9. A. Pamecha, I. Ebert-Uphoff, and G.S. Chirikjian. Useful metrics for modular robot motion planning. IEEE Transactions on Robotics and Automation, 13(4):531–545, 1997.
10. K.C. Prevas, C. Ünsal, M.O. Efe, and P.K. Khosla. A hierarchical motion planning strategy for a uniform self-reconfigurable modular robotic system. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'02), volume 1, pages 787–792, Washington, DC, USA, 2002.
11. D. Rus and M. Vona. Self-reconfiguration planning with compressible unit modules. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'99), volume 4, pages 2513–2530, Detroit, Michigan, USA, 1999.
12. K. Støy. Controlling self-reconfiguration using cellular automata and gradients. In Proc., 8th int. conf. on intelligent autonomous systems (IAS-8), pages 693–702, Amsterdam, The Netherlands, 2004.
13. K. Støy and R. Nagpal. Self-reconfiguration using directed growth. In Proc., int. conf. on distributed autonomous robot systems (DARS-04), pages 1–10, Toulouse, France, 2004.
14. C. Ünsal and P.K. Khosla. A multi-layered planner for self-reconfiguration of a uniform group of i-cube modules. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'01), volume 1, pages 598–605, Maui, Hawaii, USA, 2001.
15. C. Ünsal, H. Kiliccote, and P.K. Khosla. A modular self-reconfigurable bipartite robotic system: Implementation and motion planning. Autonomous Robots, 10(1):23–40, 2001.
16. S. Vassilvitskii, M. Yim, and J. Suh. A complete, local and parallel reconfiguration algorithm for cube style modular robots. In Proc., IEEE Int. Conf. on Robotics and Automation (ICRA'02), volume 1, pages 117–122, Washington, DC, USA, 2002.
17. M. Yim, Y. Zhang, J. Lamping, and E. Mao. Distributed control for 3d metamorphosis. Autonomous Robots, 10(1):41–56, 2001.
18. E. Yoshida, S. Murata, H. Kurokawa, K. Tomita, and S. Kokaji. A distributed reconfiguration method for 3-d homogeneous structure. In Proc., IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'98), volume 2, pages 852–859, Victoria, B.C., Canada, 1998.
In Situ Autonomous Biomechanical Characterization

Mehdi Boukallel, Maxime Girot, and Stéphane Régnier

Laboratoire de Robotique de Paris
Université Pierre et Marie Curie - Paris 6
CNRS - FRE 2507
18 route du Panorama - 92265 Fontenay Aux Roses - France
boukallel,girot,[email protected]

Summary. This paper presents a fully automated microrobotic system based on force/vision referenced control designed for cell mechanical characterization. The design of the prototype combines Scanning Probe Microscopy (SPM) techniques with advanced robotics approaches. As a result, accurate and non-destructive mechanical characterization based on soft contact mechanisms is achieved. The in vitro working conditions are supported by the experimental setup so that mechanical characterizations can be performed under biological environmental requirements as well as in cyclical operating mode during several hours. The design of the different modules which compose the experimental setup is detailed. Mechanical cell characterization experiments under in vitro conditions on human adherent cervix Epithelial HeLa cells are presented to demonstrate the viability and effectiveness of the proposed setup.
Keywords: In vitro mechanical cell characterization; Scanning Probe Microscopy (SPM) techniques; human adherent cervix Epithelial HeLa cells mechanical characterization.
1 Introduction

Nowadays robotics and microrobotics techniques play an important role in exploring mechanical cell behaviour. The ability to study accurately the mechanical behaviour of individual cells is a key component in understanding the elementary biological functions that various biological cells exhibit. Furthermore, individual mechanical cell characterization is a major step towards knowing and understanding the behaviour of complex mechanical tissues. To date, several experimental setups have been developed to identify the control mechanisms of the cell mechanical response [1]-[5]. Among these robotics and microrobotics systems, the most promising ones
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 39–50, 2006. © Springer-Verlag Berlin Heidelberg 2006
involve Scanning Probe Microscopy (SPM) techniques at the nanoscale. These techniques have the potential to give accurate quantitative information about local forces and contact mechanisms. The Atomic Force Microscope (AFM) has become a commonly used tool to measure mechanical properties of biological samples [6]-[10]. In these studies, a flexible cantilever with low spring constant and an atomically sharp tip is brought into the vicinity of the biological sample. Deflection of the cantilever as a result of the mechanical interaction between the microindenter and the sample is monitored by a split photodiode and the use of a laser beam reflected on the back of the cantilever. The significant research involving AFM makes possible the measurement of relevant cell mechanical properties (Young's modulus, bulk modulus, surface roughness, ...). However, most of these studies have not been performed in a biological clean room environment. Since elementary biological functions and mechanical properties of biological cells are widely affected by the experimental conditions, identified properties may not be relevant. Furthermore, as the reaction of biological samples to stress varies greatly in a given lapse of time, it is important to monitor the characterization process continuously in an in vitro environment.

Moreover, some problems are commonly associated with the use of standard commercial cantilevers with a sharp tip for soft sample mechanical characterization. The nanometre-sized dimensions of tips can cause important local strains which are higher than the elastic domain. Furthermore, depending on the magnitude of the applied force on the soft samples, cantilever tips as well as the samples can be easily damaged so that the local strain applied in the indented area becomes changed. To overcome these problems, Mahaffy et al. [11] suggest attaching a micro-sphere at the cantilever tip. However, besides the necessity of dexterous manipulations for accurate placement of the micro-sphere, this method needs to use well-known materials (geometry, mechanical properties, ...).

Another difficulty is associated with the mechanical characterization of biological samples using a sharp cantilever. Usually, the spectroscopy curves collected with the AFM are used in conjunction with an appropriate analytical model to estimate Young's modulus, friction, wear and other material properties. According to the literature, the Hertzian model, which describes the relationship between force and indentation, is the approach commonly used for fitting the experimental data. Also, two major assumptions are made: linear elasticity and infinite sample thickness. Some authors have shown that in the case of soft contact mechanisms, models derived from linear elasticity can lead to significant errors [12]. Moreover, due to imperfections of the tip radius of curvature, an ill-defined contact region results between the probe and the sample. Consequently, uncertainties are introduced in choosing the appropriate fitting analytical model. It has also been shown, depending on the applied force and on the sample thickness, that large errors may result when using infinite thickness models [13][14]. The authors compute force-displacement
curves for finite sample thickness to show that, for a soft and thin sample, the error in the estimated elasticity modulus can be an order of magnitude. In our opinion, cell mechanical characterization in an in vitro environment using a tipless cantilever seems to be a promising solution. As studies involving such a cantilever are less prone to the problems associated with sharp-tip cantilevers, enhanced non-destructive cell mechanical characterization should be achieved. For this purpose, a Force Bio-Microscope (FBM) system has been developed which combines SPM techniques and advanced robotics approaches. A tipless chemically inert cantilever is used for this study. Contact mechanisms resulting from the biomechanical process are modeled with the DMT (Derjaguin, Muller and Toporov) contact theory, which takes into account both adhesion forces and finite sample thickness. In order to demonstrate the accuracy of the DMT model in the case of soft contact mechanisms, the estimated force-deformation curve is compared to the one predicted by the Hertz theory.
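For orientation, in their classical sphere-on-half-space forms the two contact models differ only by a constant adhesion offset. The sketch below uses illustrative parameters and omits the finite-thickness correction applied in the paper:

```python
# Sketch: Hertz vs. DMT force-indentation for a spherical probe of radius R
# on an elastic half-space. E_star is the reduced elastic modulus and w the
# work of adhesion; all values are illustrative, not measured.

import math

def hertz_force(delta, R, E_star):
    """Hertz contact: F = (4/3) * E* * sqrt(R) * delta^(3/2)."""
    return (4.0 / 3.0) * E_star * math.sqrt(R) * delta ** 1.5

def dmt_force(delta, R, E_star, w):
    """DMT: the Hertz profile offset by the constant adhesion force 2*pi*R*w."""
    return hertz_force(delta, R, E_star) - 2.0 * math.pi * R * w
```

At zero indentation the DMT model predicts a negative (attractive) pull-off force of 2*pi*R*w, which is what distinguishes it from the purely repulsive Hertz curve.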
2 Experimental Setup Overview

The automated Force Bio-Microscope (FBM) device is a hybrid AFM microscope associating both a scanning microscopy approach and biological environment constraints. The FBM consists mainly of three units: the mechanical sensing unit, which performs detection, positioning and sensing features; the imaging/grabbing unit for imaging and cell tracking features; and the clean room in vitro unit, which allows experiments to be conducted in a biological environment.

As the FBM experimental setup provides suitable conditions for study in a controlled environment, the mechanical measurement process can be done on the biological sample over an extended lapse of time. A master computer is used to drive the FBM in an automatic operating mode based on force/vision referenced control. A user-definable graphical interface has been developed in order to make configuration of the automatic cell mechanical characterization process easier. To avoid undesired mechanical vibrations during the cell characterization tasks, the FBM is installed on an anti-vibration table. The overall configuration of the FBM and the different working components are shown in figure 1(A).

2.1 Mechanical Sensing Unit

The mechanical sensing unit is based on the detection of the deflection of a cantilever by an optical technique. A four-quadrant photodiode with internal amplifiers, associated to a low power collimated laser diode (wavelength 650 nm), is used in order to perform both axial and lateral nanoNewton force measurements. The optical path of the Gaussian laser beam is optimized using a pair of mirrors and a glass aspheric condenser lens. The total
sensing area of the photodiode is 7 mm² with a spectral response from 400 to 1100 nm. The sensitivity of the optical detection device is 5 mV/µm.

A low spring constant (0.2 N/m) uncoated tipless silicon cantilever is used as a probe for the cell mechanical characterization. The lever is 450 µm long, 90 µm wide and 2 µm thick. The biological samples are accurately positioned below the cantilever by a 3 DOF (x, y and z) micropositioning encoded stage with a submicrometer resolution (0.1 µm). The kinematic features of the micropositioning stage allow accurate mechanical measurements to be achieved in a workspace of 25 x 25 x 25 mm³ with a good repeatability. The configuration of the mechanical sensing unit including the optical detection device is presented in figure 1(B). A magnified picture of the cantilever with the focused laser beam on its reflective surface is shown in the same figure.
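With the quoted sensitivity (5 mV/µm) and spring constant (0.2 N/m), the photodiode voltage maps to a cantilever force via Hooke's law. This back-of-the-envelope sketch is ours, not the instrument's actual calibration code:

```python
# Sketch: converting the optical-lever signal to force using the figures
# quoted in the text.

SENSITIVITY_MV_PER_UM = 5.0      # optical detection sensitivity
SPRING_CONSTANT_N_PER_M = 0.2    # cantilever spring constant

def voltage_to_force_nN(v_mV):
    """Photodiode voltage (mV) -> cantilever force (nN) via F = k * delta."""
    deflection_m = (v_mV / SENSITIVITY_MV_PER_UM) * 1e-6  # um -> m
    return SPRING_CONSTANT_N_PER_M * deflection_m * 1e9   # N -> nN
```

So a 5 mV signal corresponds to a 1 µm deflection, i.e. roughly 200 nN of force, which gives a feel for the force range the sensing unit resolves.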
Fig. 1. (A) The FBM experimental setup overview. (B) The mechanical sensing unit.
For this preliminary study, we focused on force feedback control of the cantilever flexural deflection; thus, only the vertical micropositioning stage is servoed. Knowing the vertical position of the micromotors as well as the deflection of the cantilever measured by the optical detection device, a Proportional-Derivative (PD) controller was designed to ensure optimal control performance. The PD terms are tuned using the Ziegler-Nichols method. Figure 2(A) shows experimental results of the force referenced control approach for different desired forces.

2.2 Imaging/Grabbing Unit

The imaging/grabbing unit consists of an inverted microscope (Olympus IMT2) with Nikon 10x and 20x objectives. A phase contrast device is mounted on the microscope for precise contrast operation. The inverted microscope is fitted out with a CCD camera (754 x 488 pixels resolution). Using a frame grabber and a specialized imaging library package (Matrox Imaging) associated with the CCD camera, automatic mechanical characterization based on image feature tracking is achieved (cf. figure 2(B)). The pixel to real world calibration of the CCD camera is achieved by means of a calibrated glass micro-array as well as calibrated micro-spheres (cf. figure 3).
Fig. 2. (A) The force feedback control approach. (B) The vision feedback control approach.
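The force feedback loop of figure 2(A) can be illustrated with a minimal sketch. The cantilever constants below are the paper's values, but the PD gains, the toy plant model (deflection equal to stage height once in contact) and the function names are illustrative assumptions, not the authors' Ziegler-Nichols-tuned implementation:

```python
K_CANT = 0.2          # cantilever spring constant (N/m), from section 2.1
SENS_MV_PER_UM = 5.0  # photodiode sensitivity (mV/um), from section 2.1

def photodiode_to_force(v_mv):
    """Convert a photodiode reading (mV) into a load force (uN):
    mV -> cantilever deflection (um) -> force = k * deflection."""
    deflection_um = v_mv / SENS_MV_PER_UM
    return K_CANT * deflection_um  # (N/m) * um == uN

def pd_force_control(f_ref_un, kp=0.5, kd=0.05, steps=2000):
    """Toy PD loop servoing the vertical stage height z (um) until the
    cantilever load reaches f_ref_un (uN)."""
    z, prev_err, force_un = 0.0, 0.0, 0.0
    for _ in range(steps):
        deflection_um = max(z, 0.0)          # toy contact model
        force_un = K_CANT * deflection_um    # measured load (uN)
        err = f_ref_un - force_un
        z += kp * err + kd * (err - prev_err)  # PD step on stage height
        prev_err = err
    return force_un
```

With these (hypothetical) gains the loop settles on the desired force, e.g. a 0.2 µN setpoint is reached with a 1 µm cantilever deflection.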
Fig. 3. Pixel to real world calibration of the CCD camera using the calibrated glass micro-array.
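The pixel to real world calibration reduces to estimating a scale factor from a feature of known size, here the pitch of the calibrated glass micro-array. The 10 µm pitch and 25-pixel measurement below are hypothetical numbers for illustration:

```python
def pixel_scale_um_per_px(known_pitch_um, measured_pitch_px):
    """Scale factor (um per pixel) from one period of the calibrated
    glass micro-array as imaged by the CCD camera."""
    return known_pitch_um / measured_pitch_px

def px_to_um(length_px, scale_um_per_px):
    """Convert an image measurement in pixels into real-world um."""
    return length_px * scale_um_per_px

# A (hypothetical) 10 um array pitch spanning 25 pixels gives 0.4 um/px,
# so a 50-pixel cell outline corresponds to 20 um.
scale = pixel_scale_um_per_px(10.0, 25.0)
```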
2.3 Clean Room in vitro Unit

The incubating system is formed by a controlled heating module which maintains the temperature of the cage incubator at 37 °C using a single thermocouple probe. The cage incubator ensures a temperature stability within 0.1 °C. A mixed stream composed of 5% CO2 and humidified air is fed into a small incubating chamber containing the biological samples, avoiding in this way condensation on the cage walls that could damage the mechanical parts of the microscope and the micropositioning stage. The whole system including the FBM is placed in a positive pressure clean room to protect the biological environment.
3 In vitro Mechanical Characterization Experiments

The experiments presented in this paper focus on the mechanical stress response of human cervix Epithelial Hela (EpH) cells under a controlled environment. In order to be valid, some assumptions are made: (i) the biological membrane is assimilated to a perfectly elastic spherical material; (ii) the liquid enclosed by the biological membrane of the EpH cells is considered incompressible; (iii) the surfaces are assumed frictionless so that only normal forces are transmitted; (iv) the osmotic influence on volume modification is neglected.

The EpH cells can be assimilated morphologically to elliptical cells with a thin surrounding biomembrane. In the present study, the average dimensions of the biological sample are 10 µm long, 9 µm wide and 6 µm high (cf. figure 4). The Epithelial Hela (EpH) cells are prepared on Petri dishes with a specific culture medium formed by Dulbecco's Modified Eagle's Medium (DMEM) with high glucose and L-glutamine components and 10% foetal bovine serum.
Fig. 4. (A) Magnified image of the cervix Epithelial Hela cells obtained with a 63x objective. (B) The cervix Epithelial Hela cells morphology observed by fluorescence techniques.
3.1 Cell's Mechanical Response Characterization

Figure 5(A) shows the experimental curves of the photodiode output as a function of the sample displacement (Δz) performed on both a single EpH cell and a hard surface. The single step of the sample displacement is 200 nm and the total displacement is 8 µm. Deformations δ of the EpH cell are monitored by calculating the difference between the sample displacement Δz and the cantilever deflection Δd. The non-linear elastic behaviour of the EpH cell can be seen in figure 5(B), which presents the sample deformation δ as a function of the load force applied by the cantilever.
Fig. 5. (A) Experimental data of the photodiode output as function of the sample displacement performed on both single EpH cell and hard surface. (B) Experimental curve of the sample deformation δ as function of the applied load by the cantilever.
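The bookkeeping behind figure 5 amounts to two subtractions per data point. The constants are the paper's cantilever values; the function itself is an illustrative sketch, not the authors' acquisition code:

```python
K_CANT = 0.2          # cantilever spring constant (N/m)
SENS_MV_PER_UM = 5.0  # photodiode sensitivity (mV/um)

def deformation_and_load(sample_disp_um, photodiode_mv):
    """Recover the cell deformation delta = dz - dd and the applied load
    from one point of a force curve (section 3.1)."""
    deflection_um = photodiode_mv / SENS_MV_PER_UM  # cantilever deflection dd
    delta_um = sample_disp_um - deflection_um       # cell deformation delta
    load_un = K_CANT * deflection_um                # load force (uN)
    return delta_um, load_un

# On a hard surface the cantilever follows the stage, so delta == 0:
# deformation_and_load(1.0, 5.0) -> (0.0, 0.2)
```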
3.2 Viscoelastic Behaviour Characterization

The viscoelastic behaviour of the EpH cells is also investigated with the FBM device. Cyclical and automatic approach and retract experiments were conducted on the same biological sample during 2 hours at 3 minute intervals. For this study, the motion amplitude and the single step of the vertical microstage are fixed to 8 µm and 200 nm, respectively. In order to reduce the cantilever damping oscillations during the mechanical characterization process, the velocity of the sample positioning stage is kept small (0.5 µm/s). Figure 6(A) shows 3 approach and retract curves monitored at different time intervals (t = 0 mn, 40 mn and 80 mn) of the cyclical experiments. A single referenced approach and retract curve performed on a hard surface is given in figure 6(B). According to the collected data, the EpH sample exhibits the same viscoelastic behaviour during the whole experiment.
Fig. 6. (A) Experimental spectroscopy curves (approach and retract) performed on a single EpH cell at different time intervals (t=0 mn, 40 mn and 80 mn). (B) Single referenced approach and retract curves performed on hard surface.
Fig. 7. Dimensionless strain per module length ε as function of the sample displacement Δz.
Compared to the approach and retract curves performed on the hard surface (cf. figure 6(B)), the retract data collected on the EpH sample differ from the approach ones. This suggests adhesion forces between the cantilever probe and the biological sample, which modify the total strain energy. Figure 7 shows the strain ε as a function of the sample displacement Δz. This figure emphasizes the adhesion interaction between the cantilever and the biological sample.
3.3 In vitro Efficiency Approach for Cell Mechanical Characterization

In order to study the correlation between the mechanical cell properties and the environmental culture conditions, we have performed automatic and cyclical spectroscopy tasks on a single EpH cell during several minutes without the use of the incubating system. As in the previous study, the sample displacement and the single step of the vertical micropositioning stage are fixed to 8 µm and 200 nm respectively. Figure 8(A) shows the evolution of the EpH cell mechanical behaviour during cyclical spectroscopy operation with and without the use of the incubating system. More specifically, curve (A) shows the approach and retract curves using the cage incubator. Curves (B), (C) and (D) show the mechanical behaviour of the studied EpH cell for different elapsed times t0 once the cage incubator is turned off.

These mechanical characterization experiments clearly reveal that the mechanical properties of the studied sample are affected by the temperature conditions. This difference suggests that the intra- or extra-cellular matrix reacts to the variation of temperature.

3.4 Analytical Model for Sample Deformation Estimation

The deformation δ resulting from the mechanical cell characterization process is estimated using an appropriate analytical model. The DMT (Derjaguin, Muller and Toporov) theory is chosen for this purpose. Noting R the radius of the biological sample (R = 10 µm), w the adhesion work, P the load force applied by the cantilever and a0 the radius of the contact area when P = 0, both the contact area radius a and the deformation δ can be expressed according to DMT [15] by

a³ = a0³ (P / (2πRw) + 1)   (1)

δ = a² / R   (2)
The DMT model suggests that the adhesion work w can be expressed according to the pull-off force P_off needed to overcome the adhesion forces as [15]

P_off = 2πRw   (3)

As the pull-off force P_off and the contact radius a0 are accurately measured using the FBM (P_off ≈ 20 nN, a0 ≈ 2 µm), the values of w and a0 are introduced in equation 1 for the determination of a. Equation 2 then gives the estimation of δ. Figure 8(B) shows the estimation of the biological sample deformation δ as a function of the simulated load force P using the Hertz and the DMT theories. These analytical results are compared to the experimental measurements performed with the FBM and presented in section 3.1. As adhesion forces are not considered, large errors are observed between the experimental data and the predicted force-deformation curves (on the order of 0.2 µm) in the case of the Hertz model.
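Equations (1)-(3) with the measured values chain together directly. The following sketch uses the paper's R, P_off and a0 (the approximate measured values) and is only an illustration of the DMT estimate, worked in SI units:

```python
import math

R = 10e-6      # sample radius (m), from the paper
P_OFF = 20e-9  # measured pull-off force (N), approx. 20 nN
A0 = 2e-6      # contact radius at zero load (m), approx. 2 um

def dmt_deformation(load_n):
    """Estimate the sample deformation delta (m) under load P via DMT:
    adhesion work w from eq. (3), contact radius a from eq. (1),
    deformation delta from eq. (2)."""
    w = P_OFF / (2.0 * math.pi * R)  # eq. (3): w = P_off / (2*pi*R)
    a = A0 * (load_n / (2.0 * math.pi * R * w) + 1.0) ** (1.0 / 3.0)  # eq. (1)
    return a ** 2 / R                # eq. (2)
```

At zero load this reduces to δ = a0²/R = 0.4 µm with the values above; note that 2πRw equals P_off, so the bracket in eq. (1) is simply P/P_off + 1.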
Fig. 8. (A) Evolution of the measured force as a function of the sample displacement for different elapsed times t0 = 0, 5, 9 and 13 mn. (B) Estimation of the biological sample deformation δ as a function of the simulated load force P using the Hertz and the DMT theories compared to the experimental data.
4 Conclusion

This paper has presented the development of a micro-force sensing system for in vitro mechanical cell characterization. The experimental setup combines Scanning Probe Microscopy (SPM) techniques with advanced robotics approaches. As the developed system operates in a fully automatic mode based on visual and force tracking control, effective mechanical characterization and reliable data acquisition are achieved. The Force Bio-Microscope
device (FBM) consists of three units with autonomous force sensing and measurement capabilities. Each unit is designed, calibrated or configured towards effective in vitro mechanical cell characterization. Mechanical characterization is conducted using the FBM on human adherent cervix Epithelial Hela cells. These experiments demonstrate the efficiency of the developed experimental setup for exploring the mechanical properties of adherent biological samples in in vitro conditions. The contact mechanisms resulting from the cell mechanical characterization process are predicted using an appropriate model (DMT) taking into account both adhesion forces and finite sample thickness.
5 Future Work

Current work involves modelling the soft contact mechanisms using a tipless cantilever taking into account friction, osmotic influence and anisotropic non-linear elastic properties. Future work will be focused on exploring the mechanical transduction of living cells in in vitro environment conditions. One of the most promising results of this study will be the description of both mechanosensitivity and mechanical transduction mechanisms at the microscale for biological cells or tissue configuration.
Acknowledgment

The authors are grateful to J. Angulo-Mora from the Genetic of the Radiosensitivity Laboratory (LGR) of the French Atomic Energy Commission (CEA), for cells preparation and for his enthusiastic help.
References

1. Sun Y, Nelson B J (2004) Jrnl. of Information Acquisition 1:23–32
2. Van Vliet K J, Bao G, Suresh S (2003) Acta Materialia 51:5881–5905
3. Kim D H, Yun S, Kim B (2004) In: Proc. of the Int. Conference on Robotics and Automation, 5013–5018
4. Guck J, Ananthakrishnan R, Cunningham C C, Kas J (2002) J. Physics: Condens. Matter 14:4843–4856
5. Sharp J M, Clapp A R, Dickinson R B (2003) Colloids and Surfaces B: Biointerfaces 27:355–364
6. Radmacher M (1997) IEEE Engineering in Medicine and Biology 47–57
7. Sahin O, Yaralioglu G, Grow R, Zappe S F, Atalar A, Quate C, Solgaard O (2004) Sensors and Actuators A 114:183–190
8. Park S, Costa K, Ateshian G (2004) Journal of Biomechanics 37:1687–1697
9. Dimitriadis E K, Horkay F, Maresca J, Kachar B, Chadwick R S (2002) Biophysical Journal 82:2798–2810
10. Yao X, Walter J, Burke S, Stewart S, Jericho M H, Pink D, Hunter R, Beveridge T J (2002) Colloids and Surfaces B: Biointerfaces 23:213–230
11. Mahaffy R, Shih C K, Mackintosh F C, Kas J (2000) Phys. Rev. Lett. 85:880–883
12. Costa K, Yin F C (1999) J. Biomech. Engr. Trans. ASME 121:462–471
13. Domke J, Radmacher M (1998) Langmuir 14:3320–3325
14. Akhremitchev B, Walker G (1999) Langmuir 15:5630–5634
15. Maugis D (2000) Contact, Adhesion and Rupture of Elastic Solids, Springer-Verlag
Incremental Learning of Task Sequences with Information-Theoretic Metrics∗

Michael Pardowitz, Raoul Zöllner, and Rüdiger Dillmann

Institute of Computer Science and Engineering
Universität Karlsruhe (TH)
Karlsruhe, D-76138, Germany, P.O. Box 6980
Email: {pardow, zoellner, dillmann}@ira.uka.de

Summary. Learning tasks from human demonstration is a core feature for household service robots. Task knowledge should at the same time encode the constraints of a task while leaving as much flexibility as possible for optimized reproduction at execution time. This raises the question which features of a task are the constraining or relevant ones, both for execution of and reasoning over the task knowledge. In this paper, a system to record and interpret demonstrations of household tasks is presented. A measure for the assessment of the information content of task features is introduced. This measure for the relevance of certain features relies both on general background knowledge as well as on task-specific knowledge gathered from the user demonstrations. The incremental growth of the task knowledge as more task demonstrations become available is demonstrated within the task of laying a table.

1 Introduction

During the last years, human-centered robotics became a major trend in the robotics community. One crucial point in HCR is to design systems that adapt to unstructured and new situations the human user comes up with. This can be achieved by giving the robot the ability of learning to transform the knowledge the user has into knowledge utilizable by a mobile service robot. One of the most intuitive ways to acquire new task knowledge is to learn it from the human user via demonstration and interaction. This approach to task learning is widely known as Programming by Demonstration (PbD). PbD-systems should be capable of learning a task from a single demonstration sufficiently for execution while being able to generalize it when further
1 Introduction During the last years, human-centered robotics became a major trend in the robotics community. One crucial point in HCR is to design systems that adapt to unstructured and new situations the human user comes up with. This can be achieved by giving the robot the ability of learning to transform the knowledge the user has into knowledge utilizable by a mobile service robot. One of the most intuitive ways to acquire new task knowledge is to learn it from the human user via demonstration and interaction. This approach to task learning is widely known as Programming by Demonstration (PbD). PbD-systems should be capable of learning a task from a single demonstration sufficiently for execution while being able to generalize it when further ∗
This work has been partially conducted within the german collaborative research center ”SFB Humanoide Roboter” granted by Deutsche Forschungsgemeinschaft, and within the EU Integrated Project COGNIRON (”The cognitive robot companion”), funded by the European Commision Division FP6-IST Future and Emerging Technologies under Contract FP6-002020.
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 51–63, 2006. © Springer-Verlag Berlin Heidelberg 2006
demonstrations of the same task become available. Incremental learning approaches that gradually refine task knowledge pave the way towards suitable PbD-systems for humanoid robots.

One aspect that can only be learned incompletely from a single user demonstration is the sequence in which the subparts of a certain task can be scheduled. The task knowledge must explicitly encode these sequential constraints in order to ensure a reliable, faultless and safe execution of the task. After seeing a single demonstration, PbD-systems can state multiple valid hypotheses on the sequential constraints a task must obey. When more demonstrations become available, the incorrect hypotheses conflicting with the new demonstrations can be pruned.

Identifying the corresponding subtasks performed in different demonstrations of the same task is not an easy job. Many features of the task are exposed to noise. Additionally, some features possess a higher relevance to a certain task while being negligible for others. A method for identifying the features that characterize a task, and basing the recognition of corresponding subtasks on this selection, is presented in this paper.

The remainder of this paper is organized as follows: The next section gives an overview of related work concerning programming by demonstration and task learning from user demonstrations. Section 3 describes the system for the acquisition of task knowledge from a single user demonstration. Section 4 introduces task precedence graphs and proposes a method for learning the underlying precedence graph of a task given two or more sequential demonstrations. The methods for identifying corresponding subtasks within different task demonstrations, described in section 5, are an important preliminary for this computation. Finally, the methods described in the earlier sections are evaluated in section 6.
2 Related Work

Since [7] proposed a programming system for industrial robots using visual observation of a human user, several programming by demonstration (PbD) systems and approaches based on human demonstrations have been proposed during the past years. Many of them address special problems or a special subset of objects only. An overview and classification of the approaches can be found in [5, 14]. The mentioned systems can be discriminated by the learning paradigm applied: Imitation learning systems are used to re-identify and execute human motions [3, 1, 2], aiming at a high similarity of the robot's movements to the demonstrated trajectories. Examples for the entities learned by imitation are skills like grasp or release actions ([15]) or moves of manipulated objects with constraints like "move along a table surface" etc. The classical "peg in hole" problem can be learned by imitation, for example using a force controller which minimizes a cost function, or as a task consisting of a sequence of skills like "find the hole", "set perpendicular position" and "insert". These systems
require a large set of task demonstrations for generalizing the trajectory before the learning process can start. Background knowledge based or deductive PbD-systems, as presented in [13, 16], usually require much less or even only a single user demonstration to generate executable task descriptions. Often, the analysis of a demonstration takes place by observing the changes in the scene, described using relational expressions or contact relations [7, 10]. When mapping the learned task to the executing system, background knowledge based approaches are used in order to replicate the effects in the environment [12].

Sequential analysis of tasks is presented in [4, 9]. The first records multiple user demonstrations of a complex manipulation skill. Unnecessary actions that do not appear in all demonstrations are pruned and only the sequence of essential actions is retained. The latter stresses the role of interaction with the human user to facilitate learning of the sequential arrangement of behaviors in a navigation task.

In this paper a system is proposed that combines background knowledge based learning techniques with subsymbolic, data-driven approaches into a single framework for learning of and reasoning on user task demonstrations. This gives the system the advantages of both learning paradigms. It overcomes the data-driven approaches' need for many demonstrations being available before the learning process can start, while being able to incrementally refine the task knowledge that is generated using background knowledge based methods. It is applied to analyse the sequential structure of a task. In contrast to the method of [4], it can not only prune the unnecessary operations that do not contribute to the effects of the demonstration, but also reorder the remaining manipulation segments. This is done by applying symbolic reasoning methods on the different demonstrations of a single task in an autonomous way, in contrast to the approach in [9].
3 A System to Generate Hierarchical Task Descriptions

User demonstrations featuring a task to be learned by the system are observed and analysed by a Programming by Demonstration system developed at our institute in recent years. This section gives a short overview of the task acquisition process of our system.

During the performance of a task in a so-called training center, the user is observed using different sensors. These sensors include two data gloves for gathering the finger joint angles, magnetic trackers for obtaining the hands' position, and two active stereo cameras mounted on pan-tilt units for object localisation and tracking. Further details on the hardware used can be found in [6].

From the sensor input data, so-called elementary operators are extracted. Elementary operators (EO's) are actions that abstract a sensory motor coupling like primitive movement types (linear moves etc.), grasping and ungrasping actions. These EO's abstract from the specific execution hardware and are robot-independent from a system point of view, although they have to be implemented on the specific target robot system.

From the elementary operators, compound actions are constructed in a bottom-up manner. These are called macro-operators (MO's) and contain the full task description. On a basis level, elementary move operators are aggregated to approach, depart and transport trajectories. Together with the grasp and ungrasp EO's, grabbing and releasing of objects can be constructed. Between those gripper operations, various basic manipulation operations are detected (like transport operations, pouring operations and tool handling for instance). On the highest level, a sequence of manipulation segments is subsumed under a whole task demonstration (see figure 1). For further details, please refer to [16].

Fig. 1. Hierarchical representation of manipulation segments

On each level in this hierarchical task representation, the changes to the environment are induced from lower levels of the hierarchy. The pre- and postconditions describing the environmental states before and after the execution of each macro-operator are propagated from the EO-level to the manipulation segment level and are computed from the environmental changes during the manipulation. The state of the environment is described in terms of first order predicate logic, based on a set of semantical variables (see figure 2). These are basically geometrical inter-object relations (like "on", "under", "to the right of ..." or "contained in ...") and intra-object predicates like object class ("fork", "plate", "table") and internal state descriptors ("opening angle", "oven temperature", "liquid level", etc., depending on the object class). Each of these semantical variables is computed at every segmentation point that starts or ends a macro-operator.
When the semantic variables change during the demonstration of a task, these changes are accumulated into the overall effects of the demonstration. From these effects, the pre- and postconditions of each node in the macro-operator tree, including the root, are computed. These are the central features of a task, used for the re-identification of tasks and subtasks described in section 5.
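Accumulating effects from semantic-variable changes can be pictured as set differences over predicate states. The predicate tuples below are hypothetical examples, not the system's actual representation:

```python
def effects(pre_state, post_state):
    """Effects of a manipulation as the change in the set of semantical
    variables: predicates newly established and predicates retracted."""
    added = post_state - pre_state
    removed = pre_state - post_state
    return added, removed

# Hypothetical pick-and-place of a fork from the table onto a plate:
pre  = {("on", "fork", "table"), ("class", "fork", "fork")}
post = {("on", "fork", "plate"), ("class", "fork", "fork")}
added, removed = effects(pre, post)
# added   -> {("on", "fork", "plate")}
# removed -> {("on", "fork", "table")}
```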
Fig. 2. Some semantical variables describing the relative positions of objects.
The system covers a broad range of manipulation classes drawn from the operations needed in a household environment: transport operations (Pick&Place), device handling (operating household devices, opening a fridge) and tool handling (pouring liquids from a bottle into a glass, unscrewing a bottle etc.). It relies on the condition that the user makes all changes in the environment (closed world assumption). The result of the task acquisition process outlined in this section is a task description containing a sequence of manipulation segments together with their pre- and postconditions and the hierarchical decomposition into elementary operators.
4 Learning Task Precedence Graphs

This section is concerned with the representation of the sequential features of a task, the task precedence graph (TPG), and how it can be learned incrementally from multiple demonstrations of this task.
When a user performs a task, he performs its subparts (the manipulation segments) in a sequential ordering that he has chosen, by random or by intent, from all possible task execution orders. For a system recording his actions, these appear as a simple sequence of operations. A learning system that builds knowledge about a task has to hypothesize on the underlying sequential task structure. These hypotheses can be represented by task precedence graphs, introduced by [11].

Definition 1. A task precedence graph (TPG) for a task T is a directed graph P = (N, R) with N being the set of subtasks o_1, o_2, ..., o_n, and R ⊂ N × N being the set of precedence relations a faultless task execution must comply with. A precedence relation (o_1, o_2) ∈ R with o_1, o_2 ∈ N implies that the operation o_1 must be complete before the execution of o_2 can start. This is abbreviated as o_1 → o_2.

For a system that is supplied with only a single user demonstration D = (o_1, o_2, ..., o_n) of a task, it is hard to guess the inherent task precedence graph. The learning system can state different equally valid hypotheses, ranging from the most restricting precedence graph P^D = (N, R^D) that only allows the execution order demonstrated, to the most liberal TPG that possesses no sequential dependencies at all and allows the robot to choose the sequence of operations without any constraints. [11] states that the set of valid hypotheses is a version space, partially ordered by the subset-predicate on the sets of precedence relations R.

While the learning system cannot decide which task precedence graph from the set of consistent TPG's fits the task best after seeing only a single example, it seems a viable approach to supply it with more sample demonstrations applying different task execution orders. In order to learn task knowledge sufficient for execution from even a single demonstration, while improving the learned task when more knowledge in the form of task demonstrations is available, an incremental approach is chosen.

Assume that, after processing m demonstrations {D_1, D_2, ..., D_m} of the same task, the system has learned the most restrictive TPG P_m. Under observation of a new demonstration D_{m+1}, the learned task precedence graph can be adapted in a way that incorporates the new knowledge. [8] suggests that this can be done by further generalising the previous hypothesis to a new one covering the additional example. In order not to generalize too far, the minimal generalization of P_m that is consistent with D_{m+1} must be chosen. The resulting new set of task precedence relations R_{m+1} can be expressed as a function of the previously learned hypothesis P_m = (N, R_m) and the most restrictive hypothesis for the newly observed demonstration P^{D_{m+1}} = (N, R^{D_{m+1}}):

R_{m+1} = R_m ∩ R^{D_{m+1}}   (1)

Now, one can state the process of incremental learning of task precedence graphs:
1. For the first training example D_1, initialize the task precedence graph to the most specific TPG P_1 = P^{D_1}.
2. For each additional demonstration D_{m+1} of the same task, compute P^{D_{m+1}} and update the hypothesis P_{m+1} according to equation 1.

One issue not addressed until now is the question of when the additional task demonstrations arrive. In the application domain of household service robots one cannot assume that the user gives all demonstrations sufficient to learn the correct task precedence graph at once. Instead, it might take a long time between two successive demonstrations of the same task. Moreover, the task knowledge learned during past demonstrations has to be utilised, because it is likely that the user will require the robot to execute the learned task without having taken into account all task demonstrations that might appear in the future. Therefore, the task precedence graph learned by the system should be reliable enough to ensure a faultless and, above all, secure task execution.

With the mechanisms presented in this section it is possible to incrementally learn task precedence graphs from user demonstrations. The process of refining task precedence graphs starts with the most restrictive hypothesis (= the order seen in the first demonstration) and continues to become more and more general as more demonstrations with different sequences of actions are taken into account.
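The two-step procedure above can be sketched directly with sets of ordered pairs; the subtask names are hypothetical, and the sketch assumes each subtask appears once per demonstration:

```python
def most_restrictive_tpg(demo):
    """P^D for one demonstration: every operation precedes all later ones."""
    return {(a, b) for i, a in enumerate(demo) for b in demo[i + 1:]}

def update_tpg(relations, new_demo):
    """Minimal generalization of equation 1: intersect R_m with R^{D_{m+1}}."""
    return relations & most_restrictive_tpg(new_demo)

# Two demonstrations of the same task with permuted subtask order:
tpg = most_restrictive_tpg(["plate", "fork", "mug"])
tpg = update_tpg(tpg, ["fork", "plate", "mug"])
# Only the constraints common to both orders survive:
# {("plate", "mug"), ("fork", "mug")}
```

Each new permuted demonstration can only remove precedence relations, so the hypothesis grows monotonically more general, matching the version-space argument above.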
5 Subtask Similarity Measures Based on Information-Theoretic Metrics

While the last section presented a method for learning the sequential constraints of a task when every task demonstration uses exactly the same subtasks, this is not likely to be true for every task. Usually the human demonstrator will use only similar but partly different manipulation segments in order to fulfill the same task. This section deals with the topic of identifying the corresponding manipulation segments in two or more demonstrations of the same task.

In order to recognize the matching manipulation segments of two different task demonstrations, the ordering of the manipulations cannot be a reliable measure for subtask correspondence, as the sequence of subtasks performed is potentially permuted. Though matching the segments that manipulate the same class of objects seems a good idea at first, this method fails as soon as multiple objects of the same class are present in the scene. So more features should be taken into account to determine the similarity of subtasks and to establish sufficiently robust subtask correspondences in order to identify operations of equal impact on the scene. In order to recognize the matching manipulation segments, two sets of features are taken into account:
• Object features: These contain the class of the objects manipulated or referenced in the certain subtask (cup, plate, table etc.) as well as their possible functional roles (liquid container, object container etc.).
• Pre- and postcondition features: These contain the geometrical relations that exist between the objects before or after the performance of the subtask, respectively.

The base operation to assess feature conformance is the measure of similarity s_F for two sets of features A, B, given by the portion of common features in both feature sets:

s_F(A, B) = |A ∩ B| / |A ∪ B|.   (2)
It turned out that this straight-forward approach exposes one major drawback: as the number of features is relatively high and many of the features are irrelevant to the certain task to be learned, the features carrying the relevant information will be dominated by the irrelevant ones. The solution is to weight each feature f, depending on its relevance to the task to be learned, with a weight w(f). This turns equation 2 into

s_F(A, B) = ( Σ_{f ∈ A ∩ B} w(f) ) / ( Σ_{f ∈ A ∪ B} w(f) ).   (3)
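The weighted ratio of equation 3 is a one-liner over feature sets. The predicate strings and weights below are hypothetical illustrations:

```python
def weighted_similarity(a, b, w):
    """Equation 3: weighted portion of common features. With all weights
    equal to 1 this reduces to the plain ratio of equation 2."""
    num = sum(w.get(f, 0.0) for f in a & b)
    den = sum(w.get(f, 0.0) for f in a | b)
    return num / den if den else 0.0

a = {"on(fork,table)", "right_of(fork,plate)"}
b = {"on(fork,table)", "class(mug)"}
uniform = {f: 1.0 for f in a | b}
# One shared feature out of three total -> 1/3 under uniform weights;
# raising the shared feature's weight raises the similarity.
```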
When choosing a useful weight function w(f) one has to consider two points: Features of high relevance to the specific task should gain high weights. One way to assess the relevance of a feature is to test its occurrence over several different demonstrations of the same task. Features with a high relevance to the task to be performed have a great probability of occurrence, while features of lower relevance will occur less frequently. According to Shannon, the information content I(f) of a feature f with probability p(f) is I(f) = −log₂ p(f). As features with low information content (= high probability of occurrence) with respect to the specific task class T should be favored, w(f) = −log₂(1 − p(f|T)) seems a reasonable choice for the weight function.

On the other hand, when a completely new task is learned or only few demonstrations of the same task are available, one has to take into account that only little information about the distribution of features in the specific task class is known. The idea is to introduce global background knowledge on the feature distribution. When several demonstrations of different tasks have been observed by the robot and incorporated into the task knowledge base, it can be assumed that the features that uniquely discriminate the task are those that have low occurrence rates across the whole task knowledge base. Thus, the information content of feature f with respect to all other tasks T̄ is −log₂ p(f|T̄).

An incremental learning system should be able to cope with both situations. Therefore a combination has to be found that favors the global information content measure when no or few task demonstrations are available, and the task-specific relevance measure as more demonstrations of the specific task become available. The weight function that fulfills those requirements is
Incremental Learning of Task Sequences

$$w(f) = -\left[\,\alpha \log_2\big(1 - p(f\,|\,T)\big) + (1 - \alpha)\log_2 p(f\,|\,\bar{T})\,\right] \qquad (4)$$
with the relative weighting α of the task-specific knowledge given as $\alpha = 1 - e^{-k \cdot |T|}$. In our experiments we chose $k = -\frac{\ln 0.25}{5}$, such that after observing 5 demonstrations of the same task the proportion α of task-specific to prior knowledge is 0.75 : 0.25, converging towards 1 for more demonstrations. This choice is motivated by [1], which states that the important features of a task can be sufficiently learned with 4-5 demonstrations. With the weighted proportion of features $s_F$ as in eq. 3 it is possible to compute the similarities $s_{pre}$ and $s_{post}$ of the pre- and postcondition sets of two subtasks $M_1$ and $M_2$, as well as the average $s_{sub}$ of the subtask similarities of the tasks. They are set to

$$s_{pre} = s_F(\mathrm{Precond}(M_1), \mathrm{Precond}(M_2))$$
$$s_{post} = s_F(\mathrm{Postcond}(M_1), \mathrm{Postcond}(M_2))$$
$$s_{sub} = \frac{1}{|M_1 \cup M_2|} \sum_{m_1 \in M_1,\, m_2 \in M_2} s_M(m_1, m_2).$$
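To make the weighting scheme concrete, the following sketch implements eqs. 3 and 4 under stated assumptions: feature probabilities are supplied as plain dictionaries, and all names (`alpha`, `make_weight`, `s_features`) are our own illustrative choices, not the paper's.

```python
import math

def alpha(n_demos, k=math.log(4) / 5):
    """Relative weighting of task-specific knowledge (eq. 4).
    k = -ln(0.25)/5 = ln(4)/5, so alpha = 0.75 after 5 demonstrations."""
    return 1.0 - math.exp(-k * n_demos)

def make_weight(p_task, p_other, n_demos, eps=1e-9):
    """Return w(f) blending task-specific and global information content.
    p_task and p_other are illustrative dictionaries mapping a feature
    to p(f|T) and p(f|T-bar) respectively (our own data layout)."""
    a = alpha(n_demos)
    def w(f):
        pt = min(p_task.get(f, 0.0), 1.0 - eps)   # clamp to avoid log2(0)
        po = max(p_other.get(f, 1.0), eps)
        return -(a * math.log2(1.0 - pt) + (1.0 - a) * math.log2(po))
    return w

def s_features(A, B, w):
    """Weighted feature-set similarity s_F (eq. 3)."""
    denom = sum(w(f) for f in A | B)
    return sum(w(f) for f in A & B) / denom if denom else 0.0
```

With five demonstrations, a feature that occurs often within the task but rarely elsewhere receives a much larger weight than an incidental one, which is exactly the behavior eq. 4 is designed to produce.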
The (recursive) overall similarity $s_M$ of the two manipulation segments $M_1$, $M_2$ is defined as the simple average of the pre- and postcondition similarities and the average of all subtask similarities: $s_M(M_1, M_2) = \frac{1}{3}(s_{pre} + s_{post} + s_{sub})$. Once the subtask similarity $s_M$ is computed for every pairing of the tasks' subtasks, the subtask correspondence $p_{ST}$ is computed as the permutation that maximizes the sum of similarities:

$$p_{ST} = \arg\max_{p \in \mathrm{perm(subtasks)}} \sum_{(M_1, M_2) \in p} s_M(M_1, M_2).$$
With the subtask permutation $p_{ST}$ the most restrictive hypothesis $PD_i$ on the underlying task precedence structure can easily be constructed. This hypothesis can then be utilised to learn sequential task constraints in the way described in section 4.
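The correspondence search can be sketched as a brute-force maximization over permutations. This assumes both tasks contain the same number of subtasks and that a pairwise similarity callback (standing in for $s_M$) is supplied; exhaustive search is feasible here only because household tasks contain a handful of subtasks.

```python
from itertools import permutations

def best_correspondence(subtasks1, subtasks2, sim):
    """Return the pairing of subtasks that maximizes the summed pairwise
    similarity, together with that score. Illustrative brute force over
    all permutations of the second task's subtasks; assumes equal-length
    subtask lists."""
    best, best_score = None, float('-inf')
    for perm in permutations(subtasks2):
        pairs = list(zip(subtasks1, perm))
        score = sum(sim(m1, m2) for m1, m2 in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best, best_score
```

For tasks with many subtasks one would replace this by an assignment solver (the maximization is a linear assignment problem), but the brute force mirrors the arg-max formulation above directly.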
6 Experimental Results

In this section the experiments validating the method for incremental acquisition of task knowledge using heuristic relevance estimates for task features are reported. The task to be learned was chosen from the household domain; it consisted of laying the table for a main dish. The manipulated objects were: a plate, a fork and a mug. The main features that should be learned by the system are the spatial arrangement of the objects (the fork should be placed to the right of the plate and the mug placed on the distant side of the fork) as well as the sequential independence of all subtasks.
M. Pardowitz, R. Zöllner, and R. Dillmann
Eight different demonstrations of that task were given (T = {D1, D2, ..., D8}). In the first four of them, the fork, the plate and the mug were manipulated in this order. In the fifth and sixth demonstration the mug was placed first, followed by the fork and the plate. In the final two demonstrations the mug, the plate and the fork were manipulated in this order. After taking into account the first example, the initial task precedence graph PD1 = P1 was correctly obtained, which can be seen in the first column of table 1. After observing a second example, D2, the relevance estimates for the task similarity measure were continuously taken into account. The relevance estimates for several features depending on the number of task demonstrations are depicted in figure 3(a).
Fig. 3. (a) Relevance estimates for several features, depending on the number of task demonstrations taken into account. (b) & (c) resp.: Precondition-, Subtask-, Postcondition- and Overall-Similarity for the demonstrations D1 and D5, calculated according to formula 3 (heuristic similarity) and 2 (naive similarity)
One can see that features 1-4 (the Relations mug ON table, fork ALIGNED EAST mug, plate ON table, fork ON plate) increase their relevance estimate as more task demonstrations become available. Meanwhile features 5-7 (the Relations fork TOUCHES SOUTH plate, mug TOUCHES NORTH
plate and mug INTERSECTING fork), occurring unintentionally in some task demonstrations, receive lower ratings as they tend to occur less frequently across all task demonstrations. With the assessment of feature relevance, the pre- and postcondition similarities s_pre and s_post are calculated, as well as the subtask similarity s_sub over all subtasks and, finally, the overall task similarity measure s_M. In figures 3(b) & (c), the results of the comparison between the task demonstrations D1 and D5 are shown as a function of the number of task demonstrations used for the heuristic relevance estimation. The figures suggest the following: First, the heuristic measures start with a far better similarity than the naive approach. This is due to the second term in formula 4, which favors features that discriminate the new task from all other previously seen tasks. Second, the similarity measure reaches higher values as more task-specific data becomes available. With more demonstrations of the same task, the increasing weight α favors the first part of equation 4. As the task feature statistics also become more pronounced with more demonstrations, this part ensures a task-specific adaptation of the relevance measures. After the system has seen four examples with all the same order of actions, the learned task precedence graph is unchanged. After that, the system observes a demonstration featuring a different execution order. The two different sequences and the recognized subtask permutation are shown in figure 4. The task precedence graph learned after five demonstrations is shown in the second column of table 1. When the last two user demonstrations are observed, the system can eliminate the last unnecessary precedence relation between the fork and the plate and is free to schedule every subtask as is most convenient during execution.
Fig. 4. Recognition of equal subtasks. Subtask correspondences are drawn with stippled lines. The sequential ordering of subtasks is represented by bold arrows.
7 Conclusion

This paper presents an approach to combining learning and reasoning on tasks. Tasks are recorded from user demonstrations, segmented, interpreted
Table 1. Task precedence graphs learned incrementally during the performance of the experiment described in section 6; the columns correspond to the demonstrations i = 1-4, 5-6 and 7-8, and the rows show the learned graphs PD_i and P_i.
and stored in a data representation called macro-operators. Reasoning methods are applied to discover the task's underlying sequential structure and reordering possibilities. Methods based on information-theoretic approaches are presented that estimate the relevance of task features. These heuristics combine pre-existing background knowledge on feature distributions across different tasks with learned knowledge about the task itself. The methods and models outlined are demonstrated and validated in the context of an everyday household task to be learned by a robot servant. The results proved the utility of the relevance estimates, the task similarity measures and the algorithms for learning sequential constraints and opportunities of tasks. This approach presents a step towards robots that learn task knowledge from human demonstrations in an incremental manner that allows them to refine and complete their learned knowledge as additional data becomes available, and to act as truly 'intelligent' robot servants in future household applications.
References

1. Aude Billard, Yann Epars, Sylvain Calinon, Stefan Schaal, and Gordon Cheng. Discovering optimal imitation strategies. Robotics and Autonomous Systems, 47(2-3):69-77, 2004.
2. Cynthia Breazeal, Daphna Buchsbaum, Jesse Gray, David Gatenby, and Bruce Blumberg. Learning from and about others: Towards using imitation to bootstrap the social understanding of others by robots. Artificial Life, 11(1):31-62, 2005.
3. S. Calinon and A. Billard. Learning of Gestures by Imitation in a Humanoid Robot. Cambridge University Press, 2005. In press.
4. Jason Chen and Alex Zelinsky. Programming by demonstration: Coping with suboptimal teaching actions. The International Journal of Robotics Research, 22(5):299-319, May 2003.
5. R. Dillmann, O. Rogalla, M. Ehrenmann, R. Zöllner, and M. Bordegoni. Learning Robot Behaviour and Skills based on Human Demonstration and Advice: the Machine Learning Paradigm. In 9th International Symposium of Robotics Research (ISRR 1999), pages 229-238, Snowbird, Utah, USA, October 9-12, 1999.
6. M. Ehrenmann, R. Zöllner, O. Rogalla, S. Vacek, and R. Dillmann. Observation in programming by demonstration: Training and execution environment. In Humanoids 2003, Karlsruhe/Munich, Germany, October 2003.
7. Y. Kuniyoshi, M. Inaba, and H. Inoue. Learning by Watching: Extracting Reusable Task Knowledge from Visual Observation of Human Performance. IEEE Transactions on Robotics and Automation, 10(6):799-822, 1994.
8. Tom M. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203-226, 1982.
9. M. Nicolescu and M. Mataric. Natural methods for robot task learning: instructive demonstrations, generalization and practice. In 2nd International Joint Conference on Autonomous Agents and Multiagent Systems, pages 241-248, Melbourne, Australia, July 2003.
10. H. Onda, H. Hirukawa, F. Tomita, T. Suehiro, and K. Takase. Assembly Motion Teaching System using Position/Force Simulator: Generating Control Program. In 10th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 389-396, Grenoble, France, September 7-11, 1997.
11. M. Pardowitz, R. Zöllner, and R. Dillmann. Learning sequential constraints of tasks from user demonstrations. In IEEE-RAS Intl. Conf. on Humanoid Robots (HUMANOIDS), Tsukuba, Japan, 2005.
12. O. Rogalla. Abbildung von Benutzerdemonstrationen auf variable Roboterkonfigurationen. PhD thesis, University of Karlsruhe, Department of Computer Science, 2002.
13. Y. Sato, K. Bernardin, H. Kimura, and K. Ikeuchi. Task analysis based on observing hands and objects by vision. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Lausanne, pages 1208-1213, 2002.
14. S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233-242, 1999.
15. Qin Wang, Ruqing Yang, and Weijun Zhang. Skill acquisition from human demonstrations using FCM clustering of qualitative contact states. In Proc. 2003 IEEE Intl. Symp. on Computational Intelligence in Robotics and Automation, pages 937-942, 2003.
16. R. Zöllner, M. Pardowitz, S. Knoop, and R. Dillmann. Towards cognitive robots: Building hierarchical task representations of manipulations from human demonstration. In Intl. Conf. on Robotics and Automation (ICRA), Barcelona, Spain, 2005.
Representation, Recognition and Generation of Actions in the Context of Imitation Learning

Haris Dindo¹ and Ignazio Infantino²

¹ DINFO, Dipartimento di Ingegneria Informatica, Università degli Studi di Palermo, Viale delle Scienze Ed. 6, 90128 Palermo, Italy, [email protected]
² ICAR, Istituto di Calcolo e Reti ad Alte Prestazioni, Consiglio Nazionale delle Ricerche, Viale delle Scienze Ed. 11, 90128 Palermo, Italy, [email protected]
The paper deals with the development of a cognitive architecture for learning by imitation in which a rich conceptual representation of the observed actions is built. We adopt the paradigm of conceptual spaces, in which static and dynamic entities are employed to efficiently organize perceptual data, to recognize positional relations, to learn movements from human demonstration and to generate complex actions by combining and sequencing simpler ones. The aim is to have a robotic system able to effectively learn by imitation and which has the capabilities of deeply understanding the perceived actions to be imitated. Experimentation has been performed on a robotic system composed of a PUMA 200 industrial manipulator and an anthropomorphic robotic hand.
1 Introduction

Among the modern robotic control architectures, imitation systems have gained an important role, aiming to reduce the complexity of robot programming by endowing the system with the capability to understand the actions of a human demonstrator. For a review of the learning-by-imitation paradigm and different aspects of imitation see [15]. The learning-by-imitation approach involves several difficult research problems, as outlined in [4], [7] and [14]. In the past years, different aspects of the problem were addressed in many working systems ([1], [2], [3], [10], [12], [17], [18]). However, the main emphasis is given to the problem of how to map the observed movement of a teacher onto the movement apparatus of the robot. While this is a fundamental aspect of any robotic system which copes with imitation, we claim that in order to effectively learn by imitation, the system itself should have the capabilities to effectively "understand" its environment and the perceived actions to be imitated. In our approach, understanding involves the generation of a high-level, declarative description of the perceived world. In other words, understanding
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 65–77, 2006. © Springer-Verlag Berlin Heidelberg 2006
H. Dindo and I. Infantino
copes with building an internal model of the external world. It endows the robot with the capability to perform reasoning and predictions about what surrounds it. As has been proposed by Chella et al. [6], conceptual spaces (see [8] for an introduction) allow the generation of a high-level, declarative description of the perceived world. Following this approach, in this paper we propose a robust architecture that allows us to extract useful information from each intermediate level of a conceptual space for the purposes of imitation learning (geometric and color properties of objects, spatial relations between objects and actions performed on different types of objects). Moreover, learning and imitation are treated in the same architecture as two different points of view of the same knowledge representation. The rest of the paper is organized as follows: in the next Section the cognitive architecture is summarized and a detailed description of the conceptual representation is given. Section 3 shows an implementation of the architecture on a robotic platform. Finally, Section 4 gives short conclusions and outlines the future work.
2 The Cognitive Architecture

The proposed architecture is organized in three computational areas as shown in fig. 1. The Subconceptual Area is concerned with the low-level processing of perceptual data coming from vision sensors through a set of built-in visual mechanisms (e.g. color segmentation, feature extraction, teacher's posture reconstruction), as well as with the control of the robotic system. In the Conceptual Area, different properties of the perceived scene are captured: the Perceptual Space describes static properties of the entities in the scene (e.g. geometrical properties of the objects), the Situation Space encodes relations between various entities in a Perception (e.g. near, left, right, etc.), while the Action Space encodes the dynamic properties occurring in the world. In the Linguistic Area, representation and processing are based on a high-level symbolic language. The symbols in this area are anchored to sensory data by mapping them on appropriate representations in the different layers of the conceptual area. The same architecture is used both to analyze the observed task to be imitated (observation mode) and to perform the imitation of the given task (imitation mode). For the purposes of the present discussion, we consider a simplified bidimensional world populated with various objects in which observation/imitation takes place. During the observation phase, the user shows her/his hand while performing arbitrary tasks of manipulating objects in front of a single calibrated camera. The task is then segmented into meaningful units and its properties (objects' position and orientation, color and shape, relations between objects and action types) are represented, through the Conceptual Area, into the linguistic area in terms of high-level symbolic terms. In the
Fig. 1. The computational areas of the proposed architecture and their relation with the external world.
imitation phase, the robotic system performs the same tasks triggered by known stimuli in the external world or by a user's query. In the following, a detailed description of each computational area is given.

2.1 The Subconceptual Area

In the proposed architecture, the sub-symbolic processing area is organized in two distinct modules performing low-level processing of sensory data. The Perceptuomotor Module continuously tracks the actions of the human demonstrator in the scene during the observation mode and reports any significant changes in its motion state. During the imitation mode it translates high-level descriptions of the task to be performed into low-level control commands to be sent to the robot's actuators. The Scene-Reconstruction Module detects all the objects in the scene and reconstructs them by means of parametric functions. It outputs positional, color and shape properties of objects in the scene.

The Perceptuomotor Module

The perceptuomotor module in the Subconceptual Area is concerned both with the human hand's posture reconstruction and with the mapping between human and robot movements. The system performs skin detection in order to detect the human hand and continuously tracks its position and orientation in the image plane. The homography relationship between the image and the working plane of the user is computed during an off-line calibration phase. The module outputs the stimuli relative to the human hand's trajectory in the world coordinate system:
$$M_{hand}^{PM}(t) = [x_{hand}(t),\; y_{hand}(t),\; \theta_{hand}(t)]^T. \qquad (1)$$
Fig. 2. The upper figure shows the sequence of actions performed by the human user. The human hand is continuously tracked and its position and orientation are recorded. A zero-velocity crossing algorithm performs the detection of key-frames and of objects' properties (position, color and shape). The lower figure depicts the computed trajectories of the human hand during the observation phase (cartesian position of the hand's centroid). Vertical lines correspond to extracted key-frames.

As the scene evolves, changes occur in the world because objects are being
grasped by the hand and moved to another location. In order to detect changes in a dynamic scene, we perform motion segmentation on the human hand data by using zero-velocity crossing detection. A zero velocity occurs either when the direction of movement changes or when no motion is present. In order to accurately detect the zero-crossings in the movement data, we first apply a low-pass filter to eliminate high-frequency noise components, and then we
seek for the zero velocities within a threshold. The result is a list of frame indexes, named key-frames, at which object properties (shape and color) are to be processed. Fig. 2 shows a typical task performed by the user and the process of extracting the key-frames from the human hand trajectory data.

This module is also responsible for the issues related to mapping the teacher's internal/external space, e.g. joint angles or Cartesian coordinates of the end-effector, to that of the robot. In the current implementation we convert the Cartesian coordinates (in terms of x-y position and orientation) into robot joint angles by means of standard inverse kinematics methods ([16]).

Scene-Reconstruction Module

For the purpose of the present discussion, we represent the objects in a scene as superellipses [11]. A superellipse is a closed curve defined by the following equation:

$$\left(\frac{x}{a}\right)^{\frac{2}{n}} + \left(\frac{y}{b}\right)^{\frac{2}{n}} = 1 \qquad (2)$$

where a and b are the sizes (positive real numbers) of the major and minor axes, and n is a rational number which controls the shape of the curve. For example, if n = 1 and a = b, we get the equation of a circle. For smaller n we get gradually more rectangular shapes, until for n → 0 the curve takes on a rectangular shape (fig. 3). A superellipse in general position requires three additional parameters expressing the rotation and translation relative to the center of the world coordinate system (x, y, θ). Therefore, in order to completely describe a superellipse, we need six parameters [a, b, n, x, y, θ]. For each detected object in the scene, we fit a superellipse to it. First, we obtain the position and orientation of the objects in the scene by using the Hotelling transform (see [9]), and then we obtain the minimum bounding rectangle (i.e., the rectangle with minimum area that contains all the data). At this point, we are given all the superellipse parameters (axis lengths, center, and orientation) except the shape parameter, n, which can be approximated as

$$n \cong 1 - t \pm \sqrt{t^2 - 6t + 5}, \qquad (3)$$
Fig. 3. Superellipses with different values of exponent n. Size parameters a and b are kept constant.
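Returning to the Perceptuomotor Module, the key-frame extraction by zero-velocity crossing might be sketched as follows; the moving-average filter, the speed threshold and the frame rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

def key_frames(positions, fps=25.0, vel_thresh=2.0, smooth=5):
    """Detect key-frames as zero-velocity crossings of a tracked trajectory.
    positions: (N, 2) array of hand centroids in pixels. Window size,
    threshold and frame rate are illustrative choices."""
    # low-pass filter: moving average to suppress tracking jitter
    kernel = np.ones(smooth) / smooth
    sm = np.column_stack([np.convolve(positions[:, i], kernel, mode='same')
                          for i in range(positions.shape[1])])
    # speed of the smoothed hand centroid between consecutive frames
    speed = np.linalg.norm(np.diff(sm, axis=0), axis=1) * fps
    # frames where the speed falls below the threshold (near-zero velocity)
    low = speed < vel_thresh
    # keep the first frame of each low-speed run as a key-frame
    starts = np.flatnonzero(low & ~np.roll(low, 1))
    return starts.tolist()
```

On a synthetic move-pause-move trajectory this returns one key-frame at the start of the pause, which is where the system would then run the (expensive) object reconstruction.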
where t = A/(4ab) and A is the area of the image object (see [13]). An example of the fitting process is given in fig. 4.

Fig. 4. Results of the superellipse fitting algorithm (left: segmented image, HSV color space segmentation; right: fitted superellipses annotated with position, orientation, ID and grasp state of each object).
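The shape-recovery step can be sketched as follows, using eq. 3 with t = A/(4ab). The root-selection heuristic is our own assumption, since the text does not state which of the two roots is kept.

```python
import math

def shape_exponent(area, a, b):
    """Approximate the superellipse exponent n from the object's area and
    semi-axis lengths (eq. 3), with t = A / (4ab)."""
    t = area / (4.0 * a * b)
    disc = t * t - 6.0 * t + 5.0
    if disc < 0:
        raise ValueError("no real solution for this area/axes combination")
    roots = [1.0 - t + math.sqrt(disc), 1.0 - t - math.sqrt(disc)]
    # n = 1 is a perfect ellipse; prefer a root in a plausible range,
    # closest to 1 (our own heuristic, not stated in the paper)
    candidates = [n for n in roots if 0.0 < n <= 2.0]
    return min(candidates, key=lambda n: abs(n - 1.0)) if candidates else roots[0]
```

A rectangle (A = 4ab, so t = 1) yields n = 0 and an ellipse (A = πab) yields n ≈ 1.17, consistent with the qualitative behavior described above: the approximation degrades gracefully near the ellipse case.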
However, since the process of superellipse fitting is time-consuming, the detection of scene objects is performed only at the key-frames extracted in the Perceptuomotor Module.

2.2 The Conceptual Area

The conceptual spaces are based on a geometric treatment of concepts in which concepts are not independent of each other but structured into domains [8], e.g. color, shape, kinematics and dynamics quantities, and so on. Each domain is defined in terms of a set of quality dimensions (e.g. the color domain may be described in terms of hue/saturation/value quantities while the kinematics domain is described by joint angles). In a conceptual space, objects and their temporal and spatial evolution are represented as points describing the quality dimensions of each domain. It is possible to measure the similarity between different individuals in the space by introducing suitable metrics. In the
context of the present architecture we propose three conceptual spaces capturing different temporal and spatial aspects of the scene (see fig. 5), as described in the following.
Fig. 5. The structure of the Conceptual Area: Perceptual Space PS describes static properties of the entities in the scene (e.g. geometrical properties of the objects), the Situation Space SS encodes relations between various entities in a Perception (e.g. near, left, right, etc.), while the Action Space AS encodes the dynamical properties of the world.
The Perceptual Space

Points in the Perceptual Space are computed starting from the stimuli obtained during the subconceptual processing stage. In this work, the quality dimensions of our Perceptual Space are given by the shape and color properties of the objects perceived:

$$PS_{obj} = [a_{obj},\, b_{obj},\, n_{obj},\, h_{obj},\, s_{obj},\, v_{obj}]^T. \qquad (4)$$
Hence, this space captures the shape and color properties of objects. This is useful in assigning high-level terms to objects, such as square, circle, red, green, etc. Each point in the Perceptual Space is assigned a unique identifier, ps_id, together with the spatial information (x, y, θ) and the key-frame at which it has been recorded.

The Situation Space

The Situation Space encodes spatial relations between objects in a scene, such as left, right, near, etc. Suppose we have two objects in the Perceptual Space recorded at the same key-frame and described by their labels: (ps_id1, x1, y1, θ1, kf) and (ps_id2, x2, y2, θ2, kf). The situation between the objects is then described by using the following equation:

$$\Delta x = x_1 - x_2, \quad \Delta y = y_1 - y_2, \quad \Delta\theta = \theta_1 - \theta_2. \qquad (5)$$
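Equation 5 and one symbolic relation grounded in it can be sketched as follows; the tuple layout, the function names and the predicate margin are our own simplifications.

```python
def situation(obj1, obj2):
    """Relative situation of two perceptual-space points recorded at the
    same key-frame (eq. 5): component-wise pose differences.
    Each obj is a (x, y, theta) tuple (our own simplified layout)."""
    dx = obj1[0] - obj2[0]
    dy = obj1[1] - obj2[1]
    dtheta = obj1[2] - obj2[2]
    return (dx, dy, dtheta)

def is_left_of(obj1, obj2, margin=0.0):
    """Example of a symbolic spatial predicate grounded in the situation
    space: obj1 is 'left of' obj2 when dx is negative (image coordinates);
    the margin parameter is an illustrative tolerance."""
    dx, _, _ = situation(obj1, obj2)
    return dx < -margin
```

Predicates such as near, right or aligned follow the same pattern by thresholding suitable metrics on (Δx, Δy, Δθ).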
Hence, the quality dimensions of the space are given by (Δx, Δy, Δθ), and the space is symmetric with respect to the origin. Each point in the Situation Space is assigned a unique identifier, ss_id, together with the ordered set of Perceptual Space labels {ps_id1, ps_id2}. A particular ss_id is used to describe the absolute position/orientation of objects at each key-frame. It is straightforward to introduce suitable metrics in order to measure spatial relations between objects in a scene.

The Action Space

In order to account for the dynamic aspects of the perceived scene, we propose an Action Space in which each point represents a simple motion of entities. An action corresponds to a "scattering" from one situation to another in the conceptual space. However, the decision of which kind of motion can be considered simple is not straightforward, and it is strictly related to the problem of motion segmentation, as described in Sect. 2.1. An immediate choice is to consider "simple" any motion of the human hand occurring between key-frames. In our world there are two possible types of actions which can be performed by the hand: reach (reaches an object for the purpose of grasping it) and transfer (moves the hand to an absolute position and orientation in the world coordinate system, or relative to other objects in the world). In the first case, the trajectory of the hand is expressed relative to the object to be grasped, while in the second, it is represented relative to the origin of the world reference system. In both cases, we will refer to the target configuration, x_target. In order to represent simple motions, it is possible to choose a set of basis functions in terms of which any simple motion can be expressed. Such functions can be associated to the axes of the dynamic conceptual space as its dimensions. Therefore, from the mathematical point of view, the resulting Action Space is a functional space. A single point in the space therefore describes a simple motion of the human hand.
In the context of the present work we choose the technique of Principal Component Analysis (PCA) in order to obtain the set of basis functions. We first obtain a set of training trajectories performed by different users, ϕ(t) = [x(t), y(t), θ(t)]^T, and represent each trajectory relative to the target configuration, ϕ'(t) = ϕ(t) - x_target. In order to account for different temporal durations of simple motions, we interpolate the data to a fixed length. We noted that generally the first six principal components account for most of the data variance, and these are used to encode the simple human hand motions. The principal components form the basis functions ϕ_i(t), in terms of which each hand trajectory can be represented as a linear combination:

$$\varphi'(t) = \sum_{i=1}^{k} c_i \varphi_i(t) \qquad (6)$$
where the c_i are the projection coefficients, and the index k refers to the number of principal components (in our case, k = 6). Hence, the quality dimensions of the Action Space are given by the PCA coefficients:

$$AS = [c_1, c_2, \ldots, c_6]^T. \qquad (7)$$
However, principal components are also useful as a basis for generating novel trajectories that resemble those seen during the training phase. This is done as a process of data interpolation: given the starting, ϕ'(t_0), and ending, ϕ'(t_f), hand configurations, it is possible to determine the coefficients [c_1, ..., c_6]^T which satisfy the boundary conditions by solving the set of six (linear) equations in six unknowns:

$$\varphi'(t_0) = \sum_{i=1}^{k} c_i \varphi_i(t_0); \qquad \varphi'(t_f) = \sum_{i=1}^{k} c_i \varphi_i(t_f). \qquad (8)$$
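A sketch of the PCA encoding (eq. 6) and of the boundary-condition solve (eq. 8) using NumPy; the data layout (N trajectories, T samples, 3 components) and the function names are our own assumptions.

```python
import numpy as np

def pca_basis(trajectories, k=6):
    """Compute k principal-component basis trajectories from training data.
    trajectories: (N, T, 3) array of target-relative hand motions, already
    resampled to a common length T (the preprocessing described above)."""
    N, T, D = trajectories.shape
    X = trajectories.reshape(N, T * D)
    X = X - X.mean(axis=0)                  # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].reshape(k, T, D)          # basis functions phi_i(t)

def solve_coefficients(basis, start, end):
    """Solve eq. 8: find c such that sum_i c_i phi_i(t0) = start and
    sum_i c_i phi_i(tf) = end -- six linear equations in six unknowns."""
    # rows: 3 components at t0 plus 3 components at tf; columns: the bases
    A = np.vstack([basis[:, 0, :].T, basis[:, -1, :].T])   # (6, 6) for k=6
    b = np.concatenate([start, end])                        # (6,)
    return np.linalg.solve(A, b)
```

Reconstructing the trajectory as the coefficient-weighted sum of basis functions then reproduces the requested start and end configurations exactly (up to numerical precision), while the interior of the motion inherits the shape statistics of the training demonstrations.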
2.3 The Linguistic Area

The more "abstract" forms of reasoning, which are less perceptually constrained, are likely to be performed mainly within the Linguistic Area. The elements of the Linguistic Area are symbolic terms that have the role of summarizing perceptions and actions represented in the conceptual spaces previously described, i.e., linguistic terms are anchored to the structures in the conceptual spaces [5]. We adopted a first-order logic language in order to implement the Linguistic Area. Perceptions and actions are represented as first-order terms. The semantics of the language is given by structures in conceptual spaces according to Gärdenfors [8]. In particular, terms related to actions are interpreted in the action space, terms related to the spatial properties of the objects in the scene are interpreted in the situation space, while terms related to objects are interpreted in the perceptual space. In the present work, we adopt a finite number of primitive actions the robot can perform:

• reach_and_grasp(ps_id, as_id), adopted to describe an action of reaching and grasping the object described by the identifier ps_id by executing the action as_id, and
• transfer_and_release(ss_id, as_id), adopted to describe an action, described by the identifier as_id, of transferring and releasing a grasped object to an absolute position/orientation or relative to other objects in the workspace.

Each complex task may be described as a sequence of primitive actions:

task :- reach_and_grasp(ps_id, as_id), transfer_and_release(ss_id, as_id), ...
3 Experimental Results

Our robotic setup is composed of a conventional six degrees-of-freedom PUMA 200 manipulator and an anthropomorphic four-fingered robotic hand. The overall system has 22 degrees of freedom, and it is endowed with the ability to navigate in the free space and to grasp objects which can be described as superellipses. The experiments we performed are concerned with the problem of teaching the robotic system several manipulative tasks. In order to facilitate the process of teaching the robot a task, we have developed the Visual Blackboard System. It allows a user to create a workspace with the same dimensions as the real one, to populate it with various objects, and to execute several tasks. It also allows the user to observe the results of the imitation phase. The Visual Blackboard System is shown in figure 6.
Fig. 6. Visual Blackboard System for teaching tasks.
As an example, consider the task of placing the "red square" to the right of the "green circle". Having seen this task performed by a user, the system extracts the key-frames and encodes all scene objects, their relations and the human hand motion in the Conceptual Area. Suppose that the "red square" is assigned the perceptual space label ps_id1 and the green circle is assigned the label ps_id2. Their final situation is assigned the situation space label ss_id_goal.
This task is automatically assigned the following description: reach(ps_id1), grasp(ps_id1), transfer_relative_to(ss_id_goal), and could be labelled according to the goal performed. Since the perceptual space points hold the whole information about the objects involved in a task, it is possible to apply the previous description to another scene in which similar objects are present (e.g. a red square and a green circle) and perform exactly the same task. However, the main strength of our architecture lies in the fact that it is also possible to anchor the same task to more general instances by exploiting the conceptual representations. Hence, it is possible to obtain the task of "placing squares left of circles", or "placing red objects left of the green ones", and their combinations. However, each task description is given a priority: less general tasks have higher priority with respect to more general tasks. In the Imitation Mode, the system observes the scene and searches for an already seen task according to the priority (in our example it suffices to seek for squares, circles, red or green objects). If it is not able to match the task, it prompts the user, who can then show the task in the observation mode. Otherwise, the matched description is tailored to the particular situation, and absolute positions and orientations to be reached by the robotic system are produced and sent to the Action Space for the trajectory computation. The whole trajectory is then sent to the Robotic Control Module and mapped to our robot's actuators. An example of the robot imitating the task (depicted in fig. 2) is shown in fig. 7. Note that in this example, the control of the robotic hand fingers is performed as a separate behavior and is not involved in the imitation process.
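The priority-ordered matching described above can be sketched as follows; representing a stored task as a (priority, preconditions, actions) triple is our own simplification of the paper's symbolic descriptions.

```python
def match_task(scene_objects, task_library):
    """Search stored task descriptions in priority order (most specific,
    i.e. highest priority, first) and return the action sequence of the
    first task whose preconditions all hold in the observed scene.
    Each library entry is a (priority, preconditions, actions) triple,
    where preconditions are predicates over the scene (our own layout)."""
    for priority, preconds, actions in sorted(task_library, key=lambda t: -t[0]):
        if all(pred(scene_objects) for pred in preconds):
            return actions
    return None   # no match: prompt the user for a new demonstration
```

When both a specific description ("red square left of green circle") and a general one ("squares left of circles") apply, the higher priority of the specific description makes it win, mirroring the behavior described in the text.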
Fig. 7. (a) Top view of the world as seen by the robot. (b)-(f) Sequence of actions performed by the robot in the imitation mode: the robot executes the same task performed by the user.
H. Dindo and I. Infantino
4 Conclusions

The main contribution of the paper is the proposal of an effective cognitive architecture for imitation learning in which three different representation and processing areas coexist and integrate in a theoretical way: the subconceptual, the conceptual and the linguistic areas. The resulting system is able to learn natural movements and to generate complex action plans by manipulating symbols in the linguistic area.

It should be noted that the conceptual spaces may be employed as a simulation structure, i.e. to simulate movements and actions before effectively executing them. Current research concerns how to enhance the proposed architecture by stressing these simulation capabilities in the context of imitation learning.
Representation, Recognition and Generation of Actions
Reduction of Learning Time for Robots Using Automatic State Abstraction

Masoud Asadpour¹, Majid Nili Ahmadabadi², and Roland Siegwart¹

¹ Autonomous Systems Lab, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland {masoud.asadpour,roland.siegwart}@epfl.ch
² Control and Intelligent Processing Center of Excellence, ECE Dept., University of Tehran, Iran [email protected]

Summary. The required learning time and the curse of dimensionality restrict the applicability of Reinforcement Learning (RL) on real robots. The difficulty of including initial knowledge and of understanding the learned rules adds to these problems. In this paper we address automatic state abstraction and the creation of hierarchies in an RL agent's mind as two major approaches for reducing the number of learning trials, simplifying the inclusion of prior knowledge, and making the learned rules more abstract and understandable. We formalize automatic state abstraction and hierarchy creation as an optimization problem and derive a new algorithm that adapts decision-tree learning techniques to state abstraction. The proof of performance is supported by strong evidence from simulation results in nondeterministic environments. The simulation results show encouraging improvements in the required number of learning trials, the agent's performance, the size of the learned trees, and the computation time of the algorithm.

Keywords: State Abstraction, Hierarchical Reinforcement Learning
1 Introduction

It is well accepted that using proper learning methods reduces robots' after-design problems, which arise naturally from the use of inaccurate and incomplete models and inadequate performance measures at design time, in addition to confronting unseen situations and changes in the environment and the robot. Existing learning methods that are capable of handling dynamic and complicated situations are slow and produce rules (mostly numerical) that are not abstract, modular, or comprehensible by humans. Therefore, the designer's intuition cannot easily be incorporated into the learning method and the learned rules. In addition, the existing learning methods suffer from the curse of dimensionality, require a large number of learning trials, and lack abstraction capability. RL [1], despite its strength in handling dynamic and nondeterministic environments, is an example of such learning methods. In this
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 79–92, 2006. © Springer-Verlag Berlin Heidelberg 2006
M. Asadpour, M.N. Ahmadabadi, and R. Siegwart

Fig. 1. State abstraction concept
research we try to solve some of the mentioned problems in the RL domain, because of its simplicity and the variety of its applications in the robotics field, in addition to its mentioned advantages.

It is believed that some of the discussed drawbacks can be lessened to a great extent by the implementation of state-abstraction methods and hierarchical architectures. Having abstract states and hierarchies in the agent's mind, the learned rules are partitioned, the definition of performance measures for future learning is simplified, and the number of learning trials can be reduced by a careful design of the learning method. In addition, incremental improvement of the agent's performance becomes much simpler.

Different approaches have been proposed so far for state abstraction, the creation of hierarchies, and the implementation of RL methods in those architectures. One group of these methods assumes a known structure and uses RL methods to learn both the rules in each layer and the hierarchy. Another group of researchers proposes hand-designing subtasks and learning them in a defined hierarchy. These two approaches require much design effort for the explicit formulation of the hierarchy and the state abstraction. Moreover, the designer should, to some extent, know how to solve the problem before s/he designs the hierarchy and subtasks. The third approach uses decision-tree learning methods to automatically abstract states, detect subtasks and speed up learning (Fig. 1). The approaches devised so far, however, do not take the exploratory nature of RL into account, which results in non-optimal solutions. In this paper, we take a mathematical approach to develop new criteria for the utilization of decision trees in state abstraction. Our method outperforms the existing solutions in performance, number of learning trials, and size of trees.

Related works are reviewed in the next section. The U-Tree algorithm and its drawbacks are briefly described in the third section. In the fourth section we formalize state abstraction as an optimization problem and derive new split criteria. The fifth section describes the simulation results. Conclusions and future works are discussed finally.
2 Related Works

The simplest idea for gaining abstract knowledge is the detection of subtasks, especially repeating ones [2], and the discovery of the hierarchy among them. Once the agent has learned a subtask, which is faster and more efficient due to the smaller state space, it is able to use it afterwards. Subtasks in Hierarchical RL (HRL) are closed-loop policies that are generally defined for a subset of the state set [3]. These partial policies are sometimes called temporally-extended actions, options [4], skills [5], behaviors [6], etc. They must have well-defined termination conditions.

Although these methods give more understanding of the underlying aspects of Hierarchical RL, it is hard to apply them to complex tasks, since particular features are designed manually based on the task domain. The general problem that should be solved first, whether designing manually or extracting the structures automatically, is state abstraction.³ In many situations significant portions of a large state space may be irrelevant to a specific goal and can be aggregated into a few relevant states. As a consequence, the learning agent can learn the subtask reasonably fast. Our work addresses this problem.

Techniques for non-uniform state discretization are already known, e.g. Parti-game [8], the G algorithm [9], and U-Tree [10]. They start with the world as a single state and recursively split it when necessary. Continuous U-Tree [11] extends U-Tree to work with continuous state variables. TTree [12] applies Continuous U-Tree to SMDPs. Jonsson and Barto [13] applied U-Tree to options, but they use one tree per option and build the hierarchy of options manually. We adopt U-Tree as our basic method because the reported results illustrate it as a promising approach to automatic state abstraction.
In this paper we show that the employed U-Tree methods ignore RL's explorative nature. This imposes a bias on the distribution of the samples that are saved for introducing new splits in the U-Tree. As a result, finding a good split becomes more and more difficult, and the introduced splits can be far from optimal. Moreover, U-Tree-based techniques have been excerpted, in essence, from decision-tree learning methods; the split criteria used are therefore very general and can work with any sort of data. Here we devise specialized split criteria for state abstraction in RL and show that they are more efficient than the general ones, both in learning performance and in computation time.

Dean and Givan [14] have developed a minimization model, known as ε-reduction, to construct a partition space that has a smaller number of states than the original MDP, while ensuring that the utility of the policy learned in the reduced state space is within a fixed bound of the optimal policy. Our work is different because ε-reduction does not extract any hierarchy among state

³ The inverse of state abstraction, State Aggregation [15], clusters "similar" states and assigns them the same values. It effectively reduces the state-space size, but requires a large number of trials and a huge initial memory.
partitions, while building a hierarchy can reduce the search time in partition lists from O(n) to O(log n), n being the number of partitions. Also, their theory is developed based on the immediate return of actions instead of the long-term return. We can argue that ε-reduction is a special case of our method.
3 Formalism

We model the world as an MDP, which is a 6-tuple (S, A, T, γ, D, R), where S is a set of states (here it can be infinite), A = {a1, ..., an} is a set of actions, T = {P^a_{s's}} is a transition model that maps S × A × S into probabilities in [0, 1], γ ∈ [0, 1) is a discount factor, D is the initial-state distribution from which the start state is drawn (written s0 ∼ D), and R is a reward function that maps S × A × S into real-valued rewards. A policy π maps from S to A; the learner maintains a real-valued value function V on states, or a real-valued action-value function (also called a Q-function) on state-action pairs. The aim is to find an optimal policy π* (or, equivalently, V* or Q*) that maximizes the expected discounted total reward of the agent. We assume that each state s is a sensory-input vector (x1, ..., xn), where each xi is a state-variable (also called a feature, or attribute).

3.1 U-Tree

A U-Tree [10] [11] abstracts the state space incrementally. Each leaf Lj of the tree corresponds to an abstract state S̄j. Each leaf keeps action-values Q(S̄j, a) for all available actions. The tree is initialized with a single leaf, assuming the whole world to be one abstract state. New abstract states are added as necessary. Sub-trees of the tree represent subtasks of the whole task. Each sub-tree can have other sub-sub-trees that correspond to its sub-sub-tasks. The hierarchy breaks down to leaves that specify primitive sub-tasks.

Continuous U-Tree [11] loops through a two-phase process: a sampling phase and a processing phase. In the sampling phase, a history of transition steps Ti = (si, ai, ri, s'i), composed of the current state, the selected action, the received immediate reward, and the next state, is saved. Each sample is assigned to a unique leaf of the tree, based on the values of the current state-variables. During the sampling phase the algorithm behaves as a standard RL algorithm, with the added step of using the tree to translate sensory input into an abstract state.
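As a concrete illustration, a U-Tree of this kind can be sketched as a binary tree whose internal nodes test one state-variable against a threshold and whose leaves hold the Q-values of an abstract state. This is a minimal sketch under our own naming, not the authors' implementation:

```python
class Node:
    """U-Tree node: internal nodes carry a split test, leaves carry Q-values."""
    def __init__(self, var=None, threshold=None, n_actions=6):
        self.var, self.threshold = var, threshold  # split test (internal node)
        self.left, self.right = None, None         # children (internal node)
        self.q = [0.0] * n_actions                 # Q(S_bar, a) (leaf)

    def is_leaf(self):
        return self.var is None

def leaf_for(node, state):
    """Translate a sensory-input vector into its abstract state (a leaf)."""
    while not node.is_leaf():
        node = node.left if state[node.var] < node.threshold else node.right
    return node

# Tree initialized as a single leaf: the whole world is one abstract state
root = Node()
assert leaf_for(root, (3, 1)) is root

# After a split on state-variable 0 at threshold 2, two abstract states exist
root.var, root.threshold = 0, 2
root.left, root.right = Node(), Node()
assert leaf_for(root, (1, 5)) is root.left
```

During the sampling phase, `leaf_for` plays the role of the translation step: the standard RL update is applied to the Q-values stored in the returned leaf.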
After some number of learning episodes the processing phase starts. In the processing phase a value is assigned to each sample:

    V(T_t) = r_t + \gamma V(\bar{s}_{t+1}), \qquad V(\bar{s}_{t+1}) = \max_a Q(\bar{s}_{t+1}, a)        (1)

where s̄_{t+1} is the abstract state that s_{t+1} belongs to. Then, if a significant difference among the distributions of sample values within a leaf is found, that leaf is split into two leaves.
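The value assignment of Eq. (1) is a one-liner; a hedged sketch, assuming the Q-values of the leaf containing s_{t+1} are available as a list:

```python
GAMMA = 0.95  # discount factor used later in the experiments

def sample_value(r, q_next_leaf):
    """V(T_t) = r_t + gamma * max_a Q(s_bar_{t+1}, a), Eq. (1)."""
    return r + GAMMA * max(q_next_leaf)

v = sample_value(-0.01, [0.2, 0.5, 0.1])  # -0.01 + 0.95 * 0.5, about 0.465
```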
To find the best split point for each leaf, the algorithm loops over the state-variables. The samples within a leaf are sorted according to one state-variable, and a trial split is virtually added between each consecutive pair of state-variable values. This split divides the abstract state into two sets. A splitting criterion, like the Kolmogorov-Smirnov test (KS), compares the two sets and returns a number indicating the difference between the two distributions. If the largest difference over all state-variables is bigger than a confidence threshold, a split is introduced. This procedure is repeated for all leaves.

Although the original algorithm for U-Tree [10] can function in partially observable domains, for the sake of simplification we make the standard assumption that the agent has access to a complete description as well. Also, since the structure of the tree is not revised once a split is introduced, we assume the environment is stationary.

If we assume the processing phase is done after each learning episode, the number of leaves⁴ at episode t is L(t), and the sample size in each leaf is m, the best sorting algorithms, e.g. Quicksort, can sort the samples based on a state-variable in O(m log m). There can be up to (m − 1) split points. To find the cumulative distributions, the KS-test needs the samples in the two virtual splits to be sorted based on their values, which needs at least O(2 (m/2) log(m/2)). Then a pass through all sorted samples needs O(m). In total, the order of the algorithm using the KS-test, with n being the dimension of the state vectors, is:

    O\big( L(t)\, n\, m^4 \log m \log(m/2) \big)        (2)
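The split search just described can be sketched as follows. To keep the sketch dependency-free, the KS statistic is computed directly as the maximum gap between the two empirical CDFs; the sample layout and names are illustrative, not the authors' code:

```python
def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max CDF gap."""
    values = sorted(set(xs) | set(ys))
    cdf = lambda data, v: sum(1 for d in data if d <= v) / len(data)
    return max(abs(cdf(xs, v) - cdf(ys, v)) for v in values)

def best_split(samples, var):
    """samples: list of (state, value) pairs. Sort by state-variable `var`,
    try every trial split, and return (ks, threshold) of the best one."""
    samples = sorted(samples, key=lambda sv: sv[0][var])
    best = (0.0, None)
    for k in range(1, len(samples)):
        lo = [v for _, v in samples[:k]]
        hi = [v for _, v in samples[k:]]
        ks = ks_statistic(lo, hi)
        if ks > best[0]:
            thr = (samples[k - 1][0][var] + samples[k][0][var]) / 2
            best = (ks, thr)
    return best

# Sample values clearly differ on either side of x = 2: the split lands there
data = [((0,), 0.2), ((1,), 0.1), ((3,), 0.9), ((4,), 1.0)]
ks, thr = best_split(data, 0)
print(ks, thr)  # 1.0 2.0
```

A real implementation would compare the best KS value over all state-variables against the confidence threshold before committing to the split.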
3.2 Splitting Criteria

In the literature, two types of split criteria have been used for state abstraction: rank-based tests, e.g. the non-parametric KS-test [10] [11], and tests inspired by Information Theory, e.g. the Information Gain Ratio (IGR) [17]. The non-parametric KS-test looks for violations of the Markov property by computing the statistical difference between the distributions of samples on either side of a split, using the difference in the cumulative distributions of the two datasets. The IGR-test measures the amount of reduction in the uncertainty of sample values and is computed as follows. Given a set of samples T = {Ti = (si, ai, ri, s'i)}, i = 1, ..., m, from abstract state S̄, labeled with V(Ti) (computed according to (1)), the entropy of the sample values is:

    Entropy_{\bar{S}}(V) = -\sum_{v} P(V(T_i) = v \mid \bar{S}) \log_2 P(V(T_i) = v \mid \bar{S})        (3)

where P(V(Ti) = v | S̄) is the probability that samples in T have value v. The Information Gain of splitting the abstract state S̄ into S̄1 and S̄2 is defined as the amount of reduction in the entropy of the sample values, i.e.:

⁴ The total number of nodes is 2L(t) − 1.
    InfoGain(\bar{S}, \bar{S}_1, \bar{S}_2) = Entropy_{\bar{S}}(V) - \sum_{i=1}^{2} \frac{m_i}{m}\, Entropy_{\bar{S}_i}(V)        (4)
where m_i is the number of samples from T that fall in S̄_i. IGR is defined as:

    IGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \frac{InfoGain(\bar{S}, \bar{S}_1, \bar{S}_2)}{-\sum_{i=1}^{2} \frac{m_i}{m} \log_2 \frac{m_i}{m}}        (5)
3.3 Biased Sampling Problem

In implementation, each leaf of the U-Tree can hold a limited number of samples. If the number of samples exceeds an upper limit, an old sample is selected randomly and replaced by the new sample. However, the selection probability of an action in a state depends on its action-value, and this fact skews the distribution of samples in a leaf. We call this biased sampling. The biased sampling problem has a negative effect: RL algorithms escape from local optima by trading off exploration and exploitation, and if the size of a sample list is not big enough, the number of samples with explorative character will not be statistically significant enough to introduce a split until the confidence threshold of the split criterion is reduced. Reducing the threshold increases the occurrence of non-optimal splits, so one can expect the generated tree to be large, with low performance.

This problem can be solved by having in each leaf, instead of one big list with capacity k, |A| smaller lists with capacity k/|A|, one per action. The algorithm is then modified so that when a new sample arrives and the corresponding list is full, an old sample is deleted randomly from the list that corresponds to the action of the new sample. Sorting the new sample lists is also faster:

    |A| \cdot O\Big(\frac{k}{|A|} \log \frac{k}{|A|}\Big) = O\Big(k \log \frac{k}{|A|}\Big) < O(k \log k)        (6)
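The per-action replacement scheme can be sketched as follows (class and field names are ours, not the paper's):

```python
import random

class LeafSamples:
    """|A| small lists of capacity k/|A| instead of one list of capacity k;
    a full list evicts a random old sample *of the same action* as the
    incoming one, so exploratory actions are not crowded out."""
    def __init__(self, n_actions, k):
        self.cap = k // n_actions
        self.lists = [[] for _ in range(n_actions)]

    def add(self, sample):
        _, action, _, _ = sample          # sample = (s, a, r, s_next)
        bucket = self.lists[action]
        if len(bucket) >= self.cap:
            bucket[random.randrange(len(bucket))] = sample  # replace an old one
        else:
            bucket.append(sample)

leaf = LeafSamples(n_actions=6, k=360)    # 60 samples per action, as in Sect. 5
for _ in range(1000):
    leaf.add((0, 3, -0.01, 1))            # flood the leaf with one action
assert len(leaf.lists[3]) == 60           # capped per action
```

Flooding the leaf with one action can no longer evict the (rarer) exploratory samples of the other actions, which is exactly the point of the modification.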
4 SANDS

In this section we formalize the state abstraction problem and provide a mathematical description of a new algorithm that is specially derived for HRL. We call it SANDS, which stands for State Abstraction in Nondeterministic Systems. We also show how SANDS can be combined with other heuristics.

4.1 State Abstraction

An abstract state S̄ is a subset of the state space S, such that all states within it have "close" values (see below). The value of an abstract state S̄ is defined as the expected discounted return if the agent starts from a state in S̄ and follows the policy π afterwards, i.e.:
    V_\pi(\bar{S}) = \sum_{s \in \bar{S}} P(s \mid \bar{S}) \sum_{a \in A} \pi(s, a) \sum_{s'} P^a_{s's} \big( R^a_{s's} + \gamma V^\pi(s') \big)        (7)
where π(s, a) is the selection probability of action a in state s under policy π, P^a_{s's} is the probability that the environment goes to state s' after doing action a in state s, and R^a_{s's} is the expected immediate reward after doing action a in state s and going to state s'. Similarly, Q_π(S̄, a) is defined as:

    Q_\pi(\bar{S}, a) = \sum_{s \in \bar{S}} P(s \mid \bar{S}) \sum_{s'} P^a_{s's} \big( R^a_{s's} + \gamma V^\pi(s') \big)        (8)
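For a toy abstract state with tabular quantities, Eq. (8) is a double sum; a numerical sketch in which all the tables (membership probabilities, transitions, rewards, values) are made-up examples:

```python
GAMMA = 0.95

def q_abstract(p_s, P, R, V):
    """Q_pi(S_bar, a) = sum_s P(s|S_bar) sum_s' P^a_{s's} (R^a_{s's} + gamma V(s')),
    Eq. (8), for one fixed action a."""
    return sum(p_s[s] * sum(P[s][s2] * (R[s][s2] + GAMMA * V[s2])
                            for s2 in range(len(V)))
               for s in range(len(p_s)))

p_s = [0.5, 0.5]                  # P(s|S_bar) for the two member states
P   = [[1.0, 0.0], [0.0, 1.0]]    # deterministic transitions for action a
R   = [[1.0, 0.0], [0.0, 0.0]]    # reward only when acting from state 0
V   = [0.0, 0.0]                  # current value estimates of the successors

print(q_abstract(p_s, P, R, V))   # 0.5 * 1.0 = 0.5
```

Eq. (7) is obtained from the same sum by additionally weighting each action with π(s, a).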
From set theory, a set partition of a set S is defined as a collection of disjoint subsets of S whose union is S. (The number of distinct partitions of a set is called its Bell number.) So U = {S̄_1, ..., S̄_p} is a set partition of S iff:

    \bar{S}_i \subseteq S, \quad \forall i \in \{1, \ldots, p\}
    S = \bigcup_{i=1}^{p} \bar{S}_i        (9)
    \bar{S}_i \cap \bar{S}_j = \emptyset, \quad \forall i, j \in \{1, \ldots, p\},\ i \neq j

State abstraction is then an optimization problem which aims to find a set partition and a policy π on the state space S with the number of subsets p minimized, while ensuring that the value of the learned policy π is within a fixed bound ε of the optimal policy π*, i.e.:

    |V^{\pi^*}(s) - V^{\pi}(\bar{S}_i)| < \epsilon, \quad \forall s \in \bar{S}_i,\ \forall \bar{S}_i \in U        (10)
4.2 SANDS Splitting Criterion

It is easy to prove that the leaves of a U-Tree generate a set partition of the state space. It is clear from the definition of state abstraction in (10) that we are minimizing the difference between the optimal policy and the policy that the U-Tree represents. Here, instead of walking toward the optimal policy, which is unknown, we try to walk away from the current policy toward the optimal policy. Now, assume that we have a U-Tree that defines a policy π. We would like to enhance it to a policy π̃ by splitting the partition S̄ (one of the leaves of the tree) into unknown partitions S̄_1 and S̄_2, so that the expected return of the resulting tree is maximized, i.e. we maximize:

    G = \sum_{s \sim D} P(s) \big[ V^{\tilde\pi}(s) - V^{\pi}(s) \big]        (11)
where D is the initial-state distribution. Using the Bellman equations we can write:

    G = \sum_{s \sim D} P(s) \Big[ \sum_a \tilde\pi(s, a) \sum_{s'} P^a_{s's} \big( R^a_{s's} + \gamma V^{\tilde\pi}(s') \big) - \sum_a \pi(s, a) \sum_{s'} P^a_{s's} \big( R^a_{s's} + \gamma V^{\pi}(s') \big) \Big]        (12)
    G = \sum_{s \sim D} \sum_a \sum_{s'} P(s) P^a_{s's} \big[ R^a_{s's} (\tilde\pi(s, a) - \pi(s, a)) + \gamma (\tilde\pi(s, a) V^{\tilde\pi}(s') - \pi(s, a) V^{\pi}(s')) \big]        (13)
V^{π̃}(s) is unknown; we can use V^π(s) as an estimate. For the states that fall outside of S̄ the policy remains unchanged after the split, i.e. ∀s ∉ S̄, π̃(s, a) = π(s, a). For the states within S̄, or within the new abstract states S̄_1 and S̄_2, we define π(a) = π(s, a), s ∈ S̄, and π̃_i(a) = π̃(s, a), s ∈ S̄_i, i = 1, 2. Hence:

    G = \sum_a \sum_{i=1}^{2} (\tilde\pi_i(a) - \pi(a)) \sum_{s \in \bar{S}_i} \sum_{s'} P(s) P^a_{s's} \big( R^a_{s's} + \gamma V^{\pi}(s') \big)        (14)
From the properties of conditional probabilities, P(s) = P(s | S̄_i) × P(S̄_i | S̄) × P(S̄), ∀s ∈ S̄_i, i = 1, 2. Considering the definition in (8), we can write:

    G = P(\bar{S}) \sum_a \sum_{i=1}^{2} (\tilde\pi_i(a) - \pi(a))\, Q^{\pi}(\bar{S}_i, a)\, P(\bar{S}_i \mid \bar{S})        (15)
This way the learner finds the split that maximizes the performance of the tree over the whole action set, provided that actions are selected according to a Softmax distribution. An important point, however, is that the learner cannot revise the tree after introducing the split: it is, in effect, fixing part of the policy at this level. Therefore, it is more appropriate to judge the splits based on the performance of the greedy action instead of Softmax, so the optimization term changes from a summation to a maximum. Moreover, P(S̄) is the same for all states in the abstract state S̄. The final formulation of the SANDS criterion therefore simplifies to:

    SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_a \sum_{i=1}^{2} (\tilde\pi_i(a) - \pi(a))\, Q^{\pi}(\bar{S}_i, a)\, P(\bar{S}_i \mid \bar{S})        (16)
U-Tree generates a set partition of the state space, but the optimality of the number of partitions is not guaranteed, although the algorithm starts with the minimum number of partitions and adds more as required. In fact, the splits here are Univariate [16]: they shrink the state space parallel to one of its axes. The necessary (but not sufficient) condition for a minimum-size tree is to have Multivariate [16] splits, which can split the space in any direction. Multivariate splits are not covered in the current version of the algorithm; however, we discuss the size of the trees in the results section.

If we assume the processing phase starts after each learning episode, the number of leaves in episode t is L(t), the sample size in each leaf is m, and the state dimension is n, the time order of the new algorithm is O( L(t) n m (m² − |A|²) log(m/|A|) ), which is comparable to (2) by a rough factor of m log m.

4.3 Using the Monte Carlo Method to Estimate SANDS

We now show how to compute each term by the Monte Carlo method, using the samples recorded during the learning episodes. Each logged sample is a 4-tuple
(s_i, a_i, r_i, s'_i), corresponding to the current state, the selected action, the received immediate reward, and the next state. If, among the m^a samples for action a in S̄, m^a_1 and m^a_2 = m^a − m^a_1 samples correspond to S̄_1 and S̄_2 respectively, we can compute:

    \hat\rho^a_i = \frac{m^a_i}{m^a}, \qquad \hat\rho_i = \frac{\sum_a m^a_i}{\sum_a m^a}, \qquad i = 1, 2
    \hat\mu^a_i = \frac{1}{m^a_i} \sum_{j=1}^{m^a_i} \big( r_j + \gamma \max_b Q(s'_j, b) \big), \qquad i = 1, 2
    \hat\pi_i(a) = \frac{\exp(\hat\mu^a_i / \tau)}{\sum_b \exp(\hat\mu^b_i / \tau)}, \qquad i = 1, 2        (17)
    \hat\mu^a = \hat\rho^a_1 \hat\mu^a_1 + \hat\rho^a_2 \hat\mu^a_2
    \hat\pi(a) = \frac{\exp(\hat\mu^a / \tau)}{\sum_b \exp(\hat\mu^b / \tau)}
    SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_a \sum_{i=1}^{2} (\hat\pi_i(a) - \hat\pi(a))\, \hat\mu^a_i\, \hat\rho_i
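Putting Eq. (17) together for a single trial split can be sketched as below, with per-action sample values grouped by side of the split. This assumes the values r + γ max_b Q(s', b) have already been computed per Eq. (1); the function and variable names are ours:

```python
from math import exp

TAU = 0.1  # Softmax temperature (illustrative value)

def softmax(mu):
    z = [exp(m / TAU) for m in mu]
    return [x / sum(z) for x in z]

def sands(values_1, values_2):
    """values_i[a] = list of sample values recorded for action a on side i
    of the trial split. Returns the Monte-Carlo SANDS score of Eq. (17)."""
    n = len(values_1)
    m1 = [len(v) for v in values_1]
    m2 = [len(v) for v in values_2]
    rho1_a = [m1[a] / (m1[a] + m2[a]) for a in range(n)]        # rho_hat^a_1
    mu1 = [sum(v) / len(v) for v in values_1]                   # mu_hat^a_1
    mu2 = [sum(v) / len(v) for v in values_2]
    mu = [rho1_a[a] * mu1[a] + (1 - rho1_a[a]) * mu2[a] for a in range(n)]
    pi1, pi2, pi = softmax(mu1), softmax(mu2), softmax(mu)
    r1 = sum(m1) / (sum(m1) + sum(m2))                          # rho_hat_1
    return max((pi1[a] - pi[a]) * mu1[a] * r1
               + (pi2[a] - pi[a]) * mu2[a] * (1 - r1)
               for a in range(n))

# Two actions whose values flip across the split: a split that lets each
# side commit to its own best action scores clearly above zero.
score = sands([[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]])
print(score > 0)  # True
```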
4.4 SoftMax Gain Ratio (SMGR)

New criteria can be derived by combining SANDS with different heuristics. One of them is what we call the SoftMax Gain Ratio:

    SMGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \max_a \frac{-\hat\pi(a) \log \hat\pi(a) + \sum_{i=1}^{2} \hat\rho_i \hat\pi_i(a) \log \hat\pi_i(a)}{-\sum_{i=1}^{2} \hat\rho_i \log \hat\rho_i}        (18)
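Eq. (18) reuses the Monte-Carlo estimates of Eq. (17): the whole-leaf action probabilities π̂(a), the per-side probabilities π̂_i(a), and the sample fractions ρ̂_i. A direct sketch (our own naming):

```python
from math import log

def smgr(pi, pi_sides, rho):
    """Eq. (18): entropy defined on action probabilities rather than on
    sample values, normalized by the split entropy.
    pi[a]: whole-leaf probability; pi_sides[i][a]: side-i probability;
    rho[i]: fraction of samples falling on side i."""
    denom = -sum(r * log(r) for r in rho)
    def numer(a):
        return (-pi[a] * log(pi[a])
                + sum(rho[i] * pi_sides[i][a] * log(pi_sides[i][a])
                      for i in range(2)))
    return max(numer(a) for a in range(len(pi))) / denom

# Leaf undecided between two actions, while each side is nearly certain:
# the split clearly reduces the entropy of the action probabilities.
pi = [0.5, 0.5]
pi_sides = [[0.99, 0.01], [0.01, 0.99]]
print(smgr(pi, pi_sides, [0.5, 0.5]) > 0)  # True
```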
In this criterion, instead of defining the entropy function on sample values, we define it on action probabilities and try to find the split that maximizes the reduction of this entropy, i.e. the split that results in more certainty about the selection probabilities of actions. We can prove that SMGR scales and combines SANDS with a penalty term as follows:

    SMGR(\bar{S}, \bar{S}_1, \bar{S}_2) = \frac{\frac{1}{\tau} SANDS(\bar{S}, \bar{S}_1, \bar{S}_2) + \log \prod_{i=1}^{2} \hat\rho_i^{\,\hat\pi_i(a)}}{-\sum_{i=1}^{2} \hat\rho_i \log \hat\rho_i}        (19)
We can show that the penalty term punishes the splits that result in: (1) giving a high probability to a specific action without the support of enough evidence (samples), and (2) contradictory estimates of the selection probabilities of a specific action on either side of the virtual split, in the sense that an action is the best action in one split but the worst action in the other.
5 Simulation Results

We have tested our algorithms on a simplified football task. It is very similar to the classic Taxi problem, with the possibility of having moving players with different patterns in order to create environments with different levels of nondeterminism. The playing field is an 8 × 6 grid (Fig. 2). The x axis is horizontal and the y axis is vertical. There are two goals, located at (0, 3) and (8, 3). A learning agent plays against two "passive" opponents and learns to deliver the ball to the left goal. By passive opponents we mean that they move in the field only to restrict the movements of the learner and do not perform any action on the ball. The learner can choose among 6 actions: Left, Right, Up, Down, Pick,
Fig. 2. A screen-shot from the simulator
Put. If any player selects an action that results in touching the boundaries or moving to a position already occupied by another player, that action is ignored. In the special case when the learner has picked the ball, if it hits the other players or a boundary then the ball falls down. We test the learning simulations on 3 types of opponents: Static, Offender, and Midfielder. Static players do not move at all. Offenders move randomly in the x direction (i.e. they select randomly between the Left and Right actions). Midfielders move randomly in all directions. The state of the learner consists of the absolute x and y positions of the ball and the players, and the status of the ball from the {Free, Picked} states, plus two states for the left and right goals. This sums up to 3,001,252 (= 2 × (7 × 5)⁴ + 2) states. Players act in parallel, so the next state of the learner when playing against offenders or midfielders is a stochastically selected state among 4 or 16 different states, respectively. Each experiment consists of 100,000 episodes. Each episode starts by positioning the learner and the ball in a random (and unoccupied) place. Each episode consists of a maximum of 1000 steps. Each step includes one movement by each player. An episode finishes if the ball enters any goal or the maximum number of steps has passed. Upon delivering the ball to the left goal the learner receives a +1 reward. In case of the wrong goal it receives a −1 punishment. Otherwise it is punished with −0.01. All methods use SARSA with a decaying ε-greedy policy. The learning rate is 0.1 and the discount factor γ is 0.95. The exploration rate is initialized to 0.2 and decays by multiplication with the decay factor 0.99999 after each episode. The leaf sample size is 360 (60 samples per action). For each state-abstraction method a range of confidence thresholds is tested to find the best setting: the one that minimizes the number of actions the learner executes from a random initial position to the goal. The number of actions is averaged over the last 10,000 episodes.
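The essentials of this environment (goal positions, reward scheme, state count) can be sketched and sanity-checked in a few lines. This is an illustrative toy, not the authors' simulator:

```python
# Field is 8 x 6; goals sit on the left and right edges at y = 3.
LEFT_GOAL, RIGHT_GOAL = (0, 3), (8, 3)

def step_reward(ball_pos):
    """Reward after a move, given the new ball position, and whether
    the episode terminates."""
    if ball_pos == LEFT_GOAL:
        return +1.0, True      # scored in the correct goal
    if ball_pos == RIGHT_GOAL:
        return -1.0, True      # wrong goal
    return -0.01, False        # per-step punishment, episode continues

assert step_reward((0, 3)) == (1.0, True)
assert step_reward((4, 2)) == (-0.01, False)

# State-space size quoted above: ball status (Free/Picked) times the
# (7 x 5) interior positions of ball and three players, plus two goal states
assert 2 * (7 * 5) ** 4 + 2 == 3001252
```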
The average over the last 10,000 episodes shows how effectively the agent has learned to score a goal in fewer moves. The results are averaged over 30 runs. Table 1 shows the number of actions in each episode averaged over the last 10,000 episodes. UKS and UIGR are new versions of KS and IGR with unbi-
Fig. 3. Number of actions (left) and leaves of the tree (right) during learning episodes in the midfielder case, averaged per every 1000 episodes (non-accumulated)

Table 1. Number of moves in each episode averaged over the last 10,000 episodes

Player      SARSA       KS          UKS         IGR         UIGR        SMGR        SANDS
Static      10.85±.01   11.23±.2    11.35±.22   11.21±.05   11.23±.04   11.12±.03   11.1±.04
Offender    19.63±.06*  14.97±.57   14.91±.14   14.55±.15   14.58±.5    14.28±.07   14.39±.09
Midfielder  668.36**    18.41±1.2   17.39±.21   18.65±1.73  17.39±.24   16.84±.06   16.95±.2

* 14.72 after 1 million episodes    ** 18.92 after 15 million episodes
Table 2. Number of leaves in the learned trees

Player      SARSA     KS         UKS        IGR        UIGR      SMGR      SANDS
Static      2452      692±119    596±92     703±35     666±28    516±10    491±7
Offender    120052    1476±404   1331±78    1536±302   1242±23   1259±26   1282±22
Midfielder  3001252   1857±253   1651±141   2263±504   1533±48   1436±78   1313±77
Table 3. Computation time of the algorithms in minutes

Player      SARSA   KS   IGR   SMGR   SANDS
Static      1       14   15    13     10
Offender    11*     25   27    18     12
Midfielder  50**    35   40    27     15

* 16 min for 1 million episodes    ** 370 min for 15 million episodes
ased sampling. Table 2 shows the number of leaves in the learned trees (for SARSA it is the number of states). Confidence intervals are computed with significance level 0.05. Table 3 shows the computation time in minutes⁵. Fig. 3 shows the number of actions and leaves recorded during the learning episodes in the midfielder case.
⁵ Tests were done on a Dell® Inspiron™ 5150 laptop with [email protected] MHz.
5.1 Hierarchical Versus Flat RL

From Table 1 it is clear that, although SARSA shows better performance in the static-player case, it has quite poor performance in the other cases, even when we let the agent continue learning for 10 and 150 times more episodes (1 and 15 million) in the offender and midfielder cases, respectively. The performance is still poor compared to the hierarchical methods.

By looking at the time order of the HRL algorithms and flat RL, we observe that SARSA takes only one minute to learn the static-player case, with higher efficiency than all hierarchical algorithms. But when the environment becomes more nondeterministic, the required computation time increases exponentially, and the efficiency of the results is far from that of the hierarchical approaches. An interesting result for the HRL algorithms is that the required time increases in logarithmic order while the number of states increases exponentially. These results show that HRL has great potential for decreasing learning time and the number of trials in online robot learning tasks.

5.2 Unbiased Sampling

Table 1 shows that as the environment becomes more and more nondeterministic, the positive effect of unbiased sampling becomes clearer. Although the averages of UKS and UIGR are slightly higher than those of KS and IGR for static players, they are much smaller in the midfielder case. Also, the number of leaves in the unbiased versions is always fewer than in the regular versions; e.g. the difference between IGR and UIGR in the midfielder case is around 700 partitions. The confidence intervals are always smaller for the unbiased versions, which means that the number of leaves in their generated trees does not vary much.

5.3 Efficiency of SANDS and SMGR

Table 1 and Table 2 show that in all cases SANDS and SMGR create trees with better performance than all other methods (even the unbiased versions). The difference becomes clearer as the environment becomes more nondeterministic: in the midfielder case the difference between the new criteria and the old ones is about 2 actions per episode. The same is true for the size of the generated trees. In almost all cases (except against UIGR in the offender case, with a negligible difference) SANDS and SMGR generate smaller trees.

Figure 3 shows that SANDS and SMGR converge more slowly than the others. Although this can be problematic when the state space is very big, in the sense that their performance does not increase much for a long time, it is expected that after a while their performance overtakes the other criteria. Table 1 and Fig. 3 (left) confirm that splits in SANDS and SMGR are selected more carefully: not only are their trees the smallest, but their performance is also the best.
SMGR shows a small enhancement over SANDS in the offender and midfielder cases. This is due to the penalty term that SMGR adds to the splitting criterion, which rejects some inefficient splits. In the static and offender cases it is not evident which one generates fewer partitions, but in the midfielder case SANDS generates trees with around 100 fewer leaves.

5.4 Computation Time

As explained in Sect. 4.2, the time order of SANDS and SMGR is lower than that of KS and IGR. In the static-player case the difference is not clear, but in the other cases SANDS cuts the computation time in half compared to KS, and by even more compared to IGR. Part of the reduction in computation time is due to the reduced order of the sorting of the sample lists; another part is due to creating smaller trees. This is important because it also affects the speed of knowledge retrieval and the memory required by the trees. A further part is due to the reduction in the number of actions per episode.
6 Conclusion

In this paper we formalized state abstraction as an optimization problem, and by mathematical formulation we derived two new criteria for split selection. The new criteria adapt decision-tree learning techniques to state abstraction. We showed in simulation that the new criteria outperform the existing criteria not only in efficiency and size of the learned trees but also in computation time. We believe that criteria like KS and IGR judge splits based only on the rank or probability of sample values, without taking their magnitudes into account, although the magnitudes are very informative for state abstraction. SANDS and SMGR mix the probabilities and Q-values of actions.

We have tested other criteria like Information Gain [19], the Gini index [18] and Student's t-test [10], but with limited success, as reported in [10]. Student's t-test has poor performance when the variance over the splits gets close to zero, because of a variance term in its divisor. Information Gain and the Gini index have poor performance in nondeterministic environments.

The inability to revise the tree in non-stationary environments is a drawback of Univariate methods. Linear Multivariate [16] splits are perhaps more suitable candidates, as they use a weighted sum of state-variables at each split point. Adjusting these weights might be easier than reordering the nodes of the tree.
Acknowledgment

This research is funded by the Future and Emerging Technologies programme (IST-FET) of the European Community, under grant IST-2001-35506. The information provided is the sole responsibility of the authors and does not reflect the Community's opinion. The Community is not responsible for any use that might be made of data appearing in this publication. The Swiss participants to the project are supported under grant 01.0573 by the Swiss Government.
M. Asadpour, M.N. Ahmadabadi, and R. Siegwart
References

1. Sutton R.S. and Barto A.: Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (1996)
2. Uther W. and Veloso M.: The Lumberjack Algorithm for Learning Linked Decision Forests, In Proc. of PRICAI-2000 (2000)
3. Barto A.G. and Mahadevan S.: Recent Advances in Hierarchical Reinforcement Learning, Discrete Event Dynamic Systems, 13(4):41–77 (2003)
4. Sutton R.S., Precup D., and Singh S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artificial Intelligence, 112:181–211 (1999)
5. Thrun S.B. and Schwartz A.: Finding structure in reinforcement learning, In Tesauro G., Touretzky D.S., and Leen T. (eds), Advances in Neural Information Processing Systems, pp. 385–392, MIT Press, Cambridge, MA (1995)
6. Brooks R.A.: Achieving artificial intelligence through building robots, Technical Report A.I. Memo 899, AI Lab., MIT, Cambridge, MA (1986)
7. Dayan P. and Hinton G.E.: Feudal reinforcement learning, In Hanson S.J., Cowan J.D., and Giles C.L. (eds), Neural Information Processing Systems 5, San Mateo, CA, Morgan Kaufmann (1993)
8. Moore A.W.: The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, In Cowan J.D., Tesauro G., and Alspector J. (eds), Advances in Neural Information Processing Systems, 6:711–718, Morgan Kaufmann (1994)
9. Chapman D. and Kaelbling L.P.: Input generalization in delayed reinforcement learning: An algorithm and performance comparisons, In Proc. of the 12th International Joint Conf. on Artificial Intelligence (IJCAI-91), pp. 726–731 (1991)
10. McCallum A.: Reinforcement Learning with Selective Perception and Hidden State, PhD thesis, Computer Science Dept., Univ. of Rochester (1995)
11. Uther W.T.B. and Veloso M.M.: Tree based discretization for continuous state space reinforcement learning, In Proc. of the 15th National Conf. on Artificial Intelligence, AAAI-Press/MIT-Press, pp. 769–774 (1998)
12. Uther W.T.B.: Tree Based Hierarchical Reinforcement Learning, PhD thesis, Dept. of Computer Science, Carnegie Mellon Univ., Pittsburgh, PA, USA (2002)
13. Jonsson A. and Barto A.G.: Automated state abstraction for options using the U-tree algorithm, In Advances in Neural Information Processing Systems: Proc. of the 2000 Conf., pp. 1054–1060, MIT Press, Cambridge, MA (2001)
14. Dean T. and Givan R.: Model minimization in Markov decision processes, In Proc. AAAI-97, p. 76 (1997)
15. Singh S.P., Jaakkola T., and Jordan M.I.: Reinforcement learning with soft state aggregation, In Tesauro G., Touretzky D.S., and Leen T.K. (eds), Neural Information Processing Systems 7, MIT Press, Cambridge, MA (1995)
16. Yildiz O.T. and Alpaydin E.: Omnivariate Decision Trees, IEEE Trans. on Neural Networks, 12(6):1539–1546 (2001)
17. Au M. and Maire F.: Automatic State Construction using Decision Tree for Reinforcement Learning Agents, International Conf. on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA), Gold Coast, Australia (2004)
18. Breiman L., Friedman J., Olshen R., and Stone C.: Classification and Regression Trees, Wadsworth, Pacific Grove, CA (1984)
19. Quinlan R.: Induction of decision trees, Machine Learning, 1:81–106 (1986)
Efficient Failure Detection for Mobile Robots Using Mixed-Abstraction Particle Filters

Christian Plagemann, Cyrill Stachniss, and Wolfram Burgard

University of Freiburg, Georges-Koehler-Allee, 79110 Freiburg, Germany
{plagem,stachnis,burgard}@informatik.uni-freiburg.de

Summary. In this paper, we consider the problem of online failure detection and isolation for mobile robots. The goal is to enable a mobile robot to determine whether the system is running free of faults or to identify the cause of faulty behavior. In general, failures cannot be detected by solely monitoring the process model for the error-free mode, because if certain model assumptions are violated the observation likelihood might not indicate a defect. Existing approaches therefore use comparably complex system models to cover all possible system behaviors. In this paper, we propose the mixed-abstraction particle filter as an efficient way of dealing with potential failures of mobile robots. It uses a hierarchy of process models to actively validate the model assumptions and distribute the computational resources between the models adaptively. We present an implementation of our algorithm and discuss results obtained from simulated and real-robot experiments.
1 Introduction

Whenever mobile robots act in the real world, they are affected by faults and abnormal conditions. Detecting such situations and allowing the robot to react appropriately is a major precondition for truly autonomous vehicles. While the applied techniques need to be able to reliably detect rare faults, the overall estimation process under error-free conditions should not be substantially more complex than in systems that are optimized for the normal operational mode. Separate monitoring processes that use more complex models to cover all possible system behaviors introduce an unnecessarily high computational load. In this paper, we introduce mixed-abstraction particle filters as an effective means for adaptively distributing the computational resources between different system models based on the estimated validity of their specific model assumptions.

The term "fault detection" commonly refers to the detection of an abnormal condition that may prevent a functional unit from performing a
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 93–107, 2006. © Springer-Verlag Berlin Heidelberg 2006
C. Plagemann, C. Stachniss, and W. Burgard
Fig. 1. The left image depicts a simulated robot before colliding with an obstacle which is not detected by its sensors. The right photograph shows a real mobile robot that collides with an undetected glass door while moving on its planned trajectory to the neighboring room.
required function [11]. Most works in the fault detection and isolation literature deal with internal faults such as defects in hardware or software. For the mobile robot domain, we apply the same nomenclature to external influences like collisions or wheel slip, since their effects are similar to those of internal defects and the resulting models have the same structure.

As an illustrative scenario, consider a mobile robot equipped with wheel encoders, a laser range finder, and a sufficiently accurate map of the environment. In the fault-free case, the position of the robot can be tracked using a standard tracking algorithm such as a Kalman filter or a particle filter with a simplistic odometry-based motion model, which is formally given in Section 3.3. In odometry-based models, the next system state x_t is directly predicted from the odometry, which is the measurement o_t obtained from the wheel encoders.

Although such models allow us to evaluate different state hypotheses by weighting them using exteroceptive measurements, e.g., a laser range measurement l_t, they do not directly allow us to detect collisions with obstacles that cannot be perceived by the sensors of the robot. This is due to the fact that when the robot stops moving, its wheel encoders do not record any motion, which is perfectly consistent with the recorded laser measurements. Therefore, no filter degradation occurs and there is no possibility to detect such faults inside the filter. One typical solution to overcome such problems is to compare the estimated trajectory with the planned one on a higher system level. As major drawbacks of such an approach, one cannot infer the actual cause for the deviation from the planned trajectory, and the system architecture is complicated by the stronger connection between the planning and tracking modules.

An alternative solution is to consider the actual motion commands that have been sent to the motors instead of just using the wheel encoder readings. However, this makes the system model substantially more complex and
the predictions, which are now based on motor currents and accelerations, less accurate. In our experiments, we observed that such a model is around 32 times slower to compute than the odometry-based model. It is important to note that the odometry-based model makes the implicit assumption that the wheel encoder measurements reflect the intended motion. If this assumption is violated, the standard estimation technique cannot be used to estimate the joint probability p(x, f) anymore, where x stands for the pose of the robot and f indicates the failure state.

In this paper, we propose to make the model assumptions explicit and to build a model abstraction hierarchy. We present the mixed-abstraction particle filter algorithm that uses such a hierarchy to direct computational resources to the most efficient model whose assumptions are met. In this way, it minimizes the computational load while maximizing the robustness of the system.
Fig. 2. Arbitrary model hierarchy (left) with an unrestricted model M0, several restricted models M1, M2, M3, as well as the specific assumptions A_{i→j} that restrict the respective state spaces. Two models for the same physical process: the standard odometry-based model (e.g. M1) that uses the odometry measurements o_t as controls and the laser measurements l_t as sensor inputs (middle), and a less restricted model (e.g. M0, on the right) that includes the actual motion commands u_t as controls.
The paper is organized as follows. After the discussion of related work, we present our mixed-abstraction particle filter algorithm and its application to monitoring mobile robots in Section 3. Finally, Section 4 presents experimental results.
2 Related Work Model-based diagnosis has been approached from within the AI community using symbolic reasoning with a focus on large systems with many interacting components and from the control theory community concentrating on fewer components with complex dynamics and higher noise levels [8]. With symbolic approaches, the system is typically assumed to move between discrete
steady states [17]. Here, diagnosis is often based on system snapshots without a history. Krysander [8] proposed a hybrid model consisting of discrete fault modes that switch between differential equations to describe the system behavior. The diagnosis system is designed for large systems with low noise levels, where instantaneous statistical tests are sufficient to identify a faulty component.

As Dearden and Clancy [3] pointed out, the close coupling between a mobile system and its environment makes it hard to apply discrete diagnosis models directly, because extremely complex monitoring components would have to be used. A more robust and efficient way is to reason directly on the continuous sensor readings. As a result, probabilistic state tracking techniques have been applied to this problem. Adopted paradigms range from Kalman filters [16] to particle filters in various modelings [1, 3, 12]. Particle filters represent the belief about the state of the system by a set of state samples, which are moved when actions are performed and re-weighted when sensor measurements are integrated (see Dellaert et al. [4]). In particle filter based approaches to failure diagnosis, the system is typically modeled by a dynamic mixture of linear processes [2] or a non-linear Markov jump process [5]. Benazera et al. [1] combine consistency-based approaches, i.e. the Livingstone system, with particle filter based state estimation techniques.

Verma et al. [15] introduce the variable resolution particle filter for failure detection. Their approach is similar to ours in that they build an abstraction hierarchy of system models. The models under consideration build a partition of the complete state space and the hierarchy is defined in terms of behavioral similarity. Our focus, in contrast, lies on switching between overlapping system models for certain parts of the state space. Our model hierarchy is therefore based on efficiency differences and explicit model assumptions about the system state. The two approaches should therefore be seen as complementary rather than as alternatives.
Other approaches that deal with the time efficiency of particle filters include Kwok et al. [10], in which real-time constraints are considered for single system models, or techniques in which a Rao-Blackwellized particle filter is used to coordinate multiple models for tracking moving objects [9].
3 Particle Filters for Sequential State Estimation

A mobile robot can be modeled as a dynamic system under the influence of issued control commands u_t and received sensor measurements z_t. The temporal evolution of the system state x_t can be described recursively using the formalism of the so-called Bayes filter
Algorithm 1 Particle_filter(X_{t-1}, u_{t-1}, z_t)
 1: \bar{X}_t = X_t = ∅
 2: for m = 1 to M do
 3:     sample x_t^{[m]} ~ p(x_t | u_{t-1}, x_{t-1}^{[m]})
 4:     w_t^{[m]} = p(z_t | x_t^{[m]})
 5:     \bar{X}_t = \bar{X}_t + <x_t^{[m]}, w_t^{[m]}>
 6: end for
 7: for m = 1 to M do
 8:     draw particle i with probability ∝ w_t^{[i]}
 9:     add x_t^{[i]} to X_t
10: end for
11: return X_t
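Algorithm 1 can be transcribed almost line by line into Python. The 1-D motion and measurement models below are hypothetical stand-ins for illustration only, not the robot models of Section 3.3:

```python
import math
import random

def particle_filter(particles, u, z, motion_model, measurement_model):
    """One iteration of the standard particle filter (Algorithm 1).

    particles: list of states X_{t-1}; u: control u_{t-1}; z: measurement z_t.
    motion_model(x, u) samples a successor state (Line 3);
    measurement_model(z, x) returns the likelihood weight (Line 4).
    """
    predicted = [motion_model(x, u) for x in particles]     # prediction step
    weights = [measurement_model(z, x) for x in predicted]  # weighting step
    if sum(weights) == 0.0:  # degenerate case: keep prediction unweighted
        return predicted
    # Resampling (Lines 7-10): draw M particles proportionally to weight.
    return random.choices(predicted, weights=weights, k=len(predicted))

# Toy 1-D example: the robot advances 1.0 per step with Gaussian motion
# noise; the sensor observes the position with a Gaussian likelihood.
def motion_model(x, u):
    return x + u + random.gauss(0.0, 0.1)

def measurement_model(z, x):
    return math.exp(-0.5 * ((z - x) / 0.5) ** 2)

random.seed(0)
X = [random.uniform(-1.0, 1.0) for _ in range(500)]
for t in range(1, 11):  # ten filter iterations, observing z_t = t
    X = particle_filter(X, 1.0, float(t), motion_model, measurement_model)
mean = sum(X) / len(X)  # posterior mean, close to the true position 10.0
```

Note that `random.choices` implements the simple multinomial resampling of Lines 7-10; Thrun et al. [14] describe the more efficient low-variance variant.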
p(x_t | z_{0:t}, u_{0:t-1})
  = \int p(x_t | z_t, u_{t-1}, x_{t-1}) \, p(x_{t-1} | z_{0:t-1}, u_{0:t-2}) \, dx_{t-1}   (1)
  = \eta_t \underbrace{p(z_t | x_t)}_{\text{observation model}} \int \underbrace{p(x_t | u_{t-1}, x_{t-1})}_{\text{motion model}} \, \underbrace{p(x_{t-1} | z_{0:t-1}, u_{0:t-2})}_{\text{recursive term}} \, dx_{t-1}.   (2)
The term η_t is a normalizing constant ensuring that the left-hand side integrates to one over all x_t. With Equation 1, we assume Markovian dependencies, namely that x_t only depends on the most recent measurement z_t and control command u_{t-1} given knowledge about the preceding state of the system x_{t-1}. Particle filters are an implementation of the Bayes filter and can be used to efficiently estimate the posterior p(x_t) in a sequential manner. Here, the posterior is represented by a discrete set of weighted samples X = {<x_t^{[m]}, w_t^{[m]}>}. With the sampled representation, the integral in Equation 2 simplifies to a finite sum over the samples resulting from the previous iteration. The motion model and the observation model can be applied directly to move and weight the individual samples, respectively. Algorithm 1 formulates the standard particle filtering approach [14].

In Algorithm 1, the state of the system at time t is represented by a set X_t of state samples x_t^{[m]}. In Line 3, we perform a state prediction step using the external motion command u_{t-1} and the motion model p(x_t | u_{t-1}, x_{t-1}). Line 4 incorporates the current sensor measurement z_t by re-weighting the state samples according to the measurement model p(z_t | x_t). From Line 7 to 10, a resampling step is performed to concentrate the samples on high-probability regions of the state space. We refer to Thrun et al. [14] for details about the resampling step and its efficient implementation.

For the odometry-based model that treats the odometry measurements as controls, we have u_{t-1} = o_t and z_t = l_t, where o_t is the odometry measurement and l_t is a perception of the environment. For the dynamic model depicted
in the right diagram of Figure 2, u_{t-1} is the actual control command and z_t = <o_t, l_t>. Both models are described in more detail in Section 3.3.

3.1 Process Model Hierarchy

A fundamental problem in science and engineering is choosing the right level of abstraction for a modeled system. While complex and high-dimensional models promise high estimation accuracy, models of lower complexity are often significantly more efficient and easier to construct. In the area of mobile robotics, the accuracy-efficiency trade-off is an important issue since, on the one hand, computational resources are strictly limited in online problems and, on the other hand, estimation errors have to be avoided to prevent serious malfunctioning. We therefore propose an online model selection algorithm with adaptive resource allocation based on the Bayesian framework.

An abstraction hierarchy for process models is given by the specific assumptions that the different models make about the world (compare Figure 2). We define the model abstraction hierarchy as an acyclic directed graph with the different system models M_i as nodes and their model assumptions A_{i→j} as edges, leading from the more general model i to a more restricted one j. A model assumption A_{i→j} is defined as a binary function on the state space of the unrestricted model M_i.

As an example, consider the process model for a mobile robot that should be able to continuously localize itself in a given map. A general model M_0 would include the pose of the robot x and additionally take physical factors like ground friction, tire pressure, load balance, motor characteristics, etc. into account, treating those as additional state variables in a state vector f. In most situations, however, it is quite common and reasonable to assume a simpler model M_1, where these additional variables are constant and do not need to be estimated during operation. Formally, the state space of M_1 is therefore {(x, f) | f = const}, which is a projection of the more general space {(x, f)} of model M_0. The assumption A_{0→1} would in this case be defined as

    A_{0→1}(x, f) := true if f = const, and false otherwise.
It is important to note that the validity of an assumption can only be tested in a less restricted state space, where this assumption is not made. In practice, this means that for every edge in the model abstraction graph we have to test the associated assumption using the more general model. As a measure for the validity of an assumption \tilde{A} at time t, we use the ratio

    v_t(\tilde{A}) := \frac{\frac{1}{|A|} \sum_{x^{[m]} \in A} p(z_t | x^{[m]})}{\frac{1}{|X|} \sum_{x^{[m]} \in X} p(z_t | x^{[m]})},   (3)
where A is defined as the subset {x^{[m]} | \tilde{A}(x^{[m]})} of all particles X for which \tilde{A} is valid. More informally, we compute the amount of evidence in favor of
Algorithm 2 Mixed-Abstraction Particle filter
1: Calculate samples for the unrestricted model M_i until the assumption A_{i→j}(x̃_t) is valid for a minimal number of samples x̃_t
2: Build a first estimate of v_t(A_{i→j}) according to Equation 3
3: repeat
4:     if v_t(A_{i→j}) >= Θ then
5:         Calculate samples for M_j
6:     else
7:         Calculate samples for M_i
8:     end if
9: until either M_i received enough samples or (v_t(A_{i→j}) >= Θ and M_j received enough samples)
a restricted state space relative to the unrestricted case. The quantity v_t(\tilde{A}) is based on the current approximation of the posterior distribution by the particle filter.

3.2 Adaptive Model Selection

To adaptively switch between alternative system models, the validity of the model assumptions has to be estimated online and computational resources have to be distributed among the appropriate models. The distribution of resources is done by increasing or decreasing the number of particles for the different models. To achieve this, we apply the following algorithm, which takes as input the model abstraction hierarchy graph defined in the previous section.

When a new measurement z_t is obtained, the mixed-abstraction particle filter algorithm draws samples from the particle sets representing the current posteriors X^i_{t-1} for all system models i, incorporates the measurement, and builds new posterior distributions X^i_t. The key question in this update step is which model posterior X^i_t should receive how many samples. We base this decision on the estimated validity of the model assumptions A_{i→j}. If the estimated quantity v_t(A_{i→j}) drops below a predefined threshold Θ, we sample into the more complex model M_i and otherwise prefer the more efficient one M_j. This decision is made repeatedly on a per-particle basis until a model has received enough samples and all its assumptions are validated. In each iteration, we start with the most unrestricted model M_i and perform the following steps for each of its outgoing edges A_{i→j}. In the update steps mentioned above, the samples are taken from the previous posterior distribution X^j_{t-1} if assumption A_{i→j} was valid in the previous step, and from X^i_{t-1} otherwise.
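The validity ratio of Equation 3 and the threshold test that drives the model switch can be sketched as follows; the particle states, likelihood weights, and the threshold Θ below are made-up illustrative values, not values from the paper:

```python
def validity(particles, likelihoods, assumption):
    """Estimate v_t(A~) from Equation 3: the mean observation likelihood of
    the particles satisfying the assumption, divided by the mean over all
    particles.

    particles: list of states x^[m]; likelihoods: p(z_t | x^[m]) per particle;
    assumption: boolean predicate on the state space of the general model.
    """
    subset = [w for x, w in zip(particles, likelihoods) if assumption(x)]
    if not subset:
        return 0.0  # no particle supports the assumption at all
    mean_subset = sum(subset) / len(subset)   # (1/|A|) * sum over A
    mean_all = sum(likelihoods) / len(likelihoods)  # (1/|X|) * sum over X
    return mean_subset / mean_all

# Hypothetical example: states are (pose, fault_flag) pairs; the assumption
# A_{0->1} of the restricted model is "no fault has occurred".
particles = [((0.0,), False), ((0.1,), False), ((5.0,), True), ((5.1,), True)]
likelihoods = [0.9, 0.8, 0.1, 0.2]  # observation likelihoods p(z_t | x^[m])
THETA = 0.8                         # validity threshold, tuned offline

v = validity(particles, likelihoods, lambda s: not s[1])
use_restricted_model = v >= THETA   # sample into the efficient model M_j
```

Here the fault-free particles explain the measurement better than average (v ≈ 1.7 > Θ), so the sampling step would prefer the restricted, cheaper model.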
When all outgoing edges Ai→j of model Mi have been processed in the described manner, the same update is applied to models Mj further down in the hierarchy until either the leaf nodes have been processed or one of the assumptions did not receive enough evidence to justify further model simplifications. Several quantities like the numbers of samples necessary for each model (Line 9) or the validity threshold Θ (Line 4) have to be determined in an offline
learning step. For the experimental results reported below, we optimized these values on a set of representative trajectories recorded from real and simulated robots.

To recapitulate, the mixed-abstraction particle filter estimates the system state by running several particle filters in parallel, each using a different system model. Samples are assigned applying the following rule: for each pair of alternative system models, the simpler one is preferred as long as there is positive evidence for the validity of the corresponding model assumption.

3.3 Motion Models for Mobile Robots

The standard odometry-based motion model for a wheeled robot estimates the posterior p(x_t | x_{t-1}, o_t) about the current pose x_t based on the previous pose x_{t-1} and the wheel encoder measurement o_t. A popular approach [6] to represent the relative movement is to use three parameters: an initial rotation α, a translation d, and a second rotation β, as illustrated in Figure 3. Typically, one uses a Gaussian distribution for each of these parameters to model the noise.
Fig. 3. Parameters of the standard odometry-based motion model.
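A sample-based version of this odometry motion model might look as follows; the Gaussian noise magnitudes are illustrative choices, not values from the paper:

```python
import math
import random

def sample_odometry_motion(pose, alpha, d, beta, sigma=(0.02, 0.05, 0.02)):
    """Sample a successor pose from the odometry-based motion model.

    The measured relative movement is decomposed into an initial rotation
    alpha, a translation d, and a second rotation beta (Figure 3); each
    parameter is perturbed by zero-mean Gaussian noise before being applied
    to the previous pose (x, y, theta).
    """
    x, y, theta = pose
    a = alpha + random.gauss(0.0, sigma[0])  # noisy initial rotation
    t = d + random.gauss(0.0, sigma[1])      # noisy translation
    b = beta + random.gauss(0.0, sigma[2])   # noisy final rotation
    heading = theta + a
    return (x + t * math.cos(heading),
            y + t * math.sin(heading),
            heading + b)

# Propagating many samples through a measured movement of (0, 1.0, 0)
# yields a cloud of hypotheses centered one unit ahead of the robot.
random.seed(1)
samples = [sample_odometry_motion((0.0, 0.0, 0.0), 0.0, 1.0, 0.0)
           for _ in range(2000)]
mean_x = sum(s[0] for s in samples) / len(samples)
```

In a particle filter, this function plays the role of the prediction step (Line 3 of Algorithm 1) with u_{t-1} = o_t.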
Under the influence of events like the collision with an obstacle or wheel slippage, the odometry-based model is not applicable anymore since the wheel encoder measurements do not provide useful information about the actual motion. To handle such situations, we construct an alternative model, termed dynamic motion model, that depends on the actual motion commands that were sent to the motors of the robot. This model includes the geometry of the robot and its physical attributes like mass and moment of inertia. For each wheel, we compute the influence of the velocity command on the translational and rotational energy of the robot. In this representation, we can directly incorporate the effects of collisions, slippage, and deflating tires. We then convert the energy state of the system to its speed and obtain a state prediction for the next time step. It is important to note that the dynamic motion model is not designed for the failure states only. Rather it is able to deal with normal conditions
as well. It is therefore considered more general than the standard odometry-based model in our model abstraction hierarchy. The assumption placed on the system state by the odometry-based model is that there are no external influences like collisions, slippage, etc.
4 Experiments

4.1 Quantitative Evaluation Using a Simulator

To quantitatively evaluate our system, we performed several simulation experiments and compared the results to the known ground truth. We used the high-fidelity simulator Gazebo [7], in which physics and motion dynamics are simulated accurately using the Open Dynamics Engine [13]. In several practical experiments carried out with real robots, we experienced the Gazebo simulator as well suited for studying the motion dynamics of robots even under highly dynamic events like collisions. For example, we did not have to change any system parameters when we ported our system from the simulator to the real robot.

To demonstrate how the proposed algorithm coordinates multiple particle filters that have been designed independently, we confronted the system with two different faults within one scenario. A simulated ActivMedia Pioneer 2DX robot (see the left image of Figure 1) was placed in the corridor of a 3D office environment. We manually steered the robot through this environment. On its path, it encountered a collision with an undetectable object. After that, its left tire started to deflate. Four filters were used independently to track the state of the system. The first filter was based on the standard odometry-based motion model described in Section 3.3. The second filter used the dynamic motion model described in the previous section and also included a model for collision faults. The third model was also based on the dynamic motion model but was capable of dealing with deflating tires. Finally, the fourth filter was the proposed mixed-abstraction filter that combined all of the filters described above. Figure 4 depicts the true trajectory as well as the tracking results obtained with the individual filters, overlaid on a map built from laser measurements in a real building.
The three arrows in the diagram mark the following three events: a collision with an undetected glass door (1), a point where the robot stopped backing off and turned around (2), and a point where the left tire of the robot started losing air (3). As can be seen from the figure, the filter that is able to deal with deflating tires diverges immediately after the collision at point 1. Since the filter able to deal with collisions cannot deal with deflating tires, it diverges at point 3. The odometry-based model keeps track of the robot despite the collision at point 1; however, it is not aware of any fault at this point. The combined filter, in contrast, succeeds in tracking the robot despite both incidents.
The middle and lower images of Figure 4 plot the internal states of the specialized detectors within the mixed-abstraction filter. These values reflect the belief of the system about the presence of certain faults. The middle image plots the relative number of particles in the fault mode of the collision detector over time. As can be seen, this number rises significantly shortly after the collision as well as after the full stop at point 2. After the robot had been intentionally stopped there, the system cannot know whether an obstacle is in its way or not. The lower image of Figure 4 shows the evolution of the relative number of particles in the fault mode of the deflation detector. Since the collision at point 1 has been handled by the collision detection within the mixed-abstraction filter, this filter does not switch to a failure mode until point 3. At that point, the filter switches into its failure mode and in this way enables the mixed-abstraction filter to keep track of the pose of the robot.

4.2 Analyzing the Gain in Efficiency

In this experiment, we quantitatively evaluate the gain in efficiency that we achieve by dynamically distributing samples between the individual filters. In the modeled scenario, a simulated robot follows a trajectory and collides twice with an obstacle which is too small to be detected by its laser range finder. After the collisions, the robot backs off and continues its task. The top diagram of Figure 5 shows the trajectory of the vehicle and the locations where the algorithm correctly detected a collision. The other two diagrams illustrate the failure detection process of the same simulation run. The bar at the bottom indicates the true timestamps of the faults. Whereas the plot in the second row depicts the relative likelihood for a collision as defined in Equation 3, the curve plotted in the third row gives the times needed for the individual iterations of the particle filter.
Table 1 gives the results of a comparison of our adaptive model-switching approach to three other implementations, where only single models were used for state estimation and fault detection. The results are averaged over the full trajectories of 100 runs per implementation. The implementations considered here are realized on the basis of two models. Whereas model M_0 is the complex model that considers the actual motion commands and is therefore able to track the position as well as the failure state, model M_1 is the standard odometry-based system model, which is able to track the position of the robot reliably with low time requirements but cannot detect the collisions. The first implementation is based on model M_1 with 20 particles, the second is model M_1 with 200 particles. The third implementation is based on model M_0 with 300 particles, and the fourth one is the mixed-abstraction particle filter combining implementations one and three. The common task of all implementations was to track the position of the robot along a trajectory on which the robot encountered two undetected collisions after 8.2 and 28.9 seconds. A typical estimate of the trajectory generated by the mixed-abstraction filter, including markings for detected collisions, is depicted in the left diagram of
Fig. 4. Results of an experiment with two fault events (top). The collision of the robot with a glass door is marked with the arrow "1", the point where it stopped backing off and turned is marked with "2", and at point "3" the left tire of the robot started losing air. The diagram in the middle plots the relative number of the particles in the fault mode of the collision detector. The lower diagram shows the corresponding number for the deflation detector.
Figure 5. The lower right image of Figure 5 plots the value of v_t(A) over time. As can be seen from the figure, the evidence for a fault substantially increases at the time of the incidents. The upper right image in the same figure plots the CPU time used by the mixed-abstraction filter. It nicely shows that the computation time is only high when the evidence of a failure has increased. Table 1 shows average values obtained from the 100 test runs for each implementation. Whereas model M_1 was never able to detect a failure, M_0
Fig. 5. A trajectory followed by a simulated robot (first row) with marks at positions where the evidence for a collision was high. The plot in the second row depicts the relative likelihood for collisions. The plot also shows the ground truth (the bar at the bottom). The last plot shows the time needed for each iteration (third row).
as well as our adaptive switching algorithm detected all failures equally well. However, our adaptive model required substantially less computation time than M_0 alone using 300 particles.

4.3 Evaluation on a Real Robot

We also tested our system on a real ActivMedia Pioneer 2DX robot in an office environment. The right image of Figure 1 depicts the experimental setup. Three positions of the robot were manually cut from a recorded video and overlaid on one image to illustrate the process. The robot planned a path to the neighboring room on the right-hand side of the corridor. While executing the planned trajectory, the robot could not detect the glass door that
Table 1. Results of a series of simulation runs using different system models for state estimation. The results are averaged over the complete trajectories of 100 runs per model.

System model | Failure detection rate | Avg. time per iteration | Avg. position error | Avg. orientation error
M1: standard odometry, 20 particles | 0 % | 0.67 ms | 0.13 m | 3.6°
M1: standard odometry, 200 particles | 0 % | 5.83 ms | 0.13 m | 3.4°
M0: dynamic, 300 particles | 100 % | 10.10 ms | 0.11 m | 5.8°
Adaptive switching: M1 with 20 particles, M0 with 300 particles | 100 % | 3.42 ms | 0.12 m | 3.9°
blocked its path and thus collided with the wooden part of the door. In this situation, the standard odometry-based system model used for localization does not indicate a defect, because the wheel encoders report no motion, which is perfectly consistent with the laser measurements. The left diagram of Figure 6 gives the evolution of the observation likelihood for the standard model, which stays nearly constant. In contrast to this, the proposed mixed-abstraction particle filter detects that the model assumption of the standard model is no longer valid after the collision with the door and switches to the more complex system model. The right diagram of Figure 6 visualizes this process. For the sake of clarity, we plotted the estimated validity of the negated model assumption, which can be interpreted as the evidence against the assumption that no collision has occurred. The upper curve corresponds to the time needed per iteration. Note that the required computational resources only slightly exceed those of the standard odometry-based model (see the left diagram for a comparison). Only in the failure case does the runtime increase seriously, since the more complex model requires substantially more particles. Please also note that the runtime goes back to the original value after the robot has backed off and the model assumption of the simplified model is valid again.
5 Conclusion

This paper presents an efficient approach to estimating the state of a dynamic system, including its failures. Complex models with high computational requirements are needed in order to detect and track unusual behaviors. We therefore proposed a mixed-abstraction particle filter which distributes the computational resources in a way that failure states can be detected and tracked, but which at the same time allows for an efficient estimation process in case
Fig. 6. In an experiment with a real robot, the observation likelihood of the standard system model (lower left) does not indicate the collision with a glass door at time t = 22 seconds. The mixed-abstraction particle filter (right) detects the fault without needing substantially more computational resources in the fault-free case (upper diagrams).
the system runs free of faults. To achieve this, we apply a process model hierarchy which allows us to model assumptions that hold for the fault-free case but not in general.
In several experiments carried out in simulation and with real robots, we demonstrated that our technique is well-suited to track dynamic systems affected by errors. Our approach allows us to accurately track different failure states and at the same time is only marginally slower in case the system is running free of faults. We believe that our approach is not limited to the failure detection problem and can also be advantageous for various state estimation tasks in which different system models have to be used to correctly predict the behavior of the system under varying conditions.
Acknowledgments

This work has partly been supported by the EC under contract number FP6-004250-CoSy, by the German Science Foundation (DFG) under contract number SFB/TR-8 (A3), and by the German Federal Ministry of Education and Research (BMBF) under contract number 01IMEO1F.
References

1. E. Benazera, R. Dearden, and S. Narasimhan. Combining particle filters and consistency-based approaches. In 15th International Workshop on Principles of Diagnosis, Carcassonne, France, 2004.
2. N. de Freitas, R. Dearden, F. Hutter, R. Morales-Menendez, J. Mutch, and D. Poole. Diagnosis by a waiter and a mars explorer. In Invited paper for Proceedings of the IEEE, special issue on sequential state estimation, 2003.
3. R. Dearden and D. Clancy. Particle filters for real-time fault detection in planetary rovers. In Proceedings of the Thirteenth International Workshop on Principles of Diagnosis, pages 1–6, 2002.
4. F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), Leuven, Belgium, 1998.
5. J.N. Driessen and Y. Boers. An efficient particle filter for nonlinear jump Markov systems. In IEEE Sem. Target Tracking: Algorithms and Applications, Sussex, UK, 2004.
6. J.-S. Gutmann, W. Burgard, D. Fox, and K. Konolige. An experimental comparison of localization methods. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1998.
7. N. Koenig and A. Howard. Design and use paradigms for Gazebo, an open-source multi-robot simulator. Technical report CRES-04-002, USC Center for Robotics and Embedded Systems, 2004.
8. M. Krysander. Design and Analysis of Diagnostic Systems Utilizing Structural Methods. PhD thesis, Linköping University, Sweden, 2003.
9. C. Kwok and D. Fox. Map-based multiple model tracking of a moving object. In RoboCup 2004: Robot Soccer World Cup VIII, pages 18–33, 2004.
10. C. Kwok, D. Fox, and M. Meila. Real-time particle filters. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15 (NIPS), pages 1057–1064. MIT Press, 2002.
11. N. Leveson. Safeware: System Safety and Computers. Addison-Wesley, Reading, Mass., 1995.
12. B. Ng, A. Pfeffer, and R. Dearden. Continuous time particle filtering. In Proceedings of the 19th IJCAI, Edinburgh, 2005.
13. R. Smith. Open Dynamics Engine. http://www.q12.org/ode/ode.html, 2002.
14. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.
15. V. Verma, S. Thrun, and R. Simmons. Variable resolution particle filter.
In Georg Gottlob and Toby Walsh, editors, IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9–15, 2003, pages 976–984. Morgan Kaufmann, 2003.
16. R. Washington. On-board real-time state and fault identification for rovers. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 1175–1181, 2000.
17. B.C. Williams and P.P. Nayak. A model-based approach to reactive self-configuring systems. In Jack Minker, editor, Workshop on Logic-Based Artificial Intelligence, Washington, DC, College Park, Maryland, 1999. University of Maryland.
Optimization and Fail–Safety Analysis of Antagonistic Actuation for pHRI

Gianluca Boccadamo, Riccardo Schiavi, Soumen Sen, Giovanni Tonietti, and Antonio Bicchi

Interdepartmental Research Centre "E. Piaggio", via Diotisalvi 2, Faculty of Engineering, University of Pisa, Italy
@ing.unipi.it

Summary. In this paper we consider some questions in the design of actuators for physical Human-Robot Interaction (pHRI) under strict safety requirements in all circumstances, including unexpected impacts and HW/SW failures. We present the design and optimization of agonistic-antagonistic actuation systems realizing the concept of variable impedance actuation (VIA). With respect to previous results in the literature, in this paper we consider a realistic physical model of antagonistic systems and include an analysis of the effects of cross-coupling between actuators. We show that antagonistic systems compare well with other possible approaches in terms of the achievable performance while guaranteeing limited risks in case of impacts. Antagonistic actuation systems, however, are more complex in both hardware and software than other schemes, which raises issues of fault tolerance and fail safety for the different actuation schemes. In this paper, we analyze these issues and show that the antagonistic implementation of the VIA concept fares very well in these regards as well.
1 Introduction

One of the goals of contemporary robotics research is to realize systems which operate with delicacy in environments they share with humans, ensuring their safety despite any adverse circumstance [4]. These may include unexpected impacts; faults of the mechanical structure, sensors, or actuators; and crashes or malfunctions of the control software [6, 7, 1, 5].
A recent trend in robotics is to design intrinsically safe robot arms by introducing compliance at their joints. The basic idea of this approach is that compliant elements interposed between motors and moving links help prevent the (heavy) reflected inertia of actuators from contributing to damage in case of impacts. Introducing compliance, on the other hand, tends to reduce the performance of the arm. Some approaches in the direction of minimizing the performance loss while guaranteeing safety in case of impacts have been presented in the recent literature (see e.g. [2]). Among these, a method was proposed in [2] consisting in varying the compliance of the joint transmission mechanism while moving the arm. This so-called Variable Stiffness Transmission (VST) technique, and its generalization in the Variable Impedance Actuation (VIA) concept, have been shown to be capable in theory of delivering better performance than purely passive compliance and other techniques.
In its formulation, however, the VIA concept in [2] used a rather abstract model of actuator and transmission, whereby the impedance could be directly controlled to desired values in negligible time. In this paper, we consider a more realistic model of an actuation system implementing the idea, which is based on the use of two actuators and nonlinear elastic elements in an antagonistic arrangement. The antagonistic solution has several advantages and has been used in many robotic devices before (see e.g. [3, 11, 14, 13, 8]), in some cases because of biomorphic inspiration. However, to the best of our knowledge, the introduction of nonlinear springs to achieve variable stiffness in real time (i.e., during different phases of each motion act) with the purpose of guaranteeing safety was not a motivation for earlier work.
In this paper, we consider the implementation of the VIA concept by means of antagonistic actuation, discuss the role of cross-coupling between antagonist actuators, and apply optimization methods to choose parameters which are crucial in its design. We show that antagonistic systems can effectively implement the VIA concept, and that their performance compares well with other possible approaches.
Antagonistic actuation systems, however, are more complex in both hardware and software than other schemes. Issues are therefore raised as to whether safety is guaranteed under different possible failure modes. In the paper, we also analyze these issues and show that the antagonistic implementation of the VIA concept fares very well in these regards as well.

H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 109–118, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Antagonistic Actuation as a VIA System

In [2] it was shown that an ideal VIA mechanism (depicted in fig. 1-a) can effectively recover the performance of mechanisms designed to guarantee the safety of humans in case of impact. The basic idea is that a VIA mechanism can be controlled according to a stiff-and-slow/fast-and-soft paradigm: namely, to be rather stiff in the initial and final phases of motion, when accuracy is needed and velocity is low, while choosing higher compliance in the intermediate, high-velocity phase, where accuracy is typically not important. Low stiffness implies that the inertia of the rotors does not immediately reflect on the link in case of impacts, thus allowing smoother and less damaging impacts. Such arguments were supported in [2] by a detailed mechanism/control co-design optimization analysis, based on the solution of the so-called safe brachistochrone
Fig. 1. The concept of Variable Impedance Actuation (a) and a possible implementation by means of antagonistic actuators (b). Effective rotor inertias are coupled to the link inertia through nonlinear springs.
Fig. 2. An experimental implementation of an antagonistic VIA actuator (a) and its conceptual scheme (b).
problem, i.e. a minimum-time control problem with constraints on the maximum acceptable safety risk at impacts. The model considered in fig. 1-a, however, assumes direct variations of impedance, which are not physically realizable. A possible implementation of the concept via an antagonistic mechanism is depicted in fig. 1-b. Practical implementations of antagonistic VIA systems may assume more general configurations than the one in fig. 1-b. For instance, in the prototype of an antagonistic VIA system depicted in fig. 2-a ([10]), the two actuators act through a nonlinear elastic element on the link, but they are also connected to a third elastic element cross-coupling the actuators. The questions we consider in this section are the following: is the stiff-and-slow/fast-and-soft control paradigm still valid, and are the good safety and performance properties of the ideal VIA device of fig. 1-a retained by an antagonistic implementation as in fig. 1-b? What is the role of cross-coupling elastic elements as in fig. 2-b in antagonistic VIA actuators? To answer these questions, we again use the analysis of solutions to the safe brachistochrone problem, which consists in finding the optimal motor torques τ1, τ2 which drive the link position xmov between two given configurations in minimal time, subject to the mechanism's dynamics, motor torque limits, and safety constraints. This problem is formalized for the antagonistic mechanism of fig. 1-b as
$$
\begin{aligned}
\min_{\tau}\; & \int_0^T 1 \, dt \\
\text{s.t.}\quad
& M_{rot_1} \ddot{x}_{rot_1} + \phi_1(x_{rot_1}, x_{mov}) = \tau_1 \\
& M_{rot_2} \ddot{x}_{rot_2} - \phi_2(x_{rot_2}, x_{mov}) = \tau_2 \\
& M_{mov} \ddot{x}_{mov} - \phi_2(x_{rot_2}, x_{mov}) - \phi_1(x_{rot_1}, x_{mov}) = 0 \\
& |\tau_1| \le U_{1,max}, \quad |\tau_2| \le U_{2,max} \\
& HIC(\dot{x}_{mov}, \dot{x}_{rot_1}, \dot{x}_{rot_2}, \phi_1, \phi_2) \le HIC_{max}
\end{aligned}
\tag{1}
$$
where Mmov, Mrot1, Mrot2 are the inertias of the link and the rotors (effective, i.e. multiplied by the squared gear ratio); Ui,max, i = 1, 2, is the maximum torque for motor i; and φi, i = 1, 2, represent the impedance of the deformable elements as functions of the positions of the rotors and link. A polynomial nonlinear stiffness model is used, whereby the applied force as a function of end-point displacement is

$$
\phi_i(x_j, x_k) = K_1 (x_j - x_k) + K_2 (x_j - x_k)^3. \tag{2}
$$
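To make the role of the cubic term concrete, the following sketch evaluates (2) and the resulting joint stiffness (the sum of the incremental stiffnesses of the two opposing springs at the link). The coefficients and displacements are made-up values for illustration, not the identified parameters of the device in fig. 2-a:

```python
# Sketch of the polynomial spring model (2). K1, K2 and the
# displacements below are illustrative, not the device's parameters.
K1, K2 = 1.0, 50.0

def phi(xj, xk):
    """Applied force vs. end-point displacement, eq. (2)."""
    d = xj - xk
    return K1 * d + K2 * d ** 3

def joint_stiffness(x_rot1, x_rot2, x_mov):
    """Sum of both springs' incremental stiffnesses d(phi)/d(d) at the link."""
    s = lambda d: K1 + 3 * K2 * d ** 2
    return s(x_rot1 - x_mov) + s(x_rot2 - x_mov)

# Symmetric co-contraction: equal and opposite spring forces (zero net
# torque on the link) but a much stiffer joint, thanks to the cubic term.
relaxed = joint_stiffness(0.0, 0.0, 0.0)    # 2 * K1 = 2.0
tense = joint_stiffness(0.2, -0.2, 0.0)     # 2 * (K1 + 3*K2*0.2**2), about 14
```

Co-contraction thus trades motor torque for joint stiffness, which is exactly the resource conflict that arises in stiff-to-stiff tasks discussed later in this section.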
This model has been found to fit the experimental data for the device in fig. 2-a well. The safety constraint HIC(ẋmov, ẋrot1, ẋrot2, φ1, φ2) ≤ HICmax expresses the requirement that the Head Injury Coefficient of a hypothetical impact at any instant during motion be limited. The HIC is an empirical measure of biological damage used in the car crash analysis literature ([12]), and depends on the velocity of the impacting mass, its inertia, and the effective inertia of the rotors reflected through the transmission stiffness. The HIC function is rather complex and can only be evaluated numerically for nontrivial cases. However, based on simulation studies, a conservative approximation of the HIC function for the antagonistic mechanisms was obtained which allows rewriting the safety constraint in the simpler form |ẋmov| ≤ vsafe(φ1, φ2, HICmax) (cf. [2]). Different solutions of problem (1) have been obtained numerically, setting the parameters to realistic values: Mmov = 0.1 Kgm², Mrot1 = Mrot2 = 0.6 Kgm², HICmax = 100 m^2.5/s^4, U1,max = U2,max = 7.5 Nm. A first interesting set of results is reported in fig. 3. The optimal profiles of link velocity and joint stiffness are reported for the case where both the initial and final configurations are required to be stiff (σ0 = σf = 16 Nm/rad, plots a and b) and where both are compliant (σ0 = σf = 0.2 Nm/rad, plots c and d). Notice in fig. 3 that the stiff-and-slow/fast-and-soft paradigm applies also to antagonistic actuation approaches. The minimum time necessary in the two cases is 2.4 sec and 2.65 sec, respectively. This level of performance should be compared with what can be achieved by a simpler actuation system, consisting of a single actuator connected to the link through a linear elastic element (this arrangement is sometimes referred to as SEA, Series Elastic Actuation [9]).
A SEA system with a motor capable of torque Umax = 2U1,max = 15 Nm and inertia Mrot = 2Mrot1 = 1.2 Kgm2 , with linear elasticity coefficient matched exactly with the required stiffness in the two cases σ = 16 and σ = 0.2, would
Fig. 3. Optimization results for antagonistic actuation without cross-coupling in a pick-and-place task. A stiff-to-stiff task is shown in a, b, while c, d, refer to a soft-to-soft task.
reach the desired configuration in 3.15 sec and 3.6 sec, respectively. Moreover, it should be pointed out that the SEA system cannot change its stiffness without modifying the mechanical hardware. Focusing again on the antagonistic VIA system's results, it is also interesting to notice that, in the likely case that the task requires the manipulator to be stiff at the initial and final configurations (as would happen e.g. in a precision pick-and-place task), the actuators are required to use a significant portion of their maximum torque just to set such stiffness, by co-contracting the elastic elements. However, it is also in the initial and final phases that torque should be made available for achieving the fastest acceleration of the link. Based on this observation, it can be conjectured that some level of preloading of the nonlinear elastic elements in an antagonistic VIA system could be beneficial to performance. Elastic cross-coupling between actuators (fig. 2-b) can have a positive effect in that it can bias the link stiffness at rest, so that more torque is available in slow phases, while torque is used for softening the link in fast motion. On the other hand, it is intuitive that very stiff cross-coupling elements would drastically reduce the capability of the mechanism to vary link stiffness, thus imposing low velocities for safety and ultimately a performance loss. It is therefore interesting to study the effect of cross-coupling, to determine if there is an intermediate value of stiffness which enhances performance with respect to the limit cases of fig. 1 (no cross-coupling) and constant stiffness (rigid cross-coupling). To this purpose, we study a modified formulation of the safe brachistochrone problem, namely
Fig. 4. a) Minimum time to reach the target configuration vs. cross-coupling stiffness K3 (L0 = 1). Solid: σ0 = σf = 16 Nm/rad; dashed: σ0 = σf = 0.2 Nm/rad. Optimization results for antagonistic actuation with cross-coupling in a pick-andplace, stiff-to-stiff task (solid) and soft-to-soft task (dashed): velocity (b) and joint stiffness (c).
$$
\begin{aligned}
\min_{\tau}\; & \int_0^T 1 \, dt \\
\text{s.t.}\quad
& M_{rot_1} \ddot{x}_{rot_1} + \phi_1(x_{rot_1}, x_{mov}) - \phi_3(x_{rot_1}, x_{rot_2}) = \tau_1 \\
& M_{rot_2} \ddot{x}_{rot_2} + \phi_2(x_{rot_2}, x_{mov}) + \phi_3(x_{rot_1}, x_{rot_2}) = \tau_2 \\
& M_{mov} \ddot{x}_{mov} + \phi_2(x_{rot_2}, x_{mov}) - \phi_1(x_{rot_1}, x_{mov}) = 0 \\
& |\tau_1| \le U_{1,max}, \quad |\tau_2| \le U_{2,max} \\
& HIC(\dot{x}_{mov}, \dot{x}_{rot_1}, \dot{x}_{rot_2}, \phi_1, \phi_2) \le HIC_{max}
\end{aligned}
\tag{3}
$$
where φ3(xrot1, xrot2) indicates the cross-coupling elasticity. In particular, we study how the optimal solutions vary in different instances of the problem with increasing cross-coupling stiffness. A linear stiffness model is assumed for the cross-coupling, with φ3(xrot1, xrot2) = K3(L0 + xrot2 − xrot1), where L0 denotes the preload offset. In fig. 4, numerical solutions obtained for problem (3) are reported for the two stiff-to-stiff and soft-to-soft tasks already considered in fig. 3. It can be observed from fig. 4-a that an optimal value of the cross-coupling stiffness K3 exists in both cases, but the two values do not coincide. Setting for instance K3 = 0.04 Nm/rad, the optimal value for the stiff-to-stiff task, we obtain the optimal link velocity and stiffness plots reported in fig. 4-b and -c, and the optimal times 2.35 sec and 2.65 sec, respectively. Optimizing the design for one task can hence decrease performance in others.
3 Fail Safety

In this section we analyze how antagonistic VIA mechanisms behave under failure of some of their components, and compare their ability to remain safe in spite of failures with that of other possible actuation structures. A large variety of possible failure modes should be considered in order to assess the fail safety of a robot system. In our case, we focus on the following
possible events: mechanical failures, consisting of the breakage of one or more of the elastic elements, leading to K → 0, and HW/SW control failures, whereby one or more of the actuators delivers an uncontrolled torque. In the latter case, we consider that the torque can either remain “stuck” at some value, default to zero, or default to ±Umax. An extensive simulation campaign has been undertaken to study the effective HIC at impacts of an antagonistic VIA mechanism (fig. 1-b), evaluating all possible failure mode combinations. To be conservative, the HIC is evaluated assuming an impact occurs some time after the instant of the fault event, during which time the mechanism could increase its momentum. The diagram in fig. 5 reports the worst-case HIC during the execution of a typical pick-and-place task, under a nominal optimal control. Labels in different regions indicate which failure mode determines the worst-case HIC of impacts.

Fig. 5. HIC values under different failure modes for VIA.

It can be noticed from fig. 5 that the worst-case impact has a maximum HIC value of around 100. This has been obtained by suitably choosing the mechanism parameters, in particular by setting U1,max = U2,max = 1.5 Nm. The performance under this design choice is around 3 sec. Should different values of worst-case HIC be specified, the antagonistic VIA mechanism could be designed with different parameters to accommodate the specifications. It is interesting to compare these results with other existing actuation systems for pHRI. Namely, we will make reference to SEA and to the so-called Distributed Macro-Mini (DM2) actuation scheme ([15]). The latter basically consists of a SEA with a small-inertia, low-torque, high-bandwidth actuator connected directly to the link inertia to actively damp oscillations. In order to obtain a meaningful comparison, we consider optimized SEA and DM2 implementations, with equal rotor and link inertia, which obtain
a similar performance of 3 sec in the same task. Fig. 6 describes the worst-case HIC values obtained for SEA and DM2, respectively. The failure modes considered are analogous to those described above, as are all other parameters in the simulations. It can be observed that, in some modes (notably for control torques defaulting to their maximum value, τ → Umax), both SEA and DM2 are not as safe as VIA (HIC > 100). The DM2 scheme exhibits slightly better fail-safety characteristics than SEA. An explanation of the apparently superior fail-safety characteristics of antagonistic VIA actuation is that this scheme achieves comparable nominal performance by employing two motors, each of much smaller size than is necessary in the SEA and DM2 schemes.
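The structure of such a fail-safety campaign is simple to sketch: enumerate every combination of mechanical failures (a spring breaking, K → 0) and control failures (torque stuck, zero, or ±Umax) and record the worst resulting HIC. The code below is our own organizational sketch with a toy stand-in for the impact simulation; the mode names and numbers are illustrative, not from the paper:

```python
import itertools

SPRING_MODES = ("ok", "broken")                       # broken: K -> 0
TORQUE_MODES = ("nominal", "stuck", "zero", "+Umax", "-Umax")

def worst_case(evaluate_hic):
    """Enumerate every (spring1, spring2, torque1, torque2) failure
    combination and return the one with the highest HIC."""
    modes = itertools.product(SPRING_MODES, SPRING_MODES,
                              TORQUE_MODES, TORQUE_MODES)
    worst = max(modes, key=evaluate_hic)
    return worst, evaluate_hic(worst)

# Toy stand-in for the impact simulation: runaway torque on both motors
# plus a broken spring is most dangerous (illustrative numbers only).
def toy_hic(mode):
    spring1, spring2, torque1, torque2 = mode
    hic = 20.0
    hic += 30.0 * (torque1 == "+Umax") + 30.0 * (torque2 == "+Umax")
    hic += 15.0 * (spring1 == "broken")
    return hic

mode, hic = worst_case(toy_hic)
```

In the actual study, `evaluate_hic` would be the numerical impact simulation, run at many instants along the nominal trajectory to produce the time-resolved worst-case curves of figs. 5 and 6.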
Fig. 6. Maximum of the HIC value in various failure cases for SEA (top) and DM2 (bottom).
4 Conclusion

We discussed the design of variable stiffness actuation systems based on antagonistic arrangements of nonlinear elastic elements and motors. The basic stiff-and-slow/fast-and-soft idea of the VIA approach was shown to apply and to be effective for realistic models of antagonistic actuation of a single joint. However, a VIA mechanism assembly can sometimes be too heavy to be practical in a serial-link robot. It may then be advantageous to place the actuators at the base instead of at the respective joints and transfer motion to the joints via transmissions, where the transmission element itself can be designed to have nonlinear stiffness, for example in a tendon-driven system.
In order for the system to guarantee safe operation in the proximity of humans, its behaviour must remain safe in conditions where functionality is degraded by possible failures. A thorough examination of possible HW and SW failures of the system has been carried out, and the results have been compared with those of other possible solutions, with favourable outcome.
Acknowledgments

Work partially done with the support of the EU Network of Excellence EURON - European Robotic Network, under the Prospective Research Project PHRIDOM.
References

1. A. Albu-Schaffer, A. Bicchi, G. Boccadamo, R. Chatila, A. De Luca, A. De Santis, G. Giralt, G. Hirzinger, V. Lippiello, R. Mattone, R. Schiavi, B. Siciliano, G. Tonietti, and L. Villani. Physical human-robot interaction in anthropic domains: Safety and dependability. 4th IARP/IEEE-EURON Workshop on Technical Challenges for Dependable Robots in Human Environments, 2005.
2. A. Bicchi and G. Tonietti. Fast and soft arm tactics: Dealing with the safety-performance tradeoff in robot arms design and control. IEEE Robotics and Automation Magazine, Vol. 11, No. 2, June 2004.
3. D.G. Caldwell, G.A. Medrano-Cerda, and M.J. Goodwin. Braided pneumatic actuator control of a multi-jointed manipulator. In Systems Engineering in Service of Humans, Proceedings International Conference on, IEEE Systems Man and Cybernetics, volume 1, pages 423–428, 1993.
4. G. Giralt and P. Corke, editors. Technical Challenge for Dependable Robots in Human Environments, Seoul, Korea, 2001. IARP/IEEE Workshop.
5. J. Guiochet and A. Vilchis. Safety analysis of a medical robot for tele-echography. In Second IARP IEEE/RAS joint workshop on Technical Challenge for Dependable Robots in Human Environments, Toulouse, France, October 2002. LAAS-CNRS.
6. J. Heinzmann and A. Zelinsky. The safe control of human-friendly robots. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 1020–1025, 1999.
7. K. Ikuta, H. Ishii, and M. Nokata. Safety evaluation method of design and control for human-care robots. The International Journal of Robotics Research, 22(5):281–297, May 2003.
8. I. Mizuuchi, R. Tajima, T. Yoshikai, D. Sato, K. Nagashima, M. Inaba, Y. Kuniyoshi, and H. Inoue. Design and control of the flexible spine of a fully tendon-driven humanoid "Kenta". In Proceedings of the 2002 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, EPFL, Lausanne, Switzerland, pages 2527–2532, 2002.
9. G.A. Pratt and M. Williamson. Series elastic actuators. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 399–406, 1995.
10. G. Tonietti, R. Schiavi, and A. Bicchi. Design and control of a variable stiffness actuator for safe and fast physical human/robot interaction. In Proc. ICRA, pages 528–533, 2005.
11. R.W. van der Linde. Design, analysis, and control of a low power joint for walking robots, by phasic activation of McKibben muscles. IEEE Transactions on Robotics and Automation, 15(4):599–604, August 1999.
12. J. Versace. A review of the severity index. In Proc. of the Fifteenth Stapp Car Crash Conference, number SAE Paper No. 710881, pages 771–796. Society of Automotive Engineers, 1971.
13. J. Yamaguchi, S. Inoue, D. Nishino, and A. Takanishi. Development of a bipedal humanoid robot having antagonistic driven joints and three dof trunk. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Victoria, B.C., Canada, pages 96–101, 1998.
14. J. Yamaguchi, D. Nishino, and A. Takanishi. Realization of dynamic biped walking varying joint stiffness using antagonistic driven joints. In Proceedings of the IEEE International Conference on Robotics and Automation, Leuven, Belgium, pages 2022–2029, 1998.
15. M. Zinn, O. Khatib, B. Roth, and J.K. Salisbury. A new actuation approach for human friendly robot design. In Proc. of Int. Symp. on Experimental Robotics - ISER'02, 2002.
Bilateral Control of Different Order Teleoperators

José M. Azorín¹, Rafael Aracil², José M. Sabater¹, Manuel Ferre², Nicolás M. García¹, and Carlos Pérez¹

¹ Universidad Miguel Hernández de Elche, Avda. de la Universidad s/n, 03202 Elche (Alicante), Spain, [email protected]
² DISAM, ETSII, Universidad Politécnica de Madrid, 28006 Madrid, Spain, [email protected]
This paper presents a bilateral control method for teleoperation systems where the master and the slave are modeled by transfer functions of different order. The proposed methodology represents the teleoperation system in the state space and is based on state convergence between the master and the slave. The method ensures that the slave follows the master, and it makes it possible to establish the dynamic behavior of the teleoperation system. The paper also shows the first results obtained by applying the method to a commercial teleoperation system in which the master and the slave are modeled by discrete transfer functions of different order.
1 Introduction

Often, in a teleoperated system, the interaction force of the slave with the environment is reflected to the operator to improve task performance. In this case, the teleoperator is bilaterally controlled [1]. The classical bilateral control architectures are the position-position [2] and force-position [3] architectures. Additional control schemes have been proposed in the literature, e.g. bilateral control for ideal kinesthetic coupling [4] or bilateral control based on passivity to overcome the time delay problem [5]. Usually, the proposed bilateral control schemes consider simple master and slave models of the same order. They do not provide a design procedure to calculate the control system gains when the master and the slave are modeled by transfer functions of different order.
In [6] we presented a bilateral control method for teleoperation systems with communication time delay. In this method, the teleoperation system is modeled in the continuous-time state space from nth-order linear differential equations that represent the master and the slave. Through state convergence between the master and the slave, the control method achieves that
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 119–127, 2006. © Springer-Verlag Berlin Heidelberg 2006
the slave follows the master, and it allows establishing the desired dynamics of the teleoperation system. However, this control method considers only teleoperation systems where the master and the slave are modeled by differential equations of the same order.
This paper explains how the state convergence methodology can be used to control teleoperation systems where the master and the slave are modeled by discrete transfer functions of different order. In addition, in this paper, different from [6], the teleoperation system is modeled in the discrete-time domain, and the communication time delay is not considered, in order to simplify the explanation. Clearly, the results presented in this paper could be directly applied to continuous-time teleoperation systems with communication time delay.
The paper is organized as follows. Section 2 describes the bilateral control methodology for teleoperation systems based on state convergence. The application of this methodology to the control of different order teleoperation systems is explained in section 3. Section 4 shows the results obtained when a master and a slave with different order transfer functions were controlled using the method proposed in the paper. Finally, section 5 summarizes the conclusions of this paper.
2 Bilateral Control by State Convergence

This section describes the bilateral control method for teleoperation systems based on state convergence [6]. However, different from [6], a teleoperation system without communication time delay, where the master and the slave are modeled by nth-order discrete transfer functions, is considered. The control method is based on the state space formulation, and it makes the slave follow the master through state convergence. The method is also able to establish the desired dynamics of this convergence and the dynamics of the slave manipulator.

2.1 Modeling of the Teleoperation System

The modeling of the teleoperation system is based on the state space formulation, Fig. 1. The master system is represented in the state space as:

$$
\begin{aligned}
x_m(k+1) &= G_m x_m(k) + H_m u_m(k) \\
y_m(k) &= C_m x_m(k)
\end{aligned}
\tag{1}
$$
and the slave system is represented similarly. The matrices that appear in the model are: G2, the influence of the operator force on the slave; Km and Ks, the feedback matrices of the master and slave states; and Rm and Rs, the slave-master and master-slave interaction matrices, respectively.
Bilateral Control of Different Order Teleoperators
121
Km
MASTER Fm
+
um
+ +
z
+
ym Cm
+
Rm
G2
xm
-1
Hm
Gm
Rs
SLAVE
+
+
us
+
-1
Hs
+
z
xs
Cs
ys
+
Gs
Ks
Fig. 1. Modeling of the teleoperation system
From the model shown in Fig. 1, the following state equation of the teleoperation system is obtained:

$$
\begin{bmatrix} x_s(k+1) \\ x_m(k+1) \end{bmatrix} =
\begin{bmatrix} G_s + H_s K_s & H_s R_s \\ H_m R_m & G_m + H_m K_m \end{bmatrix}
\begin{bmatrix} x_s(k) \\ x_m(k) \end{bmatrix} +
\begin{bmatrix} H_s G_2 \\ H_m \end{bmatrix} F_m(k)
\tag{2}
$$

where Fm represents the force that the operator applies to the master.
The master and the slave are modeled by nth-order discrete transfer functions. The representation of the master is given by:

$$
G_m(z) = \frac{b_{m_{n-1}} z^{n-1} + \cdots + b_{m_1} z + b_{m_0}}{z^n + a_{m_{n-1}} z^{n-1} + \cdots + a_{m_1} z + a_{m_0}}
\tag{3}
$$
and the representation of the slave is analogous. The master and the slave are modeled in the state space using the controller canonical form. Considering this master and slave representation, the matrices Km, Ks, Rm and Rs are row vectors of n components, and G2 is a real number.

If the environment is modeled by means of a stiffness (ke), the matrix Ks can be used to incorporate into the teleoperation system model the interaction
of the slave with the environment. In the same way, the matrix Rm can be used to consider force feedback from the slave to the master, assuming a force feedback gain kf.

2.2 Control Method Through State Convergence

In the teleoperation system model shown in Fig. 1, there are 3n + 1 control gains that must be obtained: Km (n components), Ks (n components), Rs (n components) and G2 (one component). To calculate these control gains it is necessary to derive 3n + 1 design equations.

Applying the following linear transformation to the system (2), the error state equation between the master and the slave is obtained:

[x_s(k); x_e(k)] = [I, 0; E_s, -E_m] [x_s(k); x_m(k)]   (4)
where Es = diag{bs0, ..., bs,n-1} and Em is defined similarly. Let xe(k) be the error between the slave and the master, xe(k) = Es xs(k) - Em xm(k). If the error converges to zero, the slave output will follow the master output.

From the error state equation, n + 1 design equations can be obtained so that the error evolves as an autonomous system and the slave output follows the master output. In addition, n - 1 conditions between the numerator coefficients of the master and slave discrete transfer functions can be derived. If these conditions are not satisfied, there will be an error between the master and slave outputs.

When the error evolves as an autonomous system, the characteristic polynomial of the system can be calculated. From this polynomial, 2n design equations can be obtained to establish the slave and the slave-master error dynamics. These 2n equations, plus the n + 1 previous equations, form a system of 3n + 1 equations. Solving this system of equations, it is possible to obtain the 3n + 1 control gains. With these control gains, the slave manipulator will follow the master, and the dynamics of the error and of the slave will be established.
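As a concrete illustration of the state-space representation used throughout this section, here is a minimal NumPy sketch of the controller canonical form for a transfer function shaped like (3). The function name and the highest-power-first coefficient ordering are our own conventions, not the paper's:

```python
import numpy as np

def controller_canonical(b, a):
    """Controller canonical realization of
    G(z) = (b[0] z^(n-1) + ... + b[n-1]) / (z^n + a[0] z^(n-1) + ... + a[n-1]).
    Returns (G, H, C) such that C (zI - G)^(-1) H equals G(z)."""
    n = len(a)
    G = np.zeros((n, n))
    G[0, :] = -np.asarray(a, dtype=float)  # first row: negated denominator coefficients
    G[1:, :-1] = np.eye(n - 1)             # shift structure below the first row
    H = np.zeros((n, 1))
    H[0, 0] = 1.0
    C = np.asarray(b, dtype=float).reshape(1, n)
    return G, H, C

# Example with the master DTF identified later in Section 4:
# numerator 3.79e-7 z^2, denominator z^3 - 2.974 z^2 + 2.948 z - 0.9741
G, H, C = controller_canonical([3.79e-7, 0.0, 0.0], [-2.974, 2.948, -0.9741])
```

Evaluating C (zI - G)^(-1) H at any z away from the poles reproduces the transfer function value, which is an easy sanity check of the realization.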
3 Application to Teleoperation Systems of Different Order

The control methodology that considers teleoperation systems where the master and the slave are modeled by discrete transfer functions (DTFs) of the same order has been presented in the previous section. This section explains how to use the design equations to control teleoperation systems where the master and the slave are modeled by DTFs of different order.

If the master and the slave are modeled by DTFs of the same order, the control method makes the slave follow the master, and it is also able
to establish the dynamics of the slave and of the slave-master error (i.e. to fix the n poles of the slave and the n poles of the error). If the master and the slave are modeled by DTFs of different order, there are two options to design the control system: to increase the order of the smaller DTF, or to reduce the order of the higher DTF. Next, each option is explained, describing its effects on the control system.

3.1 To Increase the Order of the Smaller DTF

In this option the order of the smaller DTF is increased so that it matches the order of the higher DTF. The order is increased by adding the necessary pole-zero pairs. Pole-zero pairs are added to avoid increasing the delay attached to the system. The pole is placed in such a way that it does not affect the system dynamics. The zero is placed near the pole, but avoiding an exact cancellation, so that the DTF order is really increased. It must be checked that the increased DTF has the same static gain as the original DTF.

The effects of increasing the order of the smaller DTF on the control are described below. First, if a zero is added, the conditions required for the error to evolve as an autonomous system may not be verified, and a constant error between the master and the slave may exist. This is not a problem: e.g., in a teleoperation system where the master dimensions are smaller than the slave dimensions, and the output of both is the Cartesian position, it would not be desirable for both positions to be the same.

The effects of increasing the master order are:
• The slave dynamics is completely established.
• All the desired error poles are fixed. However, because the error has as many poles as the slave order, and the master order is smaller, the part of the error depending on the master would be artificially established.
On the other hand, the effects of increasing the slave order are:
• Additional poles of the slave dynamics are established because the slave order has been increased. This is not a problem because all the poles of the original slave dynamics are fixed.
• All the desired error poles are established. However, in a similar way to the previous case, the part of the error depending on the slave would be artificially established.

3.2 To Reduce the Order of the Higher DTF

In this option the order of the higher DTF is reduced so that it matches the order of the smaller DTF. That is, the same number of pole-zero pairs as in the smaller DTF will be considered to design the control
system. Therefore, some pole-zero pairs of the higher DTF will be removed in the design. In order to remove a pole-zero pair, it is desirable to select the pole furthest from the dominant poles, and to select a zero near that pole, so that a pole-zero cancellation can be considered.

The effects of reducing the order of the higher DTF on the control are described below. First, if a dominant pole is removed, or a zero that is not near the selected pole is removed, the reduced DTF will be very different from the original DTF, and the control gains obtained may not be applicable to the real teleoperation system. In addition, when a zero is removed, the conditions required for the error to evolve as an autonomous system may not be verified.

The effects of reducing the slave order are:
• The slave dynamics is not completely established, because fewer poles have been considered in the design phase.
• Only the number of error poles fixed by the master order can be established. Therefore, the part of the error depending on the slave will not be established.

On the other hand, the effects of reducing the master order are:
• The slave dynamics is completely established.
• Only the number of error poles fixed by the slave order can be established. Therefore, the part of the error depending on the master will not be established.

3.3 Comparison of Options

If the order of the smaller DTF is increased, the control method can completely establish the dynamics of the teleoperation system, i.e. the slave and the error dynamics are fixed. On the other hand, if the order of the higher DTF is reduced, the dynamics of the teleoperation system cannot be completely established. In this case, the part of the error dynamics depending on the reduced DTF cannot be fixed. In addition, if the slave order is reduced, the slave dynamics will not be completely established. In both options, when the DTF order is increased or reduced, the conditions required for the error to evolve as an autonomous system may not be verified, but this is not a problem. Therefore, the best option to design the control system of a teleoperator where the master and the slave have different order is to increase the order of the smaller DTF.
The control gains obtained by modifying the master or the slave DTF in the design phase must be used to control the real teleoperation system. The state convergence methodology allows the control gains to be directly applied to the real teleoperation system because of the robustness of the control method to the uncertainty of the design parameters [7]. On the other hand, state observers must be designed to apply the control. In the case of the reduced DTF, the state observer must estimate the number of state variables fixed by the higher DTF, so it must be designed using the increased DTF.
4 Experimental Results

The EL (Elbow Pivot) joint of the Grips teleoperation system has been bilaterally controlled in simulation mode using the state convergence methodology. The Grips manipulator system from Kraft Telerobotics is a six DOF teleoperator with force feedback. The slave is a hydraulic manipulator, and the master is powered by electrical motors. The identified DTFs for the EL joint of the master and the slave, considering a sample time of T = 0.0005 s, are, respectively:

Gm(z) = 3.79 × 10^-7 z^2 / (z^3 - 2.974 z^2 + 2.948 z - 0.9741)   (5)

Gs(z) = 1.357 × 10^-5 z / (z^2 - 1.9928 z + 0.9928) = 1.357 × 10^-5 z / ((z - 1)(z - 0.9928))   (6)

The identification process has been carried out using the corresponding Matlab toolbox. Both DTFs have been identified considering the Box-Jenkins model.

As previously explained, when the orders of the master and slave DTFs differ, the best option to design the control system is to increase the order of the smaller DTF. Therefore the order of the slave DTF must be increased. A pole-zero pair has been added to the slave DTF. The pole has been placed far from the dominant poles so that it does not affect the system dynamics, and the zero has been placed near the pole:

Gs(z) = 1.357 × 10^-5 z (z - 0.93) / ((z - 1)(z - 0.9928)(z - 0.9302)) = (1.357 × 10^-5 z^2 - 1.262 × 10^-5 z) / (z^3 - 2.923 z^2 + 2.8465 z - 0.9235)   (7)

In order to solve the design equations of the control method, none of the numerator coefficients of (5) and (7) may be null. In this case, the coefficients {bm1, bm0, bs0} are null, i.e. there are some zeros placed at z = 0. For this reason, the zeros placed at z = 0 have been slightly modified, and the DTFs considered for the design are the following:

Gm(z) = (3.79 × 10^-7 z^2 + 10^-8 z + 10^-8) / (z^3 - 2.974 z^2 + 2.948 z - 0.9741)   (8)

Gs(z) = (1.357 × 10^-5 z^2 - 1.262 × 10^-5 z + 10^-8) / (z^3 - 2.923 z^2 + 2.8465 z - 0.9235)   (9)

In addition, to verify the conditions of the control method between the numerator coefficients of Gm(z) and Gs(z), and to achieve the evolution of the error as an autonomous system, the numerator coefficients of Gs(z) in (9) have been modified for the design as follows:

Gs(z) = (1.357 × 10^-5 z^2 + 3.58 × 10^-7 z + 3.58 × 10^-7) / (z^3 - 2.923 z^2 + 2.8465 z - 0.9235)   (10)
The control gains have been obtained considering that the force feedback gain is kf = 0.1, that the slave interacts with a soft environment (ke = 10 Nm/rad), and that the poles of the error and the slave are placed at z = 0.95.
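The pole-zero augmentation turning (6) into (7) can be checked numerically. A minimal NumPy sketch (array names are ours; coefficients are listed highest power first):

```python
import numpy as np

# Original second-order slave DTF (6): 1.357e-5 z / ((z - 1)(z - 0.9928))
num_s = 1.357e-5 * np.array([1.0, 0.0])           # 1.357e-5 * z
den_s = np.polymul([1.0, -1.0], [1.0, -0.9928])   # (z - 1)(z - 0.9928)

# Add the near-cancelling pole-zero pair (z - 0.93) / (z - 0.9302):
num_aug = np.polymul(num_s, [1.0, -0.93])
den_aug = np.polymul(den_s, [1.0, -0.9302])

print(num_aug)  # ~ [1.357e-5, -1.262e-5, 0]      -> numerator of (7)
print(den_aug)  # ~ [1, -2.923, 2.8465, -0.9235]  -> denominator of (7)
```

The expanded coefficients match (7) up to the rounding used in the paper.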
If the control gains are applied to the teleoperation system considered for the design, DTFs (8) and (10), the slave follows the master without error, left part of Fig. 2. However, if they are applied to the increased original system, DTFs (5) and (7), the slave follows the master, but there is a constant error because the conditions required for the error to evolve as an autonomous system are not verified, right part of Fig. 2. In both cases, a unitary constant operator force has been considered.
Fig. 2. Master and slave position considering DTF (8) and (10) (left part), and DTF (5) and (7) (right part)
In order to verify the performance of the control over the original teleoperation system, two state observers have been designed for the identified DTFs of the EL joint (DTFs (5) and (6)). Both state observers must estimate three state variables. The design of the state observer for DTF (5) is straightforward. However, as the order of DTF (6) is two, the observer has been designed from DTF (7). The left part of Fig. 3 shows the position evolution of the original teleoperation system using the designed observers. It can be verified that the results obtained are the same as those shown in the right part of Fig. 2. The control signals applied to the master and the slave are shown in the right part of Fig. 3. In this figure, the slave control signal has initial values close to zero, but not null. These results verify that the control method by state convergence can be applied to teleoperation systems where the master and the slave are modeled by different order DTFs.
5 Conclusions

This paper has presented a bilateral control method for teleoperation systems where the master and the slave are modeled by transfer functions of different order. It has been verified that the order of the smaller DTF must be increased
Fig. 3. Master and slave position (left part), and control signal (right part)
to design the control system. Then, state observers that estimate the number of state variables fixed by the higher DTF must be designed in order to apply the control. The control method makes the slave follow the master, and it is able to establish the dynamic behavior of the teleoperation system. In this paper the performance of the control scheme has been tested in simulation mode over the EL joint of the Grips teleoperation system. The next step is to control the real joint. The final aim is to control all the joints of the system applying the state convergence methodology.
References

1. B. Hannaford. Stability and performance tradeoffs in bilateral telemanipulation. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 1764–1767, 1989.
2. R.C. Goertz et al. The ANL model 3 master-slave manipulator: design and use in a cave. In Proc. 9th Conf. Hot Lab. Equip., volume 121, 1961.
3. C.R. Flatau. SM 229, a new compact servo master-slave manipulator. In Proc. 25th Remote Syst. Tech. Div. Conf., volume 169, 1977.
4. Y. Yokokohji and T. Yoshikawa. Bilateral control of master-slave manipulators for ideal kinesthetic coupling–formulation and experiment. IEEE Transactions on Robotics and Automation, 10(5):605–620, 1994.
5. R.J. Anderson and M.W. Spong. Bilateral control for teleoperators with time delay. IEEE Transactions on Automatic Control, 34(5):494–501, 1989.
6. J.M. Azorin, O. Reinoso, R. Aracil, and M. Ferre. Generalized control method by state convergence of teleoperation systems with time delay. Automatica, 40(9):1575–1582, 2004.
7. J.M. Azorin, O. Reinoso, R. Aracil, and M. Ferre. Control of teleoperators with communication time delay through state convergence. Journal of Robotic Systems, 21(4):167–182, April 2004.
Navigation and Planning in an Unknown Environment Using Vision and a Cognitive Map

Nicolas Cuperlier, Mathias Quoy, and Philippe Gaussier

ETIS-UMR 8051
Université de Cergy-Pontoise - ENSEA
6, Avenue du Ponceau
95014 Cergy-Pontoise, France
[email protected]

Summary. We present a framework for Simultaneous Localization and Map building of an unknown environment based on vision and dead-reckoning systems. An omnidirectional camera gives a panoramic image from which landmarks that are not defined a priori are extracted. The set of landmarks and their azimuths relative to the north given by a compass defines a particular location, without any need of an external environment map. Transitions between two locations are explicitly coded. They are simultaneously used in two layers of our architecture: first, to construct during exploration (latent learning) a graph (our cognitive map) of the environment, where the links are reinforced when the path is used; and second, to be associated, on another layer, with the integrated movement used for going from one place to the other. During the planning phase, the activity of the transition coding for the required goal in the cognitive map spreads along the arcs of this graph, giving a higher value to the transitions (nodes) closer to this goal. We will show that, when planning to reach a goal in this environment is needed, the interactions of these two levels can lead to the selection of multiple transitions, corresponding to the most activated ones according to the current place. Those proposed transitions are finally exploited by a dynamical system (neural field) merging this information. The stable solution of this system gives a unique movement vector to apply. Experimental results underline the interest of such a soft competition on transition information, over a strict one, to get a more accurate generalization in the movement selection.
1 Introduction

Path planning requires the agent or the robot to select the appropriate action to perform. This task may be complex when several actions are possible, and different approaches have been proposed to choose what to do next. Experiments carried out on rats have led to the definition of cognitive maps used for path planning [21]. Most cognitive map models are based on graphs showing how to go from one place to another [2, 3]. They mainly differ in the way they use the map in order to find the shortest path, in the way
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 129–142, 2006. © Springer-Verlag Berlin Heidelberg 2006
they react to dynamical environment changes, and in the way they achieve contradictory goal satisfactions. One can refer to [9, 12] for a comparative review of localisation and mapping models. Many methods rely on the combination of different algorithms that have to be triggered appropriately (and concurrently) when necessary. For instance, localisation may involve different sensors (laser, ultra-sound, visual feature recognition...) that have to be chosen appropriately. Some works use rule-based algorithms, a classical functional approach, that can exhibit the desired behaviors; we will not discuss them in this paper, but one can refer to [8].

Instead, other works try to look at what nature does by taking inspiration from neurobiology to design control architectures. There are at least two reasons for this:

• first, getting robust, adaptive, opportunistic and ready-made solutions for the control architecture;
• second, if robotic results can be compared to experimental results involving several parts of the brain, which are generally difficult to study due to their complexity, it can help neurobiologists understand how a neurobiological model behaves.

Hence we propose here a unified neuronal framework based on a hippocampal and prefrontal model where vision, place recognition and dead-reckoning are fully integrated (see Fig. 4 for an overview of the architecture). This model relies on a topological map: the environment is coded via a set of distinctive nodes and by the way a robot can go from one node to another. In our work, those nodes are not directly places of the environment but rather the transitions between two of them. No Cartesian metric information and no occupancy grid are used to construct the map. Localisation is achieved using a biomimetic model designed to emulate the neural activity of particular neurons found in the rat hippocampus, named place cells.
Those cells learn the direction and identity (recognition) of punctual landmarks, leading to a place definition. A key point for the understanding of this model is the distinction between the transition coding for the succession of two place cells at the recognition (visual) level, and the motor transition encoding, on a motor level, the integrated movement performed to go from one place to the other. Whereas these two kinds of transition are strongly dependent and linked in a unique way, they do not have the same modality: one is related to vision coding and the other is related to motor coding. Keeping this basic distinction in mind, we can list the assets of this model:

• autonomous landmark extraction based on characteristic points (section 2)
• autonomous place building via place-cell-like neurons: there are no a priori predefined squares, or world model (section 3)
• autonomous creation of transitions at the recognition level (recognition transitions). A neuron codes the succession of two recognized place cells without any combinatorial explosion (section 4).
• autonomous cognitive map building based on those recognition transitions between places, giving topological information like the adjacency of two linked transitions (i.e. if transitions AB and BC are linked: adjacency of the destination place cell of AB with the place cell of origin of transition cell BC). This map can also give a kind of metric via the values of the arcs of this graph (section 5).
• autonomous association of recognition transitions with the integrated movement, giving motor transitions which can be used for planning (section 6)
• autonomous planning using both the cognitive map (graph of recognition transitions) and the corresponding motor transitions (section 7)
• stable movement given by the stable fixed point solution of a dynamical system (section 8)

Drawbacks will be left for the conclusion.
2 Autonomous Landmark Extraction Based on Characteristic Points

Images are taken by a panoramic camera at a low resolution. This allows handling lighter images that may be processed in real time. In order to eliminate problems induced by luminance variability, we only use the gradient image as input of the system (a 1500 × 240 pixels image extracted from the 640 × 480 pixels panoramic image, which is originally circular). Two processes then occur in parallel:

• First, curvature points are extracted from this gradient image by Difference Of Gaussian (D.O.G) filtering. Those curvature points are robust focal points due to the low resolution. Each focal point is the center of a 32 × 32 pixels small image giving the local visual area extracted around it (cf. circles on Fig. 1). This image is binarized through a log-polar transform [19, 11] and then it is learned on a neuron coding for this landmark. A soft competition between landmark neurons, allowing several interpretations of a given local snapshot, is then computed.
• Second, each landmark is linked with its angular position relative to the north given by a compass [20, 14].

In a panoramic image, 32 (landmark, azimuth) pairs are extracted (see Fig. 1). Thus, this visual system provides both a what and a where information: the recognition of a 32 × 32 pixels snapshot in log-polar coordinates, and the azimuth of the focal point. What and where information are then merged in a product space that memorizes the incoming inputs during a given time. The number of landmarks needed is a balance between the robustness of the algorithm and the speed of the process. If all landmarks were fully recognized, only three of them would be needed. But as some of them may not be recognized in case of changing conditions (luminance, occlusion), taking a greater number
Fig. 1. Image taken from a panoramic camera. Below are 15 examples of 32 × 32 log-polar transforms taken as landmarks and their corresponding position in the image.
is enough to guarantee the robustness. Moreover the log-polar transform gives some rotation and depth robustness.
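As an illustration, the log-polar mapping of a local image patch can be sketched as follows. This uses nearest-neighbour sampling; the output resolution and sampling details are our assumptions, not the paper's exact parameters:

```python
import numpy as np

def log_polar(patch, n_rho=16, n_theta=32):
    """Map a square image patch to log-polar coordinates around its centre.
    Rows sample the radius logarithmically, columns sample the angle."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    out = np.zeros((n_rho, n_theta))
    for i in range(n_rho):
        r = r_max ** ((i + 1) / n_rho)  # logarithmic radial sampling
        for j in range(n_theta):
            a = 2 * np.pi * j / n_theta
            y = int(round(cy + r * np.sin(a)))
            x = int(round(cx + r * np.cos(a)))
            if 0 <= y < h and 0 <= x < w:
                out[i, j] = patch[y, x]
    return out
```

Because rotations of the patch become horizontal shifts of the output, and scale changes become vertical shifts, this mapping provides the rotation and depth robustness mentioned above.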
3 Autonomous Place Building

Each set of (landmark, azimuth) pairs, merged in the product space, is learned and thus characterizes one location. The neuron coding for this location is called a "place cell" (PC), like the ones found in the rat's hippocampus [14]. A place cell's activity is the result of a matching function computing the distance between the learned set and the current set (the distance is computed only on the recruited neurons). Thus, the activity of the k-th PC can be expressed as follows:

P_k = (1 / l_k) Σ_{i=1}^{N_L} ω_ik · f_s(L_i) · g_d(θ_ik^L − θ_i)   (1)

with l_k = Σ_{i=1}^{N_L} ω_ik the number of landmarks used for the k-th PC, where ω_ik = {0, 1} expresses the fact that landmark i has been used to encode PC k, N_L the number of learned landmarks, L_i the activity of landmark i, f_s(x) the activation function of the neurons in the landmark recognition group, θ_ik^L the learned azimuth of the i-th landmark for the k-th PC, and θ_i the azimuth of the current local view interpreted as landmark i. d is the angular diffusion parameter, which defines the shape of the function g_d(x). The purpose of f_s(x) and g_d(x) is to adapt respectively the dynamics of the what and where groups of neurons. They are defined as follows:
g_d(x) = [1 − |x| / (d·π)]^+

f_s(x) = (1 / (1 − s)) · [x − s]^+
where [x]^+ = x if x > 0, and 0 otherwise. The s parameter rescales the activity of the landmark neuron over s between 0 and 1. The d parameter modulates the weight of the angular displacement.

More information on the neural model for place cell coding may be found in [10, 7].
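Equation (1), together with the definitions of f_s and g_d, can be sketched directly. The parameter values d and s below are illustrative; the paper does not give numeric values here:

```python
import numpy as np

def g_d(x, d):
    """Angular tuning: 1 for an exact azimuth match, 0 beyond d*pi radians."""
    return max(1.0 - abs(x) / (d * np.pi), 0.0)

def f_s(x, s):
    """Rescale a landmark activity from [s, 1] to [0, 1]: [x - s]^+ / (1 - s)."""
    return max((x - s) / (1.0 - s), 0.0)

def place_cell_activity(L, theta, theta_learned, omega, d=0.5, s=0.2):
    """Activity P_k of one place cell following Eq. (1)."""
    terms = [w * f_s(Li, s) * g_d(tl - t, d)
             for w, Li, tl, t in zip(omega, L, theta_learned, theta)]
    return sum(terms) / sum(omega)

# Two learned landmarks, both fully recognised at their learned azimuths:
print(place_cell_activity([1.0, 1.0], [0.3, 1.2], [0.3, 1.2], [1, 1]))  # -> 1.0
```

Activity drops smoothly as landmark azimuths drift from their learned values, which produces the bell-shaped response of Fig. 2.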
Fig. 2. Idealised shape of a place cell (An ) response, according to the distance to the exact location where it has been learned. This cell has its maximal response in this place.
Recruitment of a new place cell for encoding a new location is performed autonomously, without any external signal. If the activities of all previously learned place cells are below a given recognition threshold (R.T), then a new neuron is recruited for coding this newly discovered location (see Fig. 2). Hence, the density of locations learned depends on the level of this threshold. But it also depends on the position of the robot in the environment. Namely, more locations are learned near walls or doors, because a fast change in the angular position of near landmarks, or in the (dis)appearance of landmarks, may occur there. In other locations, small changes produce a small variation in the place cell activity (see Fig. 2). If at a given place several cells respond with an activity greater than the R.T, a competition takes place so that the most activated cell wins and codes this location. These cells are created during the exploration of the unknown environment. We use random exploration, but naturally other kinds of algorithms can be used. At the end of this task, the environment is fully covered by place cells, so that in any part of it a place cell responds specifically for it (see Fig. 3).
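The recruitment rule above can be sketched as a winner-take-all with a threshold. The function and the threshold value are our illustration, not the paper's implementation:

```python
import numpy as np

def recognise_or_recruit(activities, recognition_threshold=0.95):
    """If no existing place cell exceeds the recognition threshold, recruit a
    new one; otherwise the most activated cell wins the competition.
    Returns (cell index, recruited?)."""
    if len(activities) == 0 or max(activities) < recognition_threshold:
        return len(activities), True          # new cell appended at the end
    return int(np.argmax(activities)), False  # winner-take-all

print(recognise_or_recruit([0.40, 0.97, 0.60]))  # -> (1, False)
print(recognise_or_recruit([0.40, 0.60]))        # -> (2, True)
```

Raising the threshold makes recruitment more frequent, hence a denser covering of the environment by place cells.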
Fig. 3. A simulated environment fully explored. Each region represents the response domain for which a particular place cell wins the competition for recognition. After a full exploration the entire environment is covered by the place cell population.
4 Autonomous Building of Transitions at the Recognition Level

Two successively reached places are coded by a transition cell (see Fig. 4). Hence two successively reached places (A, then B) are coded by a transition cell (AB). A relevant question concerns the growth of the number of these cells. Before showing the experimental results, we can already make two important remarks about this growth.

• First, this growth is intimately linked with the number of place cells. This number mainly depends on two parameters:
  – the value of R.T: the higher R.T is, the larger the number of cells created;
  – the complexity of the environment: the number and the location of its landmarks and the number of obstacles found inside.
• Second, we do not create all possible transitions but only physically feasible transitions between "adjacent" place cells. And since the number of neighbours of a place cell is necessarily limited (see Fig. 3), the number of transitions created is also limited.

Hence, we have studied the ratio of created transition cells over created place cells for three environments of increasing complexity according to their obstacle configuration (a one-room, a two-room and a four-room environment). The tests have been performed in simulation using a virtual robot (or animat). For each simulation, 10 animats have explored their environment for 50000 cycles. This number has been chosen high enough to be sure that the animat has learned a complete cognitive map. The results shown here are the average over these 10 animat results. The ratio remains stable around the mean value 5.45 for all environments (see Table 1).
Table 1. Number of place cells created (nbp), number of transition cells created (nbt), and their ratio nbt/nbp, for R.T = 0.97, according to the number of rooms in the environment: one room (top line), two rooms (middle line) and four rooms (bottom line). Standard deviations are given in brackets. The ratio remains stable: there are about five times more transition cells than place cells.

Env (rooms) | nbp          | nbt            | ratio
one         | 133.8 (2.85) | 735.8 (19.80)  | 5.49 (0.06)
two         | 606.2 (6.89) | 3389.2 (56.38) | 5.59 (0.08)
four        | 643.7 (9.88) | 3281.2 (48.80) | 5.09 (0.04)
Indeed, only a few transitions can be created from a given place cell, since a transition is a link between "adjacent" place cells and since the number of neighbours of a place cell is necessarily limited. So there is no combinatorial explosion in the number of created transitions, and we do not need a "full matrix" to create the transitions: a matrix of only M × N is enough, with M the number of place cells and N the maximal number of possible neighbours.
Fig. 4. Sketch of the model. From left to the right: merging landmarks and their azimuth in a product space, then learning of the corresponding set on a place cell. Two successive place cells (the one at time t and the previous one at t − 1) define a transition cell. They are used to build up the cognitive map and are also linked with the corresponding motor transition that integrated the movement performed.
5 Autonomous Cognitive Map Building

Since our robotic model is inspired by the animat approach [13], we use three contradictory animal-like motivations (eating, drinking, and resting). Each one is associated with a satisfaction level that decreases over time and increases when the robot is on the proper source. When a level of satisfaction falls below a given threshold, the corresponding motivation is triggered so that the robot has to reach a place allowing it to satisfy this need. Hence this place becomes the goal to reach. More sources can be added, and one can increase the number of sources associated with a given motivation.

Each time a new transition is created, a new node is recruited in the cognitive map. This node is then linked with the previous transition used. When a transition leads to a place cell where a source can be found, a link between the corresponding motivation and the most active node on the map is created and set to one (latent learning); otherwise this link is set to zero. After some time, exploring the environment leads to the creation of the cognitive map (see Fig. 5). This map may be seen as a graph where each node is a transition, and the arcs express the fact that the path between these two transitions was used. We can give a fixed value (lower than one) to each link at creation time. This value is increased if the link is used, and decreased if it is not. After some time spent in the environment, some links are reinforced. These links correspond to paths that are often used. In particular, this is the case when some particular locations have to be reached more often than others (see section 7) [15].
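The qualitative link-update rule (fixed value at creation, reinforcement when used, decay otherwise) might be sketched as follows. All numeric constants are illustrative, since the paper only states the rule qualitatively:

```python
def update_links(weights, used_arc, w_create=0.5, delta=0.05,
                 w_min=0.01, w_max=0.99):
    """weights: dict arc -> value in (0, 1). The traversed arc is reinforced,
    every other arc decays slightly; weights stay strictly below one."""
    if used_arc not in weights:
        weights[used_arc] = w_create          # fixed value at creation time
    for arc in weights:
        if arc == used_arc:
            weights[arc] = min(weights[arc] + delta, w_max)
        else:
            weights[arc] = max(weights[arc] - delta, w_min)
    return weights

w = update_links({("BC", "CD"): 0.5}, ("AB", "BC"))
print(w)  # -> {('BC', 'CD'): 0.45, ('AB', 'BC'): 0.55}
```

Over many traversals, frequently used arcs saturate near the upper bound, so often-travelled paths dominate the later diffusion of goal activity.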
Fig. 5. Cognitive map (in red) built by exploration of the environment. The triangles give the successive robot positions, starting from the right, to the goal (on the left). Landmarks are represented by blue crosses. The blue rectangles are obstacles.
6 Autonomous Motor Transition Creation

Each of these cells is linked with the direction used to go from the starting location to the ending location. For instance, going from place A to place B creates a recognition transition cell AB and the corresponding node on the map. At the same time another transition cell is created at the motor level. This motor transition associates the direction (relative to the north) for going from A to B with the node AB on the map. This direction integrates all direction changes performed between A and the creation of B, using the robot wheel encoders to compute elementary displacement vectors. Direction changes are induced by a new movement vector generated by the exploration mechanism (random exploration) or by the obstacle avoidance mechanism. Hence the integrated vector (I.V.) takes all these movement changes into account. Each time this transition is performed, the I.V. is averaged with the corresponding learned direction. The I.V. is then reset when entering a different place cell. The norm of this vector is also computed in the same way.
7 Autonomous Planning Using the Cognitive Map and Motor Transitions

When a goal has to be reached, the transitions leading to it are activated via the links learned during exploration between those transitions and the corresponding motivation. This activation is then diffused over the cognitive map graph, each node taking the maximal incoming value, which is the product between the weight on the link (lower than one) and the activity of the node sending the link. After stabilization, this diffusion process gives the shortest path between all nodes and the goal nodes. This is a neural version of the Bellman-Ford algorithm [5, 17] (see Fig. 6).

When the robot is in a particular location A, all possible transitions beginning with A are possible. The top-down effect of the cognitive map is to bias the possible transitions such that the ones chosen by the cognitive map have a higher value. This value reflects a topological distance measure: the number of intermediate nodes to pass through to actually reach the goal. This small bias is enough to select/filter the appropriate transitions via a competition mechanism (however see section 8). This mechanism realizes a soft competition: several motor transitions are proposed at this level. They allow computing a more accurate direction than a strict competition, since transitions starting from places really close to the current one and topologically close to the goal (on the graph) are selected. The final selection of the motor action results from merging these global decisions with local constraints such as obstacle avoidance, robot inertia, etc.
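The diffusion over the cognitive map can be sketched as a max-product value iteration, i.e. a neural Bellman-Ford. The toy graph and the 0.9 link weight reproduce the example of Fig. 6; the code itself is our illustration:

```python
def diffuse(successors, goal, n_iter=100):
    """successors: dict node -> list of (next node, link weight in (0, 1]).
    Each node keeps the maximal product of a link weight and the activity of
    the node that link leads to; activity spreads backwards from the goal."""
    activity = {node: 0.0 for node in successors}
    activity[goal] = 1.0
    for _ in range(n_iter):
        for node in successors:
            if node == goal:
                continue
            incoming = [w * activity[nxt] for nxt, w in successors[node]]
            activity[node] = max(incoming, default=0.0)
    return activity

# Chain of transitions AB -> BC -> CD with the goal on CD, all weights 0.9:
graph = {"AB": [("BC", 0.9)], "BC": [("CD", 0.9)], "CD": []}
print(diffuse(graph, "CD"))  # activities: AB 0.81, BC 0.9, CD 1.0 (up to rounding)
```

Because every link weight is below one, the activity decays geometrically with the number of hops to the goal, so the most activated transition at the current place is always the one starting the shortest path.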
N. Cuperlier, M. Quoy, and P. Gaussier
Fig. 6. Diffusion of the activity in the graph corresponding to the cognitive map. Diffusion is starting from the goal. Each node keeps the maximal activity coming from its neighbours. The activity is the product between the value of the link and the activity of the sending node.
8 Movement and Neural Field Dynamics

As seen in Section 7, after planning, different movements are proposed. A simple competition mechanism would select the neuron with the highest value. But how may this information be used? We first tested a strict competition between the proposed movements, but why not also use the other transitions, which contain interesting information about the agent's location context? So we now have several transitions to take into account for the movement. The solution used for obtaining a stable, continuous direction to follow is to define a dynamical system whose stable fixed-point solution is the direction to follow. This is achieved using a neural field [1, 18, 16]:

τ · df(x, t)/dt = −f(x, t) + I(x, t) + h + ∫_{z ∈ V_x} w(z) f(x − z, t) dz    (2)
where f(x, t) is the activity of neuron x at time t, I(x, t) is the input to the system, h is a negative constant, and τ is the relaxation rate of the system. w is the interaction kernel of the neural field activation; a difference of Gaussians (DOG) models these lateral interactions, which can be excitatory or inhibitory. V_x is the lateral interaction interval that defines the neighbourhood. Without inputs, the constant h ensures the stability of the homogeneous pattern of the neural field, since f(x, t) = h. In the following, the x dimension is an angle (the direction to follow relative to the north). This equation allows the computation of attractors corresponding to fixed points of the dynamics and to local maxima of the neural field activity. A stable direction to follow is reached when the system is on any of the attractors.

The angle of a candidate transition is used as input. The intensity of this input depends on the corresponding goal transition activity, but also on the recognition activity of its origin place cell (see Fig. 7). If only one transition is proposed, there will be only one input, with an angle x_targ, and it raises only one
Fig. 7. The merging mechanism allows obtaining a better direction (global movement) than using only the single piece of information obtained from the current transition (BD). It takes into account the previous movement performed and the transitions predicted from close enough place cells (C).
attractor x* = x_targ on the neural field. If x_c is the current orientation of the robot, its rotation speed will be proportional to w = ẋ = df(x, t)/dt |_{x_c}.

The merging of several transition inputs depends on the distance between them. Indeed, if the inputs are spatially close, the dynamics give rise to a single attractor corresponding to their average. Otherwise, if we progressively increase the distance between the inputs, a bifurcation point appears at a critical distance: the previous attractor becomes a repellor and two new attractors emerge.

Obstacles are detected by 12 infra-red sensors. A reflex behavior is triggered by a Braitenberg-like architecture [6]. When an obstacle is detected in a given direction, the reflex system generates a negative input at this orientation. Hence the bubble of neural field activity will move. Consequently the computed direction will reflect this change and allow the robot to avoid the obstacle (see Fig. 8). Oscillations between two possible directions are avoided by the hysteresis property of this input competition/cooperation mechanism.

It is possible to adjust this merging distance to a correct value by calibrating the two elements responsible for this effect: spatial filtering is obtained by convolving the Dirac-like signal coming from the motor transition information with a Gaussian and taking the result as the input to the system. Combined with the lateral interactions, this allows the fusion of distinct inputs into a single attractor. The wider the curve, the larger the merging will be.
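To make the dynamics concrete, here is a small Euler-integration sketch of Eq. (2) on a discretized angle axis. The DOG parameters, the Gaussian input width, and the grid size are illustrative values, not those used on the robot; the sketch only demonstrates the merging behaviour described above (two nearby candidate directions produce a single attractor in between):

```python
import math

def dog(z, se=0.6, si=1.6):
    """Difference of Gaussians: short-range excitation, broader
    inhibition.  The widths and the 0.6 inhibition gain are
    illustrative values chosen to keep the field stable."""
    return (math.exp(-z * z / (2 * se * se))
            - 0.6 * math.exp(-z * z / (2 * si * si)))

def run_field(inputs, n=72, tau=5.0, h=-0.1, steps=300, sigma_in=0.3):
    """Euler integration of the neural field (Eq. 2) on a ring of n
    angular positions.  `inputs` is a list of (angle, intensity) pairs,
    each spread with a Gaussian of width sigma_in before being fed in."""
    dx = 2 * math.pi / n

    def ang(d):  # shortest distance on the ring
        d = d % (2 * math.pi)
        return min(d, 2 * math.pi - d)

    kern = [dog(ang(i * dx)) * dx for i in range(n)]  # w(z) dz, precomputed
    I = [sum(s * math.exp(-ang(i * dx - a) ** 2 / (2 * sigma_in ** 2))
             for a, s in inputs) for i in range(n)]
    f = [0.0] * n
    for _ in range(steps):
        conv = [sum(kern[(i - j) % n] * f[j] for j in range(n))
                for i in range(n)]
        f = [fi + (-fi + I_i + c + h) / tau
             for fi, I_i, c in zip(f, I, conv)]
    return [i * dx for i in range(n)], f

# Two nearby candidate directions (1.0 and 1.4 rad) merge into a single
# attractor in between, instead of a winner-take-all choice.
xs, f = run_field([(1.0, 1.0), (1.4, 1.0)])
best = xs[f.index(max(f))]
```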
9 Conclusion

Our model, currently running on robots (Koala and Labo3 robots), has interesting properties in terms of autonomous behavior. However, this autonomy has some drawbacks:
Fig. 8. Top: Trajectory of a Labo3 robot in an open environment with obstacles. The movement direction is given by a joystick input. We focus on the final action selection at two points in this environment: A and B. During the whole trajectory, the movement ordered by the joystick is "go straight". Turn movements in the robot trajectory are only due to obstacle avoidance. Middle (A): Neural field activity without any obstacle. The direction taken corresponds to the joystick input. Bottom (B): Neural field activity with an obstacle. The obstacle shifts the neural field maximal activity, leading to a turning move.
• we are not able to build a cartesian map of the environment because all learned locations are robot centered. However, the places in the cognitive map and the directions used give a skeleton of the environment.
• we have no information about the exact size of the rooms or corridors. Again, the cognitive map only gives a sketch of the environment.
• some parameters have to be set: the recognition threshold (Section 3) and the diffusion size of the interaction kernel of the neural field (Section 8). The first one determines the density of built places. The higher the threshold, the more places are created. The second parameter has to be tuned for each robot depending on its size and on the position of the infra-red sensors for obstacle avoidance. For instance, a too high diffusion value prevents the robot from going through doors.

The transition used in this model may also be the elementary block of a sequence learning process. Thus, we are able to propose a unified vision of the spatial (navigation) and temporal (memory) functions of the hippocampus [4].

Acknowledgments

This work is supported by two French ACI programs. The first one, on the modeling of the interactions between hippocampus, prefrontal cortex and basal ganglia, is in collaboration with B. Poucet (CRNC, Marseille), J.P. Banquet (INSERM U483) and R. Chatila (LAAS, Toulouse). The second one (neurosciences intégratives et computationnelles), on the dynamics of biologically plausible neural networks, is in collaboration with M. Samuelides (SupAero, Toulouse), G. Beslon (INSA, Lyon), and E. Dauce (Perception et mouvement, Marseille).
References

1. S. Amari. Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27:77–87, 1977.
2. M. Arbib and I. Lieblich. Motivational learning of spatial behavior. In J. Metzler, editor, Systems Neuroscience, pages 221–239. Academic Press, 1977.
3. I. Bachelder and A. Waxman. Mobile robot visual mapping and localization: A view-based neurocomputational architecture that emulates hippocampal place learning. Neural Networks, 7(6/7):1083–1099, 1994.
4. J. Banquet, P. Gaussier, M. Quoy, A. Revel, and Y. Burnod. A hierarchy of associations in hippocampo-cortical systems: cognitive maps and navigation strategies. Neural Computation, 17(6), 2005.
5. R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87–90, 1958.
6. V. Braitenberg. Vehicles: Experiments in Synthetic Psychology. Bradford Books, Cambridge, 1984.
7. C. Giovannangeli, P. Gaussier, and J.P. Banquet. Robot as a tool to study the robustness of visual place cells. In I3M'2005: International Conference on Conceptual Modeling and Simulation, Marseille, 2005.
8. J. Donnart and J. Meyer. Learning reactive and planning rules in a motivationally autonomous animat. IEEE Transactions on Systems, Man and Cybernetics - Part B, 26(3):381–395, 1996.
9. D. Filliat and J.-A. Meyer. Map-based navigation in mobile robots - I. A review of localisation strategies. Journal of Cognitive Systems Research, 4(4):243–282, 2003.
10. P. Gaussier, A. Revel, J. Banquet, and V. Babeau. From view cells and place cells to cognitive map learning: processing stages of the hippocampal system. Biological Cybernetics, 86:15–28, 2002.
11. C. Joulain, P. Gaussier, and A. Revel. Learning to build categories from perception-action associations. In International Conference on Intelligent Robots and Systems - IROS'97, pages 857–864, Grenoble, France, September 1997. IEEE/RSJ.
12. J.-A. Meyer and D. Filliat. Map-based navigation in mobile robots - II. A review of map-learning and path-planning strategies. Journal of Cognitive Systems Research, 4(4):283–317, 2003.
13. J.-A. Meyer and S. Wilson. From animals to animats. In First International Conference on Simulation of Adaptive Behavior. Bradford Books, 1991.
14. J. O'Keefe and L. Nadel. The Hippocampus as a Cognitive Map. Clarendon Press, Oxford, 1978.
15. M. Quoy, P. Gaussier, S. Leprêtre, A. Revel, C. Joulain, and J. Banquet. A planning map for mobile robots: speed control and paths finding in a changing environment. In Lecture Notes in Artificial Intelligence Series, 1812, pages 103–119. Springer, ISBN 3-540-41162-3, 2000.
16. M. Quoy, S. Moga, and P. Gaussier. Dynamical neural networks for top-down robot control. IEEE Transactions on Man, Systems and Cybernetics, Part A, 33(4):523–532, 2003.
17. A. Revel, P. Gaussier, S. Leprêtre, and J. Banquet. Planification versus sensory-motor conditioning: what are the issues? In From Animals to Animats: Simulation of Adaptive Behavior SAB'98, pages 129–138, 1998.
18. G. Schöner, M. Dose, and C. Engels. Dynamics of behavior: theory and applications for autonomous robot architectures. Robotics and Autonomous Systems, 16(2-4):213–245, December 1995.
19. L. Schwartz. Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Res., 20:645–669, 1980.
20. N. Tinbergen. The Study of Instinct. Oxford University Press, London, 1951.
21. E. Tolman. Cognitive maps in rats and men. The Psychological Review, 55(4), 1948.
Exploiting Distinguishable Image Features in Robotic Mapping and Localization

Patric Jensfelt¹, John Folkesson¹, Danica Kragic¹, and Henrik I. Christensen¹

¹ Centre for Autonomous Systems, Royal Institute of Technology, SE-100 44 Stockholm, SWEDEN
[patric,johnf,danik,hic]@nada.kth.se

Summary. Simultaneous localization and mapping (SLAM) is an important research area in robotics. Lately, systems that use a single bearing-only sensor have received significant attention, and the use of visual sensors has been strongly advocated. In this paper, we present a framework for 3D bearing-only SLAM using a single camera. We concentrate on image feature selection in order to achieve precise localization and thus good reconstruction in 3D. In addition, we demonstrate how these features can be managed to provide real-time performance and fast matching to detect loop-closing situations. The proposed vision system has been combined with an extended Kalman filter (EKF) based SLAM method. A number of experiments have been performed in indoor environments which demonstrate the validity and effectiveness of the approach. We also show how the SLAM-generated map can be used for robot localization. The use of distinguishable vision features allows a straightforward solution to the "kidnapped-robot" scenario.
1 Introduction

It is widely recognized that an autonomous robot needs the ability to build maps of the environment using natural landmarks and to use them for localization [19, 2, 4, 18, 20]. One of the current research topics related to SLAM is the use of vision as the only exteroceptive sensor [3, 5, 6, 17, 15], due to its low cost. In this paper, we present a SLAM system that builds 3D maps from visual features using a single camera. We describe how we deal with a set of open research issues such as producing stable and well-localized landmarks, a robust matching procedure, landmark reconstruction, and the detection of loop closing. These issues are of extreme importance when working, for example, in an EKF setting, where the computational complexity grows quadratically with the number of features. Robust matching is required for most recursive formulations of SLAM, where decisions are final.
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 143–157, 2006. © Springer-Verlag Berlin Heidelberg 2006
Besides low cost, another aspect of using cameras for SLAM is the much greater richness of the sensor information as compared to that from, for example, a range sensor. Using a camera it is possible to recognize features based on their appearance. This can then simplify one of the most difficult problems in SLAM, namely data association. We demonstrate just how powerful an advantage this is by using the SLAM map to perform robot localization without any initial pose estimate.

The main contributions of this work are i) a method for the initialization of visual landmarks for SLAM, ii) a robust and precise feature detector, iii) the management of the measurements to make on-line estimation possible, and iv) the demonstration of how this framework can facilitate loop closing and localization. Due to the low complexity, we believe that our approach scales to larger environments.
2 Related Work

Bearing-only/single-camera SLAM suffers from the problem that a single observation of a landmark does not provide an estimate of its full pose. This problem is typically addressed by combining the observations from multiple views, as in the structure-from-motion (SFM) approaches in computer vision. The biggest difference between SLAM and SFM is that SFM mostly considers batch processing, while SLAM typically requires on-line real-time performance.

The most important problem that has to be addressed in bearing-only SLAM is landmark initialization, again because a single observation does not allow all degrees of freedom to be determined. A particle filter used to represent the unknown initial depth of features has been proposed in [3]. The drawback of the approach is that the initial distribution of particles has to cover all possible depth values for a landmark, which makes it difficult to use when the number of detected features is large. A similar approach has been presented in [9], where the initial state is approximated using a Gaussian sum filter for which the computational load grows exponentially with the number of landmarks. The work in [10] proposes an approximation with additive growth. Similarly to our work, a multiple-view approach has been presented in [6]. This work demonstrates the difficulties related to landmark reconstruction when the robot performs only translational motion along the optical axis. To cope with the reconstruction problem, a stereo-based SLAM method was presented in [17], where Difference-of-Gaussians (DoG) is used to detect distinctive features which are then matched using SIFT descriptors. One of the important issues mentioned is that their particle-filter based approach is inappropriate for large-scale and textured environments. We deal with this problem using a feature detector that produces fewer features (presented in more detail in Section 4).
In our work we use only a few high quality features for performing SLAM.
We have also considered another problem raised in [17], related to time-consuming feature matching, and use KD-trees to make our matching process very fast. The visual features used in our work are Harris corner features across different scales, represented by a Laplacian pyramid for feature detection. For feature matching, we use a combination of a modified SIFT descriptor and KD-trees.

In man-made environments, there are many repetitive features, and a single SIFT descriptor is not discriminative enough in itself to solve the data association problem. To deal with this problem, an approach using "chunks" of SIFT points to represent landmarks in an outdoor environment has been presented in [12]. This is motivated by the success that SIFT has had in recognition applications, where an object/scene is represented as a set of SIFT points. In our approach, the position of a landmark is defined by a series of single modified SIFT points representing different views of the landmark. Each such point is accompanied by a chunk of descriptors that makes the matching/recognition of landmarks more robust. Our experimental evaluation also shows that our approach performs successful matching even with a narrow field of view, which was mentioned as a problem in [6], [17].

One of the more challenging problems in SLAM is loop closing. In [15] visually salient so-called "maximally stable extremal regions", or MSERs, encoded using SIFT descriptors, are used to detect when the robot is revisiting an area. A number of approaches have been presented for other sensory modalities [7]. We also show how our framework can be used for this purpose.

In our work, a distinction is made between recognition and location features. A single location feature will be associated with several recognition features. The recognition features' descriptors then give robustness to the match between the location features in the map and the features in the current image. The key idea is to use a few high-quality features to define the location of landmarks and then use the other features for recognition.
3 Landmark Initialization In this work, we use feature matching across N frames to determine which points make good landmarks. Features that are successfully matched over a number of frames are candidates for landmarks in the map. Having a buffer of N frames also allows us to calculate an estimate of the 3D position of the corresponding landmark. The SLAM process is fed with data from the output side of the buffer, i.e. with the data from frames that are delayed N steps with respect to the input side of the buffer. Fig. 1 illustrates the idea. Since an estimate of the 3D position of landmarks can be supplied with the first observation of a landmark presented to the SLAM process (see Fig. 1), the landmarks can immediately be fully initialized in the SLAM process. This allows immediate linearization without the need to apply multiple hypotheses [10] or particle filtering [3] techniques to estimate the depth. It is important
Fig. 1. Features are tracked over N frames in a buffer to allow for quality assessment and feature selection. The 3D position is estimated and landmarks are formed from features that are tracked over many frames and are of high quality. The input to the SLAM process is delayed by N frames, but in return an estimate of the 3D position of the point landmarks can be supplied.
to point out that the approximate 3D position found from the buffer of frames is only used to initialize the point landmark at the correct depth with respect to the camera at the first observation. In a sense it lets us know right away which of the multiple hypotheses about the depth is correct, which of course saves computations. Having the correct depth allows us, as said before, to handle the linearization errors that would result from having a completely wrong estimate of the depth.

In our experience, having the SLAM output lag N frames is not a problem. For cases where the current position of the robot is needed, such as when performing robot control, a prediction can be made using dead-reckoning forward in time from the robot pose given by the SLAM process. For typical values of N, the addition to the robot position error caused by the dead-reckoning is small, and we believe that the benefits of being able to initialize landmarks using bearing-only information and perform feature quality checks are more significant. It is also important to understand that we predict forward the short distance from the SLAM time to the current time in every iteration, so the errors from the prediction do not accumulate.

When selecting which features to use as landmarks in the map, several criteria are considered: i) the feature is detected in more than a predefined number of frames, ii) the image positions of the feature allow good triangulation, and iii) the resulting 3D point is stable over time in the image. The first requirement removes noise and moving targets. The second removes features that have a high triangulation uncertainty (small baseline, features with bearings near the direction of motion). The third requirement removes features that lack sharp positions in all images due to parallax or a lack of a strong maximum in scale space.
Differences in scale between images can also cause apparent motion of features, for example at a corner of a non-textured object. In our implementation the length of the buffer, i.e. the number N, is fixed. We have tested values between 10 and 50. Since one of the key purposes of the buffer is to allow for 3D estimation, we require that the camera has moved before adding a new frame to the buffer. This way, it is likely that there is a baseline for estimating the location. The value of N depends very much on the motion of the robot/camera and on the camera parameters. For a narrow field
of view camera mounted in the direction of motion of the robot, as in our case, the effective baseline will be quite small. As part of our future work we plan to compare these results with those obtained using an omnidirectional camera.
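The three selection criteria above could be sketched as follows, using a planar (2D) triangulation for brevity; the data layout, thresholds, and the reduction to 2D are hypothetical simplifications of the system described in the text:

```python
import math

def triangulate(obs_a, obs_b):
    """Intersect two bearing rays in the plane (2D for brevity;
    the real system estimates a full 3D point)."""
    (xa, ya), ta = obs_a
    (xb, yb), tb = obs_b
    d = math.sin(tb - ta)  # = cos(ta)sin(tb) - sin(ta)cos(tb)
    r = ((xb - xa) * math.sin(tb) - (yb - ya) * math.cos(tb)) / d
    return xa + r * math.cos(ta), ya + r * math.sin(ta)

def good_landmark(observations, min_frames=5, min_baseline=0.2, max_spread=0.1):
    """Hypothetical quality check for a feature tracked through the
    N-frame buffer; observations = [(camera_xy, bearing), ...].
    i)   tracked in enough frames (rejects noise and moving targets),
    ii)  enough camera baseline for triangulation,
    iii) the triangulated point is stable across view pairs."""
    if len(observations) < min_frames:
        return False
    xs = [p[0] for p, _ in observations]
    ys = [p[1] for p, _ in observations]
    if math.hypot(max(xs) - min(xs), max(ys) - min(ys)) < min_baseline:
        return False
    mid = triangulate(observations[0], observations[len(observations) // 2])
    end = triangulate(observations[0], observations[-1])
    return math.hypot(mid[0] - end[0], mid[1] - end[1]) < max_spread

# Camera slides along x while watching a landmark at (2, 1):
obs = [((x, 0.0), math.atan2(1.0, 2.0 - x)) for x in (0.0, 0.2, 0.4, 0.6, 0.8)]
```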
4 Feature Description

It has been shown in [14] that the SIFT descriptor [11] is the most robust with respect to scale and illumination changes. The original version of the SIFT descriptor uses feature points determined by the peaks of a series of Differences of Gaussians (DoG) at varying scales. In our system, peaks are found using Harris-Laplace features [13], since they respond to regions of high curvature instead of the blob-like image structures obtained by a series of DoG. This leads to features that are accurately localized spatially, which is essential when features are used for reconstruction and localization, not just recognition. In addition, Harris-Laplace generates on average one fourth as many points [8] as the regular DoG approach, which is an important benefit for SLAM, where we strive to keep the number of features low for computational reasons.

The original SIFT descriptor assigns a canonical orientation at the peak of the smoothed gradient histogram. This means that similar corners with a significant rotation difference can have similar descriptors. In a sparse indoor environment, many of the detected features come from corner structures, and there may potentially be many false matches. For example, the four corners of the waste bin in Fig. 2 might all match if rotated. Therefore, we use a rotationally 'variant' SIFT descriptor: we skip the canonical orientation assignment and leave the gradient histogram as it is.
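A toy illustration of this design choice: an 8-bin gradient-orientation histogram with and without the canonical-orientation shift. This is a drastically simplified stand-in for the 128-dimensional SIFT descriptor, meant only to show why the rotation-variant version keeps rotated corners apart:

```python
import math

def toy_descriptor(patch, canonical_orientation=False):
    """8-bin gradient-orientation histogram over a grayscale patch.
    With canonical_orientation=True the histogram is circularly shifted
    so its peak lands in bin 0 (rotation invariance, as in the original
    SIFT); leaving it False gives the rotationally 'variant' behaviour."""
    n = len(patch)
    hist = [0.0] * 8
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            a = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(a / (2 * math.pi) * 8) % 8] += math.hypot(gx, gy)
    if canonical_orientation:
        peak = hist.index(max(hist))
        hist = hist[peak:] + hist[:peak]
    total = sum(hist) or 1.0
    return [v / total for v in hist]

# A step edge and the same edge rotated by 90 degrees:
edge = [[0, 0, 1, 1]] * 4
rotated = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
```

With the canonical orientation, the two rotated copies collapse onto the same descriptor; without it, they stay distinguishable, which is the behaviour wanted here.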
Fig. 2. Example image from an indoor environment showing that many corner structures are very similar if rotated. Examples are the corners of the waste bin and the window sections on the door.
5 Feature Tracking

As outlined in Section 3, a buffer with the last N frames is stored in memory. The image data itself does not need to be stored; it is the feature points extracted from the frames that are stored. The feature points are tracked in consecutive image frames as the robot moves. The similarity between two feature points is the distance between their descriptors, which are vectors in a 128-dimensional space. To manage the matching between frames, lists with associations/points are maintained. Fig. 3 shows the basic organization of this frame memory. On the left we have the buffer with the N frames drawn vertically, with the input side at the top and the output side, which is fed into the SLAM process, at the bottom. On the right side of the buffer in the frame memory we show the association lists that keep track of which feature points in the different frames have been matched. Each such association list item corresponds to one landmark in the world.

The SIFT descriptor of a feature changes when the camera moves. However, given that the feature is continuously tracked, this change in descriptor is typically small between consecutive frames and can be handled. Each association/point in the world, p_i, will have a set of descriptors d_j corresponding to different vantage points for each of the features tracked over the N frames. These lists can be analyzed to judge the quality of the corresponding landmark candidate, as described in Section 3. The output from the tracking module is a selection of the points in the oldest frame in the buffer. These points correspond either to already initialized landmarks or to new landmarks of high quality that can be initialized. Along with new landmarks, an estimate of the 3D position is provided. This estimate is only used in the initialization step and thus only has to be computed for new landmarks. The number of points that are output is only a small fraction of all points detected in a frame.

To make the matching procedure faster and more robust, we predict the approximate image location of the features from previous frames using odometry and optical flow estimates. The buffer allows us to predict the position of features detected not only in the previous frame but also in frames before that. We make one prediction for each potential landmark in the old frames, i.e., we only use the latest observation of features that have been matched between frames. One important observation in our investigation was that, even with very small changes between frames, the same features would typically not be detected in every frame, which would mean losing track of most features if matching only against the previous frame. This observation holds also when using the DoG detector for the features.

Using the prediction allows us to reduce the search window when looking for matches between frames. Features that do not match the predicted positions of points currently in the frame memory are matched to landmarks in the database. This also allows for fast detection of potential loop-closing situations. The first time the location of a landmark has been established through triangulation and it has met the other criteria listed in Section 3, it is added
to the database as a new location feature. This is discussed further in the next section.
Fig. 3. A schematic view of the frame memory that stores the points and the associations between points in the N last frames and the database. The points in the frame memory that have been matched or added to the database have a reference to the corresponding landmark there. Each landmark in the database has a set of descriptors that correspond to location features seen from different vantage points. To validate a match, each of these descriptors keeps a list of the other descriptors found in the same frame. We refer to these as recognition descriptors. They provide the ability to "recognize" a landmark again.
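The prediction-based frame-to-frame matching could look roughly like this; the search-window size, descriptor-distance threshold, and data layout are made-up parameters for illustration:

```python
def match_features(predictions, detections, window=10.0, max_desc_dist=0.5):
    """Hypothetical frame-to-frame matcher.  Each tracked point comes
    with an image position predicted from odometry and optical flow;
    only detections inside a small search window around that prediction
    are compared, using Euclidean distance between descriptors.
    predictions: {point_id: (px, py, descriptor)}
    detections:  [(x, y, descriptor), ...]"""
    matches, used = {}, set()
    for pid, (px, py, pdesc) in predictions.items():
        best, best_d = None, max_desc_dist
        for j, (x, y, desc) in enumerate(detections):
            if j in used or abs(x - px) > window or abs(y - py) > window:
                continue  # outside the search window
            d = sum((a - b) ** 2 for a, b in zip(pdesc, desc)) ** 0.5
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[pid] = best
            used.add(best)
    return matches

# One tracked point; detection 0 is far outside the search window,
# detection 1 is nearby with a slightly changed descriptor.
preds = {"p7": (10.0, 10.0, [1.0, 0.0])}
dets = [(100.0, 100.0, [1.0, 0.0]), (12.0, 9.0, [0.9, 0.1])]
```

Unmatched detections would then be handed to the database matcher described in the next section.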
6 Database Management

As the robot moves along in the environment, features will leave its field of view and thus eventually also the frame memory described in Section 5. In SLAM, it is important for the robot to detect if it is revisiting an area and to find the correspondence between new features in the image and existing landmarks. In fact, this is an issue not only when revisiting an area after closing a large loop but also when turning the camera back to an area that has not been looked at for a while. To handle this we use a database that stores information about the appearance of the landmarks from different viewpoints. That is, we let the SLAM process take care of estimating the positions of the landmarks and leave it to the database to deal with their appearance and matching.
In our database, each landmark has a set of SIFT descriptors that correspond to different vantage points. These descriptors are provided by the frame memory, which matches the image points between frames and stores this in the association/point lists. A new descriptor is added to a landmark when it differs by more than a certain threshold from all other descriptors attached to that landmark. Fig. 3 shows the structure of the database, where the landmarks are denoted F_1, F_2, ..., F_N. The dashed box contains the descriptors for each of the landmarks. A KD-tree representation and a Best-Bin-First [1] search allow for real-time matching between new image feature descriptors and those in the database.

There may potentially be many similar descriptors corresponding to different features. Looking once again at the image in Fig. 2, the four corresponding corner points on the glass windows of the left-hand door section will all be very similar. Trying to find a correspondence between image features in a new frame and the map landmarks, with a bit of uncertainty induced by a loop, can be hard based only on a single SIFT descriptor. Therefore, we require multiple matches to increase the robustness [6]. We refer to these multiple descriptors as recognition descriptors and the corresponding image features as recognition features. That is, when matching a feature to the database, we first look for matches between its location descriptor¹ and the descriptors in the database. Then, we verify the match using the corresponding two sets of recognition descriptors. As an additional test, it is required that the displacement in image coordinates for the descriptor is consistent with the transformation between the two frames estimated from the matched recognition descriptors. Currently, the calculation is simplified by checking the 2D image point displacement. This final confirmation eliminates matches that are close in the environment and thus share recognition descriptors, as would be the case with the glass windows in Fig. 2.

To summarize, the matching of new features with the database proceeds as follows:

1. Find matching candidates by matching against the set of location descriptors in the database (dashed box in the database in Fig. 3).
The KD-tree representation allows for fast matching and is an effective way to eliminate all but a few of the potential location feature matches in the database.
2. Validate the candidates using all extracted descriptors from the current frame, i.e. the recognition features associated with the matches from step 1.
3. Confirm the candidates by checking that the motion given by all the matches is consistent.

¹ Each feature in a new image is a potential location feature, and the rest of the features in that frame will be its recognition features. A feature can thus be both a location feature and act as a recognition feature for one or more other location features in that frame.
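The three matching steps might be sketched as below. A linear scan stands in for the KD-tree/Best-Bin-First search of step 1, and the motion-consistency check of step 3 is crudely reduced to a single displacement threshold; all thresholds are made up:

```python
def ddist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def match_to_database(frame_features, database, desc_thresh=0.3,
                      min_recog=2, max_displacement=20.0):
    """Sketch of the three matching steps described in the text.
    frame_features: [(xy, descriptor), ...] -- every feature is a
    candidate location feature, the rest of its frame acting as its
    recognition features.
    database: landmarks, each a list of vantage-point entries
    (xy, location_descriptor, [recognition_descriptors])."""
    matches = []
    for i, (xy, desc) in enumerate(frame_features):
        recog = [d for j, (_, d) in enumerate(frame_features) if j != i]
        for lid, landmark in enumerate(database):
            for db_xy, db_desc, db_recog in landmark:
                # Step 1: candidate via the location descriptor.
                if ddist(desc, db_desc) > desc_thresh:
                    continue
                # Step 2: validate with the recognition descriptors.
                hits = sum(1 for r in recog
                           if any(ddist(r, s) <= desc_thresh for s in db_recog))
                if hits < min_recog:
                    continue
                # Step 3: confirm via 2D image-point displacement.
                if ddist(xy, db_xy) <= max_displacement:
                    matches.append((i, lid))
                    break
    return matches

frame = [((10.0, 10.0), [1.0, 0.0]),   # candidate location feature
         ((5.0, 5.0), [0.0, 1.0]),     # its recognition features ...
         ((20.0, 20.0), [0.5, 0.5])]
db = [[((12.0, 10.0), [1.0, 0.0], [[0.0, 1.0], [0.5, 0.5]])]]
```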
7 Using the Location Features for SLAM and Localization

We have seen in the previous sections how features are tracked between frames using a buffer of N frames, and how keeping this buffer allows judging the quality of potential landmarks and finding an estimate of the 3D position of the landmarks before they are initialized in SLAM. We use an EKF-based implementation for SLAM, but the output from the frame memory and database can be fed into any number of SLAM algorithms. The main difference between the EKF implementation used here for SLAM and the standard formulation is that we supplement the first bearing-only measurement of a new landmark with the additional information about the approximate distance, as determined through the triangulation in the frame buffer. The distance information is not given in the form of an EKF measurement; the measurements are always only bearings. It is used more as if some oracle told the SLAM filter at which depth to initialize a new landmark. As a result, an accurate linearization can be made in the EKF as of the first sighting, without the need for any special arrangement to account for an unknown depth as in e.g. [3, 10].

Once the map is built, it can be used for localization. Since the landmarks in the database are distinguishable, the robot can recognize areas that are part of its map using a single landmark. Thus the robot can automatically initialize itself in the map, after which the map can be used to track the robot pose. This is similar to the situation when closing a loop and having to make the association between new measurements and old landmarks in the map. In the latter case, one has a rough idea about where to look for such a match. When performing global localization, the uncertainty covers the entire space, and the search for matches can thus become quite expensive. The framework presented in this work allows for fast matching against the database, which we will investigate further in the experimental section.
One of the benefits of our representation with multiple descriptors for the landmarks, each of them encoding the appearance from a certain vantage point, is that this information can be used to deduce the approximate position of the robot from a single point observation. To allow for this, the pose from which each descriptor was first observed is stored along with the descriptor. The idea is not to get an exact position out of an observation but to narrow down the area that the robot is likely to be in.

To find the pose of the robot in the map, we initialize a particle filter as soon as a match to the database is found. The particles are initially distributed around the pose indicated by the database match. The orientation for each particle in the initial sample set is given by the bearing angle of the observation. This cuts the unconstrained degrees of freedom down to two. Given that the 3D position of the landmark is known, that the floor is flat, and that a measurement of the bearing to the landmark is given not only in the xy-plane but also in the vertical direction, the distance to the landmark can also be estimated from a single measurement. In our current implementation we have
152
P. Jensfelt et al.
not incorporated this last bit of information. After the first observation it will then require only a few other database matches to reduce the spread of particles enough to say that the position is known. In our current implementation we initialize a normal EKF localization algorithm at this point.
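The particle initialization described above can be sketched as follows. The particle count, the spreads, and the helper name are hypothetical choices of ours, and the bearing is assumed to be the landmark's bearing in the robot's body frame:

```python
import numpy as np

def init_particles(lm_xy, obs_pose_xy, bearing, n=500,
                   pos_sigma=0.5, bearing_sigma=0.05):
    """Initialize localization particles from a single database match.

    lm_xy       -- (x, y) of the matched landmark in the map
    obs_pose_xy -- (x, y) robot pose stored with the matched descriptor
    bearing     -- body-frame bearing angle under which the landmark is seen
    Particles are spread around the stored pose; each particle's heading is
    set so that the landmark lies at the observed bearing, which removes one
    of the three unconstrained degrees of freedom.
    """
    xy = np.random.normal(obs_pose_xy, pos_sigma, size=(n, 2))
    # heading = map-frame direction to the landmark minus the observed bearing
    to_lm = np.arctan2(lm_xy[1] - xy[:, 1], lm_xy[0] - xy[:, 0])
    theta = to_lm - bearing + np.random.normal(0.0, bearing_sigma, n)
    return np.column_stack([xy, theta])
```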
8 Experimental Evaluation

The experimental evaluation has been carried out on a PowerBot platform from ActivMedia. It has a differential drive base with two rear caster wheels. The camera used in the experiments is a Canon VC-C4 CCD camera. This camera has built-in pan-tilt-zoom capabilities, but in the experiments these were all fixed, with a slight tilt upwards to reduce the amount of floor in the image. The non-holonomic motion constraints of the base make it hard to generate large baselines for triangulation, as the motion is mostly along the optical axis. This, in combination with the relatively narrow field of view², contributes to the difficulty of the problem.
Fig. 4. The experimental PowerBot platform with the Canon VC-C4 camera mounted on the laser scanner.
The experimental evaluation is carried out in two steps. In the first step we let the robot build up a database of visual landmarks while feeding them into SLAM to build a map with the 3D positions of these landmarks. This test will show that i) our framework produces few but stable landmarks well suited for map building, and ii) we can match new observations to the database when closing the loop.
² The field of view of the Canon VC-C4 is about 45° in the horizontal plane and 35° vertically when the camera is in “wide-angle mode” as in our experiments.
Exploiting Distinguishable Image Features
153
In the second step the robot is given the task to find its position in this map, given the database and the map. This test will underscore the ability to match observations to the database without false matches and highlight the strength our representation provides to a localization system. The setting for the experiments is an area around an atrium that consists of loops of varying sizes.

In the map building experiment we let the robot drive 3 laps around a part of the environment, each lap being approximately 30 m long. This trajectory is shown along with a map built using a laser scanner in Fig. 5 (dark “circular” path). The experiment took 8 minutes and 40 seconds and the time to process the data was 7 min 7 s on a 1.8 GHz laptop computer, which shows that the system can run in real-time even if all unmatched features are matched to the database in every processed frame. After the first loop the database/map consisted of 98 landmarks. The landmarks are shown as red/dark squares in Fig. 5. After the 3 loops were completed the map consisted of 113 landmarks. This shows that the robot was successfully matching most of the observations to the database during the last two laps. Otherwise one would expect the map to be roughly 3 times the size after the first loop.

There are two important characteristics to note about the map with 3D visual landmarks. Firstly, there are far fewer features than is typically seen in other works where SIFT-like features are used in the map [17, 16]. This can be attributed to using only the most stable point features as SLAM landmarks and the rest for recognition/matching of those landmarks. Real-time performance was not demonstrated in [17, 16].
Secondly, the landmarks are well localized, which can be seen by comparing with the walls from the superimposed map built using the laser scanner³. Some of the landmarks are lamps hanging from the ceiling, such as the one at the upper right corner of the robot path in Fig. 5. This lamp is also visible in the upper image on the right side of the same figure. The area to the right in the map with a large number of landmarks is shown in the lower right image in Fig. 5. This area has many objects at differing depths. In the back against the walls there are five heavily textured paintings. The line in the laser-based map comes from the benches that are in front of the wall, which accounts for the line of visual landmarks behind the line made using the laser. Part of the spread is also due to uncertainty in the depths of the landmarks. The robot moved close to orthogonal to the wall, creating very little baseline in the data fed into SLAM. It is worthwhile repeating that the depth estimate from the frame memory only serves to get the initialization of the depth roughly right to reduce the linearization errors, but the real estimate of the 3D position is calculated through the SLAM process using the bearing-only measurements.

Depending on the scene, a typical frame has between 40–100 point features. The time to perform the tracking over frames has constant complexity. Out of
³ Note that the laser-based map is only shown for reference. Only vision was used in the experiments.
Fig. 5. The SLAM map is shown as red squares at the locations of landmarks. To help visualize the environment, a separately made laser scanner map is superimposed on this map. Also shown are the trajectories from when the map was built (3 loops) and when the robot was trying to localize. The feature match between the evening database image (top) and the daytime localization experiment (middle) is indicated. The bottom image corresponds to the area of the map where the density of landmarks is greatest.
the features in each new frame, as many as half typically do not match any of the old features in the frame memory and are thus matched to the database. The matching to the database uses the KD-tree in the first step, which makes this step fast and often leaves only a few possible matching candidates. A typical landmark in the database has around 10 descriptors acquired from different viewing angles. As we noted while building this map, the landmarks inserted into the database during the first loop were matched and updated on the two subsequent loops with no difficulty. In order to demonstrate the usefulness of this result we then used the map and database in the second experiment to localize the robot at a different time of day, when there was sunlight streaming through the windows and students milling about. When performing localization we do not need to estimate the 3D location of the points, and thus the output from the frame memory is shifted to be at the input side, i.e., without the delay. The
frame memory is still used to track the points over time, but not to verify their quality, which is most important when building the map. The robot was given no initial pose information relative to the database and map. In Fig. 5 the path of the robot in this second experiment is overlaid. The robot initially moves around in the lower left corner of the figure before it moves to the right for some time and then eventually moves up into regions that were mapped in the first experiment. The robot moves in unmapped areas for almost 6 minutes. During this time it is constantly trying to find matches between unmatched features in the image and the database. More than 1,000 images are added to the frame memory and matched without any false matches to the database. When the robot entered an area where it could observe landmarks from the database/map it almost immediately recognized a landmark, and a particle filter consistent with the observation was initialized. In seconds, after repeated observations of several landmarks, the particles had converged enough to initialize the EKF localization. The evolution of the particle filter from initialization to convergence is shown in Fig. 6 in four frames from left to right. In the last frame in this figure, the precision of the localization is indicated by the alignment of the laser scan (not used for localization, only for comparison) and the walls in the map.
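The two-stage database matching described above (KD-tree pre-filtering followed by a decision among the few surviving candidates) can be illustrated with a short sketch. Here SciPy's `cKDTree` stands in for the approximate best-bin-first search of [1], and a nearest/second-nearest distance-ratio test is used as a common way to reject ambiguous matches; both choices are ours, not necessarily the paper's:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_to_database(descriptors, db_descriptors, ratio=0.8):
    """Match query descriptors against the landmark database.

    Returns a list of (query_index, db_index) pairs that pass the
    nearest/second-nearest distance-ratio test.
    """
    tree = cKDTree(db_descriptors)
    dist, idx = tree.query(descriptors, k=2)   # two nearest neighbours each
    matches = []
    for qi, (d, i) in enumerate(zip(dist, idx)):
        if d[0] < ratio * d[1]:                # keep unambiguous matches only
            matches.append((qi, int(i[0])))
    return matches
```

The tree query dominates the cost, so matching stays fast even when roughly half the features in every frame are checked against the database.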
Fig. 6. Here we show the evolution of the localization particle filter. On the left we see the initial distribution after the first match to the database. On the right the robot is shown after initialization in the map. The laser scan is also shown in the right image to help confirm that the localization is indeed correct.
The problems of a narrow field of view for an ordinary camera were overcome in these experiments partly by carefully driving the robot. Better results were observed when the robot was driven so that features remained in view for as long as possible (notice the rather crooked paths in Fig. 5).
9 Conclusion and Future Work

The contributions of the framework presented here are the feature selection and matching that allow for real-time vision-based bearing-only SLAM. We distinguish between location and recognition features. The location features correspond to points in 3D for which the robot motion allows good triangulation and which are used as landmarks in the map. The matching is made robust by the inclusion of many recognition features from the image for each location feature.

The use of Harris-Laplace corner detection combined with a scale space maximization gives rotationally variant features which are more appropriate for the camera motions generated by a camera mounted on a mobile robot.

We use three criteria to select features for SLAM: persistence, range estimate quality and image stability. The persistence criterion eliminates spurious and dynamic features by requiring that the feature be observed many times. The quality of the range estimate depends on the motion of the robot relative to the 3D point. If the motion does not produce a sufficient baseline there is no reason to use the associated vision feature. The stability in the image depends on how well the feature is localized in the image.

The evaluation of our vision-based landmarks was done on data collected from a robot moving through an indoor environment. We observed no false matches and the creation of accurate 3D maps. We also showed that those maps could be used to localize the robot automatically in the environment.

One topic for future research is to actively control the pan-tilt degrees of freedom of the camera on the robot. This would allow the robot to focus on a landmark for a longer time and create a better baseline and thus a more accurate map. Another way to address the problem with the limited field of view is to use an omnidirectional camera, which is also part of the planned future work.
References

1. J. S. Beis and D. G. Lowe. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In IEEE CVPR, pages 1000–1006, 1997.
2. J. A. Castellanos and J. D. Tardós. Mobile Robot Localization and Map Building: A Multisensor Fusion Approach. Kluwer Academic Publishers, 1999.
3. A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In ICCV, 2003.
4. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba. A solution to the simultaneous localization and map building (SLAM) problem. IEEE TRA, 17(3):229–241, 2001.
5. J. Folkesson, P. Jensfelt, and H. I. Christensen. Vision SLAM in the measurement subspace. In IEEE ICRA05, 2005.
6. L. Goncalves, E. di Bernardo, D. Benson, M. Svedman, J. Ostrowski, N. Karlsson, and P. Pirjanian. A visual front-end for simultaneous localization and mapping. In IEEE ICRA, pages 44–49, 2005.
7. J. Gutmann and K. Konolige. Incremental mapping of large cyclic environments. In IEEE Int. Symposium on Computational Intelligence in Robotics and Automation, volume 1, pages 318–325, 1999.
8. Patric Jensfelt, Danica Kragic, John Folkesson, and Mårten Björkman. A framework for vision based bearing only 3D SLAM. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA'06), Orlando, FL, May 2006.
9. N. M. Kwok, G. Dissanayake, and Q. P. Ha. Bearing only SLAM using a SPRT based gaussian sum filter. In IEEE ICRA05, 2005.
10. T. Lemaire, S. Lacroix, and J. Solà. A practical 3D bearing-only SLAM algorithm. In IEEE/RSJ IROS, pages 2757–2762, 2005.
11. David G. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–57, 1999.
12. R. H. Luke, J. M. Keller, M. Skubic, and S. Senger. Acquiring and maintaining abstract landmark chunks for cognitive robot navigation. In IEEE/RSJ IROS, 2005.
13. K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In IEEE ICCV, pages 525–531, 2001.
14. K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In IEEE CVPR, pages 257–263, 2003.
15. P. Newman and K. Ho. SLAM-loop closing with visually salient features. In IEEE ICRA, pages 644–651, 2005.
16. S. Se, D. G. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. IJRR, 21(8):735–58, 2002.
17. R. Sim, P. Elinas, M. Griffin, and J. J. Little. Vision-based SLAM using the Rao-Blackwellised particle filter. In IJCAI Workshop on Reasoning with Uncertainty in Robotics, July 2005.
18. J. D. Tardós, J. Neira, P. M. Newman, and J. J. Leonard. Robust mapping and localization in indoor environments using sonar data. IJRR, 4, 2002.
19. S. Thrun, D. Fox, and W. Burgard. A probabilistic approach to concurrent mapping and localization for mobile robots. Autonomous Robots, 5:253–271, 1998.
20. S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte. SLAM with sparse extended information filters. IJRR, 23(8):690–717, 2004.
Image-Based Visual Servoing for Mobile Robots with Catadioptric Camera

Gian Luca Mariottini, Domenico Prattichizzo, and Andrea Cerbella

Dipartimento di Ingegneria dell'Informazione, Università di Siena, Via Roma 56, 53100 Siena, Italy. {gmariottini,prattichizzo}@ing.unisi.it

Summary. This paper presents an image-based visual servoing strategy for holonomic mobile robots equipped with a catadioptric camera. This kind of vision sensor combines lenses and mirrors to enlarge the field of view. The proposed visual servoing is based on the auto-epipolar property, a special configuration which occurs when the desired and the current views undergo a pure translation. This occurrence is detectable from a set of image points, observing when the so-called bi-conics have a common intersection. The proposed visual servoing is divided into two sequential steps. A Lyapunov-based stability analysis demonstrates the parametric robustness of the proposed method. Experimental results are presented to show the applicability of our strategy in a real context.
1 Introduction

In this paper we propose an image-based visual servoing (IBVS) strategy for the autonomous navigation of a holonomic mobile robot. Both the current and the desired robot configurations are only specified through a set of feature points acquired by the on-board omnidirectional camera. In contrast to position-based visual servoing (PBVS), in IBVS both the control objective and the control laws are directly expressed in the image domain. As a consequence, such schemes do not need any a priori knowledge of the 3D structure of the observed scene points. Moreover, designing the feedback at the sensor level improves the system performance, especially when uncertainties and disturbances affect the model of the robot and the camera calibration [8]. However, system convergence can typically be guaranteed only in a neighborhood of the desired configuration. In [10] a hybrid (2-1/2 D) visual servoing approach for articulated manipulators has been presented. It combines IBVS and PBVS features and guarantees convergence in the whole task space. In particular, camera rotation and translation between the current and the desired views are directly estimated from the observation of planar objects and then used in the control law.
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 159–170, 2006. © Springer-Verlag Berlin Heidelberg 2006
160
G.L. Mariottini, D. Prattichizzo, and A. Carbella
While most of these works deal with articulated manipulators, there has been an increasing interest in the PBVS control of mobile robots, as in [16, 7, 2]. One of the main problems in visual servoing is that of keeping the features in the camera field of view (FOV). This problem is naturally solved if a catadioptric camera is used in the servo loop, since it provides a panoramic FOV. Catadioptric cameras consist of a coupling between mirrors (catoptric elements) and conventional cameras (dioptric elements) [13]. Some PBVS strategies using catadioptric cameras have been presented [4, 3, 17].

In recent years there has been a great interest in vision-based control with central catadioptric cameras. However, some of these works assume the camera-feature distance (in meters) to be known [12, 1]. The original contribution of this work is the design of an IBVS algorithm for holonomic mobile robots with an on-board catadioptric camera that needs neither any 3-D metrical information about the scene nor any multiple-view estimation process.

Our control algorithm is based on the execution of two sequential steps. The first one compensates the orientation error, i.e., aligns the robot to the desired configuration by exploiting the auto-epipolar property for panoramic cameras (see Sect. 2). The second step zeroes the translational displacement using feature points to drive the robot to the target.

The paper is organized as follows. In Section 2, the auto-epipolar property is presented. Section 3 introduces the image-based control law. Section 4 describes some experimental results. Conclusions are presented in Section 5.
2 Auto-epipolar Configurations

This section presents the auto-epipolar property for panoramic cameras. The auto-epipolar property [6] was first applied by [15] for vision-based control with pinhole cameras.

Consider the case in Fig. 1(a) with a catadioptric camera made of a parabolic mirror centered at f and an orthographic camera. By construction, every scene point p ∈ IR³ is projected onto the mirror surface at x ∈ IR³ through the focus f. The image point m (pixels) is then obtained via orthographic projection of x. For a more detailed description of the imaging model for catadioptric cameras the reader can refer to [5]. Consider now two panoramic views as in Fig. 1(b), acquired in fc and fd by the same camera and referred to as {c} current and {d} desired, respectively. Without loss of generality, we choose the world reference frame centered at the desired camera frame in fd. This allows us to express all image entities directly in the mirror reference frame on the mirror surface, if not otherwise specified. Given a pair of views of a scene point pi, there exists a matrix E ∈ IR3×3, called essential matrix [6], such that

xdi^T E xci = 0,  where E = t̂R,  i = 1, ..., N    (1)
IBVS for Mobile Robots with Catadioptric Camera
161
Fig. 1. (a) Imaging model of central catadioptric camera. (b) Basic epipolar geometry setup for central catadioptric cameras.
for all N corresponding mirror points xci and xdi obtained via back-projection of the corresponding image points mci and mdi. t̂ is the skew-symmetric matrix obtained from t. It is an easy matter to show that when R = I, i.e., for pure translations, the essential matrix is skew symmetric, i.e., E = t̂. Other configurations exist (such as rotations of π around t) providing a skew-symmetric essential matrix, as pointed out in [9], but they will not occur in our planar motion case.

Definition 1 (Auto-epipolar configurations). The relative configurations between two cameras yielding a skew-symmetric essential matrix are referred to as auto-epipolar configurations.

Remark 1. We now define the new concept of bi-conic, which will allow us to detect the auto-epipolar configurations for catadioptric cameras directly in the image plane and without any estimation.

Definition 2 (Bi-conics). Consider two projections of the same scene point p onto two corresponding image points md and mc. Overlap the two images. Let xd and xc be their back-projections to their respective mirror surfaces. The bi-conic is the intersection between the plane ψ (with normal vector n = xd × xc) and the desired camera mirror¹ (see Fig. 2).

Proposition 1 (Common intersection line). Consider a set of N corresponding points mdi and mci, i = 1, ..., N, and let xdi and xci be their back-projections onto the mirror. Consider the matrix M ∈ IRN×3 containing all the normal vectors ni = xdi × xci of the planes ψi, namely M = [n1, ..., nN]^T. If the matrix M is rank 2, then all planes ψi intersect at a common line ℓ ∈ IR³ or, equivalently, xdi^T ℓ̂ xci = 0, ∀ i.
In other terms, ℓ is perpendicular to ni, ∀ i.

¹ Note that xc and xd belong to the bi-conic.
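Definition 1 can be checked numerically: for a pure translation E = t̂ is skew symmetric, so the relative norm of E + E^T vanishes. A minimal sketch (the function names are ours, not the paper's):

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix t_hat such that skew(t) @ x == np.cross(t, x)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def skewness(E):
    """Relative deviation of an essential matrix E from skew symmetry.
    Zero exactly in an auto-epipolar configuration (Definition 1)."""
    return np.linalg.norm(E + E.T) / np.linalg.norm(E)
```

For instance, `skewness(skew(t))` is numerically zero (pure translation), while `skewness(skew(t) @ R)` is nonzero for a generic in-plane rotation R.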
Fig. 2. Construction of a bi-conic through the superimposition of xc and xd . xc and xd define a plane with normal n that intersects the mirror at the bi-conic.
Proof. If M is rank 2, then there exists a right null vector ℓ such that Mℓ = 0, namely (xdi × xci)^T ℓ = 0 ∀ i. This means that each plane ψi contains the vector ℓ, and since all planes pass through the origin of the desired mirror frame, there exists a unique common intersection line between all planes. The last equivalence comes directly from the triple scalar product property: the expression (xdi × xci)^T ℓ = 0 is equivalent to xdi^T ℓ̂ xci = 0 ∀ i.

In conclusion, if N different planes ψi all share a common intersection line, then the bi-conics intersect at two common points. The relationship between bi-conics and auto-epipolar configurations is stated in the following theorem.

Theorem 1. Consider N corresponding point pairs (mdi, mci) in two views and let xdi and xci be their back-projections onto the mirror surfaces. Assume that the matrix A = [a1, a2, ..., aN]^T is of rank 8, where ai = (xci ⊗ xdi)^T, the symbol ⊗ being the Kronecker product and the resulting vector ai ∈ IR9×1. The N planes ψi with normal ni = xdi × xci all intersect at only one line if and only if the essential matrix E is skew symmetric.

Proof. (⇒) If all planes ψi with normal vector ni = xdi × xci intersect at a single line, then the matrix M = [n1, ..., nN]^T is rank 2. This is equivalent to saying that there exists a subspace S of dimension one (i.e., a line) that belongs to all planes. In other terms,

∃ ℓ ∈ S : (ni)^T ℓ = 0  ∀ i = 1, 2, ..., N.
Fig. 3. The visual servoing algorithm steers the holonomic camera-robot from the starting configuration centered at fc (no common intersection between bi-conics exists), through an intermediate configuration when both views have the same orientation (common intersection exists), towards the position fd (s(t) = 0).
From the triple scalar product property, this is equivalent to xdi^T ℓ̂ xci = 0 ∀ i. E being unique under the full rank hypothesis on A [9], then E = ℓ̂, i.e., E is skew symmetric.

(⇐) Being rank(E) = 2, then ∃ w ≠ 0 s.t. Ew = 0. If E is skew symmetric, then we distinguish two cases [9]: E1 = t̂ and E2 = t̂ e^(ûπ) (where u = t/||t||). If we consider E1 then the epipolar constraint reads as:

xdi^T E1 xci = (xdi × xci)^T t = 0  ∀ i = 1, ..., N.    (2)
It is an easy matter to show that E2 = −t̂, and then the same epipolar constraint of (2) is verified also for E2. Equation (2) is equivalent to imposing that all the planes ψi with normal vector ni intersect at a common line t, i.e., the baseline.

Fig. 3 (bottom-left) shows the image plane bi-conics when the planes ψi do not share a common intersection line, i.e., the cameras are not in auto-epipolar configuration (Fig. 3 (upper-left)). On the other hand, Fig. 3 (bottom-center) shows the image plane bi-conics when all planes ψi have a common intersection line and the cameras are in auto-epipolar configuration (Fig. 3 (upper-center)). The previous theorem and proposition provide an important tool for recognizing when the cameras are in auto-epipolar configuration. It is worthwhile to mention that to build the matrix M we only need to know the back-projected mirror points xci and xdi, which can be retrieved from the image points
Fig. 4. Singular value ratio ξ = σ3²/σ2 plotted against the current camera orientation θ. ξ turned out to be convex in many simulation results.
mci and mdi without any time-consuming estimation process. It is paramount to note that in the case of camera planar motion, the auto-epipolar condition can be checked when at least N = 3 corresponding points are observed [9], also in the case of coplanar 3D scene points.

To test if M is rank deficient we exploit the smallest singular value σ3 obtained by means of the SVD of M. As the displacement approaches a pure translation, the smallest singular value approaches zero. In practical cases, however, we experimentally verified that better results are obtained if the smallest singular value is squared and normalized with respect to the second singular value σ2:

ξ = σ3²/σ2.    (3)

If the two cameras are purely translated then ξ is exactly zero, as shown in Fig. 4.

Remark 2. The function ξ will be used as a control variable for the rotational part of the proposed visual servoing. Note that the derivative of ξ is an odd function in a very large neighborhood of θ = 0° (Fig. 4).
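Both Proposition 1's rank test and the control variable (3) reduce to a single SVD of M; a minimal numerical sketch, with function names of our choosing:

```python
import numpy as np

def biconic_svd(xd, xc):
    """SVD of M = [n_1, ..., n_N]^T with n_i = xd_i x xc_i (Proposition 1).
    xd, xc are (N, 3) arrays of corresponding back-projected mirror points."""
    M = np.cross(xd, xc)           # one plane normal per corresponding pair
    return np.linalg.svd(M)

def xi(xd, xc):
    """Control variable of Eq. (3): xi = sigma_3^2 / sigma_2.
    Zero exactly when the two cameras are purely translated."""
    _, s, _ = biconic_svd(xd, xc)
    return s[2] ** 2 / s[1]

def common_line(xd, xc, tol=1e-9):
    """If M has rank 2, return the common intersection line l
    (right null vector of M, Proposition 1); otherwise return None."""
    _, s, Vt = biconic_svd(xd, xc)
    return Vt[2] if s[2] <= tol * s[0] else None
```

For a pure translation t the normals satisfy n_i · t = 0, so `common_line` returns (up to sign) the baseline direction and `xi` is numerically zero.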
3 The Image-Based Control Law

In this section we present an image-based visual servoing control law for holonomic mobile robots equipped with a catadioptric camera. This control law (1st step) is based on the auto-epipolar property for panoramic cameras presented in the previous section.

Let q(t) = [x(t) y(t) θ(t)]^T be the robot configuration vector with respect to the world frame ⟨Oxyz⟩w. The holonomic robot kinematic model can be written as
ẋ = vx,  ẏ = vy,  θ̇ = ω    (4)
where v = [vx vy]^T is the translational velocity and ω is the rotational velocity. Suppose that a catadioptric camera is fixed on the mobile robot such that its optical axis is perpendicular to the xy-plane.

We suppose that a target image, referred to as desired, has been previously acquired in the desired configuration qd = [0 0 0]^T. The current image is available at each time instant from the camera-robot in the current position. As shown in the initial setup of Fig. 3 (upper-left), the robot disparity between the current and the desired poses is characterized by a rotation of an angle θ and a translation t ∈ IR². The control law will be able to drive this disparity to zero using only visual information (i.e., θ → 0 and ||t|| → 0). In particular, we will regulate separately the rotational disparity (first step) and the translational displacement (second step).

3.1 First Step: Rotation Compensation

The goal of the first step is to drive the current camera-robot to have the same orientation as the desired one. This condition is simply detectable when ξ = 0 in (3). During this step, the translational velocities are set to zero, vx = vy = 0, while the angular velocity is set to

ω = −αω σs(t) ξ    (5)

where αω is a positive scalar, σs(t) = sign(ξ̇), and ξ is the control variable introduced in (3). The local asymptotic convergence of this control law has been proven in [11] by means of the Lyapunov function V = ½ θ². Although the convergence is local, the domain of attraction for θ is wide. Many simulative and experimental results showed that convergence is guaranteed with respect to θ in a very large set of angles ([−150°, 150°]) around the equilibrium θ = 0°.

3.2 Second Step: Translation Compensation

Let us now address the second step, which will start when ||ξ(t)|| = 0, i.e., when only a pure translation is needed to reach the target configuration. Actually, in the experiments, a user-defined threshold ε is considered. During this translational motion to the desired position, all bi-conics Cdi intersect and the corresponding feature points mci will slide onto the conic Cdi toward mdi. The translation will stop when all features match, i.e., when s(t) = Σi si(t) is equal to zero, si(t) being the arc-length of the conic Cdi from mci(t) to mdi(t) (see Fig. 3 bottom-right). The direction of translation to the desired configuration fd can be retrieved from the knowledge of the epipole cd pointing
toward fc, chosen as the intersection between bi-conics. In [11] it has been shown that

v = −s(t) [cdx cdy]^T    (6)

guarantees that the robot converges to the target configuration.
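The two control laws (5) and (6) are simple enough to state directly in code. The sketch below is ours: the gain, the function signatures, and the way the inputs are packaged are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rotation_control(xi_t, xi_dot, alpha=0.5):
    """First step, Eq. (5): omega = -alpha * sign(xi_dot) * xi.
    xi_t is the current value of the control variable (3) and xi_dot a
    (possibly numerically differentiated) estimate of its time derivative."""
    return -alpha * np.sign(xi_dot) * xi_t

def translation_control(arc_lengths, epipole):
    """Second step, Eq. (6): v = -s(t) [c_dx, c_dy]^T, where s(t) is the
    sum of the bi-conic arc lengths s_i(t) from the current to the desired
    feature points, and the epipole c_d (the common bi-conic intersection)
    gives the direction toward f_c."""
    s = float(np.sum(arc_lengths))
    return -s * np.asarray(epipole[:2], dtype=float)
```

As the features converge (s(t) → 0), the commanded velocity decays smoothly to zero, which matches the behavior reported in the experiments.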
4 Experiments

In order to validate the proposed image-based visual servoing, a series of experiments have been carried out at the SIena Robotics and Systems lab (SIRSlab). The experimental setup consists of a NOMAD XR4000 holonomic robot with a catadioptric camera mounted on it. The catadioptric camera is made of a CCD color camera LU175C by Lumenera screwed on a folded catadioptric mirror NetVision360 by Remote Reality. The whole catadioptric camera can be modeled by the equation zm + a/2 = (xm² + ym²)/(2a) (Fig. 1(a)) where a = 33.4 mm. The orthographic camera parameters are fx = 13 mm, fy = 14 mm, u0 = 616.3 and v0 = 628.2 pixels. All image acquisition and processing algorithms are realized using Intel's OpenCV libraries on a notebook endowed with a 2 GHz Pentium 4 processor.

A set of n = 3 feature points has been selected in the image in order to show the applicability of our technique with the theoretically minimum number of points. This implies that we have three bi-conics on the image plane, as shown in Fig. 5. The initial configuration of the camera-robot is set to qini = [1 m 1 m π/4 rad] while the desired one is set to qdes = [0 0 0], as shown in Fig. 5(a), where the desired and the initial robot configurations are superimposed. In the initial configuration, the orientation being nonzero, the bi-conics do not intersect at the same two points (Fig. 5(b)). First, the rotational controller of Sect. 3.1, based on the auto-epipolar property, is used to compensate the rotation (see Fig. 5(c)) with respect to the desired configuration, until all bi-conics intersect, as in Fig. 5(d). Note that in the real experiment, notwithstanding image noise and unmodeled dynamics, the variable ξ is very close to zero at the end of the first step, as shown in Fig. 6(a). After the first step, the translation discussed in Sect. 3.2 is performed. The robot moves along fc fd and its velocity decreases as the distance si(t) gets close to zero (see Fig. 6(b)).
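Back-projecting an image point onto the mirror surface of the model above amounts to evaluating the parabola equation; a sketch, assuming the pixel scaling (fx, fy, u0, v0) has already been removed so that the point is expressed in mirror units:

```python
import numpy as np

def lift_to_mirror(m, a=33.4):
    """Back-project an orthographic image point m = (x, y), in mirror units,
    onto the parabolic mirror z_m + a/2 = (x_m^2 + y_m^2) / (2a).
    With this parametrization the mirror focus sits at the origin."""
    x, y = m
    z = (x * x + y * y) / (2.0 * a) - a / 2.0
    return np.array([x, y, z])
```

A sanity check of the focus-at-origin property: for any lifted point p, its distance to the origin equals p_z + a, the parabola's directrix relation.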
The overall performance of the visual servoing strategy was interesting. In fact, the final configuration differs from the desired one by only 1.96 cm, corresponding to an average distance of about 5 pixels between final and target feature points. The complete motion of the feature points is shown in Fig. 7. Note that the trajectory discontinuity of the features is due to the change of the control law between the first and the second step.

Note also that the perfect bi-conic intersection (i.e., equal orientation to the desired position) is lost during the second step (compare Fig. 5(f) with
Fig. 5. Image-based visual servoing: two-step algorithm. (a) Initial and desired robot positions; (b) When a rotational disparity occurs the bi-conics do not all intersect; (c) Rotation compensation after the 1st step; (d) After the 1st step all bi-conics intersect at the same two points; (e) Translational motion to the target position; (f) Current feature points (dot) reach the desired ones (cross).
Fig. 5(d)). The main reason for this error is the slippage of the wheels during the pure translation.

In order to improve performance and to increase robustness with respect to wheel slippage and other unmodeled dynamics and errors, we implemented an iterative procedure as in [14] to robustly perform the proposed visual servoing. Experiments have been executed also with this iterative implementation and more details are in [11]. Globally, the iterated two-step algorithm exhibits good performance for the robot motion, and the positioning error between the final and the desired camera-robot configurations decreases from 2 cm to 0.5 cm (iterative).
Fig. 6. Visual servoing (2-steps). (a) First step: Parameter ξ(t) decreases (rotation compensation); (b) Second step: Distance s(t) between corresponding feature points measured along bi-conics decreases.
Fig. 7. Visual servoing. Motion of the feature points in the image plane from the initial (circle) to the desired image (cross).
5 Conclusions and Future Work

In this paper we presented an image-based visual servoing for holonomic mobile robots equipped with a catadioptric camera. This vision sensor combines
lenses and mirrors to enlarge the field of view. The main feature of our technique consists in exploiting the auto-epipolar property, a special configuration which occurs when the desired and the current views undergo a pure translation. This condition can be detected directly in the image plane, without requiring any estimation, and it has been used to compensate the rotational disparity between the initial and the desired image (1st step). Then a feature-based translational step (2nd step) is needed to drive the robot to the target position. Experimental results on a real robot validate the applicability of our image-based strategy. A current hypothesis of this work is that the camera calibration parameters are known. Future work will deal with the extension to uncalibrated catadioptric cameras.
References

1. J.P. Barreto, F. Martin, and R. Horaud. Visual servoing/tracking using central catadioptric images. In 8th International Symposium on Experimental Robotics, pages 863–869, 2002.
2. F. Conticelli, B. Allotta, and P. K. Khosla. Image-based visual servoing of nonholonomic mobile robots. In 38th IEEE Conference on Decision and Control, 1999.
3. N. Cowan, O. Shakernia, R. Vidal, and S. Sastry. Vision-based follow the leader. In IEEE Intelligent Robots and Systems, pages 1797–1801, 2003.
4. J. Gaspar, N. Winters, and J. Santos-Victor. Vision-based navigation and environmental representations with an omnidirectional camera. IEEE Transactions on Robotics and Automation, 16:890–898, 2000.
5. C. Geyer and K. Daniilidis. A unifying theory for central panoramic systems. In 6th European Conference on Computer Vision, pages 445–462, 2000.
6. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
7. K. Hashimoto and T. Noritsugu. Visual servoing of nonholonomic cart. In IEEE International Conference on Robotics and Automation, pages 1719–1724, 1997.
8. S. A. Hutchinson, G. D. Hager, and P. I. Corke. A tutorial on visual servo control. IEEE Transactions on Robotics and Automation, 12(5):651–670, 1996.
9. Y. Ma, S. Soatto, J. Košecká, and S. Shankar Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer Verlag, New York, 2003.
10. E. Malis, F. Chaumette, and S. Boudet. 2-1/2-D visual servoing. IEEE Transactions on Robotics and Automation, 15(2):238–250, 1999.
11. G.L. Mariottini and D. Prattichizzo. Image-based visual servoing for holonomic robots with central catadioptric camera. Internal Report, http://www.dii.unisi.it/~sirslab/MaPr_IR05.pdf, September 2005. University of Siena.
12. Y. Mezouar, H. Haj Abdelkader, P. Martinet, and F. Chaumette. Central catadioptric visual servoing from 3d straight lines. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 343–349, 2004.
13. S.K. Nayar. Catadioptric omnidirectional camera. In International Conference on Computer Vision and Pattern Recognition, pages 482–488, 1997.
14. G. Oriolo and M. Vendittelli. A framework for the stabilization of general nonholonomic systems with an application to the plate-ball mechanism. IEEE Transactions on Robotics, 21(2):162–175, 2005.
15. J. Piazzi, N. Cowan, and D. Prattichizzo. Auto-epipolar visual servoing. In Proc. IEEE Conference on Intelligent Robots and Systems, 2004.
16. R. Pissard-Gibollet and P. Rives. Applying visual servoing techniques to control a mobile hand-eye system. In IEEE Transactions on Robotics and Automation, pages 166–171, 1995.
17. K. Usher, P. Ridley, and P. Corke. Visual servoing for a car-like vehicle - An application of omnidirectional vision. In 2003 IEEE International Conference on Robotics and Automation, pages 4288–4293, 2003.
Multi-knowledge Approach for Mobile Robot Identification of a Changing Environment

Qing Xiang Wu¹², T.M. McGinnity¹, Girijesh Prasad¹, David Bell², and Brendan Glackin¹

¹ School of Computing and Intelligent Systems, University of Ulster at Magee, Londonderry, BT48 7JL, N. Ireland, UK, [email protected]
² Department of Computer Science, Queens University, Belfast, UK, [email protected]
Summary. In this paper, a large environment is divided into sub-areas to enable a robot to apply precise localization technology efficiently in real time. Sub-area features are represented in a feature information system so that conventional machine learning or data mining approaches can be applied to identify the sub-areas. However, conventional representations with a single body of knowledge encounter many problems when the sub-area features are changed. In order to deal with changing environments, the multi-knowledge approach is applied to the identification of environments. Multi-knowledge is extracted from a feature information system by means of multiple reducts (feature sets) so that a robot with multi-knowledge is capable of identifying an environment with some changing features. A case-study demonstrates that a robot with multi-knowledge can cope better with the identification of an environment with changing features than a conventional single body of knowledge.
1 Introduction

Current robot localization techniques include position tracking techniques [1, 2, 3], global localization techniques [4], Markov localization [5, 6, 7], landmark localization [8, 9, 10], and visual-based localization [11, 12, 13, 14]. These techniques aim to estimate the position of a robot within a given environment by means of sensor data and a map of the environment. In order to estimate a robot's position with high precision, a fine-grained discretization of the state space is usually applied. The spatial resolution is usually between 5 cm and 50 cm and the angular resolution is usually 2 to 5 degrees. For a medium-sized environment of 40 x 40 m², an angular grid resolution of 2°, and a cell size of 10 x 10 cm², the state space consists of 28,800,000 states. If a map of an environment is very large, localization techniques incur a high computational cost and are very difficult to use in real time. In this paper, a strategy is proposed to address this problem. A large map may be divided into area
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 171–180, 2006. © Springer-Verlag Berlin Heidelberg 2006
Q. Wu et al.
maps, and a robot can apply conventional machine learning approaches to recognize area environment features, for example, in a building, in a room, or in a car park. Therefore, the robot carries out a precise localization in a small area using specific information on the area environment such as an electronic map. However, a conventional single body of knowledge encounters problems in a changing environment. A novel multi-knowledge approach is proposed to solve this problem. In Section 2, an environment feature decision system is used to divide a large map into sub-maps so that conventional machine learning [15] and data mining methods [16, 17] can be applied to identify the environment. The problems with conventional approaches are analyzed in Section 3. In Section 4 the multi-knowledge representation is defined and it is demonstrated that a robot with multi-knowledge has the ability to identify its environment even if some features of the environment have changed. The conclusion is given in Section 5.
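The state-space count quoted in the introduction is easy to verify; the snippet below simply redoes the arithmetic for the 40 x 40 m² map with 10 x 10 cm² cells and a 2° angular grid.

```python
# Verify the state count quoted above: a 40 x 40 m^2 map with
# 10 x 10 cm^2 cells and a 2-degree angular grid.
cells_per_meter = 10                  # 10 cm cells
cells_x = 40 * cells_per_meter        # 400 position cells along x
cells_y = 40 * cells_per_meter        # 400 position cells along y
orientations = 360 // 2               # 180 angular cells of 2 degrees each

states = cells_x * cells_y * orientations
print(states)  # 28800000
```

This confirms the figure of 28,800,000 states and makes clear why dividing the map into sub-areas is attractive.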
2 Representation of Environment Features 2.1 Environment Features In order to illustrate the application of multi-knowledge to robot identification of a changing environment, suppose that there is a robot in a building with six rooms shown in Fig. 1. Rooms from No.1 to No.3 are offices. The area marked 4 is the corridor and rooms No.5 and No.6 are laboratories. Features of the environment are selected from those that can be detected by the sensors in the robot. In this case, the features of the environment are Area, GroundColour, WallColour, CeilingHeight, and Illumination. After the robot had gone through all the rooms in the building, the environment feature information system was obtained. The colours are obtained from a colour camera. Illumination is detected by photosensitive sensors. Distances are detected by distance detectors.
Fig. 1. A robot in a building with six rooms
Mobile Robot Identification of a Changing Environment
Fig. 2. A robot detects a room area
Area is calculated from the distance data. As shown in Fig. 2, let r(θ) represent the distance from the robot to a wall along the direction θ. The distance r(θ) can be detected by the distance detectors in the robot. The area can be calculated as follows:

s = (1/2) ∫₀^{2π} r²(θ) dθ    (1)
For simplicity, r(θ) is represented by a set of distances r_0, r_1, r_2, ..., r_N with angular interval Δθ = 2π/N. The area between r_n and r_{n+1} can be calculated by using the small triangle if Δθ is very small:

s_n = (1/2) r_n r_{n+1} Δθ    (2)

The room area can then be calculated by

s = Σ_{n=0}^{N-1} s_n = Σ_{n=0}^{N-1} (1/2) r_n r_{n+1} Δθ    (3)
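The discretized area formula above is straightforward to implement. The sketch below applies it to a simulated circular room, where every r_n equals the radius R, so the estimate should approach πR².

```python
import math

def room_area(ranges):
    """Approximate the enclosed area from N range readings taken at equal
    angular intervals: s = sum of 0.5 * r_n * r_{n+1} * dtheta, with
    r_N = r_0 closing the scan."""
    n = len(ranges)
    dtheta = 2 * math.pi / n
    closed = list(ranges) + [ranges[0]]  # enforce r_N = r_0
    return sum(0.5 * closed[i] * closed[i + 1] * dtheta for i in range(n))

# In a circular room of radius 3 m, the estimate approaches pi * 3^2 = 28.27 m^2.
ranges = [3.0] * 360
print(round(room_area(ranges), 2))  # 28.27
```

As the text notes, the approximation error shrinks with the angular interval Δθ, i.e. with the number of readings N.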
where r_0 = r_N = r(θ = 0). Note that the robot can detect the area with this formula at any point in the room. The formula is approximate; its error depends on the size of the angular interval Δθ. The other features can be detected by the corresponding sensors in the robot. An example environment information system is shown in Table 1.

2.2 Environment Feature Instance System

Following [17], let I = ⟨U, A ∪ D⟩ represent an instance system, where U = {u_1, u_2, ..., u_i, ..., u_|U|} is a finite non-empty set, called an instance space or universe, and where u_i is called an instance in U. A = {a_1, a_2, a_3, ..., a_i, ..., a_|A|}, also a finite non-empty set, is a set of attributes of the instances, where a_i
Table 1. Room Feature Decision System
S: Area, GC: GroundColor, WC: WallColor, CH: CeilingHeight, B: Illumination, R: Room

U  S   GC      WC      CH   B  R
1  9   yellow  yellow  2.6  1  1
2  12  yellow  white   2.6  2  2
3  9   blue    white   2.6  1  3
4  13  gray    white   2.8  2  4
5  15  gray    yellow  3.0  3  5
6  15  yellow  white   3.0  3  6
is an attribute of a given instance. D is a non-empty set of decision attributes, and A ∩ D = ∅. For every a ∈ A there is a domain, represented by V_a, and there is a mapping a(u): U → V_a from U into the domain V_a, where a(u) represents the value of attribute a of instance u and is a value in the set V_a:

V_a = a(U) = {a(u) : u ∈ U}  for a ∈ A    (4)
For a decision system, the domain of the decision attribute is represented by

V_d = d(U) = {d(u) : u ∈ U}  for d ∈ D    (5)
For example, Table 1 is regarded as an environment feature decision system in which the information was detected by a robot that has gone through all the rooms in the building shown in Fig. 1. The attribute set is A = {Area, GroundColor, WallColor, CeilingHeight, Illumination} and the decision attribute is D = {Room}. According to Equation (4), V_Area = {9, 12, 13, 15}, so |V_Area| = 4. By analogy, we have |V_Ground| = 3, |V_Wall| = 2, |V_Ceiling| = 2, |V_Light| = 3, and the size of the decision attribute domain is |V_Room| = 6.

2.3 Condition Vector Space

The condition vector space, which is generated from the attribute domains V_a, is denoted by

V_{×A} = ×_{a∈A} V_a = V_{a1} × V_{a2} × ... × V_{a|A|}    (6)
and the size of the space is

|V_{×A}| = ∏_{i=1}^{|A|} |V_{ai}|    (7)
For Table 1,

|V_{×A}| = ∏_{i=1}^{|A|} |V_{ai}| = |V_Area| × |V_Ground| × |V_Wall| × |V_Ceiling| × |V_Light|    (8)

|V_{×A}| = 4 × 3 × 2 × 2 = 48    (9)

This means that there are 48 condition vectors in the condition vector space V_{×A}. Every condition vector corresponds to a combination of attribute values, and every instance corresponds to a vector in the vector space through its attribute values:

A(u) = (a_1(u), a_2(u), ..., a_|A|(u))    (10)

A(U) = {A(u) : u ∈ U}    (11)

For example, A(Room1) = (9, yellow, yellow, 2.6, 1). If A(u) = A(v) for u ∈ U and v ∈ U, instance u and instance v have the same condition vector; they are indiscernible. A(U) represents the set of condition vectors which exist in the decision system.

If |A(U)| = |V_{×A}|, the system is called a complete instance system or complete system. In the real world, training sets for decision-making or feature classification are rarely complete systems. Table 1 has only 6 existing condition vectors, while the size of the condition space is |V_{×A}| = 48; thus Table 1 is not a complete instance system. The decision vector space is represented by

V_{×D} = ×_{d∈D} V_d = V_{d1} × V_{d2} × ... × V_{d|D|}    (12)

V_d is equivalent to V_{×D} in cases in which there is only one decision attribute.

2.4 Conventional Knowledge Representations

Knowledge can be represented in many forms such as rules, decision trees, neural networks, Bayesian belief networks and other methods [15]. In general, knowledge can be defined as a mapping from the condition vector space to the decision space:

φ_A : V_{×A} → V_{×D}    (13)
Let B ⊂ A. We have

V_{×B} = ×_{a∈B} V_a = V_{a1} × V_{a2} × ... × V_{a|B|}    (14)

and
|V_{×B}| = ∏_{i=1}^{|B|} |V_{bi}|    (15)
V_{×B} is called a subspace of V_{×A}. The conventional approach in machine learning or data mining is to select a "best" subset B and to obtain a mapping φ_B based on it:

φ_B : V_{×B} → V_{×D}    (16)
ϕB is then applied instead of ϕA to make decisions or in feature classification. Different mappings can be obtained because the training set is an incomplete system. For example, the decision tree, which is shown in Figure 3, is obtained from Table 1 by means of information entropy.
Fig. 3. Knowledge ϕB represented by a decision tree
The advantage of this approach is that it enables a robot to identify the rooms in the building with only two attributes – Area and Ground.
3 Problems with Single Knowledge

A single body of knowledge encounters problems when dealing with changing attributes, incomplete attributes, and unseen instances. Issues arise when using a single body of knowledge in a changing environment. For example, suppose that the features of the rooms in Table 1 have been changed; the new features are shown in Table 2.

Consider a robot using the decision tree in Fig. 3 that goes through the rooms with the changed features shown in Table 2. When the robot enters room 1, it obtains the condition vector (9, yellow, yellow, 2.6, 2). As the decision tree only tests Area and GroundColor, it does not notice that Illumination has changed.
Table 2. Changed Room Feature Values (previous values in parentheses)
S: Area, GC: GroundColor, WC: WallColor, CH: CeilingHeight, B: Illumination, R: Room

U  S       GC           WC             CH        B     R
1  9       yellow       yellow         2.6       2(1)  1
2  12      yellow       white          2.8(2.6)  2     2
3  9       brown(blue)  white          2.9(2.6)  3(1)  3
4  13      blue(gray)   yellow(white)  2.8       2     4
5  14(15)  gray         yellow         3.0       3     5
6  13(15)  yellow       white          3.0       3     6
By using the decision tree, the robot can determine that it is in room 1. For the other instances, U2 to U6, the results are as follows.

U2: (12, yellow, white, 2.8, 2) → decision tree → Room2.
U3: (9, brown, white, 2.9, 3) → decision tree → Unknown.
U4: (13, blue, yellow, 2.8, 2) → decision tree → Room4.
U5: (14, gray, yellow, 3.0, 3) → decision tree → Unknown.
U6: (13, yellow, white, 3.0, 3) → decision tree → Room4.

In many cases, the decision tree gives a wrong answer or cannot give any answer at all. This problem is not only encountered by decision trees, but by other methods as well when single-body representations are used.
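The failure mode above can be reproduced with a small sketch. The full tree of Fig. 3 is not reproduced in the text, so the nested structure below is an assumed reading of it: Area is tested first, and GroundColor only where Area alone is ambiguous.

```python
# Assumed structure of the decision tree of Fig. 3, learned from Table 1:
# Area first, then GroundColor where two rooms share the same Area.
TREE = {
    9:  {"yellow": "Room1", "blue": "Room3"},   # rooms 1 and 3 share Area 9
    12: "Room2",
    13: "Room4",
    15: {"gray": "Room5", "yellow": "Room6"},   # rooms 5 and 6 share Area 15
}

def classify(area, ground):
    node = TREE.get(area, "Unknown")
    if isinstance(node, dict):
        node = node.get(ground, "Unknown")
    return node

# Instances U1-U6 with the changed (Area, GroundColor) values of Table 2:
changed = [(9, "yellow"), (12, "yellow"), (9, "brown"),
           (13, "blue"), (14, "gray"), (13, "yellow")]
print([classify(a, g) for a, g in changed])
# ['Room1', 'Room2', 'Unknown', 'Room4', 'Unknown', 'Room4'] -- U6 is wrong
```

The single tree answers "Unknown" for U3 and U5 and misclassifies U6, matching the results listed above.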
4 Environment Identification Based on Multi-knowledge

A multi-knowledge approach can be applied in the machine-learning domain to solve different problems. For example, a combination of multi-knowledge and the Bayes classifier has been used successfully to improve the accuracy of a classification system [18]. Here we apply the concept of multi-knowledge to a robot to identify the changed environment. A single body of knowledge representation usually does not need all attributes in an instance system. For example, the decision tree in Figure 3 contains only two attributes, Area and GroundColor. This attribute set is called a reduct. In general, there are many reducts in a decision system. In a conventional data mining method, a good reduct (or a set of good features) is first selected and then extracted to form a single body of knowledge based on this reduct. The multi-knowledge approach, however, encourages finding as many reducts as possible. Every reduct can contribute a single body of knowledge, and a set of these single bodies of knowledge is called the multi-knowledge.

Definition: Given a decision system I = ⟨U, A ∪ D⟩, multi-knowledge is defined as

Φ = {φ_B | B ∈ RED}    (17)
where φ_B is a mapping from the condition vector space V_{×B} to the decision space V_{×D}, and RED is a set of reducts of the decision system. The reducts RED can be found by the algorithm in [18]. For Table 1 there are 5 reducts:

RED = {{Area, Wall}, {Area, Ground}, {Ground, Illumination}, {Ground, Ceiling, Wall}, {Wall, Illumination, Ceiling}}

Applying these 5 reducts, 5 single bodies of knowledge can be obtained. For example, let reduct B = {Area, Wall}. The existing condition vector space is

A_{Area,Wall}(U) = {(9, yellow), (12, white), (9, white), (13, white), (15, yellow), (15, white)}    (18)
Extracting rules in this condition vector space from Table 1 and generalizing them, we have

φ(Area,Wall) =
    Room1    if (Area, Wall) = (9, yellow)
    Room2    if (Area) = (12)
    Room3    if (Area, Wall) = (9, white)
    Room4    if (Area) = (13)
    Room5    if (Area, Wall) = (15, yellow)
    Room6    if (Area, Wall) = (15, white)
    Unknown  for other cases
        (19)

where φ(Area,Wall) is presented by using the existing condition vector space A_{Area,Wall} instead of V_{×(Area,Wall)} because the decision system is not a complete system. φ(Area,Ground), φ(Ground,Illumination), φ(Ground,Ceiling,Wall) and φ(Illumination,Ceiling,Wall) can be obtained by analogy.

Every instance may get multiple decisions from the multi-knowledge. In order to merge these decisions, a decision support degree, denoted by Sup(d_i), is defined by a probability as follows:

Sup(d_i) = P(d_i | d_i = φ_B)  for φ_B ∈ Φ    (20)
where d_i ∈ V_d is a decision in the decision space and φ_B is a single body of knowledge among the multi-knowledge Φ. The final decision is made by

d_Final = arg max_{d_i ∈ V_d} Sup(d_i)    (21)
Now consider a robot that has multi-knowledge Φ extracted from Table 1 and enters the changed environment shown in Table 2. For instance U3 in Table 2,

φ(Area,Wall)                  = φ(9, white)          = Room3
φ(Area,Ground)                = φ(9, brown)          = Unknown
φ(Ground,Illumination)        = φ(brown, 3)          = Unknown
φ(Ground,Ceiling,Wall)        = φ(brown, 2.9, white) = Unknown
φ(Illumination,Ceiling,Wall)  = φ(3, 2.9, white)     = Unknown
    (22)
So Sup(Room1) = 0, Sup(Room2) = 0, Sup(Room3) = 1, Sup(Room4) = 0, Sup(Room5) = 0, Sup(Room6) = 0. The final decision is as follows:

d_Final = arg max_{d_i ∈ V_d} Sup(d_i) = Room3    (23)
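The reduct-voting scheme of Eqs. (17) to (21) can be sketched compactly. The sketch below builds one rule table per reduct from Table 1 by exact matching on the existing condition vectors (a simplification: the paper also generalizes rules, e.g. Area = 13 alone implies Room4), then merges the individual decisions by their support.

```python
from collections import Counter

# Table 1 as (Area, Ground, Wall, Ceiling, Illumination) -> Room.
ATTRS = ("Area", "Ground", "Wall", "Ceiling", "Illum")
TABLE1 = {
    (9,  "yellow", "yellow", 2.6, 1): "Room1",
    (12, "yellow", "white",  2.6, 2): "Room2",
    (9,  "blue",   "white",  2.6, 1): "Room3",
    (13, "gray",   "white",  2.8, 2): "Room4",
    (15, "gray",   "yellow", 3.0, 3): "Room5",
    (15, "yellow", "white",  3.0, 3): "Room6",
}
REDUCTS = [("Area", "Wall"), ("Area", "Ground"), ("Ground", "Illum"),
           ("Ground", "Ceiling", "Wall"), ("Wall", "Illum", "Ceiling")]

def project(vector, reduct):
    return tuple(vector[ATTRS.index(a)] for a in reduct)

# One body of knowledge per reduct: a rule table over the existing
# condition vectors of Table 1.
KNOWLEDGE = {r: {project(v, r): room for v, room in TABLE1.items()}
             for r in REDUCTS}

def d_final(vector):
    votes = Counter(KNOWLEDGE[r].get(project(vector, r), "Unknown")
                    for r in REDUCTS)
    votes.pop("Unknown", None)        # only actual decisions support a d_i
    return votes.most_common(1)[0][0] if votes else "Unknown"

# Changed instance U3 of Table 2: only the {Area, Wall} knowledge still
# matches, yet the merged decision is correct.
print(d_final((9, "brown", "white", 2.9, 3)))  # Room3
```

This reproduces Eq. (23): four of the five bodies of knowledge answer Unknown, but the single surviving vote is enough to identify Room 3.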
It is possible, according to this algorithm, to calculate d_Final for all the instances in Table 2; the results are shown in Table 3. They show that multi-knowledge can cope with a changing environment much better than any single body of knowledge.

Table 3. Results for multi-knowledge to identify the changed environment

Φ                        U1       U2       U3       U4       U5       U6
φ(Area,Wall)             Room1    Room2    Room3    Room4    Unknown  Unknown
φ(Area,Ground)           Room1    Room2    Unknown  Room4    Unknown  Unknown
φ(Ground,Illum)          Room2    Room2    Unknown  Unknown  Room5    Room6
φ(Ground,Ceiling,Wall)   Room1    Unknown  Unknown  Unknown  Room5    Room6
φ(Illum,Ceiling,Wall)    Unknown  Unknown  Unknown  Unknown  Room5    Room6
d_Final                  Room1    Room2    Room3    Room4    Room5    Room6
5 Conclusion

In this paper the problem of a robot identifying its location in a changing environment has been considered. A large environment is divided into small area environments, and an instance information system is applied to represent the environment features of the sub-area environments. Conventional machine learning approaches are applied to identify the sub-area environments. The problem of identifying a changing environment was analyzed: a robot using conventional machine learning mechanisms finds it difficult to solve. In contrast, a multi-knowledge approach was proposed to solve the problem. A case-study was presented to demonstrate that a robot with multi-knowledge copes with a changing environment much better than a conventional single-body knowledge representation.
References 1. Leonard JJ, Durrant-Whyte HF (1991) Mobile robot localization by tracking geometric beacons, IEEE Transactions on Robotics and Automation, 7(3): 376−382. 2. Feng L, Borenstein J, Everett HR (1994) ”Where am I?” Sensors and methods for autonomous mobile robot positioning. Technical Report UMMEAM9412, University of Michigan, Ann Arbor, MI.
3. Jensfelt P, Kristensen S (2001) Active global localization for a mobile robot using multiple hypothesis tracking, IEEE Transactions on Robotics and Automation, 17(5): 748−760.
4. Dissanayake MW, Newman MG, Clark S, Durrant-Whyte HF, Csorba M (2001) A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3): 229−241.
5. Fox D, Burgard W, Thrun S (1999) Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research, 11: 391−427.
6. Fox D, Burgard W, Kruppa H, Thrun S (2000) A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, Special Issue on Heterogeneous Multi-Robot Systems, 8(3): 325−344.
7. Thrun S (2001) A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5): 335−363.
8. Thrun S (1998) Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1): 41−76.
9. Shimshoni I (2002) On mobile robot localization from landmark bearings, IEEE Transactions on Robotics and Automation, 18(6): 971−976.
10. Briechle K, Hanebeck UD (2004) Localization of a mobile robot using relative bearing measurements, IEEE Transactions on Robotics and Automation, 20(1): 36−44.
11. Se S, Lowe D, Little J (2001) Vision-based mobile robot localization and mapping using scale-invariant features. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 2051−2058, Seoul, Korea.
12. Winters N, Santos-Victor J (2002) Information sampling for vision-based robot navigation, Robotics and Autonomous Systems, 41(2-3): 145−159.
13. Ayache N (1991) Artificial Vision for Mobile Robots - Stereo-vision and Multisensor Perception, MIT Press, page 342.
14. Kidono K, Miura J, Shirai Y (2002) Autonomous visual navigation of a mobile robot using a human-guided experience, Robotics and Autonomous Systems, 40(2-3): 121−130.
15. Mitchell TM (1997) Machine Learning, McGraw Hill, co-published by the MIT Press Companies, Inc.
16. Lin TY, Cercone N (eds) (1997) Rough Sets and Data Mining: Analysis for Imprecise Data, Boston, Mass; London: Kluwer Academic.
17. Polkowski L, Tsumoto S, Lin TY (2000) Rough Set Methods and Applications, New Developments in Knowledge Discovery in Information Systems, Physica-Verlag, A Springer-Verlag Company.
18. Wu QX, Bell DA, McGinnity TM (2005) Multi-knowledge for decision making, International Journal of Knowledge and Information Systems, Springer-Verlag London Ltd. 7(2): 246−266.
Robust Monte-Carlo Localization Using Adaptive Likelihood Models

Patrick Pfaff¹, Wolfram Burgard¹, and Dieter Fox²

¹ Department of Computer Science, University of Freiburg, Germany, {pfaff,burgard}@informatik.uni-freiburg.de
² Department of Computer Science & Engineering, University of Washington, USA, [email protected]

Summary. In probabilistic mobile robot localization, the development of the sensor model plays a crucial role as it directly influences the efficiency and the robustness of the localization process. Sensor models developed for particle filters compute the likelihood of a sensor measurement by assuming that one of the particles accurately represents the true location of the robot. In practice, however, this assumption is often strongly violated, especially when using small sample sets or during global localization. In this paper we introduce a novel, adaptive sensor model that explicitly takes the limited representational power of particle filters into account. As a result, our approach uses smooth likelihood functions during global localization and more peaked functions during position tracking. Experiments show that our technique significantly outperforms existing, static sensor models.
1 Introduction

Probabilistic mobile robot localization estimates a robot's pose using a map of the environment, information about the robot's motion, and sensor measurements that relate the robot's pose to objects in the map. The sensor model plays a crucial role as it directly influences the efficiency and the robustness of the localization process. It specifies how to compute the likelihood p(z | x, m), or p(z | x) for short, of a measurement z given the vehicle is at position x in the environment m. A proper likelihood function has to take various sources of noise into account, including sensor uncertainty, inaccuracy of the map, and uncertainty in the robot's location. An improperly designed likelihood function can make the vehicle overly confident in its position and in this way might lead to a divergence of the filter. On the other hand, the model might be defined in a too conservative fashion and thus prevent the robot from localizing, as the sensor information cannot compensate for the uncertainty introduced by the motion of the vehicle. Monte Carlo localization (MCL) is a particle-filter based implementation of recursive Bayesian filtering for robot localization [2, 6]. In each iteration of
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 181–194, 2006. © Springer-Verlag Berlin Heidelberg 2006
P. Pfaff, W. Burgard, and D. Fox
MCL, the likelihood function p(z | x) is evaluated at sample points that are randomly distributed according to the posterior estimate of the robot location. Sensor models developed for MCL typically assume that the vehicle location x is known exactly; that is, they assume that one of the sampling points corresponds to the true location of the robot. However, this assumption is only valid in the limit of infinitely many particles. Otherwise, the probability that the pose of a particle exactly corresponds to the true location of the robot is virtually zero. As a consequence, these likelihood functions do not adequately model the uncertainty due to the finite, sample-based representation of the posterior. In practice, a typical solution to this problem is to artificially inflate the noise of the sensor model. Such a solution is not satisfying, however, since the additional uncertainty due to the sample-based representation is not known in advance; it strongly varies with the number of samples and the uncertainty in the location estimate.

In this paper we introduce a novel, adaptive sensor model that explicitly takes location uncertainty due to the sample-based representation into account. In order to compute an estimate of this uncertainty, our approach borrows ideas from techniques developed for density estimation. The goal of density estimation is to extract an estimate of the probability density underlying a set of samples. Most approaches to density estimation estimate the density at a point x by considering a region around x, where the size of the region typically depends on the number and local density of the samples (for instance, see Parzen window or k-nearest neighbor approaches [4]).

The density estimation view suggests a way of computing the additional uncertainty that is due to the sample-based representation: when evaluating the likelihood function at a sample location, we consider the region a density estimation technique would take into account when estimating the density at that location. The likelihood function applied to the sample is then computed by integrating the point-wise likelihood over the corresponding region. As a result, the likelihood function automatically adapts to the local density of samples, being smooth during global localization and highly "peaked" during position tracking. Our approach raises two questions.

1. How large should the region considered for a sample be?
2. How can we efficiently determine this region and integrate the observation likelihood over it?

While our idea can be applied to arbitrary particle filter approaches, this paper focuses on how to address these questions in the context of mobile robot localization. In particular, we estimate the region associated with a particle using a measure applied in k-nearest neighbor density estimation, in which the region of a point grows until a sufficient number of particles lies within it. We show that by considering the simple case of k = 1 and learning the appropriate smoothness of the likelihood function, we can effectively improve the speed required for global localization and at the same time achieve high accuracy during position tracking.
This paper is organized as follows. After discussing related work in the next section, we briefly describe Monte Carlo localization in Section 3. Our approach to dynamically adapting the likelihood model is presented in Section 4. Finally, in Section 5, we present experimental results illustrating the advantages of our model.
2 Related Work

In the literature, various techniques for computing the likelihood function for Monte Carlo localization can be found. They depend mainly on the sensors used for localization and on the underlying representation of the map m. For example, several authors used features extracted from camera images to calculate the likelihood of observations. Typical features are average grey values [2], color values [8], color transitions [10], feature histograms [17], or specific objects in the environment of the robot [9, 12, 13]. Additionally, several likelihood models for Monte Carlo localization with proximity sensors have been introduced [1, 15]. These approaches either approximate the physical behavior of the sensor [7] or try to provide smooth likelihood models to increase the robustness of the localization process [14]. Whereas these likelihood functions make it possible to reliably localize a mobile robot in its environment, they have the major disadvantage that they are static and basically independent of the state of the localization process.

In the past, it has been observed that the likelihood function can have a major influence on the performance of Monte Carlo localization. For example, Pitt and Shephard [11] as well as Thrun et al. [16] observed that more particles are required if the likelihood function is peaked. In the limit, i.e., for a perfect sensor, the number of required particles becomes infinite. To deal with this problem, Lenser and Veloso [9] and Thrun et al. [16] introduced techniques to sample directly from the observation model and in this way ensure that there is a critical mass of samples at the important parts of the state space. Unfortunately, this approach depends on the ability to sample from observations, which can often only be done in an approximate, inaccurate way. Alternatively, Pitt and Shephard [11] apply a Kalman filter lookahead step for each particle in order to generate a better proposal distribution.
While this technique yields superior results for highly accurate sensors, it is still limited in that the particles are updated independently of each other. Hence, the likelihood function does not consider the number and density of particles. Another way of dealing with the limitations of the sample-based representation is to dynamically adapt the number of particles, as done in KLD sampling [5]. However, for highly accurate sensors, even such an adaptive technique might require a huge number of samples in order to achieve a sufficiently high particle density during global localization. The reader may notice that Kalman filter based techniques do consider the location uncertainty when integrating a sensor measurement. This is done
by mapping the state uncertainty into observation space via a linear approximation. However, Kalman filters have limitations in highly non-linear and multi-modal systems, and the focus of this paper is to add such a capability to particle filters. More specifically, in this paper we propose an approach that dynamically adapts the likelihood function during localization. We dynamically calculate for each particle the variance of the Gaussian governing the likelihood function depending on the area covered by that particle.
3 Monte Carlo Localization

To estimate the pose x of the robot in its environment, we consider probabilistic localization, which follows the recursive Bayesian filtering scheme. The key idea of this approach is to maintain a probability density p(x) of the robot's own location, which is updated according to the following rule:

p(x_t | z_{1:t}, u_{0:t−1}) = α · p(z_t | x_t) · ∫ p(x_t | u_{t−1}, x_{t−1}) · p(x_{t−1}) dx_{t−1}    (1)
Here α is a normalization constant ensuring that p(x_t | z_{1:t}, u_{0:t−1}) sums up to one over all x_t. The term p(x_t | u_{t−1}, x_{t−1}) describes the probability that the robot is at position x_t given that it executed the movement u_{t−1} at position x_{t−1}. Furthermore, the quantity p(z_t | x_t) denotes the probability of making observation z_t given that the robot's current location is x_t. The appropriate computation of this quantity is the subject of this paper. Throughout this paper we use a sample-based implementation of this filtering scheme, also denoted as Monte Carlo localization [2, 6]. In Monte Carlo localization, which is a variant of particle filtering [3], the update of the belief is generally realized by the following two alternating steps:

1. In the prediction step, we draw for each sample a new sample according to the weight of the sample and according to the model p(x_t | u_{t−1}, x_{t−1}) of the robot's dynamics, given the action u_{t−1} executed since the previous update.
2. In the correction step, the new observation z_t is integrated into the sample set. This is done by bootstrap re-sampling, where each sample is weighted according to the likelihood p(z_t | x_t) of sensing z_t given its position x_t.
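The two alternating steps can be sketched for a one-dimensional toy example. The corridor, the wall position, and the noise values below are illustrative assumptions, not the setup used in the paper.

```python
import math
import random

def mcl_step(particles, u, z, expected_z, motion_noise=0.1, sigma=0.2):
    """One MCL iteration on a 1-D state: prediction with the motion model,
    then bootstrap re-sampling weighted by the measurement likelihood."""
    # Prediction step: sample from p(x_t | u_{t-1}, x_{t-1}).
    predicted = [x + u + random.gauss(0.0, motion_noise) for x in particles]
    # Correction step: weight each particle by p(z_t | x_t), here a Gaussian.
    weights = [math.exp(-0.5 * ((z - expected_z(x)) / sigma) ** 2)
               for x in predicted]
    if sum(weights) == 0.0:          # degenerate weights: keep the prediction
        return predicted
    # Bootstrap re-sampling proportional to the importance weights.
    return random.choices(predicted, weights=weights, k=len(particles))

# Toy corridor: a stationary robot measures the distance to a wall at x = 10
# and repeatedly reads z = 4, so the particles should concentrate near x = 6.
random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]
for _ in range(20):
    particles = mcl_step(particles, 0.0, 4.0, lambda x: 10.0 - x)
print(sum(particles) / len(particles))  # close to 6.0
```

Note how the narrow sigma makes the weights extremely peaked: with fewer particles, resampling would quickly deplete the set, which is exactly the problem the adaptive model in Section 4 addresses.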
4 Dynamic Sensor Models

The likelihood model p(z | x) plays a crucial role in the correction step of the particle filter. Typically, very peaked models require a huge number of particles. This is because even when the particles populate the state space very densely, the likelihoods of an observation might differ by several orders of
magnitude. As the particles are drawn proportional to the importance weights, which themselves are calculated as the likelihood of zt given the pose xt of the corresponding particle, a minor difference in xt will result in a difference of one order of magnitude in the likelihood and thus will result in a depletion of such a particle in the re-sampling step. Accordingly, an extremely high density of particles is needed when the sensor is highly accurate. On the other hand, the sheer size of the state space prevents us from using a sufficiently large number of particles during global localization in the case that the sensor is highly accurate. Accordingly, the sensor model needs to be less peaked in the case of global localization, when the particles are sparsely distributed over the state space. This is essentially the idea of the approach proposed in this paper.

4.1 Likelihood Model for Range Sensors

The likelihood model used throughout this paper calculates p(z | x) based on the distance d to the closest obstacle in the direction of the measurement given x (and the map m). Accordingly, p(z | x) is calculated as

p(z | x) = p(z | d). (2)
To determine this quantity, we follow the approach described in Thrun et al. [15] and Choset et al. [1]. The key idea of this sensor model is to use a mixture of four different distributions to capture the noise and error characteristics of range sensors. It uses a Gaussian phit (z | d) to model the likelihood in situations in which the beam hits the next object in the direction of the measurement. Additionally, it uses a uniform distribution prand (z | d) to represent random measurements. It furthermore models objects not contained in the map, as well as the effects of cross-talk between different sensors, by an exponential distribution pshort (z | d). Finally, it deals with detection errors, in which the sensor reports a maximum range measurement, using a uniform distribution pmax (z | d). These four different distributions are mixed by a weighted average, defined by the parameters αhit , αshort , αmax , and αrand :

p(z | d) = (αhit , αshort , αmax , αrand ) · (phit (z | d), pshort (z | d), pmax (z | d), prand (z | d))T . (3)

Note that these parameters need to satisfy the constraints that none of them should be smaller than zero and that p(z | d) integrates to 1 over all z for a fixed d. A plot of this sensor model for a given set of parameters is shown in Figure 1. Also note that the exponential distribution is only used to model measurements that are shorter than expected, i.e., for measurements z with z < d. Therefore, there is a discontinuity at z = d (see Thrun et al. [15] for details).
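A sketch of this mixture follows, assuming standard forms for the four components: a Gaussian around d for phit, an exponential truncated (and re-normalized) on [0, d) for pshort, a narrow uniform band of width eps at the maximum range for pmax, and a uniform density over the full range for prand. All parameter values here are illustrative, not the calibrated ones:

```python
import math

def beam_likelihood(z, d, sigma=0.1, lam=1.0, z_max=10.0, eps=0.05,
                    alphas=(0.7, 0.1, 0.1, 0.1)):
    """Mixture sensor model p(z | d) of Eq. (3): hit, short, max, random."""
    a_hit, a_short, a_max, a_rand = alphas
    # Gaussian around the expected distance d (beam hits the mapped obstacle).
    p_hit = math.exp(-0.5 * ((z - d) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    # Exponential for unexpectedly short readings (unmapped objects,
    # cross-talk); only for z < d, hence the discontinuity at z = d.
    if 0.0 <= z < d:
        p_short = lam * math.exp(-lam * z) / (1.0 - math.exp(-lam * d))
    else:
        p_short = 0.0
    # Narrow uniform band at the maximum range (detection failures).
    p_max = 1.0 / eps if z_max - eps <= z <= z_max else 0.0
    # Uniform over the whole measurement range (random readings).
    p_rand = 1.0 / z_max if 0.0 <= z <= z_max else 0.0
    return a_hit * p_hit + a_short * p_short + a_max * p_max + a_rand * p_rand
```

Since each component integrates to one over z and the mixture weights sum to one, the weighted average again integrates to one for a fixed d, as the constraint above requires.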
P. Pfaff, W. Burgard, and D. Fox
Fig. 1. Sensor model given by a mixture of different distributions (left image) and piecewise linear function used to calculate the standard deviation based on the distance d = 2r to the closest particle.
4.2 Adapting the Variance

As already mentioned above, the particle set should approximate the true posterior as closely as possible. Since we only have a limited number of particles, which in practice is often chosen as small as possible to minimize the computational demands, we need to take into account potential approximation errors. The key idea of our approach is to adjust the variance of the Gaussian governing phit (z | d), which models the likelihood of measuring z given that the sensor detects the closest obstacle in the map, such that the particle set yields an accurate approximation of the true posterior. To achieve this, we approximately calculate for each particle i the space it covers by adopting the measure used in k-nearest neighbor density estimation [4]. For efficiency reasons we rely on the case of k = 1, in which the spatial region covered by a particle is given by the minimum circle that is centered at the particle and contains at least one neighboring particle in the current set. To calculate the radius ri of this circle, we have to take both the Euclidean distance of the positions and the angular difference in orientation into account. In our current implementation we calculate ri based on a weighted mixture of the Euclidean distance and the angular difference

d(x, x′ ) = (1 − ξ)((x1 − x′1 )2 + (x2 − x′2 )2 ) + ξ δ(x3 − x′3 )2 , (4)

where x1 and x′1 are the x-coordinates, x2 and x′2 are the y-coordinates, and δ(x3 − x′3 ) is the difference in the angle of x and x′ . Additionally, ξ is a weighting factor that was set to 0.8 in all our experiments. Figure 2 shows a fraction of a map and a particle distribution. The circle around each particle represents the radius r = ½ d(x, x′ ) to the closest particle x′ . The next step is to adjust for each particle the standard deviation σ of the Gaussian in phit (z | d) based on the distance r = ½ d(x, x∗ ), where x∗ is the particle closest to x with respect to Equation 4. In our current implementation we use a piecewise linear function σ(r) to compute the standard deviation of phit (z | d):
Fig. 2. Fraction of a particle set consisting of 10,000 particles during a global localization run. The circles around the individual particles represent the radius r = ½ d(x, x′ ) calculated from the distance to the particle x′ closest to x.
Fig. 3. Distribution of 1,000,000 particles after 0, 1 and 2 integrations of measurements with the sensor model according to the specifications of the SICK LMS sensor.
σ(r) = σ1 if αr < σ1 , σ2 if αr > σ2 , and αr otherwise. (5)
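Equations (4) and (5) can be sketched together as follows. This is a brute-force O(n²) illustration in which the parameter values (ξ, σ1, σ2, α) are placeholders, not the learned values from the experiments:

```python
import math

XI = 0.8  # weighting factor between translational and angular difference

def angle_diff(a, b):
    # Smallest absolute difference between two angles (the delta in Eq. (4)).
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def weighted_dist(p, q, xi=XI):
    # Weighted mixture of squared Euclidean and angular difference, Eq. (4);
    # p and q are poses (x, y, theta).
    return ((1.0 - xi) * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
            + xi * angle_diff(p[2], q[2]) ** 2)

def nn_radius(particles, i, xi=XI):
    # Radius r_i = (1/2) d(x_i, x*) to the closest other particle (k = 1),
    # found here by brute force over the whole set.
    d_min = min(weighted_dist(particles[i], particles[j], xi)
                for j in range(len(particles)) if j != i)
    return 0.5 * d_min

def sigma_of_r(r, sigma1=0.05, sigma2=0.5, alpha=1.0):
    # Piecewise-linear standard deviation of Eq. (5): alpha*r clamped
    # to the interval [sigma1, sigma2].
    return min(max(alpha * r, sigma1), sigma2)

# The sparser the particles around x_i, the larger the radius and hence
# the smoother the per-particle likelihood function.
particles = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
sigmas = [sigma_of_r(nn_radius(particles, i)) for i in range(len(particles))]
```

In a real implementation the nearest-neighbor query would use a spatial index (e.g. a kd-tree) rather than the quadratic scan shown here.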
To learn the values of the parameters σ1 , σ2 , and α of this function, we performed a series of localization experiments on recorded data, in which we determined the optimal values by minimizing the average distance of the obtained distributions from the true posterior. Since the true posterior is unknown, we determined a close approximation of it by increasing the number of particles until the entropy of a three-dimensional histogram calculated from the particles did not change anymore. In our experiment this turned out to be the case for 1,000,000 particles. Throughout this experiment, the sensor model corresponded to the error values provided in the specifications of the laser range finder used for localization. In the remainder of this section, we will denote the particle set representing the true posterior by S ∗ . Figure 3 shows the set S ∗ at different points in time for the data set considered in this paper.
To calculate the deviation of the current particle set S from the true posterior represented by S∗ , we evaluate the KL-distance between the distributions obtained by computing a histogram from S and S∗ . Whereas the spatial resolution of this histogram is 40 cm, the angular resolution is 5 degrees. For discrete probability distributions p = p1 , . . . , pn and q = q1 , . . . , qn , the KL-distance is defined as

KL(p, q) = Σi pi log2 (pi / qi ). (6)
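Equation (6) can be computed directly on the normalized histogram bins (here over the 40 cm by 5 degree grid mentioned above); a minimal sketch:

```python
import math

def kl_distance(p, q):
    """KL-distance of Eq. (6) between two discrete distributions given as
    equal-length sequences of bin probabilities. Bins with p_i = 0
    contribute nothing; q_i must be non-zero wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The distance is zero exactly when the two histograms agree, and grows as the particle set S drifts away from the reference set S∗.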
Whenever we choose a new standard deviation for phit (z | d), we adapt the weights αhit , αshort , αmax , and αrand to ensure that the integral of the resulting density is 1. Note that in principle phit (z | d) should also encode several other aspects to allow for a robust localization. One such aspect, for example, is the dependency between the individual measurements. A SICK LMS laser range scanner, for example, typically provides 361 individual distance measurements. Since a particle never corresponds to the true location of the vehicle, the error in the pose of the particle renders the individual measurements dependent. This dependency should also be encoded in a sensor model to avoid the filter becoming overly confident about the pose of the robot. To reduce potential effects of the dependency between the individual beams of a scan, we only used 10 beams at angular displacements of 18 degrees from each scan.
5 Experimental Results

The sensor model described above has been implemented and evaluated using real data acquired with a Pioneer PII DX8+ robot equipped with a laser range scanner in a typical office environment. The experiments described in this section are designed to investigate whether our dynamic sensor model outperforms static models. Throughout the experiments we compared our dynamic sensor model with different types of alternative sensor models:

1. Best static sensor model. This model has been obtained by evaluating a series of global localization experiments, in which we determined the optimal variance of the Gaussian by maximizing the utility function

U (I, N ) = Σi=1..I (I − i + 1) Pi / N , (7)

where I is the number of integrations of measurements into the belief during the individual experiment, N is the number of particles, and Pi is the number of particles lying in a 1 m range around the true position of the robot.
2. Best tracking model. We determined this model in the same way as the best static sensor model. The only difference is that we have learned it from situations in which the filter was tracking the pose of the vehicle.
3. SICK LMS model. This model has been obtained from the hardware specifications of the laser range scanner.
4. Uniform dynamic model. In our dynamic sensor model the standard deviation of the likelihood function is computed on a per-particle basis. We also analyzed the performance of a model in which a uniform standard deviation is used for all particles. The corresponding value is computed by taking the average of the individual standard deviations.
Fig. 4. Distribution of 10,000 particles after 1, 2, 3, 5, and 11 integrations of measurements with our dynamic sensor model (left column) and with the best static sensor model (right column).
5.1 Global Localization Experiments The first set of experiments is designed to evaluate the performance of our dynamic sensor model in the context of a global localization task. In the particular data set used to carry out the experiment, the robot started in the kitchen of our laboratory environment (lower left room in the maps depicted in Figure 3). The evolution of a set of 10,000 particles during a typical global localization run with our dynamically adapted likelihood model and with the
Fig. 5. Evolution of the distance d(x, x ) introduced in Equation (4). Distribution of 10,000 particles after 1, 3, 4 and 5 integrations of measurements with our dynamic sensor model.
Fig. 6. Percentage of particles within a 1m circle around the true position of the robot with our dynamic sensor model, the uniform dynamic model, and the best static model. The left image shows the evolution depending on the number of iterations for 10,000 particles. The right plot was obtained with 2,500 particles.
best static sensor model is depicted in Figure 4. As can be seen, our dynamic model improves the convergence rate, as the particles quickly focus on the true pose of the robot. Due to the dynamic adaptation of the variance, the likelihood function becomes more peaked so that unlikely particles are depleted earlier. Figure 5 shows the evolution of the distance d(x, x′ ) introduced in Equation (4). The individual images illustrate the distribution of 10,000 particles after 1, 2, 3, and 5 integrations of measurements with our dynamic sensor model. The circle around each particle represents the distance r = ½ d(x, x′ ) to the next particle x′ . Figure 6 shows the convergence of the particles to the true position of the robot. Whereas the x-axis corresponds to the time step, the y-axis shows the percentage of particles that are closer than 1 m to the true position. Shown are the evolutions of these numbers for our dynamic sensor model,
Fig. 7. Success rates for different types of sensor models and sizes of particle sets. The shown results are computed in extensive global localization experiments.
a uniform dynamic model, and the best fixed model for 10,000 and 2,500 particles. Note that the best static model does not reach 100%. This is due to the fact that the static sensor model relies on a highly smoothed likelihood function, which is good for global localization but does not achieve maximal accuracy during tracking. In the case of 10,000 particles, the variances in the distance between the individual particles are typically so small that the uniform model achieves a similar performance. Still, a t-test showed that both models are significantly better than the best fixed model. In the case of 2,500 particles, however, the model that adjusts the variance on a per-particle basis performs better than the uniform model. Here, the differences are significant whenever they exceed 20. Figure 7 shows the number of successful localizations after 35 integrations of measurements for a variety of sensor models and for different numbers of particles. Here, we assumed that the localization was achieved when the mean of the particles differed by at most 30 cm from the true location of the robot. First, it shows that our dynamic model achieves the same performance as the best static model for global localization. Additionally, the figure shows that the static model that yields the best tracking performance has a substantially smaller success rate. Additionally, we evaluated a model, denoted as the SICK LMS model, in which the standard deviation was chosen according to the specifications of the SICK LMS sensor, i.e., under the assumption that the particles in fact represent the true position of the vehicle. As can be seen, this model yields the worst performance. Furthermore, we evaluated how the models perform when only one beam is used per range scan. With this experiment we wanted to analyze whether the dynamic model also yields better results in situations in which there is no dependency between the individual beams of a scan.
Again, the plots show that our sensor model outperforms
Fig. 8. Evolution of the average standard deviation during global localization with different numbers of particles and our dynamic sensor model.
Fig. 9. Average localization error of 10,000 (left image) and 2,500 (right image) particles during a position tracking task. Our dynamic sensor model shows a similar performance as the best tracking model.
the model for which the standard deviation corresponds to the measuring accuracy of the SICK LMS scanner. Finally, Figure 8 plots the evolution of the average standard deviations of several global localization experiments with different numbers of particles. As can be seen from the figure, our approach uses more strongly smoothed likelihood functions when operating with few particles (2,500). The more particles are used in the filter, the faster the standard deviation converges to the minimum value.

5.2 Tracking Performance

We also carried out experiments in which we analyzed the accuracy of our model when the system is tracking the pose of the vehicle. We compared our
dynamic sensor model to the best tracking model and evaluated the average localization error of the individual particles. Figure 9 depicts the average localization error for two tracking experiments with 10,000 and 2,500 particles. As can be seen from the figure, our dynamic model shows the same performance as the tracking model whose parameters have been optimized for minimum localization error. This shows that our dynamic sensor model yields faster convergence rates in the context of global localization and at the same time achieves the best possible tracking performance.
6 Conclusions

In this paper we presented a new approach for dynamically adapting the sensor model for particle filter based implementations of probabilistic localization. Our approach learns a function that outputs the appropriate variance for each particle based on the estimated area in the state space represented by this particle. To estimate the size of this area, we adopt a measure developed for density estimation. The approach has been implemented and evaluated in extensive experiments using laser range data acquired with a real robot in a typical office environment. The results demonstrate that our sensor model significantly outperforms static and optimized sensor models with respect to robustness and efficiency of the global localization process.
Acknowledgment

The authors would like to thank Dirk Zitterell for fruitful discussions on early versions of this work. This work has partly been supported by the German Research Foundation (DFG) within the Research Training Group 1103 and under research grant SFB/TR-8, and by the National Science Foundation under grant number IIS-0093406.
References

1. H. Choset, K.M. Lynch, S. Hutchinson, G. Kantor, W. Burgard, L.E. Kavraki, and S. Thrun. Principles of Robot Motion Planning. MIT Press, 2005.
2. F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 1999.
3. A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York, 2001.
4. R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2001.
5. D. Fox. Adapting the sample size in particle filters through KLD-sampling. International Journal of Robotics Research, 22(12):985–1003, 2003.
6. D. Fox, W. Burgard, F. Dellaert, and S. Thrun. Monte Carlo localization: Efficient position estimation for mobile robots. In Proc. of the National Conference on Artificial Intelligence (AAAI), 1999.
7. D. Fox, W. Burgard, and S. Thrun. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research, 11, 1999.
8. H.-M. Gross, A. König, Ch. Schröter, and H.-J. Böhme. Omnivision-based probabilistic self-localization for a mobile shopping assistant continued. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 1505–1511, 2003.
9. S. Lenser and M. Veloso. Sensor resetting localization for poorly modelled mobile robots. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2000.
10. E. Menegatti, A. Pretto, and E. Pagello. Testing omnidirectional vision-based Monte-Carlo localization under occlusion. In Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), pages 2487–2494, 2004.
11. M.K. Pitt and N. Shephard. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association, 94(446), 1999.
12. T. Röfer and M. Jüngel. Vision-based fast and reactive Monte-Carlo localization. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 856–861, 2003.
13. M. Sridharan, G. Kuhlmann, and P. Stone. Practical vision-based Monte Carlo localization on a legged robot. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2005.
14. S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5):335–363, 2001.
15. S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.
16. S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo localization for mobile robots. Artificial Intelligence, 128(1-2), 2001.
17. J. Wolf, W. Burgard, and H. Burkhardt. Robust vision-based localization by combining an image retrieval system with Monte Carlo localization. IEEE Transactions on Robotics, 21(2):208–216, 2005.
Metric Localization with Scale-Invariant Visual Features Using a Single Perspective Camera

Maren Bennewitz, Cyrill Stachniss, Wolfram Burgard, and Sven Behnke

University of Freiburg, Computer Science Institute, D-79110 Freiburg, Germany
Abstract. The Scale Invariant Feature Transform (SIFT) has become a popular feature extractor for vision-based applications. It has been successfully applied to metric localization and mapping using stereo vision and omnivision. In this paper, we present an approach to Monte-Carlo localization using SIFT features for mobile robots equipped with a single perspective camera. First, we acquire a 2D grid map of the environment that contains the visual features. To come up with a compact environmental model, we appropriately down-sample the number of features in the final map. During localization, we cluster close-by particles and estimate for each cluster the set of potentially visible features in the map using ray-casting. These relevant map features are then compared to the features extracted from the current image. The observation model used to evaluate the individual particles considers the difference between the measured and the expected angle of similar features. In real-world experiments, we demonstrate that our technique is able to accurately track the position of a mobile robot. Moreover, we present experiments illustrating that a robot equipped with a different type of camera can use the same map of SIFT features for localization.
1 Introduction

Self-localization is one of the fundamental problems in mobile robotics. The topic has been studied intensively in the past. Many approaches exist that use distance information provided by a proximity sensor to localize a robot in its environment. However, for some types of robots, proximity sensors are not the appropriate choice because they conflict with the robots' design principle. Humanoid robots, for example, which are constructed to resemble a human, are typically equipped with vision sensors and lack proximity sensors such as laser scanners. Therefore, it is natural to equip these robots with the ability of vision-based localization.
H.I. Christensen (Ed.): European Robotics Symposium 2006, STAR 22, pp. 195–209, 2006. © Springer-Verlag Berlin Heidelberg 2006
In this paper, we present an approach to vision-based mobile robot localization that uses a single perspective camera. We apply the well-known Monte-Carlo localization (MCL) technique [5] to estimate the robot's position. MCL uses a set of random samples, also called particles, to represent the belief of the robot about its pose. To locate features in the camera images, we use the Scale Invariant Feature Transform (SIFT) developed by Lowe [15]. SIFT features are invariant to image translation, scale, and rotation. Additionally, they are partially invariant to illumination changes and affine or 3D projection. These properties make SIFT features particularly suitable for mobile robots since, as the robots move around, they typically observe landmarks from different angles and distances, and with a different illumination.

Whereas existing systems that perform metric localization and mapping using SIFT features apply stereo vision in order to compute the 3D position of the features [20, 7, 21, 2], we rely on a single camera only during localization. Since we want to concentrate on the localization aspect, we facilitate the map acquisition process by using a robot equipped with a camera and a proximity sensor. During mapping, we create a 2D grid model of the environment. In each cell of the grid, we store those features that are supposed to be at that 2D grid position. Since the number of observed SIFT features is typically high, we appropriately down-sample the number of features in the final map. During MCL, we then rely on a single perspective camera and do not use any proximity information. Our approach estimates for clusters of particles the set of potentially visible features using ray-casting on the 2D grid. We then compare those features to the features extracted from the current image. In the observation model of the particle filter, we consider the difference between the measured and the expected angle of similar features. By applying the ray-casting technique, we avoid comparing the features extracted out of the current image to the whole database of features (as the above mentioned approaches do), which can lead to serious errors in the data association. As we demonstrate in practical experiments with a mobile robot in an office environment, our technique is able to reliably track the position of the robot. We also present experiments illustrating that the same map of SIFT features can be used for self-localization by different types of robots equipped with a single camera only and without proximity sensors.

This paper is organized as follows. After discussing related work in the following section, we describe the Monte-Carlo localization technique that is applied to estimate the robot's position. In Section 4, we explain how we acquire 2D grid maps of SIFT features. In Section 5, we present the observation model used for MCL. Finally, in Section 6, we show experimental results illustrating the accuracy of our approach to estimate the robot's position.
2 Related Work

Monte-Carlo methods are widely used for vision-based localization and have been shown to yield quite robust estimates of the robot's position. Several localization approaches are image-based, which means that they store a set of reference images taken at various locations that are used for localization. Some of the image-based methods rely on an omnidirectional camera in order to localize a mobile robot. The advantages of omnidirectional images are the circular field of view and thus the knowledge about the appearance of the environment in all possible gaze directions. Recent techniques were for example presented by Andreasson et al. [1], who developed a method to match SIFT features extracted from local interest points in panoramic images, by Menegatti et al. [16], who use Fourier coefficients for feature matching in omnidirectional images, and by Gross et al. [9], who compare the panoramic images using color histograms. Wolf et al. [23] apply a combination of MCL and an image retrieval system in order to localize a robot equipped with a perspective camera. The systems presented by Ledwich and Williams [12] and by Košecká and Li [11] perform Markov localization within a topological map. They use the SIFT feature descriptor to match the current view to the reference images. Whenever using those image-based methods, care has to be taken in deciding at which positions to collect the reference images in order to ensure a complete coverage of the space the robot can travel in. In contrast to this, our approach stores features at the positions where they are located in the environment and not for all possible poses the robot can be in.

Additionally, localization techniques have been presented that use a database of observed visual landmarks. SIFT features have become very popular for metric localization as well as for SLAM (simultaneous localization and mapping, [21, 2]). Se et al. [20] were the first who performed localization using SIFT features in a restricted area. They did not apply a technique to track the position of the robot over time. Recently, Elinas and Little [7] presented a system that uses MCL in combination with a database of SIFT features learned in the same restricted environment. All these approaches use stereo vision to compute the 3D position of a landmark and match the visual features in the current view to all those in the database to find correspondences. To avoid matching the observations to the whole database of features, we present a system that determines the sets of visible features for clusters of particles. These relevant features are then matched to the features in the current image. This way, the number of ambiguities, which can occur in larger environments, is reduced. The relevant features are determined by applying a ray-casting technique in the map of features. The main difference to existing metric localization systems using SIFT features is, however, that our approach is applicable to robots that are equipped with a single perspective camera only, whereas the other approaches require stereo vision or omnivision.

Note that Davison et al. [3] and Lemaire et al. [13] presented approaches to feature-based SLAM using a single camera. These authors use extended
Kalman filters for state estimation. Both approaches have only been applied to robots moving within a relatively small operational range.

Vision-based MCL was first introduced by Dellaert et al. [4]. The authors constructed a global ceiling mosaic and use simple features extracted out of images obtained with a camera pointing to the ceiling for localization. Systems that apply vision-based MCL are also popular in the RoboCup domain. In this scenario, the robots use environment-specific objects as features [19, 22].
3 Monte-Carlo Localization

To estimate the pose xt (position and orientation) of the robot at time t, we apply the well-known Monte-Carlo localization (MCL) technique [5], which is a variant of Markov localization. MCL recursively estimates the posterior about the robot's pose:

p(xt | z1:t , u0:t−1 ) = η · p(zt | xt ) · ∫ p(xt | xt−1 , ut−1 ) · p(xt−1 | z1:t−1 , u0:t−2 ) dxt−1 (1)

Here, η is a normalization constant resulting from Bayes' rule, u0:t−1 denotes the sequence of all motion commands executed by the robot up to time t − 1, and z1:t is the sequence of all observations. The term p(xt | xt−1 , ut−1 ) is called the motion model and denotes the probability that the robot ends up in state xt given it executes the motion command ut−1 in state xt−1 . The observation model p(zt | xt ) denotes the likelihood of making the observation zt given the robot's current pose is xt . To determine the observation likelihood, our approach compares SIFT features in the current view to those SIFT features in the map that are supposed to be visible (see Section 5).

MCL uses a set of random samples to represent the belief of the robot about its state at time t. Each sample consists of the state vector x(i)t and a weighting factor ω(i)t that is proportional to the likelihood that the robot is in the corresponding state. The update of the belief, also called particle filtering, is typically carried out as follows. First, the particle states are predicted according to the motion model. For each particle a new pose is drawn given the executed motion command since the previous update. In the second step, new individual importance weights are assigned to the particles. Particle i is weighted according to the likelihood p(zt | x(i)t ). Finally, a new particle set is created by resampling from the old set according to the particle weights. Each particle survives with a probability proportional to its importance weight. Due to spurious observations it is possible that good particles vanish because they temporarily have a low likelihood. Therefore, we follow the approach proposed by Doucet [6] that uses the so-called number of effective particles [14] to decide when to perform a resampling step:
Neff = 1 / Σi=1..N (ω(i) )2 , (2)
where N is the number of particles. Neff estimates how well the current particle set represents the true posterior. Whenever Neff is close to its maximum value N , the particle set is a good approximation of the true posterior. Its minimal value of 1 is obtained in the situation in which a single particle has all the probability mass contained in its state. We do not resample in each iteration; instead, we only resample each time Neff drops below a given threshold (here set to N/2). In this way, the risk of replacing good particles is drastically reduced.
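The resampling decision of Equation (2) can be sketched directly; `needs_resampling` is an invented helper name for this illustration:

```python
def n_eff(weights):
    """Effective number of particles, Eq. (2); `weights` must be normalized
    so that they sum to one."""
    return 1.0 / sum(w * w for w in weights)

def needs_resampling(weights):
    # Resample only when N_eff drops below N/2, which reduces the risk of
    # replacing good particles after a single spurious observation.
    return n_eff(weights) < len(weights) / 2.0
```

With uniform weights N_eff equals N (no resampling); when one particle carries almost all of the probability mass, N_eff approaches 1 and a resampling step is triggered.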
4 Acquiring 2D Maps of Scale-Invariant Features

We use maps of visual landmarks for localization. To detect features, we use the Scale Invariant Feature Transform (SIFT). Each image feature is described by a vector (p, s, r, f ), where p is the subpixel location, s is the scale, r is the orientation, and f is a descriptor vector generated from local image gradients. The SIFT descriptor is invariant to image translation, scaling, and rotation and also partially invariant to illumination changes and affine or 3D projection. Lowe presented results illustrating robust matching of SIFT descriptors under various image transformations [15]. Mikolajczyk and Schmid compared SIFT and other image descriptors and showed that SIFT yields the highest matching accuracy [17]. Ke and Sukthankar [10] presented an approach to compute a more compact representation for SIFT features, called PCA-SIFT. They apply principal component analysis (PCA) to determine the most distinctive components of the feature vector. As shown in their work, the PCA-based descriptor is more distinctive and more robust than the standard SIFT descriptor. We therefore use that representation in our current approach. As suggested by Ke and Sukthankar, we apply a 36-dimensional descriptor vector resulting from PCA.

To acquire a 2D map of SIFT features, we used a B21r robot equipped with a perspective camera and a SICK laser range finder. We steered the robot through the environment to obtain image data as well as proximity and odometry measurements. The robot was moving with a speed of 40 cm/s and collected images at a rate of 3 Hz. To be able to compute the positions of features and to obtain ground truth data, we used an approach to grid-based SLAM with Rao-Blackwellized particle filters [8]. Using the information about the robot's pose and the SIFT features extracted out of the current camera image, we can estimate the positions of the features in the map.
More specifically, we use the distance measurement of the laser beam that corresponds to the horizontal angle of the detected feature and the robot’s pose to calculate the 2D position of the feature. Thus, we assume that the features are located on the obstacles detected by the laser range finder. In the office environment in
M. Bennewitz et al.
which we performed our experiments, this assumption leads to quite robust estimates, even though there certainly exist features that are not correctly mapped.

In each 2D grid cell, we store the set of features that are supposed to be at that 2D grid position. Currently, we use a grid resolution of 10 by 10 cm. In the first stage of mapping, we store all observed features.

After the robot has moved through the environment, the number of observed SIFT features is extremely high. Typically, we have 150-500 features extracted per image with a resolution of 320 by 240 pixels. This results in around 600,000 observed features after the robot traveled for 212 m in a typical office environment. After map acquisition, we therefore down-sample to a reduced set of features that is used for localization. For each grid cell, we randomly draw features. A drawn feature is rejected if there is already a similar feature within the cell. We determine similar features by comparing their PCA-SIFT vectors (see below). We sample a maximum of 20 features for each grid cell. Using this sampling process, features that were observed more often have a higher chance of being selected, and features that were detected only once (due to erroneous observations or noise) are eliminated. The goal of this sampling process is to reduce the required computational resources and at the same time obtain a representative subset of features. To choose good representatives for the features, a clustering based on the descriptor vectors can be carried out.

The left image of Figure 3 shows a 2D grid map of SIFT features of an office environment that was acquired by the described method. The final map contains approximately 100,000 features. Note that a stereo camera system, which was not available in our case, would also be an appropriate solution for map building; the presented map acquisition approach is not restricted to robots equipped with a laser range finder.
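The per-cell down-sampling described above can be sketched as follows. The similarity threshold is an assumption, as the chapter gives no concrete value; the cap of 20 features per cell is taken from the text:

```python
import math
import random

MAX_FEATURES_PER_CELL = 20
SIMILARITY_THRESHOLD = 0.4   # assumed value; the chapter specifies none

def descriptor_distance(f1, f2):
    """Euclidean distance between two 36-dim PCA-SIFT descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def downsample_cell(features, rng=random):
    """Randomly draw up to 20 representatives for one grid cell.

    A drawn feature is rejected if a similar feature (by PCA-SIFT
    descriptor distance) has already been kept for this cell, so
    frequently observed features are likely retained exactly once
    while spurious one-off detections tend to be eliminated.
    """
    kept = []
    for f in rng.sample(features, len(features)):   # random draw order
        if len(kept) == MAX_FEATURES_PER_CELL:
            break
        if all(descriptor_distance(f, k) > SIMILARITY_THRESHOLD for k in kept):
            kept.append(f)
    return kept
```

Because frequently observed features occur many times in a cell, they are likely to be drawn early; rejecting near-duplicates then keeps one representative per distinct landmark.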
5 Observation Model for SIFT Features

In the previous section, we described how to build a map of SIFT features using a robot equipped with a camera and a proximity sensor. In this section, we describe how a robot without a proximity sensor can use this environmental model for localization with a single perspective camera.

Sensor observations are used to compute the weight of each particle by estimating the likelihood of the observation given the pose of the particle in the map. Thus, we have to specify how to compute p(zt | xt). In our case, an observation zt consists of the SIFT features in the current image: zt = {o1, ..., oM}, where M is the number of features in the current image. To determine the likelihood of an observation given a pose in the map, we compare the observed features with the features in the map by computing the Euclidean distance of their PCA-SIFT descriptor vectors.

In order to avoid comparing the features in the current image to the whole set of features contained in the map, we determine the potentially visible
Metric Localization with SIFT Features Using a Single Camera
features. This helps to cope with an environment that contains similar landmarks at different locations (e.g., several similar rooms). If one instead matched the current observation against the whole set of features, this would lead to serious errors in the data association.

To compute the relevant features, we group close-by particles into clusters. For each particle cluster, we determine the set of features that are potentially visible from those locations. This is done using ray-casting on the feature grid map. To speed up the process of finding relevant features, one could also precompute for each grid cell the set of features that are visible from it. However, in our experiments, computing the similarity of the feature vectors took substantially longer than the ray-casting operations. Typically, we have 150-500 features per image.

In order to compare two SIFT vectors, we use a distance function based on the Euclidean distance. The likelihood that two PCA-SIFT vectors f and f' belong to the same feature is computed as

p(f = f') = exp( -||f - f'||^2 / (2 · σ1^2) ),   (3)
where σ1^2 is the variance of the Gaussian.

In general, one could use Eq. (3) to determine the most likely correspondence between an observed feature and the map features. However, since different landmarks may exist that have similar descriptor vectors, we do not determine a unique corresponding map feature for each observed feature. In order to avoid misassignments, we instead consider all pairs of observed features and relevant map features. This set of pairs of features is denoted as C. For each pair of features in C, we use Eq. (3) to compute the likelihood that the corresponding PCA-SIFT vectors belong to the same feature.

This information is then used to compute the likelihood p(zt | xt^(i)) of an observation given the pose xt^(i) of particle i, which is required for MCL. Since a single perspective camera does not provide depth information, we can use only the angular information to compute this likelihood. We therefore consider the difference between the horizontal angles of the currently observed features in the image and the features in the map to compute p(zt | xt^(i)). More specifically, we compute the distribution over the angular displacement of a particle given the observation and the map.

For each particle, we compute a histogram over the angular differences between the observed features and the map features. The x-values in that histogram represent the angular displacement and the y-values its likelihood. The histogram is computed using the pairs of features in C, evaluated using Eq. (3). In particular, we compute for each pair (o, l) ∈ C the difference between the horizontal angle at which the feature was observed and the angle at which the feature should be located according to the map and the particle pose. We add the likelihood that these features are equal, which is given by Eq. (3), to
the corresponding bin of the histogram. As a result, we obtain a distribution over the angular error of the particle.

In mathematical terms, the value h(b) of a bin b (representing the interval of angular differences from α−(b) to α+(b)) in the histogram is given by

h(b) = β · Σ_{(o,l) ∈ C : α−(b) ≤ α(o) − α(l) < α+(b)} p(f_o = f_l),   (4)
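A minimal sketch of Eqs. (3) and (4), treating β as a normalizer over the bins; the value of σ1 and the bin width are assumptions, since the chapter does not specify them here:

```python
import math

SIGMA_1 = 0.3                   # assumed standard deviation for Eq. (3)
BIN_WIDTH = math.radians(5.0)   # assumed angular bin size for Eq. (4)

def pair_likelihood(f_o, f_l, sigma=SIGMA_1):
    """Eq. (3): likelihood that two PCA-SIFT vectors describe one feature."""
    d2 = sum((a - b) ** 2 for a, b in zip(f_o, f_l))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def angular_histogram(pairs, bin_width=BIN_WIDTH):
    """Eq. (4): histogram over the angular displacement of one particle.

    pairs: iterable of (angle_observed, angle_map, f_o, f_l) tuples, one
           per pair (o, l) in C; angle_map is the angle at which the map
           feature should appear given the particle pose.
    Returns a dict mapping bin index -> h(b), normalized by beta.
    """
    hist = {}
    for a_obs, a_map, f_o, f_l in pairs:
        b = int(math.floor((a_obs - a_map) / bin_width))
        hist[b] = hist.get(b, 0.0) + pair_likelihood(f_o, f_l)
    total = sum(hist.values())          # beta taken as 1/total here
    if total > 0.0:
        hist = {b: v / total for b, v in hist.items()}
    return hist
```

Each feature pair votes for the bin containing its angular difference, with a vote weighted by the descriptor-similarity likelihood of Eq. (3).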