Intelligent robots and systems: selections of the International Conference on Intelligent Robots and Systems 1994, IROS 94, Munich, Germany, 12-16 September 1994 044482250X, 9780444822505

Of the 300 papers presented during IROS '94, 48 were selected because they are particularly significant and charact

309 22 29MB

English Pages 745 Year 1995

Report DMCA / Copyright

DOWNLOAD PDF FILE

Recommend Papers

Intelligent robots and systems: selections of the International Conference on Intelligent Robots and Systems 1994, IROS 94, Munich, Germany, 12-16 September 1994
 044482250X, 9780444822505

  • Commentary
  • Conference publication
  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

This Page Intentionally Left Blank

Selections of the International Conference on Intelligent Robots and Systems I994, IROS 94, Munich, Germany, 12-i6 September i 9 9 4

E D I T E D BY

V o T.K~.R G RAE F E

Institut ffir Messtechnik Universitgt der B W Mi~nchen Neubiberg, Germany

1995 ELSEVIER A m s t e r d a m - Lausanne - New Y o r k - O x f o r d - Shannon - Tokyo

ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 2II, IOOO AE Amsterdam, The Netherlands

Library oF C o n g r e s s

Cataloging-In-Publication

Data

IEEE/RSJ/OI Internationa] ConFerence on I n t e l l i g e n t R o b o t s and Systems (1994 : M u n i c h , Oermanv) Inte]ligent r o b o t s and s y s t e m s : s e l e c t i o n s oF t h e I n t e r n a t i o n a ] C o n F e r e n c e on I n t e l l i g e n t R o b o t s and Systems 1994, IROS 9 4 , M u n i c h , Oerman#, 12-16 S e p t e m b e r 1994 / e d i t e d bv V o l k e r O r a e ~ e .

p. cm. Includes bibliographical references. ISBN 0-444-82250-X I. Robotics--Congresses. 2. Inte111gent control systems-Congresses. I. OraeFe, Volker, 1938. II. Title. TJ210.3.I447 629.8'92--dc20

1994a 95-36630 CIP

ISBN: o 444 82250 x 9 i995 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 52i, iooo AM Amsterdam, The Netherlands. Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA, oi923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science BV, unless otherwise specified. No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. This book is printed on acid-free paper. Printed in The Netherlands.

PREFACE In September 1994 the seventh International Conference on Intelligent Robots and Systems- IROS '94-- was held in Neubiberg, a suburb of Munich, Germany, on the premises of the German Armed Forces University Munich. Of the 300 papers presented during the conference I have selected 48 ones which, in my subjective opinion, are particularly significant and characteristic for the present state of the technology of intelligent robots and systems. This book contains the selected papers in a revised and expanded form. Robotics and intelligent systems constitute a very wide and truly interdisciplinary field. The papers in this book have been grouped into the following categories: Sensing and Perception Learning and Planning Manipulation Telerobotics and Space Robotics Multiple Robots - Legged Locomotion - Mobile Robot Systems Robotics in Medicine

-

-

-

-

-

-

They cover, however, many additional fields, e.g. control, navigation, and simulation, to mention only a few. When preparing the conference, and this book, I found that most researchers in robotics are now apparently interested in some combination of learning, mobile robots, and robot vision. Consequently, most of the articles in this book relate to at least one of these fields. A book like this is always the result of many persons' efforts. Special thanks are due to Mr Michael LiJtzeler who helped with the compilation of the book by creating and maintaining various data bases, and to my secretary, Mrs Sigrun Haussmann, who among other things, helped some of the authors to improve the style and grammatical correctness of their texts. Thanks also to Ms Drs. Eefke Smit &Elsevier Science BV for competently and patiently managing all matters related to the publishing of the book.

Neubiberg, March 1995 Volker Graefe

This Page Intentionally Left Blank

vii

TABLE OF CONTENTS

Preface

SENSING AND PERCEPTION

A new High-performance Multisonar System for Fast Mobile Robot Applications U.D. Hanebeck and G. Schmidt Proper Selection of Sonar and Visual Sensors in Vehicle Detection and Avoidance N. Moghaddam Charkari, K. Ishii and H. Mori

15

Selective Refinement of 3-D Scene Description by Attentive Observation for Mobile Robot H. Takizawa, Y. Shirai and J. Miura

29

Active Vision Inspired by Mammalian Fixation Mechanism T. Yagi, N. Asano, S. Makita and Y. Uchikawa

39

Realtime Range Imaging Using An Adjustment-free Photo VLSI A. Yokoyama, K. Sato, T. Yoshigahara and S. Inokuchi

49

A Pedestrian Discrimination Method Based on Rhythm S. Yasutomi, H. Mori and S. Kotani

61

Visual Recognition of Obstacles on Roads U. Regensburger and V. Graefe

73

Visual Collision Avoidance by Segmentation I. Horswill

87

LEARNING AND PLANNING

Using Perceptions to Plan Incremental Adaptions A.J. Hendriks and D.M. Lyons Robot Task Programming by Human Demonstration: Mapping Human Grasps to Manipulator Grasps Sing Bing Kang and Katsushi Ikeuchi Robot Learning By Nonparametric Regression S. Schaal and C.G. Atkeson

101

119

137

~ 1 7 6

VIII

Behavioral Control in Mobile-Robot Navigation Using a Fuzzy Decision Making Approach H.R. Beom, K.C. Koh and H.S. Cho Learning Emergent Tasks for an Autonomous Mobile Robot D. Gachet, M.A. Salichs and L. Moreno Efficient Reinforcement Learning of Navigation Strategies in an Autonomous Robot J. del R. Milldn and C. Torras

155

169

185

A Lifelong Learning Perspective for Mobile Robot Control S. Thrun

201

A Multilevel Learning Approach to Mobile Robot Path Planning F. Wallner, M. Kaiser, H. Friedrich and R. Dillmann

215

A Self-Organizing Representation of Sensor Space fi~r Mobile Robot Navigation B.J.A. KrOse and M. Eecen

229

Path Planning and Guidance Techniques for an Autonomous Mobile Cleaning Robot C. H ~ w r and G. Schmidt

241

MANIPULATION

Grasp Strategies for a Dextrous Robotic Hand K. Woelfl and F. Pfeiffer

259

Motion Scheme for Dextrous Manipulation in the Intelligent Cooperative Manipulation System - ICMS M. Buss and H. Hashimoto

279

Designing a Behavior of a Mobile Robot Equipped with a Manipulator to Open and Pass through a Door K. Nagatani and S. Yuta

295

The Development of Re-usable Software Systems for Intelligent Autonomous Robots in Industrial and Space Applications E. Freund and J. Roflmann

311

TELEROBOTICS AND SPACE ROBOTICS

Toward Integrated Operator Interface for Advanced Teleoperation under Time-Delay A.K. B~jczy, P. Fiorini, W.S. Kim and P.S. Schenker

327

Active Understanding of Human Intention by a Robot through Monitoring of Human Behavior T. Sato, K Nishida, J. Ichikawa, K Hatamura and H. Mizoguchi

349

Human/machine Sharing Control for Telerobotic Systems T.-J. Tarn, N. Xi, C. Guo and A.K. Bejczy

373

Task-Directed Programming of Sensor-Based Robots B. Brunner, K. Arbter and G. Hirzinger

387

Telerobotics with Large Time Delays - the ROTEX Experience G. Hirzinger, K. Landzettel and Ch. Fagerer

401

Feature-Based Visual Servoing and its Application to Telerobotics G.D. Hager, G. Grunwald and K. Toyama

415

Practical Coordination Control Between Satellite Attitude and Manipulator Reaction Dynamics Based on Computed Momentum Concept K. Yoshida

431

A 5-Axis Mini Direct Drive Robot for Teleoperation and Biotechnology M. Moreyra, P.-H. Marbot, S. Venema and B. Hannaford

445

A Laboratory Test Bed for Space Robotics: The VES II S. Dubowsl,y, W. DurJ~)e, T. Corrigan, A. Kuklinski and U. MOiler

463

MULTIPLE ROBOTS

Hierarchical Prediction Model for Intelligent Communication in Multiple Robotic Systems K. Sekiyama and T. Fukuda

477

Proposal for Cooperative Robot "Gunryu" Composed of Autonomous Segments S. Hirose, T. Shirasu and E.F. Fukushima

491

Unified Control for Dynamic Cooperative Manipulation K Kosuge, H. Yoshida, T. Fukuda, M. Sakai, K. Kanitani and K. Hariki

501

LEGGED LOCOMOTION Comparative Study of Adaptive Controllers for a Pneumatic Driven Leg M. Guihard

513

An Efficient Forward Gait Control for a Quadruped Walking Robot D.J. Pack and A.C. Kak

533

The Six-Legged TUM Walking Robot H.-J. Weidemann and F. Pfeiffer Vision-Based Adaptive and Interactive Behaviors in Mechanical Animals Using the Remote-Brained Approach M. Inaba, S. Kagami, T. Ishikawa, F. Kanehiro, K. Takeda and H. Inoue

549

559

MOBILE ROBOT SYSTEMS

Autonomous Sonar Navigation in Indoor, Unknown and Unstructured Environments W.D. Rencken

577

Control and Localisation of an Autonomous Mobile Vehicle M. Adams, N. Tschichold-Garman, S. Vestli and D. yon FlOe

589

The Robotic Travel Aid "HITOMI" H. Mori, S. Kotani and N. Kiyohiro

607

Vision-based Naviation for an Office Messenger Robot T. Gomi, K. Ide and P. Maheral

619

Trajectory Tracking Control for Navigation of the inverse Pendulum Type Self-Contained Mobile Robot Y.-S. Ha and S. Yuta

637

An Outdoor Robots System for Autonomous Mobile All-purpose Plattbrm T. Okui and Y. Shinoda

657

Mechanism and Control of Propeller Type Wall-Climbing Robot A. Nishi and H. Miyagi

669

Automated Guided Vehicles in Japan T. Tsumura

679

ROBOTICS IN MEDICINE Robotics in Medicine P. Dario, E. Guglielmelli and B. Allotta Planning, Calibration and Collision-Avoidance for Image-Guided Radiosurgery A. Schweikard, M. Bodduluri, R. Tombropoulos and J.R. Adler

691

721

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

A new High Performance Multisonar System for Fast Mobile Robot Applications Uwe D. Hanebeck and Giinther Schmidt Department for Automatic Control Engineering (LSR) Technische Universit~t Miinchen 80290 Miinchen, Germany A new type of actively sensing multisonar system has been developed for obstacle detection, localization and map updating in indoor environments. It comprises 24 individual intelligent transmitter and receiver elements arranged in a horizontal plane around a high-speed motion platform. The physical sensors are electronically configured to form various types of virtual sensor arrays for specific tasks and situations. A method for high-frequency firing compensates for the low speed of sound in air. Various schemes for intelligent evaluation of the array outputs provide highly accurate localization estimates. Experiments with the developed multisonar system in structured and cluttered environments demonstrate improved perception capabilities compared to state-of-the-art approaches. 1. I n t r o d u c t i o n

Ultrasonic sensor systems in mobile robot applications serve three major objectives: 9 detecting collisions with unexpected, possibly mobile obstacles, 9 localizing known environmental structures, 9 updating an internal map with previously unknown objects. Different approaches have been reported. One class of systems uses one rotating time-offlight sonar sensing device to obtain a 360 degree scan of the environment [4]. The width of the sonar beam causes a blurred image. Attempts to narrow the beam by phased arrays [2], [6] or horn constructions [3] have also been undertaken. Mechanical positioning makes these systems slow and inadequate for collecting data during fast motion. In another class of systems several (20-30) individual time-of-flight sensors surround a robot. Fixed firing schemes independent of mission and situation are employed [7], [13]. Undesired mutual influences of the sensors result in slow firing rates and make error correction algorithms necessary [1]. Crosstalk effects are eliminated by identifying and rejecting pulses received from other sensors. The fastest system known to the authors achieves a sampling interval of 60 ms for each sensor with a maximum range of 2.5 m [1]. The separate evaluation of sensor outputs results in an unsatisfactory angular resolution. This feature complicates many sensory tasks, like determining the vehicle's own position with respect to a landmark or the position of a moving obstacle. While the second class of systems tries to avoid crosstalk effects, another class takes advantage of these phenomena. Sparse sensor arrays are used and the characteristics of sound propagation are taken into account. This approach permits determination of the normal directions of walls [10], I12] or tracking

of moving obstacles as discussed in [8]. The need for mechanical aiming renders these systems too slow for the intended application. The sensor system discussed in this paper is used with an omnidirectional locomotion platform as shown in Figure 6. For this application, omnidirectional measurement capabilities are required. In addition, the system is expected to concurrently perform various sensory tasks around the mobile robot. For example, localizing of a wall is performed while reliably scanning the current travelling direction. Data has to be provided at a sufficiently high rate to guide a fast (2 m/s) mobile robot with steering commands generated every 10 ms. The proposed system which adopts useful aspects of today's sensor systems is discussed in detail in Section 2. A method to alleviate complicated cross-talk error correction schemes is presented. High-frequency firing schemes compensate for the low speed of sound in air. Furthermore, it is shown how the system is dynamically configured during operation. Section 3 presents algorithms for extracting several relevant environmental features with the proposed system. Section 4 introduces a current prototypical implementation. Experiments with the multisonar system mounted on a fast motion platform are described in Section 5. Compared to state-of-the-art systems the new approach gives improved perception capabilities and simplifies subsequent algorithms like obstacle detection and localization. The techniques presented in this paper apply to what is called conformal arrays in radar technology, i.e. sensor arrays which conform with a given vehicle geometry. 2. S y s t e m O v e r v i e w The proposed multisonar system is designed for operation in office or laboratory type indoor environments. Unknown objects like humans, chairs or trash cans must be detected in a wide surveillance area around the mobile robot even during fast motion. The estimate of the obstacle positions should, nevertheless, be accurate enough to avoid blocking of narrow passages. Section 2.1 details on a method that allows combined firing of many sensors while avoiding cross-talk errors. We assume that static structures such as walls, edges and doorframes are stored in a 2D map as natural landmarks. These a-priori known features are used for localization estimation. A fast-firing method that takes advantage of this prior knowledge is introduced in Section 2.2. The sampling rate is increased far beyond the limits usually imposed by the low speed of sound in air. To allow for several tasks to be performed concurrently, only the most appropriate subsets of sensors are used. This form of dynamic resource management is discussed in Section 2.3. 2.1. M e t h o d of V i r t u a l Point Source Moving with high speed in a cluttered hallway requires early detection of the closest unknown obstacles and the determination of their positions. Usually several sensors are used separately for this purpose. Disturbing crosstalk effects require elaborate error correction schemes. A more appropriate configuration for this application uses one single transmitter with the desired beam form and a receiving array configured of several receivers in order to evaluate incoming echos as reported in [8]. A typical surveillance area is shown in Figure 1 a). Since the objective is to use only transmitters located on the

Figure 1. a) Typical surveillance area. b) Combining physical transmitters to form a virtual transmitter. vehicle surface, this beam shape cannot be achieved with a single element. A virtual transmitter with the desired characteristics is obtained by appropriately phasing several adjacent transmitter elements with the main axis intersecting at the origin, Figure 1 b). For developing an adequate firing scheme, we assume a virtual circular wave which is starting from the origin. When the virtual wave front passes a physical transmitter, an ultrasonic pulse is fired. For M transmitters involved, an index vector I can be defined such that CIi < CIi+l, i = 1, 2, . . . , M -

1,

(1)

with Ci the distance from transmitter i to the origin. The time differences ti between firing transmitters Ii and/i+1 are given by ti =

CIi+~ - CIi V~

, i-

1, 2, . . . , M -

1,

(2)

with V~ the velocity of sound in air. This idea result,, in the following firing scheme: FORi=I,2

..... M-1

~

Fire S e n s o r I i Wait t~ Fire S e n s o r I M

! i

Let Tj denote the time difference between firing of the first sensor 11 and detection of an echo at receiver j, j = 1, 2, . . . , N, where N is the number of receivers. The distances Rj from virtual transmitter to the object and back to receiver j are then given by n j : T j . Ys -Jr-CI1, j • 1, 2, . . . , N . (3)

Figure 2. Fast firing

combining windowing and interlacing.

The number M of neighbouring transmitters employed depends on the desired beam width. Since the physical sensors' main axis intersect at the origin, superposition of the pulses r o u g h l y a p p r o x i m a t e s one pulse of a single centrally located virtual transmitter. Since we assume only one single pulse stemming from the origin, no decision on which physical transmitter produced the pulse detected by an individual receiver is required. Rotation of the beam to another focus of attention is achieved by electronically shifting the sensor arrangement. 2.2. Fast F i r i n g It is common engineering practice to focus on a certain interval around a nominal or predicted measurement value. For example, prior knowledge from a map could be used in conjunction with an estimated position to a-posteriori discard inconsistent measurements. However, we propose to go one step further and use prior knowledge at an earlier stage, i.e. during the data gathering process. Echos are received only in certain precalculated time-slots. This scheme is also applicable to reflectors not represented in the map. In this case, for windowing the distance predicted from previous data is used. Missing an obstacle occluding the expected reflector is impossible because the firing scheme is changed when the predicted time-slot is empty for several times. Due to the low speed of sound in air, the elapsed time between transmitting a pulse and receiving the echo is approximately 3 ms per meter travelled. With longer distances, interlaced transmitting and receiving becomes more and more appropriate. Thus, windowing and interlacing are combined in a highly effective manner. After initializing with a standard firing method, i.e. determining the expected echo return time T~, we receive the echo from the previous pulse after the next pulse has been transmitted. A window of length Tv + TN, with Tv = TN, is centered around the expected time of arrival. The most straightforward scheme for one sensor is shown in Figure 2. The pause time Tpkus~ is given by

Tpkus~ - T~ - 2. Tdead- 2 . Tv -- T N .

(4)

The expected echo return time Tk is updated with the knowledge from every new sample. Td~d corresponds to the dead zone of the transmitter/receiver-combination. Tv = TN = 0.5 ms and Td~ad = 1.5 ms results in the firing rate 1/T~ with

T~ = T E - T d ~ d - Tv

~

d - - " 6 m s - 2 ms. m

(5)

N "1-

)

'~--~ 4 5 5 .i....,

250 C U_

loo i

45!

. . . .

~

t + _ _ • L

i ..........

!....

0.7

i

1

*

2

>

4 Distance d / m

Figure 3. Firing rate for fast firing of first order as a function of distance.

The dependency of the firing rate on distance d is shown in Figure 3. The order of the fast firing scheme is defined according to the number of pulses interlaced in the pause times. With simple first order fast firing, sampling frequencies above 100 Hz are achieved for distances of up to 2 m. Thanks to windowing, the scheme is invulnerable to noise and secondary reflections. It is applicable for d > (2. Tdead -4- 2" Tv + TN)" V~/2, in this case approximately 70 cm. If the tracking process breaks down because the object is moving too fast, the system restarts with initialization measurements. Fast firing of higher order is achieved by interlacing more pulses during the pause times.

2.3. Dynamic Reconfiguration The 24 transmitter and receiver elements arranged on a horizontal plane around the fast mobile robot constitute a homogeneous multisonar system. They are, however, combined to a variety of virtual sensing arrays depending on tasks and situations. This type of operation is similar to a concept described in [5]. Since virtual sensors are aimed electronically, there is no need for moving parts. All virtual sensor arrays consist of N arbitrarily distributed receiving elements. W h e n the angular region of interest is wide, a virtual point source combining several physical transmitters is used to illuminate the scene. On the other hand, when prior knowledge on reflector positions is available, the angular region of interest is narrow and one physical transmitter may suffice. 3. O b j e c t L o c a l i z a t i o n Based on the concept of dynamic reconfiguration, virtual sensors are formed from several physical transmitting and receiving elements. In general, we assume a virtual point source located at the vehicle's origin and an array of N arbitrarily distributed receivers. For the sake of simplicity, only two types of reflecting surface primitives are considered here: i) line reflector ~ walls, objects with small curvature, ii) point reflector --~ edges, doorframes. Both reflectors represent classes of real targets within the 2D sensor plane.

Figure 4. Measurement model for a) line type reflector b) point type reflector.

Given a reflector hypothesis and N measured distances Ri, i -- 1, 2 , . . . , N, an advanced object localization scheme should 9 allow for arbitrary receiver distribution, 9 provide position estimates with high angular accuracy, 9 discard inconsistent measurements, 9 quantify the estimates' uncertainty, 9 use redundancy: N receivers -~ N equations, 9 be valid for near/far-field reflectors, 9 be simple, linear, and non-iterative. The derivation of schemes that fulfill the above set of requirements is given next. 3.1. Line R e f l e c t o r Assuming specular reflection and introducing the mirror image of the transmitter, the following relationship can be obtained from Figure 4 a). R 2 - (C[ - 2x) z 4 - ( C ~ - 2y) 2 , i - 1, 2 , . . . , i

(6)

with c; = c,

= e,

(7)

With r 2 = x 2 + y2 the set of Eqs. (6) is rewritten in vector/matrix form

z+4r21-H'(X)+e_ y

(8)

with

c~ - n~

c~ ci'

c~-n~ z

,H

-

4

1

C~ C~

-

.

c ~ - n~

.

,

1

-

c~ c~

.

(o)

,

i

where e denotes an error vector. In a first step, x, y and r 2 are assumed to be independent variables. Then a weighted least-squares estimate of 3, 9 in terms of 2 2 is given by 9

- ~+4

~

(10)

with a

-

a.z,

G

-

(HTE-IH)

(11) (12)

~-G.1 -1. H T. E -1.

E denotes the weighting matrix. With the relationship ~

,

(13)

the following quadratic equation for 2 2 is obtained 16flTz~ 4 + ( 8 a T f l - 1)~z + a T a -

0.

(14)

Substitution of 2 2 back into Eq. (10) gives the desired solution. The optimal choice for E is derived next by adopting a set-theoretic point of view. The errors Ai in the measurement distances R{, i - 1, 2 , . . . , N, are assumed to be located inside an ellipsoid defined by a symmetric, positive definite matrix SA, i.e. (A,, A2, . . . , AN) S~' (A1, A2, . . . , AN) T

lana's a l)l)r()a('h [5] is s(mwwhat (lose to ours" It(' has tal:('n th(' m()ti(m int() a(-(-()lmt 1)y al)l)lying F F T and its 1)(uio(li('ity against tim('. A new ln('thod w(' 1)lOl)OSC in this 1)al)(U t(,lls 1)('d('st:ians h o m oth(u ln()ving ()l)j(,('ts 1)y utilizing two tyl)CS of 1)('rio(li('ity uni(lll{' t{) th(,m. Th(, first ()1:(' is the pitch (): 1)a('(, of walking. \V(' shall ('all this 1)(':i()(li('it)' against tim(' "t(unl)()ral-fr('q~:('ncy". Th(' oth('r tyl)(' is st ri(l(' of walking whi('h wc ('all "Sl)atial-1)('rio(l"" it is a n('w ('on('('l)t a(l(l('(l t() th(' apl)roach by Polana. The pace of walking at natural Sl)r has its o.wn Ul)l)(U' and low(u limits. So has the walking stride. Thus, t lwsc two featur(,s can represent all the motion of 1)c(h'strial~s. In order to ol)sc:vc t h('sc 1)('ri()di('iti('s in mono('ular images, w(' look at th(' intcmsity diffi'r('m'(' l)etwc(,l~ suc('essivc flam(,s. The l~e:'iodi(" ('hal~g(' of intel:sity l~rought al)out 1)y t h(' motion of t h(, f(,et can 1)(, ()l)s(uv(,(l and is ~s(,(l t() ('stimat(' t h(" t('ml~oralh'('qucnt'y. Tlw aml)lit~td(' of tlw os('illati(m will not aff'('('t th(' tcml)()ral-fr('(l~:(u~('y. ()m'(' the 1)a('(' aim dire('tiol: and distan('c of a m()ving ()l)j(,(-t a:(' knr th(, sl)atial-1)('ri(~(l ('an 1)e 1):'('(li('t('(l. Again, thcs(' 1):'o('c(hu('s a:'(' n()t aff'('('t('(l 1)y vis~al (listurl)am'(' as m('nti(m('(l a])ov('. Pedestrian (h,t(,('tio:~ 1)as(,d on t h(, a l)ove i(h'a is :('aliz('(l l)y tim,(' lm~('('ss('s: 1) m()vingol)j('ct (h't('ction, 2) the t ra('kil~g 1)l'O('('ss, 3) s('l('(ti(m l~y rhythln. In section 2, w(" (lcs('ril)(' t h(' m e t h o d of (l(,t('('ting a m()ving ol)j(,('t and its 1)()siti(m in successive h'alnc data. Then section 3 will l~(, (h,v(,t(,d to a tracking m(,th()(l l)asr on th(' kincmati(" model and ('onsi(h~latioll of its :vlial)ility. Cal('~:lation (,f t(,ml)oral-h('(l,:Cn(y and sl)atial-p('riod as wcll as a mod('l mat('hing t('('hniq~:c ::,ill 1)(, Dilly lnVs('nt('d in s(,('ti(m 4. The CXl)(Uiln('ntal results ol)tain(,d (m an asphalt lmV('d roa(l at(' (h,s('ril)(,(l in s(,('ti(m 5. Th(, final section gives some conch:(ling r(una:ks.

2

MOVING

OBJECT

DETECTION

A moving ol).jcct ('an 1)e detect(,d in m a n y ways such as sul)tra('ti(m of su(('(,ssiv(' frames [6], sul)tra('tion from background imag(', ol,ti('al flows [7] and so on. Sin('(" th(, sul)tractioll of su('c'r frames can 1)e dolw in the sh(n't(,st 1)ro('cssing t inw, w(, al)l)li('d this at the l~cginning of the data pro(cssing. The detail('d dctecti(m ln'O('cdm(' is as folh)ws. 1. Th(' illt('nsity ('hang(' 1)('tween su('('('ssiv(' flam('s is ('ah'ulatc(l and its al)solut(' vah:(,s arc t ransferr('d to the following lnOc('ss as sh()wll ill figlll(' 1. 2. To d(:'fin(" the moving-ol~jcct region a('('urat('ly, a vertical ('(lg('s arc sear('h('d fin and d a t a l~oints :cmoved whi('h hay(, h'ss int(,nsity than a t hrcsh(~M vahw. Th~,se lnOCeSs('s a.rc us(,ful bccaus(' imag('s ~)f"p(,d(,strians at(' mo:'~' likc,ly t.o hay(, vcrti~'al ('(lg('s than arc ()tlwr obj('cts. Th(' t hr(,sh(fl(l int(,nsiti(,s I/'t//el and Iv'nl~2 arc fix('(l so to cut off th(' 99 % of the intensity ('hange Inought h o m the 1)a('k-ground of t lw images. 3. To get th(' l~ott(ml position of t h(, ol)j(,(t :vgion, th(, v(,rti('al and h(n'iz(mtal lnoJ('('tions of the intensity a.rc ('al('ulatcd as shown in figure 2. The ol)je(,t-rcgion width

63

F i g u r e 1. D(,t(,ctit)n ()f a m o v i n g ol)ject

F i g l m ' 2. P o s i t i o n an(l wi(lth on t h e imag('

u'i(lt:xl(;(~.) is d(~t('rmined as t h a t of t h( ~ vcrti('al 1)roj(,('ti()n, a n d tit(, h o r i z o n t a l 1)osition u(A.) p o i n t s th(, (cnt(~r of th(' wi(lth. Th(' v('rti('al 1)()sition ~,(~.) is (h't('rmin(~(l as the 1)()ttt)m of the 1)rojti~(:tioll so t() fi)(.us 111)t)n th(, m()ti(m of t ht ~ t'eet. St'e figurt 92 t'()r (l(,tails of tht' r(qations 1)(~twt~en u'i(//.xtc,.(z.), u(~.) a n d ~'(~.). Her( ~ th(" in(l('x L" s t a n d s for tht, tim(, at whi('h th(' flamt, is ol)s(,rv(,tl. Th( ~ 1)ro('t'ss('s list t'd al)ov(, art, 1)el'fOrln('(| ()vt, r t h(' wh()l(' t)f th(' imagt~s ol)tain(~(l at t h(" v('ry 1)(~ginning of th(, d a t a 1)roct'ssillg. H()w(,v(~r, if tht, 1)t)siti()n of th(, ()l)jt,('t ca.n 1)(" 1)redi('t(~d in tit(' t't)urs(, of t r a c k i n g , only th( , (lata in a s m a l l t l a ( k i n g w i n d o w ('Clltt,lt,(l Olt t h(, ol)j(,('t art, 1)r()('t'ss(,(l. T h i s grt'atly r(~thl('('s 1)r()('('ssing cost a n(l nois(' t'a.llst,(| ])y ()th('r m o v i n g ol)j(~('ts.

3. T R A C K I N G Thtu(, art, tw() k i n d s ()fmt, t ht)(l tt) realiz(' t r a c k i n g . Tit(' first is tt) s('t'k th(" mfitllu 9shal)(" of the t)l)j(-'(-t. Gill)(,rt t't al. [8] h a v e (l(,mt)nstrat(,(l this mt, th()d in t r a c k i n g an airl)lan( ~ using its twt)()rth()g()nal 1)rojt'ctit)llS. Howt, vt'r. this mt,thotl rt,tluircs t ht~ ol)jt,('t to 1)t, rigid an(l is nt)t suital)h' for ()1lr l m r p o s ( '. T h t ~ s('('t)n(l att('tho(l is l)ased ()n t ht' ldnt'mati('s of the ()l).j(,(,t. Lt, gtt,rs anti Y ( m n g [9] hart, a i)l)lit'(l a m('tht)tl whi('h us(,s kintunati(" mo(h,ls an(l t.ra('killg filters to the sinmlatt~(| scent'. Alth()ugh thi.s ln('tho(| is us(~ful a n(l suital)h' f()r ()ur goal, it is n()t 1)ow('rflfl (m()ugh. \V(' h()l)(' th(, t r a c k i n g will (-ontinu(, (,Vt,ll if t h(, m('asur(un('nt ('n('()untt'rs ()utli('rs. Th(' ilnl)r()vt'(l r(~('()v('ry svst(un for th( ~ t ra('king will 1)(, stat('d in sul)s(,('tion 3.2.

3.1. K i n e m a t i c s a n d M e a s u r e m e n t M o d e l B('t'or(' s t a t i n g th(' m('asllr(un(mt or (lata h a n d l i n g t('('hni(llt(', th(' s('tting ()f t h(, vi(h,() ('amtua sht)lfl(l l)(' ('xl)laint'(l. \\-(' assure(' it tt)1)(, a 1)in-h()h' c a m t u a an(l alignt'(l 1)aralh'l tt) th(' r()ad. T h i s makt's it 1)()ssil)h ~ to st't, a m a x i m l m ~ mmfl)('r t)f 1)edt'strians in a vi(,w. Tit(' horiz()ntal axis ()f" vi(,w is also 1)aralh'l tt) th(' ,gr()lln(| s() as tt) ltlak(' 1)(,d('strians hay(, m()rt ' vtuti('al ('(lg('s as (letaih~d in the 1)rt'vi()lls s(,('ti()n. L('t .~'(a.)and !I(L.) giv(' th( ' 1)()siti()n

64

Z 9

,,0

/'-'-'-'-'-'-'-'O ~- / Dl" " ",,

(~,,v(~)

.I

~ ~ 1 ',, ...--" )- ' 4 ~ ~ . / " , ,

ax,

~(X(k) ,y(k) )-Y

X"

Figure 3. ~Ieasurelnent model and ('OOl'(linat(' s3"stelns of a 1)edestrian at tim(' k. The state v(,('tor X(t.) is conll)osed of the position .r(~.) and y(k.) an(l their (lerivatives with respect to tilne .i'(/,,) and /1(i,.), i.e. velocity. Now we suppose a pedestrian to have an ahnost constant velo('ity. Then the estimated 1)osition of the pedestrian at time (k + 1)-Ys will 1)e

(i)

X(k+t) = FX(,.) + w(t.) where

X(t.)=

,i'(t.) y(k) !)(,.)

1 00

,F-

0 0 0

1 0 0

0 0 1 Ts' 0 1

,w(t.)-

i01 ,'t(t.) 0 ,"2(~.)

"

Here Ys is the nornlalized time interval and w(i,.) is calh'd system noise, w(t.) represents the fact that no pedestrian goes exactly straight, and it. is supposed to be a sequence with zero lnean, white noise and Gaussian with ('ovaiiallce Q. The Q is set to be 0.27Tm/s 2, as determined by experiment. The relation between the position in real-world coordinates with that in the linage plane Y(k) can l)e expressed using translation operator hit. ) as

(2)

Y(~'i = hIk'l(X(~'l) + z(k)

where Y(k)

=

~'(k)

, Zlk)

--

-'2(k)

9

The operator h(t.) defines the relative position of tile video camera and the ground (see figure 3) mid is read as

" fc :r(t.) uc + fc--Y(k) sin(O-Tr/2)+h cos(O-Tr/2) h(k.j(X(k.i) =

(3)

b fc (Y(t.) cos(O-Tc/2)+h s i n ( O - r / 2 ) ) fc-Y(k) sin(O-Tr/2)+h, cos(O-Tr/2)

65 where .f(:, h, and 0 correspond to focal length, height an(l depression angle of the ('amera. The optical axis is set at tile center of the image plane and its position is (u(,, vc). a and b are the aspect, ratios needed to correct the rati() 1)(,tween wi(lth and height of the image plane. Z(k) represents the measurement error and again is set to be zero mean, white noise and Gaussian with cova.rimlce R(k.). R(k) del)ends on the distance 1)etween pedestrian and tile video camera: the measurement noise becomes smaller when a 1)edestrian goes farther from the camera. Extended h:ahnan filter (EKF) is used to get the estimated state vector X(~.]~.) and predicted measurement Y(klk-1). X(klk) is the result of the incremental procedure of EKF. The predicted ~r(klk._l) points the center of the window at the next time step as is mentioned in section 2.

3.2. Checking the Validity of M e a s u r e m e n t The pedestrian may be hidden 1)3;other moving ol)jects and noise may lessen the validity of measurement. These facts reduce the reliability of measur('ment and will lead tracking to end in failure. For these reasons, we should make s~tr(, that observation is reliable. If the observation does not satisfy the following ('()nditions, th(' ol)serve(l l)osition Y(~.) is judged to 1)e wrong and should 1)e replaced with th(" 1)redi('ted 1)ositi()n ~r(~.l~._l). 1. Occlusion When the image of a pedestrian ix covered by another moving object, occlusion may arise. In this case, the 'mcas'urcd width of the object may not lie with in the range of the pedestrian width. So. th,c width of the object is a good measure of occlusion. The width of the object in real spacc wid(~.) is determined by applying an inverse perspective transformation to tt,i(11,~l(;(~.) at (u(~.), t'(~.)), and ch(:cki'ng by thc following condition. I'VI > u'id(k) > H2

(4)

where l't'l and H,~ dcnote the maximum and minimum widths of pedestrian respectively. .

Outlier When the noise cause an outlier in th,c measurement of position, the estimation may not be valid. We check the following condition for validity.

[X(k',/~'-1)-s

p(kl/,._l)-I[X(k',~,-l)-- s 2_

(5)

where ~( is a tentative estimation by the mcas'uTvmcnt "Y at I,. P(~I~-J) is the prediction error covariance determined by EKF. This condition means that wh,en the Mahalanobis distance between ~((k'l~'-l) and ~( is greater than 3.0 .) in the chisquare distribution of 4 degrees of freedom, then the measurement ~Y ix not reliable.

4. D I S C R I M I N A T I O N B A S E D ON R H Y T H M Tile rllythm of walking is caused 1)y a two-stage 1)il)edal action: first, a pedestrian stauds still for a relatively long time on both feet" second, oue of the feet steps forward rather swiftly. This is clearly seen in figure 4 as the 1)eriodi(' intensity change of the sul)tracted image data around the feet. In this section we will r the ways of placing a small window which is especially useful for detecting rhythln and ascribing it to walking.

66

Figure 4. Sul)tra('ted images at the fi,(,t of a 1)edestrian. Th(' ('(mtrast was enhan('ed fin. clarity

Fig~lr(, 5. \Vin(h)w fin d(,te(-tion of int(,nsitv ('hauge

4.1. D e t e c t i o n of R h y t h m A small win(low which in useful fin" (h,t(,('ting th(, rh.xthm ()f 1)('(h'strians in 1)1lt within the tra('king window as shown in figure 5. Th(' small win(low in l l l in wi(lth an(l 11~ in height. The actual values of 11, and ll~ will l)(, stated ill s('('ti(m 5. Th(' int('nsitv in this window will oscillate during walking lint th(' (listan('(' 1)('tw('en th(' camera and t h(, pedestrian aftb('ts its a nll)litude. Since the (listan('(' is inv(u'se 1)rol)ortional to the win(l()w size, the total int(msity within tit(" window in divide(l 1)3" the x,~:in(h)w size ill th(' imag(' plane. The norlnalized tilne se(tuen('c of total int(ulsity (~f"the window (l('ll()t,'(l by I(~.) in pa.rti('ularl.x,, suit al)le fl)r detecting the rh.x'thm. To calculate th(, 1)ower sl)e('trum P(.f) flOln I(~.) (1,: - 0 , . . . , / ~ ) for a d m a t i o n ()f :X' 9T.s. seconds, F F T is not slfital)le be('ause res(flllti()n is lilnit('d 1)y t.h(' numl)er of d a t a 1)r in a sampling interval. So, we use the n mxim, ml cntrol)y m(,thod (~IEM) [10] for its 1)etter fl'eqllency resolution. The final 1)l'('dicti(m Crl'()r in used to estimate the (n'(h'r 3I tbr the aut(~regressive lno(h'l for ~IE~I. The f'~uMaln('ntal t elnl)oral-fl(,(l~ten('y F of I(~.) ol)tain('(l in this way is related to the sl)atial-1)(ui()d T 1)y r -

D F - N . T,s,

(6)

where D iel)resents the distan('e of the ol)je('t at _Y. J-:.s'.

4.2. M o d e l M a t c h i n g Before 1)roceeding to the matchillg 1)roc('ss. a whit(, 1lois(, ('(mtl)(ment is r~unove(l flOln the power sl)e('truln ol)tained. Peaks slnalh,r than PrH/~ are set to 1hake no ~'ontril)uti(m. Walking a.t natm'al speed has its own limitatiollS fl)r th(' flmdamental t('ml)()ral-fl('q,Wll('y F and sl)atial-1)eriod T. Let F:~l.:l.\', F~ll:v and T.~I i.\', T~t/,x l)e the Ul)l)er and lower limits of these quantities. The dimensions of tiles(' 1)aramet('rs at(' presented in sul~section 5.1. The (letaih~d process in as follows: 1. To nornmliz(, the power Sl)ectrunl shal)e: a l l / } larg('r than P/'t11,' are smnme(l:

67 ,\:p

(7) /--1

H('r(' ~ is th(' p()w('r at th(, fl'('qll('ll('y f I (/ - 1 , . . . , A ~ , ) . If th('r(' is n() 1)()W('l ~/ gr('at('r than P~H~r no 1)('(lcstrian is in th(' win(low. S(), tit(, ('vahlati()n valu(' 5" is set to () and th(' 1)ro('('ss is lmltcd. 2. F()r ('a('h fr('(lu(ul('y _F!, th(' ('()rr('sl)()n(ling Sl)atial-1)('ri()(l T/ is ('ah'ulat('(l llsing th(' r('latioll 6. 3. The fr('qu(uwy F i is found at which Pi is a m a x i n m m and slwh that

F,~ttx < F/ < F.~t.~x, T,~llx < T / < T.~I ~.x-.

(s)

4. The evaluation valu(' E is t lw normaliz('(l 1)ow('r Pi, i.(,. E - P j / S . 10() 5. If E is grcat('r than t h(" giv(,n yah1(, E/'H t~, w(, (()n(ht(h ~ that th(u(' is a 1)(,(l(,strian in th(' window. ()n th(, oth(,r hand, if E is smalh'r than ErH/r t h(u'(" is n() 1)('(l('strian. ErH I~ is ('Xl)('rim(mtally s(,t so as t() nfininfiz(' th(, dis('riminati()n ('rr()r and th(, vahw is givcll in slll)s('ction 5.2. In th(, r(,al (mvir()nment, irwgular nois(:, will fr('(lll('ntly al)l)car whi('h has almost th(' sam(' Sl)c('trunl as that a ssigne(l t o 1)edestrians. This t('n(huwy is strong if' th(' ()l)s('rvati()n is ('arri(,d out fl)r only. a sh()lt. tiln(,. . To less(u1 th(, ('()ntril)lltion ('alls(,(l 1)v nois(', th(' . 11111111)o1 of o])s(uvations _V should l)e largc. In oth(,r words, the Salnl)lillg 1)(uio(l sh()uhl 1)(' long. Howcv('r, this will 1hake t lw d a t a 1)r()('('ssing tim(' l()ng. As a r(,sult ()i'a vari('tv of ('Xl)erim(ults. we t'oluld 64 d a t a 1)oints to 1)(, ('n()l@l t() lnak(, t h(' disturl)an(-(' smalhu than E/'Hfr 1)rovid('d only tw()('y('h:'s of 1)(u'i()(li(" ('hang(' wh()s(, fr(,(llWn(-y is th(, sam(, as that of walking, at(' a('cid(,ntally m(,rg('(l in th(, (lata.

5. E X P E R I M E N T S 5.1. E x p e r i m e n t a l S y s t e m The algorithm fi)r 1)('(l('strian dete(tion dcs('ril)(,(l in th(, 1)rcvious s('('ti()ns was t('stc(l using imag(' s('(lu(uwcs of an asl)halt paved road ()n th(' university ca nll),ls. Th(' vi(h'() c a m e r a S o ~ g C C D - TRI()00 was 1)la('('d at 1.()m h('ight, with 15.0 ~ (l('l)r('ssi()n angh'. In t lw RGB dcco(l(,(l vi(h'o signal, only tlw G (,Oml)()n('nt was us('d, whi('h has al)out 60(X: of th(" total hmfinosity. The A N D R O X ICS-400X~I9 imag(' 1)ro('('ssillg system ( C P U 68030, OS-9) was (Unl)loy('d with i b m digital signal 1)r()('('ss()rs. The saml)ling of th(' images and t la('king of the ('Xl)('rim('ntal s('t-ul) arc 1)r('s(ultc(l in tal)h,s 1 an(l 2. 5.2. E x p e r i m e n t a l R e s u l t s Two exanll)los of imag(, sequences arc shown in tigmc 6. Th(u(' are six 1)('(h'strians (Nil-El6), two 1)i('.x'('l('s (NIt, :~i~) and a dog (NI~)). The video ('anwra can look at ol)j(,('ts at distan('cs of 3.0 to 20.0m. Figure 7 plots t h(' cstimatvd traj('('tori('s and v(,lo('iti('s of t h(, ol).ic('ts. Th(' time illtcnsity an(l its power Sl)(,('trlml fl)l ('a(-h ol)j(,(t arc shown in figur(' 8. The int('nsity (l(,1)('n(ls on the a ctioll of walking (n ln()titm such as riding a 1)i('y('h,

68

Table 1 Parameters for the exl)eriments Threshold for moving object detection Number of intensity change data Order of AR model Threshold of power Pace model Stride model Pedestrian width model Ankle height model

Ir..i N.ll -

-

11 64

I r H R2 =

1?

15 P r H , =10.0

F.~/i,v = 1.50

F,~,.ux,

TMI,v =0.45

Tal..~x =0.99

ll~l= li,~--

14:2=

0.66 0.13

-2.60

Hz 111

0.13

Ill 111

Table 2 Examples of the standard deviation of measurement noise Distance frolll the video camera 3.0 5.0 10.0

20.0

m

/R(t:)ll

10.4

7.5

3.2

0.5

1)ixel

R(t,)22

4.9

3.9

2.2

0.7

1)ixel

o1" the running of a dog, as well as on the intensity diff('ren('(, l)etw(,en t lw ol)ject and the background. The pedestrians MI-.NI~i show 1)(,rio(lie change of intensity, but nonpedestrians lklr-M9 exhibit no periodicity. The estimated 1)ace, stride, evahmtion value and real pace for each object are talmlated in tal)h, 3. Th(' real pace for the pedestrialls is that of an average over 10 steps of walking. \Vith regard to non-pedestrian, pace and stride are estimated kom the frequency which has the maximum power. The estimated pace for non-pedestrian Mr is very (,lose to the lower ])omld of F, i.e. FM~A'. However, the stride is estimated to be about 2m which is not within the range of pedestrians (0.45-0.99m). So, we can tell Mr is not a pedestrian. Indeed, the evaluation value for 1~I7 is equal to zero.

The polmlation of the evaluation value E for 533 1)edestrians is shown in figure 9(a) and for 109 non-1)edestrians in figure 9(b). Of tht' 1)('d(,strians 82cZ~.wear trousers, 11% are in knee-length skirts or short 1)ants and 7(X, art' in long skirts. Of tht, non-pedestrians 807c, are bicycles, 18% (logs and 2c/(:. swaying 1)lants. Figure 9(a) shows that evaluation values E for 1)edestrians have a peak at the right side of the figure. On the other hand, for non-pedestrian there is a peak on the left. If the threshold value E'rHI~ is set to be 10, 94.0% of pedestrians and 95.40/(,, of non-pedestrians are corre('tly recognized. This m('ans that 94.2~: of all moving objects are correctly identified. The reasons for error will be summarized as follows. For pedestrians, 1) jittering of the observed images disturb the detection of the rhythnl of walking, 2) with swaying 1)lants often al)l)ear in the tra('king window, 3) walkers light-coh)red shoes diminish the contrast between the images of feet and the paved road. Reasons (1) and (2) are also true for n~ Jitters between successive fram('s is the main cause of error.

69

Figure 6. hnage sequences for.the eXl)eriments

6. C O N C L U D I N G

REMARKS

\Ve have I)roposed a new method for discrimination of pedestrians fl'om other moving objects based on the rhythm of walking. This method is achieved by the following three processes: moving-object detection in successive frames, tracking of moving objects using kinematic models and measurement applying extended 1,2ahnan filter, and searching for the walking rhythm in the power spectra of intensity of the images. These processes arc realized and executed in a realtime operation. The experiment has shown 94.2% of moving ol)jects to 1)e correctly , 6 0 t l , , , , , , ,

,

60

oo 40

c-

,

,

,

,

,

,

,

,

,

60

i

,

0 0

'

40

40

20

20

,

,

,

,

' 2

'

,

,

,

,

,

(D

__E 20 0"()

'

601_,

'

' :2 '

, 1, ,

,

'

' 4s

,

, ,

0 0

o~ 4020 0

2

(~[~)

4

i-i,

6

,a'

0

Hz

,

~ 60

--(oo-9 40 __E 20

20

;

O"

' I-I

' '

~-o ~ 1000I

' 2

'

'

'

'

'

'

'

'

;,s'

i'

2

,

'

0

-

4

(x[._,)

",'

,

6

Hz

'

,

i

(~[,~) '

, :

4

0

(3 Hz

'

'

'

'

'

'

'

4

0";

'

'

' 2 '

'

' ~,s I

60

.'.',,., 2

'

'

60

,

2

,

,

4

~ ,

(x[:~)

,

,

,

,

6

Hz

,

,

4s '

'

'

0

'

2

'

'

'

0 0

4~f,

0

2

4s

'

,-

4

'

I

I

t

O

6 Hz

(~[~-,)

'

'

'

I

I

I

2 ~-I

o

1000

140,

'

4s

100

20~

0

20

2011

I'

::4~

~

10I~,

2

,

40

0 0

A t

Oh--'b-'~

o~

0

, i

' 4s

1 24~

40

(D

60

60i,

'~176176I

601~ , , , , ,

>,

4s

2 0 0 0 ~ ~ ~ r ~

B_

0

2

60~,

2

,

,

4~

,

v 0

I

.

~

0

(~[.) ,

4

6 Hz

,

,

,

I

I

I

2 I

.I

I

I

,

4

1

4s I

J

11

&.

0

0

2

(~[:)

4

6 Hz

0

0

2

4

I- I

6 Hz

F i g u r e 8. I n t ( ' n s i t y ('haltgc a n d p o w e r Sl~(,('tllun ~ff' c a ( h ol~j(,(t

0

2

4

6 Hz

72

100 104 o

2oo7 1 94 OO/o

I [

100-

e-

t_ U.

0

10

50 Evaluation Value (E)

90 100

(a) Pedestrians

0 10

50 90 100 EvaluationValue (E) (b) Non-1)edestrians

Figure 9. Histograms of evaluation value

pp. 384-391, 1994. 2. G. Johansson, "S1)atio-Teml)oral Differentiation and Integration in Visual Motion Perception", Psych. tt,'s., Vol. 38, Pl). 379-383, 1976. 3. L.T. I(ozlowski and J.E. Cutting, "Recognizing the Sex of a Walker fi'om a Dynanfic Point-Light Disl)lay", Perception & Psychophysics, Vol. 21, No. 6, pp. 575-580, 1977. 4. R.F. Rashid, "Towards a. System for the hlterl)retation of Moving Light Disl)lays", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-2, No. 6, pp. 574581, 1980. 5. R. Polana, "Detecting Activities", Pro('. of IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, pp. 2 7, New York, 1993. 6. M. Yachida, M. Asada and S. Tsuji, "Automatic Analysis of Moving hnages", IEEE Trans. Pattern Anal. ~[ach. Intell., Vol. PAMI-3, No. 1, pp. 12-20, 1981. 7. A. Shio and .I. Sklansky, "Segmentation of Peol)le in Motion", Proc. of IEEE Workshop on Visual Motion, pp. 325-333, 1991. 8. A.L. Gill)ert, M.G. Giles, G.M. Flachs, R.B. Rogers and Y.H. U, "A Real-Time Video Tracking System", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-2, No. 1, pp. 49-56, 1980. 9. G.R. Legters, Jr. and T.Y. Yomlg, "A Mathematical Model for Computer hnage Tracking", IEEE Trans. Pattern Anal. Math. Intell., Vol. PAMI-4, No. 6, pp. 583594, 1982. 10. D.F. Elliott (ed.), Handbook of Digital Signal Processing Engineering Applications, pp. 701-740, A('ademic Press, 1987.

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

73

Visual Recognition of Obstacles on Roads Uwe Regensburger and Volker Graefe Institut ftir MeBtechnik Universit~it der Bundeswehr Mtinchen Neubiberg, Germany Fax (+49 89) 6004-3074

As part of a driver support system for motor vehicles on freeways an obstacle recognition system was developed within the EUREKA project PROMETHEUS over the last four years. The obstacle recognition system uses a single monochrome TV camera and a multi-processor robot vision system. The recognition system includes a road tracker and an obstacle detector that were described in previous publications, and an object classification and tracking module presented here. The module is based on generic 2-D object models. It recognizes obstacles in real time and from distances of 200 to 300 m, sufficient for high-speed driving. The module was extensively tested in real-world scenes on the German Autobahn and on various other roads, including city streets with heavy traffic.

1. INTRODUCTION A driver support system for road vehicles is a great challenge for the technology of machine vision. The goal is to relieve the driver from monotonous tasks, to warn him of imminent danger (for instance, if he is about to fall asleep), and - possibly- to assume control of the vehicle in precarious situations. The necessary technology is being developed in numerous laboratories all over the world; there is hope that it will eventually lead to a significantly increased traffic safety [Masaki 1992]. Among the characteristics demanded of such a system are high reliability, low costs, and independence of any special infrastructure. Within the European research project PROMETHEUS the Institute of Measurement Science has developed real-time vision technology that may be used for a driver support system [Graefe 1993 ]. Freeways were chosen as the principal domain for testing and demonstrating the visual recognition of objects that are relevant for the understanding of traffic situations. The reason for choosing freeways, in spite of the typically high driving speeds, is that the complexity of the traffic situations and the variety of objects are much lower on freeways than on other roads. Obstacle recognition is an essential subtask of a driver support system. (In the sequel "obstacle" refers to any visible object on the road in front of the ego vehicle that is, or could possibly become, a danger for the ego vehicle.) According to a concept for obstacle recognition proposed by [Graefe 1989] the task should be decomposed into several largely independent subtasks, or modules, including the initial detection of possible obstacle candidates, the classification of obstacle candidates into false alarms (caused e.g. by shadows), and real physical objects, the tracking of obstacles, etc.. A module for detecting obstacles in distances adequate for high-speed

74 driving was introduced by [Solder 1992] and [Solder, Graefe 1993], and road modules by [Tsinas, Graefe 1992] and [Wershofen 1992]. A multitude of methods for recognizing obstacles in monocular image sequences has been described in the literature. Essentially three different classes of methods have been used. Gestalt approaches, e.g. [Davies 1990], aim at a direct reconstruction of the 3-D world as a basis for object recognition. Other approaches, e.g. [Enkelmann 1990], [Carlsson, Eklundh 1990] and [Young, G. et al. 1992] evaluate displacement vector fields and the differences in image motion between points within the road surface and points above the road surface. Due to the large amounts of computation required by these methods, they have not yet been demonstrated in real time and, thus, do not appear suitable for vehicle applications. The third class of methods is based on object models. According to this approach objects are recognized by certain combinations of visible features. This may be accomplished in a number of different ways. As one example, [Moil et al. 1989] and [Raboisson, Even 1990] first segment the image. Irregularities within the road segment are then taken as indications for objects. Additional criteria (edges, color, etc.) are used to minimize the frequency of false detections, while additional sensors, such as laser range finders [Alizon et al. 1990] or radars [Young, E. et al. 1992], may be used to measure the distance from an object. [Korn 1989] bases the tracking of vehicles on histograms of edge orientations. [Seitz et al. 1991] use a hierarchically structured multi-layer representation of edge directions for building object models and recognizing objects in images. [Hartmann, Mertsching 1992] describe object images by a "hierarchical structure code." [Ktihnle 1991] and [Zielke et al. 1992] track vehicles on the basis of edges and symmetry. [Graefe, Jacobs 1991], [Efenberger et al. 1992], [Solder 1992], and [Solder, Graefe 1993] detect and track vehicles and obstacles on roads by comparing generic 2-D object models with edges detected in image sequences. [Thomanek, Dickmanns 1992] and [Schmid, Thomanek 1993] use spatio-temporal models for tracking objects.

2. THE CONCEPT OF OBSTACLE RECOGNITION An object-oriented concept for dynamic robot vision has been proposed by [Graefe 1989; 1992]. It is based on the notion that the world is structured, rather than monolithic, and consists of individual objects. Only a few physical objects in a robot's environment affect the control of the robot in any instant. Following this concept, the task of object recognition should, therefore, be structured according to the relevant objects. This is done by associating one individual object process within the vision system with each relevant external object. The following object processes are needed for recognizing obstacles on roads (Figure 1): 9 Recognition and tracking of the road (because only objects that are actually sizuated on the road are relevant as obstacles) 9 Detection of obstacle candidates (the corresponding object is either a hypothetical object that has not yet been found in the image, or an object that has just been detected) 9 Tracking of an obstacle; multiple copies of this process may exist in the system to allow multiple obstacles to be tracked simultaneously.

75

Video section

V Road recognition

V

0otection II tr. ,o0

V Obstacle

tracking

L _ _ '_--'-'-- . . . . . _; Situation assessment

Vehicle control

Figure 1 Structure of an object-oriented obstacle recognition system Each object process receives image data (pixels) from the video section of the vision system and continuously generates a description of its assigned object. All object descriptions are evaluated by the situation assessment module and combined into a situation description. Depending on the perceived situation, vehicle control commands or messages for the human operator may be issued. Step by step the information contained in the pixels is transformed into more and more abstract forms. Starting with a large number of pixels a chain of data fusion processes eventually leads to a description of an object or a situation. At each step specific knowledge in the form of appropriate models is introduced into the process (Table 1). At the first step, the feature extraction, groups of pixels are combined into features, such as edges or corners. Knowledge about characteristics of the camera and of the features being searched is used in this process. Feature extraction may be supported by the next higher level in the processing hierarchy, the 2-D object level. Here, knowledge about the visual appearance of specific physical objects is used to generate hypotheses regarding the nature of the object being seen and expectations as to which of its features should be visible, and where in the image they should show up. Feature extraction is thus not a form of schematic image processing, but part of a model- and knowledge-based hypothesis verification or falsification process; its task is to provide answers to very specific questions as to the presence and location of certain expected features in the image. A very robust image interpretation may be realized by such an interaction between the feature extraction and the 2-D object levels. It is possible to add additional 3-D or 4-D (spatio-temporal) levels if a description of the 3-D shape of physical objects, or of their motions, is desired.

76

Table 1 Levels of Modeling and Data Fusion Model Level

Data Fusion Performed at that Level

groups of pixels feature

-, elementary features complex features

2-D object

groups of features 2-D objects

4-D object

environment and situation

2-D objects, time, sensor data 4-D objects robot state, external objects -- situation

3. THE FALSE-ALARM PROBLEM

For reasons of safety an obstacle detection system must be perfectly reliable in the sense that it never fails to detect any obstacle that is actually present. Moreover, in order to allow a timely reaction, obstacles must be detected while they are still far away. These requirements make it practically unavoidable to generate false alarms occasionally, caused, for instance, by shadows, dirt, markings or patches on the road surface. Although false alarms are not directly dangerous they must be processed and, thus, constitute an unnecessary load for the vision system. To minimize this load, false alarms should be identified and discarded as quickly as possible. Distinguishing between actual obstacles and shadows is easy in many cases, but sometimes it can be extremely difficult. We use a set of heuristics to reject most shadows within the detector module as quickly as possible. Therefore, all alarms reported by the detection module are classified by matching the visible features that have been detected in the image against generic 2-D models of obstacles. If a fairly good match is possible, the obstacle candidate is accepted as an actual obstacle. This decision is final. If, on the other hand, the detected features do not match the expectations derived from the model, the obstacle candidate is considered a possible false alarm. A final rejection, however, does not occur at this point. Rather, the detector is requested to report the same obstacle candidate again if it can be detected in one of the subsequent images. Due to the motion of the vehicle the subsequent images will differ from the previous ones in respect of perspective, specular reflections, occlusions, etc., and, accordingly provide an independent chance to recognize an actual obstacle if there is one. A final rejection thus occurs only if an obstacle candidate has been classified many times and has not been accepted even once.

77 If the time required for one cycle of detection and rejection is short, many independent images may be analyzed while approaching an obstacle candidate. The probability of rejecting an actual obstacle is then very small. On the other hand, it is possible that once in a while false alarms are accepted as an obstacle. They must be rejected during tracking.

4. GENERIC 2-D OBJECT M O D E L S 4.1 Effects of Distance on the Visual Appearance

As Figure 2 shows, the visual appearance of an object depends strongly on the distance from which it is seen. Moreover, for a given camera and object the area of the object's image is inversely proportional to the square of the distance. Images of nearby objects are clear with large contrast and many visible details. Many features may, therefore, be detected in such an image. A lateral motion of the object causes a significant motion of its image. Images of distant objects have a low contrast and do not display much internal structure. Their contours tend to be somewhat blurred. Because of their small size and low contrast hardly any internal features can be detected. Rotation of the camera is the dominating cause for motions of objects in the image. In an intermediate range of distances the contours of objects tend to be clearly visible, but the object images do not have much internal structure. Motions of the camera and of the objects have similar effects on the motions of the image. Because of these differences specific models are used to model the appearance of physical objects, depending on their distance. The models include a number of free parameters in order to let them match objects of different shapes. Therefore, they are called generic object models.

4.2 The 2-D Model for the Nearby-range

Most of the obstacles that exist on freeways display a nearly rectangular contour with vertical and horizontal sides when seen from an approaching vehicle. Therefore, parts of rectangular contours are searched in the image.

Figure 2 Different 2-D object models (top) take into account the significant differences in the appearance of nearby (left) and far away (right) objects.

Characteristic points on the contour that may be used for an object description are the corners or the centers of the sides of the rectangle. The upper part of an obstacle is often seen before a structured background, such as trees, mountains, or other vehicles; this makes it difficult to locate the upper part of the contour in the image. Usually numer-

78 ous horizontal edges are visible, and it is not easy to determine which one is the upper edge of an obstacle. The lower part of the contour is usually seen before the surface of the road and may, therefore, be detected and tracked more easily. Recognizing only the left, right, and lower edge is sufficient for tracking the object in the image and for measuring its distance. Therefore, only these edges are looked for in the image. Nevertheless, the large number of features that are often visible in the images of nearby objects makes it difficult to single out those belonging to the contour of the object. A statistical approach was chosen to solve this problem. The image of a typical object contains a number of homogeneous areas, delimited by approximately vertical edges. Many of these homogeneous areas begin, or end, near the le~ and right contours of the object. In an area of interest that encloses the image of the object candidate, the le~ and right ends of homogeneous areas, and their centers, are counted in each image row. Local maxima of the distribution of these features indicate a possible location of the object contour and its center line. Combining these yields strong expectations where the contours of the object should be found in the image. Vertical edge detectors based on controlled correlation [Kuhnert 1986] are then applied in the vicinity of the expected contours to find the exact locations of the contours. Generally, the lower edge of an object is distinctly visible. In case of vehicles the lower edge of the shadow underneath the vehicle is taken as the lower edge of the object. It may usually be detected reliably and robustly because the shadow appears much darker than the pavement. 4.3 The 2-D Model for Distant objects

Images of distant objects are typically small and of low contrast (Figure 2). Only few features may be extracted; therefore, a statistical approach is not feasible for recognizing such objects. The left, right, and lower edges are detected directly by applying suitable edge detectors. Due to the large image motions induced by unavoidable camera motions, the tracking of distant objects would not be robust if it were based only upon a few local edge elements. Using additional features, such as corners and symmetry, improves the robustness. In contrast to the near range where, due to the high spatial resolution of the image, object contours rarely contain any sharp corners, corners may be detected robustly in the far range. Specific corner masks are used for detecting the lower left and right corners of objects by the method of controlled correlation. The images of many distant objects, including the rear-views of vehicles, have a vertical symmetry axis. Symmetry is, therefore, included as an additional feature in the object model. In the near range symmetry would be difficult to detect because of the many visible details, and finding a symmetry axis would be time-consuming due to the large areas in the image that must be processed. In the far range, where object images are small and do not contain many features, symmetry detection is fast and robust.

5. DETECTING VERTICAL SYMMETRY AXES Numerous methods for detecting and exploiting symmetry have been proposed. If object contours have been determined and converted into B-splines, symmetries may be extracted according to [Saint-Marc, Medioni 1990]. A method for detecting symmetries of polyhedrical objects has been described by [Jiang, Bunke 1990]. Shape recognition based on symmetry was introduced by [Yuen 1990]. [Friedberg 1986] determined symmetry by computing a moment matrix. A known

79 symmetrical object may be detected by convolving its image with a mirror image of a model; atter detection it may be localized by convolving it with its own mirror image.

Figure 3 Pairs of pixels, centered around a pixel, x, are evaluated. For each pair the difference of the grey-levels is computed. The absolute values of the differences are added up. Their sum is the symmetry function at the center pixel x.

All these methods presuppose a good segmentation of the image (e.g. into object and background) or have been tested only on synthetic images or line drawings. Moreover, they require computing times of tens of seconds.

Methods that are suitable for real-time and real-world applications have also been proposed. [Zielke et al. 1992] determine for each pixel a symmetry measure, based on the even and the odd part of the symmetry function. A symmetry-based statistical approach has been used by [Ki~hnle 1991] to determine the center of the image of a car as seen from the rear. Histograms of symmetrical contour pixels, of symmetrical grey-level pixels, and of the lengths of horizontal edges are computed; local maxima in all three histograms indicate candidates for a vertical symmetry axis [Marola 1989]. In the sequel a method is described that detects individual symmetry points and then determines image columns containing a maximum number of symmetry points. For finding a symmetry point the even part of the symmetry function is determined. To accomplish this the similarity of greylevels of pairs of pixels, located symmetrically to the left and to the right of a point in question, is studied (Figure 3). The absolute values of the differences of the grey-levels of each pair of pixels are added up and assigned to the pixel in question. A symmetry function is obtained by repeating this procedure for all pixels within an area of interest. Local minima within a row of the symmetry function indicate symmetry points; in Figure 4 the symmetry points have been marked. Figure 5 shows the corresponding symmetry function, computed over a 40 pixels by 40 pixels area of interest around the vehicle. When symmetry points are searched, the width of the environment upon which the symmetry function is based should be somewhat larger than the width of the object. However, if the

Figure 5 Figure 4 Symmetry points determined from 12-pixel wide environments

The symmetry function in an area of interest covering the car in Figure 4; each graph corresponds to one image row.

80 environment is chosen too large, many background pixels are included in the computation. A strongly asymmetrical background may then severely distort the symmetry function.

6. EXPERIMENTAL RESULTS 6.1 Test Environment

A real-time vision system BVV 3 [Graefe 1990] was used for testing the obstacle recognition system. Each of the 3 sub-processes, road recognition, obstacle detection, and obstacle tracking, was implemented on a separate parallel processor within the B VV 3. The obstacle recognition system was designed to cooperate closely with other modules of a driver support system, for instance modules for modeling the dynamics of the recognized obstacles. Such a cooperation has been tested earlier and described by [Dickmanns, Christians 1989] and [Regensburger, Graefe 1990]. The tests reported in the sequel were, however, performed without any additional modules. The results may, therefore, be directly attributed to the modules described here. The situation assessment module (cf. Figure 1) was replaced by a message passing system that allowed communication between the recognition modules, but did not perform any additional processing. 6.2 Recognition Distance

The ability to recognize obstacles from a great distance is of critical importance for driving at high speed. Experiments have shown that cars (moving and resting) can be tracked in distances of about 200 m; under cooperative conditions, e.g. high contrast and little camera motion, cars could be tracked in distances of over 300 m. As Figure 6 shows, a car in a distance of 300 m appears rather small in the camera image. The obstacle tracking module does not include any distance measurement. The distance from the obstacle was, therefore, measured by external means. Visible markers were placed on the pavement in known intervals. As this was impossible on a public road, the runway of a closed airfield was used for the experiments to measure the maximum recognition and trackingrange,

Figure 6 Passenger car seen from a distance of 300 m through the camera of the vision system, with overlaid markings showing the detected contour and symmetry axis

6.3 Robustness

Although the obstacle recognition system was designed for freeways, due to the model-based tracking module it proved to be so robust that it was also tested on ordinary highways and furthermore in heavy city traffic. There, other cars, as well as the own car, changed lanes frequently causing time-varying occlusions, and together with structured backgrounds and markings on the pavement generated a high degree of visual complexity. Even in such difficult scenes (Figure 7) vehicles could be tracked reliably over extended periods of time.

81 The 2-D object models employed by the recognition system assume a nearly rectangular object contour and they have been optimized especially for cars and trucks. Nevertheless, it could be shown that various other objects that may occasionally be seen on roads, such as boxes, barrels and, in some cases, persons, may also be classified and tracked. However, tracking such objects that do not match the generic models well is less reliable, especially under conditions of poor lighting or when they are far away and their images are small. Increasing the repertoire of different object models in the system should further improve the system performance in this respect.

6.3 Speed Figure 7 The computing time required by a module of Tracking a vehicle in heavy city traffic a real-time vision system is always of prime importance. The cycle time of the object classifier and tracker was, therefore, measured in a number of experiments. One parallel processor of the robot vision system BVV 3 using an Intel 386/20-MHz CPU, augmented by a special co-processor for fast feature extraction [Graefe, Fleder 1991], was used as a hardware platform in the experiments. The basic cycle time of the vision system is the image field period of the camera's video signal, i.e. 20 ms. Thus, all cycle times are integer multiples of 20 ms. Tracking a distant object requires always less than 20 ms of computing time; the cycle time is in this case 20 ms. The time required for tracking a nearby object depends on the size of its image (see Table 2). On freeways the cycle time is usually between 40 and 60 ms. Only when the distance to the preceding vehicle is exceptionally short, e.g. in stop-and-go traffic, a cycle time of 80 ms may be observed. In such situations it would be preferable anyway to evaluate the image of a wideangle camera instead of the telescopic image used, because the lower edge of the vehicle may not even be visible if a vehicle in such a short distance is imaged by the telescopic camera.

Table 2 Cycle times for tracking nearby-objects Object width

Cycle time

< 45 pixels

40 ms

45 - 90 pixels

60 ms

90 - 120 pixels

80 ms

82 6.4 Reliability of Classification

Parameters such as scene complexity or image quality that are essential for judging the performance of the tracking module are difficult to define quantitatively. Therefore, long image sequences covering a wide variety of situations were evaluated. A 20-min run on a freeway may be used as an example. It included 60,000 images processed in real time without any preselection. Goal of the classification of obstacle candidates is the discrimination between false alarms and actual obstacles. It should be remembered that primarily it must be avoided to ever mistakenly reject any actual obstacle as a false alarm. (The other type of misclassification, accepting a shadow as an obstacle, is much less dangerous.) Table 3 shows the results obtained during the test. All objects that were reported by the detector were correctly classified, and only a very small fraction of the system resources were wasted tracking false alarms. Table 3 Results of object classification during a 20-min run on a freeway

Number of actual objects misclassified as false alarms Percentage of false alarms that were correctly rejected

0

immediately

67%

within a few seconds of tracking

33%

Number of final misclassifications

0

Fraction of time spent tracking false alarms

1.3%

6.5 Quality of Tracking

Because the true motions of the objects tracked in natural scenes were not measured, it is difficult to determine the quality of the tracking, or the accuracy of the localization of object features, quantitatively. Figures 8 and 9 are intended to give at least a qualitative impression of the quality of tracking. The ranges of applicability of the two models, for the near range and for the far range, overlap to a large extent. This allows a direct comparison between the two models by using an object in the overlap range. In each of the Figures, 8 and 9, the left graphs show the image coordinates of the two sides of a car driving in front of the vehicle with the camera. The right graphs show the width of the car's image in pixels. The scene was recorded on video tape while driving in a car, and then evaluated in real time in the laboratory, using the methods for the near range (Figure 8) and for the far range (Figure 9) on the same analog data. Both methods yield nearly identical results. The method for the far range yields slightly smoother curves (in both cases raw, unfiltered data are shown). This is probably due to the robust localization of the lower corners. Altogether both curves indicate that the unfiltered measured data are consistent and free from significant outliers, in spite of the image deterioration caused by the video tape recorder. This may be taken as evidence for a robust tracking.

83

image column 240

/

width of object (pixels)

"'-~ ,,,~_

"~--- I " ~

60

200

-- _ ---"---

\ 40

160 , 120

20 0

100

200

300 400 program cycles

0

100

200 300 400 program cycles

Figure 8

Results of tracking a car in an intermediate distance range using the model for the far range (1 program cycle is equivalent to 20 ms)

image column

width of object (pixels)

240

200

F'-

6(1 ~ ' ~

9, - , - ~

n r -

-

4()

160 /

"~1.. " ' , v ~ ~

120 0

100

200 program cycles

2C

0

100

200 program cycles

Figure 9

Results of tracking a car in an intermediate distance range using the model for the near range (1 program cycle is, on average, approximately equivalent to 40 ms)

7. SUMMARY As an essential part of a driver support system for road vehicles a module for the robust classification and tracking of objects on roads has been realized. Obstacles are recognized in real time and from great distances (200 to 300 m) as required when driving at high speed on freeways. The obstacle recognition module was originally designed for the relatively simple scenes on freeways, but it has proven sufficiently robust to allow the tracking of moving vehicles even in heavy city traffic. The module is based on generic 2-D models optimized for the rear views of cars and trucks. However, the genetic models are sufficiently general to also allow the tracking and classification of various other objects that may occasionally be encountered on roads. For a truly reliable recognition of arbitrary objects the repertoire of internal object models should be expanded.

84

ACKNOWLEDGEMENT Part of the work reported has been performed with support from the Ministry of Research and Technology (BMFT) and from the Daimler-Benz AG within the EUREKA project PROMETHEUS.

REFERENCES Alizon, J.; Gallice, J.; Trassoudaine, L.; Treuillet, S. (1990): Multi-sensory data fusion for obstacle detection and tracking on motorway. In Monique Thonnat (Ed.): Proceedings of the Workshop on Vision (Prometheus, Pro-Art). Sophia Antipolis, pp 179-187.

Carlsson, S.; Eklundh, J.-O. (1990): Object detection using model based prediction and motion parallax. In Monique Thonnat (Ed.): Proc. of the Workshop on Vision (Prometheus, Pro-Art). Sophia Antipolis, pp 139-145. Davies, E.R. (1990): Machine Vision: Theory, Algorithms, Practicalities. London: Academic Press, 1990. Dickmanns, E.D.; Christians, Th. (1989): Relative 3-D State Estimation for Autonomous Visual Guidance of Road Vehicles. Proceedings, Intelligent Autonomous Systems. Amsterdam. Efenberger, W.; Ta, Q.-H.; Tsinas L.; Graefe V. (1992): Automatic Recognition of Vehicles Approaching from Behind. Proceedings of the IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 57-62 Enkelmann, W. (1990): Obstacle detection by evaluation of optical flow fields. In Monique Thonnat (Ed.): Proceedings of the Workshop on Vision (Prometheus, Pro-Art), Sophia Antipolis, pp 146-166. Friedberg, S.A. (1986): Finding Axes of Skewed Symmetry. Computer Vision, Graphics, and Image Processing, 34, Nr. 2. San Diego: Academic Press, pp 138-155. Graefe, V. (1989): Dynamic Vision Systems for Autonomous Mobile Robots. Proceedings, IEEE/RSJ International Workshop on Intelligent Robots and Systems, IROS '89. Tsukuba, pp 12-23. Graefe, V. (1990): The BW-Family of Robot Vision Systems. In O. Kaynak (ed.): Proceedings of the IEEE Workshop on Intelligent Motion Control, Istanbul, pp IP55-IP65. Graefe, V. (1992): Vision for Autonomous Mobile Robots. Proceedings, IEEE Workshop on Advanced Motion Control. Nagoya, pp 57-64. Graefe, V. (1993): Vision for Intelligent Road Vehicles. Proceedings, IEEE Symposium on Intelligent Vehicles. Tokyo, pp 135-140. Graefe, V.; Fleder, K. (1991): A Powerful and Flexible Co-processor for Feature Extraction in a Robot Vision System. International Conference on Industrial Electronics, Control, Instrumentation and Automation (IECON '91). Kobe, pp 2019-2024. Graefe, V.; Jacobs, U. (1991): Detection of Passing Vehicles by a Robot Car Driver. IEEE/RSJ International Workshop on Intelligent Robots and Systems, IROS '91. Osaka, pp 391-396. Hartmann, G.; Mertsching, B. (1992): A Hierarchical Vision System. Proceedings, IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 18-23.

85 Jiang, X.Y.; Bunke, H. (1990): Detektion von Symmetrien polyedrischer Objekte. In GroBkopf (Hrsg.): Mustererkennung 1990, 12. DAGM-Symposium, Oberkochen-Aalen, InformatikFachberichte 254. Berlin: Springer, pp 225-231.

Korn, A. (1989): Zur Erkennung yon Bildstmkturen durch Analyse der Richtungen des Grauwertgradienten. In Burkhardt et al. (Hrsg.): Mustererkennung 1989, 11. DAGM-Symposium, Hamburg, Informatik-Fachberichte 219. Berlin: Springer, pp 507-511. Kiihnle, A. (1991): Symmetry-based recognition of vehicle rears. Pattern Recognition Letters, Volume 12, Number 4. North-Holland, pp 249-258. Kuhnert, K.-D. (1986): A Model-driven Image Analysis System for Vehicle Guidance in Real Time. Proceedings of the Second International Electronic Image Week. CESTA, Nice, pp 216-221. Marola, G. (1989)" Using Symmetry for Detecting and Locating Objects in a Picture. Computer Vision, Graphics, and Image Processing, Vol. 46, Nr. 2, San Diego: Academic Press, pp 179-195. Masaki, I. (1992): Vision-based Vehicle Guidance. I. Masaki, editor. New York: Springer 1992. Mori, It.; Nakai, M.; Chen, H. (1989): A Mobile Robot Strategy: Stereotyped Motion by Sign Pattern. Preprint of the 5th International Symposium of Robotics Research. Tokyo, pp 225-236. Raboisson, S.; Even, G. (1990): Road Extraction and Obstacle Detection in Highway Environment. 2nd Prometheus Workshop on Collision Avoidance (CED 3). Coventry, pp 74-83. Regensburger, U.; Graefe, V. (1990): Object Classification for Obstacle Avoidance. Proceedings, SPIE Symposium on Advances in Intelligent Systems. Boston, pp 112-119. Saint-Marc, Ptt.; Medioni, G. (1990): B-Spline Contour Representation and Symmetry Detection. In G. Goos, J. Hartmanis (Ed.): Computer Vision- ECCV '90, Lecture Notes in Computer Science, Vol. 427. Berlin: Springer, pp 604-606. Seitz, P.; Lang, G.K.; Giiliard, B.; Pandazis, J.C. (1991): The robust recognition of traffic signs from a moving car. In Radig (Ed.): Mustererkennung 1991, 13. DAGM-Symposium, Informatik-Fachberichte 290, Mtinchen, Springer, pp 287-294. Schmid, M.; Thomanek, F. (1992): Real-time Detection and Recognition of Vehicles for an Autonomous Guidance and Control System. Pattern Recognition and Image Analysis, Vol. 3, pp 337-38O. Solder, U. (1992): Echtzeitfa,hige Entdeckung von Objekten in der weiten Vorausschau eines StraBenfahrzeugs. Dissertation, Fakultfit fur Luft- und Raumfahrttechnik der Universit~it der Bundeswehr Munchen. Solder, U.; Graefe, V. (1993): Visual Detection of Distant Objects. IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS '93. Yokohama, pp 1042-1049. Thomanek, F.; Dickmanns, D. (1992): Obstacle Detection, Tracking and State Estimation for Autonomous Road Vehicle Guidance. Proc. 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems IROS '92. Raleigh, pp 1399-1406. Tsinas, L., Graefe, V. (1992): Automatic Recognition of Lanes for Highway Driving. IFAC Conference on Motion Control for Intelligent Automation. Perugia, pp 295-300.

86 Wershofen, K. P. (1992): Real-time Road Scene Classification Based on a Multiple-lane Tracker. Proceedings, International Conference on Industrial Electronics, Control, Instrumentation and Automation (IECON '93). San Diego, pp 746-751. Young, E.; Tribe, R.; Conlong, R. (1992): Improved Obstacle Detection by Sensor Fusion. IEE Colloquium on "Prometheus and Drive", Savoy Place.

Young, G.-S.; Hong T.-H.; Herman, M.; Yang, J.C.S. (1992): Obstacle Detection for a Vehicle Using Optical Flow. Proc. IEEE Intelligent Vehicles '92 Symposium. Detroit, pp 24-29. Yuen, Sh.-Y.K. (1990): Shape from Contour Using Symmetries. In G. Goos und J. Hartmanis (Ed.): Computer Vision- ECCV '90, Lecture Notes in Computer Science, Vol. 427. Berlin: Springer. pp 437-453. Zieike, T.; Brauckmann, M.; v. Seelen, W. (1992): Intensity and Edge-Based Symmetry Detection Applied to Car-Following In G. Sandini (Ed.): Computer Vision- ECCV '92, Lecture Notes in Computer Science, Vol. 558. Berlin: Springer, pp 865-873.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

87

Visual Collision A v o i d a n c e b y S e g m e n t a t i o n Ian Horswill* *MIT Artificial Intelligence Laboratory, Cambridge, MA 02139 USA Visual collision avoidance involves two difficult subproblems: obstacle recognition and depth measurement. We present a class of algorithms that use particularly simple methods for each subproblem and derive a set of sufficient conditions for their proper functioning based on a set of idealizations. We then discuss and compare different implementations of the approach and discuss their performance. Finally, we experimentally validate the idealizations.

1

Introduction

One of the major difficulties of visual obstacle avoidance is that obstacles can seemingly have arbitrary appearance. They can be tall or fat, plain or textured, rigid or flexible. Different algorithms can be seen as embodying different assumptions about the appearance of obstacles. Moravec's Stanford Cart [11] can be seen as assuming that obstacles (1)project significantly from the ground plane and (2) are densely covered with visual corners. Kriegman, Triendl and Binford's Mobi system [9] can be seen as assuming that obstacles (1) have at least a certain height and (2) have vertical edges around their perimeters. The stereo algorithm of Storjohann et al. [13] is similar, but does not require the presence of vertical edges, only some sort of texture. The motion-based systems of Coombs and Roberts [2] and of Sandini et al. [ 12] also make these assumptions. Each system is compatible with a particular class of obstacles and incompatible with another. For example, none of these systems would know to avoid an oil slick, a patch of ice, or a delicate cable lying on the floor, because none of these objects would project sufficiently from the floor to be geometrically distinguishable from the floor. The other major difficulty with visual collision avoidance is depth recovery. Both the Stanford Cart and Mobi use stereo triangulation to determine position. Triangulation involves computationally expensive correspondence calculations and generally requires the use of well-calibrated cameras. In this paper, we will discuss a class of algorithms based on assumptions about the appearance of the background. Simply put, the method is as follows: take an image, use some criterion to discard pixels that look like the floor, and avoid driving toward the remaining pixels. Our current implementations are designed to operate in environments with textureless carpeted floors. In such environments, the presence of an edge is a cue to the presence of an obstacle, so the robot need only steer to avoid image edges. The algorithm uses 2D image position to determine obstacle position: higher edges are assumed to be farther away. The algorithm has several advantages: 9 All computations can be performed on 2D rasters. 9 The algorithm requires very little computation. 9 The algorithm is very conservative, preferring false positives to false negatives. We have implemented the algorithm on five different robots here at MIT. The algorithm has also been implemented and tested in other labs [5][ 14]. The algorithm is quite robust and has seen hundreds of hours of testing in many different environments.

88

Figure 1: Polly (left), the first generation vision-based robot, Frankie (center), the second generation, and Elvis (right), the third generation. The next section describes the edge-based algorithm. Section 3 shows that the algortihm's behavior is equivalent to the behavior of a system with a perfect depth map, provided that a set of assumptions about the environment and the robot dynamics are valid. Section 4 gives the details of several implementations done here at MIT. Section 5 discusses their performance and failure modes and concludes that the assumptions about environment and vehicle dynamics are accurate enough for the purposes of the analysis. Finally, section 6 provides conclusions.

2

Algorithm

We assume a forward-looking camera. Let I(t) be the image at time t and let E(.) be some edge detection operator. We define the bottom projection ("bprojection") of a binary image J to be a vector b ( J ) , indexed by the x (horizontal) coordinate, whose x'th element is the height of the lowest marked pixel in the x'th column of J:

b~(J) = min{y : J ( x , y ) =

1}

It will be shown below that under the right conditions, b ( E ( I ) ) is a radial depth map: a mapping from direction to the amount of freespace in that direction. Given the radial depth map b(E(I(t)))), we define the left, right, and center freespaces to be the distances to the closest objects to the left, right and center:

l(t)

=

r(t)

-

c(t)

=

min

b~(E(I(t)))

min

b~(E(I(t))) b~(E(l(t)))

~~c

min

(1)

where xc is the x coordinate of the center column of the image, and w is a width parameter which in our implementations has been 1/4 the width of the image. The exact choices of the ranges over which the minima are taken is not important. Finally, we define a simple reactive control system for a holonomic base:

dO dt (t)

v(t)

-

co ( l ( t )

r(t))

(2)

=

c. (c(t) - dm,~)

(3)

-

89 where (dO/dt)(t) is the rotational velocity of the robot, v(t) is its translational velocity, d,~n is the closest that the robot should ever come to an obstacle, and co, c~ are user-defined gains. The algorithm turns in the direction of greater freespace at a rate proportional to the difference in freespace on the two sides. It drives forward if the distance to the closest object ahead is greater than d~,~, otherwise it drives backward. Of course, real systems will implement velocity caps (maximum velocities), which introduce min and max terms to the definitions above. We have left these out for simplicity.

3

Analysis

The discussion above leaves open a number of questions: in what environments will this algorithm work? What types of edge detectors can be used for N? Do the gains matter? In this section, we will examine these questions by comparing the algorithm to one which uses the same velocity equations (eq. 2), but which computes the distances l(t), T(t) and c(t) using a perfect oracle. 1 Space precludes a formal presentation in full detail. The interested reader is referred to the proofs of lemma 3 and theorem 2 in [7]. We can represent the perfect-information system schematically as follows: image

oracle

l, r, c

velocity eqs

-d-~' v

An image is fed to an oracle that somehow determines for each direction the distance to the closest object in that direction. The l, r and c values are computed from the resulting depth map and these values are passed to the velocity equations (eq. 2), which control the motors. Suppose that we were to warp the oracle's distance estimates by some strictly increasing function f: q

oracle ~ - ~

/, v, c H velocity eqs b

Because f is strictly increasing, this transformation will preserve the signs of both dO/dr and v. If we model the rotation and translation subsystems as first-order, zero delay systems, and if we can treat them as approximately separable (see section 5), then each subsystem will always converge to the same angle/distance, regardless of our choice of f. Only the rate of convergence will be affected. To say it more abstractly, if we treat the robot-environment combination as a dynamical system (whose state space is the configuration space), then the dynamical system will have the same attractors, repellors, and fixed points, regardless of the choice of f. Thus, the qualitative behavior of the system is invariant with respect to the introduction of a strictly increasing function. The practical import of this is that that we do not need to use a calibrated depth m e a s u r e ~ a n y monotonic measure will do, so long as we adjust d ~ n accordingly. Furthermore, the system should be robust in the presence of variations in sensor parameters. An oracle is a theoretical device generally used to prove that some function is not Turing computable. It is ,assumed to correctly compute the value of some function whether or not that function is formally computable. Oracles ,arenot intended to be physically realizable. We use the term here simply to mean a system that somehow correctly computes the right answer.

90 Now suppose we have an oracle that can identify obstacle pixels in the image: its input is an image, and its output is a binary image whose pixels are 1 iff the corresponding pixels in the input were projected from an obstacle. Call this the "figure/ground" image. It has been known at least since Euclid (see Cutting [3]) that for points projected from a ground plane, image plane height is a strictly increasing function of distance. This result can be extended to show ([7], theorem 2) that in the right environments, the bottom projection of the figure/ground image is equal to the radial depth map, modulo a strictly increasing function. Therefore we can substitute the figure/ground oracle for the distance oracle:

q F/Goracle H bproj H l,r,c H vel. eqs ground-planeconstraint

without altering the behavior of the system, provided that the (GPC) is true of the robot's environment. The GPC requires that all obstacles rest on the ground plane and that their projections be contained within the interiors of their regions of contact with the ground plane (see [7]). To complete the transformation of the distance oracle algorithm into the section 2 algorithm, we need only replace the figure/ground oracle with an edge detector. We would like to justify this substitution and to know what constraints there are on the choice of edge detector. Suppose the surface markings of the ground plane (e.g. the carpet) consist of a high frequency component and a constant (DC) component. Let ta be the lowest spatial frequency in the high frequency component. If the image plane is parallel to the ground plane, looking down at the ground plane from a distance d, then the to component will appear on the image plane with frequency tat

__

tad

Y where f is the lens focal length. Increasing d or changing the camera orientation can only increase this frequency. If we apply a low-pass filter with cutoff w t to the image, only the DC component will be preserved. Since an edge detector is effectively a band-pass filter, it should not respond to regions of the image projected from the ground plane, provided that it's high cutoff is less than w t. Thus we can substitute the edge detector for the F/G oracle:

q edgedet. H bprojH l, r,c~ vel.eqs b backgroundtextureconstraint

provided that the (BTC) holds: that the environment is uniformly illuminated, that the ground plane's surface markings have no components less than w other than DC, and that the scene is viewed from a distance of at least d. 2 The derivation of our algorithm from the oracle algorithm tells us that we can expect them to behave equivalently provided that the GPC and BTC hold of the environment, and that the robot base can be approximated as two non-interacting first-order systems. We will return to these

2This substitution step is not strictly valid, since it assumes that the edge detector fires for all the object pixels. In real life, the edge detector will only fire for the outline of the object and part of its interior. To be strictly valid, the substitution step should replace the F/G oracle with an edge detector combined with a fill algorithm (the fill would mark the object pixels missed by the edge detector). However, it is easy to show that the bprojection is invariant w.r.t, the fill operation, so it can be safely omitted.

91

Robot Polly Gopher Frankie Wilt Elvis

Speed lrn/s 0.2m/s lm/s lm/s 2m/s

Processor TMS320C30 MC68332 MC68332 MC68332 TMS320C31

Frames/s 15 5-10 5-10 5-10 15

Cost $11,000 $700 $7OO $7OO $1,000

Power

Rm2

e(co2)

a2 --+ >

a--+ n

... ....

>

R*

(4)

e(O)

where the ai is an a d a p t a t i o n c o m m a n d g e n e r a t e d by the p l a n n e r and e0 is an evaluation function. The a s s u m p t i o n relaxation priorities are governed by the ordering on o~ dictated by e0. The conditions t h a t trigger the p l a n n e r to produce an improved reactor are described in [9] and the constraints on convergence to a* overviewed.

106 3.

THE KITTING ROBOT PROBLEM

Assembly components are fed from stores to a kitting station on a conveyor belt. The kitting station consists of one or more kitting robots. Each kitting robot must take parts off the belt, place them in the appropriate slots of a kitting tray, and then place the tray onto the belt. The trays are stored in a stack in the robot's work space. The product we are kitting is a small DC servomotor manufactured by Philips. It has three parts: a cap, a motor and a body, all of which have a number of variants. The planner-reactor approach provides a way to incrementally construct reactive systems with improving performance. At the heart of this iterative mechanism is the concept of characterizing the environment by a set of assumptions and methods to detect whether they hold or not. Kitting Assumptions. The set of assumptions we have developed for the kitting environment is described below in order of likelihood of not holding: the ordering we have chosen for the planner to relax them (i.e., the e0 function in eq. 4). The planner uses these assumptions to help it to quickly build a reasonable reactor. Later, as it has time, it relaxes assumptions and fashions improved versions of the reactor. When the environment goes through regimes where different subsets of assumptions hold, the planner can capitalize on this, especially when new factory goals have been received and resultant changes in behavior are necessary. Some of the assumptions, such as that of the quality and substitutability of parts, must eventually be relaxed to get to a robust work cell. Some other assumptions, such as that of cooperation in kit assembly and tray disturbance, describe contingencies that make a more versatile work cell, but are not necessary for a robust system. Still other assumptions pertain to unusual operating conditions, such as the assumptions of no downstream disturbances. The kitting robot problem uses nine assumptions in total. In our experiments, however, only the following five were relaxed: 1. 2. 3. 4. 5.

Assumption of Parts Quality (AQ): All the parts coming into the kitting work cell are of good quality and do not need to be tested. Assumption of non-substitutability of parts (AS): Each part has only one type. Assumption of no parts motion (AM): The kit parts do not move around once delivered on the belt to the work cell. Assumption of no downstream disturbance (ADD): Downstream automation is always ready to receive finished kits. Assumption of parts ordering (AO): The robot cannot modify the preplanned parts ordering.

In addition to the iterative relaxation of assumptions based on likelihood, the planner can also be forced to relax assumptions. Whenever the planner builds a reactor that depends upon a particular assumption holding, it needs to embed a monitor process that can detect whether the assumption actually holds in practice or not. The monitor process triggers a 'clean up' action if the assumption ever fails, and notifies the planner that it needs to relax this assumption next.

107 4.

SITUATIONS

Reactor situations are a m e c h a n i s m to group related reactions together in a hierarchical fashion. This m o d u l a r i t y is i m p o r t a n t because it allows (human) designers to more easily u n d e r s t a n d the reactor and diagnose any problems t h a t m a y occur; it provides the p l a n n e r with a unit around which to define safe changes to the reactor s t r u c t u r e [10] and it provides a way to give computational resources to those reactions t h a t currently need it, while 'suspending' others. Intuitively: a situation being active expresses the a p p r o p r i a t e n e s s for the reactor to enable the set of reactions associated with t h a t situation. Situations can be nested hierarchically, and m a n y situations m a y be active at one time so t h a t the execution of their reactions will be interleaved. We introduce a small set of basic processes with which to build reactor situations in T5 (table 1). For example, an instance of a situation is asserted by the execution of the A S S E R T s p l , ~ 2 .... process, w h e r e p l, p 2 .... are the values for the p a r a m e t e r s associated wit~ the situation. The SIT s ( k , p l , . . . ) process, w h e n executed, suspends itself until an instance of situation s is asserted. It t h e n t e r m i n a t e s and passes on the details of the situation instance as its results. * ASSERTS,p Assert situation s and initialize the situation p a r a m e t e r s p,. . . . . . This t e r m i n a t e s w h e n the situation is t e r m i n a t e d and stops or aborts depending on w h e t h e r FAIL or SUCCEED was used. * FAILS, k T e r m i n a t e s instance k of situation s with fail status. * SUCCEEDs, k T e r m i n a t e s instance k of situation s with success status. * SIT s ( k , p l , p 2 , . . . ) T e r m i n a t e s if an instance of situation s is currently asserted. k is the instance n u m b e r and p l , p2, ... are the current values of the situation parameters. Table 1: Situation basic processes We use the S I T s process to ensure t h a t a reaction associated with a situation is only 'enabled' w h e n an instance of t h a t situation is asserted. For example, let p1, . . . . , p n be the reactions for situation s, t h e n we would r e p r e s e n t this in the reactor as follows:

PS

=

SITs:;pI I SITs:;P2 I ""I SITs:;Pn

(5)

The ':;' operation ensures t h a t as long as the situation is active, t h e n the reactions are continually re-enabled. Note t h a t once enabled, a reaction cannot be disabled until it has t e r m i n a t e d . The asserting and t e r m i n a t i o n of instances of situation s h a p p e n as the side effects of ASSERTs and FAIL s or SUCCEEDs processes in reactions in other situations.

108

5.

THE PLANNER

The planner needs to adapt the reactor to suit new goals and/or changing operating conditions. New goals may be issued by factory management. Through feedback from perceptions t h a t the planner has inserted in the reactor the planner is made aware of the current operating conditions, e.g. the current set of assumptions t h a t hold in the environment. Based on these two sources of information, the planner reasons to arrive at an incremental set of adaptations, such t h a t the reactor's behavior becomes ever more goalworthy. This section outlines the principles of operation of the planner with emphasis on its interactions with the reactor, a detailed description of the planner is given in [9].

5.1. The P l a n n e r Algorithm At every planner iteration t, based on the environment model TM~ and current goals Gt, the planner generates what we call an expectation T_t:an abstract description of the changes it expects to make to the reactor to achieve Gt. At every iteration, the combination of the reactor model and the expectation is exactly the co-ideal reactor (where co is the current set of assumptions the planner is working with).The process of incremental improvement of the reactor can now be seen as incremental reduction of the planner's expectation by some ATt and corresponding increment of the reactor ARt, until when Tt is null, the current reactor is exactly the ideal reactor. Additionally, when assumptions are used to construct the reactor increment, the planner will insert assumption monitors AIt into the reactor. Similarly, if the planner needs specific information for its own planning purposes, it can insert additional perception processes APt, processes t h a t collect information in the reactor and report back to the planner. Goals G

T,

r I I I I I Z~

t

I

i i i

current reactor

Rt

AIr ~

perception monitors

SI

t

I I I I

Pt

=1

Wt

PS i

PLANNER

',

Fit

REACTOR

.i

Figure 2: Planner-Reactor Interactions

WORLD

12

109 The input/output interactions between planner and reactor are as shown in Figure 2. The planner incrementally produces changes in the reactor structure ARt plus the associated assumption monitors AI t and additional perception processes APt. It receives back information from existing perception process vP t and any signals from assumption monitors that have failed FI t .

Figure 3: The Planner Architecture

5.2. The p l a n n e r architecture. The block diagram of the planner's internals is shown in figure 3. Based on the environment model TM and the goals G,, the planner accumulates a current Problem-Solving Context (PSC). Within that PSC, the planner generates an abstract plan, the expectation Tt. The current set of assumptions ~ s ~ is used to filter the environment model to generate a simpler model in the PSC. The current expectation Tt is analyzed to determine how much of it can be reduced A~. This incremental expectation is decomposed (in the planning sense) in the context of this filtered world, to generate an adaptation of the reactor ARt and the set of assumption monitoring processes to go along with it AIt. A set of perception requests can also be generated at this point APt; these requests will return data that clarify uncertainty in T M t and allow further decomposition of ~. The assumption monitor processes may signal that the assumptions they protect have been violated; this results in ~ 5 ~ being decremented by that assumption, and it may thus result in being revised and probably increased again.

110

5.3. Expectation reduction. To reduce this expectation, the planner reasons within a Problem Solving Context consisting of the relevant parts of the environment model TM, the action repertoire A and the assumptions o~ The outcome will be a reactor adaptation ARt, the reactions necessary to implement the expectation reduction ATt. The manageable size of AT_t is constrained by the following factors: 1.

The response time of the planner. This can come from the design specification of the planner, or from the temporal constraints on goals and actions in the plan. 2. The effects of uncertain knowledge on the plan. The planner may need to issue perception requests before more of the expectation can be reduced. 3. Resource use and mappings. Initially, the planner reduces an expectation by looking for an abstract plan that achieves the given goals. This abstract plan, representing the structure of a more detailed plan, is maintained through insertion of situations into the reactor. The more detailed actions corresponding to the expansion of an abstract action are encoded as reactions in the situation, representing the abstract action. Any reactor segment sent to the reactor must be directly executable. However it need not necessarily be concrete, i.e., a reaction of a given situation may only contain an STUB process, a process that acts as a 'placeholder' and, when executed, only sends a perception back to the planner stating that this particular situation has become active. We call this perception signal a stub-trigger. In this way, the planner can adapt the reactor without needing to fully refine its plan first. Before inserting any adaptation into the reactor however, the planner proceeds to detail the first step in the abstract plan, such that for this first step a concrete plan is in place (i.e., no STUB) when the adaptation is sent. This avoids an immediate stub-trigger signal from the reactor for this segment. Instead it allows the reactor to start operating immediately, while the planner can proceed in parallel to flesh out other abstract parts of the plan. For the kitting robot domain, an initial abstract plan would be a sequential composition of the situations AcquireTray, FillTray, and RemoveTray. Only AcquireTray needs to be concrete in order for the planner to adapt the reactor. The other situations can have S'I~'B reflexes initially. The planner can refine these while the reactor acquires a tray. The s ~ m processes in the reactor provide the planner with much needed focus-of-attention information, i.e. the stub trigger signal tell the planner which part of the reactor configuration is in immediate need of refinement. Hence the planner can concentrate on refining the relevant parts of the reactor first, as indicated by the environment by a stub-trigger. The following process descriptions show a reactor segment generated by the planner for this initial adaptation. Each reaction in the segment below is in an ifthen-else form ( ( - C o n d : I f T r u e ) : E l s e ) . The REASSERT process continually reasserts failing situations. P0 is necessary to assert the topmost situation repeatedly. The process MOVeobj,loc repositions a grasped object obj to a location loc.

111 S T O P : ; ASSERTKitting

PO

=

P1

=

SITKitting(k >: ; (- ( (REASSERTAcquireTray (xl> : REASSERTFilITray, xl : REASSERTFinishTray ,xl ) " SUCCEEDKitting, k) : FAILKitting, k)

P2

=

SITAcquireTray(k >: ; ( - ( ( REASSERTFindPart (xl> : MOVExl ' trayarea : (UPDATEAcquireTray, k, xl; SUCCEEDAcquireTray, k ) ) : FAILAcquireTray, k )

P3

=

SITFindpart(k> : ; ( " ( ( Findtray(xl> : UPDATEFindPart, k, xl ; SUCCEEDFindPart, k ) ) : FAILFindPart, k )

P4

=

SITFillTrayt(k, x l } : ; (STUBFillTray: SUSPEND2000; FAILFiUTray, k)

P5

=

SITFinishTrayt(k , xl>. ; (STUBFinishTray : SUSPEND2000; FAILFinishTray, k)

Barring any perceptions received from the reactor, the planner continues this process, incrementally fleshing out abstract segments. At every iteration, the expectation is reduced by fleshing out a situation. The situation hierarchy is restricted to be an acyclic digraph, i.e., abstract actions used higher in the plan hierarchy cannot be used to construct a concrete reaction. If all the reactor segments are made concrete and all the STUB processes removed. Then, the planner has achieved an co-ideal reactor, and can proceed to relax the next assumption.

5.4. Perceptions Perceptions may change the course of operation of the planner. They provide key information for the planner to decide what part of the reactor to elaborate upon next. The planner specifies which perceptions it wants to receive, i.e. which sensory data the reactor needs to collect for the planner's purpose. Reactions are ideally suited for situations where the data yielded by the embedded sensing is of no interest to the planner, because such data has only local effect within the reflex. Unknown or uncertain data that can have an effect on future reactor segments, however, should be collected differently. For example, in the case of a constant, but unknown, fact, it is inefficient to manufacture a reaction with a conditional on this fact, since we know a priori that all but one branch of that conditional is wasted. Factoring in resource use constraints means that some problems would become impossible when implemented with conditionals. There simply won't be enough resources available to cover all the a priori options. In addition, posting perception processes is useful for longer term information that is important to the planner, e.g. in the kitting domain, the fact of whether downstream automation is always ready to receive finished kits. In these cases, the planner produces perception requests: these are actions that report information back to the planner. The planner uses this information to refine its knowledge about the world, and thus remove the troublesome uncertainty. In contrast, insertion of communicating networks in the reactor is clearly most appropriate for transitory information, e.g., the position of a cap on the belt, or for determining which of the situations, sit(t), of a task, t, is currently present.

112 This fosters a fast response on the part of the reactor, it allows the reactor to deal with contingencies, and to take advantage of opportunities such as fortuitously occurring goal conditions. Apart from informational perceptions, the planner relies on four specific perceptions to set its priorities for the planning process. These perceptions are the following: 1.

2.

3.

4.

Guard-kills. The planner is informed when an ~. process is killed. ~. processes ('guards') form the basis by which an adaptation can remove old structure and install new structure in the reactor. This perception indicates to the planner the completion of an adaptation. Stub trigger. The execution of a sTtm process in the reactor signals the planner that the reactor has started to execute a segment for which no concrete plan yet exists. Situation failure. This signals the planner that a reactor segment failed (usually through an invalidated assumption), and needs to be modified as soon as possible. Assumption failure. The planner installs dedicated processes (A]:t) in the reactor to monitor assumptions. If one of the assumptions is invalidated, its monitor will signal the planner.

Guard-kills govern the adaptation cycle and are necessary for the planner to preserve the safety of the adaptation process. If a situation is still being adapted, the planner cannot send out a next adaptation for that same situation since the safe adaptation mechanism is only safe for one adaptation per situation at the time. Thus the planner holds back the next adaptation instructions for that situation until the guard-kill perception is received. The other three types influence the planner's priority in planning adaptations. The planner uses these to decide which part of the expectation to reduce first. In effect, the environment (through the reactor) tells the planner what parts of the reactor should be considered for adaptation. The priory ordering for planning is the following: Failed situations > Forced assumption relaxation > Active stubs > Quiescent stubs.

In absence of any perceptions received i.e in a fully cooperating environment, the normal procedure is to build an initial reactor first using all assumptions. This initial reactor is built in increments with stub refinement proceeding along temporal ordering of the reactor segments. Subsequently, assumptions are relaxed to widen the scope of the reactor.

113 5.5. F o r c e d R e l a x a t i o n .

Any of the assumption monitors or stub triggers i n the initial reactor segments can potentially signal the planner and divert its attention from the a priori established ordering of assumptions relaxation. Failure of a situation to successfully complete its reactions is also cause for a perception. These three sources of perceptual input have different effects on the planner. Stub trigger. On receiving a stub trigger signal, the planner redirects its focus of attention for refining the abstract plan to the portion of the plan containing the stub trigger. The planner inserts the appropriate refinement into the reactor. The reactor then can start executing that segment immediately, since the situation is still active. Assumption failure. An assumption failure perception has a larger effect than simply refocusing attention. The assumption failure signal causes the planner to negate the assumption in its Problem Solving Context and begin to rebuild the effected portions of its plan. Situation failure. If a situation relying on that assumption failed, that particular situation is given highest priority for adaptation, even though other assumptions may have priority in the assumption ranking. This means the planner may have to turn its focus back to parts of the plan it had previously considered finished. Apart from its cause, a forced relaxation or refinement results in the same reactor segment and adaptation sequence as a normal relaxation or refinement. 6.

EXPERIMENTAL RESULTS

The kitting example described section 3 was implemented and several experimental runs conducted on a PUMA-560 based kitting work cell. Trace statistics were gathered on each run on, e.g., the times when assumption failures were detected, when adaptations were made, the number of assumptions in use in the reactor, and so on. In each run, the planner utilized all the assumptions described in section 3. However, only five of the assumptions were actually relaxed in these trials. The purpose of these experimental runs was to begin to explore the behavior of a planner-reactor system in practice. Results from the runs are presented in the following subsections. Initial Reactor Construction. Figure 4 contains trace statistics for the start-up phase on a typical experimental run of the kitting work cell. Graph (a) shows the number of assumptions in use at any time in the work cell by the planner and by the reactor. Graph (b) shows the number of adaptations issued (by the planner) and the number of discrete changes to the reactor structure (adaptations applied). Initially there is an empty reactor in operation. After 26 seconds (in this example) the planner is in a position to start to update the reactor. (This time include planning plus communication time.) The planner begins by sending adaptations to the reactor (adaptations issued). Each adaptation involves one or more changes to the reactor structure. Thus, eventually, the adaptations applied will outnumber the adaptations issued.

114 The constraints involved in safe adaptation can cause a time lag in implementing changes. Active situations cannot be i n t e r r u p t e d - that would leave the reactor in an undefined s t a t e - instead the adaptation waits for the situation to end before applying the changes; the theory behind this is presented in [10]. Thus the adaptations applied trace always lags the adaptations issued trace. The lag time varies, depending on the activity of the reactor. This is what causes the reactor assumptions trace in Figure 4(a) to lag the planner assumption trace.

Figure 4: Start-up phase statistics for one run of the Planner-Reactor based Kitting Robot The first reactor was in place roughly 28 s after start-up. Even though this reactor was not complete, the kitting robot could start kitting; its kitting actions were overlapped in time with the further refinement of its kitting 'program'. This reactor was elaborated over the following 20 s until by 40 s after start-up the first complete reactor was in place. This reactor employed all nine assumptions. Up to this point, there was little delay between the planner reducing an assumption and the reactor following suit. Assumption Relaxation. Then the planner began to relax assumptions to incrementally improve the reactor. It took roughly 5 s to relax the first assumption (the length of the 'plateau' in the planner trace in Figure 4(a)). It then began to issue the adaptations to the reactor. Typically one or more adaptations is necessary to relax any one assumption. Each adaptation will give rise to one or more structural changes in the reactor, again subject to the delays of the safe adaptation constraint. In the case of this first assumption relaxation, its took the reactor roughly 60 s before it was able to implement the last structural change to relax the assumption (the 'plateau' on the reactor trace in Figure 4(a)). It took the planner roughly 30 s to relax the second assumption; but the change was implemented in the reactor almost immediately after the first relaxation. These lags and overlaps in operation underscore the highly asynchronous and independent operation of planner and reactor.

115 In these experiments, we restricted to planner to only relax two assumptions voluntarily. Having done this, the planner will only relax other assumptions when the environment forces their failure. This could occur within the start-up phase, of course, but for convenience of presentation we have separated the two here.

Figure 5: Forced relaxation statistics Environment Forced Relaxation. Figure 5 shows trace statistics for the ongoing operational phase in a typical experimental run of the kitting work cell. Again, graph (a) displays the assumptions in use by the planner and reactor, and graph (b) displays the trace of adaptations issued and applied. Additionally, Figure 5 (b) now also shows the trace of assumption failure perceptions. These are the perceptions that force the planner to relax an assumption. The two forced assumption relaxations in Figure 5 are the no-motion (AM) and downstream disturbance (ADD) assumptions. The failure perception for AM is generated when the robot fails to acquire a part it had identified in an initial visual image of the work space. The failure perception for ADD is manually invoked, and would normally be an input signal from downstream automation. At 152 s the AM assumption failure was signaled (Fig. 5(b)). The planner responded very quickly and began to issue adaptations by 155 s, continuing for about 10 s. There was little delay in the safe adaptation procedure and the structural changes to the reactor lag the adaptations issued by only about ls. This is also evident from the assumptions trace in (b). Part of the changes in this adaptation was to remove the assumption failure monitor for this assumption; thus, partway into the adaptation (161 s) the signals from the assumption monitor cease. The total forced relaxation response time ~ the time from failure to final change in the reactor m is roughly 11 s. The ADD failure occurred at 215 s and proceeded in a similar fashion. Reinstating Assumptions. In our theoretical analysis [9, 10] we did n o t address the problem of reasserting previously failed assumptions. Nonetheless, this is convenient in practice to model operating regimes, and therefore is part of our objectives. In the experimental run, at time 357 s the ADD assumption was reinstated

116 (Figure 5b)). On the relaxation of such reinstateable assumptions, the planner adds a reactor monitor that will signal when the assumption holds again. (It doesn't show up as an assumption failure, since it isn't). As the trace statistics show, our implementation does indeed handle the reintroduction of assumptions. However, further work is necessary to extend our convergence results to include this case. Ordering of Assumption Relaxation. The graphs of assumptions versus time do not indicate which assumptions are relaxed at each step. Figure 6 shows the interval for which each assumptions is in use in both planner (solid line) and reactor (broken line). The acronyms on the Y-axis refer to assumptions. Four assumptions m ADT (tray disturbance), ACP (cooperation), ACT (competition), and AF (filled trays) m are not relaxed in this experiment and hence hold throughout. The assumptions of quality (AQ) and substitutability (AS) are introduced first. Note that the reactor does not drop these assumptions for some time after the planner has dropped them. This is also shown in Figure 4 but there it wasn't evident which assumptions were involved. The downstream disturbance assumption (ADD) is introduced, then relaxed and then reintroduced, as shown in Figure 5.

Figure 6: Reactor assumption usage over time

Figure 7: The response to immediate failure of assumptions

Immediate Assumption Failure. The assumption relaxation in the experiment shown in graphs 4 and 5 is artificial: The planner-directed relaxation of assumptions is carefully separated in time from the environment-directed relaxation.

117 This is unlikely in practice, both planner and environment will be simultaneously directing relaxation. The assumptions graph in Figure 7 was generated in simulation (the previous graphs have been from real robot experiments) by assuming that whenever the reactor tests an assumption, the assumption failso The first difference with the previous experiment, (Figure 4) is that the planner never gets to use all the assumptions. Failure signals begin as soon as the first reactor update is made, even before the entire first ideal reactor is constructed. The planner quickly relaxes the AQ, AS and AO assumptions. It then finishes the reactor, introducing the ADD assumption which also immediately fails (the 'blip' at 44 s) and is replaced. 7.

DISCUSSION

In summary, we have reviewed our motivation and approach to casting planning as adaptation of a reactive system and have presented the first performance results of the approach, a first for integrated systems of this kind. These results show the feasibility of our approach. The incremental reactor construction and assumption relaxation strategy minimizes planning delays. The reactor can start kitting while the planner still is refining later parts of the task. The asynchronous communication between planner and reactor allows the latter to be real-time and not bogged down by the planner. The planner stays informed through its implantation of perceptions in the reactor. Information from these perceptions gets shuttled back to the planner, which employs it to decide on the appropriate parts of the reactor to modify next. Thus, the planner is able to support the reactor's direct needs, despite the asynchronous link between the two. REFERENCES

1. J. Bresina & M. Drummond, Integrating Planning and Reaction, in: AAAI Spring Workshop on Planning in Uncertain, Unpredictable or Changing Environments, Stanford, CA, J. Hendler (ed.), Systems Research Center, Univ. of Maryland (1990). 2. R. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal Robotics & Automation, 2 (1986). 3. J. Connell, SSS: A hybrid architecture applied to robot navigation. IEEE Int'l Conf. on Robotics & Automation (1992). 4. R.E. Fikes and N.J. Nilsson. Strips: A new approach to the application of theorem proving to problem solving. Artificial Intelligence 2, (1971), 189-208 5. B.R. Fox and K.G. Kempf, Opportunistic scheduling for robotic assembly. IEEE Int'l Conf. Robotics & Automation, St.Louis MO (1985), 880-889. 6. Th. Fraichard & C. Laugier, On-line Reactive Planning for Non-Holonomic Mobile Robot in a Dynamic Environment, IEEE Int'l. Conf. on Robotics & Automation, CA (1991). 7. E. Gat, Alfa: A language for programming reactive robot control systems, IEEE Int'l Conf. on Robotics & Automation, Sacramento, CA (1991), 1116-1121. 8. D.M. Lyons, Representing and analyzing action plans as networks of concurrent processes. IEEE Trans. Rob. & Aut}, 9 (3), (1993).

118 9. D.M. Lyons & A.J. Hendriks, Planning for Reactive Robot Behavior, IEEE Int'l Conf. on Robotics & Automation, Nice, France (1992). 10.D.M. Lyons & A.J. Hendriks, Safely Adapting a Hierarchical Reactive System, SPIE Intelligent Robotics & Computer Vision Boston, MA, (1993). 11. D.M. Lyons, A.J. Hendriks & S. Mehta, Achieving Robustness by Casting Planning as Adaptation of a Reactive System, IEEE Int'l Conf. on Robotics & Automation, Sacramento, CA (1991), 198-203. 12. A.J. Hendriks and D.M. Lyons, A Methodology for Creating and Adapting Reactive Systems, IEEE Int'l conference on Tools for AI, San Jose, CA (1991), 220227. 13.D. McDermott, Robot Planning, Tech. Report YALEU/CSD/RR #861, Yale University, (1991). 14.M. Schoppers, Representation and Automatic Synthesis of Reaction Plans, Ph.D. Thesis, Tech. Rep. uiucdcs-r-89-1546, CS dept., University of Illinois at Urbana-Champaign (1989). 15. C.J. Sellers & S.Y. Nof, Performance Analysis of robotic kitting systems, Rob. & Comp. Integ. Manuf. 6 (1989). 16.X. Xiadong and G.A. Bekey, Sroma: An adaptive scheduler for robotic assembly systems, IEEE Int'l Conf. on Robotics & Automation, Philadelphia, PA (1988).

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

119

Robot task programming by human demonstration: Mapping human grasps to manipulator grasps Sing Bing Kang and Katsushi Ikeuchi

School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213

Abstract To alleviate the problem of overwhelming complexity in grasp synthesis and path planning associated with robot task planning, we adopt the approach of teaching the robot by demonstrating in front of it. A system with this programming technique is able to temporally segment a task into separate and meaningful parts for further individual analysis and recognize the human grasp employed in the task. With such derived information, this system would then map the human grasp to that of the given manipulator, plan its trajectory, and proceed to execute the task. This paper describes how grasp mapping can be accomplished in our system. The mapping process essentially comprises three steps. The first step is local functional mapping, in which grasps of functionally equivalent fingers are established. This is followed by gross physical mapping which produces a kinematically feasible manipulator grasp. Finally, by carrying out local grasp adjustment using some task-related criterion, we arrive at a locally optimal manipulator grasp. We describe these steps in detail in this paper and show results of example grasp mappings.

1. Introduction There is always a need for a portable, safe, easy, fast, and robust means of programming robots to perform tasks. Current conventional methods of programming robots such as programming using a robot language and direct manipulation of the robot through the desired trajectory (i.e., manual lead-through), while useful, fall short on some of these characteristics. Completely automatic task planning (with the inputs being just the high-level specifications of the task goals) is difficult to achieve in practice. This is attributed to the high combinatorial complexity of grasp synthesis and path planning in the presence of obstacles. We seek to mitigate this problem by using cues extracted directly from human performance of the task of interest. The

120 approach of programming a robot by allowing it to observe the human demonstrating the task is also known as Assembly Plan from Observation (APO) [9]. A major component in task planning is planning the manipulator grasp. In the following section, we briefly discuss some of the issues involved in grasp planning and reiterate the justification for our approach.

2. Grasp Planning Grasp planning involves determining the hand placement relative to the object of interest as well as the posture of the hand assumed in grasping the object, both in a manner consistent to the requirements of the task. It is computationally intensive due to both the large search space and the incorporation of geometric and task-oriented constraints. Approaches to grasp planning can be categorized as geometric and mechanical [ 19]. The geometric approach considers only the spatial aspects of the objects and arm/hand system, while the mechanical approach incorporates the arm/hand kinematics, system dynamics, and their task-related constraints. The large search space for an optimal grasp can be reduced in size or dimensionality by using certain heuristics or structural constraints. For a 2-finger gripper, for example, the surface element pair selection can be guided by a series of constraints, such as face parallelism, permissible face separation, and non-zero overlap between projected faces [24]. By imposing fixed contact locations or fixed types of grasp, Pollard computes a 6D projection of the 15 DOF configuration space for the Salisbury hand [20]. This reduced space represents the space of wrist poses for geometrically feasible grasps, and is used to search for a globally optimal grasp. In addition, a popular strategy for reducing the complexity of planning the dextrous hand grasp is by using a small set of predetermined grasps or grasping behaviours (e.g., [2], [7] and [21 ]). We adopt the approach to make grasp planning more tractable by using cues extracted from human execution of the grasp. The approach first involves the temporal segmentation of the recorded task executed by the human which locates the instant when the grasp occurs [ 11]. The subsequent steps are recognizing the type of grasp [12][13] and recovering the approximate grasp contact locations from observed data. This is followed by determining the initial approximate grasp which satisfies kinematic and geometric constraints, and subsequently performing local optimization of the grasp using the approximate grasp as the starting condition and a taskoriented objective function. Our grasp mapping approach is described in detail in the following sections.

3. Mapping grasps 3.1. General approach At the highest level of description, planning the manipulator grasp based upon the observed human grasp comprises the following steps:

121

1. Local functional mapping This is done by observing the type of grasp used by the human, and the functions and grouping of the palm and individual fingers. The latter is determined using the real-finger-to-virtual finger mapping described in [ 13]. The mapping of the manipulator end-effector to the virtual fingers is dependent on the effective cohesive index and the cohesive indices of the individual virtual fingers.

2. Adjustable or gross physical mapping This operation is carried out using the outcome of the local functional mapping and the initial location of the manipulator (near the pose of the human hand while carrying out the task). It results in an approximate form of the manipulator grasp of the object which is geometrically feasible.

3. Fine-tuning or local adjustment of grasp The grasp determined from steps 1 and 2 may not exhibit static stability, or that it may not be locally optimal according to a chosen task-oriented criterion. By analyzing the criterion and iteratively perturbing the initial grasp configuration, we arrive at a locally optimal grasp. In the case of the power grasp which involve high kinematic constraint, this step is skipped. 3.2. A n a l y t i c m e a s u r e s of a g r a s p Cutkosky and Howe [5] summarizes a number of analytic measures that have been used by various researchers in their analyses of grasps, several of which are for the purpose of grasp synthesis. We can group these grasp analytic measures into different categories according to both the depth of knowledge of the grasp and hand mechanism, and the type of analysis (Fig. 1). These analytic grasp measures are instrumental in determining the manipulator grasp. The overall strategy of mapping grasps is depicted in Fig. 2. The human grasp can be described in a hierarchical fashion [ 12]. With a priori knowledge of the human hand and data derived from observation, certain analytic grasp measures can be extracted (such as connectivity and mobility). Armed with these information, together with a priori kinematic, structural and geometric knowledge of the manipulator, the grasp translator would then attempt to produce a manipulator grasp with compatible analytic grasp measures. In our work reported in this paper, as a starting point, we employ the manipulability measure as the criterion used in generating the locally optimal manipulator grasp.

122

Analytic Grasp Measures Degree-of-freedom Measures connectivity mobility

Static Measures

Increasing depth of knowledge required of grasp and hand mechanism

force closure form closure internal forces resistance to slipping manipulability grasp isotropy

Structural and Servo Parameter Measure$ compliance stability Fig. 1 Categories of analytic grasp measures

f

Human grasp

~

g] l ras7 ulat~

observation

| | ofHierarchicalgrasp description ~

a priori

knowledge of human ' hand ~.,

i/

~.~ ~ ~ ~" '~ ~ ~ [ "~ ~

.

description of grasp Analytic grasp measure________~s

I a priori

knowledge f manipulator

~

J

Fig. 2 Grasp mapping strategy

The functions of the hierarchical description of the human grasp are local functional mapping and gross physical mapping. Meanwhile, the functions of the analytic grasp measures are: 1. Verification of appropriateness of gross physical mapping; and 2. Local fine-tuning or perturbation of grasp to a locally optimal posture.

4. Local functional mapping This first step in mapping grasps is common to all types of grasps, i.e., power and precision grasps. The local functional mapping associates the manipulator fingers to those of the hand,

123 and is generally different for different grasps. To guide the local functional mapping, we first assign manipulator fingers as either a primary finger, secondary finger, or a palm.

4.1. Assigning manipulator fingers Prior to mapping the the manipulator fingers to those of the human in the observed grasp, the manipulator finger has to be assigned as one of the groups of fingers:

1. Primary finger/s A finger in this group would correspond to the human thumb, which has at least the same number of degrees of freedom as the other fingers. It serves not only to secure the hold of the object against the palm and/or other fingers, but also to manipulate the object.

2. Secondary finger/s This group would correspond to the fingers of the hand other than the thumb. This group of fingers would serve to secure the hold of the object against the palm and/or primary finger/s. A special case of a "finger" which is also assigned is the palm; it is passive with no local degree of freedom (i.e., discounting the global 6 DOF of the manipulator). Manipulators are designed with specific functions or level of dexterity in handling objects. The repertoire of grasping abilities may be as limited and simple as just clamping the object between its jaws as a means of securing a hold on it (as in the parallel-jaw gripper). The manipulator may be anthropomorphic to some degree and be able to mimic certain grasps of the human hand; examples of such hands are the Utah/MIT hand [10], Salisbury hand [22], Okada hand [18], Minnesota hand [15], and the Southampton hand [4]. It is usually simple to label a priori each of the manipulator fingers as either a primary or secondary finger. Once the primary and secondary fingers have been identified, the hand-to-manipulator finger mapping can proceed with the aid of another type of mapping. This mapping relates real fingers to groups of functionally equivalent fingers called virtual fingers [1][13]. A consequence of this real-finger-to-virtual finger mapping is the cohesive index [ 13], which is associated with the degree to which the real fingers in a virtual finger act in a similar manner (by comparing the object normals at the contact points). A cohesive index of a virtual finger is maximum at unity, and would indicate that the fingers within that virtual finger act exactly alike. 4.2. U s i n g t h e

cohesive index to guide the mapping of fingers

An indication of how the cohesive index is used to guide the virtual finger mapping between those of the human hand and the manipulator is illustrated in Fig. 3. (VF n refers to the nth virtual finger with C n being its associated cohesive index.) The palms, primary fingers, and secondary fingers are mapped appropriately as seen in Fig. 3. The cohesive index serves as a check - the virtual finger corresponding to the group of secondary fingers normally has a cohesive index less than that of the collection of primary fingers. The virtual finger composition determined from the real-finger-to-virtual finger mapping [ 13] take precedence, however.

124

VF o = {0} (palm) .~ VF 1={1} VF 2 = {2, 3, 4, 5}

C o = 1.0 C 1=1.0 C 2 < 1.0

5-fingered cylindrical power grasp VF o' = {0 } (palm) VF 1' = {1} VF 2' = {2}

VF0" = {0 } (palm) VFI" = {1} VF2" = {2, 3}

VF0'" = {0 } (palm) VFI'" = {1} VF2"' = {2, 3, 4}

(Decompose VF 1 and VF 2 this way because C 2 < C 1)

Two-fingered parallel-jaw gripper

Three-fingered Salisbury hand

Four-fingered Utah-MIT hand

Fig. 3 Mapping grasps according to cohesive indices of virtual fingers Once the functional finger assignments have been accomplished, the more detailed manipulator grasp planning can then proceed.

5. Approach in generating grasp and pregrasp trajectory The diagram representing the entire manipulator grasp and trajectory planning approach is shown in Fig. 4. Prior to the more detailed manipulator grasp planning, the manipulator assumes a fixed pose and posture relative to the human hand throughout the task execution; local planning is carried out within the vicinity of this pose and posture. Note that by virtue of the different form and function of the power and fingertip precision grasps, they are planned in a different manner elucidated in subsequent sections. Once the local planning is done, starting with the pose and posture of the manipulator during the grasping stage, the pregrasp trajectory is then planned backwards, with the goals of a smooth and collision-free path. This path would closely follow that assumed by the human hand trajectory.

125 Depending on the type of grasp, the approach in manipulator grasp planning differs. In the subsequent description of such mapping techniques, we concentrate on the most widely used grasps, namely the power grasp and the fingertip precision grasp. Identify human grasp Generate grasp hierarchy

$ Functional finger group assignment for manipulator xV

Follow approximate hand trajectory

Power grasp < t~

Fingertip precision grasp ~,,i~

Gross grasp posture followed by fine-tuning of grasp posture ~V

Regenerate the pregrasp trajectory

Fig. 4 Scheme for generating the manipulator grasp and pregrasp trajectory

6. Generating the power grasp The primary purpose of the power grasp, in which the hand generally envelops the object, is to ensure a secure hold of the held object at the expense of dexterity. Hence, we can dispense with the third step of local grasp adjustment.

6.1. General approach The approach taken to generate the power or "enveloping" grasp follows that taken in "reactive" planning (similar to that described in [2]). Note that this approach is taken in generating the grasp, and is not used in the actual execution of the grasp itself. The approach is characterized with the following phases:

126

1. Aligning the dextrous hand with the approach vector assumed by the human hand just prior to grasping (approach alignment phase). 2. Translating the hand (preserving its orientation) towards the object until the palm touches the object (approach phase). 3. Adjust the hand to "fit" the object to the palm (adjustment phase). 4. Flex the fingers to envelope the object (wrapping phase). Central to the alignment motions of the dextrous hand is the pivot point. It is predefined to be a point on the surface of the palm between the fingers. Its frame is oriented such that its z-direction points away from the palm and the x-direction points from the thumb (or a primary finger/ s) to the rest of the fingers. 6.2. A p p r o a c h a l i g n m e n t of t h e d e x t r o u s h a n d In order to minimize the problem of collision with the object during the pregrasp and post grasp phases, we first align the pivot frame of the dextrous hand to the approach vector assumed by the human hand just prior to grasping the object. This is known as the approach alignment phase. The approach vector is determined as the translation component of the differential transformation AT, where AT = Tg Tg_k-1, where Tg is the transformation of the hand at the grasp frame and Tg_k is the transformation of the hand k frames prior to the grasp frame (k is chosen to be 2). Once the rotational transformation (about the pivot point) has been determined, the dextrous hand is rotated in steps, with the intermediate collision detection with the object. If collision is detected, the dextrous hand is translated or "repulsed" away (preserving its orientation) from the surface of object that makes contact with the hand. The normals of the surface or surfaces in question determine the direction of the repulsion vector. 6.3. A d j u s t i n g t h e h a n d a n d e n v e l o p i n g t h e o b j e c t Once the desired orientation of the hand is assumed, planning proceeds with the following behaviors:

1. Pure translation toward the object (approach phase) The translational motion is dictated by attractive forces by the object on the pivot point on the dextrous hand. This behavior is active prior to any collision between the object and the manipulator palm.

127

~ Object n j--

~

Fig. 5 Determining direction of translation From Fig. 5, p is the pivot point on the manipulator palm, and qj is the jth sampled object surface point, rj is the vector from point qj to p, i.e., rj = qj - p. fij is the object normal at qj. The motion at each stage is M:

d M = 8=-=, Idl

d = -

E llj. rj>o

fD.r. J J

where 8 is the step size in 3D space. r is the weighting factor dependent on Irjl; the function used is o~j = Irj1-2.

Translation and rotation about the pivot point (adjustment phase) Once collision between the object and the manipulator palm is detected, this behavior becomes active. The motion associated with this behavior is defined by both the point or points of contact and the entire object. The points of contact determine the motion which tend to push away the manipulator (repulsive forces) while the entire object tends to pull the manipulator (via the pivot point) towards the object. This has the property of trying to align the manipulator relative to the object prior to grasping. It should be noted that while these behaviors are active, the posture of the manipulator is that of full extension of the fingers (for parallel jaw gripper, maximally distanced fingers). Note: The circumflex (^) over the symbol indicate its normalized version. The rotation about the pivot point is ~, where Q-

E Qj qjCP

and P is the palm, Qj is the quaternion associated with the rotation due to the jth contact point and is given by Qj =

2 Ecos ~-Oj, mjsin-Oj

0j = 8/Irjl and mj = -rj x nj.

128

3. Flexion of fingers around the object (wrapping phase) Once the manipulator has been adequately aligned with respect to the object, the manipulator would then proceed to wrap its fingers around it. This is done by flexing successively distal joints (starting with the most proximal joint) until contact is made between the associated link and the object (Fig. 6 (a) and (b)).

Fig. 6 Results of mapping a cylindrical power grasp for Utah/MIT hand (a) and Salisbury hand (b) This approach is a simple one, and is relatively independent of the manipulator (once the manipulator models have been built). This approach is made possible by the cues given in the human execution of the same task.

7. Generating the fingertip precision grasp 7.1. G e n e r a l a p p r o a c h The approach taken to plan the fingertip precision grasp is different than that for the power grasp, primarily because this precision grasp does not envelope the object. As with the power grasp, there are basically two stages:

1. Initial grasp pose and posture determination phase Let the (N+ 1) contact points on the object be denoted Pc0, Pc 1 . . . . PcN" The initial grasp pose is determined by calculating the pose of the contact frame F c and orienting the pivot frame Fpivot centered at the pivot point on the hand by a certain fixed transformation relative to F c (Fig. 7).

129

Pcl Pco

9

d .....~ P c N Fpivot

Fig. 7 Orienting the pivot frame with respect to the contact frame Subsequently, the pose of the hand is searched at this neighhorhood until a kinematically feasible grasp is found. The contact frame F c is centered about point c, defined to be

1

N

1

1/

N

c -- ~ Z 2 (Pco +pc/) = ~ Pco + ~ Pc/ i=l i=1

/

The orientation of F c is determined as follow (note that the caret ^ on a vector term indicates its normalized version):

N N l = ~ (Pc/-Pco) - Z P c / - N P c o i=1 i=1 If the best fit plane H to the contact points is r 90 = a, where r is any point on H, and 0 its unit normal, then

h = I 0,

l

if

-0,

(P X1) 9(PeN--Pc0) >0 otherwise

and~h = ~ x ~ .

2. Local grasp pose and posture adjustment phase This is done by locally optimizing a task-oriented criterion with respect to the pose and posture of the hand. The local adjustment is done at two levels; the two levels are associated with the optimization with respect to the pose (outer level) and with respect to the joint angles (inner level). Note that the inner level optimization require iterations only if redundancy of finger DOFs exists. For the Utah/MIT hand, each finger has a redundant DOF since there are 4 DOFs per finger with only the 3D position of each fingertip being fixed. There is no redundancy of DOF for the three fingers of the Salisbury hand.

130 Let Aj = (tj, rj) be the vector representing the hand pose, Ojk be the vector of joint angles representing the hand posture, and C0 be the task-oriented criterion to be optimized. Then the optimization can be symbolically represented as:

max fmax C (Aj, Ojk) ) = max C (Aj, Oj, opt)

A.J Lo.

)

A.j

The outer optimization level uses the Nelder-Mead simplex algorithm (see, for example, [ 17]), while the inner optimization is accomplished using Yoshikawa's scheme for optimizing manipulator postures with redundant DOFs [25]. For a given initial hand pose Aj, the locally optimal hand posture Oj, opt is determined using the following pseudocode:

Repeat until either the joint angle limitations are exceeded or the magnitude of joint angle change vector 6)8t is below a threshold {

= ( I - J+J) ~k since ic = O, the contact points being fixed at each iteration.

%k COS0e and points on face j if lhfj. h > cos0f. Since we favor faces over edges, Oei> Of, with 0 e = 75 ~ and Of = 45 ~

131 3. For each virtual finger with more than 1 real finger, fit a line to the human hand contact points, and space the hypothesized dextrous hand-object contact points equally along this line. 4. Find the object surface points nearest to the hypothesized dextrous hand-object points; for each face point, impose a margin ~mfrom the nearest edge (~m = 1.Ocm) if possible; other wise arrange its position in the center between opposite edges. An example of a fingertip precision grasp mapping is shown in Fig. 8.

Fig. 8 Results of mapping a fingertip precision grasp for Utah/MIT hand (a) and Salisbury hand (b)

8.1. The optimization criterion The precision grasp may be generated using a variety of task-oriented measures such as the manipulability measure [25], task compatability index [3], sum of force minimization [ 16], measures based on the task ellipsoid [14], among many others. We illustrate the notion of generating the precision grasp by local perturbation using Chiu's task comparability index [3]. The velocity and force transmission characteristics can be represented geometrically as ellipsoids defined as a function of the manipulator Jacobian. Let the kinematic transformation from joint space to task space be given by x = x (| where | = [01, 02 ..... 0n] T and x = [x 1, x 2, .... Xm]T are the joint and task coordinate vectors. The Jacobian links the time differentials: :e = J ( | where J(| is the mxn manipulator Jacobian matrix, whose elements are Ji,j = ~gxi/~90j. The unit sphere in joint space 91n defined by II| -< 1 is mapped into an ellipsoid in ~ m defined by

This ellipsoid is variously called the manipulability ellipsoid [25] and the velocity ellipsoid [3]. Analogous to the velocity ellipsoid is the force ellipsoid defined by fT ( j (O) jT ( O ) ) f < 1

132 w h e r e f is the force vector in task space and x is the joint torque vector. The force transmission ratio c~ and velocity transmission ratio ~ along direction u in task space can be determined to be 1

1

cz = [u T ( J J T ) u ] 2 and

13 = [u T(JJT)-~u]

respectively. Chiu defines the

compatibility index as

C : Z i

i iw~-t-"k'Zwj~j2 2 j

= Z W i [U Ti ( j j T ) Ui] ~=1 +

i

2

~_wj [u~ (jjT)

-1 uj]

:]:1

j

where the positive exponent of ~i or 13jis used for directions in which the magnitude is of interest, and the negative exponent is used for the directions in which accuracy is of interest. The w's are weighting factors that indicate the relative magnitude or accuracy requirements in the respective task directions. In our application, the finer force control directions are chosen to be along the object normals at the points of contact, while the finer velocity control directions are along the tangent plane defined by two spanning tangents.

9. System implementation The arm/hand system used to verify the grasp mapping technique is shown in Fig. 9. The arm used is the PUMA 560 arm while the dextrous hand used is the Utah/MIT hand. Both the arm and hand is controlled via Chimera, which is a real-time operating system for sensor-based control developed at Carnegie Mellon University [23]. The joint angles of the arm and the hand are fed from the grasp simulator (the display of which is shown in Fig. 10) to the arm/ hand system via Chimera. The two types of grasp performed by the Utah/MIT hand are shown in Fig. 11.

133

Fig. 9 PUMA 560 and Utah/MIT hand system

Fig. 10 Simulator display

Fig. 11 Cylindrical power grasp (left) and fingertip precision grasp (right) by the Utah/ MIT hand

134

10. Concluding remarks Our approach in programming a robot to perform a task is to have the system observe a human executing that task, identify and plan what needs to be done to accomplish the task, and finally execute the task. A very important component of planning the execution of the task is the generation of the manipulator grasp. In this paper, we have described our method in mapping the observed human grasp to that of the manipulator. While this method may not produce a globally optimal manipulator grasp (by some task-related measure), it greatly simplifies grasp planning. In addition, it illustrates our philosophy of easing the programming burden to mostly demonstrating the task; it reduces the complexity of programming and planning the task by taking advantage of the cues derived from observing a human.

Acknowledgments Many thanks to Richard Voyles for helping us set up the PUMA arm and Utah/MIT hand system. This research was supported in part by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, Ohio 45433-6543 under Contract F33615-90-C-1465, ARPA Order No. 7597, and in part by NSF under Contract CDA-9121797.

References [1]

M.A. Arbib, T. Iberall, and D.M. Lyons, "Coordinated control programs for movements of the hand," Hand Function and the Neocortex, eds. Goodwin, A.W., and DarianSmith, I., Springer-Verlag, 1985, pp. 111-129.

[2]

D.L. Brock and J.K. Salisbury, "Implementation of behavioral control on a robot hand/ arm system", 2nd lnt'l Symposium on Experimental Robotics, 1991, pp. 1-19.

[31

S.L. Chiu, "Task compatibility of manipulator postures," The lnt'l Journal of Robotics Research, vol. 7, no. 5, 1988, pp. 13-21.

[4]

R.M. Crowder, "An anthropomorphic robotic end effector," Robotics and Autonomous Systems, vol. 7, no. 4, 1991, pp. 253-268.

[5]

M.R. Cutkosky and R.D. Howe, "Human grasp choice and robotic grasp analysis," Dextrous Robot Hands, S.T. Venkataraman, and T. Iberall (eds.), Springer-Verlag, 1990, pp. 5-31.

[6]

C. Ferrari and J. Canny, "Planning optimal grasps," Proc. Int'l Conf. on Robotics and Automation, 1992, pp. 2290-2295.

135

[7]

E. Grosso and G. Vercelli, "Grasping strategies for reconstructed unknown 3D objects," Proc. IEEE/RSJ Workshop on Intelligent Robots and Systems, Osaka, Japan, 1991, pp. 70-75.

[8]

K.H. Hunt, Kinematic Geometry of Mechanisms, Oxford University Press, 1978.

[9]

K. Ikeuchi and T. Suehiro, "Towards an Assembly Plan from Observation, Part I: Assembly task recognition using face-contact relations (polyhedral objects)," Proc. Int'l Conf. on Robotics and Automation, 1992, pp. 2171-2177.

[10]

S.C. Jacobson, J.E. Wood, D.EKnutti, and K.B. Biggers, "The Utah/MIT dextrous hand: Work in progress," Int'l Journal of Robotics Research, vol. 3, no. 4, 1984, pp. 2150.

[11]

S.B. Kang and K. Ikeuchi, "Determination of motion breakpoints in a task sequence from human hand motion," to appear in Proc. IEEE Int'l Conf. on Robotics and Automation, San Diego, May, 1994.

[12]

S.B. Kang and K. Ikeuchi, "A grasp abstraction hierarchy for recognition of grasping tasks from observation," Proc. IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems, July, 1993, pp. 194-201.

[13]

S.B. Kang and K. Ikeuchi, "Toward automatic robot instruction from perception - Recognizing a grasp from observation," IEEE Trans. on Robotics and Automation, vol. 9, no. 4, 1993, pp. 432-443.

[14]

Z. Li and S. Sastry, "Task-oriented optimal grasping by multifingered robot hands," IEEE Journal on Robotics and Automation, vol. 4, no. 1, 1988, pp. 32-44.

[15]

D. Lian, S. Peterson, and M. Donath, "A three-fingered, articulated, robotic hand," Proc. 13th Int'l Symp. on Industrial Robots, vol. 2, 1983, pp. 18/91-18/101.

[16]

X. Markenskoff and C.H. Papadimitriou, "Optimum grip of a polygon," The Int'l Journal of Robotics Research, vol. 8, n. 2, 1989, pp. 17-29.

[17]

J.H. Mathews, Numerical Methods for Computer Science, Engineering, and Mathematics, Prentice-Hall, 1987.

[18]

T. Okada, "Object-handling system for manual industry," IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-9, no. 2, 1979, pp. 79-89.

[19]

J. Pertin-Troccaz, "Grasping: A state of the art," The Robotics Review 1, O. Khatib, J.J. Craig, and T. Lozano-P6rez (eds.), MIT Press, Cambridge, MA, 1989, pp. 71-98.

[20]

N. Pollard, "Planning grasps for a robot hand in the presence of obstacles," Proc. Int'l Conf. on Robotics and Automation, 1993, pp. 723-728.

[21]

K. Rao, G. Medioni, H. Liu, and G.A. Bekey, "Robot hand-eye coordination: Shape description and grasping," Proc. Int'l Conf. on Robotics and Automation, 1988, pp. 407411.

136 [22]

J.K. Salisbury, Kinematic and Force Analysis of Articulated Hands, Ph.D. dissertation, Stanford University, Stanford, CA, 1982.

[231

D.B. Stewart and P.K. Khosla, Chimera 3.1, The Real-Time Programming Operating System for Reconfigurable Sensor-Based Control Systems, Carnegie Mellon University,

[24]

1993. J.D. Wolter, R.A. Volz, and A.C. Woo, "Automatic generation of gripping positions for assembly," Robot Grippers, D.T. Pham and W.B. Heginbotham (eds.), Springer-Verlag,

[25]

1986, pp. 55-74. T. Yoshikawa, "Manipulability of robotic mechanisms," The Int'l Journal of Robotics

Research, vol. 4. no. 2, 1985, pp. 3-9.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

137

Robot Learning By Nonparametric Regression Stefan Schaal and Christopher G. Atkeson Department of Brain and Cognitive Sciences & The Artificial Intelligence Laboratory Massachusetts Institute of Technology, 545 Technology Square, Cambridge, MA 02139"

Abstract: We present an approach to robot learning grounded on a nonparametric regression technique, locally weighted regression. The model of the task to be performed is represented by infinitely many local linear models, i.e., the (hyper-) tangent planes at every point in input space at which a prediction is to be made, i.e., at every query point. Such a model, however, is only generated at the time of prediction and is not retained. This is in contrast to other methods using a finite set of linear models to accomplish a piecewise linear model. Architectural parameters of our approach, such as distance metrics, are also a function of the current query point instead of being global. Statistical tests are presented for when a local model is good enough such that it can be reliably used to build a local controller. These statistical measures also direct the exploration of the robot. We explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a center of exploration and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach by describing how it has been used to enable a robot to learn a challenging juggling task: Within 40 to 100 trials the robot accomplished the task goal starting out with no initial experiences. 1. INTRODUCTION Learning control means improving a motor skill by repeatedly practicing a task. We are exploring systems that learn by explicitly remembering their experiences. These systems will be useful in situations where the form or structure of the task to be performed is not known in advance. Previous work tested such methods by implementing learning for one-shot or static tasks, such as throwing a ball at a target (Aboaf et. al., 1988), and also repetitive or dynamic tasks, such as bouncing a ball on a paddle (Aboaf et. al., 1989) and hitting a stick back and forth (a form of juggling known as devil sticking) (Van Zyl, 1991). This experimental work has highlighted the importance of making sure the control paradigm used is robust to uncertainty, that the representational scheme is able to compute what is known about the task, and how well it is known, and that there is some process that generates exploration, so that modNew address of both authors: Georgia Institute of Technology, College of Computing, 801 Atlantic Drive, Atlanta, GA 30332, USA

138 els and controllers based on insufficient data are improved. In this paper we discuss the current stage of our non para metric learning methods by describing some of their recent extensions with the help of an implementation of learning with the devil sticking robot. In the given context, we are using a form of indirect learning, where a model is learned and then control actions are chosen based on the model, rather than direct learning, where appropriate control actions are learned directly (Barto et. al., 1991); the demonstrated framework, however, is equally well suited to direct learning, too. Our starting point for modeling is that we do not know the structure or form of the system to be con trolled. However, we assume we do know what constitutes a state of the system, and that we measure the complete state. Future work will discuss how to approach tasks in which complete measurements of the state are not available, or what constitutes a state is not even known. We adopted a recently developed statistical technique from nonparametric regression analysis, locally weighted regression (LWR (Cleveland, 1979; Farmer & Sidorowich, 1987; Atkeson, 1992), to learn models the system we are trying to control (Section 2). The LWR approach allows to efficiently estimate local linear models for different points in the state space. This approach is extended here to give information about the reliability of the predictions and local linearizations generated by locally weighted regression. Thus, as will be demonstrated in Section 3, a robot can monitor its own skill level and guide its exploratory behavior, and it can ensure that even in high dimensional state space good learning results are accomplished. 2. L O C A L L Y W E I G H T E D REGRESSION The point of view acquired here that the goal of a learning system for robots is to be able to build internal models of tasks during execution of those tasks. These models are multidimensional functions that are approximated from sampled data (the previous experiences or attempts to perform the task). The learned models are used in a variety of ways to successfully execute the task. Models should incorporate the latest information. As they will be continuously updated with a stream of new training data, updating a model should take a short period of time. There are also time constraints on how long it can take to use a model to make a prediction. Because we are interested in control methods that make use of local linearizations of the plant model, a representation is needed that can quickly compute a local linear model of the represented transformation. Moreover, negative interference from learning new knowledge on previously stored information should be as minimal as possible. As the most generic approximator that satisfies many of these criteria, we explore a version of nonparametric learning called locally weighted regression (LWR). LWR is a memory-based learning (MBL) system: it is trained just by storing the training data in a memory. This allows to achieve real-time learning and avoids interference between new and old data by retaining and using all the data to answer each query. Nonparametric learning sy stems like LWR approximate complex functions by using simple local models, as does a Taylor series. Examples of types of local models include nearest neighbor, weighted average, and locally weighted regression. Each of these local models combines points near to the input point of a desired prediction, the query point, to estimate the appropriate output. Locally weighted regression uses a regression procedure to form the local model, and is thus more expensive than nearest neighbor and weighted average memory-based learning procedures. However, local linear models have been shown to have a favorable balance between bias and variance

139 (Cleveland et al., 1979). They also provide the first derivative at the query point which will be shown below to be very use ful for control tasks. For each query LWR forms a new local model. The rate at which local models can be formed and evaluated limits the rate at which queries can be answered. At the end of this paper, it will be described how locally weighted regression was implemented for a real-time application. Unweighted regression finds the solution to the equations: y=X.13 (2.1a) by minimizing j=l m )2 -~E(Yi--Yi (2. lb) i=l

where X is an m.(n+ 1) matrix consisting of m data points, each represented by its n input dimensions and a "1" in the last column, y is a vector of the corresponding outputs for each data point, ~ is the (n+l)-vector of regression parameters which are to be determined, and J is the sum of squared errors over all given data points (Yi is the predicted result after fitting the parameters [3). The well-known result for [3 yields = ( x T x ) -1 XTy (2.2) and a prediction of the outcome of a query point xq becomes: (2.3)

yq = xT~

However, this gives distant points equal influence with nearby points on the ultimate answer to the query. To weight similar points more, locally weighted regression first calculates the distance of each of the stored data points (rows in the X matrix) to the query point Xq:

d~

n = E(Xq r=l

-- Xir) 2

(2.4)

The weight for each stored data point is a function of that distance:

f(d~) (2.5) Each i-th row in X and y is multiplied by the corresponding weight wi, and the regression problem is solved as be fore. In matrix notation, this introduces the diagonal m , m weight matrix Win J: j= 1 m .~Z(wi(Y i _ ~i))2 = 11 z W ( y - ~)l2 (2.6) W i -"

i=1

and the local weighted regression parameters become: 13= [(wx)T WX] -1 (wx)T Wy

(2.7)

A simple weighting function just raises the distance to a negative power, which determines how local the regression will be (the rate of drop-off of the weights with distance). 1

wi=(df) k

(2.8)

This type of weighting function goes to infinity as the query point approaches a stored data point which forces the locally weighted regression to exactly match that stored point. If

140 the data is noisy, exact interpolation is not desirable, and a weighting scheme with limited magnitude is desired. One such scheme, which we use in what follows, is a Gaussian kernel: w i = exp - 2k 2 )

(2.9)

As in (2.8), the parameter k scales the size of the kernel to determine how local the regression will be. A LWR lookup has three stages: forming the weights (Eqns. (2.4), (2.5)), forming the regression matrix WX, and solving for the regression parameters as given in (2.7). A more thorough introduction of LWR can be found in Atkeson (1992) or Schaal and Atkeson (1994). We have implemented the local weighted regression procedure on a 33MHz Intel i860 microprocessor. The peak computation rate of this processor is 66 MFlops. We have achieved effective computation rates of 20 MFlops on a learning problem with 10 input dimensions and 5 output dimensions, using a linear local model. This leads to a lookup time of approximately 15 milliseconds on a database of 1000 points.

2.1. Tuning The Fit Parameters LWR uses a few parameters which require tuning to optimize the quality of fit. Of most importance is the parameter k in (2.9) and, in multiple input systems, the distance metric m which determines the weighting of the input dimensions with respect to each other. The distance metric enters (2.4) and modifies this equation to: d 2 =~[(Xq--Xir)mr]

2

(2.10)

r=l

In the past we have used off-line global cross validation to estimate reasonable values for these fit parameters. Since we are using a local model that is linear in the unknown parameters, derivatives can be computed of the cross validation error e i = ~ - Yi with respect to the fit parameters k and m and minimize the sum of the squared cross validation error using a Levenberg-Marquardt (nonlinear least squares) procedure (MINPACK, NL2SOL). However, it is clear that these parameters should depend on the location of the query point. In this section we describe new procedures that estimate local values of the fit parameters optimized for the site of the current query point. We want to demonstrate the differences between local and global fitting in an example where we only focus on the kernel width k of a Gaussian weighting function (2.9). In Figure la, a noisy data set of the function 3 3 y = x - s i n 3 (2trx)cos(2tcx )exp(x 4 ) was fitted by locally weighted regression with a globally optimized constant k. In the left half of the plot, the regression starts to fit noise because k had to be rather small to fit the high frequency regions on the right half of the plot. The prediction intervals, which will be introduced below, demonstrate high uncertainty in several places. To avoid such undesirable behavior, a local optimization criterion is needed. Standard linear regression analysis provides a series of well-defined statistical tools to assess the quality of fits, such as coefficients of determination, t-tests, F-test, the PRESS-statistic, Mallow's Cp-test, confidence intervals, prediction intervals, and many more (e.g., Myers 1990). These tools can be adapted to locally weighted regression. We do not want to discuss all possible available statistics here but rather focus on two that have proved to be quite helpful.

141

Figure 1: Optimizing the locally weighted regression fit using different measures: (a) global cross validation, (b) local cross validation Cross validation has a relative in linear regression analysis called the PRESS residual error. The PRESS statistic performs leave-one-out cross validation, however, without the need to recalculate the regression parameters for every excluded point. This is computationally very efficient. Schaal and Atkeson (1994) show how the PRESS residual can be expressed as a mean squared cross validation error M S E c.... . In Figure l b, the same data as in Figure l a was fitted by adjusting k to minimize M S E c. . . . a t e a c h q u e r y p o i n t . The outcome is much smoother than that of global cross validation, and also the prediction intervals are narrower. It should be noted that the extrapolation properties on both sides of the graph are quite appropriate (compared to the known underlying function), in comparison to Figure 1a.

142 Prediction intervals I i are expected bounds of the prediction error at a query point X i . Their derivation can be found in most text books on regression analysis (e.g., Myers 1990), and an adaptation for LWR is straightforward (Schaal and Atkeson, 1994). Besides using the intervals to assess the confidence in the fit at a certain point, they provide another optimization measure. Using prediction intervals as optimization criterion for k, a plot similar to Figure l b is obtained. The curves in Figure 2a,b are generated by using this optimization criterion.

2.2. Assessing The Quality of the Local Model Both the local cross validation assess the quality of the local fit:

a:t

=

4MSEcr~

or

Q:t =

error

17-17'

C

MSEcros s and the prediction interval I i may serve to

(2.11)

C

The factor c makes Qat dimensionless and normalizes it with respect to some user defined quantity. In our applications, we usually preferred Q:t based on the prediction intervals, which is the more conservative assessment.

2.3. Dealing with Outliers Linear regression analysis is not robust with respect to outliers. This also holds for locally weighted regression, although the influence of outliers will not be noticed unless the outliers lie close enough to a query point. In Figure 2a we added three outliers to the test data of Figure 1 to demonstrate this effect. Again, we would like to localize our criterion for outlier removal. The PRESS statistic can be modified to serve as an outlier detector in LWR. For this, we need the standardized individual PRESS residual ei,cros s (Schaal and Atkeson, 1994). This measure has zero mean and unit variance. If, for a given data point x i, it deviates from zero more than a certain threshold, the point can be called an outlier. A conservative threshold would be 1.96, discarding all points lying outside the 95% area of the normal distribution. In our applications, we used 2.57 cutting off all data outside the 99% area of the normal distilbution. As can be seen in Figure 2b, the effects of the outliers is reduced.

2.4. Data Compression Despite that the original version of LWR is purely memory-based and does not attempt any kind of data compression, it is straightforward to add this property. The "infinity" of local linear models can be converted into a finite set of piecewise linear models. An efficient way to do this is by means of a recursive tessellation of the inputs space, another nonparametric technique well-known from decision tree induction (Breiman et al., 1984). K-d tree partitioning is the most common of these methods (Friedman et al., 1977). K-d trees partition the input space into hyper-rectangles by recursively splitting the partitions along one input dimension. There are many different ways when and where to split, and we will sketch only what was used in our algorithms.

143

Figure 2: Influence of outliers on LWR: (a) no outlier removal, (b) with outlier removal For building piecewise linear models, our method starts with one partition which includes all the data. Local linear models are fit to the center of the partition and all its vertices, by LWR. If the local linear model at the center is capable of predicting the outputs of the LWR lookups at the vertices within a certain error margin, the partition is not split anymore, and, if the prediction intervals of the lookup at the center are narrow enough, the partition will be marked valid. The outputs of all query points falling into this partition can be predicted with the linear model at the center. If the linear model of the center of the partition fails to predict the vertices well enough, the partition is split at the center along the dimension where the greatest mismatch between predicted value and real value had been encountered. Then the process repeats with the newly

144 created partitions. The algorithm terminates when no more splitting needs to be done. Figure 3 shows the result of this method for the function introduced in Figure 1. As Figure 3 illustrates, the piecewise linear fit of the data represents the given curve very well. At locations where the data has high frequencies, more partitions are allocated than at the more linear regions. By changing the error margin when to accept a partition and its local linear model, the piecewise fit could become coarser or finer. One advantage of this method over other piecewise linear partitioning techniques is that the partitioning of the input space is only done in order to find candidate centers for the linear models. For the local linear fits, all data are always included in the LWR lookup. Thus, there is no hard but rather a soft partitioning of the input space since each local linear models also includes data from the neighboring partitions. Due to space limitations, this data compression method cannot be discussed in detail. It can be applied in conjunction with all the methods mentioned so far.

1.5

.-.I1 >,,

0.5

-

+

0

-0.5 0

''''1''''1''''1''''1''''1''''1''''1''''1''''1'''' 0.1 0.2 0.3 0.4 0.5

0.6

0.7

0.8

0.9

1

XH>

Figure 3: Piecewise linear fit of the data from Figure 1; the black dots denote the boundaries between two neighboring partitions.

3. R E A L - T I M E LEARNING OF R O B O T J U G G L I N G We have constructed a system for experiments in real-time motor learning (van Zyl 1991). The task is a juggling task known as "devil sticking". A center stick is batted back and forth between two handsticks ( Figure 4a). Figure 4b shows a sketch of our devil sticking robot. The juggling robot uses its top two joints to perform planar devil sticking. Hand sticks are mounted on the robot with springs and dampers. This implements a passive catch. The center stick does not bounce when it hits the hand stick, and therefore requires an active throwing motion by the robot. To simplify the problem the center stick is constrained by a boom to move on the surface of a sphere. For small movements the center stick movements are ap-

145 proximately planar. The boom also provides a way to measure the current state of the center stick. Ultimately we want to be able to perform unconstrained three dimensional devil sticking. The task state is the predicted location of the center stick when it hits the hand stick held in a nominal position. Standard ballistics equations for the flight of the center stick are used into a task state. The dynamics of to map flight trajectory measurements throwing the devilstick are parameterized by 5 state and 5 action variables, resulting in a 10/5-dimensional input/output model for each hand:

(x(t),y(t),O(t))

x=(p,O,k,p,O)

(3.1)

The task command is given by a displacement of the hand stick from the nominal position 0,, and a throw velocity vector (v~, vy).

(Xh,Yh), a center stick an gular velocity threshold to trigger the start of a throwing motion

U:'(Xh,Yh,Ot,Vx,Vy)

(3.2)

Figure 4: (a) an illustration of devil sticking, (b) sketch of our devil sticking robot: the flow of force from each motor into the robot is indicated by different shadings of the robot links, and a position change due to an application of motor 1 or motor 2, respectively, is indicated in the small sketches

146 Every time the robot catches and throws the devil stick it generates an experience of the form: (Xk,Uk,Xk+l)

(3.3)

where x k is the current state, u k is the action performed by the robot, and xk+1 is the state of the center stick that results. Initially we explored learning an inverse model of the task, using nonlinear "deadbeat" control to eliminate all error on each hit. Each hand had its own inverse model of the form: uk = )~-1(Xk,Xk+l)

(3.4)

Before each hit the system looked up a command with the expected impact state of the devilstick and the desired state:

=?

Inverse model learning was successfully used to train the system to perform the devil sticking task. Juggling runs up to 100 hits were achieved. The system incorporated new data in real time, and used databases of several hundred hits. Lookups took less than 15 milliseconds, and therefore several lookups could be performed before the end of the flight of the center stick. Later queries incorporated more measurements of the flight of the center stick and therefore more accurate predictions of the state x k of the task. However, the system required substantial structure in the initial training to achieve this performance. The system was started with a default command that was appropriate for open loop performance of the task. Each control parameter was varied systematically to explore the space near the default command. A global linear model was made of this initial data, and a linear controller based on this model was used to generate an initial training set for the memory-based system (approximately 100 hits). Learning with no initial data was not possible. We also experimented with learning based on both inverse and forward models. After a command is generated by the inverse model, it can be evaluated using a memory-based forward model with the same data: xk+, = )~(Xk,fik)

(3.6)

Because it produces a local linear model, the LWR procedure generates estimates of the derivatives of the forward model with respect to the commands as part of the estimated parameter vector [3. These derivatives can be used to find a correction to the command vector that reduces errors in the predicted outcome based on the forward model: - ~ aa = x k + l - x d

(3.7)

The pseudo-inverse of the matrix ~ / oM is used to solve the above equation for Aft k in order to handle situations in which the matrix is singular or a different number of commands and states exists (which does not apply for devil sticking). The process of command refinement can be repeated until the forward model no longer produces accurate predictions of the outcome. This will happen when the query to the forward model requires significant extrapolation from the current database. We investigated this method for incremental learning of devil sticking in simulations. The outcome, however, did not meet expectations: without sufficient initial data around the setpoint, the algorithm did not work. Two main reasons can be held responsible:

147 i)

ii)

Similar to the pure inverse model approach, the inverse-forward model acts as a one-step deadbeat controller. One-step deadbeat control applies unreasonably high commands to correct for deviations from the setpoint. In the presence of errors in the model, this is detrimental since it magnifies the model errors. Additionally, the workspace bounds and command bounds of our devil sticking robot limit the size of the commands. Due to the nonlinearities in the dynamics of the robot, the 10-dimensional input space of the forward model suffers from the first symptoms of Bellman's "curse of dimensionality". Error reduction as described in (3.7) only works if sufficient data exists at the query sites. The inevitable model errors will make the robot explore randomly, leading to dispersed data. This gives little chance for model improvements. Imagine we had to place data in a (hyper-)cube of normalized edge length 0.1. In a 3-dimensional input space there are 103 such cubes which leaves some probability to finally arrive at the goal. In a 10-dimensional state space, there are 101~such cubes - a prohibitive number for random exploration. Thus, two ingredients had to be added to the devil sticking controller:

a)

b)

Control must start as soon as possible with the primary goal to increase the data density in the current region of the state-action space, and the secondary goal to arrive at the desired goal state. Control actions must avoid deadbeat properties and must be planned to go to the goal in multiple steps.

Both requirements are fulfilled by a simple exploration method, the shifting setpoint algorithm (SSA) (Schaal and Atkeson, 1994). The SSA attempts to decompose the control problem into two separate control tasks on different time scales. At the fast time scale, it generates controls as a nonlinear regulator by trying to keep the controlled system at the setpoints. On a slower time scale, the setpoints are shifted to accomplish a desired goal. The SSA tries to explore the world by going to the fringes of its data support in the direction of the goal. It extends its knowledge in the fringes until statistically sufficient data has been collected to make a further step towards the goal. In this way the SSA builds a narrow tube of data support in which it knows how to control. This data can be used by more sophisticated control algorithms for planning or further exploration. Applied to devil sticking, the SSA proceeds as follows: (1) Regardless of the poor juggling quality of the robot (i.e., at most two or three hits per trial), the SSA makes the robot repeat these initial actions with small random perturbations until a cloud of data is collected somewhere in state-action space of each hand. An abstract illustration for this is given in Figure 5-top-left---> 5-top-right. (2) Each point in the data cloud of each hand is used as a candidate for a setpoint of the corresponding hand by trying to predict its output from its input with LWR. The point achieving the narrowest local confidence interval becomes the setpoint (Xs,Us) of the hand and an LQ controller is calculated for its local linear model: x s = A x s + B u s+c, or Xs,ou t

:

X s --C

:

Ax s + Bu s

(3.8)

148 As in (3.7), the linear model is estimated by LWR. By means of these controllers, the amount of data around the setpoints can quickly and rather accurately be increased until the quality of the local models exceeds a certain statistical threshold.

Figure 5: Abstract illustration how the SSA algorithm collects data in space: top-left: sparse data after the first few hits; top-right: high local data density due to local control in this region; bottom-left: increased data density on the way to the goals due to shifting the setpoints; bottom-right: ridge of data density after the goal was reached

149 (3) At this point, the setpoints are gradually shifted towards the goal setpoints until the data support of the local models falls below a statistical value. The shifting algorithm executes as follows: 1) calculate the error of the predicted output state: e r r s , o u t = Xdesire d - - X S , o u t 2) solve the equation: &S,out

O3Us

m u s ._

BAus = a err s out

for Au s , e.g., by singular value decomposition (Press et al, 1989); o~ e [0,1] determines how much of the error should be compensated for in one step. 3) update u s" u s = u s + Au s and calculate new predicted output setpoint state Xs,ou , with a LWR lookup. 4) assess the fit for the updated setpoint with the help of prediction intervals. If the quality is above a certain threshold, continue with 1), otherwise terminate shifting. Shifting occurs for both x s and Xs,ou ,. After shifting, the kernel k (cf. Eqn. (2.9) is optimized by minimizing the local cross validation error M S E c.... . (In Figure 5, the goal setpoints are given explicitly, but they can also develop automatically from the require ment to throw the devilstick increasingly close to a place, in which the other hand has data, i.e., X desired,lefi : X s,right and X desired,right : X S,lefi .) (4) The SSA repeats itself by collecting data in the new regions of the workspace until the setpoints can be shifted again (Fig. 5-bottom-left). The procedure terminates by reaching the goal, leaving a (hyper-) ridge of data in space (Figure 5-bottom-right). The LQ controllers play a crucial role for devil sticking. Although we statistically exploit data rather thoroughly, it is nevertheless hard to build good local linear models in the high dimensional forward models, particularly at the beginning of learning. LQ control has useful robustness even if the underlying linear models are imprecise.

Figure 6: Learning curves of devil sticking for three runs.

150 We tested the SSA in a noise corrupted simulation and on the real robot. Learning curves of the real robot are given in Figure 6. The learning curves are typical for the given problem. It takes roughly 40 trials before the setpoint of each hand has moved close enough to the other hand' s setpoint. For the simulation a break-through occurs and the robot rarely loses the devilstick after that. The real robot (Figure 6) takes more trials to achieve longer juggling runs. In our first experiments, its performance was not very consistent. Applying LWR and a LQ controller in the above described form turned out to undersample certain directions in control space. Regression analyses need full rank data matrices to achieve good results. By undersampling one direction, the regression matrix becomes close to singular. To avoid this problem some small amount of uniformly distributed noise was added to the controls. The performance of three runs is shown in Figure 6 (note the scale of the ordinate). On average, humans need roughly a week of 1 hour practicing a day before they learn to juggle the devil stick. With respect to this, the robot demonstrated rather nice learning performance, but the controller is still local and could not correct large perturbations, and due to the boom the task is slightly simplified. Future work will attempt to further improve learning rate and robustness. 4. DISCUSSION In this paper we adopted a nonparametric approach to leaming control. By means of locally weighted regression we built models of the world first, and exploited the models subsequently with statistical methods and algorithms from optimal control to design controllers. Despite the computational complexity of these methods, we demonstrated the usefulness of our algorithms in a real-time implementation of robot learning. Using models for control according to the certainty equivalence principle is nothing new and has been supported by many researchers in the last years (e.g., An et al., 1988; Sutton, 1990; Jordan, 1992; Moore, 1991). Using memory-based or non parametric models, however, has only recently received increasing interest. One of the advantages of memory-based modeling lies in the least commitment strategy which is associated with it. Since all data is kept in memory, a lookup can be optimized with respect to the few open architec tural parameters. Parametric approaches do not have this ability if they discard their training data; if they retained all the training data they essentially become memory-based. As we demonstrated in our LWR approach to nonparametric modeling, several established statistical methods may be adopted to assess the quality of a model. These statistics form the backbone of the SSA exploration algorithm. So far we have only examined some of the most obvious statistical tools which directly relate to regression analysis. Many other methods may be suitable, too, particularly from Bayesian statistics. Training a memory-based model is computationally inexpensive, as the data is simply stored in a memory. Training a nonlinear parametric model typically requires an iterative search for the appropriate parameters. Examples of iterative search are the various gradient descent techniques used to train neural network models (Hertz et al., 1991). Lookup or evaluating a memory based model is computationally expensive, as described in this paper. Lookup for a nonlinear parametric model is often relatively inexpensive. If there is a situation in which a fixed set of training data is available, and there will be many queries to the model after the training data is processed, then it makes sense to use a nonlinear parametric model. However,

151 if there is a continuous stream of new training data intermixed with queries, as there typically is in many motor learning problems, it may be less expensive to train and query a nonparametric memory-based model then it is to train and query a nonlinear parametric model. A question that often arises with memory-based models is the effect of memory limitations. Memory use can be minimized based on several methods. One approach is to only store "surprises". The system would try to predict the outputs of a data point before trying to store it. If the prediction is good, it is not necessary to store the point. Another approach is to forget data points. Points can be forgotten or removed from the database based on age, proximity to queries, or other criteria. Because memory-based learning retains the original training data, forgetting can be explicitly controlled. Additionally, we also presented a method for data compression. Data compression on basis of k-d tree partitioning will be useful when sufficient data was collected in a region such that there is enough confidence to build a local parametric model. The piecewise local models which are generated by our method can be used in form of a lookup table which quickly provides predictions and derivatives at a query point. Smoothing across partition can be done by interpolation and will guarantee continuous first derivatives. The LWR-k-d-tree can also be built on-line and in combination with the regular LWR learning. If the k-d tree does not provide a valid partition for a query, the system falls back to the normal LWR mode, otherwise the result is generated by the LWR-k-d-tree. That computational complexity does not necessarily limit real time applications was demonstrated with our successful devil sticking robot. We are able to do lookups for memorybased local models in less than 15 milliseconds for a thousand data points modeling a 10 to 5 mapping, and we are able to build on-line LQ controllers in another 5ms. The initial shortcomings of our deadbeat inverse or inverse-forward model controllers are not due to the LWR learning algorithm but rather to the inherent problems of this kind of control. As has been pointed out by Jordan (1992), inverse models are not goal-directed and perform data sampling in action and not state space. They do not establish a connection between a certain sensation and a certain action but rather a connection between two sensations. Hence, they do not learn from bad actions. A forward model overcomes these problem. Pure forward model controllers, however, are still deadbeat controllers which try to cancel the plant dynamics in one step. This results in large commands if the system deviates only moderately from its desired goal and conflicts with the workspace bounds and command bounds of our robot. Additionally, modeling errors are strongly amplified by deadbeat control. Accurate data sampiing, as it is necessary in high dimensional spaces, becomes thus rather difficult. Due to the statistical properties of locally weighted regression, a simple exploration algorithm like the shifting setpoint algorithm is powerful enough to accomplish the desired task. Deadbeat control was replaced by LQ control which naturally blends into the LWR framework. By no means was the SSA algorithm intended to replace high-level controllers. Indeed, it remains to be explored in how far the chaining of individual LQ controllers is actually robust, and whether an approach from trajectory optimization (Dyer and McReynolds, 1970) would not be more appropriate. In favor of the SSA algorithm stands its easy implementation for real-time systems. Two crucial prerequisites entered our explanations on robot learning. First, we assumed we know the input/output representations of the task, and second, we were able to generate a goal state for the SSA exploration. A good choice of a representation is crucial in order to be able to accomplish the goal at all (Schaal and Atkeson, 1993, Schaal and Sternad, 1993), and

152 we have very limited insight so far how to automate this part of the learning process. Of equal importance is a good choice of a goal state. In devil sticking, the goal state developed out of the necessity that the left and right hand have to cooperate. The initial action, however, which was given by the experimenter, clearly determined in which ballpark the juggling pattern would lie. Certain patterns of devil sticking are easier than others (Schaal et al., 1992), and we picked an initial action of which we knew that it was in the right ballpark. One part of our future work will address these issues in more detail in that we search for good initial actions and strategies to approach a task. 5. CONCLUSIONS The paper demonstrated that a real robot can indeed learn a non-trivial task. As pointed out above, by taking input/output representations and good learning goals as given, a large portion of the task was already solved in advance. Solving the remaining problems became practicable mainly because of the characteristics of the LWR learning method. The local linear models that this algorithm generated at every query point allowed us to make use of optimal control techniques which added useful robustness to the controllers. Since LWR is memory-based, the local linear models could be optimized with respect to statistical uncertainty measures. These measures also served as the basis of the SSA exploration algorithm. Such statistical tools are not generally available in learning. LWR is particularly suited to exploit statistics since it originates from a statistical method, and we could thus easily assure compatibility of the statistics and the learning algorithm. As a last point, LWR does not suffer from problems of interference when being trained on new data. Interference means a degradation of performance in one part of the model when training the model with data relevant for different parts. Such an effect could happen during SSA shifting if a parametric learning method were applied. But since lookups with LWR are affected only by a small neighborhood of the query point, interference problems are greatly reduced. Our future work will focus on extending LWR model-based control to multistage problems in the optimal control domain. Devil sticking should not only be stable within the validity of the local linear controllers but rather exhibit global stability. This requires nonlinear optimal control and planning techniques which we are currently exploring. Future work must furthermore address how we could approach tasks in which complete measurements of the states are not available, or what constitutes a state is not even known. ACKNOWLEDGMENTS Support was provided by the Air Force Office of Scientific Research, by the Siemens Corporation, the German Scholarship Foundation and the Alexander von Humboldt Foundation to Stefan Schaal, and a National Science Foundation Presidential Young Investigator Award to Christopher G. Atkeson. We thank Gideon Stein for implementing the first version of LWR on a DSP board, and Gerrie van Zyl for building the devil sticking robot and implementing the first version of learning of devil sticking.

153

REFERENCES Aboaf, E. W., Atkeson, C. G., & Reinkensmeyer, D. J. (1988). "Task-level robot learning." In: Proceedings of the IEEE International Conference on Robotics and Automation, April 24-29, Philadelphia, Pennsylvania. Aboaf, E. W., Drucker, S. M., & Atkeson, C. G. (1989). "Task-level robot leafing: Juggling a tennis ball more accurately." In: Proceedings of lEEE Interational Conference on Robotics and Automation, May 14-19, Scottsdale, Arizona. An, C. H., Atkeson, C. (3., & Hollerbach, J. M. (1988). Model-based control of a robot manipulator. Cambridge, MA: MIT Press. Atkeson, C. G. (1992). "Memory-based approaches to approximating continuous functions." In: Casdagli, M., & Eubank, S. (Eds.), Nonlinear Modeling and Forecasting, pp.503-521. Redwood City, CA: Addison Wesley. Barto, A. G., Bradtke, S. J., & Singh, S. P. (1991). "Real-time learning and control using asynchronous dynamic programming." Technical Report 91-57, University of Massachusetts at Amherst, Amherst: Univ. of Massachusetts. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont, CA: Wadsworth International Group. Cleveland, W. S. (1979). "Robust locally weighted regression and smoothing scatterplots." Journal of the American Statistical Association, 74, pp.829-836. Dyer, P., & McReynolds, S. R. (1970). The computation and theory of opitmal control. New York: Academic Press. Farmer, J. D., & Sidorowich (1987). "Predicting chaotic time series." Phys. Rev. Lett., 59 (8), pp.845-848. Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). "An algorithm for finding best matches in logarithmic expected time." A CM Transactions on Mathematical Software, 3 (3), pp.209-226. Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison Wesley. Jordan, I. M. (1992). "Supervised learning with a distal teacher." In: Cognitive Science, 16, 3, pp.307-354. Mallows, C. L. (1966). "Choosing a subset regression." unpublished paper presented at the annual meeting of the American Statistical Association, Los Angles,. Moore, A. (199 l a). "Fast, robust adaptive control by learning only forward models." In: Moody, J. E., Hanson, S. J., & and Lippmann, R. P. (Eds.), Advances in Neural Information Processing Systems 4. San Mateo, CA: Morgan Kaufmann. Myers, R. H. (1990). Classical and modern regression with applications. Boston, MA: PWSKENT. Press, W. P., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in C - The art of scientific computing. Cambridge, MA: Press Syndiacate University of Cambridge.

154 Schaal, S., Atkeson, C. G., & Botros, S. (1992b). "What should be learned?." In: Proceedings of Seventh Yale Workshop on Adaptive and Learning Systems, p.199-204, New Haven, CT. Schaal, S., & Atkeson, C. G. (1993b). "Open loop stable control strategies for robot juggling." In: IEEE International Conference on Robotics and Automation, 3, pp.913-918, Georgia, Atlanta. Schaal, S., & Stemad, D. (1993c). "Learning passive motor control strategies with genetic algorithms." In: Nadel, L., & Stein, D. (Eds.), 1992 Lectures in complex systems. Redwood City, CA: Addison-Wesley. Schaal, S., & Atkeson, C. G. (1994a). "Robot juggling: An implementation of memory-based learning." Control Systems Magazine, 14, 1, pp.57-71. Sutton, R. S. (1990). "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming." In: Proceedings of the International Machine Learning Conference. van Zyl, G. (1991). "Design and control of a robotic platform for machine learning." MS Thesis, Cambridge, MA: Massachusetts Institute of Technology, Dept. of Mech. Engineering. Wahba, G., & Wold, S. (1975). "A completely automatic french curve: Fitting spline functions by cross-validation." Communications in Statistics, 4 (1),.

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

155

Behavioral control in mobile-robot navigation using a fuzzy decision making approach H.R. Beom a, K.C. Koh a and H.S. Chob aResearcher, Research & Development Laboratory, LG Industrial Systems, Co., Ltd. 533 Hogae-dong, Dongan-gu, Anyang, 430-080, Korea bprofessor, Center for Robotics and Automation, Department of Precision Engineering and Mechatronics, Korea Advanced Institute of Science and Technology 373-1 Kusong-dong, Yusong-gu, Taejon, 305-701, Korea

This paper proposes a sensor-based mobile robot navigation which utilizes a multi-criterion decision-making method in uncertain environments. In order to acquire information about the environment around a mobile robot, ultrasonic sensors mounted on the front of the mobile robot are used. The multi-crtefion decision-maker uses fuzzy sets to select a reactive and purposive behavior in accordance with a global mission from a higher-level planner and the situation around the robot. The reactive and purposive behaviors correspond to avoidance and goal-seeking behaviors, respectively. The two behaviors are designed to produce the prescribed responses to currently perceived sensory information using fuzzy logic and reinforcement learning. A decision function to appropriately select one of two behaviors is formulated by use of a fuzzy set. The optimal alternative of two behaviors is defined as the one achieving the higher degree of membership in the fuzzy decision function. The fuzzy decisionmaker provides the mobile-robot navigator with a degree of flexibility which cannot be found in conventional decision-making methods. The effectiveness of the proposed method is verified by a series of simulations. 1. INTRODUCTION Path planning is one of the most vital tasks in navigation of autonomous mobile robots and is usually divided into two categories. One is global path planning based on a priori complete information about the environment and the other is local path planning based on sensory information in a cluttered environment. Local path planning utilizes the information provided by sensors such as an ultrasonic sensor, vision, laser rangefinder, proximity sensor or touch sensor. Brooks [1] applied the force-field concept to the obstacle avoidance problem for a mobile robot equipped with ultrasonic sensors whose readings are used to 'compute the resultant repulsive force. Borenstein and Koren [2] proposed the vector-field histogram method for a fast-running mobile robot equipped with ultrasonic sensors. However, with the above methods, it is difficult to find the force coefficients that influence the velocity and the direction of the mobile robot in a cluttered environment. As the altematives, fuzzy logic and a neural network have been employed in obstacle avoidance for mobile-robot navigation [3]. The neural network informs the mobile robot of its

156 situation at the present instant. According to the situation, fuzzy rules are brought into play to cause the mobile robot to act. However, it is difficult to construct and tune the rule bases. Thus, in the earlier study [4], a navigation method using avoidance behavior and goal-seeking behavior was proposed. Each behavior was designed to make the mobile robot act in accordance with currently perceived sensory information by using the fuzzy logic and reinforcement learning. Recently, advanced behavior-based systems have been proposed to give a purposive capability to the mobile robot. Arkin [5] proposed a motor schema-based architecture describing the interaction between perception and action of the mobile robot. The behaviorbased approach decomposes the system into individual modules, each of which is responsible for one behavior to be performed by the mobile-robot system. Whenever the mobile robot navigates in a cluttered environment towards the goal position, the two behaviors, avoidance behavior and goal-seeking behavior, always conflict. Hence, in order for the mobile robot to arrive at the goal position without colliding with the obstacles, a method to effectively fuse the behaviors is required. Saffiotti et al. [6] have proposed a behavior-blending method using fuzzy logic, and Salichs et al. [7] have proposed a method using reinforcement leaming. In this study, a new way of combining two behaviors using a multi-criterion decisionmaking method is proposed. A general decision-making problem can be expressed in terms of the simultaneous satisfaction of conflicting criteria. In conventional decision-making problems, a single scalar criterion, the weighted sum of the deviation from an ideal goal value, is constructed and optimized to identify non-dominated solutions. One drawback of such a formulation is that the optimum solution tends to be sensitive to numerical weights. The fuzzy decision-making approach models objective functions and constraints in the form of fuzzy goals and then transforms the problem into a conventional optimization using fuzzy weights. The decision function is a combination of criteria obtained by some aggregation operations. Two well-known aggregation operations are intersection and union. In multi-criterion decisionmaking using fuzzy sets, decision alternatives are usually obtained by constructing a decision function. In this study, the fuzzy decision function is constructed by using the sensory information about the environment around the mobile robot and global mission from a higherlevel planner. 2. MOBILE-ROBOT NAVIGATION ARCHITECTURE The proposed navigation method is used for a wheeled mobile robot named LCAR which has been developed in the Laboratory for Control Systems and Automation, KAIST. As shown in Fig. 1, the robot has four wheels: two drive wheels fixed at the sides of the mobile robot and two castors attached at the front and rear of the robot. The robot moves and changes its heading angle by the rotation of the two drive wheels. Four DC servo motors are used to drive the two wheels and one camera mounting device. LCAR is equipped with a camera, a laser lightstripe, ultrasonic sensors and a DSP board. The mobile-robot navigation architecture employed in this study is shown in Fig. 2. The architecture consists of the following modules: higher-level planner, perception module, primitive behaviors, fuzzy decision-maker, and lower-level controller. The higher-level planner, giving the goal and sub-goal positions, consists of a global path planner and a mission planner. The primitive behaviors produce a prescribed robot action to the currently perceived sensory information. According to the intervention of the fuzzy decision-maker, the output of only one

157 behavior transfers to the lower-level controller, called the dead reckoner. The lower-level controller controls the robot, taking into account the limitation of its driving angle and maximum speed. The primitive behaviors and fuzzy decision-maker will be explained in detail later.

Figure 1. The wheeled mobile robot, LCAR

Figure 2. The mobile-robot navigation architecture

2.1. Perception system using ultrasonic sensors Environments of mobile-robot navigation can be classified as static or dynamic. The dynamic environment is defined as follows: obstacles can move or change their shapes, and some obstacles may not initially be registered on the global map. In this study, eighteen ultrasonic sensors mounted in the front face of the mobile robot are used to understand environmental situations, for example, what types of obstacles exist in front of the mobile robot or whether the robot can proceed without colliding with obstacles. These sensors act both as transmitters and receivers. Fig. 3 shows the arrangement of ultrasonic sensors mounted in the front face of the mobile robot with sensors being marked as dots in the figure. The distances ej ( j - 1,2 ..... 18), from the origin of the robot frame {R} to obstacles detected by jth sensor Sj, can be defined as ej - 8j + R. Here, R is the radius of the robot and the 6j is thejth sensor reading shown as a dashed line. In this figure, ~pj denotes the orientation of thejth sensor with respect to the flame {R}. Thus, the orientations of ultrasonic sensors can be expressed by the relationship: ~oj - (j - 9.5)~, (j = 1,2,...,18).

(1)

158 Here, the angle ~ denotes the angle between two adjacent sensors and is fixed at 11.25~. The number, 9.5, is required to express the orientation angles of the sensors installed at the fight and the let't hand sides from x-axis by negative and positive values, respectively.

Figure 3. The arrangement of ultrasonic sensors and sensor suites It is assumed that the y-axis of the coordinate frame {R} is aligned with the direction of motion of the mobile robot. As shown in Fig. 3, in order to avoid the increase of the dimension of input space, sensor suites are introduced. A sensor suite consists of a group of neighboring sensors. The first and the filth sensor suites, ssa and s s 5, a r e composed of three neighboring sensors, while the other sensor suites are composed of four neighboring sensors. The distance d~ ( i - 1,2 ..... 5) measured by the ith sensor suite from the center of the mobile robot to the obstacle, and the direction #i of the obstacle with respect to the frame {R} are defined, respectively, by R + min(6j IJ : 4 i - 4, 4 i - 3, 4 i - 2, 4 i - 1) /f i : 2,3,4 at, =

+min(6j Ij = 2'-~, 2 '-' +1,2 '-I +2)

7r.

~i -- "~-(1 --

3)

9

/ f i = 1,5

(2)

(3)

The ultrasonic sensors are controlled by an 8031 microcontroller communicating with a PC/AT computer. Their control boards consist of a transmitter module to fire the pulses, a receiver/gain-control module to receive the attenuated echo pulses, and a switching module to arbitrarily select the activation of the sensor according to the sensing mode. In the simulation study, it is assumed that the sensors can ideally detect the obstacles irrespective of inclination angle. 2.2. Primitive behaviors The primitive behaviors may be classified as follows: goal-seeking behavior, wall-following

159 behavior, keep-away behavior, free-space explorer, and emergency stop. The primitive behavior can be characterized by a temporal sequence of appropriate values for its heading velocity v and incremental steering angle A0 which cause the robot to exhibit the prescribed response to sensory information. Thus, the output vector of a primitive behavior n(t) is defined as u ( t ) = (v(t),AO) r

(4)

where T denotes the transpose and t denotes the time step of the robot controller. From now on, the index t is omitted for notational simplicity. The wall-following behavior follows the boundary of stationary obstacles maintaining a prespecified distance from the center of the mobile robot. This behavior considers the minimum radius of curvature of the mobile robot. The free-space explorer provides the mobile robot with movement in a direction with the greatest distance between the robot and obstacle boundaries. The keep-away behavior basically makes the mobile robot avoid obstacles in its proximity, and produces an output vector corresponding to movement in the direction that ensures no collision with obstacles. This behavior takes information from measurements provided by the front sensors. Since the above behaviors produce mobile-robot actions based on the currently perceived sensory information, we will unify them as an avoidance behavior producing the same action as the individual behaviors do. The goal-seeking behavior produces an output which directs the robot towards a specific goal defined by a point ( x , y ) in the world coordinate frame {W}. This behavior receives the goal command from the higher-level planner and produces the mobile-robot action that causes the robot to move smoothly towards the goal position. This takes into account the minimum radius of curvature of the mobile robot. Therefore, it corresponds to purposive behavior. In this study, we will design the navigator using the avoidance behavior and the goal-seeking behavior. The fuzzy decision-maker decides the next behavior based on the fuzzy goals. Let {u~,u2,...,n,} be a set of outputs of behaviors and {q~,q2,...,q.} be an output of the fuzzy decision-maker. If the ith behavior is chosen by the fuzzy decision-maker, then q~(t) is equal to 1 and the others are zero. Thus, the mobile-robot motion command transferred to the lowerlevel controller at time step t is u(t) - ~ q~(t)u,(t) = q,(t)ul(t ) + q2 (t)u2 (t) + "'" + q~(t)u,(t) + ... + q , ( t ) u , ( t ) = u, (t).

(5)

i

2.3. Design of two behaviors Our goal is to design a navigator operating in an uncertain environment. In the previous study [4], we designed two behaviors using fuzzy logic and reinforcement learning. In this study we will deal with how to choose one of two behaviors using fuzzy decision-making Design of the two behaviors proceeds in the following sequence: (A) fuzzification of the input/output variables, 03) rule-base construction through reinforcement learning, (C) reasoning process, and (D) defuzzifieation of output variables. The fuzzy decoder partitions the input space formed by the information from internal and external sensors into subspaces. The partitioned input spaces are represented by the rules. Each behavior is composed of fuzzy

160 decoder, decision making, rule base, reinforcement-learning mechanism, and defuzzifier. The goal of navigation in an uncertain environment is to make the mobile robot arrive at a destination point without colliding with obstacles. In order to accomplish the navigation effectively, we must control the mobile-robot motion by considering the obstacle position Xoi =(Xo~,Yoi), the robot's position X = (x,y), and its heading angle 0 with respect to the world coordinate frame {W} shown in Fig. 4.

Figure 4. The coordinate frames and control variables The motion of the mobile robot is characterized by its heading velocity v and incremental steering angle A0. Thus, we choose the input variables for avoidance behavior, 4 -IlXo,- xll 1,2 ..... 5), and those for goal-seeking behavior, heading-angle difference and distance to goal z=

IIx,- xll

output v

u

for two behaviors is expressed by the

velocity v and the incremental steering angle A0. Here, the subscript i denotes the number of the sensor suite detecting the obstacles in front of the robot and I1~ denotes the Euclidean norm. The variable d~ is calculated by Eq. (2). The angle g is that between the heading direction of the mobile robot and the direction of the goal position, z is the distance from the current position X-(x,y) to the goal position Xg- (xg,yg) with respect to the frame {W}. All the variables are defined in Fig. 4. Using these variables, a rule base is constructed for each behavior to infer two actions. Avoidance behavior and goal-seeking behavior have 324 rules and 56 rules, respectively. The output fuzzy sets are expressed by the linguistic values with membership functions having the triangular shaped functions. Initially, all the center values of the membership functions of the output fuzzy sets, representing the linear velocity and the incremental steering angle, are set to constants. When the control-action signals are applied to the mobile robot at time step t the robot moves by the amount of the movement vector re(t). As a result of the series of movement vectors {m(0),..., m(t - 3), m(t - 2), m(t - 1), re(t)}, shown in Fig. 5, the mobile robot may collide with the obstacle or move away from the goal position. When the robot navigates

161 through a section of the environment similar to the previously experienced one, the rules contributed to generate the control action must be corrected in order to avoid collision. The rules influencing the mobile robot's failures are the rules used at the previous time steps t, t - 1, t - 2 . . . . . Thus, such rules must be changed into the correct rules. This task is accomplished by two neuronlike adaptive elements [8] consisting of an associative search element and an associative critic element. By use of these dements, the center values of the membership functions of output fuzzy sets approach the correct values as the learning step progresses.

Figure 5. The movement vectors associated with colliding with an obstacle or moving away from the goal position

3. BEHAVIOR SELECTION Whenever the mobile robot navigates in an uncertain environment towards the goal position, two behaviors, avoidance and goal-seeking, always conflict. Therefore, for the complete navigator, it is necessary for the fuzzy decision-maker to efficiently select one of the behaviors.

3.1. Multi-criterion decision-making using fuzzy sets In 1970, Bellman and Zadeh considered the classical model of a decision and suggested a model for decision making in a fuzzy environment [9]. They considered a situation of decision making under certainty, where the objective function and constraints were characterized by their membership functions. Since we want to satisfy the objective function and the constraints simultaneously, the decision in a fuzzy environment is defined by analogy to that in a crisp environment as selecting the one among the alternatives which satisfies the objective function and the constraints. Thus, the decision in the fuzzy environment can be considered as the intersection of fuzzy objective function and fuzzy constraints. In this case there is no difference between objective function and constraints. Usually it is impossible for a single criterion to determine an optimal solution to a decision problem and to judge the suitability of the solution. Thus, a multi-criterion decision-making has been proposed and it has led to numerous evaluation schemes. The multi-criterion decisionmaking using fuzzy sets can be divided into two major areas: multi-objective decision-making

162 and multi-attribute decision-making [10]. The former concentrates on continuous decision space. The latter focuses on the problem with discrete decision spaces. The problem for selection of appropriate behavior corresponds to multi-attribute decision-making. Let u = ( u 1, u2,..., u,) be a set of alternatives resulting from primitive behaviors and a set of fuzzy goals (~ = ((~1,(~,..., (~m), by which the suitability of a behavior is judged. The jth fuzzy goal (~s is characterized by its membership function/aoj(u). In the following, the tilde sign (-~) representing the fuzzy sets will be dropped for notational simplicity. However, there are some cases where some goals and constraints are of greater importance than others. In such cases, the fuzzy decision function D might be expressed as the goals with weighting coefficients reflecting the relative importance of the constituent terms. The importance of goal Gs is expressed by the exponent a s. Thus, the fuzzy decision function is defined as the intersection of all the fuzzy goals (6)

D = G~' N GZ~ N ... N G:, ..

The rationale behind using weights as exponents to express the importance of a goal can be found in [10]. The greater the importance of a goal, the larger is the exponent of its fuzzy set, if the fuzzy sets are normalized. The condition aj > 1 means more importance, while a / < 1 means less importance. The problem is to determine one of the alternatives u i with the highest degree of suitability with respect to all relevant goals Gj, j = 1,2,...,m. The fuzzy set decision D in discrete space as the intersection of the Gj(u~) becomes

o- [{o,. min o

}.,- ....

1

(7)

The relative importance, a/, of the goals is established in the form of matrix A by pairwise comparison as follows:

A

=

I a,/b, a,/a~ ... a,/a. a2(a~ .

.

(8)

By applying Saaty's eigenvector method [11] to matrix A, consistent weights ws for the membership function of the j th goal are determined. As an alternative to the above, a fuzzy set decision D may be calculated by

o: [{.i, 1 , 2 , . . I,. =, 1,2,...,o;j: o ] ,

(9)

163 m

where

~ wj = 1.0. This method is useful when the number of fuzzy goals is small. The J optimal alternative is defined as the one achieving the highest degree of membership in D.

3.2. Fuzzy goals in the navigation problem In order for the mobile robot to arrive at its goal position without colliding with obstacles, an appropriate behavior must be selected according to the situation around the robot. When the obstacles are located around the mobile robot as shown in Fig. 4, the robot must reduce its speed in order not to collide with them. If the maximum acceleration am~x is applied to the mobile robot with an arbitrary initial velocity, the minimum time required to avoid the obstacle is calculated by trn~ - v c o s t p J a ~

x .

(10)

If a constant acceleration less than the maximum value is applied to the mobile robot, the time required to avoid the obstacle is calculated by to -

2(ej

-

R)/(veos~j).

(11)

Therefore, when the distance from the center of the mobile robot to the obstacle and the velocity component towards the obstacle are given, the marginal avoidance time tma r shown in Fig. 6 can be defined as the difference between the above two required times.

Figure 6. The marginal avoidance time

Figure 7. The minimum time to transfer the mobile robot to the goal position

Since the marginal avoidance time denotes the degree of collision, it is used in this study to represent the repulsive potential. Repulsive potential E r e ~ is produced by the inversion of the marginal avoidance time, which is a function of the mobile robot's velocity and the distances to the obstacles measured by the ultrasonic sensor array. Thus, the repulsive potential is defined by

164 su x"

Krar~xv COS~j 2am~ (ej - R ) - v2 cos2 %

(12)

where K r is a scale factor set to 0.0003 and SN is the number of ultrasonic sensors. Repulsive potential increases as the mobile robot goes closer to the obstacles, and approaches zero as its velocity in the direction of an obstacle approaches zero. When the mobile-robot configuration is as given in Fig. 4, the attractive potential is defined in order to transfer the robot with initial velocity v to the goal position. This potential is calculated using the minimum time tg as follows:

:

c~

+4a xl[X-

x ll-v ~

(13)

ar~x

Attractive potential depends on the distance to the goal position, the heading angle difference, and the initial velocity. As can be seen from Eq. (13) and Fig. 7, attractive potential increases as the mobile robot moves away from the goal position. The situation around the robot can be represented by the use of two potentials, Erep and Ea. When two behaviors are assumed to apply at the next step, the control actions derived from each behavior are calculated. Then, the changes of potential, Crep(Ui ) "-" AEre p (Ui) and cau(u~) = AEau(U~) (i = 1 (avoidance), 2 (goal-seeking)) resulting from the actions of the two behaviors are calculated. In order that the mobile robot can move through the environment to the goal position without colliding with obstacles, the changes of the weighted sum of the two potentials must be negative. However, it is very difficult to find the weighting coefficients in a cluttered environment. Furthermore, the uncertainties in the sensor measurements can be introduced into the potentials. Hence, in this study, one behavior to be used at the next action step is selected by the fuzzy decision-maker. To perform this, three fuzzy goals are defined by Gl: the change in repulsive potential Crev (Ui) shouM be smaller than q and G 2: the change m attractive potential Can(Ui) shouM be smaller than c2 and G 3:the repulsive potential Erev (u~) should be smaller than E 1 9

Here, the connective "and" denotes the intersection of fuzzy goals. The membership functions of the fuzzy goal are defined by 1.0 ~s 1(Crep (Ui )) -

~s (Catt(Ui)):

1.0 1.0 + M 1(Crep(Ui) + C1) Nl

i f Crep(Ui) ~_ C1

otherwise

g c=,(u,)_< c~ 1.0 1.0 otherwise 1.0 + Mz (c~u (u, ) + c2 ) N~

(14)

(15)

165

if E,,,(,,,)__ E, otherwise ~as (Er~v(ui))= 1.O+Ms(Er,v(ui)+E,)Ns 1.0 1.0

Here,

(16)

Nj, j = 1, 2, 3 are even integers and the Mj, j = 1,2, 3 are positive constants. The q, c2

and E1 are determined through a series of simulations. As can be seen from the above equations, all the fuzzy sets are normalized. 4. SIMULATION AND RESULTS 4.1. Simulation conditions As an illustration of the foregoing process, a series of simulations were carried out using an arbitrarily constructed environment with sixteen obstacles. It was assumed that the radius of the mobile robot is 0.2m and the extent of the environment is 10m • 10m. The maximum speed of the robot was assumed to be 35cm/s. The measurement range of the sensor was assumed to be eight times the radius of the mobile robot. In order to acquire the information about the environment, a sensor simulator giving the range data of the ultrasonic sensors was used. For the sensor simulator, it was assumed that the ultrasonic sensors can ideally detect the nearest obstacles within their beams. The correct rule bases for each behavior were those constructed in the previous study [4] and they were employed here for the simulations to select a behavior using the fuzzy decision-maker. In the simulations, the parameters Nj, j = 1,2,3 determining the shape of the membership

functions of the fuzzy goals, were all set to 2. The constant values, q, c~ and E 1, were determined through a series of simulations and were set to -0.01, -0.02 and 1.0 respectively. 4.2. Simulation of the selection of behaviors By use of Eqs. (2) and (3), the eighteen distance values measured by the sensors are grouped into the five distance values belonging to each sensor suite. The behaviors produce the prescribed responses to these five distance values. Let us assume that avoidance and goalseeking behaviors are both used at the next action step. Then, we can obtain two movement vectors corresponding to the outputs of the two behaviors. Using these two vectors and the mobile robot's current position and orientation, the potentials are calculated by Eqs. (12) and (13) and their changes are calculated. Using these values, the membership degrees representing the degree of fuzzy goal satisfaction are determined by Eqs. (14) through (16). Then, the minimum value of the membership degrees is chosen. This value denotes the degree of behavior satisfaction with respect to fuzzy goals. Thus, the fuzzy decision set D represented by Eq. (7) is obtained. The optimal alternative, the behavior to be used at next step, is determined by the behavior with the highest degree of membership in fuzzy decision set D. To test the effectiveness of the method using the fuzzy decision-maker, various start positions and initial heading angles were employed. In all simulations, the goal position is fixed and assumed to be located at the upper comer of the right-hand side within the environment as shown in Fig. 8. The weights wj (j = 1,2 ..... m) reflecting the relative importance of the fuzzy goals must be

determined. It was assumed that all the goals are of equal importance. Since a 1, % and a 3 in Eq. (8) can be equally set to 1.0, we can obtain the eigenvalue set as {0.0, 0.0, 3.0}. After

166 calculating three eigenvectors we can determine one eigenvector with the elements of the same sign as w = {w~,w2,w3} r = {1.0,1.0,1.0} r among the three eigenvectors.

Figure 8. The results of execution of navigation Fig. 8(a) shows the result of execution of navigation when M 1, M 2 and M 3 are all set to 0.2. The mobile robot can arrive at the goal position by passing through the environment without colliding with any obstacles. Fig. 8(b) shows the result of execution of navigation under the same conditions as above, except that M 2 = 0.05. As can be seen from Eqs. (14) through (16), the parameter M~ plays the role of modifier. The modifier is an operation which

167 changes the meaning of a fuzzy set. The case with the small _M2 corresponds to the case when w2 < 1.0. Therefore, this corresponds to the case when the fuzzy goal (7,_ associated with the change in attractive potential is less important than the others. As can be seen from this figure, the smaller the value of M 2, the larger should be the number of time steps for arrival at the goal position. Fig. 8(c) shows the result of execution of navigation when a different start position and initial heading angle are given. Fig. 8(d) shows the result of execution of the case when a local minimum occurs. As can be seen from these figures, even if the mobile robot is faced with situation where a local minimum occurs, the navigator with fuzzy decision maker shows a feasibility to escape from the local minimum. We believe that the proposed navigator with the fuzzy decision-maker enables the mobile robot to navigate through the environment more smoothly if more than three fuzzy goals are used. 5. CONCLUSIONS A behavior selection method based on fuzzy decision-making has been presented for mobile-robot navigation in complex environments. The mobile robot's situation at any instant is classified by calculating the repulsive and attractive potentials using both the currently perceived sensory information and the goal position provided by the higher-level planner. The decision function to select the behavior suitable in the vicinity of the robot is formulated as a fuzzy set using the fuzzy goals. Then, optimal selection of behaviors is determined by achieving the highest degree of membership in a fuzzy decision set. The fuzzy decision-making provides a flexibility which cannot be found in conventional decision-making. Furthermore, it is easy to formulate the relevant problems since there is no difference between goals and constraints in the fuzzy environment. In this study, even if three fuzzy goals are used, the mobile robot can move towards the goal position through the environment without colliding with obstacles. It has been shown that the proposed method enables the robot to navigate through complex environments where a local minimum occurs. REFERENCES

1. R.A. Brooks, "A robust layered control system for a mobile robot", IEEE J. of Robotics and Automation, vol. RA-2, no. 1, pp. 14-23, 1986. 2. J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robot", IEEE Trans. Syst. Man Cybern., vol. 19, no. 5, pp. 1179-1187, Sept./Oct. 1989. 3. H.R. Beom and H.S. Cho, "A sensor-based obstacle avoidance controller for a mobile robot using fuzzy logic and neural network", in Proc. Int. Conf. Intelligent Robots and Systems, pp. 1470-1475, 1992. 4. H.R. Beom and H.S. Cho, "A sensor-based navigation for a mobile robot using fuzzy logic and reinforcement learning", IEEE Trans. Syst. Man Cybern., vol. 25, no. 3, Mar. 1995. 5. R.C. Arkin, "Motor schema-based mobile robot navigation", International Journal of Robotics Research, vol. 8, no.4, pp. 92-112, Aug. 1989. 6. A. Saffiotti, E.H. Ruspini and K. Konolige, "Blending reactivity and goal-directedness in a fuzzy controller", in Proc. IEEE Int. Conf. Fuzzy Systems, pp. 134-139, 1993. 7. M.A. Salichs, E.A. Puente, D. Gachet and J.R. Pimentel, "Learning behavioral control by reinforcement for an autonomous mobile robot", in Proc. IECONV93, pp. 1436-1441, 1993.

168 8. A.G. Barto and R.S. Sutton, "Neuronlike adaptive elements that can solve difficult learning control problems", IEEE Trans. Syst. Man Cybem., vol. SMC-13, no. 5, pp. 834-846, Sept./Oct. 1983. 9. R.E. Bellman and L.A. Zadeh "Decision making in a fuzzy environment", Management Science, vol. 17, no.4, pp. 141-164, Dec. 1970. 10. H.J. Zimmermann, Fuzzy set theory and its application. Kluwer-NijhoffPublishing, 1985. 11. T.L. Saaty, "Exploring the interface between hierarchies, multiple objectives and fuzzy sets", FSS 1, pp.57-68, 1978.

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

169

Learning Emergent Tasks for an Autonomous Mobile Robot D. Gachet, M.A. Salichs and L. Moreno Dpto. Ingenier[a, Universidad Carlos III de Madrid, Butarque 15, 28911, Leganfis, Madrid, Spain e-mail: gachet~ing.uc3m.es We present an implementation of a reinforcement learning algorithm trough the use of a special neural network topology, the AHC (Adaptive Heuristic Critic). The AHC is used as a fusion supervisor of primitive behaviors in order to execute more complex robot behaviors, for example go to goal, surveillance or follow a path. The fusion supervisor is part of an architecture for the execution of mobile robot tasks which are composed of several primitive behaviors which act in a simultaneous or concurrent fashion. The architecture allows for learning to take place at the execution level, it incorporates the experience gained in executing primitive behaviors as well as the overall task. The implementation of this autonomous learning approach has been tested within OPMOR, a simulation environment for mobile robots and with our mobile platform Robuter. Both, simulated and actual results are presented. The performance of the AHC neural network is adequate. Portions of this work has been implemented within the EEC ESPRIT 2483 PANORAMA Project. 1. I n t r o d u c t i o n The process of guiding autonomous mobile robots in unstructured, dynamic, and complex environments is difficult. A multitude of problems arise such as noisy measurements, lack of appropriate sensors, difficulty in achieving real-time sensor processing, and constructing an appropriate model of the environment based on sensory information. For example, when working with ultrasonic range sensors there are inaccuracies in the values measured due to noise or to the physics of the ultrasound waves. Thus obtaining an accurate state of the environment is difficult. To deal with the aforementioned problems, researchers are advocating the development and use of intelligent systems not only in mobile robots but also in diverse areas such as manufacturing, control, vision and communications. Intelligent systems are difficult to define and indeed there are several definitions in the literature. There are also several theories of intelligent systems [1-4] and intelligent machines [7-10]. Regardless of a precise definition of an intelligent system, most definitions include learning as one of its main attributes. In addition to learning, the intelligent systems that we advocate follow a behavioral

170

methodology where all robot tasks are performed in terms of a set of elementary tasks referred to as primitive behaviors. The fundamental idea of behavioral control is to view a robot mission or task, also referred to as an emergent behavior as the temporal execution of a set of primitive behaviors. One of the most challenging problems in behavioral control involves the learning the appropriate mixture of primitive behaviors in order to execute complex tasks. For this purpose, in our research group several approaches have been investigated [5-7]. Our experience indicates that perhaps the most important learning requirement for autonomous systems is that of autonomous learning, in other words, the ability to learn from self-exploration or observation without the need of examples or a teacher. Although there is a great deal of research on learning in a general context [1-3], research on autonomous learning is just beginning. The neural network paradigm of reinforcement learning appears promising for autonomous learning. Accordingly, the goals of this paper are to discuss the reinforcement learning with a neural network AHC topology in the context of a behavioral control methodology for mobile robots, and to discuss needs for future research. 2. T h e A F R E B Control Architecture Several researchers have already argued the importance of looking at a mobile robot as a set of primitive behaviors. Primitive behaviors are also important components of reactive control which is a recently emerged paradigm for guiding robots in unstructured and dynamic environments [10,11]. Mobile robots must continuously interact with its environment and this is the essential characteristic of reactive programs. By reactive, we mean that all decisions are based on the currently perceived sensory information. An e m e r g e n t behavior can be defined as a simple or complex task or a task which is made up of more elementary (primitive) behaviors. Whereas primitive behaviors correspond to robot actions, emergent behaviors correspond to mission, goals, or subgoals. We have developed a behavioral control architecture called AFREB Adaptive Fusion of Reactive Behaviors for experimenting with a wide range of reactive control methodologies [5],[12]. The architecture is depicted in Figure 1 and basically consists of the following modu-

les:behavioral analysis, fusion supervisor, behavior primitives, and executor. The function of behavior primitives and fusion supervisor will be explained later, the function of the executor module is the calculation of the actual robot commands while enforcing control limitations due to non-holonomic constraints (e.g., limitation of driving angle of the robot, maximum speed and so on) and exhibiting robot reflexive behavior (e.g., emergency stop). We have used the behavioral architecture in several simulation and actual experiments with our mobile robot Robuter. All simulation work has been performed with OPMOR [13].

2.1. Behavior Primitives The set of primitive behaviors active at any given time depends upon the occurrence of specific events in the environment (e.g., the detection of an obstacle, time for the robot

171

UPPER LEVELS

BEHAVIORAL - Task decomposition ANALYSIS - Chek for inconsistences I - Check for constrains - Enable behaviors

( R o b o t Model~

I

,os,o

I P

L

E

SUPERVISOR Evaluate modified tasks

~1

al

9

n

C1

c2

- Produce robot ,,C,,,n commands for individual behaviors

R

C E P T

~r

PRIMITIVE BEHAVIORS

ains IlJ

i

I

- Final robot commands ] I Enforce control limitations I EXECUTOR i range, rates (vel.) [ i Safety monitoring I Emergency stop ]

i I

I

O N

~r W: Dynamic Environment

Figure 1. The A F R E B Control Architecture

--

172 to recharge itself, the issuance of an interactive command, etc). The primitive behaviors (i.e., actions) which have been implemented are: cl :

goal attraction

c2 :

perimeter following (contour left)

c3 :

perimeter following (contour right)

c4 :

free space

c5 :

keep away (from objects)

C6

follow a path

"

A primitive behavior can be characterized by a temporal sequence of appropriate values for linear velocity v(i) and curvature k(i) which cause the robot to exhibit the pre-specified response to sensorial information. Thus we define the output of a primitive behavior c(i) as the vector:

c(i) = (v(i),k(i)) T

(1)

where the variable i denotes the ith cycle of the robot controller. In what follows we will drop the index i for notational simplicity. The goal attraction primitive produces an output which directs the robot towards a specific goal which is defined by a point (x, y) in a global reference frame. The perimeter following behavior follows the perimeter of a fix (i.e., static) obstacle maintaining a pre specified distance away from it. This behavior take into account the minimum radius of curvature R~i~ = v/w~ax of the robuter in order to avoid uneven corners. The movement can be performed left or right. The free space behavior will cause movements in a direction so that the robot frees itself from objects. This behavior takes the information from measurements provided by the frontal sensors and choose the direction of the longer adjacent measures in order to ensure a safe course in that direction. The outputs of this behavior is the direction of motion at a constant speed. The keep away behavior basically avoids obstacles in the proximity of the robot, choosing directions that ensure no collision with nearby obstacles. The follow path behavior drives the robot closest to a pre- defined path. The path is composed by several cartesian coordinates (xi, yi) in the robot's environment.

2.2. D y n a m i c Fusion of Primitive Behaviors The fundamental assumption of the fusion supervisor module is that more complex behaviors, also called emergent behaviors can be obtained in terms of primitive behaviors, in the following way: If we denote Cl, c2,..., cn the output of each primitive behavior, then the output of an emergent behavior (i.e., mission, task, or subtask) is: N

= E i=1

(2)

173 Where the ai coefficients, with 0 _< ai _< 1 are found by an appropriate combination of measurement information provided by the perception system. Thus, the main function of an intelligent controller is the learning of the weights ai so that the performance of the robot for the execution of the tasks is adequate. There are two possibilities for learning the coefficients ai: on line or off line. For autonomous systems the on line learning method is the most interesting. Whether or not learning is present it is to be noted that the coefficients ai are dynamically adjusted by the controller when the robot is in motion. For the system described above, the learning problem dealt with in this paper can be formulated as follows" how can the system learn the appropriate values for the ai coefficients (i.e., mixture of primitive behaviors) in order to execute a given task in a on-line, real-time fashion without the intervention of a trainer?. In the context of the AFREB architecture, the learning process is performed within the fusion supervisor module. For demonstrating the power of autonomous learning approach, we have implemented the following emergent behaviors (i.e., tasks or missions)" M~

~

M2

~

surveillance of the environment (exploration). motion between two points (i.e., towards a goal) while avoiding unexpected obstacles.

M3 " follow a predefined path while avoiding unexpected obstacles. In general, a mission is executed while taking into account the dynamic nature of the environment as sensed by the perception system of the robot. It is important to note that in the case of navigation or follow a path there is a principal primitive behavior (go to goal or follow a path) and the others can be seen as auxiliary behaviors in order to ensure a safe motion in a dynamic and unstructured environment. 3. T h e A d a p t i v e H e u r i s t i c Critic as Fusion S u p e r v i s o r In this section, we summarize a fusion supervisor module which learns the values of the relative gains ai by the use of an external reinforcement signal [14] which rewards or punishes the actions of the mobile robot. The objective is the successful completion of missions M1 to M3,( i.e., surveillance of the environment, navigation and follow a path). A learning system that is built based on this technique is not provided with any sets of instances or examples of a concept. In addition, the learning system may have to deal with a number of observations that represent several concepts rather than just concentrating on a single concept at a time. In the learning approach based on exploration or observation the system generates both, a set of control rules and appropriate examples autonomously. The Neural Network used is called Adaptive Heuristic Critic (AHC). The architecture of the AHC topology is depicted in Figure 2, where r(t) represents the external reinforcement signal to be used in the learning algorithm. Our implementation of the AHC neural network consists of a classifier system which has five or six inputs, depending on the mission, that correspond to a zone arrangement of frontal and lateral ultrasonic sensors, the output of this module is a situation that

174

Figure 2. The A H C Neural Network consists in a vector of 32 binary elements with only one of them with value 1 at any time (64 elements in the case of navigation and follow a path missions). The AHC Network has an output layer with 4 processing elements for mission M1 or 5 processing elements for missions M2 and M3. This is called the Associative Search Element (ASE), and computes the values of a~ (the relative gains for primitive behaviors) in the following way : k ---,grb

ai = f( ~

Ik. Wki + cr(t))

(3)

k=O

Where Wki is a matrix of adaptive weights between the input and output layers, a(t) is a random value in the interval [-1,1] with a uniform distribution, rn is the number of inputs. The function f is defined by:

f(x)

_ f

1

if x > 0

- 1 otherwise

The weight matrix W is modified according to the following law :

Wki(t + 1) = Wki(t) + a . bi(t) . eki(t)

(4)

Where a is the learning rate, In our case a = 3, eki(t) is the eligibility of the weight Wki for reward or punishment. This is an important concept because by it we can reward or punish not only the latest action but the predescesors. The eligibility is modified b y :

ek~(t + 1) = 5. eke(t) + (1 - 5). I k ' a i

(5)

175 With 5 is a value in the range [0,1] and in our case 0.3. The internal reinforcement signal bi(t) is given by the other layer ACE (Adaptive Critic Element), this layer computes the prediction P~(t) of b~(t) at time t : k~'m,

=

(6) k=0

Where Vki are the weight connections between the classifier and ACE layer. The internal reinforcement signal is computed by : bi(t) = r ( t ) + q,. P i ( t ) - P i ( t - 1)

(7)

Where 7 is a positive constant in the range [0,1], r(t) is the external reinforcement signal and the weights Vki are updated b y :

(8) Where/3 is a positive constant and Xk is the eligibility for this layer and :

Xk(t +

1) = s

(1 - s

Ik(t)

(9)

With ~ between 0 and 1, in our case 0.4. The four or five output nodes correspond to the relative contributions of the last four primitive behaviors (i.e., a2 through as) for the surveillance mission, al through as for the navigation and a2 to a6 for the follow path mission. 4. S i m u l a t i o n E x p e r i m e n t s 4.1. S u r v e i l l a n c e R o b o t For the surveillance mission, the external reinforcement r(t) provides a graded performance value describing how well the system performed (-1 = robot collision and 0 -- no collision). Roughly the system works as follows: At the beginning, the weights of the AHC network were initialized with random values and the robot executes random movements with r(t) evaluated continuously and used for changing the internal networks weights. When collisions occur the robot is returned to the position it was 30 steps before the collision took place and the process is resumed from there. The learning process involves the association of actions with the external reinforcement (i.e., whether collisions are present) for each situation. As a result of the learning process, the system learns how to move across the environment without collisions. Figure 3 depicts the initial collisions of the robot in an initial simulation experiment whereas Figures 4 and 5 depict the path of the robot in subsequent simulation experiments where the weights of the NN were stored at the end of each experiment and used in the following. It can be seen that the number of collisions decrease as the robot goes through more learning cycles. After a sufficient number of collisions, the robot navigates with virtually no collisions in the simulator including unknown environments as in the case of Figure 5.

176

Figure 3. Initial Path of the Surveillance robot

Figure 4. Intermediate path of the Surveillance robot

177

Figure 5. Final path of the Surveillance robot in an unknown environment

4.2. G o t o G o a l In the case of execution of mission M2 the external reinforcement signal r(t) take a negative value-1 in the following cases:

9 When the robot is too close to an obstacle (imminent collision) 9 When the frontal sensors do not detect obstacles in the goal direction and the current heading of the robot is too far from the correct direction. In all other cases the reinforcement signal is 0. In this case there is an additional input in the AHC network as compared to the surveillance mission, this input is a binary signal indicating whether obstacles are present in the goal direction. The execution loop is the same as described above for the surveillance robot. We can see in Figure 6, the first environment used for simulating the execution of the AHC network. In this case the weights of the network were initialized with random values, the networks start by giving random values to outputs al to a5 in the first cycles because the system does not have any knowledge of the environment. In this first cycles the robot accepts many punishments until the networks decide to utilize the contour followin 9 and free areas behaviors in order to avoid collisions, some cycles later the network uses the 9oal attraction. When the robot advances in the appropriate direction and no collisions occurs, it accepts rewards. Figure 7 depicts the path of the robot in a subsequent simulation environment where the weights of the NN were stored at the end of each experiment and used in the following.

178

Figure 6. Go to goal, initial path

Figure 7. Go to goal, final path

179 4.3. Follow P a t h In the case of execution of this mission, the path is defined a priori as a set of points (x~,y~), the external reinforcement signal r(t) take a negative value -1 in the following situations: 9 When the robot is too close to an obstacle. 9 When the frontal sensors do not detect obstacles and the current heading of the robot is too far from the direction of the path's next point. In all other cases the reinforcement signal is 0.

Figure 8. Follow a path, first environment

We can see in Figures 8 and 9, the first and the last environment used for simulating the execution of this mission. The control cycle is the same described above for the missions M1 and /1//2. In the follow path mission, the primitive behaviors used are c2 to c6 5. E x p e r i m e n t s w i t h t h e R o b u t e r The final step is to execute the control software on the actual robot, Robuter, which is a mobile platform with two differential wheels, and equipped with a belt of 24 ultrasonic sensors as shown in Figure 10. Perhaps the main advantage of our software development environment is that the control software developed under the simulator is virtually the same as that used to control the actual vehicle.

180

Figure 9. Follow a path, second environment

Figure 10. The Robuter vehicle

181 In most cases the only changes made to the control software in the simulator before it controls the actual robot are of two categories: lowering the speed of the robot, and adjusting some parameters (e.g., the distance away from contours when following it). Other changes involved additional sensor processing (e.g., a filter) to deal with highly noisy sensors, sensor failures, multiple reflections, and no-reflections. The filter used with experiments with the actual robot works as follows: if a range measurement is below 30 cm (the lower limitation of the ultrasonic sensor) it is discarded and replaced with the mean of two valid measurements at each side of the sensor in question. This filter has proven practical to solve many problems dealing with bad range measurements. The physical laboratory environment used correspond to a small room of 6.5 m x 4.5 m where we usually put boxes as obstacles. The size of the room poses a number of practical problems as it does not leave enough space to maneuver. Thus the room is an excellent test bed for maneuvering in tight spots. By contrast, we did not have similar maneuvering problems in the simulation environment. As noted, by reducing the velocity of the robot and adjusting some parameters of some primitive behaviors we were able to run all our simulation control algorithms with the real vehicle in the small room.

Figure 11. Motion of the Robuter executing the Surveillance mission

Figures 11 and 12 depicts the movement of the Robuter in the room where we have placed four boxes at different locations and orientations. Figure 11 shows the execution of the mission M1 i.e. surveillance, and Figure 12 depicts the movement of the robot execution mission M2, in this case the goal point is located in the upper right corner of the lab.

182

Figure 12. Path of the vehicle with the Go to goal mission

As noted above, there are only two differences in the control software running with the robot as that running with the simulator: the actual robot speed is smaller and the number of back steps after a collision is reduced from 30 to 10.

6. D i s c u s s i o n

A major problem with the use of reactive behavioral control is the correct fusion of the primitive behaviors in order to execute complex tasks. In order to solve this problem, it is clear that the autonomous learning is an interesting paradigm, because it liberates us from explicitly programming or training the system via examples. The implementation of the autonomous learning paradigm presented here, demonstrates the feasibility of this approach by the execution of some missions for an autonomous mobile robot, the observation of the overall behavior of the robot confirms the idea of that the self-learning methodology trough the use of the behavioral control is possible. Some drawbacks are present, for example the incomplete coverage of situations due to the limitations in the number of inputs for the classifier system. The characteristics of the environment are very important in the learning process of the AHC, the choice of the best set of situations is a non trivial problem, because we must take care of not training the network with contradictory situations. In other words the reinforcement signal must always be coherent, and the design of a correct external reinforcement signal is perhaps the most important step in this approach.

183 7. Conclusions and F u t u r e W o r k s The successful completion of three missions through the application of autonomous learning has been presented in this paper. The use of a special neural network topology AHC as a fusion supervisor of primitive behaviors within a control architecture AFREB has been demonstrated. We have built a system that learns by itself how to execute the missions, surveillance, go to goal and follow a path, the simulated and real experiments demonstrate that the autonomous learning through reinforcement is a viable technology for motion control of non- holonomic robots operating in complex environments. In future publications we will present the implementation of others task with this methodology for example the execution of a complex missions composed of several sub-missions. We are also attempting to solve the dimculties which arise due to the poor resolution of the inputs with the use of a Kohonen Neural Network.

Acknowledgments We wish to thank Prof. J.R. Pimentel for his efforts in the Panorama project. The first author wishes to thank the Direccidn General de Investigacidn Cientifica y T~cnica (DGICYT) of the Spanish Ministry of Education and Science for research funding.This research has been founded by the Commission of the European Communities ( Project ESPRIT 2483 PANORAMA ) and the Comisidn Interministerial de Ciencia y Tecnolog{a CICYT (Projects ROB90-159 and ROB91-64). REFERENCES

1. W. Fritz, R.G. Martinez, J. Banque, A. Rama, R.E. Adobbati, and M. Sarno, The Autonomous Intelligent System, Robotics and Autonomous Systems, Vol. 5, pp. 109125, 1989. 2. P.F.M.J. Verschure, B.J.A. Krose, and R. Pfeifer, Distributed Adaptive Control: The Self Organization of Adaptive Behavior, Robotics and Autonomous Systems, Vol. 9, pp. 181-196, 1992. 3. A. Famili, Integrating Learning and Decision-Making in Intelligent Manufacturing Systems, Intelligent and Robotics Systems, Vol.3, pp. 117-130, 1990. 4. W. Fritz, World View and Learning Systems, Robotics and Autonomous Systems, Vol. 10, pp. 1-7, 1992. 5. D. Gachet, M.A. Salichs, J.R. Pimentel, L. Moreno, A. de la Escalera, A Software Architecture for Behavioral Control Strategies of Autonomous Systems, Proc. IECON'92, pp. 1002-1007, San Diego CA, Nov. 1992. 6. E.A Puente, D. Gachet, J.R. Pimentel, L. Moreno, and M.A. Salichs, A Neural Network Supervisor for Behavioral Primitives of Autonomous Systems, Proc. IECON'92, San Diego, CA, Nov. 1992. 7. D. Gachet, J.R. Pimentel, L. Moreno, M.A. Salichs, and V. Fernandez, Neural Network Control Approaches for Behavioral Control of Autonomous Systems, 1st IFAC Int. Workshop on Intelligent Autonomous Vehicles, Southampton, UK, pp. 330-334, April 1993. 8. G.Pang, A Framework for Intelligent Control, Intelligent and Robotics Systems, Vol. 4, pp. 109-127, 1991.

184

10. 11. 12.

13.

14.

W. Davis, A. Jones, and A. Saleh, Generic Architecture for Intelligent Control Systems, Computer-Integrated Manufacturing Systems, Vol. 5, No. 2, pp. 105-113, 1992. R. A. Brooks, A Robust Layered Control System for a Mobile Robot, IEEE Journal on Robotics and Automation, RA-2, pp 14-24, Apr. 1986. R.C. Arkin, Motor Schema-Based Mobile Robot Navigation, The International Journal of Robotics Research, Vol. 8, No. 4, pp. 92-112, Aug. 1989. M.A. Salichs, E.A. Puente, L. Moreno and J.R. Pimente, A Software Development Environment for Autonomous Mobile Robots, Chapter 8 of Recent Trends in Mobile Robots. Ed. Y.F. Zheng. World Scientific. 1993. J.R. Pimentel, E.A. Puente, D. Gachet, and J.M Pelaez, OPMOR: Optimization of Motion Control Algorithms for Mobile Robots, Proc. IECON'92, pp. 853-861, San Diego CA, Nov. 1992. J. del R. Millan, and C. Torras, Learning to Avoid Obstacles Through Reinforcement: Noise-tolerance, Generalization, and Dynamic Capabilities, Proc. IROS'92, Raleigh, N.C., pp. 1801-1807, Jul. 1992.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

185

Efficient reinforcement learning of navigation strategies in an autonomous robot Jos~ del R. Millgm a and Carme Torras b a Institute for Systems Engineering and Informatics, European Commission, Joint Research Centre, TP 361, 21020 Ispra (VA), Italy bInstitut de Cibern~tica (CSIC-UPC), Diagonal, 647, 08028 Barcelona, Spain

This paper describes a reinforcement connectionist learning architecture that allows an autonomous mobile robot to adapt to an unknown indoor environment. As a result, the robot learns efficient reactive navigation strategies. The robot learns on top of built-in reflexes. Also, the neural controller is a modular network which is built automatically. In this way, the robot is operational from the very start and improves its performance rapidly and incrementally as it safely explores the environment. These appealing features make this learning robot's architecture very well suited to real-world applications. The paper also reports experimental results obtained with a real mobile robot that demonstrate the feasibility of this approach.

1.

INTRODUCTION

Efficient navigation is critical for autonomous robots operating in hostile environments, which are usually unknown the first time robots face them. This paper deals with the problem of controlling an autonomous mobile robot so that it reaches efficiently a goal location in an unknown indoor environment. Instances of the problem where the goal is not inside the perception range of the robot all the time are also considered. An usual approach to this problem is that of reactive systems (e.g., [1]). However, basic reactive systems stiffer from two shortcomings. First, they are difficult to program. Second and most important, pure reactive controllers may generate inefficient trajectories since they select the next action as a function of the current sensor readings and the robot's perception is limited. To address this second shortcoming, some approaches combine planning and reaction (e.g., [2, 3]). In contrast to classical planning that acts on a perfect (or sufficiently good) model of the environment, these approaches only require a coarse global map of the environment made out of landmarks. Then, planning takes place at an abstract level and all the low level details are handled by the reactive component as the robot actually moves. In the case of robots operating in initially unknown environments, this

186

global map can be built from sensory data gathered either while travelling to the goal (e.g., [21) or in a previous exploration phase (e.g., [3]). Global maps are a valuable aid for navigation which must be used when available. When not, we claim that the addition of learning capabilities to reactive systems is sufficient to allow a robot to generate efficient trajectories after a limited experience. This paper presents experimental results that support this claim: a real autonomous mobile robot equipped with low-resolution sensors learns efficient goal-oriented obstacleavoidance reactive strategies in a few trials. Moreover, a learning approach like ours even overcomes the first shortcoming of reactive systems. As some researchers have recently shown, the robot programming cost is considerably reduced by letting the robot learn automatically the appropriate navigation strategies (e.g., [4, 5, 6, 7, 8]). This paper describes the testing of a reinforcement connectionist architecture on a real autonomous mobile robot. A reinforcement-learning robot learns by doing and does not require a teacher who proposes correct actions for all possible situations the robot may find itself in. Instead, the robot simply tries different actions for every situation it encounters and selects the most useful ones as measured by a reinforcement or performance feedback signal. Reinforcement connectionist learning [9, 10, 11] brings four benefits to autonomous robots. First, this kind of learning robot can improve its performance continuously and can adapt itself to new environments. Second, the connectionist network does not need to represent explicitly all possible situation-action rules as it shows good generalization capabilities. Third, connectionist networks have been shown to deal well with noisy input data, a capability which is essential for any robot working upon information close to the raw sensory data. Fourth, connectionist learning rules are well suited to on-line and real-time learning. In addition to these benefits, the architecture described in this paper also overcomes three critical limitations of basic reinforcement connectionist learning that prevent its application to autonomous robots operating in the real world. The first and most important limitation is that reinforcement learning might require an extremely long time. The main reason is that it is hard to determine rapidly promising parts of the action space where to search for suitable reactions. The second limitation has to do with the robot's behavior dttring learning. Practical learning robots should be operational at any moment and, most critically, they should avoid catastrophic failures such as collisions. Finally, the third limitation concerns the inability of "monolithic" connectionist networks ~i.e., networks where knowledge is distributed over all the weights-- to support incremental learning. In this kind of standard networks, learning a new rule (or tuning an existing one) could degrade the knowledge already acquired for other situations. An important assumption of our approach is that the robot receives a reinforcement signal after performing every action. Although this assumption is hard to satisfy in many reinforcement learning tasks, it is not in the case of goal-directed tasks since the robot can easily evaluate its progress towards the goal at any moment. A second assumption of our approach is that the goal location is known. In particular, the goal location is specified in relative cartesian coordinates with respect to the starting location.

187

Figure 1. The Nomad 200 mobile robot.

2.

EXPERIMENTAL

SETUP

The system composed by the robot and the reinforcement connectionist controller has been called T E S E O . The physical robot is a wheeled cylindrical mobile platform of the Nomad 200 family (see Figure 1). It has three independent motors. The first motor moves the three wheels of the robot together. The second one steers the wheels together. The third motor rotates the turret of the robot. The robot has 16 infrared sensors and 16 sonar sensors, from which distances to the nearest obstacles can be estimated, and 20 tactile sensors detect collisions. The infrared and sonar sensors are evenly placed around the perimeter of the turret. Finally, the robot has a dead-reckoning system that keeps track of the robot's position and orientation. The connectionist controller maps the cLtrrently perceived situation into a spatially continuous action. Then, the controller waits until the robot has finished to perform the corresponding motor command before computing the associated reinforcement signal and the next action. We will describe next what the input, output and reinforcement signals are. The input to the connectionist network consists of a vector of 40 components, all of them real numbers in the interval [0, 1]. The first 32 components correspond to the infrared and sonar sensor readings. In this case, a value close to zero means that the corresponding sensor is detecting a very near obstacle. The remaining 8 components are

188 derived from a virtual sensor that provides the distance between the current and goal robot locations. This sensor is based on the dead-reckoning system. The 8 components correspond to a coarse codification of an inverse exponential function of the virtual sensor reading. The main reason for using this codification scheme is that, since it achieves a sort of interpolation, it offers three theoretical advantages, namely a greater robustness, a greater generalization ability and faster learning. The output of the connectionist network consists of a single component that controls directly the steering motor and indirectly the translation and rotation motors. This component is a real number in the interval [-180,180] and determines the direction of travel with respect to the vector connecting the current and goal robot locations. Once the robot has steered the commanded degrees, it translates a fixed distance (25 cm.) and, at the same time, it rotates its turret in order to maintain the front infrared and sonar sensors oriented toward the goal. In this way, a relative codification of both sensor readings and motor commands with respect to the goal direction is always maintained. This codification scheme is directly responsible for TESEO's generalization capabilities. The reinforcement signal is a real number in the interval [-3, 0] which measures the cost of doing a particular action in a given situation. The cost of an action is directly derived from the task definition, which is to reach the goal along trajectories that are sufficiently short and, at the same time, have a wide clearance to the obstacles. Thus actions incur a cost which depends on both the step clearance and the step length. Concerning the step clearance, the robot is constantly updating its sensor readings while moving. Thus the step clearance is the shortest distance measured by any of the sensors while performing the action. Finally, TESEO is equipped with a low-level asynchronous emergency routine to prevent collisions. The robot stops and retracts whenever it detects an obstacle in front of it which is closer than a safety distance.

3.

EXTENSIONS

TO BASIC REINFORCEMENT

LEARNING

TESEO's aim is to learn to perform those actions that optimize the total reinforcement received along the trajectory to the goal. As mentioned in the Introduction, TESEO has been designed to overcome 3 limitations of basic reinforcement learning: lack of incremental improvement, slow convergence, and failure to be operational from the very beginning. The following three aspects of TESEO's architecture address these limitations. First, the connectionist controller is a modular network, each module codifying a set of similar reaction rules. That is, these rules map similar sensory inputs into similar actions and, in addition, they have similar long-term consequences. Modularity guarantees that improvements in the response to a given situation will not negatively alter other unrelated reactions. Second, instead of learning from scratch, TESEO utilizes a fixed set of basic reflexes every time its connectionist controller fails to generalize correctly its previous experience to the current situation. The connectionist controller associates the selected reflex with the perceived situation in one step. This new reaction rule is tuned subsequently through reinforcement learning. Basic reflexes correspond to previous elemental knowledge about

189

Figure 2. Network architecture.

the task and are codified as simple reactive behaviors [1]. Each reflex selects one of the 16 directions corresponding to the current orientations of the infrared and sonar sensors. We have chosen this fixed set of directions because they are the most informative for the robot in terms of obstacle detection. It is worth noting that, except in simple cases, these reflexes alone do not generate efficient trajectories; they just provide acceptable starting points for the reinforcement learning algorithm to search appropriate actions. Integrating learning and reaction in this way allows T E S E O t o focus on promising parts of the action space immediately and to be operational from the very beginning. Third, T E S E O explores the action space by concentrating the search around the best actions currently known. The width of the search is determined by a counter-based scheme associated to the modules (see Section 4.2). This exploration technique allows T E S E O to avoid experiencing irrelevant actions and to minimize the risk of collisions.

4.

NETWORK

ARCHITECTURE

The connectionist controller is a modular two-layer network (see Figure 2). The first layer consists of units with localized receptive fields, which we call exemplars, because each of them represents a point of the input space and covers a limited area around this point. The second layer is made of one single stochastic linear unit, the output unit. There exists a full connectivity between the exemplars and the output unit. 4.1.

Exemplars The activation level of an exemplar is a real value in the interval [0, 1], it being 0 if

190 the perceived situation is outside its receptive field, and 1 if the situation corresponds to the point where the exemplar is centered. The j t h module consists of the exemplars c~,..., e~ and their related links. All modules respond to each perceived situation, but only the module owning the exemplar with the maximum response propagates the activities of its exemplars to the output unit. If no exemplar "matches" the perceived situation --i.e., if the input does not fall in the receptive field of any exemplar--, then the basic reflexes are triggered and the current situation becomes a new exemplar. Section 5.1 provides more details on this resource-allocating procedure. The modules are not predefined, but are created dynamically as T E S E O explores its environment. Every module j keeps track of four adaptive values. First, the width of the receptive fields of all the exemplars in this module, dj. Second, the expected total future reinforcement, bj, that the robot will receive if it uses this module for computing the next action. Third, a counter that records how many times this module has been used without improving the robot's performance, Q. Fourth, the prototypical action the robot should normally take whenever the perceived situation is classified into this module, paj. There are as many prototypical actions as reflexes. Thus the first one is the direction of the front infrared and sonar sensors --i.e., a deviation of 0 degrees from the vector connecting the cm'rent and goal robot locations--, the second one is 22.5 degrees, and so on. Sections 5.1 and 6.2 explain how paj is initially determined and how it evolves, respectively. After reacting, the cvaluator computes the reinforcement signal, z, as specified in Section 2. Then, if the action was computed through the module j, the difference between z and bj is used for learning (see Section 5.4). Only the weights of the links associated to the winning module j are modified. 9

4.2.

Output Unit The output of the connectionist controller is a prototypical action pa, normally the prototypical action of the winning module, plus a certain variation s that depends on the location of the perceived situation in the input subspace dominated by that module (see Figure 3). In order to find a suitable action for each situation through reinforcement learning, T E S E O needs to explore the action space. However, this exploration is not conducted upon the whole action space, but only around the best actions currently known. Since each action is the sum of two components, pa and s, the exploration mechanism works on each of them separately. This exploration mechanism depends on cj, the counter associated to the winning module. On the one hand, the exploration mechanism selects pa from the prototypical actions associated to all the modules that classify the perceived situation. That is, if the situation is located in the receptive field of one of the exemplars of the module m, then pare is a candidate. Assuming j to be the winning module, the selection goes as follows. If cj is not divisible by, say, 3 then paj is chosen. Otherwise, pa is taken to be the prototypical action associated to the module m with the best expected total future reinforcement, bin. The basic idea behind this exploration mechanism is that the winning module could well benefit from the knowledge of neighboring modules. On the other hand, the deviation s from pa is computed through a stochastic process in the interval [-45 ~ 45~ Thus, TESEO will only explore actions between pa and its four

191

Figure 3. The output unit.

neighboring prototypical actions (two to the left and two to the right). The computation of s is done in three steps. The first step is to determine the value of the stochastic process' parameters. The mean # is a weighted sum of the activation levels of the exemplars e l , . . . , e{ of the winning module: n

#=

(1)

k k, k=l

where w~ is the weight associated to the link between e~ and the output unit, and a~ is the activation level of e~. The variance a is proportional to cj. This follows from the idea that the most often the module j is used without improving T E S E O ' s performance, the higher a must be. In the second step, the unit calculates its activation level 1 which is a normally distributed random variable: 1 = N(p, a).

(2)

In the third step, the unit computes s: s=

{

5.

FOUR LEARNING

5.1.

45~ - 5, l,

if I > 45, if 1 < - 4 5 , otherwise.

(3)

MECHANISMS

Network Growth The first learning mechanism makes the controller network grow as a function of the inputs received. Initially, there exist neither exemplars nor, consequently, modules and the resource-allocating procedure creates them as they are needed.

192 As mentioned in Section 4.1, if no exemplar "matches" the perceived situation, then the basic reflexes are triggered and the current situation becomes a new exemplar. That is, both represent the same point of the input space. The weight of the link from this exemplar to the output unit is initially set to zero and evolves subsequently through reinforcement learning. The new exemplar is added to one of the existing modules if its receptive field overlaps the receptive fields of the module's exemplars and the selected reflex is the same as the module's prototypical action. The first condition assures that every module will cover a connected input subspace. If any of the two conditions above is not satisfied, then the new exemplar is added to a new module. This module consists initially of the exemplar and its associated connections. Concerning the four parameters associated to this new module f, they are initially set to the following values: d I equals 0.4, c I equals 0, pa I is the selected reflex, and bI is estimated on the basis of the distance from the next location to the goal and the distance from the next location to the perceived obstacles in between the robot and the goal. T u n i n g Exemplars The second learning mechanism moves the position of the exemplars e ~ , . . . , e~ of the winning module j in order to better cover the input subspace dominated by that module. That is, the coordinates of the k th exemplar, v~, are updated proportionally to how well they matches the input coordinates x: 5.2.

9

v~(t +

I) =

v~(t) + e * a~(t) 9 [x(t) - v~(t)],

(4)

where e is the learning rate. In the experiments reported below, the value of e is 0.1.

5.3.

Improving Reinforcement Estimates

The third learning mechanism is related to the update of the future reinforcement estimates bj, and it is based on temporal difference (TO) methods [12]. Every value bj is an estimate of the total future reinforcement TESEO will obtain if it performs the best currently known actions that take it from its current location (whose associated observed situation is classified into the jth module) to the goal. Consequently, the value bj of the module j should, after learning, be equal to the sum of the cost z of reaching the best next module i plus the value bil:

bj =

max (z) + bi.

Actions

(5)

In order to iteratively update the values of bj, so that finally (5) holds for all of them, we have used the simplest TD method, i.e. TD(0) (see [12] for details). If the situation perceived at time t is classified by module j and, after performing the computed action, the next situation belongs to module i and the reinforcement signal - - o r cost-- is z(t + 1), then:

bj(t + 1) = bj(t) + U * [z(t + 1) + bi(t) - bj(t)].

(6)

1 As described in Section 2, z takes negative values. So, minimizing future cost corresponds to maximizing future reinforcement.

193 The intensity of the modification is controlled by r], which takes the value 0.75 when TESEO behaves better than expected, and 0.075 otherwise. The rationale for modifying less intensively bj when z(t + 1) + b~(t) - bj(t) < 0 is that the error in the estimation is probably due to the selection of an action different from the best currently known one for the module j.

5.4.

Weight U p d a t e

The fourth learning mechanism concerns weight modification and uses the classical associative search (AS) [9, 11]. AS uses the estimation given by TD to update the situation-action mapping, which is codified into the connectionist controller:

w~(t + 1) = wJk(t) + a

*

[z(t + 1) + bi(t)

-

bj(t)]

*

r

(7)

where a is the learning rate, and r is the eligibility factor. The eligibility factor of a given weight measures how influential that weight was in choosing the action. In our experiments, r is computed in such a manner that the learning rule corresponds to a gradient ascent mechanism on the expected reinforcement

[11]: Oln____N , l(t) r (t) = Owjk (t) = a~ (t)

(8)

where N is the normal distribution function in (2). The weights w~ are modified more intensively in case of reward--i.e., when T E S E O behaves better than expected-- than in case of penalty. These two values of a are 0.2 and 0.02, respectively. The aim here is that T E S E O maintains the best situation-action rules known so far, while exploring other reaction rules.

6.

FOUR LEARNING OPPORTUNITIES

Let us present now the four occasions in which learning takes place. The first arises during the classification phase, the next two happen after reacting, and the last one takes place when reaching the goal.

6.1.

Unexperienced Situation

If the perceived situation is not classified into one of the existent modules, then the basic reflexes get control of the robot, and the resource-allocating procedure creates a new exemplar which is added either to one of the existing modules or to a new module.

6.2.

Performing within Expectations

If the perceived situation is classified into the module j and z(t+ 1)+b~(t)-bj(t) >_ kz, where kz is a negative constant, then (i) the exemplars of that module are tuned to make them closer to the situation, (ii) the weights associated to the connections between the exemplars and the output unit are modified using the AS reinforcement learning rule, (iii) bj is updated through TD(0), and (iv) dj, cj, and paj are adapted.

194 The adaptive parameters are updated differently in case of reward than in case of penalty. In case of reward, dj is increased by 0.1, cj is initialized to 0, and if the output of the action unit, pa + s, is closer to a prototypical action other than paj, then paj becomes this new prototypical action. In case of penalty, cj is increased by 1 and dj is decreased by 0.1 if it is still greater than a threshold kd.

6.3.

Performing rather Badly If the perceived situation is classified into the module j and z(t+ 1)+b~(t)-bj(t) < kz, then the topology of the network is slightly altered and dj is decreased by 0.1 if it is still greater than a threshold kd. If the total future reinforcement computed after reacting, z + bi, is considerably worse than the expected one, bj, this means that the situation was incorrectly classified and needs to be classified into a different module. The resource-allocating procedure creates a new exemplar, e~, that has the same coordinates as the perceived situation, but it does not add it to any module. The next time this situation will be faced, e~ will be the closest exemplar. Consequently, no module will classify the situation and the basic reflexes will get control of the robot. Then, the resource-allocating procedure will add e~ either to one of the existing modules or to a new module as described in Section 5.1. 6.4.

Reaching the Goal Finally, whenever the goal is reached, the value bj of every winning module j along the path to the goal is also updated in reverse chronological order. Thus, TESEO needs to store the pairs (j(t),z(t + 1)) along the current path. This supplementary update only accelerates the convergence of the bj's, but does not change their steady values [13].

7.

E X P E R I M E N T A L RESULTS

TESEO's performance has been tested on a corridor with offices at both sides. The task is to generate a short but safe trajectory from inside an office to a point at the end of the corridor. TESEO achieves the target location every time and it never gets lost or trapped into malicious local maxima. The first time it tries to reach the goal it relies almost all the time on the basic reflexes. Essentially, the basic reflexes make TESEO travel in the direction that (i) is the closest to the current one, (ii) is the safest, and (iii) brings TESEO toward the goal. As illustrated in Figure 4, in the first trial, TESEO enters into a dead-end section of the office (but it does not get trapped into it) and even it collides against the door frame because its sensors were not able to detect it. Collisions happened because the frame of the door is relatively thin and the incident angles of the rays drawn from the sensors were too large resulting in specular reflections. Thus this simple task offers TESEO the opportunity to learn three skills. The first one is to smooth out certain sections of the trajectory generated by the basic reflexes. The second skill is to be able to avoid dead-ends or, in general, not follow wrong walls. The third and most critical is to avoid obstacles that its sensors cannot detect. TESEO learns these three skills very rapidly. It reaches the goal efficiently and without colliding after travelling 10 times from the starting location to the desired goal

195

Figure 4. The environment and first trajectory generated for a starting location within the office. Note that T E S E O has some problems in going through the doorway.

(Figure 5). The total length of the first trajectory is approximately 13 meters while the length of the trajectory generated after TESEO has learned the suitable sequence of reactions is about 10 meters. This experiment was run several times, obtaining similar results. Concerning the acquisition of the third skill, TESEO associates safe actions to malicious situations --made out of a coarse codification of the distance to the goal and of uncorrect estimated distances to the obstacles-- immediately after colliding with unperceived obstacles. Moreover, due to its noise tolerance and generalization capabilities (see below), TESEO only hits a few times with obstacles that its sensors cannot detect before learning to avoid them. In addition, the learned navigation strategies have the following featL~es. First, the trajectories are quite smooth even if the basic reflexes have not been programmed in this way (it is rather difficult to do it!!) and the reinforcement signal only penalizes long and unsafe paths. Second, the acquired reactions are robust to noisy sensory data. Since sensors are not perfect, TESEO does not perceive the same situations along similar trajectories. However, it is still able to generate suitable actions using only its acquired reactions. Figure 6 illustrates instances of the reaction rules learned. For every location considered (little circles) the move to be taken is depicted. Figure 6 shows that TESEO generates solution paths from any starting location inside the room. This simulation experiment indicates that TESEO exhibits good generalization abilities, since it can handle many more situations than those previously perceived. Once TESEO has learned efficient navigation strategies, occasional obstacles are put

196

Figure 5. Trajectory generated after travelling 10 times to the goal.

Figure 6. Generalization abilities: Situation-action rules applied for a sample of locations within the office and the first part of the corridor.

197 in the way to the goal. TESEO moves around the obstacles and then it returns to the original, efficient trajectory. In other experiments, the goal is changed. T E S E O learns also to navigate to this new goal in a few trials and it is still able to reach the first goal as efficiently as before.

8.

RELATED

WORK

The robot learning architectme described in this paper shows some resemblances with several previous works. In the case of goal-directed tasks, it is possible to use the future reinforcement associated with situation-action pairs to define a vector field over the environment. Then, the robot will normally perform the action that follows the gradient of the vector field. Seen in this way, our approach is reminiscent of potential field approaches (e.g., [14, 15]). However, our approach differs from them in three fundamental aspects. First, any potential field algorithm relies on predetermining a potential function that will allow the robot to reach the goal efficiently. Contrarily, a reinforcement approach like ours is adaptive in that it starts with an initial and inefficient vector field generated from rough estimates of the future reinforcements and adapts the field to generate efficient trajectories as it converges to the correct estimates through TD methods. Second, the kind of vector fields that can arise in our case are much more varied than those defined by a weighted sum of the potential fields originated by the different obstacles. Third, most of the potential field approaches require a global model of the environment while our approach only needs local information obtained from the robot's sensors. There exist also potential field approaches which only need local workspace models built out of sensory information (e.g., [14]), but they cannot produce efficient trajectories. The benefits of letting the robot learn automatically the appropriate reaction rules have recently been emphasized by several authors. [4] combines reinforcement connectionist learning and teaching to reduce the learning time. In this framework, a human teacher shows the robot several instances of reactive sequences that achieve the task. Then, the robot learns new reaction rules from these examples. The taught reaction rules help reinforcement learning by biasing the search for suitable actions toward promising parts of the action space. In our approach, the basic reflexes play the same guidance role, requiring only a programmer instead of a teacher. [5] use Kohonen maps [16] to split the sensory input space into clusters, and then associate an appropriate action to every cluster through reinforcement learning. Their architecture maps all the situations of a given cluster to a single action, and classifies situations solely by the similarity of their representations. Ore" architecture classifies situations, first, based on the similarity of their input representations. But, then, it also incorporates taskspecific information for classifying based on the similarity of reinforcements received. In this manner, the input space is split into consistent clusters since similar future reinforcement corresponds to similar suitable actions for similar situations. [7] integrate inductive neural network learning and explanation-based learning. The domain theory is previously learned by a set of neural networks, one network for each discrete action the robot can perform. Our basic reflexes also represent prior knowledge about the task; however, they are much more elemental and are used in a different way. Our approach is also related to the Dyna integrated architectures introduced by

198 [17] and further extended by [4, 18, 19], among others. Roughly, Dyna uses planning to speed up the acquisition of reaction rules through reinforcement learning. Dyna also learns a global model of the task on which it plans.

9.

SUMMARY

AND FUTURE

WORK

We have described a reinforcement learning architecture that makes an autonomous mobile robot rapidly learn efficient navigation strategies in an unknown indoor environment. Our robot T E S E O not only is operational from the very beginning and improves its performance with experience, but also learns to avoid collisions even when its sensors cannot detect the obstacles. This is a definite advantage over non-learning reactive robots. T E S E O also exhibits incremental learning, high tolerance to noisy sensory data, and good generalization abilities. All these features make our robot learning architecture very well suited to real-world applications. However, the current implementation of our approach suffers from one main limitation, namely it requires a reliable odometry system that keeps track of the robot's relative position with respect to the goal. Currently, odomtery is totally based on deadreckoning. In all the experiments we have carried out so far, dead-reckoning has proven sufficient to reach the goal. As long as its estimation of the position of the robot does not differ greatly from the actual one, the connectionist controller is still able to produce correct actions. But, dead-reckoning will probably be insufficient in more complicated missions requiring long travels and many turns. Current work focusses on reliable odometry. The goal is designated by a modulated light beacon and the robot is equipped with sensors especially designed to detect that light. Whenever T E S E O detects the goal it uses the beaconing system; otherwise, it relies on dead-reckoning.

ACKNOWLEDGMENTS We thank Aristide Varfis for helpful discussions. JRM is also grateful to people of Nomadic Technologies for their excellent support with the physical robot. The work reported in this paper was conducted in the Institute for Systems Engineering and Informatics, Joint Research Centre of the European Commission. This research has been partially supported by the ESPRIT Basic Research Action number 7274.

REFERENCES

[1] R.A. Brooks, "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation, vol. 2, pp. 14-23, 1986. [2] D.P. Miller and M.G. Slack, "Global symbolic maps from local navigation," Proc. of the 9th National Conf. on Artificial Intelligence, pp. 750-755, 1991. [3] J.H. Connell, "SSS: A hybrid architecture applied to robot navigation," IEEE Int. Conf. on Robotics and Automation, pp. 2719-2724, 1992.

199

[4] [5] [6]

[7]

[s] [9]

[10] Ill] [12]

[13] [14] [15]

[16] [17] [is] [19]

L.-J. Lin, "Programming robots using reinforcement learning and teaching," Proc. of the 9th National Conf. on Artificial Intelligence, pp. 781-786, 1991. K. Berns, R. Dillmann, and U. Zachmann, "Reinforcement-learning for the control of an autonomous mobile robot," IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 1808-1815, 1992. J. del R. Millhn and C. Torras, "A reinforcement connectionist approach to robot path finding in non-maze-like environments," Machine Learning, vol. 8, pp. 363395, 1992. T.M. Mitchell and S.B. Thrun, "Explanation-based neural networks learning for robot control," in C.L. Giles, S.J. Hanson, and J.D. Cowan (eds.), Advances in Neural Information Processing Systems 5, pp. 287-294. San Mateo, CA: Morgan Kaufmann, 1993. J. del R. Mills "Reinforcement learning of goal-directed obstacle-avoidance reaction strategies in an autonomous mobile robot," Robotics and Autonomous Systems, in press. A.G. Barto, R.S. Sutton, and C.W. Anderson, "Neuronlike elements that can solve difficult learning control problems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 13, pp. 835-846, 1983. A.G. Barto, R.S. Sutton, and C.J. Watkins, "Learning and sequential decision making," Tech. Report 89-95, Dept. of Computer and Information Science, University of Massachusetts, Amherst, 1989. R.J. Williams, "Simple statistical gI"adient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, pp. 229-256, 1992. R.S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988. S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol. 55, pp. 311-365, 1992. J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Trans. on Systems, Man, and Cybernetics, vol. 19, pp. 1179-1187, 1989. J. Barraquand and J.C. Latombe, "Robot motion planning: A distributed representation approach," The International Journal of Robotics Research, vol. 10, pp. 628-649, 1991. T. Kohonen, Self-Organization and Associative Memory. Second Edition. Berlin: Springer-Verlag, 1988. R.S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," Proc. 7th Int. Conf. on Machine Learning, pp. 216-224, 1990. L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching", Machine Learning, vol. 8, pp. 293-321, 1992. J. Peng and R.J. Williams, "Efficient learning and planning within the Dyna framework," Adaptive Behavior, vol. 1, pp. 437-454, 1993.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

201

A Lifelong Learning Perspective for Mobile Robot Control Sebastian Thrun Universit~t Bonn Institut ffir Informatik III R5merstr. 164, 53117 Bonn, Germany E-mail: [email protected], [email protected] Designing robots that learn by themselves to perform complex real-world tasks is a still-open challenge for the field of Robotics and Artificial Intelligence. In this paper we present the robot learning problem as a lifelong problem, in which a robot faces a collection of tasks over its entire lifetime. Such a scenario provides the opportunity to gather general-purpose knowledge that transfers across tasks. We illustrate a particular learning mechanism, explanation-based neural network learning, that transfers knowledge between related tasks via neural network action models. The learning approach is illustrated using a mobile robot, equipped with visual, ultrasonic and laser sensors. In less than 10 minutes operation time, the robot is able to learn to navigate to a marked target object in a natural office environment. 1. I N T R O D U C T I O N Throughout the last decades, the field of robotics has produced a large variety of approaches for the control of complex physical manipulators. Despite significant progress in virtually all aspects of robotics science, most of today's robots are specialized to perform a narrow set of tasks in a very particular kind of environment. Most robots employ specialized controllers that have been carefully designed by hand, using extensive knowledge of the robot, its environment and its task. If one is interested in building autonomous multi-purpose robots, such approaches face some serious bottlenecks. 9 K n o w l e d g e b o t t l e n e c k . Designing a robot controller requires prior knowledge about the robot, its environment and the tasks it is to perform. Some of the knowledge is usually easy to obtain (like the kinematic properties of the robot), but other knowledge might be very hard to obtain (like certain aspects of the robot dynamics, or the characteristics of the robot's sensors). Moreover, certain knowledge (like the particular configuration of the environment, or the particular task one wants a robot to do) might not be accessible at all at the design-time of the robot. 9 E n g i n e e r i n g b o t t l e n e c k . Making domain knowledge computer-accessible, i.e., hand-coding explicit models of robot hardware, sensors and environments, has often been found to require tremendous amounts of programming time. As robotic

202 hardware becomes increasingly more complex, and robots are to become more reactive in more complex and less predictable environments, the task of hand-coding a robot controller will become more and more a cost-dominating factor in the design of robots. 9 T r a c t a b i l i t y b o t t l e n e c k . Even if the robot, its environment and its goals can be modeled in sufficient detail, generating control for a general-purpose device has often been found to be of enormous computational complexity (e.g., [18]). Moreover, the computational complexity often increases drastically with the complexity of the mechanical device. Machine learning aims to overcome these limitations, by enabling a robot to collect its knowledge on-the-fly, through real-world experimentation. If a robot is placed in a novel, unknown environment, or faced with a novel task for which no a priori solution is known, a robot that learns can collect new experiences, acquire new skills, and eventually perform new tasks all by itself. For example, in [5] a robot manipulator is described which learns to insert a peg in a hole without prior knowledge regarding the manipulator or the hole. Maes and Brooks [8] successfully applied learning techniques to coordinating leg motion for an insect-like robot. Their approach, too, operates in the absence of a model of the dynamics of the system. Learning techniques have frequently come to bear in situations where the physical world is extremely hard to model by hand (e.g., the characteristics of noisy sensors). For example, Pomerleau describes a computer system that learns to steer a vehicle driving at 55mph on public highways, based on input sensor data from a video camera [15]. Learning techniques have also successfully been applied to speed-up robot control, by observing the statistical regularities of "typical" situations (like typical robot and environment configurations), and compiling more compact controllers for frequently encountered situations. For example, Mitchell [11] describes an approach in which a mobile robot becomes increasingly reactive, by using observations to compile fast rules out of a database of domain knowledge. However, there is a fundamental shortcoming in most of to date's rigorous learning approaches. Most of the robot control learning approaches focus on learning to achieve single, isolated performance tasks. If one is interested in learning with a minimum amount of initial knowledge, as is often the case in approaches to robot learning, such approaches have a critical limiting factor: the number of training examples required for successful generalization. The more complex the task at hand and the lesser is known about the problem beforehand, the more training data is necessary to achieve the task. In many robotics domains, the collection of training data is an expensive undertaking due to the slowness of robotics hardware. Hence, it does not surprise that the time required for realworld experimentation has frequently been found to be the limiting factor that prevents rigorous machine learning techniques from being truly successful in robotics. The task of learning from scratch can be significantly simplified by considering robots that face whole collections of control learning problems over their entire lifetime. In such a lifelong robot learning scenario [27], learning tasks are related in that they all play in the same environment, and that they involve the the same robot hardware. Lifelong learning scenarios open the opportunity for the transfer of knowledge across tasks. Complex tasks, which might require huge amounts of training data when faced in isolation, can conceivably

203 be achieved much faster if a robot manages to exploit previously learned knowledge. For example, a lifelong learning robot might acquire general-purpose knowledge about itself and its environment, or acquire generally useful skills that can be applied in the context of multiple tasks. Such functions, once learned, can be applied to speed up learning in new tasks. Hence, a lifelong learning perspective, as proposed in this paper, promises to reduce the number of training examples required for successful learning, and hence to scale machine learning technology to more complex robot scenarios. In this paper we will present one particular candidate approach to the lifelong learning problem: The explanation-based neural network learning algorithm (EBNN) [13,26,27]. EBNN uses a hybrid learning strategy to generalize training data. On the one hand, it allows to learn functions inductively from scratch, just like neural network Backpropagation [17]. On the other hand, EBNN also allows to learn task-independent domain knowledge, which applies to multiple tasks. More specifically, in the robot control learning scenarios studied in this paper, domain knowledge is represented by neural network action models, which model the effect of the robot's actions. These models are independent of the particular control learning task at hand, and they are acquired over the lifetime of the robot. When learning control, they are used to analyze observations, in order to generalize better to unseen situations. Since in this paper we are interested in learning robot control, we will describe EBNN in the context of Q-Learning [28]. Q-Learning is a popular method for learning to select actions from delayed and sparse reward. To illustrate the appropriateness of the EBNN learning mechanism for robotics problems, we will describe experimental results obtained in an indoor mobile robot navigation domain. 2. L E A R N I N G

CONTROL

Q-Learning [28] is a reinforcement learning technique that has frequently been applied to robot control. The goal of Q-Learning is to learn a policy for generating whole action sequences, which maximizes an externally given reward function. In general, the reward may be delayed and/or sparse, adding a significant degree of complexity to the learning problem. For example, in the robot navigation experiments reported here, reward is only received upon reaching a particular goal position, or upon total failure (i.e., loosing a target object out of sight, as described in Sect. 4 ). The goal of learning, thus, is to construct a reactive controller that carries the robot to its goal position in as few steps as possible. Q-Learning works as follows. Let 5' be the set of all possible robot percepts. For simplicity, let us assume that at any (discrete) point in time, the agent can observe the complete state of the world 1, denoted by sES. Based on this observation, the learner then picks an action a (from a set of actions A). As a result, the world state changes. In addition, the learner receives a scalar reward value, denoted by r(s, a) (or rt, if we refer to the expected reward at a certain time t), which measures its current performance. Throughout this paper we will assume that reward will exclusively be received upon reaching a designated goal location, or upon total failure, respectively. 1This restrictive assumption is frequently referred to as the Markov assumption. See [2,7,10,22] for approaches to reinforcement learning in partially observable worlds.

204

Q

(goal/failure)

valuenetworkQ~I N~

valuenetw~

N~

valuenetw~

N~

Figure 1. Episode and Q-networks in Q-Learning.

Formally, in Q-Learning one seeks to find an action policy 7r:s ~a, i.e., a mapping from environment states s to actions a, which, when applied for action selection, maximize the

cumulative discounted future reward oo

R

=

(1) t'-tO

Here 7 (with 0 < 7 5 1 ) is a discount factor that favors rewards reaped sooner in time. Notice that actions may impact reward many time steps later. Hence, in order to learn to pick reward-maximizing actions, one has to solve a temporal credit assignment problem. The key of Q-Learning is to learn a value function for picking actions. A value function, ~ , maps robot percepts sqS and actions aCA to scalar "utility" denoted by Q:S• values. In the ideal case, Q(s, a) is, after learning, the maximum cumulative, discounted reward one can expect upon executing action a in state s (cf. Eq. (1)). Hence, Q ranks actions according to their reward: The larger the expected cumulative reward for picking action a at state s, the larger its value Q(s, a). Q, once learned, allows to generate optimal actions by greedily picking the action which maximizes Q for the current state s: 7r(s)

=

argmax Q(s, ~t) aeA

Initially, all values Q(s, a) are set to zero. During learning values are refined incrementally, using the following recursive update procedure. Suppose the robot just executed a whole action sequence which, starting at some initial state Sl, led to a final reward r(sT, aT). Such an episode is shown in Fig. 1. For all time steps t within this episode, Q(st, at) is then updated through mixture of the values of subsequent observation-action pairs, up to the final value at the end of the episode. The exact update equation, combined with a modified temporal differencing rule [24], is: +100 if at final action, robot reached goal - 1 0 0 if at final action, robot failed

~target(st, at)

(2)

7" [(l-A). max Q(St+l,a) a action l

+)~. ~target(st+l,at+l) ]

otherwise

205 actionmodelMal

actionmodelMa2

(goal/failure)

valuenetworkQal ~

valuenetworkQa2 ~

valuenetworkQa3

Figure 2. E B N N . General-purpose action models are used to derive slope information for training examples.

Here A (0_A ci + c2 + c, move and expand the region represented by NI: Let ~ = c + cl + c2, N.,, = N:B., N.: - N~B: x ( N 1 ) - :(N,)+:. y ( N ~ ) - u(N,)+u, z ( N 2 ) - :(N:)+:, '

2

'

2

'

2

'

y(N2) = ~(N~)+y,, d(N~) - d(Nx) + dl and d(N2) - d(N2) + d2. 11. Else (a) if Cl > c2 expand the region represented by N2 with d(N2)= d(N2)+ d2 and continue with 12. (b) else expand the region represented by N1 with d(N1) = d(N~)+ d~, let/V = Nx and continue with 13.

225 12. Insert a new node representing the current start region Add a node N1 to D.

(b)

Let (z(N1),y(N1)) = (zo,yo) be the center of the cluster represented by the node and d(N1) = dmi, be its width.

(c)

Find the set SNo of nodes of D such that d = y/(ZiN) ' Zg)~+ V/(Y(N) - yg)2 < d(N) for N e Sn,.

(a) (e) (f) (g)

If SNg is empty, let N = N1 and continue with 13. Let N2 6 Sn, be the node with the minimum d. Initiahze the connection between N1 and N2 with the data obtained from B. exit BUILD-DS.

13. With N representing the corresponding start region, insert a new node representing the current goal region: (a) Add a node N2 to D. _

(b) Initialize the connection between N and N2 with the data obtained from B. Summarized, the method works by building a structure representing (possibly overlapping) regions of different size and virtual connections, estabhshed by already driven paths between them, thus improving the selection of good entry/exit points over time. The method features a minimum overhead regarding the apphcation of the geometrical planner, since geometrical planning is during learning only necessary in the case of very close start- and goal positions. It should be noted that the costs of known paths are not stored in the decision structure, because the estimation of edge costs is improved continuously (see below). 4 . 2 . I n c r e m e n t a l i m p r o v e m e n t of edge c o s t e s t i m a t i o n

Given the topological map, a start node, a goal node, and the costs of moving along the individual interconnections (edges) between each of the nodes in the map, an optimal path from the start to the goal can be found by several means. Provided that a lower bound estimation of the individual edge's costs exists, algorithms such as A* produce optimal solutions. However, they also require to keep track of the real costs that emerge and possibly to climb up the search tree if the current branch does not yield the desired solution. After the goal node has been reached, the best path between the start and the goal is known. Transferred to the given apphcation domain, this behaviour would result in the robot moving back and forth along paths. This solution is of course not satisfactory, since the aim must be to obtain a near optimal path before the robot starts moving. Therefore, the lower-bound condition for the edgecost estimation (as, for instance, given by the length of the corresponding passage) is not sufficient. In fact, the edge costs must be known as accurately as possible in order to produce a good path. Moreover, in a dynamic environment the costs of traversing an individual edge can also be changing. Therefore, the edge cost function C(e) must meet the following requirements:

226 9 The a-priori estimation of C must incorporate not only static components, but also information about changes in the environment. 9 After the real costs of traversing an edge are known, the edge's cost function must be updated. These requirements yield the following definition of the cost function C: Ca_priori(e) =

g(l,,p,) + h(hist(e)),

where 9 a monotonically increasing function of the length l, of the edge and the complexity p, of the required behaviour, hist(e) is the accumulated information about the costs to move along the edge e and h is a function mapping this complex information to a numerical value. After the robot has moved along the planned path, the real costs of the individual edges are known: Ca.posteriori(e) =

f(t(e),t,t(e)) + g(l,,pe) + f(o(e),o,t(e)),

where f is a monotonically increasing function of the fraction between the measured time t(e) required to move along an edge and the estimated time t,,t(e), and f is a monotonically increasing function of the fraction between the measured traversing difficulty o(e) and the estimated one o~,t(e). Summarizing, the following terms must therefore be determined: 9 The parameter te,t(e), which is first given by the length of the edge and the nominal speed of the robot, but which will change according to the measured time t(e) required to traverse edge e. t(e) 9 The function f, which is realized a s Ote t,,,(e). t(,) 9 The function f, which is realized as oT,t,o,(, ). -

9 The parameter p,, which can be calculated for each edge from the complexity of the commanded movement, which might, for instance, depend on certain security requirements. 9 The parameter le, which is a static attribute of each edge and can be calculated from the geometrical model of the environment. 9 The function g which combines the length of the edge and the commanded movement to a numerical value. This function is static as are the physical characteristics of the robot and the length of the edges. 9 The information about the edge e over time hist(e) 9 The history analyzer function h. It is obvious that the information that is most difficult to handle concerns the behaviour of a particular edge in the past. While the exploration strategy described in section 4.2.2 accounts for a changing traversability of the edge (i.e., if the edge can be used at all), dealing with continuous valued, possibly periodically changing parameters (such as the density of obstacles) requires much more sophisticated strategies.

227 4.2.1. P r e d i c t i o n of edge costs Analyzing the history of an edge in terms of its costs over time aims at value of the function h. Therefore, the problem can be stated as follows:

predicting the

Given: The history of the edge in terms of the costs C ( e ) t , . . . , C(e)l~J (with le[ being the number of times the edge has been used) to traverse it. D e t e r m i n e : ] t ( C ( e ) t , . . . , C(e)lel) such that ]t(C(e)t,..., C(e)lel) ~ h(hist(e)). Given the definition of Ca_priori(e)and Ca_posteriori(e), it can be found that h(hist(e)) and therefore h should approximate H = f(t(e),t~,t(e))+ ](o(e),o+ot(e)). As H can be calculated directly after the edge has been passed, each passing of an edge yields an example from which the general shape of the function tt can possibly be derived. The task to be solved is therefore to improve the prediction of the value of it by means of learning from examples of h(hist(e)). An appropriate learning technique to deal with such problems is temporal difference learning [5]. 4.2.2. A s t r a t e g y for e x p l o r a t i o n As mentioned in section 4.1, temporal changes such as a passage blocked by a human, a vehicle, or a large workpiece, must be taken into account. In fact, allowing the robot to try again to use a passage that has been found to be blocked before, requires an exploration strategy. Principally, this strategy works by assigning a temporally valid blocked attribute to edges that could not be traversed. This attribute is only valid for a certain time, with the time depending on the experiences made with the corresponding edge, i.e., how often the edge has been found to be traversable respectively blocked. In detail, the exploration strategy works as follows: 1. Initialization (a) Assign a blocked counter to each edge. (b) Assign a blocked h i s t o r y counter to each edge. (c) Set the blocked counter to zero. (d) Set the blocked h i s t o r y counter to one. 2. During planning: (a) If the blocked counter of an edge is zero, the edge can be traversed, (b) else decrease the blocked counter by one. 3. During execution" (a) If an edge is found to be traversable, set its blocked h i s t o r y counter to mar.(-ti blocked h i s t o r y counter, 1). (b) If an edge is found to be blocked, set its blocked counter equal to the blocked h i s t o r y counter, double the blocked h i s t o r y counter. This strategy realizes a dynamic adaptation of the frequency of retries the robot performs on edges that have been found to be blocked.

228 5. Conclusion and f u r t h e r work The work described in this paper deals with an important aspect of the realization of mobile systems that are adaptive with respect to their environment. The main idea that has been followed is to realize adaptive behaviour not only on the level of reflexes, but also with respect to the planning capabilities of the robot. This approach is closely related to the hierarchical control architecture of the mobile robot PRIAMOS, which has been used for experiments. By means of continuously improving the estimation of edge costs and the evaluation criteria of the topological planner, changes in the environment are not only reflected in the explicit world model, but also in the robot's reasoning system. The main activities at the moment are concerned with the practical evaluation of the theoretically achieved results. Additionally, some more theoretical work has to be done concerning the decision structure. Especially algorithms for shrinking, deleting, splitting and merging of regions seem to be necessary. It is not clear if the different evaluation functions cover all situations that occur in a real factory. However, assuming that the robot's working environment is not too cluttered (which should be the case e.g. in an automated factory), both the planning and the adaptive components can be expected to work flawlessly. Acknowledgement This work has partially been supported by the ESPRIT Project 7274 "B-Learn II". It has been performed at the Institute for Real-Time Computer Systems & Robotics, Prof. Dr.-Ing. U. Rembold and Prof. Dr.-Ing. R. Dillmann, Department of Computer Science, University of Karlsruhe, 76128 Karlsruhe, Germany. The authors would like to thank the anonymous reviewers for their helpful comments. REFERENCES 1. A. Elfes. A sensor-based mapping and navigation system. In Proceedings of the IEEE International Conference on Robotics and Automation, 1986. 2. J . H . Gennari, P. Langley, and D. Fisher. Models of incremental concept formation. Artificial Intelligence, 40:11- 61, 1989. 3. P. Kampmann and G. Schmidt. Topological structured geometrie-database and global motion planning for the autonomous mobile robot macrobe. Robotersysteme, 5, 1989. 4. J. del R. Mills and C. Torras. Efficient reinforcement learning of navigation strategies in an autonomous robot. In Proceedings of the IEEE//RSJ Conference on Intelligent Robots and Systems, 1994. 5. R.S. Sutton. Learning to predict by methods of temporal difference. Machine Learning, 3:9 - 44, 1988. 6. F. Wallner, T. C. Lfith, and F. Langinieux. Fast local path planning for a mobile robot. In Proc. of the 2nd Int. Conf. on Automation, Robotics and Computer Vision, Singapore, 1992.

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

A self-organizing representation

229

of s e n s o r s p a c e for

mobile robot navigation Ben J. A. KrSse and Marc Eecen a aFaculty of Mathematics and Computer Science, University of Amsterdam Kruislaan 403, 1098 SJ Amsterdam, the Netherlands k r o s e @ f w i , uva. nl

The paper describes a sensor-based navigation scheme for a mobile robot system which makes use of a global representation of the environment by means of a self-organizing map or Kohonen network. In contrast to existing methods for self-organizing environment representation, this discrete map is not represented in the world domain or in the configuration space of the vehicle, but in the sensor domain. The map is built by exploration. A conventional path planning technique now gives a path from the current state to a desired state in the sensor domain, which can be followed using sensor-based control. Collisions with obstacles are detected and used in the path planning. Results from a simulation show that the learned representation gives correct paths from an arbitrary starting point to an arbitrary end point. 1. I N T R O D U C T I O N Goal-directed, collision-free robot navigation can be a result of planned actions, where a path planner determines off-line the optimal path from the current configuration to the desired configuration. Alternatively, it can be induced by a reactive behaviour, where sensor data is used directly to generate actions. If a planning method is used for navigation, global information about the environment is needed. Since such global information is not present in the data of sensors on the robot system, an internal representation of the environment must be used. Such a global model is usually represented in a configuration space spanned by the degrees of freedom of the robot system. For a mobile robot these are its position and its orientation. The environment is usually represented as a (geometric) model indicating the free and occupied areas. Given knowledge about desired and current configuration, a large number of methods exist for planning in configuration space or world space [4]. The construction of such an environment model can be done on the basis of a-priori knowledge, but the maps can also be learned by exploration. Self-organizing networks have been presented as maps in the world domain or configuration space [6],[7]. The role of sensor measurements in such a navigation scheme is to estimate the actual state of the robot system in the configuration space. Controlling the system consists of sending control commands to the actuators such that the error between actual state and desired state in the configuration space is minimized.

230 Reactive systems on the other hand consider the state of the robot not in configuration space but in sensor space. A state of the robot is now defined as a constellation of sensory data or features extracted from these data. It has been shown that the difference between the actual and desired state in sensor space can be used to control the robot, either using an explicit model of the system [11] or a learned, implicit model in the form of a neural network [9]. However, since the sensor information is inherently local, reactive systems suffer the from the existence of local minima. Methods have been described for escaping from those minima but do not result in optimal paths. The only guarantee for an optimal path is the use of global knowledge. For reactive systems this should be a representation in the sensor space, so that paths in sensor space can be planned which can be followed by a reactive system. In this paper we introduce the sensor space as the state space of the robot and describe some of the possibilities and problems for representing environment information. A selforganizing model is described which represents the environment in the sensor space. The model is used for path planning and path following for a non-holonomic mobile robot. 2. S E N S O R - B A S E D

CONTROL

AND GLOBAL

REPRESENTATION

Although animals pick up objects and navigate through complex environments without precise numerical three-dimensional position information, most robot systems still use an internal representation of the world in a carthesian domain. However, for certain robot tasks, robot motion can be successfully controlled directly by using sensor data or features derived from these data instead of computing positional information from these data. For robot manipulators, "visual servoing" methods have been developed [2],[11] which use a direct mapping from image features to control commands. These methods form the basis for our sensor space representation and navigation scheme for mobile robots. A mobile robot is equipped with a variety of sensors, which may be range sensors, visual sensors which detect marker positions in the image, beacon detectors, and so on. Let the state be described by the vector f - (fl, ..., fro) C 9r in which fi is a sensor output or the value of a feature derived from sensor data. The robot is controlled by sending it displacements Ax. For sensor-based navigation, the vehicle has to follow a trajectory f(t) in sensor space 9t-. Similar to visual servoing methods, an error in the sensor domain Af(t) can be mapped into the necessary motion A x via the m x n "Feature Sensitivity Matrix" JR: Ax = JFtAf.

(1)

in which J F t is the generalized inverse of JF, and can be derived from a model of the system or can be represented in a neural network which is trained from a learning set. For goal-directed navigation in a cluttered environment, the problem is to find a trajectory from a current state f to the goal state fgoaz. Similar as for carthesian-based navigation schemes, a procedure which navigates the vehicle along the vector f(t) - fgoaZ will be stuck in local minima. For this, we need a global representation of the sensor space, as well as methods to plan paths in this space. Our first question is how global environment knowledge looks like when expressed in the sensor space ~-. When the robot operates within a given environment R, all possible measurements f will lie in a fixed sub-space 7~ of the m-dimensional sensor space F , since not all sensor

231

Figure 1. For a fixed orientation of the vehicle all sensor data f will be represented as a 2-dimensional manifold in the n-dimensional sensor space. In a) this manifold is drawn for a 3-dimensional sensor space for the sensor configuration sketched in b). The environment is a square room.

vectors will occur in R. Furthermore, we know that this subspace is 3-dimensional, since the number of degrees of freedom to generate those data is only 3: position and orientation of the vehicle. In order to study the complexity of such a subspace we have computed and visualized this subspace for some simple situations. For example, the sensor space of a vehicle equipped with 3 range sensors is 3-dimensional. For a fixed orientation of the vehicle all sensor data f will be represented as a 2-dimensional manifold in this sensor space. Figure 1 shows the complex nature of the data. A trajectory of the vehicle in the room will result in a trajectory on the manifold. The remainder of the paper is concerned with how to construct this subspace which is basically finding a non-linear projection for the mapping of sensor data onto this subspace. The model is used for pathplanning and navigation.

3. R E P R E S E N T A T I O N MAIN

OF T H E E N V I R O N M E N T

IN THE SENSOR

DO-

Because it is assumed that the environment is not known (and even if it is known it is not a trivial task to find an analytical expression of the subspace) a learning m e t h o d has to be found which builds the internal representation, and is able to represent the nonlinear mapping from 9c to 7Z. In this paper we used a self-organizing vector quantization scheme in the form of a Kohonen network. A similar approach was presented by [3]. The Kohonen algorithm learns from examples a mapping from the input space jc to a lattice ~4 of N formal neurons rl, ...rN. Since we know that the intrinsic dirnensionality of the sensor data is three, a three-dimensional lattice is used. The map Cw : 9r ~ j t assigns to each sensor vector f C ~ an element Cw(f) E A, and is defined by the "weights"

232

l

16-d range data

compass r3

Figure 2. Different positions may result in identical sensor states. Therefor a compass was added. The value of the compass determines the winning "layer" while the value of the range sensor determines the winning neuron in a layer.

w = (wl, w2, ...wN), wi E ~-. The function Cw(f) is specified by the condition IWcw(f)- f l -

min Iwr - fl rcA

(2)

i.e., an sensor vector f is mapped onto that neuron r for which I W r - f] becomes minimal. During learning the weights of all neurons r in the network are adapted according to the Kohonen rule nWr = ~A(r, k ) ( f -

Wr),

(3)

where the learning parameter 7/is multiplied with A(r, k), a decreasing function of distance of neuron r to the winner k [8]. Learning also decreases with time to obtain a stable network.

4. E X P E R I M E N T S

ON REPRESENTATION

Our mobile robot is equipped with 16 range sensors, each of which gives the distance to the nearest object within a beamwidth of 2 degrees. The sensors are fixed on the vehicle at orientations uniformly distributed between 0 and 2~r. Each measurement can be considered as a vector f in a 16-dimensional sensor space. It is obvious that if the vehicle is not equipped with only range sensors but also with other external sensors such as beacon detectors, ultrasonic sensors, etc, the dimensionality of the Kohonen network can still remain three. A problem with using range data only is that (in particular for symmetric environments) range data at a particular location can be identical to range data at another location but for a different orientation of the vehicle. We equipped the vehicle with a compass in order to distinguish between these situations. With data from the 16-dimensional range data and the compass, a 10 x 10 x 10 network was trained. The 17dimensional sensor vector was not fed directly into the network but instead a hierarchical arrangement was used, similar to [5]. According to the output ~o of the compass a layer in the "z" direction of the network was selected. Within this layer a winning neuron k was selected as the neuron having a weight vector closest to the 16-dimensional range input vector f (see Figure 2). Weights of all neurons r in the network were adapted according to the Kohonen rule (Eq.3). Periodicity in the orientation ~ was taken into account by

233

Figure 3. Regions in the work space where a neuron is winner and positions where the distance to the winning neuron is m i n i m a l (vertices of network). This is an easy room.

a w r a p - a r o u n d of the network: neurons of the u p p e r and lower layer of the network are considered neighbours. For two environments we tested the algorithm. The first environment was a square room of 20 x 20 meters, with only two small obstacles in the lower-left and lower-right corner. The second test was done in a square r o o m with a n u m b e r of large obstacles. The latter r o o m was also used as test environment for the p a t h planning. Training samples are generated by positioning the vehicle at 40.000 r a n d o m positions and orientations in the r o o m and d e t e r m i n i n g the range data. In this way the 3-dimensional network in the sensor space is built. For visualization we d e t e r m i n e d which neurons were active for which configurations of the vehicle. This was done by systematically displacing the vehicle by small steps in the working space for a fixed orientation and r e p e a t i n g it for other orientations. For each position the label of the winning neuron was stored as well as the distance to this neuron.

234

Figure 4. Positions where the distance to the winning neuron is minimal (vertices of network). This is a difficult room.

Results are depicted in Figure 3. The boundaries between the regions in which a neuron is winner are plotted as well as the positions at which the distance to the neuron is minimal (vertices of the network). For the simple room depicted in Figure 3, the neurons form a nice square lattice in the room. However, for the more complex room we have used in our experiments, the results are somewhat different, as shown by Figure 4. Noticeable in the first place is the fact that some connections are missing. The reason for this is that only connections are drawn between nearest neighbours in the lattice. Apparently not all neurons will become winner during the test. This may be a result of the fact that because of the obstacles there will be many discontinuities in the sensor space. The Kohonen learning procedure may put neurons at the discontinuity. Secondly, the figure shows that very distant positions in the world space may activate neighbouring neurons in the network.

235 5. P L A N N I N G

AND NAVIGATION

It has been shown (e.g., [10]) that discrete representations of the working space, in the form of Kohonen or similar networks can be used for path planning using conventional search algorithms. Also, if the neighbour relations of neurons are considered as excitatory synaptic connections, the dynamics of such a topologically organized neural net can be used for path planning and obstacle avoidance [1]. In our application, where a path in sensor space has to be planned for a mobile robot this approach can not be followed for a number of reasons. 1. For non-holonomic systems such as our mobile robot, two neighbouring ('connected') neurons in the sensor space are not necessarily connected in the sense that an elementary action can bring the system from one state to the other. 2. If planning is done within the Kohonen network, the metric which will be used for path planning will be the metric of the lattice of neurons. The Kohonen learning rule places the nodes according to the probability density function of the inputs: regions in input space with high probability will have a large density of neurons. This means that the length of the path will depend on the distribution of learning samples. For this reason we decided not to use the neighbour relations of the Kohonen network for path planning. Instead we considered the navigation as a Markovian decision problem with the neurons of the Kohonen network as a discrete set of sensor states. During exploration of the environment, the sensor values f and corresponding actions u were stored. For each f the winning neuron ri is calculated before and after the action u, and an estimation of the state transition probability pij(u) is made. pij(u) is defined as the probability that, given an initial state (neuron) i, the vehicle is found in state (neuron) j after action u. In this way, neighbourhood relations between sensory states are imposed by the elementary actions. For navigation a greedy policy is used with respect to a cost-function f* which assigns to each neuron i the (expected) cost f* for driving toward the goal:

u -- argmin [c(u) + 9/ ~j pij(u) f~] ,

(4)

with c(u) the immediate costs of the action. The problem is to find the cost function f* for the Markovian decision problem. For this a dynamic programming approach is chosen. The cost function can be approximated via an iterative process if the direct cost c(u) of an action u and the transition probabilities are known:

fi(k + l) - min [c(u) + 3'~PiJ(u)fj(k) 3

,

(5)

in which fi(k) is the approximated costs for neuron i at time step k. This procedure converges to the optimal cost function f* if the goal state is initialized to zero and 7 < 1.

236 During exploration, collisions with obstacles are stored as well, so for each state i we can make an estimate of pi,cot(U), defined as the probability that given the state (neuron) i the vehicle will collide with an obstacle after action u. Note that the collision is seen as a discrete state rcoz, which also has a value for the cost function f*ot. In our model the cost for the collision neuron is always fcoz(k) = 1 + maxi fi(k), so the system will prefer actions which do not result in a collision.

6. E X P E R I M E N T S

ON PLANNING

Experiments were carried out in simulation. In our experiments, the set of possible actions b / w a s restricted to left, straight, right, indicating that the vehicle rotates along - 3 0 , 0 respectively 30 degrees, followed by a linear displacement of .75 m. The workspace of the robot is shown in Figure 4, which also shows a layer from the Kohonen network. S t a t e transition probabilities pij(u) were estimated by performing 100.000 random actions at random positions. The immediate costs c(u) of an action are equal to one for each action. We tested the planning algorithm by selecting an a r b i t r a r y goal state in the sensor domain. The vehicle was put at an arbitrary position and the cost function given in Eq 5 was iteratively determined. As soon as the current neuron (the neuron which is active at the current position) has neighbours with lower cost, a steepest descent m e t h o d can be applied: the action is selected for which the expected future costs are minimal. Note that this does not necessarily mean that the function f(k) at time step k has converged to the optimal solution f*(k). We therefore tested how many iterations were needed for a correct path. Results are shown in Figures 5 and 6. The lower part of Figure 5 shows the path of the vehicle if it starts moving after 10, 30, 40 or 200 iterations of the dynamic programming algorithm. The upper part shows the number of elementary actions of the vehicle which are needed to reach the goal state. If the number of iterations is less than 50 the vehicle does not reach the goal but collides with an obstacle. Figure 6 shows a more dimcult situation: the vehicle has to make a turn in a narrow alley. At least 50 iterations are needed before the cost function is sufficiently good for collision free navigation. Remarkable is that the path length increases after 300 iterations (while a good, short path was found after 50 iterations). The reason for this is our choice for the costs for the collision "neuron", which is always larger than the costs of any other neuron and therefore contributes (too) much to an expected cost assessment. The longer path is apparently a result of the trade-off between minimizing the probability on a collision and minimizing the expected number of actions needed to reach the goal.

7. D I S C U S S I O N Results show that an internal representation of the sensor space, acquired by an exploration of the environment, can be used for path planning and navigation. The sensor space of the robot system consists all possible constellations of sensory data which can occur in a given static environment of the system. In this paper a Kohonen network is used for the nonlinear mapping from the high-dimensional sensor space to the 3-dimensional network. The use of such a projection also enables us to take into account the uncertainty

237

Figure 5. The upper part shows the number of actions needed to reach the goal. A black disk indicates termination of navigation because of a collision. The lower part of the figure shows some typical paths of the vehicle if it starts moving after 10, 30, 40 or 200 iterations of the dynamic programming algorithm.

238

Figure 6. As in the previous figure. The lower part of the figure shows some typical paths of the vehicle if it starts moving after 10, 50, 200 or 300 iterations of the dynamic programming algorithm.

239 of the sensor data. If the data of one of the sensors is more uncertain than the data from another sensor, we can take this into account by lowering the contribution fi of this sensor to the distance measure which is used to determine the winning neuron. Some shortcomings of the approach have shown up which will have to be studied in the near future. One of the major drawbacks of the approach presented in this paper is that many learning samples are needed: the Kohonen map already needs about several thousands of samples, but to estimate the transition probabilities we need even more samples. In a more efficient approach both the Kohonen map and the transition probabilities could be learned at the same time, while a more realistic learning procedure the learning samples will not be generated by random placement of the vehicle but by an exploratory navigation.

Acknowledgements The authors thank Leo Dorst for discussions on the shape of the sensor data subspace and his help in plotting the space for a simple situation. The work was carried out partly within the project "Active Perception and Cognition", supported by the Japanese RWC programme.

REFERENCES 1. Glasius, R., Komoda, A., Gielen, S. Biologically Inspired Neural Networks for Trajectory Formation and Obstacle Avoidance. in: S. Gielen and B. Kappen (eds) ICANN'93, Proceedings of the International Conference on Artificial Neural Networks. Springer-Verlag, (1993) 281-284. 2. Won Jang and Zeugnam Bien, Feature-based Visual Servoing of an Eye-In-Hand Robot with Improved Tracking Performance. Proceedings of the 1991 IEEE Int. Conf. on Robotics and Automation, (1991), 2254-2260. 3. Kurtz, A. Building maps for path-planning and navigation using learning classification of external sensor data. in: I. Aleksander and J. Taylor (eds): Artificial Neural Networks 2, North-Holland/Elsevier Science Publishers, Amsterdam, (1992) 587-591. 4. Latombe, J.C. Robot motion planning. (1991) Kluwer Academic Publishers, Boston. 5. Martinetz T, Schulten K. Hierarchical neural nets for learning control of a robot's arm and gripper. IJCNN-90 Conf. Proc., San Diego 1990, III, (1990) 747-752. 6. Morasso, P., Vercelli, G. and Zaccaria, R. Hybrid systems for robot planning, in: I. Aleksander and J. Taylor (eds): Artificial Neural Networks 2, North-Holland/Elsevier Science Publishers, Amsterdam, (1992) 691-697. 7. Najand, S., Zhen-Ping Lo, Bavarian, B. Application of Self-organizing neural networks for mobile robot environment learning, in: G. Bekey and K. Goldberg, Neural networks in robotics, Kluwer Academic Publishers, Dordrecht, (1992) 85-96. 8. Ritter, H, Martinetz T, Schulten K. Topology conserving maps for learning visuomotor coordination. Neural Networks 2, (1989) 159-168. 9. P.P. van der Smagt and B.J.A. KrSse. A real-time learning neural robot controller. In: Kohonen, M/ikisara, Simula and Kangas (eds): Artificial Neural Networks. Elseviers Science Publishers B.V. (1991) 351-356.

240 10. J.M. Vleugels, J.N. Kok and M.H. Overmars A self-organizing neural network for robot motion planning, in: S. Gielen and B. Kappen (eds) ICANN'93, Proceedings of the International Conference on Artificial Neural Networks. Springer-Verlag, (1993) 281-284. 11. Weiss, L.E., A.C. Sanderson and C.P. Neuman. Dynamic sensor-based control of robots with visual feedback. IEEE Journal on Robotics and Automation, RA-3, (1987) 404-417.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

241

Path Planning and Guidance Techniques for an Autonomous Mobile Cleaning Robot Christian Hofner and G/inther Schmidt ~* ~Department for Automatic Control Engineering (LSR), Technische Universit~t Mfinchen, D-80290 M/inchen, Germany P h o n e : (+49 89) 2105-3416, F A X : (+49 89) 2105-8340 E-mail: [email protected] K e y w o r d s : Cleaning path planner, self-localization, sensor application planner, tracking error compensation, obstacle avoidance. Application of freely navigating mobile robots to service tasks such as floor cleaning or inspection requires specific techniques for path planning and vehicle guidance. This article describes in its first part a rule-based geometric path planner. In addition to a 2D environmental map the planner takes into account robot kinematics and geometry as well as technological requirements. The result of path planning is a list of motion commands which leads during path execution to a most complete coverage of a user-specified cleaning area. In its second part the article discusses problems of vehicle guidance including initialization of the robot's start location, cleaning path execution, accurate path tracking, vehicle localization as well as path replanning in case of obstacles. The article concludes with a discussion of implementation issues and results of a floor coverage experiment with the full-scale mobile robot MACROBE. 1. I N T R O D U C T I O N Many challenging research problems arise in the application of freely navigating mobile service robots to floor cleaning or related tasks in extended public areas such as corridors, halls or platforms [11]. Although most of todays cleaning and sweeping machines are still guided by human operators there exist already several semi-autonomous cleaning robots for commercial usage [12]. These systems assist a human operator in cleaning of rectangular or polygonal shaped areas. Obstacles cause stops, until either the operator or the obstacle clears the blocked path. A few publications [4],I2] discuss application of a mobile robot to the cleaning task without making use of explicit a-priori knowledge about geometry and complexity of the cleaning area. In spite of these developments there exist many open problems in intelligent path planning and vehicle guidance for cleaning robots due to advanced requirements as listed in Table 1. *The work reported in this article was supported by the Deutsche Forschungsgemeinschaft DFG as part of an interdisciplinary research project on "Information Processing Techniques in Autonomous Mobile Robots (SFB 331)". The authors would like to thank colleagues and students of the laboratory for their valuable contributions to this work.

242 Table 1 Advanced requirements for an autonomous floor coverage robot Task-related requirements Identification of the cleaning area Path planning considering technological and economical aspects Accurate long distance path tracking

Safety/operational requirements Man-robot-interface for operator assistance Collision and tumbling avoidance Detection of obstacles with the result of velocity adaptation and path replanning

Detecting, memorizing and cleaning temporarily occupied parts of the cleaning area

In [8], [9] we have introduced basic ideas of our planning and guidance approach for autonomous floor-covering operations in extended indoor environments. In Section 2 of this article we describe in more detail the rule-based planning system which automatically generates the motion command sequence for an appropriate cleaning path according to robot geometry and kinematic restrictions. Results from automatic planning are illustrated for robots with various degrees of agility in typical cleaning environments. In Section 3 we discuss a vehicle guidance scheme comprising regular robot localization, i.e. estimation of position and heading, as well as appropriate compensation of drift errors during path execution. Unlike obstacle avoidance approaches known from point-to-point operations we propose in Section 4 a new approach for detection and handling of obstacle situations during cleaning motion. In Section 5 we discuss the implementation of the total control structure as well as results from a floor coverage experiment evaluating the proposed path planning and tracking techniques. 2. P L A N N I N G

OF A C L E A N I N G P A T H

Reasonable guidance of a cleaning robot in real indoor environments requires an appropriately planned cleaning path. A few publications have already introduced grid-based [13] or rule-based [4] planners for complete floor coverage. However, a flexible path planner must pay attention to the technological feasibility of a cleaning path in real indoor environments and to problems arising during path execution. Thus robot geometry and agility as well as appropriate adaptation of the cleaning path to the specific environment must be considered. In this Section we describe a semi-automatic approach for generation of a robot motion program which ensures most complete coverage of a specified area.

2.1. Geometry and Agility of the Robot and its Cleaning Units To define relevant design parameters of mobile robots for floor coverage operations we refer to the features of typical cleaning machines as illustrated in Figure 1. Most of these conventional systems are equipped with two or more cleaning units [6]. The front unit comprises brushes soaked in detergent. They scrub the dirt from the floor. Dirt is then removed by a vacuum suction unit at the rear end of the vehicle. For path planning we will always assume rectangular shaped vehicles with car-like non-

243

Figure 1. Manual cleaning machine

holonomic kinematics but varying degrees of agility. Agility is characterized by the robot's minimum turning radius train. Furthermore we will assume the existence of two cleaning units mounted at the front and rear end of the vehicle being as wide as the robot itself, i.e. dd~an - b. 2.2. R e p r e s e n t a t i o n of t h e C l e a n i n g A r e a With respect to the environment we assume the existence of a 2D-map of a-priori known walls, pillars, staircases or other fixed objects. These items are represented by closed 2D-object polygons as illustrated in Figures 4b) and 5b). Planning starts with operator-based identification of the borders of the cleaning area in a visualized 2 D - m a p which is part of the planning system. Both, the borders and all object polygons inside the cleaning area are represented by contours. Each contour c is described by cartesian end points SPc and EPc with respect to an inertial reference system E, see Figure 3. 2.3. S e l e c t i o n of a Start L o c a t i o n To determine the alignment of motion patterns to be planned, the operator selects in the visualized map an appropriate start position and a reference contour. Typically a point at a corner and a neighbouring wall are chosen for this purpose. Based on this information the robot's start location S as illustrated in Figures 4b) and 5b) is automatically calculated. S is defined by an initial path frame/9o = (x0, y0, r representing vehicle position (x0, y0) and heading r with respect to E.

244 2.4. Automatic Planning of a C l e a n i n g P a t h For satisfactory floor coverage a cleaning path consists typically of overlapping parallel tracks. Two basic changing maneuvers are used for connecting neighbouring tracks. 9 U - T u r n defined by desired turning radius rturn and a turning angle of 180 ~ 9

S i d e - s h i f t defined by a concatenation of two circular arcs with different sign of

curvature U-turn maneuvers are preferred for path planning, because no change in motion direction is required and operating time as well as floor coverage redundancy can be reduced. Because of vehicle swing out during U-turns, safety distances between objects and the vehicle have to be considered [8]. On the other hand Side-shift maneuvers are preferably employed in narrow areas or in case of a lateral approach to a wall. Once all parameters of the robot and its environment are defined the planning procedure a u t o m a t i c a l l y constructs a technologically feasible cleaning path based on basic path planning templates.

2.5. Basic Path Planning Templates Five templates proposed for path planning are defined in Table 2 and Figure 2. Table 2 Basic path planning templates Template TM UT SS UT-Stretched UT-Squashed

Purpose Move Move Move Move Move

straight straight straight straight straight

forward/backward forward and change track using a backward and change track using forward and change track using a forward and change track using a

simple U-turn a Side-shift stretched U-turn squashed U-turn

Application of a certain template depends on the robot's agility (rmi,). For this reason, path planning distinguishes two types of robots. -- rt~,rn Vmin < rturn < r m i n R~.

299 START

Stage 1

9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . { Position the robot : in front of the Door ~. . . . . . . . . . . . . . . . . . . . . -~ . . . . . . . . . . . . . . . . . . . . . .

"~ i,

Obstacle

on

the

Path'?

,

l

Stage 2

............................................. Find and Grasp the Door Knob ~9 . . . . . . . . . . . . . . . . . . . . -~ . . . . . . . . . . . . . . . . . . . . .

Stage 3

............................................ Rotate the Knob and Push the Door ~9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

~

Can't

see

Door

Knob?

l ;

Open ......................

the Door "~ . . . . . . . . . . . . . . . . . . . . .

.............................................

Stage 5

R e l e a s e

~9 . . . . . . . . . . . . . . . . . . . .

the Knob -~ . . . . . . . . . . . . . . . . . . . . .

Door

Locked'?

,

,. . . . . . . . . . . . . . . . . . . . . . . . . . . ."~ ................

Stage 4

The

i J

Obstacle Back

at of

the the

Obstacle

on

the

Door

?

] i J

l Stage 6

~ ........................................... Move Behind the Door ~,. . . . . . . . . . . . . . . . . . . . -~ . . . . . . . . . . . . . . . . . . . . . .

Stage 7

. . . . . . . . . . . . . . . . . . . . . . . . . . . .T. . . . . . . . . . . . . . Find and Grasp the Door Knob ...................... ~ .....................

~

Path?

I

--,I l .'

f

............................................

Stage 8

Close 9~ ....................

the

Door

-~- .....................

1 ,

f

Another (Obstacle

Task

avoidance

etc.)

............................................

Stage 9

Release ~9 . . . . . . . . . . . . . . . . . . . .

the Knob "~ . . . . . . . . . . . . . . . . . . . . . . I END

Figure 4. Procedure to pass through a door-way S t a g e 1 --Position the Robot in Front of the Door

The mobile robot is supposed to have an environment map and navigates itself to the front of the door by the navigation method reported in reference [13]. The stop position is d ~ i g ~ d to be (R_~2-t- r/21) from the front of the door and (R_~ + rn2) from the right side of the door. This position is shown in Figure 5. Where, TIZ 1 and rn2 are the margins for safe movement of the robot. S t a g e 2 - - F i n d and Grasp the Door Knob

If the robot knows its position and knob's position exactly, the robot can grasp the knob by the end-effector own only by position control of the manipulator. However in reality, errors exist between the estimated position and the actual position of the mobile robot. Therefore, the robot must confirm the exact position of the knob by

300

Figure 5. Position the Robot in front of the Door

using the vision which is mounted on the manipulator. If the robot can't find the knob of the door, the robot abandons the task. The next step in the task is to move the end-effector toward the knob and grasp it with the manipulator. This behavior is controlled by a kind of visual servo based on the vision sensor. S t a g e 3 - - R o t a t e the Knob and Push the Door The robot rotates the knob, and pushes the door with a small force. If the force sensor mounted on manipulator senses excess force, the robot recognizes that the door is locked and abandons the task. S t a g e 4 - - O p e n the Door The total length of the manipulator is limited. Therefore the mobile robot must move forward while the manipulator pushes the door. The manipulation controller nmst cooperate with locomotion controller. Designing this behavior is realized in the following way. The locomotion controller moves the robot along a straight line. The line is (_5_~+ rn2) from the right side of the door, shown in Figure 6. At the same time, the manipulation controller controls the manipulator to push the door. The desired position of the mobile robot depends on the angle of the door. As the door angle increases, this results larger forces acting on the end-effector. To open the door

301 with a fixed angular velocity, the robot's velocity must be varied proportional to the angle of the door. As the door angle increases, the robot's velocity must be decreased. The robot's position is calculated as follows.

P~176

D~176

The desired posture of the manipulator is also calculated from the knob's position which also depends on the angle of the door. While a robot pushes the door with mobile base moving forward, position and orientation errors of the mobile base may generate large forces in the end-effector. To absorb the force, a compliant control at the top of the end-effector is needed by using the force sensor. It is discussed in section 5.3. In this stage, the robot also monitors the force for the direction of the motion of door with the force sensor. If excessive force is sensed, the robot concludes that there are obstacles behind the door and abandons the task. According to above formula, the door is opened 90 degrees when the robot arrives at stop position.

Figure 6. Open the Door

302 S t a g e 5 --Release the Knob The end-effector releases the knob, and the manipulation controller returns the manipulator to its initial position. In this stage, only the manipulation controller is active. S t a g e 6 --Move Behind the Door In order to close the door, the robot moves behind it by navigating itself [13]. The robot's path and the exact parameters are shown in Figure 7. At the stop position, the robot uses its ultra-sonic range sensor to correct its position using the wall and the opened door.

Figure 7. MoveBehind the Door S t a g e 7 - - F i n d and Grasp the K n o b The robot finds and grasps the knob by using vision sensor as same as stage 2. S t a g e 8 --Close the Door In this phase, the robot pushes the door to close while moving. This behavior is similar as the Stage4, but the motion of the robot is different. The straight line is {D..,- (-~ + m2)} from the wall, as shown in Figure 8. The desired position of the robot is calculated, P~176

- D w * sin ( 90 - D~176Angle "

303 When the robot stops at the stop position, the door may already be closed. Confirmation of the closure of the door is done by sensing the force acting on the knob.

Figure 8. Close the Door

Stage 9 --Release the Knob The end-effector releases the knob, and the manipulation controller returns the manipulator to its initial position. In this stage, the robot completes the task, and continue to navigate itself to the destination of the robot. The activity of controllers is shown in Figure 9. In this figure, the locomotion controller needs to cooperate with manipulation controller in Stage 4 and Stage 8, and the vision sensor unit needs to cooperate with manipulation controller in Stage 2 and Stage 7.

304

Locomotion Control

Manipulator Control

Vision Sensor

Cooperation

9 9 9

9

Vision and Manipulator

0

Stage 1 Stage 2 Stage 3

0

Stage 4

Locomotion and Manipulator

O

Stage 5

0

Stage 6

O 9

Stage 7

0

Stage 8

Vision and Manipulator

O

Locomotion ,and Manipulator

O

Stage 9

Figure 9. Activity and Cooperation between Controllers

4. S i m u l a t i o n We implemented a program to simulate the behavior of the mobile robot equipped with a manipulator. In this simulator, the model of the robot and environment are built into the simulator. The cooperation between a locomotion controller and a manipulation controller can be studied. Calculating method of the arm position is performed using differential motion control [12]. To find the best parameters for perforlning the behavior, the follow parameters can be changed and the simulation can be repeated. 9 The initial position of the robot 9 The trajectory and the speed of the robot 9 The desired angular speed of the door during opening and closing According to the assumption of the robot, the parameters of detailed behavior is, D w Rw - 70cm Rd -- 50cm. Under these parameters, we used simulator repeatedly, margin parameter is fixed as follows empirically. 90cm

ml

-

50cm,

m2

-

5crrz,

m3

-

15c'm

Sn example of the simulation result is shown in Figure 10. In this simulation, we did not consider the dynamics of the robot and accumulated error of the locomotion. We should consider these factors in real environment.

305

I

(3)

(2)

(I)

!

I

(4)

(5)

I

I

(9)

oo)

.__----=.---

i

I

I

(6)

(7)

(8)

@ ---03

8

l _ I

(11)

(12)

(13)

(14)

(15) 1

S_

k__ I

I

I

I

I

(16)

(17)

(18)

(19)

(20)

I

I

(21)

(22)

9 (23)

F i g u r e 10. An Exa, l-nple of S i m u l a t i o n R e s u l t

@ -\ (24)

t_..a

(25)

306 5. I m p l e m e n t a t i o n Next step of our task-oriented approach is to implement the proposed algorithms and architecture on a real robot system. First, we implemented stage 4 behavior on real robot system. One key of the behavior is how to realize a cooperation between locomotion control and manipulation control, and the other key is how to realize a compliant control of the manipulator. In this section, we report a control architecture of robot-system, outlines how to realize a cooperation behavior and compliant control, and experimental result of stage 4.

5.1. A r c h i t e c t u r e of R o b o t - S y s t e m Our group has proposed a distributed control architecture for autonomous robot systems [7][8]. To realize the behavior which is proposed in Section 3, we basically use it. Each function of the robot is modularized in this distributed architecture, and we refer to each distributed function as a module. The architecture form two level hierarchy. The top of the structure is master module which supervises the total control of the robot system. Other modules are connected by bus and each module communicates with master module by using dual-port memory. In this implementation, we allow for direct communication between the locomotion module and the manipulation module. This is because, fast communication is necessary in stages 4 and 8 of the robot's task. The schematic diagram of proposed architecture is shown in Figure 11. Now, master module, manipulation module, locomotion module and force sensor module are working to realize a behavior of stage 4.

. ......................................................................................

B U S [. . . . . . . . . . . . . . . . . . . . .

..... 9- p

I ....

.....

I

I

~2~I::~~ .........

[

.....

/ .....

_[~'- ...... .r..... " ' ~ 1 M,.,o~ I ............

(Locomotion "~ Board

i

~.aoara/i,~

il Data

(~~

.oco

o, on

o0ue

i

an

ua, o0

[.......

sensor

)

.......... Mo~u[~............ :"................................,

Ultra-Sonic Transducer and Receiver

~ Motors for Manipulator

I

)

! Motors for Locomotion

I ....

l,,'V~a S o ~

i(V~onsenso~!

i '.. aoar~ l - i l

Data( ~ ~

I

.."...... I M~,,,~ I........: ."...... 1 M~.~

(Manipulati~

--

o0, e ....

.........................................

'" .........................................

:i [ i

[

] { i

9 Module . 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 11. Architecture of the Robot

i :

Force Sensor

J ;

Module

"- . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

307

5.2. Cooperation between Manipulator and Locomotion According to the design of stage 4, we realized cooperation behavior between manipulator and locomotion. All parameters are same as parameters of simulation. To synchronize a behavior of locomotion and manipulation, master module totMize both modules. It sends reference position to locomotion module, and also sends reference joint angle to the manipulation module. Master module always check differences between current and reference position of the robot vehicle and joint angles of the manipulator. When the both differences are enough small, the master module changes references of position and joint angle in small amount to step forward.

5.3. Compliant Control for Manipulator When a robot pushes the door while moving, position and orientation errors of the robot may result in large forces in the end-effector. Therefore, we have to control the end-effector compliantly along the plane perpendicular to axis of the knob using the force sensor (Figure 12).

Figure 12. Compliant Control

The compliant control is realized as follows. The force sensor detects a force on the vartical plane. In propotion to the amount of the force, the reference position of the endeffector is shifted to detected force direction. It is calculated in every 20 mili-seconds. This mechanism absorbs low frequency component of the position error. The high frequency component of the error is covered by soft rubbers attached inside of the end-effector. 5.4. E x p e r i m e n t a l R e s u l t s We implelnented proposed stage 4 behavior on the target robot "YAMABICO-type TEN", and made door opening experiment. In the experiment, the robot can open the door while the robot-base moving forward slowly. This behavior is shown by continuous pictures in Figure 13.

308

Figure 13. An Experiment of Door Opening Behavior

6. C o n c l u s i o n and F u t u r e W o r k s In this paper, we proposed a design of a behavior of the mobile robot equipped with a manipulator to open a door and to pass through a door-way. Such a behavior is realized by a cooperation between the locomotion controller module and manipulation controller module. We verified this cooperation of it by simulation. Finally, we showed the implementation of stage 4 on our mobile robot "YAMABICO type-TEN" and experimental result. To realize a total behavior of robot's passing through a door-way, we must solve a number of technical problems. An important topic is how to confirm the position of the knob and how to grasp it with the end-effector (stage 2), and our next step is to realize such grasping behavior. To confirm the exact position of the knob, we will use a vision sensor. After realizing the stage 2 behavior, we will make a total behavior of robot's passing through a door-way. In a future, we will make a long distance navigation for an autonomous mobile robot equipped with a manipulator including a door opening behavior.

309

Acknowledgement T h e a,uthors express their g r a t i t u d e to the members of Intelligent Robot L a b o r a t o r y of University of T s u k u b a .

REFERENCES 1. Wayne F.Carriker, Pradeep K.Khosla, Bruce H.Krogh : "Path Planning for Mobile Manipulators for Multiple Task Execution", in IEEE Transactions on Robotics and Automation, VOL.7, No.3, JUNE 1991. 2. W.Miksch, D.Schroeder :"Performance-Functional Based Controller Design for Mobile Manipulator", in Proc. of IEEE Int. Conf. on Robotics and Automa, tion, pp227-232, 1992. 3. Homayoun Seraji : "An On-line Approach to Coordinated Mobility and Manipulation", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp28-33, 1993. 4. S.Dubowsky, E.E.Vance : "Planning Mobile Manipulator Motions Considering Vehicle Dynamic Stability Constraints", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp 1271-1276, 1989. 5. Norbert A.M.Hootsmans, Steven Dubowsky : "Large Motion Control of Mobile Manipulators Including Vehicle Suspension Characteristics", in Proc. of IEEE Int. Conf. on Robotics and Automation, pp2336-2341, 1991. 6. Yoshio Yamamoto, Xiaoping Yun : "Control of Mobile Manipulators Following a Moving Surface", in Proc. of IEEE Int. Conf. on Robotics and Automation, ppl-6, 1993. 7. Yutaka Kanayama, Shin'ichi Yuta : "Computer Architecture for Intelligent Robots", in Journal of Robotic Systems, 2(3), pp237-251, 1985. 8. S.Yuta and J.Iijima : "State Information Panel for Inter-Processor Communication in an Autonomous Mobile Robot Controller", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, 1990. 9. S.Yuta., S.Suzuki, S.Iida : "Implementation of a. small size experimental self-contained autonomous robot", in Proc. of Second International Symposium On Experimental Robotics, 1991. 10. S.Iida and S.Yuta : "Vehicle Command System and Trajectory Control for Autonomous Mobile Robots", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, pp212217,1991. 11. K.Nakazawa, M.Shimizu, S.Yuta, M.Nakajima : "Measurement of 3-D Shape and Position by Fiber Grating Vision Sensor Installed on a Manipulator", in Proc. of IEEE Int. Workshop on Intelligent Robots and Systems, pp611-616,1988. 12. Rechard P.paul : "Robot Manipulators", the MIT Press, 1981. la. K.Nagatani and S.Yuta : "Path and Sensing Point Planning for Mobile Robot Navigation to Minimize the Risk of Collision", in Proc. of I E E E / R S J Int. Conference on Intelligent Robots and Systems, pp2198-2203, 1993.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

311

The D e v e l o p m e n t of R e - u s a b l e S o f t w a r e S y s t e m s for Intelligent A u t o n o m o u s R o b o t s in Industrial and Space Applications E. Freund and J. Rol3mann Institute of Robotics Research, University of Dortmund, Otto-Hahn-Str. 8, 44227 Dortmund, Germany, Email: rossmann @damon.irf.uni-dortmund.de ABSTRACT

The aim of the development of modern robot control systems at the Institute of Robotics Research (IRF, Institut fiir Roboterforschung) is to provide higher flexibility and autonomy for robot systems, together with consistent and user-transparent concepts for installation, programming and operation of robots in their working-environment. An important feature of a flexible future-oriented robot controller is the capability to incorporate information from different classes of sensors and to have standardized communication interfaces with other factory-automation components. To ease the robot systems' operation an intuitive graphical user-interface is provided. Furthermore, implicit programming and action planning methods allow robot programming on a much higher level of abstraction than today's procedural programming languages. For industrial applications, this implies easier and quicker adaptation of robot workcells to new tasks and a significant increase of the robots' flexibility to allow the feasibility of small quantity orders. This paper gives a basic description of "the principle of the hierarchical coordinator", the basic system concept for the development of complex control systems at the IRF. This concept has been developed and employed for the design and implementation of intelligent, autonomous robot systems, which are described in detail. The applications described range from the design of a flexible assembly workcell to a multirobotsystem for autonomous experiment-servicing in a space laboratory module. 1. INTRODUCTION The development of intelligent and autonomous robotic systems with a high degree of flexibility, good operational capabilities and the potential for efficient integration into an industrial CIM-environment is still a difficult task which requires both, technical and management skills. In the recent years the transfer of know-how from the area of space robotics to industrial applications has been emphasized at the Institute of Robotics Research because robotic system capabilities for space applications have already reached a considerable standard with regard to system autonomy, safety and local intelligence that is appropriate to be applied to industrial applications. The development of robot control software at the IRF has always aimed at the realization of components which take into account industrial standards and which can be integrated into comprehensive automation concepts. Particular care in the design of precise interfaces has paid off, not only when adapting various software components to control systems in an industrial environment, but also has permitted a consistent integration of modules in the design and development of new products.

312 2. COORDINATION OF THE DEVELOPMENT OF I N T E L L I G E N T AUTONOMOUS ROBOTS Fig. 1 gives an overview of the interdependence between the research activities at the IRF dedicated to the design of intelligent autonomous robot and automation systems. This overview is structured according to the activities of the four research groups at the IRF: "Autonomously Guided Vehicles and Sensor Technology", "Multirobot Systems and Space Robotics", "Automation Systems" and "Communication Structures and Intelligent Systems". In Fig. 1, the arrows symbolize the transfer of results into new projects and products. A strictly modular realization of functions with the capability for transfer and further development requires a great deal of inter-project management, especially regarding the specification of module interfaces. Experience has shown, that this additional expense in the first stage of a new development is made up with increasing efficiency and lower costs for the resulting product.

Fig. 1: Coordination of research projects at the IRF Fig. 1 shows that developments in the projects ROTEX (Robot Technology Experiment) and CIROS (Control of Intelligent Robots in Space) [3] can be regarded as basic research works at the IRF, whose functionality contributed significantly to the results of the projects evolving.

2.1 Space Robotics as a Motor for the Development of Intelligent Robot Control Systems Essential parts of the path control software of the ROTEX-robot its capabilities were demonstrated in the second German space mission D 2 - were developed at the IRF. One of the major challenges of this development was the cooperation with several project partners and the demand to meet the quality standards for flight software. Fig. 2 displays a simplified scheme of the ROTEX path control structure. The functionality for microgravity path planning deserves special attention due to its capability to plan robot motions which minimize the forces and torques exerted on the space shuttle. Robot motions are planned such, that the amplitude of disturbances remains limited in a specified frequency band. This guarantees that payload experiments requiring zero gravity can be carried out successfully. The other path control modules such as collision detection and -avoidance are

313 not specifically designed for space applications and can directly be used in terrestrial robot controllers.

Fig. 2: Simplified structure of the ROTEX path control While the ROTEX software was developed concentrating especially on the precise project timing and the quality standards for flight software, research beyond the scope of ROTEX was carried out developing a robot control system suitable to control a multirobot system for space applications. In the C I R O S (Control of Intelligent Robots in Space) [3] project, this multirobot control structure was built, based on the concept of the hierarchical coordinator [ 1][9]. Using the methodology of the hierarchical coordinator, a robot control system with a significant increase in autonomy and the operational capabilities for space laboratory servicing was designed. The implemented control system IRCS (Intelligent Robot Control System) and is well suited to perform service tasks in a space laboratory environment such as the CIROS testbed shown below.

Fig. 3: The CIROS multirobot testbed

314 The CIROS testbed is equipped with two robots with redundant kinematics. Each has 6 revolute and one translational joint. The layout of the laboratory is similar to that of the German Spacelab and it was built in a modular manner, so that it is possible to incorporate the latest experiments and to adapt the environment to reality as precisely as necessary. In four double and two single racks, switches and other operating elements of the experiments were reproduced and arranged in order to be performing realistic operational sequences. A tool exchange capability and a torque/force-sensor was mounted at each robot's wrist to allow the robots to operate drawers and flaps for experiment processing. Example tasks comprise the exchange of printed circuit boards and inserting of a sample into a heater. The redundant two-armed robot configuration with the torque/force-sensors at the robots' wrists permits fully coordinated operation as well as synchronized or independent action of the two robots. Furthermore, the robots are equipped with hand cameras and the whole laboratory can be supervised by a scene camera. A vision system performs the imageprocessing in order to provide the planning levels with necessary information, i.e. about the exact location of objects to be gripped. The control desk in the foreground contains devices for system control and supervision.

Fig. 4." Structure of the multirobot control system IRCS

315 Fig. 4 shows the structure of the CIROS multirobot control IRCS (Intelligent Robot Control System), that contains, superimposed on the level of the individual robot controllers, the additional levels for online-collision-avoidance, multirobot-coordination and automatic action planning [3][ 10][11]. The capabilities of this multirobot system include the ability to operate both robots independently or in a coordinated manner. Cooperative operation enables the operator to handle big, heavy or sensitive loads with high precision, while independent robot operation allows a parallel execution of tasks. Both, cooperative and parallel operation are presented in the following sections as coordinated operation [2]. A task oriented programming interface facilitates the programming of the multirobot system. The automatic action planning component [4] can interpret action descriptions which are formulated on a high level of abstraction. With the aid of the central world model that contains a mathematical description of the environment, tasks formulated in a "user-friendly" manner like "close drawer 1 in rack 4" or, as shown in fig. 5, "put sample 1 into heater slot 1" are decomposed into so-called elementary actions. These elementary actions are selected from a predetermined set of actions comprising the activation of endeffectors, locking or unlocking of the gripper exchange system and the path planning functionality for independently working or cooperating robots.

Fig. 5: Automatic action planning for multirobot systems During task execution, the robots are supervised by the level of online collision avoidance [13][12][14], which predicts potential collision dangers between robots and between robots and their environment. In case of collision danger, it actively avoids a collision by changing the pre-programmed path of the endangered robots in real-time, considering the static and dynamic limits of the robots and task specific constraints. Fig. 6 shows an example of such an automatically generated avoidance trajectory. As a part of its task, the upper robot starts its motion on the right side of the CIROS-environment, moving to a drawer in the left side of the cell. The lower robot is not busy during this motion and should therefore give way without disturbing the upper robot's motion. Snapshots of the resulting paths are shown in fig. 6. It

316 should be emphasized here, that the lower robot's motion only results from the need to avoid a collision with the upper robot and to return to its starting position. With the help of the online collision avoidance level, the upper robot was able to follow its trajectory exactly in the way it was planned by the path planning module in the level of coordinated operation (fig. 4).

Fig. 6: Online-collision-avoidance in the CIROS-demonstrator The description of the functionality of the CIROS multirobot control system gives an idea of the complexity of a system including capabilities for automatic action planning, coordinated operation and online collision avoidance.

Fig. 7: Structure of the multirobot system using the hierarchical coordinator

317 The entire control architecture was designed using the concept of the hierarchical coordinator which provides the system theory to describe the interfaces and the assignment of functionality to the different hierarchical levels in a systematic manner. Fig. 7 shows the architecture of the multirobot system from the system-theoretical point of view of the hierarchical coordinator as introduced in [9].

2.2 Modern Virtual-Reality Based Man-Machine-Interfaces for Intuitive Operation The use of methods for automatic action planning facilitates the programming of a multirobot system like CIROS very much, but still requires a great deal of knowledge about the workspace environment and symbolic descriptors for individual components. The project VITAL (Virtual Reality for Telerobotics) pursues the goal to develop an easy to use, intuitive man-machine-interface for single robots and multirobot systems that uses today's Virtual Reality technology. Various tasks that are executed by robots in a space laboratory environment in order to conduct experiments, are carried out by researchers, who do not have robot specific knowledge. In the VITAL project, these researchers can now perform their experiments in a graphically simulated virtual environment, employing a data glove for intuitive interaction with the control system. Eye-phones, small displays integrated into a helmet, give a stereoscopic view of the environment and the experiments. Any operation executed with the data glove, which is an animated human hand in virtual reality, is translated into robot actions with the aid of the action planning component and executed by the robots in the physical environment. Fig. 8 shows a view into the Virtual-Reality representation of the CIROS environment, showing the animated hand, a drawer and the printed circuit boards of a computer system, that the operator is manipulating. Please note that the robots are not shown in the virtual environment, the operator's hand is the only active part.

Fig. 8: A view of the CIROS environment in the virtual reality representation (Photo: CAE)

318 A straight forward approach to control a robotic system via Virtual Reality (VR) is to directly track the data-glove with the robot's tool-center-point (TCP); this mode of operation will further be called "telemanipulation or hand-tracking mode". The n theoretical ~ advantage of this approach is its flexibility. With the robot directly tracking the operators hand, the robot should ~ in theory ~ be able to do the same things as the human operators. In practice, this mode of operation cannot really be recommend for most of the tasks that involve contact between a robot and its environment. If e.g. an assembly task should be commanded by a virtual reality system in hand-tracking-mode, the following problems are encountered: 9 time delays between the display of a robots movement in the VR and its physical movements are critical for the stability of the handling process, because, similar to standard telemanipulation approaches, the user is still "in a real-time control loop" 9 the graphical model has to be very precise 9 the measurement of the position and orientation of the data-glove has to be very precise 9 measures have to be taken to reduce "trembling" of the operators hand 9 online-collision avoidance features are necessary to grant safe operation 9 a versatile sensor-control is necessary to compensate for unwanted tensions when objects are inserted into tight fittings The problems mentioned above make very difficult the control of the robot by just tracking the operators hand, so that a new method to control the robotic system [ 15] was developed that helps to facilitate the user's job to control the robots. The aim was to hide from the user the typical robot-related difficulties like singularity-treatment, collision avoidance, sensor-based guidance of the robot and the problem to coordinate different robots. By making use of modern control techniques that can provide these features, the necessary expertise to e.g. conduct an experiment in a space laboratory environment like CIROS is thus shared between the user with the necessary knowledge about the experiment and the robot-control with the necessary "knowledge" about how to control the robots. So another mode of operation of the VR-system was developed: the task deduction mode, which makes use of the capabilities already developed for the IRCS. The solution was to enhance the VR-system in the way that while the user is working, the different subtasks that are carried out by the User are automatically recognized and task descriptions for the IRCS are deduced. These task descriptions are then sent to the action planning c o m p o n e n t of the IRCS and are carried out safely by the IRCS with its inherent capabilities. This approach has several advantages over the direct hand-tracking: 9 the user is no longer in the "real-time control loop". Complete subtasks are recognized and carried out as a whole without the necessity for immediate feedback to the user 9 the subtasks are carried out safely using the sensor-control and -supervision capabilities of the IRCS 9 for physical assembly tasks, the accuracy of the environment model can be compensated for by automatic sensor-supported strategies 9 the accuracy of the data-glove tracking-device is not as important as for the direct tracking mode. The allowable tolerances when the user is gripping an object or inserting a peg in a hole can be adjusted in the VR-software 9 if a heavy or fragile object is moved in the VR, the planning component is automatically capable of using the two CIROS robots in a coordinated manner to do the transport in reality 9 different users working at different VR-systems can work on different tasks that are sent to the planning component of the IRCS, which then can decide, depending in the available

319 resources, an adequate sequence of the tasks to be done. Thus one robotic system can serve e.g. multiple experimentators in a space laboratory environment. if the robot control is versatile enough, there is no longer a need to even show a robot in the virtual environment displayed to the user; so the user more and more gets the impression of carrying out a task "himself", which is the highest level of intuitivity, that can be reached. if the planning component is versatile enough, it cannot only control the robots, but also other kinds of automated devices. The action planning component in the CIROS environment "knows" that to open the leftmost one of the three drawers, it doesn't need to employ a robot. This drawer is equipped with a motor, so that it just has to control the motor to open this drawer. Robot-automated and hard-automated tasks are thus controlled under one unified framework. The only deficiency of this mode of operation is, that the IRCS must principally know the different types of tasks carried out by the user. Therefore, for special applications that are hard to describe as parameterized tasks m e.g. the shaking of a material sample m or for applications that were not foreseen by the developers of the action planning component, the hand-tracking mode is still available in the realized VR-system. Experience showed, that 95% of the tasks carried out in the CIROS environment can successfully be commanded in task-

deduction-mode. In August 1994, as a part of a cooperation between the Institute for Robotics and Intelligent Systems of the University of Southern California (USC) and the Institute of Robotics Research in Germany, the PC-based version of the VR-system, that builds on the PC-based robot simulation system COSIMIR, described in chapter 3.2, successfully performed the control of the CIROS testbed in Germany from Southern California via INTERNET. In a research project at USC, the same VR-system is used to simulate, supervise and control a sixlegged walking machine. The user here can "enter the same room" with the simulated walking machine, he can place different obstacles in the way of the walking machine and see how the different implemented algorithms behave in the simulated world before they are downloaded into the physical walking machine for testing purposes. This allows several team-members to work with the machine at the same time, running several instances of the VR-system and thus reduces the need to work with the physical machine.

3. SCALEABLE ROBOT CONTROL ARCHITECTURES: HIGH-PERFORMANCE AND L O W - C O S T NO C O N T R A D I C T I O N In the research field "robot control and simulation" at the IRF, experiences and results from the projects ROTEX and CIROS were directly applied to products for industrial applications. As shown in fig. 1, the product development was conducted in the projects FRC (Flexible Robot Control), PCROB (PC-based Robot Control) and C O S I M I R (Cell Oriented Simulation of Industrial Robots).

3.1 Robot Control Development As only an Intel 8086 based computer had space qualification when the ROTEX software was developed, the control software had to b e coded very efficiently. Exactly these efficient routines make up the kernel of the PC-based robot control PCROB (PC-based Robot Control).

320 At first, PCROB was planned as a low cost robot control with an easy to learn user-interface and dedicated to small robots. PCROB runs on a PC, a low-cost hardware platform, in order to provide a low cost robot control for didactic purposes and as a starting point for automation and robotics in smaller companies. In parallel, the VME-based robot control FRC (Flexible Robot Control) was implemented, providing all the features of a modern robot control. In the FRC, a sensor-interface, a network-interface and a programming environment for IRL, VAL-II and IRDATA were provided. In the past two years, due to the increasing capabilities and speed of PC-hardware, the development of both robot control could be harmonized, such that more and more features could be realized for the PCROB as well. So today high performance and low cost are no longer a contradiction because PCROB now provides the same functionality as the FRC including a teach pendant, the programming environments for IRL, VAL-II, B APS and IRDATA, a sensor interface and a network interface as shown in fig. 9.

Fig 9: The structure of the PC-based robot control PCROB

3.2 Robot Simulation Systems Based on the results of the P C R O B and CIROS projects the robot simulation system C O S I M I R (Cell Oriented Simulation of Industrial Robots) was developed. It runs on either a PC with MS-Windows, Windows NT or on a Silicon Graphics workstation. COSIMIR consists of three major components: the control and display component, the robot controller (PCROB) and the environment model (fig. 10).

321

Fig. 10: Structure of the COSIMIR simulation system In addition to providing a graphical user interface, the control and display component supervises the environment model and robot controller components. During each simulation cycle, new state values for the robot and simulated mechanisms in the workcell (i.e. revolving table, conveyor belts, parts feeders, etc.) are requested from the robot controller (PCROB) and then transferred to the environment model. An object manager maintains the environment model of the simulated workcell calculating the current 3D polyhedral model of the workcell taking into consideration the interaction between the individual mechanisms. E.g. an object on a conveyor belt or an object placed on a revolving table, must be moved together with a moving belt or table. The object manager was adopted from CIROS project. In the CIROS project, the object management task is principally the same, however the volume model of the workcell here is not used for the model visualization but rather makes up the basis of calculation for online collision detection and avoidance. In the CIROS collision avoidance, corresponding to its task in CIROS, this component is named the collision avoidance specific world model [13]. The reason that the CIROS collision avoidance can be incorporated in COSIMIR is one of the important characteristics of the CIROS collision avoidance, it is extremely flexible because the model information is not hard coded in the source code but rather formulated by a textual description language known as GEMOL. The model description is read during the initialization phase and is converted during run time to the necessary realtime data structures. COSIMIR optimally uses this flexibility because the user just loads the GEMOL-description of another work cell to work with it. To facilitate the work cell definition, i.e. the programming of a GEMOL work cell description, the program COSIMOD,

322 delivered with COSIMIR, offers a graphical interactive environment to develop the work cell descriptions; thus alleviating the user from learning GEMOL. Besides the flexible modeling of the environment, the integration of the industrial controller PCROB is a strong argument for COSIMIR. The robot programs generated by COSIMIR will be capable of running in the actual robot work cell without changes. This is because the PCROB kernel that controls the simulated robot in COSIMIR also controls the actual robot allowing the problems such as joint limits, static and dynamic limits and reachability in the neighborhood of singularities to be found during the simulation execution. The integration of the tested object manager and PCROB components shortens the integration time of COSIMIR. With the implementation of an advanced communication concept, COSIMIR will always be integrated with the latest version of PCROB; thus all advances of PCROB will be utilized in COSIMIR. COSIMIR is available in English and German and since December 1993 is distributed world wide by an industrial partner.

3.3 Robot Controllers for Flexible Manufacturing Scaleable, task adaptable robot controllers form an important base of modern factory automation. These controllers are not developed with only a specific industrial robot in mind but rather with standardized interfaces, network connections and are programmable with a standard robot programming language for applicability with a wide range of industrial robots. In addition such systems must not only be of high performance to make flexible computer controlled manufacturing possible, they must also be cost effective in the integration into factory's automation and control as well as provide means to be integrated into the factory's communication infrastructure. In the project SENROB at the IRF, several new communication techniques have been developed for flexible manufacturing cells. Fig. 11 shows a work cell and its various components that are integrated together to form a manufacturing cell.

Fig. 11: The flexible manufacturing cell

323 In fig. 11, a SCARA type robot and a 6 revolute axis type robot coordinately work on a process plan determined by the cell controller with the goal of minimizing the cycle time. The cell controller selects the appropriate process plan, after which the workpiece carrier is moved onto the transport system and is identified by the barcode reader. The cell controller is realized similar to the C I R O S action planning component, such that it represents a superimposed control level for the different components of a work cell. The components of the work cell receive their commands via the component drivers which provide standardized interfaces of the different components for the cell controller. Fig. 12 shows the overall communication structure of the flexible manufacturing cell which is an important aspect of the practical realization for application of industrial robots. The communication functions are implemented using standardized busses such as the InterBus-S or ProfiBus and the simple control functions are implemented via a SPS that is connected to the bus system.

Fig. 12:

The communication structure related to the flexible manufacturing cell

Due to the standardized communication mechanisms incorporated, the cell controller, as shown in fig. 12, can also request logistic services e.g. from an autonomously guided vehicle, which delivers new production materials. This is only one example illustrating the advantages

324

of the IRF's aim to connect all automation components in a consistent manner via networks and to provide the necessary interfaces and control structures to request special services as needed.

3.4 Autonomously guided vehicles For the material flow control in the frame of flexible manufacturing today, many autonomously guided vehicles (AGV) are available for use. Experience from the European Community's project PROMETHEUS demonstrated that the problems of the control and coordination of AGV transport systems are tightly related with the control of a multirobot systems. The AGV controller contains a route planning system, a level for multiple vehicle coordination and a level for collision detection and avoidance. It was obvious that the theoretical basis for the realization of the vehicle controller could be the principle of hierarchical coordinator [9], the same principle that served as an architectural design guide for the CIROS multirobot system. Using the standardized communication structure that was shown in fig. 12, the cell controller, submits a transport request. By means of either the sensory or model based provided path information, the vehicle's trajectory is generated by the control computer for all the vehicles within the control computer's movement zone while considering the physical constraints such as the speed and lateral and longitudinal accelerations. The partial steps of the trajectory determination are for example: 9 Detection of conflict situations 9 Generation of alternative maneuvers 9 Calculation and selection of optimal maneuver combinations The calculated desired paths are then transmitted to the vehicle, in our case by means of a communication link based on infrared light transmission.

Fig. 13:

The IRF's autonomously guided vehicle CAESAR

325 In addition to the control of the vehicle the exact determination of the position of the AGV is an extremely difficult problem. Because of the possible inertial and the odometric systems with unacceptable accuracy, a new landmark navigation system was developed on the basis of a 2D laser scanner [6] in the project CAESAR. A special feature of this method is that no artificial landmarks or induction wires are needed and that the process is very robust against visibility obstructions. For the application with the AGV using this navigation process, for example in a manufacturing plant, no structural steps are required such as the laying of induction wires. To initialize the environment model for the navigation system, only o n e vehicle has to be driven over the predetermined paths o n c e being controlled by hand. This defines the environment model that will then be transmitted to all the other vehicles which will then be immediately ready for action. Fig. 13 shows one of the three autonomously guided vehicles at the IRF equipped with this new navigation system. Above the laser, there is a hemisphere visible that contains the infrared transceiver for communication with the control computer.

4. C O N C L U S I O N In the scope of this paper the principle functionalities necessary for the realization of an intelligent autonomous robot system were discussed. Based on the discussion of the CIROS multirobot controller it was illustrated which functionalities are needed and how they can be implemented to ensure optimal cooperation. In addition, it was pointed out how the development of various sub-components are coordinated and how their requirements are defined, such that the necessary subsystem interfaces are suitable. Done properly, this allows the re-use of various components to economically generate a whole family of compatible automation products ranging from scaleable controllers to robot simulators and virtual-reality man machine interfaces. Reacting to the market requirements, new products can be quickly and effectively realized such as the presented PC-based robot controller PCROB and the PCbased robot simulation system COSIMIR. The control structure underlying the theoretical system principles of the hierarchical coordinator was briefly illustrated and the most important functions were presented; examples of these are the coordinated operation of robots and the collision avoidance. Besides the mentioned direct spin-offs for the space robotics, the experience from the development of the intelligent autonomous robot systems for the CIROS and ROTEX projects could be adopted for the development of control structures for an autonomously guided vehicle that serves as a flexible transport system in a factory environment. All the examples presented here demonstrate that in the components' development careful attention is paid to the maintaining of the interface definitions and when possible they use industrial standards for interfaces and communication protocols. The described realizations are characterized by high modularity and precise interfaces that guarantee that a desired automation system can be developed piecemeal and can be easily and clearly adopted to new requirements and technologies.

REFERENCES Freund, E., Fast Nonlinear Control with arbitrary Pole-Placement for Industrial Robots, The International Journal of Robotics Research, Vol. 1, No. 1, 1982.

326 2. .

.

Kaever, P., Kinematische und Taktile Koordination in Mehrrobotersystemen, Ph.D.Thesis, Institute of Robotics Research, University of Dortmund, February 1993. Kaever, P.; Kernebeck, U.; Pesara, J.; Rol3mann, J., Abschlul3bericht zum Projekt CIROS, Institute of Robotics Research, Dortmund, Germany, April 1991. Kernebeck, U., Action Planning for Multiple Robots in Space, The Fifth International Conference on Industrial Engineering Applications of Artificial Intelligence and Expert Systems, Paderborn, Germany, 1992. Freund, E.; Buxbaum, H.-J., Control of Robot-based Flexible Manufacturing Work Cells, Proc. Int. Conf. on Advanced Mechatronics, Tokyo, Japan, 1993, pp. 348-353. Freund, E.; Dierks, F.; Rogmann, J., Untersuchungen zum Arbeitsschutz bei Mobilen Robotern und Mehrrobotersystemen, Schriftenreihe der Bundesanstalt ffir Arbeitsschutz und Unfallforschung, Wirtschaftsverlag NW, ISBN 3-89429-297-0, 1993. Freund, E.; Hypki, A.; Uthoff, J.; van der Valk, U., COSIMIR und PCROB: Integration von Zellensimulation und Robotertsteuerung auf PCs, Fachtagung Intelligente Steuerung und Regelung von Robotern, Langen, Germany, November 1993. Freund, E.; Judaschke, U., Automated Vehicle Guidance as a Component of a Traffic Control System, Proceedings of the 2nd Prometheus Workshop, Stockholm, Sweden, October 1989, pp. 204-213. Freund, E.; Rogmann, J., A Generic Hierarchical Control Structure for a Large Class of Automation Problems: Theory and Applications, Proceedings of the International Workshop on Advanced Robotics, Beijing, China, August 1991.

10. Freund, E.; Rol3mann, J., Autonomous Operation of Multi-Robot-Systems in Factory-ofthe-Future Scenarios, Proceedings of the 6th International Conference on CAD/CAM, Robotics and Factories of the Future, Vol. 1, S. 269-274, London, Great Britain, August 1991. 11. Freund, E.; Rol3mann, J., Control of Multi-Robot-Systems for Autonomous Space Laboratory Servicing, Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space, i-SAIRAS, Vol. 1, S. 443-453, Toulouse, France, September 1992. 12. Rol3mann, J., Echzeitf/ihige, kollisionsvermeidende Bahnplanung ftir Einzel- und Mehrrobotersysteme, 27. Colloquium on Control Techniques, Boppard, Germany, February 1992. 13. Rol3mann, J., Echtzeitf~ihige kollisionsvermeidende Bahnplanung far Mehrrobotersysteme, Ph.D.-Thesis, Institute of Robotics Research, University of Dortmund, Germany, February 1993. 14. Rol3mann, J: "Online Collision Avoidance for Multi-Robot-Systems, a new Security Methodology", Proceedings of the 6.th Topical Meeting on Robotics and Remote Systems, Monterey, CA, Februar 1995. 15. Rol3mann, J., Virtual Reality as a Control and Supervision Tool for Autonomous Systems, Proc. of the 4th International Conference on Intelligent Autonomous Systems, Karlsruhe, Germany, March 1995.

Intelligent Robots and Systems

V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

327

Toward Integrated Operator Interface for Advanced Teleoperation under Time-Delay Antal K. Bejczy, Paolo Fiorini, Won Soo Kim, and Paul S. Schenker ~* ~Jet Propulsion Laboratory California Institute of Technology, Pasadena, CA, USA This paper briefly describes an Advanced Teleoperator (ATOP) system and its control station where a variety of computer-based operator interface devices and techniques are integrated into a functional setting, accomodating a primary operator and secondary operators. The overall sensing and computer aided system setting intends to include all perceptive components that are necessary to perform sensitive, dual-arm manipulation efficiently, focussed at non-repetitive and unexpected tasks. Computer graphics is a key operator interface component in the control station where new types of manual interface devices also are employed. The results of some generic and application task experiments are summarized, including the performance of a simulated remote satellite servicing task, carried out under four to eight seconds communication time delay, using satellite TV and Internet computer communication links. In conclusion, the paper highlights the lessons learned so far. 1. I N T R O D U C T I O N In general, teleoperation implies continuous human operator involvement in the control of remote manipulators. Typically, the human control is a manual one, and the basic information feedback is through visual images. Continuous human operator involvement in teleoperation has both advantages and disadvantages. The disadvantages become quite dramatic when there is an observable, two-way communication time delay between the operator and the remotely controlled equipment. Modern development trends in teleoperators control technology are aimed at amplifying the advantages and alleviating the disadvantages of the human element in teleoperation through the development and use of various non-visual sensing, intelligent or task-level computer controls, computer graphics or virtual reality displays, and new computer-based human-machine interface devices and techniques in the intormation and control channels between the operator and the remotely controlled manipulators. These development trends are typically summarized under the popular titles of telepresence and supervisory control technologies. In this paper, those two titles are lumped under the term advanced teleoperation. Several notes should be added here to the objective description of telepresence and supervisory control technologies. First, none of them eliminates the human operator from the control operation, but both change the operator's function assignments and employ *The research described in this paper has been carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration.

328 human capabilities in new ways. Second, both technologies promise the performance of more complex tasks with better results. But, in doing so, both technologies also make a close reference to human capabilities of operators who will use evolving new devices and techniques in the control station. Third, both telepresence and supervisory control make reference to evolving capabilities or features of other technologies like sensing, high performance computer graphics, computerized electro-mechanical devices, algorithms-based flexible automation, expert systems for planning and error recovery, and so on. Thus, the progress in both technologies are tied to rich multidisciplinary activities. Fourth, both technologies need the evaluation and validation of their results relative to application environments. This paper is focused at the description and some practical evaluation of an integrated operator interface system for advanced teleoperation, developed at the Jet Propulsion Laboratory (JPL), during the past six to seven years, and exercised recently for realistic remote control experiments between JPL in California and the Goddard Space Flight Center (GSFC) in Maryland, 4000 Km away from JPL, using satellite TV and computer communication links between JPL and GSFC. First we describe the JPL Advanced Teleoperator (ATOP) system and its control station where a variety of operator interface devices and techniques are integrated into a functional setting, accomodating a primary operator and secondary operators. Then, we will summarize the results of some generic and application task experiments. In the third part of the paper, the JPL-GSFC simulated remote satellite servicing task, under communication time delay, will briefly be described. Throughout the paper we will emphasize the designs and use of the operator interface elements and their functional integration. In the concluding part of the paper, we will highlight the lessons learned so far. 2. A T O P A N D I T S C O N T R O L S T A T I O N The basic underlying idea of the JPL ATOP system setting is to provide a dual arm robot system together with the necessary operator interfaces in order to extend the twohanded manipulation capabilities of a human operator to remote places. The system setting intends to include all perceptive components that are necessary to perform sensitive remote manipulation efficiently, including non-repetitive and unexpected tasks. The general goal is to elevate teleoperation to a new level of task performance capabilities through enhanced visual and non-visual sensing, computer-aided remote control, and computer-aided human-machine interface devices and techniques. The overall system is divided into two major parts: the remote (robot) work site and the local (control station) site, with electronic data and TV communication between the two sites. The remote site is a workcell. It comprises: (i) two redundant 8-d.o.f. AAI arms in a fixed base setting, each covering a hemispheric work volume, and each equipped with the latest JPL-developed Model (3 smart hands which contain 3D force-moment sensors at the hands' base and grasp force sensing at the base of the hand claws, (ii) a JPL-developed control electronics and distributed computing system for the two arms and smart hands, and (iii) a computer controllable multi-TV gantry robot system with controllable illumination. This gantry robot currently accomodates three color TV cameras, one on the ceiling plane, one on the rear plane, and one on the right side plane. Each camera can be

329

Figure 1. The JPL ATOP Dual Arm Workcell with Gantry TV Frame

position controlled in two translational d.o.f, in the respective plane, and in two orientation directions (pan and tilt) relative to the respective moving base. Zoom, focus and iris of each TV camera can also be computer controlled. A stereo TV camera system is also available which can be mounted on any of the two side camera bases. The total size of the rectangular remote work site is" about 5 m width, about 4 m depth, and about 2.5 m hight. See Figure 1 for ATOP remote workcell. The control station site organization follows the idea of accomodating the human operator in all levels of human-machine interaction, and in all forms of human-machine interfaces. Presently, it comprises: (i) two general purpose Force-Reflecting Hand Controllers (FRHC), (ii) three TV monitor, (iii) a TV camera/monitor switchboard, (iv) manual input device for TV control, and (v) three graphics displays" one is connected to the primary graphics workstation (IRIS 4D/310 VGX) which is used for preview/predictive displays and for various graphical user interfaces (GUI's) in four-quadrant format; the second is connected to an IRIS 4D/70 GT workstation and is solely used for sensor data display; the third one is connected to a SUN workstation (SparcStation 10) and is used as a control configuration editor (CCE), which is an operator interface to the teleoperator control software based oll X-window environment. See Figure 2 for ATOP local control station.

330

Figure 2. The JPL ATOP Control Station

2.1. H a n d C o n t r o l l e r s

The human arm and hand are functionally both powerful mechanical tools and delicate sensory organs through which information is received from and transmitted to the world. Therefore, the human arm-hand system (thereafter simply called hand here) is a key communication medium in teleoperator control. With hand actions, complex position, rate or force commands can be formulated and very physically written to the controller of a remote robot arm system in all workspace directions. At the same time, the human hand also can receive force, torque, and touch information from the remote robot armhand system. Furthermore, the human fingers offer additional capabilities to convey new commands to a remote robot controller from a suitable hand controller. Hand controller technology is, therefore, an important technology in the development of advanced teleoperation. Its importance is particularly underlined when one considers computer control which connects the hand controller to the remote arm system. The direct and continuous (scaled or unscaled) relation of operator hand motion to the remote robot arm's motion behavior in real time through a hand controller is in sharp contrast to the computer keyboard type commands which, by their very nature, are symbolic, abstract and discrete (non-continuous), and require the specification of some set of parameters within the context of a desired motion. In contrast to the standard force-reflecting, replica master-slave systems, a new form

331 of bilateral, force reflecting manual control of robot arms has been implemented at the JPL ATOP project. The hand controller is a backdrivable six d.o.f, isotonic joystick. It is dissimilar to the controlled robot arm both kinemathically and dynamically. But, through computer transformations, it can control robot arm motion in six task space coordinates (in three position and three orientation coordinates). Forces and moments sensed at the base of the robot hand can back-drive the hand controller through proper computer transformations so that the operator feels the forces and moments acting at the robot hand while he controls the position and orientation of it. This hand controller can read the position and orientation of the hand grip within a 30 cm cube in all orientations, and can apply arbitrary force and moment vectors up to 20 N and 1.0 Nm, respectively, at the hand grip. The computer-based control system supports four modes of manual control: position, rate, force-reflecting, and compliant control in task space (Cartesian space) coordinates. The operator, through an on-screen menu, can designate the control mode for each task space axis independently. Position control mode servos the slave position and orientation to match the master's. In force-reflecting mode, the hand controller is back-driven based on force-moment data generated by the robot and sensor during the robot hand's interaction with objects and environment. The indexing function allows slave excursions larger or smaller than the 30 cm cube hand controller work volume. Rate control mode sets slave endpoint velocity in task space based on the displacement of the hand controller. This is implemented through a software spring in the control computer of the hand controller. Through this software spring, the operator has a sensation of the commanded rate, and the software spring also provides a zero-referenced restoring force. Rate mode is useful for tasks requiring large translations. Compliant control mode is implemented through a low-pass software filter in the hybrid position-force loop. This permits the operator to control a springy oi" less stiff robot. Active compliance with damping can be varied by changing the filter parameters in the software menu. Setting the spring parameter to zero in the low pass filter will reduce it to a pure damper which results in a high stiffness hybrid position-force control loop. The overall control organization permits a spectrum of operations between full manual, shared manual and automatic, and full automatic (called traded) control, and the control can be operated with variable active compliance referenced to force-moment sensor data. More on the hand controller and on the overall ATOP control system can be found in [1] through [6].

332

Figure 3. DATAHAND Switch Modules Integrated with FRHC Hand-Grip

333

1. Each module contains five switches. 2. Switches can give tactile and audio feectloack. 3. Switches require low strike force.

4. Switches surround finger creating differential feedback regarding key that has been struck.

Figure 4. Five Key-Equivalent Switches at a DATAHAND Fingertip Switch Module

The present FRHC has a single hand grip equipped with a deadman switch and with three function switches. In order to better utilize the operator's finger input capabilities, an exploratory project recently evaluated a design concept that would place computer keyboard features attached to the hand grip of the FRHC. To accomplish this, three DATAHAND TM [7] switch modules were integrated with the hand grip as shown in Figure 3. Each switch module at a finger tip contains five switches as indicated in Figure 4. Thus, the three switch modules at the FRHC hand grip contain fifteen function keys which directly can communicate with a computer terminal. This eliminates the need that the operator moves his/her hand from the FRHC hand grip to a separate keyboard to input messages and commands to the computer. A recent test and evaluation, using a mock-up system and ten test subjects, indicated the viability of the finger-tip switch modules as part of a new hand grip unit for the FRHC as a practical step towards a more integrated operator interface device to the ATOP system. More on this concept and evaluation can be found in [8]. 2.2. C o m p u t e r G r a p h i c s Task visualization is a key problem in teleoperation, since most of the operator's control decisions are based on visual or visually conveyed information. For this reason, computer graphics plays an increasingly important role in advanced teleoperation. This role includes: (i) planning actions, (ii) previewing motions, (iii) predicting motions in real time under communication time delay, (iv) to help operator training, (v) to enable visual perception of non-visible events like forces and moments, and (vi) to serve as a flexible operator interface to the computerized control system. The capability of task planning aided by computer graphics offers flexibility, visual

334

Figure 5. Schematic Layout of the TCE Interface

quality and a quantitative design base to the planning process. The capability of graphically previewing motions enhances the quality of teleoperation by reducing trial-and-error strategies in the hardware control and by increasing the operator's confidence in control decision making during task execution. Predicting consequences of motion commands in real time under communication time delay permits longer action segmentations as opposed to the move-and-wait control strategy normally employed when no predictive display is available, increases operation safety, and reduces total operation time. Operator training through a computer graphics display system is a convenient tool for familiarizing the operator with the teleoperated system without turning the hardware system on. Visualization of non-visible effects (like control forces) enables visual perception of different non-visual sensor data, and helps management of system redundancy by providing some suitable geometric image of a multi-dimensional system state. Last, but not least, computer graphics as a flexible operator interface to the control systems replaces complex switchboard and analog display hardware in a control station. The actual utility of computer graphics in teleoperation to a higher degree depends on the fidelity of graphics models that represent the teleoperated system, the task and the task environment. The JPL ATOP effort in the past few years was focused at the

335

Figure 6. Schematic Layout of the Hierarchical Data Interface

development of high-fidelity calibration of graphics images to actual TV images of task scenes. This development has four major ingredients. First, creation of high-fidelity 3-D graphics models of robot arms and of objects of interest for robot arm tasks. Second, high-fidelity calibration of the 3-D graphics models relative to given TV camera 2-D image frames which cover the sight of both the robot arm and the objects of interest. Third, high-fidelity overlay of the calibrated graphics models over the actual robot arm and object images in a given TV camera image frame on a monitor screen. Fourth, highfidelity motion control of robot arm graphics image by using the same control software that drives the real robot. The high fidelity fused virtual and actual reality image displays became very useful tools for planning, previewing and predicting robot arm motions without commanding and moving the robot hardware. The operator can generate visual effects of robot motion by commanding and controlling the motion of the robot's graphics image superimposed over TV pictures of the live scene. Thus, the operator can see the consequences of motion commands in real time, before sending the commands to the remotely located robot. The calibrated virtual reality display system can also provide high-fidelity synthetic or artificial TV camera views to the operator. These synthetic views can make critical motion events

336 visible that otherwise are hidden from the operator in a given TV camera view or for which no TV camera view is available. More on the graphics system in the ATOP control station can be found in [9] through [13]. The first development of a graphic system as an advanced operator interface was aimed at parameter acquisition, and was handled and called as a Teleoperation Configuration Editor (TCE) [14]. This interface used the concepts of Windows, Icons, Menus, and Pointing Device to allow the operator to interact, select and update single parameters as well as groups of parameters. TCE utilizes the direct manipulation concept, with the central idea to have visible objects such buttons, sliders, icons, that can be manipulated directly, i.e. moved and selected using the mouse, to perform any operation. A graphic interface of this type has several advantages over a traditional panel of physical buttons, switches and knobs: the layout can be easily modified and its implementation cycle, i.e. design and validation, is significantly shorter than hardware changes. The TCE (Figure 5) was developed to incorporate all the configuration parameters of an early single arm version of the ATOP system. It was organized in a single menu divided in several areas dedicated to the parameters of a specific function. Dependencies among different graphical objects are embedded in the interface so that, when an object is activated, the TCE checks for parameters congruency. A significant feature of this implementation is the capability of storing and retrieving sets of parameters via macro buttons. When a macro command is invoked, it saves the current system configuration and stores it in a function button which can later restore it. The peg-in-hole task, for instance, requires mostly translational motions but when holes have a tight clearance, compliance is necessary. An appropriate macro configuration is one that enables x,y and z axes, with position control in the approach direction and automatic compliance on the other two axes. This configuration can be assigned to a macro button and then recalled during a task containing a peg-in-hole segment. The continuing work on a graphic system as an advanced operator interface is aimed at the data presentation structure of the interface problem, and, for that purpose, uses a hierarchical architecture [13]. This hierarchical data interface looks like a menu tree with only the last menu of the chain (the leaf) displaying data. All the ancestors of the leaf are visible to clearly indicate the nature of the data displayed. The content of the leaf includes data or pictures and quickly conveys the various choices available to the operator. A schematic figure of this layout is shown in Figure 6. Parameters have been organized in four large groups that follow the sequence of steps in a teleoperation protocol. These groups are: (i) Layout, (ii) Configuration, (iii) Tools, (iv) Execution. Each group is further subdivided into specific functions. The Layout menu tree contains the parameters defining the physical task structure, such as relative position of the robots and of the FRHC, servo rates etc. The Configuration menu tree contains the parameters necessary to define task phases, such as control mode and control gains. The Tools tree contains parameters and commands for the off line support to the operator, such as planning, redundancy resolution and software development. Finally, the Execution tree contains commands and parameters necessary while teleoperating the manipulators, such as data acquisition, monitoring of robots, hand controllers and smart hands, retrieval of stored configurations and camera commands. This hierarchical data interface helps solving the problem of displaying the large amount

337 of data needed during a teleoperation task, but it does not address the issue of data entry by the operator. Traditional interfaces require several operators, using a different input device for each controlled function. A single-operator using this interface needs to use different input devices: joysticks to move the manipulators, buttons for camera and video control, keyboard and mouse for parameters entry. Operating these devices is time consuming and distracts the operator from the task. An alternative approach would be using the FRHC as the only input device for the operator. This scheme is similar to a virtual reality interface, where operators can access commands and data by simply moving their arms and hands. In our implementation, the operator would use the FRHC as pointing device on the interface menus and would use the FRHC trigger to click the selected button. The discrimination between commands for the robot and those for the data interface could be done by the d e a d m a n switch on the FRHC handle. When not pressed, this switch inhibits the generation of motion commands for the robots, but FRHC motion data are still available and are used to move the cursor. This scheme will be tested to validate its effect on operator performance. This scheme also can be combined with the DATAHAND TMswitch module integrated with the FRHC hand grip, discussed previously in this paper. 3. C O N T R O L

EXPERIMENTS

To evaluate advanced teleoperation capabilities, two types of experiments were designed and conducted: experiments with generic tasks and experiments with application tasks. Generic tasks are idealized, simplified tasks and serve the purpose of evaluating some specific advanced teleoperation features. Application tasks are simulating some real-world uses of teleoperation. 3.1. G e n e r i c T a s k E x p e r i m e n t s In these experiments, described in detail in [15], four tasks were used: attach and detach velcro; peg insertion and extraction; manipulating three electrical connector; manipulating a bayonet connector. Each task was broken down to subtasks. The test operators were chosen from a population with some technical background but not with an in-depth knowledge of robotics and teleoperation. Each test subject received 2 to 4 hours of training on the control station equipment. The practice of individuals consisted of four to eight 30-minute sessions. As pointed out in [15], performance variation among the nine subjects was surprisingly slight. Their backgrounds were similar (engineering students or recent graduates) except for one who was a physical education major with training in gymnastics and coaching. This subject showed the best overall performance by each of the measures. This apparent correlation between performance and prior background might suggest that potential operators be grouped into classes based on interest and aptitudes. The generic task experiments were focused at the evaluation of kinesthetic force feedback versus no force feedback, using the specific force feedback implementation techniques of the JPL ATOP project. The evaluation of the experimental data supports the idea that multiple measures of performance must be used to characterize human performance in sensing and computer aided teleoperation. For instance, in most cases kinesthetic force feedback significantly reduced task completion time. In some specific cases, however, it

338 did not, but it did sharply reduce extraneous forces. More on the results in [15,16].

3.2. Application Task Experiments These experiments were grouped around a simulated satellite repair task. The particular repair task is the duplication of the Solar Maximum Satellite Repair (SMSR) mission, which was performed by two astronauts in Earth orbit in the Space Shuttle Bay in 1984. Thus, it offers a realistic performance reference data base. This repair is a very challenging task, since this satellite was not designed for repair. Very specific auxiliary subtasks must be performed (e.g. a hinge attachment) in order to accomplish the basic repair which, in our simulation, is the replacement of the Main Electric Box (MEB) of the satellite. The total repair, as performed by two astronauts in Earth orbit, lasted for about three hours, and comprised the following set of subtasks: thermal blanket removal, hinge attachment for MEB opening, opening of the MEB, removal of electrical connectors, replacement of MEB, securing parts and cables, replug of electrical connectors, closing of MEB, reinstating thermM blanket. It is noted that the two astronauts were trained for this repair on the ground for about a year. The SMSR repair simulation by ATOP capabilities was organized so that each repair scenario had its own technical justification and performance evaluation objective. For instance, in the first subtask-scenario, performance experiments, alternative control modes, alternative visual settings, operator skills versus training, and evaluation measures themselves were evaluated [17,18]. The first subtask-scenario performance experiments involved thermal blanket cutting and unscrewing MEB bolts. That is, both subtasks implied the use of tools. Several important observations were made during the above-mentioned subtask-scenario performance experiments. The two most important ones are: (i) the remote control problem in any teleoperation mode and using any advanced component or technique is at least in 50% a visual perception problem to the operator, influenced greatly by view angle, illumination and contrasts in color or in shading; (ii) the training or, more specifically, the training cycle has a dramatic effect upon operator performance. It was found that the first cycle should be regarded as a familiarization with the system and with the task. For a novice operator, this familiarization cycle should be repeated at least twice. The real training for performance evaluation can only start after completion of a familiarization cycle. The familiarization can be considered as completed when the trainee understands the system I/O details, the system response to commands, and the task sequence details. During the second cycle of training, performance measurements should be made so that the operator understands the content of measures against which the performance will be evaluated. Note, that it is necessary to separate each cycle and repetitions within cycles by several days. Once a personal skill has been formed by the operator as a consequence of the second training cycle, the real performance evaluation experiments can start. A useful criteria for determining the sufficient level of training can be, for instance, that of computing the ratio of standard deviation of completion time to mean completion time (that is, computing the coefficient of variation). If the coefficient of variation of the last five trials of a subtask performance is less than 20%, than sufficient level of training can be declared. In the above-quoted subtask-scenario experiments, the real training, on the average, required one week per subject. More details on application task experiments can

339

Figure 7. Partially Recurrent Neural Network for Peg-in-Hole Task Segmentation

be found in [17,18]. The practical meaning of training is, in essence, to help the operator develop a mental model of the system and of the task. During task execution, the operator acts through the aid of this mental model. It is, therefore, critical that the operator understands very well the response characteristics of the sensing and computer-aided ATOP system which has a variety of selectable control modes, adjustable control gains and scale factors. The procedure of operator training and the expected behaviour of a skilled operator following an activity protocol offers the idea of providing the operator with performance feedback messages on the operator interface graphics, derived from a stored model of the task execution. A key element for such advanced performance feedback tool to the operator is a program that can follow the evolution of a teloperated task by segmenting the sensory data stream into appropriate phases. A task segmentation program of this type has been implemented by means of a Neural Network architecture [19] and it is able to identify the segments of a peg-in-hole task. With this architecture, the temporal sequence of sensory data generated by the wrist sensor on the manipulators are turned into spatial patterns and a window of sensor observations is

340 20 16

-12 -16 -20

I

I

2

I

I

4

I

I

6

I

I

8

I

I

10

I

I

12

I

I

14

I

I

16

I

I

18

I

I

20

I

I

22

I

I

24

IIII

26

III

28

30

IIII

32

34

36

Time (seconds)

Figure 8. Segmentation in the Real-Time Experiment for a Peg-in-Hole Task

associated to the current task phase. A Partially Recurrent network algorithm [20] was employed in the computation. Partially Recurrent Networks represent well the temporal evolution of a task, since they include in the input layer a set of nodes connected to the output units, to create a context memory. These units represents the task phase already executed - the previous state. A set of fixed weight connections have been established among the output and context layer units (see Figure 7). The Partially Recurrent network gave good results both in training and in simulation and it has been interfaced to the ATOP telemanipulator system for real-time tests. The ATOP configuration is significantly different from the one used for the training data and, therefore, these tests also verified the robustness of the approach to hardware variations. Figure 8 is the output of an experiment of the peg-in-hole task. Three curves are plotted in this figure: the X axis force signal input to the network, the real time output of the network (dotted line) and the off-line classification of the network (solid line). The dotted line shows the actual output of the classifier and the solid line is the the output of the off-line segmentation of the same data. The values of the segments in the two lines are the indices of the peg-in-hole phases, as described in [21]. On the solid line, phase transitions are synchronous with the corresponding data, since the data rate is determined by the processing speed of the network. The dotted line, instead, shows a lag between its phase transitions and the solid line ones, due to the low speed of the on-line segmentation. In the real time segmentation, the delay between corresponding transitions increases as a

341 function of the time elapsed from the beginning of the experiment, since samples arrive to the network at a much higher rate than their propagation speed through the network. Several experiments of the peg-in-hole task have been carried out and the results have been encouraging with a percentage of correct segmentations approximately equal to 65%. Approximately at time 29 s the network misclassified the end of the extract phase (index n. 8) that should have occurred at time 34 s. This can be explained by considering that the network was trained on the data of a single experiment and, therefore, it was very sensitive to changes in the duration and/or amplitude of the force signal. Furthermore, the tests also showed that the temporal dependency built into the network was not as strong as it should have been necessary. This weak temporal structure resulted in phases being classified in the wrong sequence. In other experiments, the network showed the capability of recovering after a classification error and the unexpected feature of classifying tasks whose phases sequence was significantly different from the expected one. During peg extraction, for example, the operator decided to regrasp the peg and the network correctly recognized the new sequence of phases by inserting an insertion, corresponding to the regrasp, within the extraction phase. 4. A T I M E - D E L A Y

CONTROL

EXPERIMENT

The benefits of integrated operator interface to sensing and computer control aided and computer graphics supported advanced teleoperation system become most convincing when the operation has to be performed under communication time delay. The technical meaning of integrated operator interface for such cases signifies two major features of the overall ATOP architecture of remote control and control station: (i) the operator, through high fidelity overlay of computer graphics images of work scenes (virtual reality) over TV camera images of the same work scenes (actual reality), can, with high visual fidelity, preview and predict the outcome of command and control actions in real time; (ii) the operator can, with high cognitive confidence, delegate some commands and control authority to the sensor-based closed loop remote control based on visual preverification of the expected action domain of that control loop. 4.1. C a l i b r a t i o n M e t h o d A high-fidelity overlay of graphics and TV images of work scenes requires a high fidelity TV camera calibration and object localization relative to the displayed TV camera view. Theoretically, this can be accomplished in several ways. For the purpose of simplicity and operator-controllable reliability, an operator-interactive camera calibration and object localization technique has been developed, using the robot arm itself as a calibration fixture, and a non-linear least-squares algorithm combined with a linear one as a new approach to compute accurate calibration and localization parameters. The current method uses a point-to-point mapping procedure, and the computation of camera parameters is based on the ideal pinhole model of image formation by the camera. In the camera calibration procedure, the operator first enters the correspondence information between the 3-D graphics model points and the 2-D camera image points of the robot arm to the computer. This is performed by repeatedly clicking with a mouse a graphics model point and its corresponding TV image point for each corresponding pair of points on a monitor screen which, in a four-quadrant window arrangement, shows both the

342

Figure 9. Graphics User Interface for Calibrating Virtual (Graphics) Images to TV Images

graphics model and the actual TV camera image. (See Figure 9). To improve calibration accuracy, several poses of the manipulator within the same TV camera view can be used to enter corresponding graphics model and TV images points to the computer. Then the computer computes the camera calibration parameters. Because of the ideal pinhole model assumption, the computed output is a single linear 4 by 3 calibration matrix for a linear perspective projection. Object localization is performed after camera calibration, by entering corresponding object model and TV image points to the computer for different TV camera views of the object. Again, the computational output is a single linear 4 by 3 calibration matrix for a linear perspective projection. The actual camera calibration and object localization computations are carried out by a combination of linear and non-linear least-squares algorithms. The linear algorithm, in general, does not guarantee the orthonormality of the rotation matrix, providing only an approximate solution. The non-linear algorithm provides the least-squares solution that satisfies the orthonormality of the rotation matrix, but requires a good initial guess for a convergent solution without entering into a very time-consuming random search. When a reasonable approximate solution is known, one can start with the non-linear algorithm directly. When an approximate solution is not known, the linear algorithm can be used to find one, and then one can proceed with the non-linear algorithm. More on the calibration and object localization technique can be

343 found in [22,23]. After completion of camera calibration and object localization, the graphics models of both robot arm and object of interest can be overlaid with high fidelity on the corresponding actual images of a given TV camera view. The overlays can be in wire-frame or solid-shaded polygonal rendering with varying levels of transparency, providing different visual effects to the operator for different task details. In the wire-frame format, the hidden lines can be removed or retained by the operator, dependent on the information needs in a given task. The calibrated virtual reality display system can also provide synthetic TV camera views to the operator. Synthetic TV camera views can make critical motion events or robot arm position and alignment states visible which otherwise would be hidden from the operator in a given TV camera view or for which no real TV camera views are available. The calibrated virtual reality method also enables the placement of calibrated 3-D target symbols over the live TV video helping the real-time visual guidance/control and real-time visual verification by the operator.

4.2. P e r f o r m a n c e R e s u l t s The performance capabilities of the high-fidelity graphics overlay preview/predictive display technique were demonstrated on a large laboratory scale in May 1993. A simulated life-size satellite servicing task was set up at GSFC and controlled 4000 Km away from the JPL ATOP control station. Three fixed camera setting were used at the GSFC worksite, and TV images were sent to the JPL control station over the NASA-Select Satellite TV channels at video rate. Command and control data from JPL to GSFC and status and sensor data from GSFC to JPL were sent through the Internet computer communication network. The roundtrip command/information time delay varied between four to eight seconds between the GSFC worksite and the JPL control station. The task involved the exchange of a satellite module. This required inserting a 45 cm long power screwdriver, attached to the robot arm, through a 45 cm long hole to reach the module's latching mechanism at the module's backplane, unlatching the module from the satellite, connecting the module rigidly to the robot arm, and removing the module from the satellite. The placement of a new module back to the satellite's frame followed the reverse sequence of actions. Four camera views were calibrated for this experiment, entering 15 to 20 correspondence points in total from 3 to 4 arm poses for each view. The calibration and object localization errors at the critical tool insertion task amounted to about 0.2 cm each, well within the allowed insertion error tolerance. This 0.2 cm error is referenced to the zoom-in view (fovy=8 ~ from the overhead (front view) camera which was about 1 m away from the tool tip. For this zoom-in view, the average error on the image plane was typically 1.2 to 1.6 % (3.2 to 3.4 % maximum error); a 1.4 % average error is equivalent to 0.2 cm displacement error on the plane 1 m in front of the camera. The idea with the high-fidelity graphics overlay image over a real TV image is that the operator can interact with it visually in real time on a monitor within one perceptive frame when generating motion commands manually or by a computer algorithm. Thus, this method compensates in real time for the operator's visual absence from reality due to the time-delayed image. Typically, the geometric dimensions of a monitor and geometric

344 dimensions of the real work scene shown on the monitor are quite different. For instance, an 8-inch long trajectory on a monitor can correspond to a 24-inch long trajectory in the actual work space, that is, three times longer than the apparent trajectory on the monitor screen. Therefore, to preserve fidelity between previewed graphics arm image and actual arm motions, all previewed actions on the monitor were scaled down very closely to the expected real motion rate of the arm hardware. The manually generated trajectories were also previewed before sending the motion commands to the GSFC control system in order to verify that all motion data were properly recorded. Preview displays contribute to operational safety. In order to eliminate the problem associated with the varying time delay in data transfer, the robot motion trajectory command is not executed at the GSFC control system until all the data blocks for the trajectory are received. An element of fidelity between graphics arm image and actual arm motion was given by the requirement that the motion of the graphics image of the arm on the monitor screen be controlled by the same software that controls the motion of the actual arm hardware. This required to implement the GSFC control software in the JPL graphics computer. A few seconds after the motion commands were transmitted to GSFC from JPL, the JPL operator could view the motion of the real arm on the same screen where the graphics arm image motion was previewed. If everything went well, the image of the real arm followed the same trajectory on the screen that the previewed graphics arm image motion previously described, and the real arm image motion on the screen stopped at the same position where the graphics arm image motion stopped earlier. After completion of robot arm motion, the graphics images on the screen were updated with the actual final robot joint angle values. This update eliminates accumulation of motion execution errors from the graphics image of robot arm, and retains graphics robot arm position fidelity on the screen even after the completion of a force sensor referenced compliance control action. The actual contact events (moving the tool within the hole and moving the module out from or in to tile satellite's frame) were automatically controlled by an appropriate compliance control algorithm referenced to data from a force-moment sensor at the end of the robot arm.

345

Figure 10. A. Predictive/Preview Display of End Point Motion. B. Status of Predicted End Point after Motion Execution, from a Different Camera View, for the Same Motion Shown Above.

346 The experiments have been performed successfully, showing the practical utility of highfidelity predictive-preview display techniques, combined with sensor-referenced automatic compliance control, for a demanding telerobotic servicing task under communication time delay. More on these experiments and on the related error analysis can be found in [22,23]. Figure 10 illustrates a few typical overlay views. A few notes are in place here, regarding the use of calibrated graphics overlays for time-delayed remote control. (i) There is a wealth of computation activities that the operator has to exercise. This requires very careful design considerations for an easy and user friendly operator interface to this computation activity. (ii) The selection of the matching graphics and TV image points by the operator has an impact on the calibration results. First, the operator has to select significant points. This requires some rule-based knowledge about what is a significant point in a given view. Second, the operator has to use good visual acuity to click the selected significant points by the mouse. 5. C O N C L U S I O N S The following general conclusions emerged so far from the development and experimental evaluation of the JPL ATOP: 1. The sensing, computer and graphics aided advanced teleoperation system truly provides new and improved technical features. In order to transform these features into new and improved task performance capabilities, the operators of the system have to be transformed from naive to skilled operators. This transformation is primarily an undertaking of education and training. 2. To carry out an actual task requires that the operator follows a clear procedure or protocol which has to be worked out off-line, tested, modified and finalized. It is this procedure or protocol following habit that finally will help develop the experience and skill of an operator. 3. The final skill of an operator can be tested and graded by the ability of successfully improvising to recover from unexpected errors in order to complete a task. 4. The variety of I/O activities in the ATOP control station requires workload distribution between two operators. The primary operator controls the sensing and computer aided robot arm system, while the secondary operator controls the TV camera and monitor system and assures protocol following. Thus, the coordinated training of two cooperating operators is essential to successfully use the ATOP system for performing realistic tasks. It is yet not know what a single operator could do and how. To configure and integrate the current ATOP control station for successful use by a single operator is a challenging R&D work. 5. The problem of ATOP system development is not so much the improvements of technical components and subsystems. Though, they also present challenges. The final challenge is, however, to integrate the improved technical features with the natural capabilities of the operator through appropriate human-machine interface devices and techniques to produce an improved overall system performance capability in which the operator is part of the system in some new way.

347 REFERENCES

[1] A.K. Bejczy and J.K. Salisbury. Controlling remote manipulators through kinesthetic coupling. Computers in Mechanical Engineering, 1(1), July 1983. also in Kinesthetic Coupling Between Operator and Remote Manipulator in Proc. ASME Int'l Computer Technology Conference, Vol. 1, San Francisco, CA, August 12-15, 1980, pp. 197-211. [2] A.K. Bejczy, Z. Szakaly, and W.K. Kim. A laboratory breadboard system for dual arm teleoperation. In Third Annual Workshop on Space Operations, Automation and Robotics, JSC, Huston, TX, July 25-27, 1989. [3] Z. Szakaly, W.K. Kim, and A.K. Bejczy. Force-reflective teleoperated system with shared and compliant control capabilites. In NASA Conference on Space Telerobotics, JPL Publication 89-7, Vol. IV, Jet Propulsion Laboratory, Pasadena, CA, January 31, 1989. [4] A.K. Bejczy, Z. Szakaly, and T. Ohm. Impact of end effector technology on telemanipulation performance. In Third Annual Workshop on Space Operations, Automation and Robotics, number NASA Conf. Publication 3059, JSC, Huston, TX, July 25-27, 1989. [5] A.K. Bejczy and Z. Szakaly. Performance capabilities of a JPL dual-arm advanced teleoperation system. In Space Operations, Applications, and Research Symposyum (SOAR'90), Albuquerque, NM, June 26, 1990. [6] A.K. Bejczy and Z. Szakaly. An 8-d.o.f. dual arm system for advanced teleoperation performa.nce experiments. In Space Operations, Applications, and Research Symposyum (SOAR '91), Houston, TX, July 1991. see also, Lee, S. and Bejczy, A.K., Redundant arm kinematic control based on parametrization, in Proc. IEEE Int'l Conf. on Robotics and Automation, Sacramento, CA, April 1991. [7] L.W. Knight and D. Retter. Datahandtm: Design, potential performance, and improvements in the computer keyboard and mouse. In National Human Factors Society Conference, Denver, CO, November 1989. [8] L.W. Knight. Single environment: Experimental hand-grip controller for ATOP. Jet Propulsion Laboratory, DOE Summer Faculty Fellow (SFF) Reports, July 31, 1992 and July 30, 1993. [9] A.K. Bejczy, W.S. Kim, and S. Venema. The phantom robot: Predictive display for teleoperation with time delay. In IEEE International Conference on Robotics and Automation, Cincinnati, OH, May 1990. [10] A.K. Bejczy and W.S. Kim. Predictive displays and shared compliance control for time delayed telemanipulation. In IEEE Int'l Workshop on Intelligent Robots and Systems (IROS'90), Tsuchiura, Japan, July 1990.

348 [11] W.S. Kim and A.K Bejczy. Graphics displays for operator aid in telemanipulation. In IEEE International Conference on Systems, Man and Cybernetics, Charlottesville, VA, October 1991. [12] W.S. Kim. Graphical operator interface for space telerobotics. In IEEE International Conference on Robotics and Automation, Atlanta, GA, May 1993. [13] P. Fiorini, A.K. Bejczy, and P. Schenker. Integrated interface for advanced teleoperation. IEEE Control Systems Magazine, 13(5), October 1983. [14] P. Lee et al. Telerobot configuration editor. In IEEE International Conference on Systems, Man and Cybernetics, Los Angeles, CA, November 1990. [15] B. Hannaford et al. Performance evaluation of a six-axis generalized force-reflecting teleoperator. Publication 89-18, Jet Propulsion Laboratory, June 15, 1989. [16] B. Hannaford et al. Performance evaluation of a six-axis generalized force-reflecting teleoperator. IEEE Transaction on Systems, Man and Cybernetics, 21(3), May/June 1991. [17] H. Das et al. Performance experiments with alternative advanced teleoperated control modes for a simulated solar max satellite repair. In Space Operations, Automation and Robotics Symposium (SOAR'91), JSC, Huston, TX, July 9-11, 1991. [18] H. Das et al. Performance with alternative control modes in teleoperation. PRESENCE: Teleoperators and Virtual Environments, 1(2), Spring 1993. MIT Press Publ. [19] P. Fiorini et al. Neural networks for segmentation of teleoperation tasks. PRESENCE: Teleoperators and Virtual Environments, 2(1), Winter 1993. MIT Press Publ. [20] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, CA, 1991. [21] B. Hannaford and P. Lee. Hidden markov model analysis of force-torque information in telemanipulation. International Journal of Robotics Research, 10(5), October 1991. [22] W.S. Kim and A.K. Bejczy. Demonstration of a high-fidelity predictive/preview display technique for telerobotics servicing in space. IEEE Transaction on Robotics and Automation, October 1993. Special Issue on Space Telerobotics. [23] W.S. Kim. Virtual reality calibration for telerobotic servicing. In IEEE International Conference on Robotics and Automation, San Diego, CA, May 1994. (Submitted).

Intelligent Robots and Systems V. Graefe (Editor)

9 1995 Elsevier Science B.V. All rights reserved.

349

A c t i v e U n d e r s t a n d i n g of H u m a n Intention by a R o b o t t h r o u g h M o n i t o r i n g of H u m a n Behavior Tomomasa SATO", Yoshifumi NISHIDA b, Junri ICHIKAWA ~, Yotaro HATAMURA ~ and Hiroshi MIZOGUCHI a "Research Center for Advanced Science and Technology, The University of Tokyo 4-6-1 Komaba, Meguro-ku, Tokyo, 153 Japan bFaculty of Engineering, The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113 Japan ':Hitachi Inc. Understanding human intention is an essential function if a robot is to help adequately support human beings. It helps smooth communication between the human and the robot. Human behavior is an expressive media of communication. This paper proposes a new flmction, "active understanding of human intentions" by a robot through monitoring human behavior. The proposed flmction is unique because it utilizes multiple communication channels in parallel, i.e., human intentions are understood not only through conscious behavior but also through unconscious behavior. The paper also proposes a new robot architecture to realize the proposed function. The key architecture features are: 1) A robot possesses multiple sensors which surround the human. 2) Information is processed by dual loops-- a loop for information exchange between the human and the robot and a loop for human intention understanding. As an example of a robot with the human intention understanding functions, the authors constructed a micro-teleoperation robot. It can automatically understand an operator's intention through such unconscious human behavior as an operator's hand touching a desk. The understood result is utilized to change the control mode of a master-slave manipulation system from fine motion to rough motion and vice versa. The experimental results prove that the 15roposed function effectively simplifies operation system, making the system operator friendly. 1. I n t r o d u c t i o n Our aging society requires robots which can adequately support human beings. The support should be what the human wants, i.e., it should be in accord with the human's intentions. The authors refer to such a service robot as "a human-supporting robot." Robots which assist senior persons and robots which offer medical helping hands are typical examples of human-supporting robots. The human-supporting robot should be able to discern a human's intentions through comnmnication with the human, because it is impossible for the robot to adequately

350 support the human without knowing what the human wants. Understanding hmnan intention is a key function to realize human-supporting robots. The conventional robots have comnmnicated through input devices such as teachingboxes, keyboards, or switches to obtain information from the human. A common feature of such input devices is that the devices can obtain information only when the human consciously generates a behavior to operate these input devices. Although these devices have the merit that the information which the human doesn't intend to input can be suppressed consciously, they also have such the demerit that they only accept the human's conscious behavior. The demerit comes from the passive utilization of human behavior as a communication media. This imposes a burden on the human. Specifically, human operators must convert all their intentions into conscious behaviors to express their intention. Let us consider an example of writing a letter with a pencil. We usually move the pencil with one side of our hand touching a desk because we desire fine positioning. Moving the pencil is usually a conscious behavior. In contrast, touching the desk with the hand is usually an unconscious behavior. If a robot can automatically perceive such types of unconscious behavior, it can determine whether the human intends to move his or her hand in a fine motion control mode or a rough motion control mode. The example of the touching behavior indicates that unconscious behavior includes instructive information to convey a human's intention to the robot. Conventional robots have not utilized such behavior. Utilization of unconscious behavior requires the robot to actively perceive human behavior. Therefore, if a robot can more actively perceive a human's behavior as a conmmnication media, i.e., if the robot can perceive both conscious and unconscious behavior, the robot can understand human intention without forcing the human to teach all his or her desires consciously. While the conventional robot is a passive one which waits for input behavior as a command, a robot having the proposed function is an active one which understands human intention through monitoring human behaviors. This paper will demonstrate a method by which a robot actively recognizes expression of human intention through human behavior. This paper is organized as follows: The next section shows a model of communication between humans. The authors clarify the mechanism for understanding a partner's intention by the model. In addition, a robot function for active understanding of human intention is proposed. An architecture to realize the proposed function is described in section 3. In section 4, the authors present a microteleoperation robot as an example of the robot with the proposed function. The system effectiveness is proven experimentally. The conclusion is given in section 5. 2. U n d e r s t a n d i n g of h u m a n i n t e n t i o n In this section, we firstly analyze the function and mechanism of understanding of human intention through communication between human beings, based on the analysis, we present the robot's function for actively understanding human intention. 2.1. U n d e r s t a n d i n g of h u m a n i n t e n t i o n by h u m a n s Humans understand each other's intention by interpreting information received through communication. Therefore, a function for information exchange with the other person and a function for interpreting the received information are essential for understanding human

351 intention. We propose a model of communication between humans as shown in Fig. 1. We will use this model to analyze the mechanism of understanding of human intention by a human. Human understands the partner's intentions through communication via two information processing loops as shown in Fig. 1.

Figure 1. Loops in communication between humans

An information-exchange loop refers to a loop on which humans exchange information with each other. Information exchange is realized via language and behavior. An intention-understanding loop refers to a loop on which a human understands another human's intentions. Intention is understood based on knowledge concerning the partner (see models in Fig. 1) . Figure 2 shows the channels which convey information. The information flow from a speaker to a listener is illustrated in the diagram. The speaker's behavior is divided into two classes. The first class is the behavior generated when the speaker is conscious of the listener, the second is the behavior generated when the speaker is unconscious of the listener. The listener can obtain about 35% of the information by means of language and about 65% of it through nonverbal communication [1][2]. Following are the characteristics of communication channels which convey information in the information-exchange loop. Char. 1 H i e r a r c h y There exists a hierarchical structure among channels of communication between humans. Layer (1) L a n g u a g e c h a n n e l

352

Figure 2. Human communication channels conveying information

353 Language is the highest level and most flexible communication channel. A speaker is conscious of the listener's presence when the speaker takes expression. Layer (2) G e s t u r e c h a n n e l ( c h a n n e l of conscious b e h a v i o r w i t h c o n s c i o u s n e s s of a listener) Pointing at something by one's index finger is a typical example of a gesture. This layer generally provides a substitutional comnmnication channel for language. A speaker is trying to transmit information to a listener. Even information that can't be conveyed via language can often be communicated easily via gesture. Layer (3) C h a n n e l of conscious b e h a v i o r w i t h o u t c o n s c i o u s n e s s of a l i s t e n e r The listener can know that the speaker is hot by noting that the speaker fans his or her face with a fan. This is a typical example of conscious behavior without consciousness of the listener. This behavior is not assumed because the operator wants to send information consciously to the listener but because the operator wants to cool the speaker's face, i.e., the speaker behaves for his of her self. This channel differs from the previous two channels in term of consciousness of speakers. Since the speaker's behavior is not taken to convey information to the listener, the listener should intentionally look at and obtain information to communicate with the speaker. Layer (4) C h a n n e l of u n c o n s c i o u s b e h a v i o r Turning over in sleep or cat-napping are typical examples of unconscious behavior. A speaker doesn't send information to a listener intentionally, and the speaker isn't conscious of the behavior. The listener can understand the speaker's intention in the same way as (3). Layer (5) C h a n n e l of i n v o l u n t a r y b e h a v i o r or s y m p t o m s Behavior or symptoms in vital activity which have nothing to do with the consciousness. Breathing, sneezing, and sweating are typical examples of such behavior. A listener can understand a speaker's intention in the same way as the last two channels. Although behavior isn't always conscious behavior or unconscious behavior, i.e., conscious behavior can often change to unconscious behavior and vice versa, human behavior can always be categorized to a certain category if we take one moment into account. Char. 2 D i r e c t i v i t y There is directivity in comnmnication channels. The channels of layer (1) and (2) have directivity since the information is transmitted from the speaker toward the listener. However, the channels of layers (3) to (5) have no directivity since the information is not transmitted from the speaker toward the listener. The listener, therefore, has to actively watch a speaker's behavior to obtain information through layers (3) to (5). Human beings utilize not only directive communication channels but also indirective comnmnication channels through active recognition.

354 The following are characteristics for utilizing information which flows in the dual loops. Char. 1 Parallel utilization of multiple channels Human beings exchange information through multiple communication channels described above depending on the situation in parallel and interpret the received information in parallel. More specifically, human beings utilize multiple channels in the information-exchange loop and in the intention-understanding loop. Char. 2 Interpretation based on tacit rules Rules play an important role in interpreting the received information. For example, English language rules in channel (1) enable humans to interpret the spoken language. Communication in channels (2) through (5) is based on tacit rules which we know instinctively or as common sense. The inner square filled with dots of Fig. 2 represents common rules which make communication possible in a given culture. Considering the above characteristics, we can sunnnarize features of human intention understanding by human beings as follows. Human beings use the hierarchical channels fully by transmitting and receiving information through both the directive and the indirective channels in parallel. Humans can thus receive more information than transmitted consciously by the speaker, because humans can utilize unconscious behavior to receive information as well. Human beings understand the each other's intention by interpreting the received information based on the tacit rules. These features maximize the ability of human beings to understand each other's intention. 2.2. A c t i v e u n d e r s t a n d i n g of h u m a n intention by robot Based on the above analysis, we will propose essential functions of a robot which can efficiently understand a human's intention through communicating with the human to understand human intention, a robot must be able to exchange information with the human and to interpret information to achieve the same understanding as in communication between humans. Let us consider the problems of a conventional robot in terms of information exchange and information interpretation method. Figure 3 illustrates communication loops between a human and a conventional robot. Figure 4 shows communication channels between a human and a conventional robot in the loops. The problems of conventional robots can be summarized as follows. 1. Limited exchange of information Conventional robots receive only information from channels (1) and (2). For example, when an operator inputs commands from a keyboard, the operator uses only (1). In the information-exchange loop between a conventional robot and a human, unlike the information-exchange loop between humans, the robot imposes a burden on the human. Specifically, the human must consciously input all the information from sensors by converting the information of channels (3) and (4) into that of channels (1) or (2).

355

Figure 3. Loops in conmmnication between a human and a conventional robot

Such limited utilization of the channels occurs because the robot doesn't have sensors to receive information conveyed through the channels (3) to (5). 2. L i m i t e d i n t e r p r e t a t i o n of i n f o r m a t i o n Conventional robots interpret only information conveyed by channels (1) and (2). Therefore, the information conveyed by the channels (3) to (5) can't be processed even if received. In other words, conventional robots don't have the necessary function to interpret information from the channels (3) to (5). This is another reason humans have had to convert information of channels (3) and (4) into that of channel (1) or (2). A robot needs interpretation rules to interpret received information. The conventional robot establishes new artificial rules for interpretation which must be remembered by a human. Robot language is a typical example of such artificial rules. Figure 4 illustrates that a human and a robot interpret information by the artificial rules. The reason a beginner differs from a skilled worker is that the beginner is not accustomed to the artificial rules while the skilled worker is. The artificial rules established between a conventional robot and a human differ from the tacit rules in a human society. Therefore, the hulnan must remember complex operation methods. In the following, we will describe how to solve such problems. To solve the first problem, i.e., to overcome the limited receipt of information from a human, a robot should be able to receive information conveyed through all human comnmnication channels in parallel

356

Figure 4. C o m m u n i c a t i o n channels b e t w e e n a h u m a n and a conventional robot

357 channels (1) to (5). We refer to this function Function 1. To solve the second problem, i.e., to interpret received information, a robot should have a function to process all the information conveyed through the channels (1) to (5) in parallel. We refer to this function Function 2. The robot also requires a function to interpret the received information, not based on the artificial rules which must be learned by the human, but also based on tacit rules which are naturally known by the human. We refer to this function Function 3. (In this paper, "efficient" means there is a little the human must teach explicitly and consciously in order to convey his or her intention.) We propose a new function of "active understanding of human intention" which integrates Function 1 to 3 stated above. The function enables a robot to efficiently understand human intention through communication with the human. The main features of this function are described below. 1. A c t i v e n e s s We use the term "activeness" to mean that a robot with the proposed functions can receive information which a human doesn't transmit consciously. The robot thus receives information even through indirective channels. 2. U t i l i z a t i o n o f h u m a n b e h a v i o r as c o m m u n i c a t i o n m e d i a While conventional robots utilize only channels (1) and (2), the robot with the proposed function can utilize not only the language channel (channel (1)), but also the human behavior channels (channels of (2) to (5)) as communication media. The function for actively understanding human intention provides the advantages described below. I. E a s y c o m m u n i c a t i o n w i t h a r o b o t A human can smoothly conmmnicate with a robot because the robot can receive information which is not transmitted consciously. The human therefore need not convert the information of channels (3) and (4) to information of channels (1) and

(2). II. E a s y o p e r a t i o n for b e g i n n e r s A beginner can easily operate the system provided with the proposed function, because the system understands his or her intentions on the basis of tacit h u m a n rules. As a result, the beginner need not remember new artificial rules. As a practical example of robots with the "active understanding of human intention" function, let us explain a teleoperational robot by which the operator performs teleoperations while watching a CRT monitor. Depending upon the channels the robot utilizes, the robot can offer the various support below. 9 Utilization of channels (1) and (2) When the operator points his index finger at something on the CRT monitor and says to the robot, "bring me that bolt over there," the robot can understand the operator's intention from the hand gesture as well as the spoken language. The robot can then bring the desired object to the speaker.

358 9 Utilization of channel (3) When the operator leans toward the CRT monitor to manipulate something more precisely, the robot can display a zoomed image of the object by perceiving the behavior. 9 Utilization of channel (4) The robot can prevent the operator from doing something dangerous by watching him or her cat-napping. 9 Utilization of channel (5) The robot can keep the room at a comfortable temperature by monitoring perspiration on the operator's forehead. 3. Robot architecture to realize active understanding of h u m a n i n t e n t i o n This section proposes an information processing architecture for the robot with the function of active understanding of human intention. 3.1. I n f o r m a t i o n exchange loop and intention understanding loop We propose the information processing architecture shown in Fig. 5. The essence of the architecture is existence of two information processing loops, the information exchange loop and the intention understanding loop.

3.1.1. Information exchange loop: The robot exchanges information with the human via the information exchange loop. As shown in Fig. 6, information exchange for communication is realized by multiple communication channels ((1) to (5)). To utilize hierarchical human communication channels in parallel in the information exchange loop, it is necessary to equip the robot with appropriate input output devices such as sensors, actuators, and displays corresponding to each of the communication channels. 3.1.2. I n t e n t i o n understanding loop" The robot understands human intention by processing information in the intention understanding loop. The following processing is repeated continuously: acquisition of sensor data, understanding of behavior, understanding of intention, and selection of sensing mode. For each channel, there exist multiple intention understanding loops. All the loops run in parallel to realize hierarchical understanding of human intention. Information processing of each loop is based on the model. The models are described below.

Behavior model Using the behavior model, the robot interprets sensor data of human behavior and tries to understand the meaning of human behavior, i.e., it converts sensor data to the meaning of behavior.

359

Figure 5. Conceptual diagram of the information processing architecture

360

Figure 6. Parallel utilization of hierarchical communication channels

361

Intention model The intention model refers to the model which converts input behavior to human intention. It also contains future human intentions which can be predicted from the current intention in order to control sensing modes. Sensing model The sensing model is utilized to determine the sensing modes based on the predicted human intention. Sensing modes includes the kind of sensors to use in the future, the configuration of these sensors, and sensor data handling. Determination of the next sensing mode enables efficient sensor data processing by attention focusing. In addition to these models for active understanding of human intention, the human supporting robot should have a support model. This is the model by which the human supporting robot retrieves the support description based on the understood intention. Figure. 7 shows a detailed flow chart of active understanding of human intention.

Figure 7. Flow chart of active understanding of human intention

362 3.2. M u l t i p l e sensors c o n f i g u r a t i o n s u r r o u n d i n g a h u m a n We propose a new configuration (Fig. 8) for human supporting robots.

Figure 8. Multiple sensors surrounding a human

The configuration is such that the robot's multiple sensors surround the human. Surrounding the human by nmltiple sensors enables the robot to receive information from any desired directions. The information from multiple directions contains various aspects of human behavior. Consequently, the configuration makes the robot sensitive to the human behavior. The robot with the proposed configuration can communicate with the human differently from communication between humans because artificial robot sensors have the following advantages: 1) Some sensors can input data much more accurately than human sensors. 2) Some sensors can obtain data which cannot be obtained by human sensors, e.g. an infrared camera. 3) The robot can select a number and configuration of such sensors freely, e.g. six spherically arranged eyes using six cameras.

363

4. Application to Microteleoperation In this section, we present a microteleoperation robot as an example of a robot with the proposed function. 4.1. Microteleoperation A microteleoperation system is a master-slave manipulation system in which a slave robot handles very small objects according to an operator's movement. The size of handled object is on the order of micrometers[3]. In the system, a pen-shaped master manipulator, referred to a "pen-shaped master," is utilized as a master manipulator [4]. The slave robot executes tasks under a microscope. A CRT display monitor attached beneath a desk surface with its face up serves as the system human interface device. It displays the visual information from the slave robot microscope. The operator can perform teleoperation with the hand-held pen-shaped master while monitoring the working environment displayed on the monitor. Microteleoperation requires the slave robot to execute both fine and rough motions. Consider the task of micromachine assembly as an example. Before the assembly starts, the micromachine components should be conveyed into the microworking environment from the macro world. Therefore, the robot should have rough motion capability to move the object the long distance between the micro world and the macro world as quickly as possible. During the assembly, the robot should move the object with fine motion to align the object precisely. After the assembly, the assembled machine has to be moved from the working environment back to the macro world. The robot is again required to move the object a long distance by rough motion. In microteleoperation, it is necessary to change from the fine motion control mode to the rough motion control mode and vice versa frequently depending upon the working situation to execute tasks smoothly. In case of the conventional robot, the operator is forced to consciously change these modes. If a system can change these two modes automatically in accordance with the operator's intention by understanding the intention from an operator's unconscious behavior, i.e., by utilizing channels (3) and (4), the operator can perform microteleoperation smoothly because the operator is not required to consciously switch modes. 4.2. A p p l i c a t i o n of the active u n d e r s t a n d i n g of h u m a n intention A intention model, a support model, and a behavior model for the task of repairing a micro pattern are presented below. In this example, the operator makes the slave robot shave micro patterns painted on a sheet by maneuvering the pen-shaped master in his or her hand. (A) Intention model Here we will explain the relation between the operator's intention and the operator's behavior, in another word, the relation between the operator's intentions and motions of the operator's hand. Figure 9 shows the typical situations of the operator's hand. In (a), the operator's hand is floating in the air. In this situation, the operator intends to move operator's hand rapidly. In (b), the operator's hand is touching the desk. In this situation, the operator intends to execute the task precisely. These are the relations between the operator's behavior and their intentions. Such human intentions can be judged from

364

Figure 9. Typical situations of operator's hand

whether the operator's hand touches the desk or not. The intention model to judge the operator's intentions from his or her behavior is given as follows: (a) If the hand is not contacting the desk, the operator wants the slave robot to move quickly. (b) If the hand is touching the desk, the operator wants the slave robot to move precisely. (B) S u p p o r t m o d e l The slave robot supports the operator in accordance with the understood intention. When the operator wants to move the slave manipulator quickly, the system should set the slave robot control mode to rough motion control. When the operator wants to execute the task precisely, the system should set the fine motion control mode. The following support model realizes these activities: (a) If the operator wants to execute the task quickly, the slave robot should be in rough motion control mode. (b) If the operator wants to execute the task precisely, the slave robot should be in fine motion control mode. (C) B e h a v i o r m o d e l A touch-sensor panel is a suitable input device for obtaining the touch information between the hand and the desk. The following behavior model is constructed to recognize the operator's behavior from the output of the touch-sensor panel When there are no signals from the touch-sensor panel, the operator's hand is not contacting the desk as shown in (a) of Fig. 9. When there are signals from the touch-sensor panel, the operator's hand is touching the

365 desk as shown in (b) of Fig. 9. A flow chart of the intention-understanding loop in the microteleoperation robot can be summarized as Fig. 10

Figure 10. Flow chart of the intention understanding loop in the micro-teleoperation robot

4.3. E x p e r i m e n t a l S y s t e m Figure 11 shows the experimental micro-teleoperation system. The system consists of a human interface section, a slave robot and computers and controllers. 4.3.1. H u m a n i n t e r f a c e s e c t i o n Figure 12 shows a photograph of the human interface section. In this section, the information-exchange loop is formed between an operator and the system. The human interface section consists of a pen-shaped master manipulator, CCD cameras, a CRT display monitor, and a touch-sensor panel. The pen-shaped master is equipped with two

366

Figure 11. System configuration

367 LEDs to measure its three-dimensional (3D) position. Two CCD cameras are located above the CRT display monitor to obtain the image of two LEDs. The touch-sensor panel is attached to the screen of the CRT display monitor. The touch-sensor panel generates the output data indicating whether the surface is touched or not as well as the 2D touched position. The operator can perform the teleoperation with the pen-shaped master in the operator's hand while watching images of micro objects on the CRT display monitor. Note that the CRT display monitor screen is utilized as the desk surface so the operator can easily view the image as shown in Fig. 11. With the above input devices, the system can receive both conscious behavior of moving the pen-shaped master and unconscious behavior of touching the desk in parallel. In the system, the touch-sensor panel and CCD cameras are "the robot's multi-sensors surrounding a human" stated in the previous section. 4.3.2. Slave r o b o t Figure 13 shows a photograph of the slave robot. The slave robot consists of a slave manipulator, a work table, a stage, and a microscope. The slave manipulator has a handling tool (a knife) at its hand tip. The work table is equipped with a stage having three translational degrees of freedom. The nficroscope is located above the work table in a slave robot. 4.3.3. C o m p u t e r s a n d c o n t r o l l e r s The system computers consist of a host computer (Sun Sparc IPX), a real-time image processor (MaxVideo200) and transputers (T805). Transputers control the motion of the work table and the stage of the robot. The host computer integrates the system. The intention-understanding loop is implemented in the host computer.

Figure 12. Photo of hmnan interface

Figure 13. Photo of slave robot

368

4.3.4. System architecture The system realizes active understanding of human intention by the proposed architecture which consists of the information-exchange loop, the intention-understanding loop, and multiple sensors surrounding a human as shown in Fig. 14. Intention understanding loop (1) in Fig. 14 represents a loop in which the system understands the operator's desired direction. Intention understanding loop (2) represents a loop in which the system understands desired control mode. The two intention understanding loops work in parallel.

4.4. Experiment The following experiment was conducted. The operator shaves micropatterns on a sheet using the slave robot. The sheet is placed on the work table. Figure 15 shows a photograph of examples of patterns. The diameter of the dotted pattern is about 120 pro. Two identical sheets are prepared for the experiments. In the first experiment, the system is manually switched between fine motion control mode and rough motion control mode. In the second experiment, the proposed robot switches the system automatically. We measured the time to complete the task for six subjects. The subjects are taught how to manually change modes switch and how to use the touch-sensor panel for automatic changing in advance. Figures 16 and 17 show repairing times for each pattern in experiment (1) and experiment (2). These results indicate there is reproducibility in data because the same subject has almost fixed time for repairing, although there are differences among subjects. In order to clarify the difference between the two experiments, we compare repairing time by averaging the times for 6 subjects. Figure 18 shows the result. The results indicate that times of the second experiment (automatic control mode switching) are shorter than those of the first, though there are individual differences. The differences can be explained as follows. The operator need not consciously change modes because the system automatically changes them based on the operator's unconscious behavior. Consequently, the operator need pay attention only to the microteleoperation itself. The operator doesn't need to remember how to change modes because the modes are changed based on the operator's unconscious behavior and that unconscious behavior is based on tacit human rules.

369

r

Intention understanding / loop(2) Channel Obtain data from touch panel Understand behavior

vior Judge touching or floating of hand from sensor data.

Sensors surrounding operator

"F

Judge intention from situation of hand.

Understand intention

Touch panel CCD cameras

Select support mode

g

Select rough or fine motion control mode.

Conscious Channel uator and display Obtain data from CCD cameras Monitor

I'

Understand behavior

Support

Information exchange 100p

-~-Behavior model-~ C a l c u l a t e 3D position of pen-shaped master.

Move stage according to 3D positoin.

Intention under loop(l)

Micro-teleoperation robot Figure 14. System architecture

J

370

Figure 15. Example of dotted patterns on a Figure 16. Repairing time for each pattern sheet in experiment(1)

Figure 17. Repairing time for each pattern Figure 18. Comparison of repairing time for in experiment(2) experiment (1)and experiment (2)(average time of subjects)

371 5. C o n c l u s i o n This paper proposed a new function of "active understanding of human intention by a robot through monitoring of human behavior," which should be used by robots in order to efficiently understand human intention. Robots with this function can provide a human appropriate support based on understood intentions. The unique features of the proposed function are as follows: 1) The robot sinmltaneously receives information through nmltiple hierarchical human comnmnication channels of human behavior. 2) The robot understands human intention by interpreting the received information on the basis of tacit human rules. The authors also proposed a robot architecture to realize the function. The key features of the architecture are: 1) The robot is configured such that the robot's nmltiple sensors surround the human. 2) The robot has dual loops, an intention-understanding loop for understanding human intention and an information-exchange loop for exchanging information between the human and the robot. As an example of a robot having the proposed function and architecture, the authors constructed a micro-teleoperation robot. The robot switches between fine motion control mode and rough motion control mode, which is required frequently in micro-master-slave manipulation. The switching is performed automatically by interpreting the touch-sensor information which indicates whether the operator's hand is touching the desk or not. The hand touching the desk is a typical example of unconscious behavior of the operator. The experimental results proved that the function is useful for providing support based on the operator's intention and consequently in making the system friendly to the operator. Finally, the authors describe an example of a robot with the proposed active behavior understanding function for medical application. A patient is taken into an intensive care unit (ICU) when the patient is seriously ill or injured so that the state of the patient can be carefully monitored. Monitoring is performed by obtaining reliable data from many sensors directly attached to the patient. However, such monitoring imposes mental and physical burdens on the patient. A patient who is not so seriously ill or injured should be taken into a simple I CU which doesn't impose burdens on the patient. We are planning to construct "a robotic sickroom" as a simple type of intensive care unit (ICU). The robotic sickroom monitors the patient with the function proposed in this paper and provides the appropriate support. For example, it monitors a patient's turning over in sleep to prevent bedsores. It monitors unusual behaviors 1 and breathing motion to keep the patient's situation from becoming dangerous. The behaviors of turning over in sleep, unusual behavior, and breathing are other examples of unconscious human behavior. Understanding a patient's intention by recognizing such unconscious behavior enables the robotic sickroom to offer the functions of an ICU without imposing a burden on the patients. Future research should be devoted to modeling and processing such human behavior. The authors would like to express the sincere appreciation to Yoshinobu Kotani for his suggestions.

1Extracting a needle of an intravenous medication is a typical example. It arises f r o l n a side effect of a medicine or depression of brain.

372 REFERENCES

1. M. F. Vargas, "Louder than words-An introduction to nonverbal conmmnication-" Iowa State Unversity Press, 1987 2. K. Tachibana, "The structure of Communication", pp45-54, NTT-shuppan, 1993 3. T. Sato, K. Koyano, M. Nakao, Y. Hatamura, "Novel manipulation for micro object handling as interface between micro and human world," Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol3, pp1674-1681, July 1993 4. T. Sato, J. Ichikawa, M. Mituishi, "Micro teleoperation system concentrating visual and force information at operator's hand," Proc. of International Symposium on Ezperimental Robotics, pp118-124, Oct 1993

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

373

Human/machine sharing control for telerobotic systems* Tzyh-Jong Tarn a, Ning Xi a, Chuanfan Guo a, and Antal K. Bejczy b a Department of Systems Science and Mathematics, Washington University, St. Louis, MO 63130, U.S.A. bJet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, U.S.A Telerobotic systems usually consist of autonomous operations and master/slave operations. Based on our newly developed event-based planning and control theory, a new action planning and control scheme for the telerobotic system is presented. It provides a unified model to integrate a human operator control command (supervisory command) with motion planning and control for automatic operation. As a result, supervisory command and autonomous motion planning and control are naturally combined together such that the autonomous motion plan can be adjusted and modified by a human operator during its execution. This scheme lays down a foundation for planning and control of a general robotic system involving human operators. The scheme is implemented and tested on a PUMA 560 dual-arm system. A Spaceball is used by the human operator as a command generator to input the supervisory command. The experimental results of both single arm teleoperation and dual-arm coordinated teleoperation are presented. 1. I N T R O D U C T I O N Applications of the teleoperation of robotic systems range from material handling and automation in hazardous environments to servicing tasks in space. A telerobotic system can be commanded by manually generated motion commands, preplanned motion commands or a combination of both commands. Generally, manually generated commands are obtained from command generators manipulated by human operators in real-time. But preplanned commands are given before the motion execution. The preplanned commands are sometimes called autonomous commands or autonomous functions. Since autonomous functions may not work perfectly in unexpected situations, intervention by the human operator is often needed. The focus of this paper is to present a new approach to the planning and control of telerobotic operations, such that the telerobotic system has the ability to cope with some unexpected events. In order to interface a human operator with a robotic system, many planning and control schemes for telerobotic systems have been proposed. Among them, *Research Partially supported under NSF Grant IRI-9106317, CAD-9404949 and by Sandia National Laboratories Contract No. AC-3752-C.

374 the master/slave type of telerobotic system was intensively studied in [5, 6][8]- [12]. In addition, different kinds of semiautonomous control of the telerobotic system were presented in [8]. The teleoperation of a redundant manipulator is discussed in [9], and the importance of coordination between the human operator and remote robot is addressed in [10]. Obstacle avoidance by using semiautonomous control approach is presented in [11, 12]. In existing planning and control schemes for teleoperations, autonomous functions and manual operations are separated into two different operation modes. Based on some logic conditions, the operation modes can be switched between each other. However, if some unexpected event, such as occurrence of obstacles etc., happens in the autonomous operation, the system has to be switched to the manual mode. After proper manual operation, complete replanning of the autonomous operation is required to switch the system back into the autonomous mode, since the autonomous motion plan is usually described as a function of time, and the switches do not happen at some predefined time. This significantly limits the capability of the telerobotic system to deal with unexpected events. In addition, due to the uncertainty of the switching time caused by unexpected events, it could be extremely difficult to achieve stable and smooth switching between the different operation modes. In order to overcome the above disadvantages, a new event-based planning and control scheme is introduced for autonomous operations. It can be easily adjusted or modified by a human operator, through a command generator, during the operation. As a result, the autonomous and manual operations are naturally combined, an problems caused by mode switching are avoided. The proposed method is applied to single arm, as well as to dual-arm teleoperations. Finally, experimental results for single arm action and for dual-arm material handling are presented. The ultimate goal of this paper is to develop a new planning and control scheme for a telerobotic system to achieve an intelligent and robust performance. 2. E V E N T - B A S E D

PLANNING

AND CONTROL

THEORY

The basic idea of the event-based planning and control theory [3] is to introduce the concept of an action reference parameter. This parameter is directly related to the system output measurement relative to the task. Instead of being parameterized by time, the plan/desired input to the system is parameterized by an action reference parameter. As a result, for any given instant of time, the desired input is a function of the system output. This creates a mechanism for adjusting the base plan based on the output measurement, thus converting the planning to a closed loop real-time process. That is, the plan (the desired action) is an "abstract" entity, like a "desired goal". It only becomes a "real" plan when measurements are made. The event-based planning and control scheme is compared with the traditional time-based planning and control scheme in Figure I. In the event-based planning and control scheme, the function of the Action Reference block of Figure Ib is to compute the action reference parameter s on-line based on system output measurement. The planner then gives the desired input to the system according to the on-line computed action reference parameter s. In addition, the action reference parameter is calculated at the same rate as the feedback control. In other words, the

375

planning process is adjusted at a very high rate, which enables the planner to handle unexpected or uncertain discrete and continuous events together with the control execution.

Yd(t)

[Time-based[

y(t)

e(t)]

...~ K~t ~ ' ~-~/I

a. Traditional Planning and Control Scheme

yd(s) Event-based] "Y.'~ e(~[ Planner ~ C o n t r o l l e r

s1 !

" ._|

|

y(t) ~

Robot

Motion

Reference

L [--..

b. Event-Based Planning and Control Scheme

Figure I. Comparison

of Traditional and Event-Based

Planning and Control Schemes

As an example, the event-based planning and control scheme has been successfully applied to a robot arm motion in which the arm was stopped by an unexpected obstacle [3]. Since the desired input of the system is directly related to the output through an action reference, the error remained constant while the robot arm stopped. Once the obstacle is removed, the robot arm automatically continues its motion, according to the original plan, in a stable manner without replanning. In this case, if the conventional time-based motion plan were implemented, the system error would continue to increase, since the planner would give the desired input according to the original fixed plan. As a result, the system would become unstable. In order to apply the event-based planning and control theory to the telerobotic system, the most important thing is to integrate the human operator command with the action reference parameter [7]. As a result, the action plan will be based on sensing information in an autonomous operation, or in a manual operation on human operator command. This provides a unified planning and control scheme for operations which are commanded by both a human operator and an autonomous motion plan. 3. C O N T R O L

STRATEGY

FOR SINGLE ARM TELEROBOTIC

SYSTEM

A single arm telerobotic system mainly consists of a robot arm and a command generator. It can perform simple teleoperations such as master/slave operation, as well as complicated teleoperations such as object avoidance, which combines both autonomous functions and master/slave commands. "Simple teleoperation" means that the commands

376 of teleoperation are either from a human operator or from some autonomous motion plan. "Complicated teleoperation", on the other hand, is a combination of human operator commands and of autonomous motion plan.

3.1 Robot

dynamic

The dynamic

model

model for a robot arm can be written as

D(q)~ + C(q, (1) + G(q) = T

(1)

The task space is defined as Y c R 6, Y=

(x,y,z,O,A,T)

T,

where (x, y, z) is the position of the tip in the Cartesian space, and (O, A, T) is an orientation representation (Orientation, Altitude, Tool angles) [4]. The relation between Y and joint angle q is Y = h(q)

(2)

Therefore we have = Jh(q)(1

(3)

where Jh(q) is the Jacobian matrix. The robot dynamics in task level can be described as follows D ( q ) J h l ( q ) ( Y - Jh(q)(1) + C(q, (t) + G(q) = 7

(4)

The following nonlinear feedback [3] T = n ( q ) J [ l [ Y d + K,,(Y d - 9 ) + K p ( Y d - Y) + Jh(q)q] + C(q, (1) + G(q)

(5)

is applied to obtain a totally linearized and decoupled error dynamics, (?d _ ? ) + K~(?d _ ? ) + K~(yd _ y ) = 0

(6)

where yd, ~zd, ~d are the desired position, velocity and acceleration, respectively, and Kv, Kp are the velocity and position feedback gains which can be designed to stabilize the system. 3.2 C o m m a n d g e n e r a t o r Master robots which are kinematically similar to the telerobots (slave robots) are often used as command generators [II]. Another kind of command generator is called Position Hand Controller [5, 6]. In this paper, the Spaceball (Figure 2), is used as a command generator. Both Position Hand Controller and Spaceball are not kinematically similar to the telerobots. The Spaceball can generate operation commands in 6 directions. From a

377 manipulation point of view, the end-effector of a robot should follow Spaceball commands generated by a human operator.

Figure 2. Command Generator: The Spaceball Scaled sampled data from the Spaceball are used as the desired velocity of end-effector motion in Cartesian space, which consist of translation velocity and angular velocity. The Spaceball commands can be used as the desired velocity of the end-effector since only the relative motion of the end-effector is important. The desired position and acceleration of the end-effector can easily be obtained by integration or differentiation of the desired velocity. Assume the Spaceball output after scaling is Vx ~3y

~

=

~z

(7)

COx COy COz

Notice that Vsp is described as a function of the motion reference s [3]. A human operator gives a proper command according to his/her observed robot position and orientation. In this sense, the operator plays the same role as a sensor in the event-based planning and control scheme [3]. The desired velocity of the end-effector can be obtained as follows ~/'id -- J1Vs p

(8)

where J1 is the Jacobian matrix from Cartesian space to task space and is described by

[4] J1 = [/a•

[ 0

where

0

K~r

V~p

(9)

378

1] CA -So 0 Co 0

SoSA COSA

KE(~OAT---

CA --Co _So CA

CA

which represents the relationship between the OAT derivatives and angular velocities. The desired position change of the end-effector can be determined by: =

and the desired acceleration is

]/';dp=

jiVs p -.Ji- ,.]19sp

For a simple human operator control, i.e., a teleoperation without an autonomous motion plan, AY~%, ]?~d, and ~;d will provide the robot arm with all the necessary motion commands. 3.3 S e m i a u t o n o m o u s c o n t r o l for t e l e r o b o t i c s y s t e m If the robot arm is moving along a preplanned trajectory and some obstacle occurs in the given path, a scheme must be designed so that the human operator can temporarily stop the robot motion to wait for the clearance of the obstacle, or, by superposing an additional command through the Spaceball, change the motion to avoid the obstacle. Let the autonomous function of robot motion be described by yd(s), Yd(s), Yd(s), and the supervisory command from Spaceball is Ayd(s), yd(s), Y~d(s), where s is the motion reference parameter. To obtain a semiautonomous function which combines the attonomous function and the supervisory command to achieve a complicated teleoperation, the following scenarios have to be considered: (1) After the robot motion is either stopped or slowed down with the Spaceball, the robot motion defined by an autonomous motion plan can resume after the obstacle is removed and the Spaceball is released. (2) Switch between autonomous and master/slave operations. (3) Semiautonomous control, superposing the Spaceball commands to an autonomous motion plan to achieve complicated teleoperations. Notice that case (1) is a special situation of case (3). Case (2) is currently a common operation in telerobotic systems. Based on event-based planning and control schemes, semiautonomous control for the above cases can be easily implemented. Let the command motion direction from the autonomous motion plan be ]~d, and the command motion direction from the Spaceball be Y~p. "d If the autonomous motion plan needs some modification by a human operator command to ensure proper robot motion, the following two conditions may be considered: (1) The projection of ]2~%on the positive direction of ]?d cannot make the total velocity of robot motion in this direction exceed the maximum velocity limit of the robot arm.

379 (2) The projection of ])~d on the negative direction of yd must be modified so that the motion reference s keeps nondecreasing to ensure the stability of the motion [3].

~Pd~k. d ~

lisp2

Sf

So

(a) cos ~ >= 0 yd

d

Ysp2

sf

,

y;d So

(b) cos ot < O

Figure 3 Semiautonomous Commands for Single Arm Teleoperation Let c~ be the angle between ]~d and ]~d in Figure 3, and

~/d (10)

,pl

Ii?dll

~d

" d __ y d

sp2 ~ Ysp

(11)

spi

The semiautonomous command is defined as follows

{ ?..., !cos >0

"d

Yip,~ =

"d

"d


II?d[L, COS ~ < 0

The desired position change of the end-effector will be A Y 4 , s = Ji Vsp,sAt

and the desired acceleration will be "'d ~/'sp,s

__

J1 Vsp,s Jr- Jl Vsp,s

(12)

380

Notice that the desired position command from the Spaceball is actually a relative position increment command superposed on the autonomous function yd. The total desired input is then/kY~dp,~ + yd, Ys d "t" ~/'d and ~2"sp,s-Jc ~.zd.

Spac .'Ball

_. e, er

gobo, I , Controller ~ - Motion Reference ]--i. Human Operator ,

I~--. I-"

Figure 4. Telerobotic System with Semiautonomous

Control

Remark:

In this case, if the autonomous commands are functions of time, it cannot be so easy to combine the Spaceball commands with the autonomous commands because the Spaceball commands are naturally functions of event: the given path of robot motion. Figure 4 shows the structure of single arm telerobotic system with autonomous control.

4. C O N T R O L

STRATEGY

FOR

DUAL-ARM

TELEROBOTIC

SYSTEM

The dual arm telerobotic system consists of two robot arms and a command generator. In this system, under the command of a human operator, two robot arms can manipulate an object in 6 dimensional task space with automatic coordination. 4.1 R o b o t d y n a m i c m o d e l s The dynamic model for each robot can be written as

Di(qi)qi + Ci(qi, qi) + Gi(qi) + J~(qi)fi = "ri, i = 1,2

(13)

where the term J~(q~)f~ represents the interaction force between robots and the object, and Jh~ is the Jacobian matrix and is defined as

381

= Jh~(qi)(ti,

i = 1,2.

where Yi = hi(qi), i = 1, 2, are task space outputs of each robot. The robot dynamics in task space can be described as

Di(qi)j~X(qi)(Yi - Jh (qi)oi) + C~(q~, Oi) + Gi(qi) + j T (qi)fi

7-i

(14)

The following nonlinear feedback with the consideration of force/torque compensation [7] is applied for each robot

Ti = Di(qi) J -h, l [Yid nt- I(v ~. ( t d -- Yi ) -Jc-I(p ~. ( Yi d -- Yi ) +Jh~(qi)oi] + Ci(qi, 0~) + Gi(qi) + j Thi( q i ) f i

(15)

Then a totally linearized and decoupled error dynamics can be obtained for each robot,

(Yi a - ~ ) + Kiv(Yi a - Yi) + Kw(Yi a - Yi) = 0

(16)

where Kiv, Kip can be designed to stabilize the system.

4.2 D u a l - A r m coordinated teleoperation In a dual-arm telerobotic system, it is extremely important to achieve coordinated control. Based on the event-based planning and control scheme [3], the automatic coordination in a teleoperation could be easily achieved.

, v

?,t ......

iR~ I V///•

I I,,i- K.

R~176 "--

////Z

Figure 5. Coordinate Frame Systems of Dual Arm Robot System In Figure 5 , Kw is the world coordinate frame, and robot 1 and robot 2 hold an object by squeezing. It is assumed that the two robots manipulate an object with coordination and there is no relative motion between each robot end-effector and the object. Therefore the following constraints must be satisfied

382 (i) Position constraints 0 pod = pd + Rid (pob _ po),

i=1,2

where pod = (xdb, yo%, Zod)T is the desired position of the object in the world coordinate frame; P~ = (X~ Yob,~Zobo)T is the initial position of the object in the world coordinate frame; pd, p0 and pd , p0 are the desired position and initial position of robot 1 and robot 2 in the world coordinate frame; R1d and R~ are the desired rotation matrix of robot 1 and robot 2 in the world coordinate frame. (ii) Orientation constraints

( R d ) - l R d - (RO)-lRO and

(.Rdb)-l.R d ---(.ROb)-lR o where Rdb and ROb are the desired rotation matrix and the initial rotation matrix of the object in the world coordinate frame; R ~ and R2~ are the initial rotation matrix of robot 1 and robot 2 in the world coordinate frame. Without loss of generality, it can be assumed that the initial rotation matrix of the object is equal to that of robot I, i.e. ROb -- R ~ Then the desired orientation of the object and the desired orientation of robot 1 should remain the same all the times, i.e. R d = Rod. The initial position of the object can be assumed to be the center of the straight line from Y1~ to Y2~ i.e. pOob- 7l(p0 + p0) The sampled data from the Spaceball after scaling is used as the desired velocity of the object motion in Cartesian space. Assume the Spaceball output after scaling is Vsp. The desired velocity of the object is

?g =

Jl%

The desired position increment of the object is = J1y

At

and the desired acceleration is

Using the constraints, the desired position, velocity and acceleration for each robot can be obtained. The goal is to achieve the best possible coordination. In the event-based motion plan, the desired inputs are described by functions of motion reference s, which should closely relate to the goal of control. In other words, the problem is to determine s such that the position constraints are satisfied and also minimize the following coordination criterion

J = klllPld(s)- Pill 2 + k~.llP~(S ) -p~.ll ~

(17)

383 where kl and k2 are weighing coefficients which depend on the importance of coordination requirements in different directions and are usually determined by the task itself. The closed form solution can be obtained by solving

OJ =0 Os subject to constraints (i)[7]. 5. E X P E R I M E N T A L

RESULTS

The above planning and control schemes for telerobotic systems were experimentally implemented and tested on two PUMA 560 6-DOF robot arms. Each robot arm is equipped with FSA-3254 6-Axis force/torque sensor. The planning and control algorithms were implemented in a SGI 4D/340 workstation. A graphical user interface has been developed. It can display sensory information and provide a means for the human operator to adjust the system parameters.

.-~ 1 ooolE /

4

'.

~, ...........

!. . . . . . . . . . . . . .

i .... Y - - -

i

i

z ....

.

,,_ _

===========================

g s0o~.--.-..---..--..'.----.---=---.i --.- x - - - a.

0

500

.....

Y--" z ....

:

i

i

:

5

10

2- i

.

i~

.

r, ~ ! ~ ' : ''! ' :

.

'

..

-2 0

15

0

5

time (sec)

,000

10

15

time (sec)

,

] ,oo01

....

g 50oFz.---.-..=.-.-.'...=-.-..---.-~ .. x - - ' ]

"

500 ! ............. 0

i ..........

':i

5

.... .....

10 time (sec)

15

0

9

0

5

10

15

time (sec)

Figure 6. Single Arm Obstacle Avoidance Figure 6 shows the experimental results of single-arm-teleoperation. At the beginning, the arm was commanded by an autonomous motion plan to move along a given straight line. While an unexpected obstacle was moving toward the given path of the arm, an operator used the Spaceball to superpose an additional command to the autonomous motion plan. The arm stopped the motion, and waited until the obstacle passed the given path. Then, after the operator released the Spaceball, the robot arm continued its original autonomous motion without any replanning. It can be seen that the robot arm undergoes

384 smooth transfer between "simple teleoperation" and "complicated teleoperation" to cope with the unexpected event 9 For both modes of operation, highly accurate control has been achieved 9

E~"1.5f

.

,

,.

........ ', .......... i ........

-

L~

1"]"

i

~0.5[..... 8 o[

i

.

i ........

"

:

"~" 1ooo1,,

E

o

i

i

i

5

10

15

20

.

,

,

!

!

i

I

! Y--"

soo[ . . . . . . . . . .

13-

:

0

,

/

..~ . . . . . . . . . .

;..i ......

!

!

..-..-.- ..'-,_.~..::.~._.._.._._ .... !

I.......... ! z

8-soo ~ 0

time (sec)

i. x "

'~"

,

5

,

10 time (sec)

o

!

....

i ........

15

,

20

A

~.2,

',

,

,

"

"- 1[I .......... ~ i..........i....... i...... t_ ~. 1 ....i-T iO E

,

9

I

.

~w"

0

.

.

.

.

.

.

5

15

l

. . .

9 9 9

.

. 9

.

.

.

.

.

.

9

i x~

:

.

10

l

9 9 9

y--.

9 9

~-50 20

!

0

|

5

.

9. . ,

.

!." .... ! "'..

: .

l..,

. . . .

.2 ........ ! .:

"....'.." o

9

9

i

i

10

15

20

time (see)

time (sec) Figure 7. Dual-Arm

.

- !-: ......

o o

0

I

Teleoperation

In Figure 7, a carton weighing 0.45kg, was manipulated by dual arm teleoperation system as shown in Figure 5. The motion command was generated by a human operator through the Spaceball. Dual arm coordination was automatically maintained 9 The coordination error was defined as e~oo~d

--

IlPl(S)

-- m(~)ll

-- IIP~(S) -- P~(S)II

which is the difference between the actual and the desired distances of two contact surfaces. The experimental results show that high accurate coordination has been achieved in the teleoperation. 6. C O N C L U S I O N A new action planning and control scheme for the telerobotic system has been proposed in this paper. Based on the event-based planning and control theory, a unified model has been developed to integrate the human operator command with the autonomous motion plan. This makes it possible for human operator to adjust or modify the autonomous motion plan on-line to cope with unexpected events. It significantly improves the robustness of the teleoperation. In addition, the method provides a scheme to automatically coordinate the dual-arm in a teleoperation. This significantly reduces the human operator's required operating skill level. The experimental results presented in this paper verify

385

the theoretical results and clearly demonstrate their advantages. More importantly, the proposed methods provide a mechanism for integrating various sensor information with human operator commands. The methods could easily be extended to a general theory for modeling, analyzing and designing a human/machine interaction system. This will be an important step in the advancement of the development of an intelligent machine. REFERENCES

I. J.Y.S. Luh, Y.F. Zheng, Constrained Relations Between Two Coordinated Industrial Robots for Motion Control, The International Journal of Robotics Research, Vol.6, No.3, Fall 1987. 2. T.J. Tam, A.K. Bejczy and Ning Xi, Hybrid Position/Force Control of Redundant Robot with Consideration of Motor Dynamics, Proc. of the Thirtieth Annual Allerton Conf. on Communication, Control and Computing, Urbana, 1992. 3. T.J. Tam, A.K. Bejczy and Ning Xi, Motion Planning in Phase Space for Intelligent Robot Arm Control, Proc. of IROS'92, Raleigh, 1992. 4. T.J. Tam, A.K. Bejczy, G.T. Marth and A.K. Ramadorai, Kinematic Characterization of the PUMA 560 Manipulator, Robotics Laboratory Report SSM-RL-92-15, Washington University, Dec. 1991. 5. A.K. Bejczy and Z.F. Szakaly, A Harmonic Motion Generator for Telerobotic Applications, Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, California, April, 1991, pp.2032-2039. 6. Z.F. Szakaly and A.K. Bejczy, Performance Capabilities of a JPL Dual-arm Advanced Teleoperation System, (Preprint) SOAR'90 Workshop, Albuquerque, NM, June 26-28, 1990. 7. Ning Xi, T.J. Tam and A.K. Bejczy, Event-Based Planning and Control for MultiRobot Coordination, Proceedings of the 1993 IEEE conference on Robotics and Automation, Atlanta, Georgia, pp.251-258, May 2-6, 1993. 8. Yasuyoshi Yokokohji, Akira Ogawa, Hitoshi Hasunuma, and Tsuneo Yoshihawa, Operation Modes for Cooperating with Autonomous Functions in Intelligent Teleoperation Systems, Proceedings of the IEEE International Conference on Robotics and Automation, Atlanta, Georgia, May 2-6, 1993, Vol.3, pp.510-515. 9. Paul G. Backes, Mark K. Long, and Robert D. Steele,The Modular Telerobot Task Execution System for Space Telerobotics, Proceedings of the IEEE International Conference on Robotics and Automation, Atlanta, Georgia, May 2-6, 1993, Vol.3, pp.524-530. I0. T. Sato and S. Hirai,MEISTER: A Model Enhanced Intelligent and Skillful Teleoperation Robot System, Robotics Research- The Fourth International Symposium-, The MIT Press, pp.155-162, 1988.

386

II. P.T. Boissiere and R.W. Harrigan,Telerobotic Operation of Conventional Robot Manipulators, Proceedings of the 1988 IEEE International Conference on Robotics and Automation, pp.576-583. 12. V. Lumelsky and E. Cheung,Towards Safe Real-time Robot Teleoperation: Automatic Whole-sensitive Arm Collision Avoidance Frees the Operation for Global Control, Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pp. 797-802.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

387

T a s k - d i r e c t e d p r o g r a m m i n g o f s e n s o r - b a s e d robots B. Brunner, K. Arbter and G. Hirzinger German Aerospace Research Establishment (DLR), Oberpfaffenhofen, Institute of Robotics and System Dynamics, P.O. Box 1116, D-82230 Wessling, Germany

Abstract

We propose the so-called TeleSensorProgramming concept that uses sensory perception to achieve local autonomy in robotic manipulation. Sensor-based robot tasks are used to define elemental moves within a high level programming environment. This approach is applicable in both, the real robot's world and the simulated one. Beside the graphical off-line programming concept, the range of application lies especially in the field of teleoperation with large time delays. A shared autonomy concept is proposed that distributes intelligence between man and machine. The feasibility of graphically simulating the robot within its environment is extended by emulating different sensor functions like distance, force-torque, and vision sensors to achieve a correct copy of the real system behaviour as far as possible. These simulation features are embedded in a task-oriented high level robot programming approach. This implicit programming paradigm is supported by a sophisticated graphical man-machineinterface. Programming by demonstration is performed by introducing an analytical controller generation approach. Sensor fusion aspects with respect to autonomous sensor-controlled task execution are discussed as well as the interaction between the real and the simulated system. 1

Introduction

Programming a robot off-line has been restricted to showing a sequence of cartesian positions or joint angle configurations. But mechanical inaccuracies at the manipulator level and differences between real and simulated world lead to difficult problems in the execution of off-line defined robot activities. Local adaptations on path generation level have to be done via teach-in commands by a human operator, especially in the context of manipulation tasks, such as gripping or handling objects. Even small deviations in position, orientation, and shape of all the objects to be handled are not allowed, because the task execution will fail in an uncertain environment. Graphical simulation of robot tasks and downloading the generated commands to the real robot is limited to the joint or cartesian motion level. This approach is only useful if geometrical consistency of the real environment and the simulated one can be guaranteed. This demand cannot be met with available programming systems.

388 The most important requirement to achieve autonomy is the ability to successfully execute a given task by intelligent sensor data processing. Actual sensory data are matched with a predefined reference pattern to receive information for the robot controller to achieve the desired end position. Local deviations from the desired state are detected by the sensors and handled by the robot system in an autonomous way. Therefore, sensor-controlled movements have to be used to achieve local autonomy at the machine level. High level planning facilities for task scheduling as well as intelligent error handling mechanisms are required for full autonomy, but state-of-the-art techniques are insufficient to provide the adequate tools. Presently, full autonomy is unattainable.

2 The TeleSensorProgramming approach Therefore, we favour a shared autonomy concept that distributes intelligence between man and machine [ 1]. Presuming that sufficient information about the actual environment is available from the sensor systems, partial tasks can be executed independently at the machine level. Specifications and decisions at a high task planning level have to be performed by human operators. Local sensory feedback loops are executed by the robot system, while global task level jobs have to be specified interactively by a human operator. Coarse planning activities have to be done at a task-oriented level by human intelligence, fine path planning at manipulator level takes place at a sensor-based control level [2]. human operator

low frequency

planning

task-directed

robot system

high frequency

execution

sensor-based

Figure 1 Shared autonomy This shared autonomy approach is the basis of our programming paradigm which is determined TeleSensorProgramming (TSP), resulting in robot programming at a task-directed level. The teach-in of a robot system occurs not at the joint or cartesian manipulator level but at a high language level, i.e. the operator plans activities at a level which can be processed by the robotic system independently from human intervention. Hence, TSP means teaching by showing typical situations at a task-directed level including nominal sensory patterns with the aid of sensory refinement in the real, or a completely simulated, world. All the taught sensor-based actions can be activated in analogous situations. Figure 2 shows the structure of our TSP concept consisting of two parallel working control cascades. One of them is the real system control loop containing internal control loops for local autonomy. The other one is a simulation environment which is structurally equivalent to the real system, with few exceptions. Most important is a signal delay which may occur in the real (remote) system, e.g. in space applications, which is not implemented in the simulation. This makes the simulation predictive with respect to the real system. Another exception is the display of internal variables in the simulation part, which cannot be observed (measured) in the real system. This gives the operator or task planner more insight to what happens or may happen on the system according to his commands. Communication between the two loops

389 runs via a common model data base which delivers a priori knowledge for the execution at the remote level and a posteriori knowledge for the model update of the simulated world.

i!

Graphics and Animation

'1

virtual feedback

real feedback

l

i

local autonomyloop

I

I

,-.-,o0~, I

..... t ...... : .......

task planning and error handling

]i~

path planning 1 and parameter determination

i|

I r.o~,~

.... " ...... ".... t f :

l

(Ground)

:

Simulation

"-!

!

I

....... : l I

world base sensormodel fusiondata algorithms

a posterioriknowledge update of world model

,',',o~,

I! .... "

J

t

a prioriknowledge usageof world model

,obo,

images Plots Graphics

Real System i 9 9 ; ......................................................................................

;

Figure 2 The TeleSensorProgramming concept For such an intelligent programming system different tools are necessary to implement the required functionality. First, a sophisticated simulation system has to be provided to emulate the real robot system. This includes the simulation of sensory perception within the real environment. Beside this sensor simulation, the shared autonomy concept has to provide an efficient operator interface to set up task descriptions, to configure the task control parameters, to decide what kind of sensors and control algorithms should be used, and to debug an entire job execution phase. Two applications in the field of robotics are evident to the proposed TSP approach. First, the graphical off-line programming concept is extended by the processing of simulated sensory data and by the introduction of local autonomy on the task description level. This means that not only the joint and cartesian information is gathered by graphically guiding the robot during the off-line programming task, but also the simulated sensory informations are stored off-line as nominal patterns for subtask execution on a local feedback loop. The fine motion control loop for handling uncertainties takes place independently of any human intervention both at the simulation side and the real one. A main advantage of this off-line programming scheme is the feasibility to verify all the robot actions before real execution is performed. This includes the perception and processing of sensory data and the activation of the sensor-controlled elemental moves.

390 Second, the field of telerobotics with large time delays, especially in space and subsea applications, is a good area for this sensor-based, task-directed programming approach [1]. Direct visual feedback in the control loop, where a time delay of a few seconds is inherent, is not feasible for the human operator to handle the robot movements in a suitable way. Predictive simulation is the appropriate means for an operator to telemanipulate the remote system on-line [3]. Similar approaches are known, which use force reflecting hand-controllers to feed back force sensor signals for shared and teleoperated control modes [4]. Furthermore, an interactive supervisory user interface makes it possible to set up the environmental and control parameters. But in our approach, the sensor-based elemental move concept (see section 3), feasible for all kind of sensors and execution tasks, is preferred. The operator has to only guide the robot through the task space in a coarse way and to activate specific sensor control phases. Sending these basic commands to the remote system enables the real robot to execute the desired task with its own local autonomy after the delay time has elapsed. The main feature of our telerobotic concept is to replace the time-delayed visual feedback by predictive stereo graphics with sensor simulation, providing a supervisory control technique, that will allow to shift more and more autonomy and intelligence to the robot system. The main focus of this paper lies in the task-directed programming approach which facilitates the robot programming work by an intuitive task description level. Predefined complex sensor-based tasks can be described at a task-oriented level which allows for usage in a varying context. To describe robot activities at a high language level we introduce the elemental move concept. This concept allows us to define subtasks that can be executed by the robot system in an autonomous way. To program a complex robot task it is only necessary to define a sequence of elemental moves at an intuitive planning level. An analytical approach is proposed to define sensor-based elemental moves without an extensive controller design. Therefore, we call our method programming by demonstration, which represents an easy variant of teaching by showing. 3

The elemental move concept

In the past, many attempts have been started to describe robot assembly plans at a high language level. Command language approaches [5] or planning tools such as Petri nets [6] or assembly graphs [7] have been proposed. They work well in a structured, well-defined environment for a specific application field like block world assembly [8] or compliant motion tasks [9]. Our task-directed programming approach is driven by an elemental move concept that enables us to program robot tasks in an intuitive implicit manner. A complex robot task is composed of such elemental moves (EMs) that can be divided into three categories. First, pose-controlled EMs that are described by the goal pose (position and orientation) in the cartesian space; second, the sensor-controlled EMs that have to be defined via nominal sensory patterns, and, third, shared-controlled EMs. Each class of EMs has a template of pre- and postconditions that describe the prerequisites to activate or stop the execution of an EM instance. An EM instance can be regarded as a node of a state machine. The

391 transition between two nodes is defined by the matching of post- and preconditions of the corresponding EM nodes. 3.1 Pose-controlled elemental moves

Pose-controlled EMs are defined via the goal pose in the cartesian space. Therefore, the precondition template for the pose-controlled EM class is empty, the postcondition is met if the tool frame of the robot has reached the desired pose within the predefined limits. The path generation considers the inverse kinematic problem that is solved by an iterative approach. Currently, an on-line collision avoidance algorithm is implemented to find a collision-free path from the current start to the defined end pose [10] . 3.2 Sensor-controlled elemental m o v e s

Sensor-controlled EMs are defined via the nominal sensory pattern in the reference pose relative to the environment. Therefore, the precondition template for the sensor-controlled EM class includes the availability and correctness of the sensory data, which is necessary to apply the control algorithm for achieving the desired goal state. This class of EMs represents full autonomy, in the sense that the sensor-based control mechanism leads to a proper system behaviour. An EM is fully sensor-controlled, if all 6 degrees of freedoms (DOFs) in the cartesian space are controlled by sensor values. With the aid of sensor-based path refinement it is possible to act in an environment under the constraints of the sensor measurements. This class of the sensor-controlled EMs also gives us the possibility to handle similar tasks in analogous situations. E.g. APPROACH movements, that will be able to align the robot gripper with the object to be picked up, can be defined as one sensor-controlled EM for a desired distance and/or vision sensory pattern. Similar objects at different poses can be gripped by the same EM. 3.3 Shared-controlled elemental moves

The shared-controlled EM class is a mixture of the previous two. For each instance of shared-controlled EM the constraint space configuration has to be defined. The constraint space configuration description specifies which degree of freedom in the cartesian 3D-space is pose- or sensor-controlled. For instance a contour-following task in the xy-plane can be described by the pose-controlled DOFs Ztrans, Xrot, Yrot, Zrot and the sensor-controlled Xtrans and Ytrans. Based on Mason's C-frame concept [11 ] sensor-controlled subspaces are defined using nominal sensory patterns to control the selected DOFs. The free DOFs are controlled by the path planning algorithm or, in the case of teleoperation, by the human operator using a 6-DOF input device. The techniques used for projecting gross move commands into the pose and sensor-controlled subspaces have been discussed in a number of previous papers, e.g. [12].

4 High-level task-directed programming Task-directed robot programming has been discussed so far in the field of generalized planning [13] and assembly plan generation as an AI paradigm [14],[15]. Especially for the

392 telerobotic applications interactive planning facilities are of crucial importance. To integrate such planning tools into the framework of sensor-based task execution, only theoretical work has been done so far [ 16]. We believe that the realization of task-directed robot programming by the elemental move approach leads to an intuitive high level programming concept. Complex robot tasks are described by an action graph of elemental moves. The nodes of this graph are represented by the EM controllers. The transitions between the corresponding nodes are defined by matching the successor preconditions with the predecessor postconditions. A plausibility check has to be initiated to look for the congruence between the end conditions of the subtask and the start conditions of the following one, so that the consistency of the subtask sequence is ensured within the entire robot task. The feasibility of an EM, i.e. the question whether the preconditions can be met, depends on availability of the required sensor values as well as on determination of the constraint space configuration. To implement a real task-directed programming tool we propose a hierarchical declaration of the EM framework. At the lowest level we define the pose-, sensor-, and shared-controlled EMs as so-called atomic elemental moves that cannot be devided into several control items. Each node in the action graph represents one instance of an atomic EM. A sequence of connected nodes as a part of the action graph can be joined to a so-called composed elemental move, which can be further used as a logical EM at a high task description level. We have to consider that the uniqueness of the graph transitions are guaranteed by the alignment of the pre- and postconditions of successive EMs. Furthermore, a composed EM is put together by a non-empty sequence of atomic and/or composed EMs in a recursive way. One remark to the definition of the atomic EMs: In contrast to other approaches, e.g. [5] or [4], we don't have to predefine all the elemental operations which can be used to execute any task. We can interactively define and configure an atomic EM via programming by demonstration, which is the topic of the next section.

5

Programming by demonstration

Visual programming tools for the direct transfer of human manipulation skills to the robot have been developed, but they are constrained to location gathering by using a marker-based vision system [17] or trajectory generation from observing object features [18]. Compliance controller identification by human demonstration [19] is restricted to straight line motion in well-defined constraint configurations. Methods proposed during the last years [20],[21],[22] for programming by demonstration utilize the real robot system to collect the required demonstration data. The same system is used for further execution, so that this class of direct demonstration approaches can be regarded as simple playback methods. 5.1 Remote demonstration

We have extended this direct demonstration method to the separation of demonstration and execution system that gives us the ability to define task level robot actions off-line or to overcome the problems attended upon time-delayed telerobotic applications. Before the execution of the robot task takes place in the real system, we demonstrate the task by

393 using sophisticated simulation tools including the emulation of the interactions between the manipulator and the environment concerning collision information of contact sensors as well as non-contact sensor devices. Currently we have implemented the functionality of laser distance sensors, a feature-based simulation of stereo cameras, and the behaviour of a stiff force-torque sensor in contact with the environment [23]. We emphasize that the usage of simulated sensory behaviour is only feasible, if the virtual system is properly calibrated. Especially in the emulation of computer vision the knowledge of internal and external camera parameters is indispensable for finding the right mapping between the 3D world model and the 2D camera planes. 5.2 Definition of atomic elemental moves It should be noted that the following concepts for teaching atomic elemental moves are applicable to both, first to the simulation with regard to graphical off-line programming or telerobotics, second to the real robot (including sensor equipment) without any simulation environment. Applying programming by demonstration to simple pose-controlled EMs is straightforward. The actual cartesian pose of the robot's tool frame is stored with current manipulator data like velocity or acceleration. By activating such a pose-controlled EM the robot moves from any pose to the desired one regarding the inverse kinematic problems as well as the collision detection. To move the robot a 6-DOF input device is used, which allows the operator to easily generate such target points within the robot's workspace. To define the various instances of the sensor- or shared-controlled EM class, we propose an intuitive programming by demonstration method. The key issue is to find an appropriate controller to perform the desired EM correctly. 5.3 Automatic controller generation For the handling of the sensor-controlled EMs we outline a method to estimate the motion parameters by the known sensor values to achieve the predefined goal state of the robot's pose (cartesian position in translation and orientation or joint angle position). In other words, we have to find a control sequence that transduces the robot's end effector from a pose xo into the nominal pose x* which is described by its corresponding sensor values y*. x* is assumed to be unknown or uncertain. The motion parameters relative to the sensed environment are expressed by the vector increment Axk, the actual sensor values by the vector Yk. The nominal sensor values y* generally are non-linear functions of the actual interaction between the robot, the sensor and the environment in the robot's nominal pose x*. y*-f(x*)

(1)

In the execution phase we want to find a controller sequence that is able to reach the goal state dependent on the nominal sensor value vector, which the system was taught for. Starting from a pose xo we calculate stepwise the motion parameters Axk in order of the actual and nominal sensor values, i.e. we apply a linear controller of the form

Axk = ~ C ( y k - y*),

(2)

394 where C is a constant N'M-Matrix, which maps the displacement in the M-dimensional sensor space to the N-dimensional control space, and where a is a scalar damping factor, which determines the dynamic behaviour of the closed control loop. The optimal controller coefficients, in the sense of least square estimate, are expressed by the pseudoinverse

C = ( J T j ) - I J T,

(3)

where the elements of J are the partial derivatives of the sensor values y to the motion parameters x:

Ji j

Oyi

=

'

Oxj

(4) x*

The teach-in process is implemented in an efficient way, because the nominal sensor values y* can be easily derived from either the simulation or the real system. The controller design, i.e. the construction of J is also implemented in a simple manner by experimentally estimating the Jacobian J, i.e. moving the simulated or real robot by increments, using the difference quotients AYi (5) J i 'J ~

-s

i

x*

"

The reason why we determine the Jacobian J by experimentally estimating the partial derivatives of the sensor values to the motion parameters, is the robustness of the control approach in the local working space, i.e. if Yk ~ Y* is in a suitable bound. Doing so, we avoid the very high effort which would be necessary if we would analytically determine the partial derivatives using a large non-linear model. Furthermore, the estimates determined by using the real system are expected to be better than the analytical ones, because they are independent of (inevitable) model abstractions. To get the mapping from the actual sensory pattern to the desired robot's reference pose, a sensor calibration is not required because the training process delivers the inverse relation between the robot's movements and the corresponding sensor values. In the case of programming by virtual demonstration within the graphical simulation environment, sufficiently correct world models with respect to sensor and object descriptions must be available for the generation of the controller matrix C. The method is plain and robust. The simplicity of the programming by demonstration paradigm is fulfilled by an easy way to configure the sensor- or shared-controlled EM description. The reference sensor pattern, i.e. the goal state definition of the atomic EM is directly shown by the current sensor value configuration. The controller for achieving this desired state is determined in an automatic manner by the generation of the Jacobian as described above. The human operator has only to specify the controller input variables, i.e. the sensor combination, which should be used for the control task, and the controller output variables, i.e. the sensor-controlled DOFs, to activate the controller generation. This procedure can be viewed as a form of training by doing. Currently an intelligent sensor-planning system is in progress to find out the related sensor usage in combination with the determination of the sensor-controlled subspaces in an autonomous way [24].

395

5.4 Sensor fusion aspects The proposed sensorimotion approach allows a simple integration of different sensor values. An efficient sensorfusion algorithm is inherent implemented by assigning all the used sensor values to the elements of the measurement vector y. To get better estimation results, it is often helpful to use redundant and normalized sensor information. We have implemented this approach, among other cases, to combine stereo vision (2 cameras) data with laser range data (4 distance values) and have observed fast (exponentially) as well as wide (full sensor range) convergence. In the following the main advantages of the proposed controller generation process are shortly discussed. The controller matrix C remains constant during the control process. The training of sensor-controlled EMs, i.e. to set the linear controller coefficients in the Jacobian, is necessary only once for a specific sensor/DOF combination in the desired reference state. The number of training steps for the determination of the controller is equal to the number of DOFs to be controlled. Any constraint space configuration can be selected to integrate the shared-control concept into the control generation process. Sensor failures can easily be handled by introducing a binary valued diagonal weight matrix W extending the controller design to (

C = jTWj

)-1

JTw

with

Wi, j - -

{

E

O" i ~ j

{0, 1} 9 i - j

'

(6,

The controller redesign only needs a failure detection, but does not need to apply a new training process for the changed sensor configuration. If j T j is regular, then JTWJ may be regular too, if the original sensor configuration contains enough redundancy. The condition number of j T w J may be used to qualify different sensor configurations. To realize such a sensorimotion coordination and sensor fusion concept, we already have applied other methods such as neural networks [25]. Our ongoing work is to compare both approaches with regard to applicability, robustness, and efficiency.

6 Experiments The TSP concept has been verified within R O T E X an advanced space RObot Technology EXperiment in May '93 at the D2 spacelab mission [12]. The experimental environment of the ROTEX ground station is used and currently extended to implement our task-directed robot programming approach. A powerful man-machine-interface is one of the key issues to perform the programming by demonstration in an efficient way. A 6-DOF control-ball is used to move the robot's tool center point in the 3D-stereographics simulation. A user-friendly graphical interface based on X and OSF/Motif has been implemented to support the operator in the definition and composition of the various elemental moves. Three types of sensors are available in the real as well as the simulated gripper environment: an array of 8 laser distance sensors with a working range from 0-3 cm, one with 0-30 cm, two 6-axis force-torque wrist sensors (a stiff strain gauge based and a more compliant optical one), and a tiny pair of cameras to provide a stereo image viewed in the gripper's z-direction.

396 The image processing system used in our laboratory is able to extract features like contours or markers in real time (application dependent). In addition to this direct object feature extraction more abstract measurements, such as the Fourier coefficients of a contour, can be delivered as an efficient tool for the determination of the pose, size, and shape of an object given by its contour line [26]. The corresponding simulation works analogously at a featurebased level with respect to the internal and external camera parameters. Especially for the acquisition of stereo vision data this model-based method offers the advantage to avoid the corresponcience problem. Corresponding object features in the different camera views can easily be detected by using the same geometric world model interface. The experimental environment consists of a powerful Silicon Graphics workstation with a multi-processor architecture to enhance the simulation and graphical visualization of the workcell environment. The graphical user interface as well as the controller task are running as independent processes on the same machine. The inter-process-communication occurs via shared memories protected by semaphores to guarantee the mutual exclusion on apparent access collisions. The exchange of an orbital replaceable unit (ORU) seems to be a good example to explain the task-directed programming approach in more detail (see figure 3). This manipulation task can be regarded as a sophisticated pick and place operation where the pick includes a bajonet closure screwing action. The composed EM ExchangeOru is split into three more detailed EMs FreeMotion, ApproachToOru, AttachOru, and the inverse EMs FreeMotionWithOru, ApproachWithOru, ReleaseOru. Let us focus on the EM ApproachToOru. This composed EM is further divided into three atomic EMs which differ in the choice of the used sensors and the constraint space configuration.

Figure 3 The composed EM ExchangeOru

397

First, only the vision sensor is available within its measurement range. ApproachByVision uses four corner points lying on the ORU contour as the reference pattern to achieve the first goal state. After the execution of the EM ApproachByVision, the distance sensors can be applied together with the visible corner points (see figures 4 and 5). The reference state of this EM ApproachByVisionAndDistance is reached if the distance sensor values are zero or non-zero force-torque sensor values appear. The last atomic EM ApproachByDistanceAndForce ends in a state where the attaching of the ORU is possible and the next EM AttachOru can be activated.

Figure 4 Configuration of the EM ApproachByVisionAndDistance

Figure 5 ApproachByVisionAndDistance, viewed out of the gripper-camera All three atomic EMs are completely sensor-controlled, but for teleoperation the constraint space configuration can easily be reconfigured to allow an operator supported shared-control. The Jacobian for the EM ApproachByVisionAnODistance e.g. integrates the vision data of two visible corner points of the ORU contour and four distance sensor values. These sensor readings are represented by the sensor beams approximately perpendicular to the desired contact surface between the ORU and the gripper in the attach phase (see figures 4 and 5).

398 We experimentally estimate the Jacobian J by moving the simulated or real robot by increments of 1 mm in each translational DOF and of 2 degrees in each rotational DOF, creating the difference quotients

Ayi Ji'J ~ ~zj

y (x* + A x j ) - y (x* - A x j )

x*

2 Axj

(7)

The incremental values of 1 mm and 2 degrees depend on the geometrical dimensions of workcell and sensor description. We have to consider that the incremental values Axk have to be large enough to avoid numerical error problems. The linear controller, automatically created and used by the EM ApproachByVisionAndDistance, generates a smooth behaviour of the tool center point movements to achieve the desired position. In the case of programming by demonstration and execution both on the real system, the experimental determination of the controller coefficients considers implicitly the sensor characteristics. This is a great advantage w.r.t, to an analytical approach, where the particular sensor parameters must be explicitly known to obtain the correct mapping between the sensor measurements and the manipulator movements. In the case of programming in the simulated environment and execution in the real world, the alignment of the real and the virtual task execution has to be guaranteed. Due to the fact that the goal state of the sensor-controlled EMs is expressed in terms of sensor values, the real sensor data must match the sensor values which are simulated in the virtual environment. For instance the distance sensors have to deliver the correct distance value between the sensor origin and the measured object surface, or the vision system should be able to provide features such as points, lines, or contours in the image plane without lens distortion or something else. Therefore the congruence between the simulated and the real sensors must be guaranteed, using calibrated sensor models.

7

Conclusion

We have introduced the TeleSensorProgramming concept as an easy way to program a robot off-line in a robust way with sensor-based control structures. The proposed taskdirected programming paradigm is also applicable in the field of telerobotics with large time delays where local autonomy is indispensable. Teaching by showing in the real as well as in the simulated world, including all the interactions between the robot and its environment, leads to a programming by demonstration approach using an elemental move concept with local sensor-based machine intelligence. The elemental moves are defined by a sensor data processing scheme which is experimentally determined via the Jacobian in the working area respectively. Goal of our ongoing work is to integrate the virtual and real world by world model update facilities. As the first step we have to enhance the congruence between the virtual and real world using calibrated sensor models.

399 References

[1] G. Hirzinger, J. Heindl, K. Landzettel, and B. Brunner, "Multisensory shared autonomy a key issue in the space robot technology experiment ROTEX", Proc. of the IEEE Conf. on Intelligent Robots and Systems (IROS), Raleigh, July 7-10, 1992. [2] T. Sheridan, "Human supervisory control of robot systems", Proc. of the IEEE Conf. of Robotics and Automation, San Francisco, 1986. [3] A. Bejczy and W. Kim, "Predictive displays and shared compliance control for timedelayed telemanipulation", IEEE Int. Workshop on Intelligent Robots and Systems, Ibaraki, 1990. [4] P. Backes and K. Tso, "UMI: An Interactive Supervisory and Shared Control System for Telerobotics", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinatti, p. 1096-1101, 1990. [5] J. Funda and R. Paul, "Remote Control of a Robotic System by Teleprogramming", Proc. of the IEEE Int. Conf. on Robotics and Automation, Sacramento, 1991. [6] B. McCarragher, "Task-Level Adaptation Using a Discrete Event Controller for Robotic Assembly", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993. [7] A. Kapitanovsky and O. Maimon, "Conceptual Graph-based Synthesis of Robotic Assembly Operations", Proc. of the IEEE Conf. on Robotics and Automation, Nice, France, May, 1992. [8] G. Schrott, "An Experimental Environment for Task-Level Programming", Proc. of the Second Int. Symposium on Experimental Robotics (ISER), Toulouse, France, June 25-27, 1991. [9] P. Van de Poel, W. Witvrouw, H. Bruyninckx, and J. de Schutter, "An Environment for Developing and Optimising Compliant Robot Motion Tasks", Proc. of the Int. Conf. on Advanced Robotics (ICAR), Tokyo, November, 1993. [ 10] E. Ralli and G. Hirzinger, "Fast Path Planning for Robot Manipulators using Numerical Potential Fields in the Configuration Space", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Munich, Sept. 12-16, 1994. [ 11 ] M. Mason, "Compliance and force control for computer controlled manipulators", IEEE Trans. on Systems, Man and Cybernetics, Vol SMC-11, No. 6, pp. 418--432, 1981. [ 12] G. Hirzinger, B. Brunner, J. Dietrich, and J. Heindl, " R O T E X - The First Space Robot Technology Experiment", Proc. of the Sixth Int. Symposium on Robotics Research, Hidden Valley, PA, Oct 2-5, 1993. [13] W. Yared and T. Sheridan, "Plan Recognition and Generalization in Command Languages with Application to Telerobotics", IEEE Trans. on Systems, Man and Cybernetics, Vol. 21, No. 2, p. 327-338, 1991. [ 14] B. Frommherz and G. Werling, "Generating Robot Action Plans by means of an Heuristic Search", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinatti, p. 884889, 1990.

400

[15] Y. Huang and C. Lee, "An Automatic Assembly Planning System", Proc. of the IEEE Int. Conf. on Robotics and Automation, Cincinatti, p. 1594-1599, 1990. [16] M. Gini, "Integrating Planning and Execution for sensor-based Robots", in: Ravani B. (ed.), CAD based Programming for Sensory Robots, NATO ASI Series F50, Springer, Berlin, 1988. [17] S. Tso and K. Liu, "Visual Programming for Capturing of Human Manipulation Skill", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993. [18] A. Ude, "Trajectory generation from noisy positions of object features for teaching robot paths", Robotics and Autonomous Systems 11, 113-127, 1993. [19] N. Delson and H. West, "Robot Programming by Human Desmonstration: Subtask Compliance Controller Identification", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993. [20] G. Hirzinger and J. Heindl, "Sensor programming - - a new way for teaching a robot paths and force/torques simultaneously", Proc. of the Third Int. Conf. on Robot Vision and Sensory Controls, 1983. [21] H. Asada and S. Izumi, "Direct teaching and Automatic Program Generation for the Hybrid Control of Robot Manipulators", Proc. of the IEEE Int. Conf. on Robotics and Automation, 1987. [22] K. Kosuge, T. Fukuda, and H. Asada, "Acquisition of Human Skills for Robotic Systems", Proc. of the IEEE Symposium on Intelligent Control Arlington Virginia, August, 1991. [23] B. Brunner, G. Hirzinger, K. Landzettel, and J. Heindl, "Multisensory Shared Autonomy and Tele-Sensor-Programming- Key Issues in the Space Robot Technology Experiments ROTEX", Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Yokohama, July 26-30, 1993. [24] G. Grunwald and G. Hager, "Towards Task-Directed Planning of Cooperating Sensors", Proc. of the SPIE Conf. on Sensor Fusion V, Boston Mass., Nov., 1992. [25] G. Wei, G. Hirzinger, and B. Brunner, "Sensorimotion Coordination and Sensor Fusion by Neural Networks", Proc. of the IEEE Int. Conf. on Neural Networks, San Francisco, 1993. [26] K. Arbter and H. Burkhardt, "A Fourier-method for feature extraction and parameter estimation for planar curves in the 3-D space (in German)", in: Informationstechnik it(33), Oldenbourg, Munich, 1991.

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

401

Telerobotics with large time delays- the ROTEX experience G. Hirzinger and K. Landzettel ~ and Ch. Fagerer b ~DLR German Aerospace Research Establishment, D-82234 Wessling bUniv, der Bundeswehr, D-85577 Neubiberg The paper discusses delay-compensating techniques when operating a space robot from ground or from another remote spacecraft. These kind of techniques have been a key element in the space robot technology experiment ROTEX that has successfully flown with shuttle COLUMBIA end of April 93. During this "spacelab-D2"-mission for the first time in the history of space flight a small, multisensory robot (i.e. provided with modest local intelligence) has performed prototype tasks on board a spacecraft in different operational modes, namely preprogrammed (and reprogrammed from ground), remotely controlled (teleoperated) by the astronauts, but also remotely controlled from ground via the human operator as well as via machine intelligence. In these operational modes the robot successfully closed and opened connector plugs (bayonet closure), assembled structures from single parts and captured a free-floating object. This paper focusses on the powerful delay-compensating 3D-graphics simulation (predictive simulation) concepts that was realized in the telerobotic ground station and which allowed to compensate delays of up to 7 sec e.g. when grasping the floating object fully automatically from ground. 1. I N T R O D U C T I O N Several activities towards flying space robots before the year 2000 have started in a number of countries, one of the largest projects being the space station's mobile servicing center (MSC) with its two-arm special dexterous manipulator system to be built by the Canadian space agency. Other remarkable mid-term projects are the Japanese space station "remote manipulator system" (JEM-RMS) or the Japanese ETS-VII project, an experimental flight telerobotic servicer for maintenance and repair of space systems. The European Space Agency ESA is going to prepare the use of robots in the European part COLUMBUS of the space station. We have a strong belief that for complex, partly autonomous robots with extensive ground control capabilities it is risky to leap from zero experience to a fully operational system; this is the reason why we have flown the space robot technology experiment ROTEX (spacelab mission D2 on shuttle flight STS 55 from April 26 to May 6, 93). ROTEX was the first real robot in space; it contained as much sensor-based on-board autonomy as possible from the present state of technology. There is a general observation which is crucial for the round-trip signal delays in case of on-line teleoperation. Robots on a geostationary "repair and maintenance" satellite would

402 be permanently visible directly from one and the same groundstation. Thus roundtrip delays might be kept down to a fraction of a second (the pure time of flight being around 0,25 sec). But for robots in low orbits (typically 300 to 500 km as with the shuttle) the alternative of "direct" signal contact exists only when the spacecraft has direct line of sight to the ground station; however these periods are typically 10 to 20 minutes every 2 hours. When permanent control of the robot is the goal, relay satellites have to be used that may guarantee "full-time coverage". However in this case the delays may easily sum up to a few seconds. When ROTEX became a real project around 1988 the first statements we got were that the round-trip delays would be around 15 seconds with a jitter of around 5 seconds, as the shuttle board and ground structures were not at all prepared for these kind of requests. Fortunately, later on NASA opened up the socalled "text and graphics channel" via TDRS satellite, which helped essentially to reduce the delays down to 6 -4- 1 sec delay with a jitter of a few hundred milliseconds. How we compensated these delays is the mayor issue of this paper.

2. A B R I E F

OUTLINE

OF THE ROTEX

EXPERIMENT

The experiment's main features were as follows (fig. 1 and 2) 9 A small, six-axis robot (working space .-~ 1 m) flew inside a space-lab rack. Its gripper was provided with a number of sensors, especially two 6-axis force-torque wrist sensors, tactile arrays, grasping force control, an array of 9 laser-range finders and a tiny pair of stereo cameras to provide a stereo image out of the gripper; in addition a fixed pair of cameras provided a stereo image of the robot's working area. (For more details see e.g. [11]) 9 In order to demonstrate servicing prototype capabilities three basic tasks were performed: a) assembling a mechanical truss structure from three identical cube-link parts b) connecting/disconnecting an electrical plug (orbit-replaceable-unit-ORUexchange using a "bayonet closure") c) grasping a floating object 9 All operational modes which we think are important for the future, were verified, - teleoperation on board - teleoperation from ground via human operators and machine intelligence as well - sensor-based off-line programming on ground

403

Figure l a. ROTEX control structures

Figure lb. Integration of ROTEX in spacelab (courtesy of DORNIER)

Figure 2. ROTEX robot and experiment set-up in DLR laboratory, where the multisensory gripper is below the three-part truss structure just in front of the bayonet closure (the "ORU")

404

Figure 3. Presimulation of sensory perception and path refinement in case of teleoperation from ground a) local on-board sensory feedback (e.g. tactile contact) b) sensory feedback via groundstation (grasping a free-flyer)

3. S T A T I C E N V I R O N M E N T : APPROACH

THE TELE-SENSOR-PROGRAMMING

For all operations related to a static environment the (delay-compensating) telerobotic control in ROTEX was in a unified way based on the tele-sensorprogramming approach; it comprises on-line teleoperation on board and on ground as well as sensor-based off-line programming on ground and subsequent on-board execution (following kind of a "learning by showing" philosophy). Basically this approach has two main features a shared control concept (see e.g. [10]) based on local sensory feedback at the robot's site (i.e. on-board or in a predictive ground simulation), by which gross commands were refined autonomously providing the robot with a modest kind of sensory intelligence (fig. 3a). However due to processor limitations, on-board sensory feedback in ROTEX was restricted to force-torque and range finder signals (see below). Active compliance as well as hybrid (pseudo-) force control using nominal sensory patterns in the sensorcontrolled subspace, based on MASON's C-frame-concept [2] was realized locally. Gross commands in this framework may originate from a human operator handling the control or sensor ball (a 6 dof non-force-reflecting hand controller) or alternatively from an automatic path planner. The techniques used for projecting gross commands into the position and sensor-controlled subspaces have been discussed

405 in a number of previous papers (e.g. [10]). Feedback to the human operator- in case of on-line teleoperation - was provided only via the visual system, e.g. for the astronaut via stereo TV images, for the ground operator mainly via predictive stereo graphics (the stereo TV images being an add-on for verification). This allowed us to realize a unified control structure despite of the fairly large round-trip signal delays of up to 7 seconds. Indeed predictive 3D-Stereographic simulation was a key issue for the big success of this space robot experiment, and in our opinion is the only efficient way to cope with large signal delays. Of course for these kind of ideas to work the same control structures and path planners were realized on-board as well as in the predictive graphics ground station. And this in turn meant that not only the robot's free motion but also its sensory perception and feedback behaviour had to be realized in the "virtual environment" on ground. 9 an e l e m e n t a l m o v e concept, i.e. any complex task like the dismounting and remounting of the bayonet closure is assumed to be composed of elemental moves, for which a certain constraint-frame- and sensor-type-configuration and certain nominal sensory patterns hold, so that automatic sensorbased path refinement is clearly defined during these motion primitives. As an example the bayonet closure screwing operation, but also the approach phase before grasping the closure, were typical sensor-based elemental moves, the screwing operation being a nice example for shared control as well, as the hand rotation was position- (or better orientation-) controlled using gross command projections, while the motion along the rotation axis was locally force controlled so that forces arising due to the screwing operation werde immediately compensated. For more details see e.g. [11] or [12]. All sensory systems of ROTEX worked perfectly and the deviations between presimulated and real sensory data were minimal (fig. 5). This was one of the many positive surprises of ROTEX. By help of these realistic graphics presimulation we were able to operate the robot via the graphics just if the delays were not present. During the ROTEX mission we did not overlay real and simulated views e.g. out of the hand cameras, instead the real endeffector's position was indicated by the hand frame and the real gripper's position by two patches in the graphics scene. In addition the graphics system permanently displayed real and simulated sensory data in form of overlayed bars (fig. 4) or dots (in case of the tactile arrays, see lower left part of fig. 4), while an additional SGI system displayed the time history of simulated and real sensory signals shifted by the actual delays, thus correlated in time (fig. 5). 4. D Y N A M I C E N V I R O N M E N T :

CATCHING THE FLOATING OBJECT

We feel that the local closed loop concepts as described in the preceding chapter is the only reasonable way to master especially contact operations, where errors of e.g. 1 mm might cause excessive forces. However in a non-contact environment e.g. when trying to grasp a free-floating object these errors might be admissible. Indeed no space qualifiable real-time image processing hardware was available for ROTEX; so in case of grasping the free-floating object from ground the only chance was trying to close the loop merely via the

406

Figure 4. Sensorsimulation: Range finder simulation in the "virtual" workcell environment. In addition to the 5 simulated rays out of the gripper (fig.2) the bars in the right lower corner indicate the same simulated (bright) and the corresponding real (dark) range values as registrated by the real robot.

Figure 5. Correlation between presimulated (for comparison delayed) and real sensory data (in closed loop each) was nearly perfect in ROTEX. These recordings of the four finger-range-finders pointing "downwards" were made during sensorbased teleoperation when removing from the ORU bayonet closure.

407

Figure 6. Block structure of predictive estimation scheme

groundstation w i t h o u t local feedback (fig. 3b), thus leaving the basic ROTEX concept here. Fortunately a free-flyer provides a very precise dynamic model, so with the (monoor stereo video-)images available on ground optimal observer technology (Kalman filter estimation) is realizable and has succesfully be used in ROTEX. Nevertheless concepts of shared control (e.g. reserving several or all degrees of freedom for sensorbased automatic control) as presented in the preceding chapter remain valid here, too. Fig. 6 shows a block structure of such an estimation scheme. In ROTEX the (delayed) hand-camera information on the free-flyer's pose (relative to the gripper) was provided on ground using alternative schemes; one was based on the "dynamic vision approach" as given in [8], using only one of the two tiny hand-cameras, the other one was a full stereo approach realized in a multitransputer system. In both cases the thus "measured" object poses were compared with estimates as issued by an observer and predictor scheme that simulates the up- and down-link delays as well as robot and free-flyer models (fig. 6); this scheme predicts (and graphically displays) the situation that will occur in the spacecraft after the up-link delay has elapsed and thus allows to close the "grasp loop" either purely operator controlled, or via shared control, or purely autonomously (i.e. solving an automatic rendezvous and docking problem). In [4] the observer and predictor equations have been derived in detail for a fully linearized dynamic world model x(k + 1) =

A..._g.x(k)+ Bu(k)

y_(k) = c (k)

(1) (2)

408 where k denotes the k-th sampling interval, A is the systems dynamics matrix (containing robot, moving parts and disturbances), B the input matrix (vector) i.e. the control commands, __Cthe measurement matrix (vector), which indicates e.g. whether a moving object is observed out of the robot's hand, _x is the state vector, _u the control input to the robot (e.g. cartesian motion increments) and y contains the measured variables. In ROTEX the measured variables as registrated or computed in the groundstation have been the relative pose of the free-flyer (using real-time vision as mentioned above) and the robot's joint angles. In a lengthy derivation it has been shown for the linear case in [4] that if n~ is the uplink delay (in number of sampling intervals) and nd the downlink delay, then fig. 7 is the optimal delay-compensating scheme, showing up a very simple structure. For the real ROTEX implementation two essential changes had to be made: 1. The dynamic world model (e.g. due to the free flyer's dynamics) was treated as a real nonlinear system, thus extended Kalman filter schemes were applied instead of linear observers. 2. If we are only interested in perfect prediction in order to issue optimal control commands, it is of minor interest what the present dynamic state of the on-board system really is; in other words we may assume that there is only an uplink delay or only a down-link delay (which then has to summarize both types of delays of course). Thus we finally arrived at the structure of fig. 8 to be explained in more detailed now. Basically we have to split up the equ. (1) and equ. (2) into the robot (or camera) part (index R) and the free-flyer part (index F). Originally we intended to describe the robot dynamics (and thus the camera position dynamics) by simplified second order systems in every degree of freedom, i.e.

xR(k + 1) = ARxR(k ) + Bu(k)

(3)

but it turned out that the robot dynamics were negligible compared to the very long delays, so the final representation of the robot's motion was described by the frame transitions (caused by the control input u) Ln,k+ 1 = ALn, k (u)/JR, k

(4)

where the frames Ln is composed of the orientation matrix R and the tool-center position p (or more precisely the camera position), i.e. L R , k = [ Rk

P--R,kl

(5)

The free-flyer's motion in contrast has no control imput (if it does not hit the environment); assuming a rigid body motion it may be decomposed into a constant linear velocity of the center of mass and a constant rotational velocity around it. For avoiding singularities we used quaternions instead of e.g. Euler angles. The nonlinear state equation writes as

409

Figure 7. Detailed predictive estimation scheme for the graphics simulation of a teleoperated robot that has sensory feedback on board and a large time delay in the transmission links.

Figure 8. The ROTEX delay compensating scheme when grasping the free-flyer from ground.

410

X_F(k) = A F ( k - 1)_XF (k - 1)

(6)

where A f contains the nonlinear, state dependent parameters from the quaternion equations and ;TF consists of position, quaternions and velocities. State estimations for robot and freeflyer are different, too, nevertheless in both cases we used extended Kalman filters; in case of the robot the on-board measured joint angles (after transmission to ground) were compared with the simulated ones (transformed into cartesian values of the camera pose) taking into account that the on-board robot might not precisely execute the control commands

,R (k) = ~s

(k) + KR (k)

(k) - v__R(k)

~t

y (k) = c . z R (k)

(7) (8)

where _x~ (k) = f (L~:) = f (ALk-I,Lk_I). ^ denotes the updated estimates, while 9 denotes the predicted (apriori) estimates and K R is the corresponding Kalman gain matrix. The state estimation of the free flyer was much more difficult, partly due to the difficult realtime "relative pose" determination (i.e. the measurement) from the mono- or stereo images. Using the estimates of the hand camera position from equ. (7) and (8) the free-flyer's motion was estimated in the base coordinate system, too. The Kalman filter equations write here

(9) YF (k) = CFX_F (k) x__*F(k) = A F ( k -

1)s

(10) (k - 1)

(11)

Note that the measurement matrix CF(k ) as well as the system dynamics matrix A_1(k1) are not constant, but depending on _x~. We turned the Kalman filter gains in a way that after initializing the estimation process (with noiseless measurement data) the estimation errors were always less than 5 percent in less than 2 seconds. The rough outline of the estimation scheme we have given above is simplified in many aspects compared to the final implementation. Just to mention one example, the video data by which we measured the free-flyer's relative pose had typical down-link-delays of 2 sec, while the joint angle data transmitted via another channel had a minimum delay of around 3 seconds. These features were based on the shuttle control structure and thus not changeable. For grasping the free-flyer during the flight we allowed shared control in its two extremes only, namely either fully manually controlled, looking at the predictive stereo-graphics and operating a control ball or "space-mouse", or fully automatic based on a tracking algorithm not discussed here. By pressing just one key these modes were switchable

411 in the fraction of a second. This led to an exciting situation during the mission. We first teleoperated the robot to draw the free-flyer out from its fixture, released it and after letting it drift away we switched to the automatic tracking mode. In the graphics scenery the robot indeed grasped the object (a cube with all corners flattened), but the human operators were afraid the grasp might not be central enough and overrode the "close gripper" decision. Thus in reality (to be observed seconds later) the robot kind of snapped the object but by opening the gripper immediately gave it quite a kick so that it drifted through the workcell like a billard ball. After a few minutes it came back into the robot's area of reachability; this time we decided to let the automatic system alone and indeed the robot performed a perfect, fully automatic ground-controlled grasp (ground teleoperation by "machine intelligence") despite of a round-trip delay of about 6 seconds. Fig. 9 shows photos of the TV-scene out of one of the hand cameras immediately before successful, fully automatic grasping from ground, following the image processing approach in [8]. This automatic, ground-controlled capture of the free-flyer was one of the many spectacular actions of ROTEX. 5. R O T E X D A T A - T R A N S F E R

IN CASE OF O N - L I N E G R O U N D

OPERATION At the ROTEX-telerobotic ground station in the German Space Operation Control Center (GSOC) every 50 msec a set of motion commands, including information for gripper and program control, was generated. These commands were routed to the predictive graphic simulation. Simultaneously, the data were written into a data buffer with a depth of 10. At the same time the block number of the dataset was stored in a list, together with the actual system time. When the data buffer was filled with 10 data blocks, a DECNET transmit operation was initiated and the contents of the data buffer was transferred to GSOC. Fig. 10 shows the data flow from GSOC to Shuttle and vice versa. Numerous computers within this loop receive data, check the contents of the data block and transmit the information to the next computer or satellite link. The data processing together with the pure signal travel time on satellite links results in a total uplink time of about 2.5 seconds. The ROTEX board computer reads the data blocks from the HRFL (high-rate forward-link)-interface and releases the 10 minor frames to the board software modules; one frame per 50 msec. At the same time the released minor frames (including the unchanged block numbers) are fed into the ROTEX down link buffer. Again ten data blocks (minor frames) are combined to one major frame to minimize the input/output overhead at the computer system. The major frames are sent back to GSOC and from there to the ROTEX-telerobotic ground station. A DECNET receive program checks the major frames and if there was no transmission error, every 50 msec the contents of one minor frame is sent to the ground station software modules. In case of a data transfer error the disturbed block is neglected. The block number of each minor frame points to the block generation time, stored in the list. The total round trip time is calculated as the difference between the block generation time and the actual system time. GSOC is responsible for distributing the received data to the ground segments of all the active payload elements. Because the data processing load at the GSOC distribution computer

412

Figure 9. Two subsequent TV-images out of one of the hand cameras shortly before grasping the free flyer automatically from ground. The dark areas at the left and right lower part are the gripper jaws.

413

I c~176

.ou.,on I

I

!

Figure 10. ROTEX data paths from GSOC to shuttle COLUMBIA

is not constant, the data are not transmitted within equidistant time intervals- we have some jitter on the data blocks. Additional buffering was introduced to avoid the short-term jitter which originally was 0,3 sec. Thus with the up-link delay of around 3 sec we ended up with an average delay of about 6,5 seconds with long-term variations (typical time constants 10 minutes) from 5 to 7 seconds. Of course, the estimation schemes for the free-flyer's ground capture had to be permanently adapted to these delay changes. 6.

CONCLUSION

For the first time in the history of space flight a small, multisensory robot (i.e. provided with modest local intelligence) has performed a number of prototype tasks on board a spacecraft in the most different operational modes that are feasible today. Key technologies for the success of ROTEX have been 9 local (shared autonomy) sensory feedback control concepts, refining gross commands autonomously by intelligent sensory processing 9 powerful delay-compensating 3D-stereo-graphic simulation (predictive simulation), which included the robot's sensory behaviour. The experiment also clearly showed that the information and control structures in mission control centres for future space robot applications should be improved, allowing the robot operator on ground more direct access to the different types of uplinks and providing him with a continuous TV-transmission link. Close cooperation between man (astronaut or ground operator) and machine comprising different levels of robot autonomy was the basis of the success of ROTEX. It was clearly proven that a robot system configured in this flexible way of arbitrary and fast switching between the most different operational modes will be a powerful tool in assisting man in future space flight projects and it was impressively shown that even large delays can be compensated by appropriate estimation and pre-simulation concepts. ROTEX had to make optimal use of given data transmission structures which were not at all prepared to robot teleoperation. Operational space robots of the future should have much better conditions and thus really may become a powerful prolongation of the human arm into space.

414 REFERENCES

1. S.Lee, G. Bekey, A.K. Bejczy, "Computer control of space-borne teleoperators with sensory feedback", Proceedings IEEE Conference on Robotics and Automation, S. 205-214, St. Louis, Missouri, 25-28 March 1985. 2. M.T. Mason, "Compliance and force control for computer controlled manipulators", IEEE Trans. on Systems, Man and Cybernetics, Vol SMC-11, No. 6 (1981, 418-432). 3. T.B. Sheridan, "Human supervisory control of robot systems". Proceedings IEEE Conference, Int. Conference on Robotics and Automation, San Francisco, April 7-10, 1986. 4. G. Hirzinger, J. Heindl, K. Landzettel, "Predictive and knowledge-based telerobotic control concepts". IEEE Conference on Robotics and Automation, Scottsdale, Arizona, May 14-19, 1989. 5. S. Hayati, S.T. Venkataraman, "Design and Implementation of a Robot Control System with Traded and Shared Control Capability", Proceedings IEEE Conference Robotics and Automation, Scottsdale, 1989. 6. L. Conway, R. Volz, M. Walker, "Tele-Autonomous Systems: Methods and Architectures for Intermingling Autonomous and Telerobotic Technology", Proceedings IEEE Conference Robotics and Automation, Raleigh, 1987. 7. J. Funda, R.P. Paul, "Efficient control of a robotic system for time-delayed environments". Proceedings of the Fifth International Conference on Robotics and Automation, pages 133-137, 1989. 8. D. Dickmanns, "4D-dynamic scene analysis with integral spatio-temporal models", Fourth Int. Symposium on Robotics Research, Santa Cruz, Aug. 1987. 9. Christian Fagerer and Gerhard Hirzinger, "Predictive Telerobotic Concept for Grasping a Floating Object", Proc. IFAC Workshop on Spacecraft Automation and OnBoard Autonomous Mission Control, Darmstadt, Sept. 1992 10. B. Brunner, G. Hirzinger, K. Landzettel, J. Heindl, "Multisensory shared autonomy and tele-sensor-programming- key issues in the space robot technology experiment ROTEX", IROS'93 International Conference on Intelligent Robots and Systems, Yokohama, Japan, July 26-30, 1993. 11. G. Hirzinger, "ROTEX - The First Robot in Space", ICAR'93. The sixth International Conference on Advanced Robotics, Tokyo, 1993. 12. G. Hirzinger, B. Brunner, J. Dietrich, J. Heindl, "ROTEX - An advanced space robot technology experiment", ISRR'93, Sixth International Symposium on Robotics Research, Hidden Valley, PA, Oct. 2-5, 1993.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

415

Feature-Based Visual Servoing and its Application to Telerobotics Gregory D. Hayer, a Gerhard Grunwald, b and Kentaro Toyamaa ~Department of Computer Science, P.O. Box 208285 Yale Station, Yale University, New Haven, CT 06520-8285, U.S.A. bGerman Aerospace Research Establishment (DLR), Institute of Robotics and System Dynamics, Mfinchenerstr. 20, D-82234 Oberpfaffenhofen, Germany

Recent advances in visual servoing theory and practice now make it possible to accurately and robustly position a robot manipulator relative to a target. Both the vision and control algorithms are extremely simple, but they must be initialized on task-relevant features in order to be applied. Consequently, they are particularly well-suited to telerobotics systems where an operator can initialize the system but round-trip delay prohibits direct operator feedback during motion. This paper describes the basic theory behind feature-based visual servoing and discusses the issues involved in integrating visual servoing into the R O T E X space teleoperation system.

1. I n t r o d u c t i o n Automation and robotics are rapidly becoming extremely attractive areas within space technology. They hold the promise of assembly, servicing, and repair with a minimal number of expensive manned missions [7]. However, unlike a factory environment, space operations require the ability to work in an environment which is unstructured. For this reason, some amount of autonomous behavior is necessary to perform complex, diverse tasks. Such autonomy will rely heavily on vision, depth, force, torque, and tactile sensors, as well as advanced planning and decision capabilities. Unfortunately, our current lack of understanding of sensor data interpretation, robot motion control, and artificial intelligence makes the prospect of complete autonomy for complex tasks unlikely in the near future [2]. One way out of this dilemma is sensor-based telerobotics. In particular, space operation tasks for which telerobotics could play an increasingly large role include inspection, assembly, servicing, and repair [10]. When teleoperating in space, relay satellites and computer networks insert a large delay into round-trip communication. Hence sensor data such as video images cannot be used as direct, real-time feedback by a remote operator. One approach to overcoming this problem is to use a predictive computer graphics system as was successfully demonstrated in the German space robot experiment, ROTEX. In ROTEX, the operator handles a control device--a six degree-of-freedom control ball--based on a predictive graphics model of the robot and its environment. The control commands are sent to both the local simulation

416 and the remote robot. If the simulation is properly calibrated to the real environment, the behavior of the remote system will mimic that of the simulation exactly. Clearly, the crucial problem in this system is to provide an extremely accurate simulation. Teleoperation based on the predictive graphics will fail if the world model does not correspond with reality. This may happen if objects are deformed or damaged (for example, the solar panel of the Hubble Space Telescope), or if an object is freely floating in space. Another disadvantage of relying on a known world model is the need for a precise calibration of the entire operational space. This requirement is difficult to meet because the mechanics a r e p u t under extreme pressure by liftoff and the subsequent temperature differences between parts facing the sun and those which are not. One way of lessening the dependence on prior geometric knowledge is to make remote operations sensor-based. Experience has shown that proper use of closed-loop control algorithms operating in orbit can significantly enhance the capabilities of the teleoperation system [6]. To date, there has been little prog[ess in the use of visual feedback to'support teleoperation. One reason is that most classical work on visual servoing relies heavily on a calil~rated hand-eye system. As noted above, maintaining accurate calibration in orbit can be difficult. A second reason is that most vision algorithms are too complex t6 execute on hardware that is space-qualified. In [5], an approach to feature-based visual servoing using closed-loop control was developed. The main advantages of the approach are that it is extremely robust to calibration error and that it uses simple visual features that are computationally simple to track. It can be shown that the accuracy of these visual servoing algorithms is, in fact, independent of calibration error. These properties make the technique well-suited to the teleoperation problem since the operator can choose features and servoing operations using a single stati~c image, and the system can then, without further intervention, autonomously execute the operation with high accuracy and reliability. In this paper we discuss a family of visual servoing techniques, where simple positioning primitives can be easily combined to perform full six degree-of-freedom relative positioning. We then describe some of the issues involved in integrating visual servoing into a teleoperation environment, discuss a preliminary design for such a system, and illustrate its use on an example problem. The remainder of this paper is organized as follows. The next section describes the visual servoing problem and our solution to it. In Section 3, we describe the integration of visual tracking and the telerobotic system together with the operator interface. In Section 4, we describe our current progress at implementing this system.

2. Feature-Based Visual Tracking and Servoing Vision is an extremely rich and precise mode of sensing. In many ways, it is an ideal candidate for sensor-based motion and manipulation. However, in order to combine vision and robotics, twomajor problems must be addressed. First, vision problems that involve scene interpretation tend to be extremely difficult, and most of the vision algorithms used to solve them require specialized hardware in order to operate in real-time. Second, in order to relate visual information to a robot manipulator, the spatial relationship between

417

Figure 1. A visual servoing system performing positioning by reducing visual disparity to zero

the manipulator and the camera(s) must be known. Determining and maintaining this relationship, the hand-eye calibration, with high accuracy is very difficult. Here, we describe an approach to visual servoing that eliminates these difficulties. Instead of relying on accurate visual reconstruction, we base motion directly on visual feedback. In humans, this process is called hand-eye coordination. We are attempting to develop similar hand-eye skills for robotic manipulation systems. We hypothesize that a sufficiently large and rich vocabulary of hand-eye skills will dramatically increase the dexterity of robotic systems. For the purposes of this article, hand-eye skills are essentially vision-based feedback control algorithms. These methods use locally defined visual features, primarily image contours, that can be tracked in real-time by standard workstations or PCs. Corresponding features in two cameras comprise the input to a feedback control system. The feedback control system will provide accurate positioning despite calibration errors provided control gains are chosen to ensure stability. 2.1. Visual Feedback Control We begin by describing two systems for visual feedback control: one that controls only position and one that controls both position and orientation. Figure 1 illustrates the underlying principle. We consider a visual servoing system which includes two video cameras, a robot arm, and computers that perform image-processing, low-level control operations, and other interface-related functions. Any attempt to accurately calibrate this system is limited by system modeling errors, mechanical backlash, control inaccuracy, and so forth. Consequently, a system that reconstructs absolute point coordinates using vision and positions the robot based on that information will be extremely imprecise. However, as illustrated in Figures l(a) and l(b) the robot manipulator can be positioned extremely accurately relative to the observed target. In (a), the cameras observe the visual disparity 01 and 02 between a point on the manipulator and the corner of the box. We know the following property: zero disparity between the manipulator and the goal in both images

418 means that they are at the same point in space. This property is true independent of where the cameras are located relative to the robot (More precisely, this statement is true except in configurations where the goal or robot and the cameras are collinear). Thus, if we have a stable controller to achieve zero visual disparity, we can position accurately, even if the system is badly calibrated. The theory underlying a feedback control algorithm for this problem can be summarized as follows. Suppose that y = f ( x ) is a mapping from a robot configuration space to an output sensor space, both of dimension n. Given a desired setpoint y*, define e v = E ( y , y * ) , where ev approaches zero as y approaches y*. Taking time derivatives,

e'y= .]E[C,

(I)

where OE JE = Ox"

(2)

If E(-,-) is nonlinear, JE is a function of the system state, x. Thus, if the system state is not directly available, it must be estimated from sensor data. We take u = ~: to be the control input to the system and, presuming JE is full rank, compute u = - k JEl eu

(3)

where k is a constant set by the designer. If the Jacobian is nonsquare, the control vector is computed by u = -k J [ ( J E J T ) - 1 % .

(4)

It is well known that the proper choice of k in discrete time approximations of systems of this form lead to an error, e, that asymptotically approaches zero. Using these concepts, we can construct visual servoing systems to perform different types of positioning and motion by defining the appropriate image-based error term. The inputs available for defining errors are contours and fixed visual reference points observed in two cameras. The following is a brief review of camera imaging geometry for these features. Camera positions are represented by the frames C1 = (c~, E~) and (32 = (c2, E2). Points and fines in ~(3) will be written in capital letters. The projection of a point P or line L in camera i will be written pi and li respectively. The projection of a point P = (P~,Py, Pz) T to a homogeneous vector p~ = (u, v, 1) T is given by P' p,

= =

Ei(P - ci), (u, v, 1) T = P ' / P : .

(5) (6)

In vector form, this is written P i - - gi(P). An arbitrary infinite line, L, is parameterized by a tuple (Ld, L~,) where Ld is a fixed point on the line and Lv is a unit vector representing the direction of the line. The vector

419

li parameterizing the projection of L in camera i is L' = Ei (Lv x (Ld -- ci)), L' = v/L, +

(7)

(s)

In vector form, this is written li = hi(L). The Jacobian of g is a function of the position of the observed point, and the Jacobian of h is a function of the fixed point on the line:

jg(p)_

Og

I

,

~

Jh(L)= Oxx

-~Xp

"

(9)

La

As described in [3], it is possible to estimate the parameters of both lines and points using relatively straightforward techniques. We refer the reader to that publication for the technical details of estimation and control and proceed to describe the structure of three representative controllers. P r o b l e m Definition: Given a fixed reference point, P, on a manipulator and a line, L, not on the manipulator, servo the manipulator so that P C L using inputs pi and li, i = 1,2. A homogeneous vector pi in camera image i lies on the line projection li if and only 0. Aside from a set of singular configurations, it can be shown that for any if Pi" li arbitrary line L and a point P, ll'pl = 4"p2 = 0 if and only if P C L. This motivates the definition of a positioning error e C N(2) as: =

eli e2

Pl " ll

P2 4

" hi(L) ]

[gl(P)

g2(P) h2(L)

= H2(P, L).

(10)

The Jacobian is as follows:

JH~ --

[ ITJg(P) lTjg(p) ].

(11)

Applying the methods described above yields a controller for positioning a point on a line. This primitive controller is remarkably versatile and can be easily composed with itself to form controllers which control greater degrees of freedom, hinting at the possibility of less and less human operator intervention. For example, we can combine two point-to-line primitives to create a line-to-line or "alignment" controller which controls four degrees of freedom: P r o b l e m Definition: Given two fixed reference points P and R and a reference line L that is rigidly attached to a manipulator, develop a regulator that positions the manipulator so that P C L and R E L, using pi, ri, and li, for i = 1, 2.

420

M

L ,S ........ 9 ...... ~

P

Initial

R

Configuration

@ ~.----

P R L "'Goal C o n f i g u r a t i o n

Figure 2. The geometry of the six degree-of-freedom servoing problem

The error defined for the previous problem can be extended for e C ~(4):

Iell e2 e3

I pl ll I p2 12 rl ~1

gl(P)'hl(L) gl(R) "hi(L) g2(P)" h2(L)

H4(P,R,L).

(12)

With this formulation, it is possible to impose the remaining two degrees of freedom arbitrarily by choosing a rotational velocity and translation velocity about and along the estimated value of the line L, respectively. This allows for shared control between the control system and an external agent, e.g., an operator. A similar construction with an additional constraint yields a full six degree-of-freedom regulator: P r o b l e m Definition: Given three non-collinear, fixed reference points P, R, and S, and two non-parallel reference lines L and M that are rigidly attached to a manipulator, develop a regulator that positions the manipulator so that P 6 L, R C L, and S C M using pi, r~, si, li, and mi, i = 1, 2 (See Figure 2). The positioning error, e C N(6), is quickly derived:

el

Pl" ll

gl(P)" hi (L)

e2

p2 " 12

g2(P) h2(L)

e3 e4

rl . 11 r2" 12

gl(R) g2(R)

e5

81"ml

e6

s2 " m2

h~(L) h2(L) h~ (M) g2(S) h2(M)

H6(P,R,S,L,M).

(13)

gl(S) "

Based on the remarks above, it follows that, aside from a small set of singular configurations, e = 0 if and only if P E L, R E L, and S 6 M. The system Jacobian Jg~ is square

421 V1 M

-~,.

i ~~p .-'" L

Key Line Used To Find Vanishing Point Line Constructed from Vanishing Point Tracked Comer .... Reference Vector

Figure 3. The geometric constructions for inserting a floppy disk into a drive unit

and depends on five vector quantities that can be estimated from sensor data. Applying the methods described above produces a regulator that can control all six degrees of freedom of a robot manipulator. For details on the form of the Jacobian and the estimation of unknown values, we refer the reader to [3]. Note that the assignment of features in the problem above was arbitrary. That is, the roles of points and lines can be interchanged without changing the final result. Also, one way of defining L and M is by choosing two reference points and computing the line that runs through them. The problem formulation is independent of whether the cameras are stationary in the environment or mounted on the manipulator itself. Thus, this basic control formulation can operate on a wide variety of systems with different types of input. All of these controllers have been implemented and tested as described in [5].

2.2. Geometry and Visual Servoing The controllers described above do not have to be based on directly observed features. Various types of geometric constructions can be used to define "virtual" lines and points. However, in order to maintain the accuracy of servoing, these constructions must rely on image-level geometric constructions. For example, Figure 3 illustrates the use of geometric constructions to place a floppy disk into a disk drive. Geometrically, the strategy is to move the disk into the plane of the disk drive unit, align it with the slot, and then move toward the slot until the disk slides in. In the visual domain, we can perform this task if we know that the sides of the disk drive unit are parallel to the floppy disk drive. In a projected image, parallel lines meet at a point, so we first construct vanishing point V1 by tracking the indicated corners of the drive unit. The edge of the floppy slot and the vanishing point are used to construct the line L. The line M is constructed from the image of the slot itself. The three corners of the floppy are used to define the reference points P, R, and S. Servoing using this information will place the floppy disk at the mouth of the drive slot, oriented parallel to it. As this example illustrates, a variety of geometric constructions can be used to provide increased functionality for visual serving. As with the basic servoing routines, these

422 constructions rely on tracking specific features in an image and computing values based on those features.

2.3. Feature Tracking In [4], a feature-based tracking system was described. The tracking system is based on two central ideas: window-based image processing and state-based programming of networks of tracked features. A window is an image defined by its height, width, position, and orientation in device (framebuffer) coordinates. Tracking a feature means that image-processing operations are used to maintain a fixed relationship between window coordinates and the underlying feature. The low-level features currently available in the system include solid or broken contrast edges detected using convolutions and general grey-scale patterns tracked using SSD methods [1,15]. The entire tracking system is designed to run on a standard CPU and framegrabber with no additional hardware support. For example, tracking single contours on a Sun Sparc 2 with an Imaging Technologies 100 series framegrabber requires 1.5ms for a 20 pixel contour searching + 10 pixels using a mask 15 pixels wide. Tracking arbitrary patterns using SSD methods requires 70 ms to localize a 20 by 20 window. Basic features can be easily composed into more complex configurations using feature networks. Every feature or image property in our system can be characterized in terms of a state vector. For basic features--those that operate directly on images--the state of the feature tracker is usually the position and orientation of the feature relative to the framebuffer coordinate system. We define composite features to be features that compute their state from other basic and composite features. A feature network is defined as a set of basic and composite features connected by two types of directed arcs referred to as up-links and down-links. The up-links provide image information for higher levels of abstraction. For example, a cluster of image features that all lie in a plane may contribute information to a composite feature that has, as state, the slant and tilt of the plane. The down-links provide constraints from higher levels of abstraction. For example, if something is known about the motion of a rigid object, this information can be propagated down to the feature level to predict positions of image features in the subsequent image. This ability to "package" features into higher-level primitives is particularly useful for providing visual constructions such as those used in the last section. Each of these constructions can be defined as an object that requires certain initialization information and provides a particular type of image information. These objects can be composed to form more objects, or they can be used directly to provide input to visual servoing routines.

3. Teleoperation and Visual Servoing Visual servoing provides the means to achieve a particular geometric configuration reliably and accurately. However, some intelligence external to the servoing algorithm must translate a geometric task into the visually based operations and choose the visual features needed to perform those operations. In this section, we explore the issues that arise when integrating visual servoing into a teleoperation system similar to ROTEX. The information available to the human operator consists of a menu driven programming environment, two camera images of the remote site, and a graphics simulation of

423 the known aspects of the remote site. The completeness of the latter depends on the availability of prior object knowledge and/or sensor reconstruction methods. The system interface offers the operator a list of the visual servoing operations, e.g., positioning or alignment, that are available in the current execution context. The operator must select an operation and, based on the selection, he or she must choose image features that initialize the tracking system for the chosen operation. These may be image features, or features defined using visual constructions. In the latter case, the system requests further initialization information until all operations are fully instantiated with image-level features. Note that the initialization of feature tracking must take place in both camera images.

3.1. Image Level Feature Specification The effect of a visual control action is only meaningful if the chosen feature has physical relevance and if the images of the same physical features are chosen in both camera images. As a consequence, the set of image-level features we consider are contours, intersections of contours, and unique surface markings. An example of the latter would be a barcode or similar artificial labeling device. We disallow arbitrary patterns that would be difficult to place in accurate correspondence in both camera images. At the image level, the tracking system is initialized to track a feature by supplying the image coordinates (position and orientation) of the feature within a prescribed tolerance. This information may be interactively supplied by an operator "clicking" on a feature in an image, or may be indirectly supplied through model information. In both cases, the low-level image processing can check for the presence of the correct type of feature and also ensure that there are no distracting features in the immediate vicinity that would lead to tracking ambiguity. The system can reject any feature that does not satisfy these conditions. As noted above, it is important that the features chosen in both camera images be physically consistent with one another. One means of ensuring this is to exploit epipolar camera geometry. Given (homogeneous) projections pt and pr, of a point P, there is a 3 by 3 matrix E such that pTEpr -- O. If the camera system is accurately calibrated, the E matrix can be computed directly from the calibration. Alternatively, the E matrix can be computed from the projections of eight points in two images provided the points are arranged in a non-singular configuration [9,11,12]. Hence, given a ninth feature point in one image, it is possible to determine the epipolar line along which the corresponding feature lies in the second image. Note that if the point P lies on a contour in one image, the corresponding point in the second image is determined by the intersection of the corresponding contour with the epipolar line. Consequently, standard matching methods can be applied to order candidate matches along the epipolar line. The operator can then accept the system's best choice, or override it by choosing another contour. If the feature in the first image is the intersection of contour points (e.g., a corner) then the feature in the second image is a point where two contours and the epipolar line intersect at a single point. This has a very high probability of being unique. Similar remarks hold for special patterns. These ideas rely on having either an accurate calibration or at least 8 corresponding points in both images. That latter condition is preferable as it is independent of camera

424 calibration, but it may be extremely difficult to satisfy. However, the environment contains many known objects, including the robot itself. Consequently, a simple method for providing the required correspondences is to decorate the workspace with unique visual targets. These targets can be acquired automatically from static images on the ground and used for subsequent epipolar computations. Furthermore, if enough image processing bandwidth is available at the remote site, the targets can be tracked over time and do not have to be reacquired for every operation initialization.

3.2. Using Prior Environmental Knowledge Even if the automatic correspondence mechanisms described above functioned perfectly, it would still become onerous for the operator to specify all of the features needed for numerous six degree-of-freedom positioning operations. In many cases, image-level tracking initialization may not be necessary. Some objects in the environment, such as the robot manipulator, will be known. If the location and pose of these objects can be ' calculated using artificial markings or by tracking object features, then the operator can choose features from the model in the graphics simulation. By using the known correspondence between model and image, the tracking system can be initialized in both images automatically. If sufficient tracking bandwidth is available the operator can also register features for later use and reuse. For example, if the manipulator is holding a particular tool, e.g., a screwdriver, the operator may instruct the system to track the shaft and the endpoint of the screwdriver at all times. These features can then be used again and again by referencing a symbolic label or by choosing them from an iconic representation in the graphic simulation. Likewise, some often-used operations may implicitly select certain object features. For example, one extremely common operation is to command the manipulator to approach an object along the manipulator center axis. An operation "manipulator-aligned-approach" could be defined that automatically chooses certain manipulator features to define the center axis and endpoint of the manipulator. In this case, the operator need only choose one or two features that define the goal configuration. If this can be done in the graphic simulation, visual servoing is essentially no more difficult than pointing to where the robot should go. 3.3. Integration into ROTEX The ROTEX sensor-based telerobotic system [6] is based on the shared autonomy concept that distributes intelligence to man and machine [8]. Global tasks like visual servoing are specified interactively by a human operator and carried out by local sensory feedback loops executed by the remote robot system. Coarse task-level planning activities are performed by human intelligence while fine path planning on manipulator level takes place through sensor-based control [13]. The shared autonomy concept also allows shared control in which the robot is guided by local sensory feedback as well as by the human operator via the teleoperational device. In the context of visual servoing, this allows the specification of a visual controller constraining less than six degrees of freedom. The remaining degrees of freedom are controlled by the operator. The ROTEX system already supports a three-dimensional graphics environment in which object models and the robot are represented. This virtual world allows the operator to move around and "interact" with the robot and its environment. In addition, all remote

425

Figure 4. The ROTEX workcell

operations are mimicked in the graphical simulation. In order to support visual servoing, the functionality of the ROTEX ground station [6] will be expanded to include a menudriven programming environment, the image processing system, and the capability of overlaying remote camera images with wireframe models of known objects. These visual aids make the feature selection easier for the operator. The overlay not only shows the correspondence between a model and real data but also gives a visual hint as to the validity of the system calibration. Selected visual features will appear highlighted in the camera images and, when possible, in the graphic display. Visual servoing will also be incorporated into the robot simulation so that the servo operations can be tested otttine before applying them to the remote system.

4. Examples We now describe examples of visual servoing and teleoperation in space. First, we briefly sketch how two servoing problems would be specified in the extended ROTEX system. ROTEX contains two camera systems: a stereo system mounted on the manipulator itself and an external camera system. The first example uses the external stereo cameras, and the second uses a single manipulator-mounted camera. We assume that some calibration information on both camera systems is available, although the correctness of the calibration is not guaranteed. In our third example, we describe a task which we hope to perform in the near future and offer experimental results which confirm the feasibility of visual servoing as a means to simplify teleoperation tasks. We first describe how the floppy insertion task could be performed using the ROTEX interface. We note this is in fact a realistic task since one of the commonly performed tasks in orbit is the removal and insertion of circuit boards. We assume that the disk drive unit is a known object registered to the graphics model. The disk itself is not an object known to the system. We assume the manipulator is holding the disk initially.

426

Figure 5. The view from the ROTEX manipulator camera

The first action of the operator would be to choose a servoing operation. Since insertion is a common operation, it would appear in the list of available operations. Once insertion is chosen, the operator is notified that he or she must choose specific image information m in this case points and/or contours on the manipulated and target objects. The operator then chooses the three corner points of the floppy disk as the defining points for the manipulated object. When chosen, the system estimates the depth of the points (using camera calibration information) and enters them into the graphics display. The system continues to request two lines on the target object. The operator now chooses the construction parallel-line. The system offers options for constructing parallel lines, in particular, by choosing two parallel lines in the world and a third point. The operator chooses this option. Since the floppy drive is registered in the graphics model, the operator can directly choose the sides of the drive unit as the requested parallel lines and the edge of the floppy slot as the point defining the insertion axis. This defines the line L of Figure 3. Still needing a second line, the operator indicates the drive slot of disk unit in the graphics display. The feature specification for the task is now complete. The system analyzes the features for trackability and completeness. If the features are satisfactory, the operation proceeds in both simulation and on the remote system. When finished, the operator receives an image and checks that the operation was correctly performed. If so, the system proceeds to perform the insertion operation, presumably under position and force control. As a second example, we describe the task of grasping the ORU (Orbit Replaceable Unit). This task is representative of many experiments which involve the manipulator moving toward a rigidly attached object, grasping it, and moving it to some other place. Figure 5 shows a typical view from the manipulator cameras above the ORU. The visual line markings 11 and 12 were added to aid the human operator in positioning. The final goal position is reached if ll aligns with the horizon of the fingertips, 12 aligns with the gap between the fingers and the manipulator is perpendicular to the surface of the ORU as it touches down. In the final approach, four laser distance sensors integrated into the fingertips can provide distance and some orientation information. In this example, it is assumed that all objects are known and their models are registered in the graphics model. Unfortunately, the configuration of the manipulator cameras is such that occlusion and visual singularities severely limit their usefulness. In particular, the "horizon" line in the

427 figure is parallel to the baseline of the cameras. This means that it is not possible to acquire depth information on this line, or on lines parallel to it. Instead, we use visual information on the lines ll and 12 to constrain the orientation of the manipulator about the center axis and translation perpendicular to the manipulator axis while the operator controls the direction and attitude of approach. Proceeding as above, the operator begins by choosing an operation, in this case the appropriate approach operation. This operation assumes that the horizon line and the center of the manipulator gap (/92 in the figure) are the manipulator setpoints. It indicates that the operator must specify a line and point in one camera. Since the ORU is registered in the graphics simulation, the operator indicates ll and P1. The system verifies that these features are visible and trackable, and then begins to graphically simulate the visual servoing operation as it is carried out remotely. The operator uses the tracking ball to move the manipulator toward the ORU, simultaneously adjusting orientation, until the laser distance sensors detect the ORU surface. At this point, the distance sensors can control manipulator attitude with respect to the surface. Vision continues to control translation parallel to the surface and rotation about the center axis. The operator continues to control distance to the object until touchdown is registered on the force sensors. As our third example, we describe a set of completed experiments which demonstrate how the system can be employed without the use of an a priori model. The actual task is to visually guide a robot arm mounted on a repair satellite so that it can grasp a separate, damaged satellite. The robot arm holds a specialized rod-shaped grasping tool, which is equipped with two triplets of laser distance sensors. Grasping of the damaged satellite becomes possible when the grasping tool is inserted into the satellite's conic apogee motor. Although the distance sensors will be used to center the grasping tool once it is inside the motor, vision must be used to servo the tool into the cone. We expect to perform the task in shared-control mode, where the operator will determine only translation along and rotation about the cone axis. Therefore, the servoing algorithm must fix the remaining four degrees of freedom in the manipulator. In our experiments, we simulate the actual, expected environment with a pencil-like stick instead of the grasping tool, and a paperboard cone instead of a real apogee motor (See Figure 6). Two cameras will also be placed upon the repair satellite, either in a stereo configuration (side by side), or with one camera on the robot manipulator, and one at the base of the manipulator. Our present experiments approximate the first situation, where two black and white cameras are placed approximately 30cm apart, about 150cm away from the cone, in a constant geometric relationship with the manipulator base. We note that, unlike the ROTEX system, the absence of precise knowledge of the motor or of the geometric configuration of the satellites means that we cannot expect to have an accurate graphics simulation of the environment. Four corners are tracked in either image: two for the cone base (assumed to be viewed roughly from the side, so that a trapezoidal figure is projected to the cameras), and two for the grasping tool. Each corner is tracked as the intersection of two line segments, themselves simple edge detectors, hinged together at a vertex. The geometry of the task requires only that we know the slopes of the lines that track the sides of the rod and the

428

Figure 6. A pair of beginning and ending configurations of manipulator and cone for our third example: both corners of the tip of the rod are tracked along with the two top corners of the cones

sides of the cone, which are, in turn, used to construct virtual lines for the central axes of the rod and the cone. These virtual lines are fed as input into a 4-DOF controller (see Section 2.1) that determines the motions required for visually aligning the axes. We have been able, in our simulated environment, to consistently align the rod with the cone. Once vision has been initialized by a human operator, the servoing algorithm is capable of aligning the stick to within 2 degrees and 3mm of the cone axis, which is more than accurate enough for the needs of the application. Servoing remains reliable despite slow, continuous motion on the part of the cone and motion of the stick towards the cone as commanded by the operator. 5. D i s c u s s i o n We have described a method for feature-based visual servoing and its application to remote teleoperation. The visual servoing methods described above have been tested in stand-alone systems. We are now in the process of integrating them into a larger, more flexible servoing system. We have taken some initial steps toward including visual servoing in the ROTEX environment and other space operations. While visual servoing provides robust, calibration-insensitive primitive operations, even at this early stage, we find that its performance can be greatly enhanced in a variety of ways. First, any remotely operated system will benefit from the availability of artificial visual features, even if the features are not part of a calibrated geometric model. Ensuring that eight such features are always visible is extremely useful for correspondence calculation when the camera system is not well calibrated. Second, the ability to recognize and calibrate the robot end-effector and other objects in the environment to a simulation significantly eases the burden on the operator. Careful design of the visual servoing operations can relieve the operator from the potentially onerous task of specifying visual cues. Third, the availability of a graphical simulation for the visual servoing operation is an extremely useful tool. In addition to providing a means for choosing object fea-

429 tures, it supports operator interaction using hybrid control. The latter often simplifies the specification of a visual servoing task. One issue we have not addressed in any detail is the issue of error handling. The visual servoing methods described above will fail if the system moves through (visually) singular configurations. In many cases, these errors can be detected and corrected for automatically. However, some errors should be flagged and returned to the operator. For example, in ROTEX is it possible for the operator to choose a set of features on the manipulator that lead to a globally singular system. This should be detected and the feature specification rejected with an appropriate explanation of the problem. Another issue is the choice of a set of actions to provide to the operator, actions which should be the most "intuitive" and useful. The geometry used in visual servoing is often non-intuitive, and there are many limitations on what can and cannot be done. Thus, one immediate priority is to implement a preliminary version of the visual servoing interface and to perform simulated visual servoing experiments with the R O T E X system. We are also continuing to extend our understanding of closed-loop visual servoing and its application to robot control.

Acknowledgments The work in this paper could not have been done without the help of Sidd Purl on implementing an initial version of the tracking system, and W-C. Chang and A.S. Morse for help in designing the visual feedback controller. This research was supported by DARPA grants N00014-91-J-1577 and N00014-93-1-1235, by National Science Foundation grants IRI-9109116 and DDM-9112458, by NATO Collaborative Research Grant CRG-910994 and by funds provided by DLR.

REFERENCES 1. P. Anandan. A computational framework and an algorithm for the measurement of structure from motion. Int. J. Computer Vision, 2:283-310, 1989. 2. B. Brunner, G. Hirzinger, K. Landzettel, and J. Heindl. Multisensory shared autonomy and tele-sensor-programming: Key issues in the space robot technology experiment rotex. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Yokohama, Japan, 1993. 3. G . D . Hager. Six DOF visual control of relative position. DCS RR-1018, Yale University, New Haven, CT, May 1994. 4. G.D. Hager. Real-time feature tracking and projective invariance as a basis for hand-eye coordination. DCS RR-993, Yale University, New Haven, CT, November 1993. To be presented at the 1994 International Conference on Computer Vision and Pattern Recognition. 5. G. D. Hager, W.-C. Chang, and A. S. Morse. Robot feedback control based on stereo vision: Towards calibration-free hand-eye coordination. In Proceedings of the 1994 IEEE International Conference on Robotics and Automation, pp. 2850-2856. 6. G. Hirzinger. Rotex - the first robot in space. In International Conference on Advanced Robotics. 1993. 7. G. Hirziager, G. Grunwald, B. Brunner, and H. Heindl. A sensor-based telerobotic system for the space robot experiment rotex. 2. International Symposium on Experimental Robotics, 1991. Toulouse, France. 8. G. Hirzinger, H. Heindl, K. Landzettel, and B. Brunner. Multisensory shared autonomy: A

430

9. 10. 11. 12. 13. 14. 15.

key issue in the space robot technology experiment rotex. IEEE Conference on Intelligent Robots and Systems (IROS), Raleigh, USA, 1992. T. S. ttuang and C. It. Lee. Motion and structure from orthographic projection. IEEE Trans. Pattern Anal. Mach. Intelligence, 11(5):536-540, May 1989. D. Lavery. Perspectives on future space telerobotics technology. In International Conference on Advanced Robotics. 1993. H. Longuet-ttiggins. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133-135, 1981. J. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT Press, Cambridge, Mass., 1992. T. B. Sheridan. Merging mind and machine. Technology Review, pages 33 -40, 1989. C. Taylor and D. Kriegman. Structure and motion from line segments in multiple images. In IEEE Int. Conf. on Robotics and Automation, pages 1615-1620. 1992. C. Tomasi and T. Kanade. Shape and motion from image streams: a factorization method, full report on the orthographic case. CMU-CS 92-104, CMU, 1992.

Intelligent Robots and Systems V. Graefe (Editor) 1995 Elsevier Science B.V.

431

Practical Coordination Control Between Satellite Attitude and Manipulator Reaction Dynamics Based on Computed Momentum Concept Kazuya YOSHIDA ~ Dept. of Aeronautics and Space Engineering Tohoku University Aoba, Sendai 980-77, JAPAN This paper presents a practical control method of robot satellite attitude to cope with manipulator reaction on free-floating space robots, developing the Computed-Momentum based Reaction Compensation (CMRC) concept. The author proposes versions of the CMRC control schemes based on angular momentum conservation in floating multi-link systems, and these schemes require far less computation than the computed-torque based methods. The proposed schemes are demonstrated and examined by computer simulations using a realistic 3D model which involves free-floating dynamics and structural vibration of solar paddles. 1. I n t r o d u c t i o n For successful promotion of space projects, robotics and automation should be a key technology. Autonomous and dexterous space robots could reduce the workload of astronauts and increase operational efficiency. A drawback, however, in controlling space robotic systems is the lack of an inertially fixed footing base in the orbitital environment, unlike ground-based manipulators. Any motion of the manipulator arm will induce reaction forces and moments in its base, then disturb the base position and attitude, and also excite solar paddle vibration. Position change due to the manipulator reaction may not be a major issue, however attitude change is more serious because, for satellites generally, solar and global pointings are essential to keep their function. If a satellite loses its proper orientation, it will lose electric power from solar paddles and tele-commands from the ground stations. Usually a feedback attitude control is working to maintain a proper satellite attitude from disturbances, however in case of a "robot" satellite, the magnitude of disturbance torque due to the manipulator reaction is tremendously larger than the conventional solar pressure or thin-air-drag torques. To cope with the problem, a feedback control which starts working only after the sensors detect the attitude error is not effective, but such class of a new control method as a coordination of the manipulator and attitude dynamics or a feedforward control anticipating the manipulator reaction, is necessary. In this paper, we discuss a velocity level feedforward method, based on momentum conservation dynamics, and develop a Computed-Momentum based Reaction Compensation

432 (CMRC) concept, then demonstrate its practical effectiveness with computer simulations. This paper is organized in the following way: Section 2 outlines a basic modeling of space robot dynamics and Section 3 proposes the CMRC control concept to meet practical mission requirements, then computer simulations are presented in Section 4. 2. M o d e l i n g a n d C o n t r o l of F l o a t i n g R o b o t S y s t e m s In this section, the basic dynamics of an articulated space multibody system is presented. The equations shall be used for derivation of the proposing control law in Section 3, and for solution of the direct dynamics at the simulation in Section 4. To discuss floating robot dynamics, we consider a general model that a robot satellite has plural arms including solar paddles, reaction wheels or other appendages. This is regarded as a chain of links in a tree configuration consist of n + 1 rigid bodies, connected with n articulated joints. Assume that 1 pieces of arms are mounted on a satellite main L body, and the arm k has nk pieces of links, then n - ~k=l nk. Flexible arms like solar paddles can be treated as segmented virtual rigid links connected with elastic hinges. The flexibility yields joint torques on the virtual hinges according to their virtual deformation. However, to avoid complexity, this paper do not explicitly discuss the nature and modeling of flexibility, although it is accounted in the simulation. We assume that the system freely floats in the inertial space, i.e. no external forces or moments apply on the system, or no orbital motion is considered. Let linear and angular momenta with respect to the centroid of the satellite base body (link 0) be P0, L0, respectively, we obtain

Lo

w~og H~ constant.

=

w0

H~r

~b

(1)

where (~ is a joint velocity vector. We then obtain the equation of motion of the floating robot in the following form [1]" 0

1

(2)

where -1

H: I l

eo,

[ H,,,r JT,,,,]

(3)

nk

- E E(x

+ m,k - k T

-k

+ I0

(4)

k=l i=1 l

nk k-k

k

H~r = E E ( I ~ J ~ i + mi rsiJ~i)

(5)

k=l i=1 l

nk

-

(J~, k=l i=1

I~ J~i + m, J~i J~i)

(6)

433

Figure 1

l

A Japanese robot satellite, ETS-VII [2]

nk

J ~ - Z Z m~J~/~

(7)

k=l i=1

jr,

-

[kf • ( ~ - plY), k~ • ( ~ ) - , ~ ) , . . . , . . . , k~ • ( ~ - , 5 ) , 0 , . . . , 0]

(s) (9)

j~, - [k~k~, , . . . , k ,~, o , . . . , o ] ~09 -

~

-

~0

(10)

r0ki - r k - r0

(11)

k m i 9 mass of link i of arm k w 9 total mass of the system (w -- ~-~"k=l l ~-'~i=1 nk m i ) r/k : position vector of centroid of link i of arm k p/k . position vector of joint i of arm k k/k : unit vector indicating joint axis direction of link i of arm k r0 : position vector of centroid of satellite base b o d y rg : position vector of a total centroid of the system B : joint flexibility dependent t e r m r : internal torque vector applied at joints and a tilde operator stands for a cross product such t h a t ?a - r x a. All position and velocity vectors are defined with respect to the inertial reference frame.

3. C o m p u t e d - M o m e n t u m

based Reaction

Compensation

(CMRC)

control

This section reviews requirements for practical robot-satellite and, to meet the requirements, proposes a control scheme coordinating m a n i p u l a t o r operation and base satellite a t t i t u d e based on a C o m p u t e d - M o m e n t u m based Reaction C o m p e n s a t i o n (CMRC) concept.

434

3.1. Basic requirements for space robots Focusing on robot-satellite development of near future, like ETS-VII by NASDA, Japan (see Figure 1) in late 1990's, or possible next-generation models in early 21st century, we must face to the following space borne problems: 9 there is a communication time delay between ground-stations and a space robot, approximately 1 to 10 seconds in case of low orbit operation, 9 the capacity of the communication line is small, 9 the on-board computer resource and the reaction compensator's capacity are limited, 9 the control system is subject to the severe coupling of the manipulator reaction and satellite attitude dynamics, however, the control system should stably maintain satellite attitude orientation using limited resources. Among these problems, this paper concerns the latter two arguments. Looking closely into the requirements, we find that the computational power of an actual flight version of on-board processors whould be equivalent to an i80286 at 10MHz or less, or an earlier version of note-book commercial personal computer. This is because that the design policy of flight models is to make the most use of the well-established hardware resorces to guarantee the proper operation in the space hostile environment, but not to take risks by challenging processor speed or using novel, but unmatured technology. For the same reason, the capacity of the semi-conductor memories is also extremely small. On-board control algorithms, therefore, must be as simple as possible avoiding heavy computation. On the contrary, the dynamic coupling of the manipulation and the satellite attitude imposes complicated and difficult computation upon the control systems. Such dynamic coupling is expressed by the equation of motion (2) or the momentum equation (1): we must compute the inverse dynamic solution from them. The other problem is the limitation of torque/ momentum compensation capacity of reaction wheels. A feedback attitude control is working on normal satellites to correct the attitude error, however the magnitude of disturbance torque due to the manipulator reaction is tremendously larger than the conventional reaction wheels' capability. For instance, the capacity of a flight model available in ETS-VII will be in the order of 0.1 Nm for maximum compensation torque, and 10 Nms for maximum momentum: those values are good enough for the conventional compensation, but too short to cope with the manipulator reaction. To solve these problems, one possible way is to operate the manipulator arms so slowly that the reaction meets the capacity; but it would degrade operational time-cost efficiency and require intolerable patience to operators. The other, and better, solution is to employ a class of feedforward control which anticipates the manipulator reaction.

3.2. C o m p u t e d - m o m e n t u m concept In order to obtain a feedforward signal, a computed-torque method which is based on the inverse dynamics solution is commonly used in the field of manipulator control. However in the space robot systems, the robot comprises much more complicated floating-link

435 Sample & Hold

Manipulator Joint PD Control Tcmtuc

CMRC}.~~I j

~'m

Torql~

_

+ ~

xw JReaction

Satellite Attitude PD Control

F i g u r e 2 A block-diagram of the proposed feedforward and feedback scheme with Computed-Momentum based Reaction Compensation (CMRC) concept.

dynamics like equation (2) and the on-board computational power is limited as discussed in the previous subsection, computed-torque method hence may not be acceptable. There is an attempt to develop an efficient computation method [3][4], but it still requires further power beyond the available on-borad processors. To cope with it, we propose the use of the momentum conservation equation (1) in stead of equation (2), because the momentum equation has a linear form at velocity-level, much simpler than the equation of motion at acceleration-level. Even though a velocity-level description, the momentum equation fully expresses system dynamics. In contrast with the Computed-Torque method, we name our methodology Computed-Momentum concept. Equation (1) can be solved for the system angular momentum L0. l

Lo - Iswo + E Imk~ k

(12)

k=l

where l Is -

nk

E E ( I Y - m ik-k~-k s i 5i)

(13)

k=l i=0 n

link -- ~ ( I ~ J ~ -

k k m~k -rg~J~)

(14)

i=1

Eventually, since the rightest term of equation (12) comprises manipulator arms and 9 reaction wheels, we use the symbols Ira, (Pro for the arm and Iw, (P~ for the wheels. Then we have

Lo - Isw + Ira(Pro + Iw(Pw,

(15)

436 where w0 = r for simplicity. Equation (15) is a linear and simple expression at velocity level, not using acceleration, but it involves the reaction dynamics in terms of momentum conservation. It thereby plays an important role in the following Computed-Momentum schemes. By solving (15) for the satellite angular velocity w, we obtain

w = I~-l(Lo- Imam - I~qb~)

(16)

where L0 is a known and constant inintial momentum. Equation (16) is a solution for the attitude disturbance velocity when the manipulator and/or the wheels make an arbitrary motion. Or (15) can be solved for the wheels' angular velocity ( ~ , then ~b~ = I~X(L0- I~w - Imam)

(17)

We can use the latter for computing the required velocity of reaction wheels for a given attitude maneuver w and manipulator motion 4)m. 3.3. P r a c t i c a l r e a c t i o n c o m p e n s a t i o n c o n t r o l s c h e m e Reliability management policy for the space development requires us to make most use of conventional, mature, then reliable technologies and subsystems. Compatibility with the existing systems and modularity are also recommended. From this point of view, for sattellite attitude control and manipulator joint control, the conventional PD or PID feedback modules should be most supposed to use. To meet it, we propose a control system for the coordination of satellite attitude and manipulator operation, based on individual satellite and manipulator PD (can be PID) feedback control modules. F i g u r e 2 shows a block diagram of the proposed control system. It is based on a parallel instration of PD feedback modules for each joints and wheels, but these modules are connected with a feedforward loop which signal is calculated on the computed-momentum scheme. A box with "CMRC" is the part of the feedfoward computation, in which the anticipation value for reaction wheels 4)z is determined when the manipulator motion Cma and

~md are given. There are two possibility for the CMRC calculation from eqs.(16) or (17). We then propose two versions of Computed-Momentum based Reaction Compensation control methods, say CMRC-A and CMRC-B. Combining the feedback law and equation (16), the torque for the reaction wheels (reaction compensators) should be

"rw -- K l ( a d - a ) + K2(wd- w ) + K3~)z

(18)

~++ - I~-~(Lo - I m q b m a - I : $ = )

(19)

for CMRC-A. Where ft is the satellite attitude angle and w, ( ~ are satellite attitude and reaction wheels angular velocities, which all are measured by on-board sensors, and K1,2,3 are control gains. The suffix a indicates a "d"esirerd control command given by an operator or an arm motion planner.

437 For a zero-momentum, inertial-fixed sysytem, becomes much simpler. Tw : - K l a

Lo =

~ d -~ • d

-" O,

then the control law

(20)

-- K2og - K 3 l s l (lm~)md -}- Iw~w)

Note that the proposed system (Figure 2) accepts any motion commands for the manipulator, not requiring a special motion planning strategy. The manipulator command ~bmd and ~md therefore can be determined arbitrary by a certain motion planner or a human operator. For CMRC-B, the feedforward signal is obtained as the error correction of wheels' velocity between an expected one (17) and a measured value ~bw, then ~II becomes (21)

~bff -- I w l ( L 0 - Im~md -- IsC~d) -- ~w For the zero-momentum system, = -Ux

-

-

(22)

+

which is very similar to CMRC-A (20), but the inertia matrices work differently. In the block diagram, there are sample/hold switches on the manipulator command and the feedforward lines to synchronize the timimg of these signals. The control flow with the switches is described as follows: 1. At an initial state, turn off the switches and set the manipulator motion and feedforward commands, zero. 2. Give the manipulator command ~bmd and ~md from a trajectory planner or a joystick, then compute ~bff using eq.(19) or (21). 3. Evaluate the feedforward torque component torque capacity, i.e. IK3

zl >

K3~ff:

if it exceeds the reaction wheels'

(23)

then return to the arm motion planner and modify the operation command so as to meet the reaction compensation capacity. (If the mission will not allow to change the trajectory physically, the only re-planning solution is to move the arm slower.) 4. After the modifications, if the condition

]K3dpff[ < T~m.~

(24)

is satisfied, turn on the switches and put the manipulator motion command ~)md, ~md, and the feedforward command ~bz into each PD control loops, simultaneously.

438 3.4. E s t i m a t i o n of c o m p u t a t i o n a l b u r d e n In order to emphasize the advantage of the proposed computed-momentummethod, we roughly estimate the computational burden and compare it with the previously reported computed-torque method. At each computational cycle, we need the coordinate transformation of each position vectors and inertia tensors, and preparation of Jacobian matrix. It requires the following times of multiplication/division (M) and addition/substruction (A) operations. For transformation et al. (21n + 9)M (21n + 6)A For Jacobians (43n + 16)M (25n + 4)A Then, calculate the inertia tensors Is and Ira. For Is of eq.(13) (12n + 21)M (21n + 12)A For Im of eq.(14) (7.5n 2 + 7.5n)M (6n 2 + 6n)A The final step is the computation of the feedforward signal. For CMRC-A at zero-momentum

+ 39)/ (2n + 21)A For CMRC-B at zero-momentum (3n + 9)M (2n + 9)A In total, we need the following number. For CMRC-A (7.5n 2 + 86.5n + 85)M (6n 2 + 75n + 43)A For CMRC-B (7.5n 2 + 74.5n + 34)M (6n 2 + 54n + 19)A In case of n - 6, For CMRC-A For CMRC-B

874M + 709A 751M + 559A

Equation (20) resembles (22), however we should note that the latter does not need the computation of I; -1 which requires (12n + 42)M and (21n + 28)A including a 3 • 3 inversion. Instead, I~ 1 is constant once the wheels are designed, then no computation needed: this is the advantage of CMRC-B. Mukherjee and Nakamura [3] reported the computational cost of their efficient scheme of computed-torque method. Their estimation is for manipulator control, not for the attitude control, but those cost must be similar and the comparison would make sense. In case of n - 6, they say 2,259M + 1,992A. Yokokoji et al. [4] recently made a minor

439 improvement although, the burden is still in similar number. However, in our estimation of the CMRC methods, even though no special technique or optimization algorithm is employed, the computational cost of the proposed c o m p u t e d - m o m e n t u m scheme is about one-half to one-third or less of the computed-torque scheme.

F i g u r e 3 Simulation model TWIST

BEND

SWING

I

= -(Kr + D~b) K = Ipw~ D = 2Ipw,( wd = ~/I - ( 2 w .

F i g u r e 4 Model for flexible solar paddles

4. S i m u l a t i o n S t u d y In order to demonstrate the practical performance of the proposed control method, we have carried out relatively detailed numerical simulation with a realistic 3D model. The model comprises a single 6-DOF, 75 kg, 2.5 m manipulator arm; 3-axes, 0.2 Nm20 Nms class reaction wheels; and 1.2 kilo-watt class solar paddles. The total mass of the robot satellite is 2000 kg, and the wing span of solar paddles is 28.5 m. A topology

440 of the model is depicted in F i g u r e 3 and a detailed specification is listed in T a b l e 1. Zero-momentum system is assumed here. For the simulation, direct dynamics solution is computed using equation (2) for a set of torques of manipulator joints ~-m and reaction wheels 7-w commanded by a control law. Numerical integration is carried out in every 0.01 simulational second. The solar paddles are treated as a series of rigid links connected with rotational compliant hinges, as mentioned at the beginning of Section 2. For simplicity, but not loosing a practical sense, we consider the first modes of flexible deformation about three principle axes: bend, swing, and twist. Each solar paddle therefore has three virtual, rotational joints at its root, axes of which joints are perpendicular to each others, like F i g u r e 4. The torques passively generated by the flexibility are modeled based on the following simple expression -

-

+

T)

"k

(25)

where the joint number i = 1, 2, 3 corresponding to the twist, swing, and bend axis, respectively, and the arm number k = 1, 2 indicating right and left paddles. K:i is virtual joint stiffness representing natural flexibility of the paddles, :Di is corresponding damping factor. These values are selected to express a realistic natural frequency of vibration, 0.1 Hz, 0.5 Hz, 1.0 Hz for each axis as shown in T a b l e 2. In the following simulation, we suppose a manipulator operation that the hand straightly moves over the satellite surface for 1.5 m distance in 100 s, carting with a 50 kg orbital replacement unit (ORU), see F i g u r e 3. For more reality, sensory delay and computational lag-time are considered. We suppose 1.0 s time-constant in the attitude sensor for the measurement of ~, and 0.5 s time-interval of the feedforward computation, i.e. the switches in the block diagram turn on and the corrsponding sample-hold values are refreshed in every 0.5 s. F i g u r e s 5 (a)-(d) show a series of the simulation results. Due to the page limitation, the attitude responses only around the pitch, a longitudinal axis of the solar paddles in which the largest reaction is observed, are presented here. (a) shows the case with no attitude control. After the 100 s manipulator operation, the satellite attitude receives about 6 deg change due to the accumulation of the arm reaction acceleration, and it will not return to the initial orientation. The right hand figure is a close-up of the attitude vibration. After the manipulation, 0.1 Hz vibration due to the solar paddle bending flexibility is excited, but its amplitude is less than 0.01 deg then this vibration will not cause a severe effect to the system. (b) shows the case with PD feedback control, using K1 = 100,/(2 = 1000 for attitude control gains. The attitude is going to recover zero finally, although, relatively large overshoot is observed. Maximum amplitude of the orientation change is about 3 deg. This amount of pointing failure may not be severe for a low gain antenna, but very critical for a high gain antenna. To get this response, the reaction wheel generates more than +0.5 Nm torque. If we have a limitation of the torque capacity, the attitude response becomes much worse. (c) shows the case with CMRC, the proposed feedforward control. The attitude response has been drastically improved, resulting almost zero attitude disturbance. Both of proposing CMRC-A and CMRC-B show similar results with a gain selection of K1 = 10,

441 Table 1 Base II mass (Kg) mo link 0 II 1800.0

II

[ a r m II (Kg)

I link link link link link link

1 2 3 4 5 6

II

Simulation model parameters

length (m) length l w i d t h l h e i g h t

201

length (m)

15.0 15.0 15.0 8.0 8.0 14.0

0.30 0.70 0.70 0.18 0.18 0.43

R.W.

,, (Kg)

0.15 0.35 0.35 0.09 0.09 0.20

II mi wheel 1 wheel 2 wheel 3

201

5.0 5.0 5.0

mmt. of inertia ( Kgm 2 )

Iox I Iov [ 25 5ooo15ooo1

Ioz 5ooo

mmt. of inertia ( Kgm 2 )

0.15 0.35 0.35 0.09 0.09 0.20

(m) radius 0.1 0.1 0.1

0.1219 1.8375 1.8375 0.0100 0.0266 0.2245

0.1219 1.8375 1.8375 0.0266 0.0266 0.0175

0.0188 0.0188 0.0188 0.0266 0.0100 0.2245

(Kgm 2 )

Iix I 0.025 0.025 0.025

Iiy I Iiz 0.025 0.025 0.025

0.50 0.50 0.50

T a b l e 2 Specifications for solar paddle vibration Solar II (Kg) Paddle .~ twist 5.0 bend 0.0 swing 50.0

II

length (m)

~

I ~1 b~

3.0 1.5 0.0 0.0 10.0 5.0

1.5 0.0 5.0

mmt. of inertia ( K g m 2 )

Iiz 0.0063 0.000 37.500

Iiy I 3.753 0.0C9 416.70

II ( K g m 2IP ) I (Hzf~ I (Kgm2/s2)~) twist 37.5 0.5 370.11 bend 1666.7 0.1 657.98 swing 1704.2 1.0 6.73 x 104 note: k = (2~fa)2I.

Iiz 3.753 0.000 454.20 ~ I

0.042 0.0048 0.023

442

%

8o[/

101.22-b

101.21 -

~

101.2o-

40

0

= lO1.19 -40 t

~

-80 0

I

!

I

100

200

300

101.18 101.17

400

0

100

TIME [sec]

200

300

400

TIME [sec] (a) No a t t i t u d e control

%

8O

0.4

40

0.2

E

0 /

"~-

~

-40 -

t-,

-80 0

o.o

r

-0.2 -0.4

!

I

I

100

200

300

400

0

100

TIME [sec]

300

400

TIME [sec]

(b) P D feedback control ( K 1 -

%

200

100,/{2 = 1000)

0.4

80-

,.., E

40-

z

0-

0.2

o.o

r

t--,

-40

-0.4

-80 0

-0.2

I

I

I

100

200

300

400

0

100

TIME [sec]

!

I

200

300

400

TIME [sec]

(c) Proposing C M R C control (K1 = 10, K2 = 100, K3 -- 5000 for CMRC-A, or K1 -- 10, K2 = 100, K3 = 0.1 for CMRC-B) 80-

0.4

0

40

,..-, E

,"r" (,J

0.2

o.o

0 -40

-

-80

-

0

-0.2 -0.4 I

I

I

100

200

300

400

0

!

I

I

100

200

300

400

TIME [sec] TIME [sec] (d) C M R C control with 0.2 N reaction wheel torque s a t u r a t i o n F i g u r e 5 Simulation result (attitude response in the pitch axis supposing 100 s O R U - t r a n s p o r t manipulation)

443 /(2 = 100,/(3 = 5000 for CMRC-A and K 1 - - 10, /(2 = 100,/(3 = 0.1 for CMRC-B. (d) shows the case with CMRC supposing 0.2 Nm reaction wheel's torque saturation. Even for such a limited hardware resource, the attitude disturbace is maintained -+-2 deg and it quickly returns to zero after the manipulator operation. 5. C o n c l u s i o n In this paper, we have discussed practical requirements for robot satellites and proposed a new control scheme: Computed-Momentum based Reaction Compensation (CMRC) method, in order to relieve the computational burden but to offer relatively good performance for satellite attitude maintenance anticipating the manipulator reaction. The availability of the proposed method, especially its practical implementation aspect, has been confirmed by a series of numerical simulation assuming a realistic flight-scale model. In addition, the computational cost of the proposed schemes is estimated, then a far advantage to the computed-torque control is shown. The cost estimation and simulation proves that the proposed method works remarkably well to anticipate and compensate the manipulator reaction, although employing a relatively simple control algorithm and conventional-scale reaction wheels. REFERENCES

1. Space Robotics: Dynamics and Control, edited by Xu and Kanade, Kluwer Academic Publishers, 1993. 2. M.Oda et al: "The ETS-VII, the World First Telerobotic Satellite," Proc. Artificial Intelligence, Robotics and Automation, in Space '92, pp.307-318, 1992. 3. Ft.Mukherjee and Y.Nakamura: "Formulation and Efficient Computation of Inverse Dynamics of Space Robots," IEEE Trans. on Robotics and Automation, vol.8 No.3, pp.400-406, 1992. 4. Y.Yokokoji, T.Toyoshima, and T.Yoshikawa: "Efficient Computational Algorithms for Trajectory Control of Free-Flying Space Robots with Multiple Arms," IEEE Trans. on Robotics and Automation, vol.9 No.5, pp.571-580, 1993.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

445

A 5-Axis Mini Direct Drive Robot for Teleoperation and Biotechnology Manuel Moreyra, Pierre-Henry Marbot, Steven Venema, Blake Hannaford Department of Electrical Engineering, University of Washington, Seattle, WA 98195-2500 b 1ake @ isdl.ee, w ashin gton. edu

ABSTRACT A previously developed 3-axis mini direct drive robot has been enhanced with two additional direct drive axes for general positioning and orientation of an axially symmetric tool. The arm has a work volume of about 120 cm 3 and 5-10 ~tm or better resolution and repeatability. The arm forms an initial prototype for the NASA/University of Washington MicroTrex flight telerobotics experiment. The contemplated terrestrial applications include handling submicroliter liquid samples for electrophoresis, and micro-manipulation with scaled force reflection.

INTRODUCTION In recent years teleoperation has greatly expanded its scope of applications from its beginnings in the nuclear industry and its early expansion into undersea operations. Space and terrestrial biomedical applications are now attracting increasing attention. In both of these areas, in addition to the desirability of removing humans from sources of physical risk (the vacuum of space or highly infectious agents for example), there is increasing interest in miniaturization. In space, economic and other pressures are shifting the emphasis from relatively heavy, high cost and complex systems, towards low mass, low cost systems with fewer functions and higher launch frequency. In surgery, similar pressures are encouraging replacement of traditional operations with endoscopic ones which are much less invasive. In both space and medical applications, communication links and time delays will play a key role in the overall system design. This paper will report on a new manipulator designed to explore the possibilities and challenges of small scale manipulation in these applications (Figure 1). The arm is a five axis direct drive design having a work volume of about 120 cm 3 . The arm is designed as a prototype for use in the NASA Micro Telerobotics Flight Experiment, "MicroTrex".

446 Time Delay. Some time delay is unavoidable in the communication links between master and slave manipulators in telemanipulation. If it is of sufficient magnitude, this delay effects both the control system dynamics and the human operator performance. The effects of time delays on telemanipulation performance have, for the most part, been well studied. For a recent review, see Hannaford (1994). One exception is the case of time-varying delays. Variable delays characterize all of the existing communication channels between earth and low earth orbit. These channels were designed solely for reliable accumulation of telemetry and imaging data and were not designed to latency requirements.

Flight Experiments. The ROTEX experiment was conducted by Dr. Gerd Hirzinger in May of 1993 (Hirzinger, et al., 1993). In this experiment, a human scaled, gear driven arm, flown in the Space-Lab D2 was successfully controlled from the Space-Lab flight deck, as well as from a ground control station. The arm described in the present paper is a preliminary prototype being developed for the NASA Micro Telerobotics Experiment, MicroTrex. This joint experiment between the University of Washington and the Jet Propulsion Laboratory will launch a small robot manipulator into low earth orbit and control it from the ground to perform an end-to-end test of groundbased remote control of space robots. The robot will perform tasks designed to simulate key activities of future robotic missions. The long term aim of this experiment is to provide design information for future micro and full size telerobots which will be required for human exploitation of space. In the shorter term, results are expected to benefit users and maintainers of the Space Station as well as terrestrial domestic biotechnology researchers and industries. A realtime video link to local schools is planned to make MicroTrex an educational and motivational experience for future engineers. Currently, MicroTrex has been approved for Phase-A funding under the NASA In Space Technology Experiments Program (INSTEP). Direct Drive Robots. DD robots have several advantages for manipulation including high precision and high speed (Asada & Kanade, 1983, Asada & Youcef-Toumi, 1987). Most of the previous research in direct drive robots has concerned manipulators aimed at industrial applications and having work volumes on the order of human arms. Recently we have been studying the properties of direct drive robots with much smaller dimensions. In earlier work (Marbot & Hannaford, 1991) we described a 3 axis direct drive manipulator with a work volume of about 30 cc, aimed at applications in manipulation of cells and very young embryos in the biomedical laboratory. This arm was also adapted for pick and place telemanipulation of small objects on the order of individual sand grains (Bhatti, et.al., 1992). For the current arm, we have redesigned and re-manufactured the first three links and have added 2 additional degrees of freedom. We were able to obtain high performance actuators for the robot from computer hard disk drives. Voice coil and "flat coil" actuators, used for positioning the heads of disk drives, must meet many of the same design criteria as for direct drive robots, and are mass produced at low cost. Recently we have made detailed measurements of their performance (Buttolo et. al., 1994). Laboratory Applications. Several potential biomedical applications drive this design. One

447 is loading sub-microliter fluid samples onto miniaturized electrophoresis gels for isoelectric focusing analysis. This job will require precision placement of a micro-pipette into liquid sample vials as small as 2 mm. in diameter. We intend to experiment with collecting samples from high density (864 well) micro-titer plates (molded sample vial arrays). These new plates occupy the same space as a standard 96 well plate (82 by 125 mm) with 9 times the well density. Accessing the wells will require 2 additional degrees of freedom beyond the original three since the pipette must be oriented along the major axis of the well. Arbitrary angles may be required to avoid collision with microscope objectives or other equipment. A sixth axis ("tool roll") is not needed in this application due to the axial symmetry of the pipette. The following sections will describe the Mini DD robot at its current state of development. First, the robot structure and mechanical design will be described. The design is analyzed in terms of the kinematic model, the dynamic model, and the Jacobian matrix. Then the control system will be described, and the current project status will be summarized.

MINI-ROBOT MECHANISM The first axis actuator is a linear voice coil from a 5.25 inch hard disk drive. The second joint is driven by a rotary magnetic actuator. The third axis actuator is a rotary voice coil. The body of the mini-robot contains 20 individual parts machined of aluminum and anodized.

Figure 1 Photograph of the Mini Direct Drive Robot

448 The original 3 axis design was modified in order to add two more degrees of freedom. Low inertia of the parts and high resolution and repeatability of the end effector displacements were design goals. The resulting mechanism allows generalized motion of the end-effector in five degrees of freedom: three displacements, and two rotations (pitch and yaw). Biomedical applications, such as electrophoresis procedures and laboratory sample manipulation, were used as a guideline in the current development. An additional goal was to produce a device suitable to act as a slave manipulator for experiments in scaled force reflection (Colgate, 1991, Hannaford, 1991, Kobayashi & Nakamura, 1992). The project attempts to extend the technology of mini-robotics with several particular emphases: 9 Employ direct drive actuation to maintain accuracy and repeatability, enhance force control and avoid backlash. 9 Provide good dynamic performance by minimizing inertia. 9 Use miniature disk-drive components, to get high precision, low inertia and low cost. 9 Make the two orientation axes intersect at the wrist. The electrophoresis sample handling procedure to be performed by the mini-robot consists of: 9 Picking up 500 l.tl and smaller protein samples from a well as small as two millimeters in diameter with a pipette attached to the wrist, 9 Placing the sample on the electrophoresis gel, 9 Washing and rinsing the pipette.

WORKSPACE DESIGN The mini-robot workspace is reduced to an unusual extent by the limited range of the direct drive rotary actuators. Optimization of the performance of the mini-robot involved obtaining as large a work space as possible without compromising speed, force and precision of end effector movements. The work space chosen was approximately 80 mm height, 68 mm width and 25 mm length. This will allow the robot to reach about 18 of the wells on a standard 96 well microtiter plate. This will be sufficient for the intended experimental applications. The workvolume is approximately 120 cm 3. The overall design of the first three axes is the same as reported earlier (Marbot & Hannaford, 1991). However, the parts were re-designed and re-machined to improve precision and to support the addition of two more motion axes. The same linear voice coil actuator was used for the first, linear motion axis. Two high accuracy, low friction linear ball bearings, also taken from disk drives, guide and support the first motion axis. The second joint is driven by a rotary magnetic actuator. Its angular stroke is _ 15 o and the arm is 152mm long, giving a Y-axis travel of +40 mm. The third axis actuator has angular stroke of + 13 o giving a Z-axis travel of +34 mm.

449

The fourth and fifth joints have +20 ~ of stroke. We do not consider them to affect work volume since they drive intersecting axes at the wrist. However, there is a m i n i m u m offset from wrist center to tool mount of 22 mm. Due to the low torque of actuator 3 and the relatively high weight and size of the actuator and encoder used on the fifth joint, direct drive in the strict sense of the term is not used. Instead, a belt and two pulleys are used to drive this joint with a 1:1 mechanical advantage.

ACTUATORS Table 1 shows the basic mechanical and electrical characteristics of the mini-robot actuators. The first axis actuator is a linear voice coil from a 5.25 inch hard disk drive. The second joint is driven by a rotary magnetic actuator (BEI Motion Systems Company). It has four coils in the rotor and four ceramic magnets. The peak torque is 0.08 N m when the coils are aligned between the magnets. The torque vs. angle curve is equivalent to cos 2 (202). Its total weight is 100 g. This actuator was not taken from a hard drive. In retrospect, this unit gave inferior performance at greater cost. A disk drive actuator would have had an almost constant torqueangle relation and lower mass. The third axis actuator is a rotary actuator is also made by BEI (model No. RA23-06-001), but for a disk drive application. The moving member which includes the coil weighs 6.3 g and the total weight is 46 g. The fourth and fifth joints use head actuators from 1.8" disk drives. A minor alteration was made to the actuators so that they could drive a 6.35 m m diameter shaft. Their weight is 20 g each.

Table 1 Actuator characteristics for each joint

Actuator 1

Stroke +12.5mm

Force or Torque Constant

Maximum Force or Torque

Coil Resistance Ohms

Current @ Max. Load

Type of Actuator

2.50 N / A

5.0 N

3.4

2.0 A

linear voice coil

2

+15 ~

39.0 x10-3 Nm/A

78 x10-3 Nm

5.2

2.0 A

rotary solenoid

3

+13 ~

40.0 x10-3 Nm/A

42.4 x10-3 Nm

6.8

1.05 A

rotary voice coil

4

+20 ~

13.6 x10-3 Nm/A

8.9 x10-3 Nm

5.9

0.65 A

fiat coil

5

+20 ~

13.6 x10-3 Nm/A

8.9 x10-3 Nm

5.9

0.65 A

fiat coil

450 MECHANICAL DESIGN The output of the first four actuators are directly coupled to their links, eliminating the mechanical gearing or other transmission elements. Other advantages accomplished by this direct drive configuration are no backlash, low friction, high stiffness and low weight. In the first four joints, each actuator case and actuator stator is mounted on the preceding link. Joints three, four, and five are located in the shoulder (Figure 2). The fifth joint is driven by an elastic belt and two pulleys and its actuator is mounted on the third link, non-adjacent link. This configuration reduces the mass on the wrist which would have generated much higher inertia and lower resonance frequency. The belt has enough elasticity to allow the rotation of the fourth joint. The mini-robot has the following mechanical features: 9The arm is used as a shaft to drive the fourth link (wrist). With this configuration, low inertia, high torsional stiffness and relatively high structural resonance is accomplished. This joint is directly driven although its actuator is distant from the wrist. 9The center of gravity of the mini robot moving parts is close to the shoulder. This creates a nearly balanced mechanism in which the dynamic forces in each link are almost decoupled from the dynamic forces of the remaining links. 9The rotation of the wrist joints (including a 0.5 g pipette) affects the accumulated inertias

Figure 2. Shoulder assembly.

451 on the shoulder revolute joints by +0.5 %. Therefore, in the calculation of the shoulder dynamic forces, the wrist could be considered in a fixed position for practical purposes. 9High stiffness without sacrificing low weight and low inertia was accomplished. The lowest structural resonance occurred in the arm at about 83 Hz. This structural resonance could be increased by 70% to 144 Hz by replacing the aluminum tubular arm with a steel tubular arm. A stranded Aramid filament is used as transmission belt for the fifth link,. The resulting natural frequency is about 237 Hz.

MINI-ROBOT KINEMATICS

Figure 3 depicts the mini-robot kinematic model with its workspace. We use the terms "shoulder," "arm" and "wrist" to describe the parts of the mini-robot. The shoulder is the area around axes 1, 2, and 3. The axes of the first two joints intersect, but the third one is displaced 12.5 mm along the positive direction of d I . The wrist is at the intersection of the axes of joints four and five. The arm is the link between the wrist and the shoulder. Using the DenacitHartenberg parameters from Table 2 and the standard procedure, we obtain:

Fig. 3

Mini-robot schematic diagrams. Kinematic linkage diagram (a), top view (b), and side view (c) of mechanism and workspace.

452 Table 2 Mini Robot kinematic parameters

i

~

ai

di

Oi

(deg

(mm)

(mm)

(deg)

Home Position

Motion Range

1

0

0

d1

0

d 1 = Omm

-12.5 < d I < 12.5mm

2

-90

0

0

02

02 = _90 ~

_105 ~ < 02 0 and s - 6 + A.e, the position error converge asymptotically to zero [6], [16]. The global asymptotic stability for the proposed adaptive controller is then ensured. Another important property concerns the convergence of the estimations to the true parameters which is not necessary with our controller [17]. The objective of the passive feedback controller is hence not the linearization but the preservation of the passivity of the system.

4. DECENTRALIZED ADAPTIVE CONTROL DESIGN Non-linear controllers perform good results. However they mainly depend on the rightness of the mathematical model structure used to describe the dynamic behavior. Moreover, these control laws require many operations which, for some applications, can't reasonably be computed in real time. That's the reasons why a simple adaptive decentralized control seems at this time more appropriate to our problem. Dealing with the compromise of keeping a simple control scheme and taking advantage of the passivity property to ensure the asymptotic stability, we have opted for the decentralized adaptive control proposed by Seraji [9]. First, we display the conventionnal feedback applied to our system and proceed to adjustment of the control gains. Then this control law, with some added signals, is made adaptive inspired by Seraji's approach. 4.1.

Conventional

Feedback

State feedback approaches are usually applied on pneumatic systems [4][5]. The system state can be defined by the state vector (q,t:l ,z). The output "t is obtained from the measure (APp-APn) and the known parameters S and g,. In what follows, we use the same error signals as defined in equation (14). From the way we present the model, we propose a state feedback control in two stages. First, desired position and velocity are used to compute the required supply torque and then, we proceed to a pressure control in order to define the servovalve input current i 9 rd-Kpe

+ Kv6

i-Ge r Kp, Kv and G are 2*2 positive diagonal gain inatrices.

(24)

521

Figure 5. State feedback controller

This control is implemented in discrete time using 8=(~/-1)/Te,the delta operator for approximation of the derivation. (/is the forward operator and T e the sampling period. This can be drawn as in (Figure 5). We use an experimental procedure to determine the control gains. First, we consider as desired trajectory a slow sinusoid for each joint (period = 10s) and put the vector G at a very low value (10 -4) so that no oscillation appear. Then, we adjust the gains Kp, K v, to minimize the position and speed errors, and to limit the current i in the servovalves saturated at i=+20mA._ Finally, we rise the vector gain G which has a stabilizing effect on the system. We repeat these last two steps till we obtain the vector gain G maximum without oscillations and small tracking errors. We repeat the procedure for faster desired trajectories to optimize the control gains and bandwith. Simulation results are presented in section 5. 4.2. Decentralized adaptive control design The method proposed by Seraji [9], is based on the decomposition of the model into n subsystems each of them representing the dynamic behaviour of one joint. These subsystems are interconnected by the Coriolis, centrifugal and gravity terms. The controller, said decentralized, is then composed of n control laws acting on each subsystem. The aim of this section is to apply the method to our system which includes the actuator dynamic. First, the model is rewritten as follows: Mi(q)qi +di(q,C'l,q)= r i ii-Ji

zi+Bi ri-E

(2s)

/li

with

J-1/Ji, B-Bi/Ji, ' i E'i-Ei/Ji d i represents the gravity, Coriolis, centrifugal torques for the ith joint and also the inertial effects from the other joints. As J', B' and E' are diagonal matrices (there are no coupling effects between the actuators), J'i, B'i and E' i are the diagonal terms. The independant joint controllers are built in two stages: one concerning the leg dynamic and the other the actuator dynamic. They are designed in order to ensure the asymptotic

522 convergence of the tracking position and speed errors to zero. This leads to the following control laws drawn in figure 6. r d _ f + K p e + K v 6 + k I ~[d (26) i-Ger+F6+k

2 (1d +k 3 z-d +k 4 ~.d

The index i is dropped and the gain variations coming from the "adaptive laws" block in figure 6 not represented for convenience.

Figure 6.Decentralized control diagram e, ~, e~ are the tracking errors defined by (14). All the gains f, Kp, Kv, F, G, k 1, k2, k3 and k 4 are updated on line according to adaptive laws. These laws are determined using Lyapunov method. The definition of two adaptive errors emerge from the demonstration proposed in appendix B. rl=Ler

(27)

and r 3 = P I e + p 2 6

The adaptive laws appear to be, after simplification, integral type adaptation algorithms as determined for the non-linear controller: f ' - 6 ( p I e+P2 6),

I(p - f l I(Pl e+P2 6).e,

I~ v - / 3 2 ( P l e+P2 6) 6, l~3-g4(Ler)

rd,

G=g2(Ler)

er

l ~ l - f l 3 ( P l e+P2 6) ~d, 1~2 _ g l ( L e r )

1~4 - g S ( L e r )

i-d,

l:'-g3(Ler) 6

qd

(28)

523 Some important remarks have to be mentionned concerning the stability proof developped in appendix B. First, it is assumed that the terms M and d vary slowly in comparison with the controller gains. Bad behaviour is then to be expected when dealing with high velocities. Moreover, the stability proof is based on an error model considered not disturbed and with null initial conditions. This implies that the desired trajectories must be taken with initial errors on speed and acceleration to zero. The asymptotic stability demonstrated is therefore a local stabiliO~. This implies that the desired trajectory must be slow enough to stay in the stable zone Finally, the control laws, even if they are computationally easy to implement, involve many gains to adjust. The number of gains increases with the number of joint. However, in our case the number of gains remains reasonable as we only have two joints. Initial gains are taken to zero except for Kp, K v and G which are initialized at the values found with the conventionnal controller presented in section 4.1. It means that, at the beginning, the adaptive controller will behave as the conventionnal controller. The tracking errors will then decrease owing to the adaptive laws. In spite of all these critical remarks, this controller remains attractive mainly because it do not include the complex dynamic model while ensuring the asymptotic stability. As for the nonlinear adaptive controller presented in section 3, the adaptive laws are based on the observation of the error variations. Convergence to the true parameters is hence not required.

5. C O M P A R I S O N VIA SIMULATION RESULTS Tile performances of these control laws have been studied in simulation. In both cases, the control is computed in discrete time and the system is simulated as continuous system using Simnon software. For each kind of control presented in this section, we use the same desired trajectory for comparison purposes, in tile form q d _ ( j r / 6 ) . ( l - c o s ((o.t)) with c0 = 3 rad.s -1. This trajectory guarantees null velocity t - 0 as advised in section 4.2. The results are presented only for the second joint. We get similar results for the first joint. First, we present the obtained results for position, velocity, tracking errors and the input current i of the servovalve, with corresponding gains and initial conditions. Then we propose a comparative study of the three controllers. 5.1. Simulation results

5.1.1 Conventional feedback As first simulation tests, we apply the conventionnal feedback presented in section 4.1. The sampling period for control is Te = 5 ms. The gains adjusted according to the procedure exposed in this section are: K p = 8 0 , Kv = 0 . 7 a n d G =

2.5 10-2.

524

q2 qd2 (tad)

8.1~2

t.

e2 ~a~

8 0.5

--4~.r-~ -o .q~l

D

|

Time (s)

Time (s)

~/12 qd2 (~d,'s)

"'21 e 2 ( t a d / s )

t, O -15. i

-5

Time (s)

Time (s) i 2 (A)

-1. lg-31

-2. ] : - 3 t e

:~ ~ Time (s)

Figure 7. State feedback controller This controller has the advantage to be the less complex one. However, the stabilit), is lost when dealing with high velocities or when the initial conditions do not match the desired ones. Moreover, the procedure taken to adjust gains is quite difficult to implement because of the oscillations which may appear when high gains are used. We are restricted to narrow closed loop bandwith. 5.1.2 Non linear passive controller

We then apply the non linear controller presented in section 3. The sampling period is Te = 3ms and the gains are adjusted to: Fv = 1, A = 20, G = 20, Fj2=I0000, FB2=0.02, FE2=0.9, Frn2-~. 1 and FI2 = 10 -4, with the following initial conditions: J2 = 3000, B 2 = E2= m2 = 12 = 0.

525

q2 qd 2 (rad) |

O

.0~1

9 2 (tad)

--2tt Time (s) 9

Time (s)

.

o

2. q2 qd 2 (tad/s)

e . 2 ] e2 (tad/s) 8.1

--'!! 15

_

:

--Q

,

e

4

s

8

-0

te

o

:~

Time (s)

~ d Time (s)

d

lh

,I i 2 ( A )

1---:I! -2.

0

'~ 4 w

i

6

O

110

Time (s) Figure 8. Non linear controller The sampling period must be quite low when the non linear controller is applied. The

acc~tracy, demanded and got, is actually high; this involves many computations. The complexity can be reduced with a study of the magnitude order of each parameter, and the removing of the less significant ones. Anyway, a hardware architecture fast enough to enable on line computations of this passive control is required. We can notice that when the initial conditions do not match with the desired one, the system retrieves quickly the desired trajectory. The same remark is suitable when adding a disturbance. This is due to the global asymptotic stability property peculiar to the passive non linear controller.

5.1.3 Decentralized adaptive control Finally, we present the results got with Seraji's approach applied to our system. The sampling period is Te=5ms. The starting velocity and position errors are zero and initial conditions for gains are the values obtained with the first controller:

526 Kp = 80, K v = 0.7, G = 2.5.10-2 and the other ones set to zero. The adaptive errors are set to. r 1 = e x and r 3 = 3.e + 0.5.6 and weighting gains to: 5 = I0, 131 = 132 = 40, 133 = 2 , gl = 0.1, g2 = 2, g3 = 0.8, g4 = 0.01. q2 qd2 (tad)

e 2 (tad)

t

S .El

Q

-B . 8 t

:~

m

,9

d

d

,b

e

e

4

Time (s)

s

a

ts

Time (st

,sq2 qd 2 (tad/s)

e.z

1

e2 (rad/s)

O.t

8 -i

-o. 1 --0. 6

~,

~t

d

6

~tb

Time (s)

,e

Time (s)

--5. E - 3

d

~

,~ d d tb Time (s) Figure 9. Decentralized controller As for the conventional controller, this controller looses stability at high velocities (o3 = 5 rad/s). This is due to the assumption of slowly time-variant parameter M(q) and d (q, t~, ti), which is no more verified in fast motion ases, and to the needed high gains. Another reason is that when the tracking errors are too high, it makes the system go out of the stable zone (because o f the local stability property). Variations of gains are then not sufficient to track the trajectories. We must underline that the weighting gains are difficult to adjust. Stability may also be lost when a too high gain is applied or when the initial conditions are too different from

527 the desired ones. This controller is hence not advisable for fast motions and not suJ.licieutly rohusl. These seem to be the most significant limits for the decentralized adaptive controller.

5.2. Comparative study Some remarks suitable for all the three controllers presented in this paper, concern the adjustment of gains. We observe that gains related to actuators are lower than the leg dynamic ones This can be justified by the different magnitude order involved in the two stages Weightings related either to gains for the decentralized adaptive controller or to parameters for the non-linear one, are adjusted in order to make their variation move fast enough without too much oscillations. These gains are smaller when the non-linear controller is applied, this emphasizes the fact that passive controller is more robust. Another general remark concerns the desired trajectories. No sufficently exciting property is advised for estimation as it is the case for identification algorithms. The controllers perform well whatever the desired trajectories are, chosen in function of the task to achieve, but the control parameters may be more difficult to adjust, bandwith limitation may appear. General comments emerge from the curve comparison. As prescribed, the decentralized adaptive controller behaves first as the conventional one and then improves its performances by adaptation. When the first period has gone by, the position tracking error of the two adaptive controllers are similar and about four times less than for the conventional one. Regarding the speed errors, we can note that we get better results with the non-linear controller. However, higher magnitude oscillations are observed during the first period. Applied input current is the same order of magnitude for the three controllers after many trials. Let us compare for our system the number of operation to be computed on line. For the non-linear controller, it represents about 134 additions or mutiplications with 4 sine or cosine functions to compute. For the decentralized controller, only 102 basic operations are required, that is to say about 23 % less. This ratio increases tremendously with the number of degree of fieedom and of parameters to estimate. Arguments for choice can be classified in two main points: - 7he t)clssive uoll-liuear adaptive coturo/ler has the properO~ to pelJorm well whatever the .weed rcmxe is. However, as it iuchtdes the complex uon-linear mode/in the coutrol kin's, it requires high computation power to be implemented ill real time. - The decentralized adaptive coulroller is attrative because of its simple control scheme and heuce implementation fitcilities. However, many gains are difficult to adjust and the stahiliO., is threateued when high veh)cities are required. A compromise must therefore be done. Either robustness towards disturbances and velocity range is necessary and then the nonlinear controller is advised, requiring tast computation property. Or the desired velocity is not too high and the hardware architecture less developped, and the decentralized adaptive controller is more appropriate. In our application, we intend to make our robot move fast keeping in mind that the stability of the control law is fundamental. Non-linear control is then advised. On the other hand, we expect to add one joint located at the "hip", which will increase the non-linear complexity.

528 Moreover, our hardware architecture is currently simple due to cost concern. These arguments make us opt for the decentralized controller which keeps the number of operations reasonable. Anyway, lower velocities will have to be taken into account. In conclusion, the two controllers, although having different properties, are interesting depending on the application, and no general choice can be taken.

6. CONCLUSION In this paper, a dynamic model of a pneumatic actuated leg is proposed to design adaptive control laws. Two kind of adaptive control approaches are applied and compared. The stability analysis of these two controllers are grounded on the passivity property of the system. Simulation results reveal the limits of the decentralized adaptive controller which looses stability at high velocities. On the other hand, the non-linear controller, although some non significant parameter may be removed, remains computationally complex. A compromise is then necessary to make a choice depending on the hardware architecture, the number of joints and the velocity range. The procedures developped in this paper are under implementation. For future works, the authors intend to investigate Force/Position control problems applied to pneumatic legged robots.

REFERENCES

1.

H.EI. Gamah, P.F.S. Amaral, J.G. Fontaine and J. Rabit, "Motion Control of an electropneumatic driven legged robot," Workshop on Motion Control, Peruggia, Italy, 1992. 2. Villard, P.Gorce, and J.G. Fontaine, "Study of the dynamic behavior of Ralphy," IEEE/RSJ b~ternational Cot~erence on h ltelligent Robot and 5"ystems, IROS'93, Yokohama Japan, pp 1765-1770, July 1993. 3. C.Villard, P.Gorce, and J.G. Fontaine, "Legged robot: How to solve quasi-dynamic situation with a coordinator lebel," hlternational Co1?ference on Advanced Mechatronics, ICAM'93 Tokyo Japan, pp 610-614, August 1993. 4. S. Scavarda, "Les asservissements 61ectropneumatiques de position," Ed. Hermes, 1989. 5. J. Pu, P.R. Moore, and R.H. Weston, "Digital servo motion control of air motors," htt. Journal Prod. Res, Vol. 29, No. 3, pp 599-618, 1991. 6. I . D . Landau, and R. Horowitz, "Applications of the Passive Systems Approach to the Stability Analysis of Adaptive Controllers for Robot Manipulators," Int. J. qf Adaptive Control and Signal Processing, Vol. 3, 1989. 7. J . J . E . Slotine, and W. Li, " On the Adaptive Control of Robot Manipulators," The International Journal of Robotics Reseamh, Vol. 6, No. 3, pp 49-59, 1987. 8. S. Dubowsky, and D. T. DesForges, " The Application of Model-Referenced Adaptive Control to Robotic Manipulators," Journal of Dynamic Systems, Measurement, and Control, Vol. 101, pp 193-200, September 1979.

529 9.

10. 11. 12.

13. 14.

15. 16. 17.

H. Seraji, " Decentralized Adptive Control of Manipulators: Theory, Simulation, and Experimentation," IEEE Transactions on Robotics and automation, Vol.5, No.2, April 1989. I.D. Landau, "Adaptive control- The model reference approach," Ed. Marcel Dekker, New-York, 1979. J.L. Shearer, "Study of Pneumatic Processes in the Continuous Control of Motion with Compressed Air, I, II," Trans. A,S3/IE, pp 233-249, February. 1956. A. Kellal, "Contribution a l'etude des asservissemants electropneumatiques. Application a la realisation d'un asservissement nuln6rique d'un axe de robot," These, INSA Lyon, Juillet 1987. E. Dombre, and W. Khalil, "Modelisation et commande des robots," Ed. Hermes, 1988. S. Liu, and J.E. Bobrow, "An Analysis of a Pneumatic Servo System and Its Application to a Computer-Controlled Robot," Journal of Dynamic ~S)~stems, Measurements, and Control, Vol. 110, pp 228-235, September 1988. S.J. Lin, and A. Akers, "Dynamic Analysis of a Flapper-Nozzle Valve," .lourmU of Dynamic Systems, Measurements, and Control, Vol. 113, pp 163-166, March 1991. N. K. M'Sirdi, and A. Benallegue "Adaptive and Robust Controller for Robot Manipulators with Flexible Joints", Proc. of ECC, Grenoble, July 2-5 1991. N.K. M'Sirdi, M. Guihard, and J.G. Fontaine, "Identification and control of pneumatic driven robot, " IEEE Systems, man and Cybernetics Confi,mnce, pp 722727, October 1993.

Appendix A In this section, we demonstrate the stability of the non-linear adaptive controller by the passive system theory. A system which verifies the Popov's inequality between its input u(t) and its output y(t) is passive. Popov's inequality is expressed as follows: j-~, yT(t).u(t).dt > - y 02

with 7' 02 < oc

The main inrerest of such systems lays in the tact that the parallel or the feedback connection of two passive systems remains passive. The Equivalent Feedback System represented in figure 4 will be passive if the tour interconnected blocks are passive, as they are in feedback connections. Remark: in the case of known parameters, the closed loop scheme is modified by removing the two feedback PAA blocks. The dynamic robot links block (M.g+(C'+Fv).S) is passive since it verifies Popov's inequality:

s 0, R = ~

> o

~ - t?~ ( x - x~) - / ~ f0~(b -

b~)dt

(12)

Where K =/)-1/~T/5 and t5 is the solution of the matrix Riccati equation. The posture and velocity control system is realized by a linear state feedback and f~edforward controller as equation (12) (Figure 4). 4. S t e e r i n g c o n t r o l When the robot spins itself to change direction on a plane, ie. ~ r 0, the value ~ affects the dynamics of balancing control, and equations (1) abd (2) are no longer exact. However, we can assume the robot's dynamics keeps equations (1) and (2) approximately, when the value ~ is small. Also, we regard the robot system as consisting of two independent inverse pendulums which are mounted on both wheels, which can be controlled independently. We can apply the Power Wheeled Steering method for heading control of the wheeled inverse pendulum type mobile robot based on these assumptions. The algorithm to control such a non-holonomic system should be divided into two steps: (a) Control to make a vehicle track the reference translational velocity v and directional angular velocity (b) Determination of locomotion velocity v and directional angular velocity ~ of the vehicle based on the deviation between the current position and direction, and the reference trajectory.

644

O +

"/dtl I

-lg(

+

i ref )

+D, + !0 U~,.

US

xs

0 ref " f( 0 ref) ~

Wheeled

O ~ I P " I ~1

+l

Inverse Pendulum

Figure 4. Block diagram of the posture and velocity control system

Rr

RI

i ! i

L

i

6"

i i

Figure 5. Parameters for a vehicle controlled by the PWS method

X

645

Step (a) is discussed in this section, and step (b) is described in next section. In the case of the power wheeled steering type vehicle (Figure 5), the moving velocity v and directional angular velocity ~ relate to the left and right wheel's rotational angular velocity 0t, 0r as in equation (13). (13) T

-Z-

Where L is the distance between the left and right wheels, and Rz and Rr are radius of the left and right wheels. The steering and moving velocity control of the robot are realized by controlling both the left and right wheel's rotational angular velocity. The reference rotational angular velocity for the posture and velocity control system (Figure 4) is given as equation (14).

Orref

!

~

r

L

)

(~4)

The block diagram of the control system to perform step (a) is shown in Figure 6. The inclination angle 6 and angular velocity 6 are obtained from a gyro sensor, which are commonly used for the controller of both wheels.

.........................................

Vref

~....

=1,..~.]

~

~

O"

L~'~ right wheel [ irreii I controller

r

Flr

-~ --~0 + +

posture and velocity controller

ref

[ 0,

0 ,i;;~........... : ..... :;i,;?, i .,,w,...

coo,ro,,er I i

"

...........................

" i

-. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

;

Figure 6. The heading and velocity control system

"

-

V

646

5. Straight line tracking control To make the robot track along a given trajectory in a plane, the reference value of moving velocity vrcf and direction angular velocity ~rc~,- must be calculated. In this section, we describe that algorithm.

Figure 7. Configuration of straight line tracking control

5.1. Heading control for straight line tracking Let us consider the case where the robot is to track along a straight line on a plane passing through a point (Xo, yo) with angle r from the x axis. The robot's current by dead reckoning or some position and direction are estimated as x(t), y(t) and r other positioning system. At first, we represent the robot position and direction on the - ~1 coordinate, on which the ~ axis denotes the given straight line to track. The robot position and direction is denoted by ~(t), v(t) and a(t). Then, the problem is formulated as "decide the angular velocity ~ ( t ) s o that ~(t) and c~(t) go to zero asymptotically, while v(t) remains constant and ~(t) increases with given velocity". So the problem can be treated as a regulator problem after linearization based oil the assumption that ~7(t) and ~(t) are small 9 The reference directional angular velocity at the next sampling instant should be calculated from negative feedback of ~(t) and a(t), ie:

Cre:(t +

At) -- -kv'rl(t) - k ~ ( t )

(15)

647

Velocity

Desired velocity v~

!

tO

ta

td

ts

Time

i~.--.~1 Acceleration section

Deceleration section

Figure 8. Reference velocity for the robot

5.2. Translational velocity control for starting and stopping A sudden change of the reference velocity for starting and stopping may cause the robot to be unstable. Hence, the reference velocity is given in three stages: the acceleration section (to- ta), the ordinary operating section (ta- td), the deceleration section (td- t,), as shown in Figure 8. The reference velocity to start and stop are calculated as follows: 1. Calculation of the reference velocity in acceleration section The reference velocity is given as equation (16).

(16) where, At is sampling interval, a is given acceleration ratio and vd is the desired velocity from the locomotion control command. 2. Calculation of reference velocity in deceleration section for stopping The calculation of reference velocity to make the robot stop at the goal position exactly is not a simple problem. The starting time of deceleration can not be predicted exactly because of the effect of response delay time or steady-state error in the control system. Letting the velocity and position at any time t be v(t) and Xstop where robot should stop are given by

,(t) = a - ( t , -

t)

x(t),

v(t) and the position

(17)

648

X~top -- x(t) +

ft ts a.

(t~ - t)dt

(18)

Where t~ is the time when tile robot will stop. The velocity v(t) caI1 be rewritten from equations (17)and (18).

v(t) - v/2a(Xstop- x(t))

(19)

Hence, if the control system has no response delay time and steady-state error, the reference velocity can be calculated as follows.

v,,f(t + At)-

{ v,,:(t) " v(t) < 1 2 a ( X , t o p -

v~z(t ) -a.

x(t)) A t " v(t) >_ 12a(Xstop - x(t))

(20)

To avoid the run-over at the stop position X~top caused by such effects, we propose a formula to calculate the reference velocity for stopping as equation (21).

Vref(t + At) -- { vr~f(t) " v(t) < v/2a[(Xstop - c ) - x(t)] vr~i(t ) - a . A t " v(t) >_ v/2a[(xstop - e) - x(t)]

(21)

where e is an appropriate small value. 6. L o c o m o t i o n c o n t r o l c o m m a n d s y s t e m A command system named "HYS", which is used to define the reference trajectory and other information for this type of robot has been designed. It is similar to the Spur [5] command system for autonomous mobile robots proposed by our group. The usual HYS commands denote the coordinate parameters on the two dimensional plane to define the line or position. Each command is represented as a function in tile C language. The important HYS commands are as follows: 1. Straight line command: HYSJine_LC(xun~, yun~, Czin~) To define the straight line along which the robot should track. 2. Stop command: HYS_stop_LC(xstop, y~top, ~b~top) and HYS_stop() To define the position and direction in which the robot should stop and emergency stop. 3. Spin command: HYS_spin_LC(~bsp,i,~) To make the robot spin to an angle of

Cspin from the

x axis.

4. Position adjustment command: HYS_adj_pos_LC(x,ej, y~ej, Cadj) To replace the internally estimated position by tile given parameters which are usually determined by detecting the environment with the external sensors.

649 5. The other commands HYS_set_vel(v)" To set the reference moving velocity of the robot,. HYS_set_angv(~)" To set tile reference directional angular velocity for the spin motion. HYS_get_pos_LC(&x, &y, &~p): To monitor the current position and direction. HYS_get_vel_angv_pos(&v, &~/~,&r To monitor the current moving velocity, directional angular velocity and posture of the robot.

Using these commands, the navigation program can easily control the robot trajectory in real time based on the map information and external sensory data.

7. Experiments using Yamabico 7.1. Experimental system and determination of parameters The robot used for experiments was a wheeled inverse pendulum type self-contained toobile robot Yamabico Kurara from which tile front and rear casters had been removed from the standard type Yamabico robot. Yamabico are a series of self-contained autonomous mobile robot platforms for experimental research which were developed by the authors' group. Figure 9 shows a self-contained autonomous inverse pendulum type experimental robot "Yamabico Kurara" which is controlling itself to keep its own balance. Tile robot has rotary encoders (resolution:2000) and a vibration type gyro sensor (TOKIMEC Co. TFG-160) to detect wheel rotation and the body's inclination angular velocity. Both wheels are driven by DC motors of 10 Watts. Ultra sonic sensors to recognize the environment are attached on the front, left and right sides of body. By using the parameters given in Table 1, the coefficient matrices /l and /) of tile augmented system (5) are found to be:.

~_

/ 0 100 / 38.23 -51.76 0

-0.085 0.347 0

0.121 -0.56 1

0 0 0

/)_ '

//o

-03.7 56.02

In the posture and velocity control experiment, the weighting matrices of performance index J were fixed as Q - d i a g ( 1 , 0.01,0.01,4) and/~ - 300. As a, result, feedback gains of the control system represented in equation (12) were given as/~1 - ( - 9 . 0 , - 1 . 5 8 , - 0 . 1 2 3 ) , /c2 - -0.115, and in this case poles of closed-loop system were calculated as A1, A2 (-6.177 + 0.119j) and A3, A4 - (-1.529 + 0.1412j). The paprameters k~, ks in equation (15) were determined to be 0.00457 and 1 respectively. An MC 68000 micro-processor was used as a controller, and all computation was performed in a fixed-point representation. The sampling interval of the control program is 5 miliseconds. The motor input current is controlled using tile feedforward current control method [6].

7.2. Experiments 1. Experiments for the posture and velocity control At first, the step response experiment to test the performance of the posture and

650

Figure 9. The inverse pendulum type self-contained mobile robot Yamabico Kurara

velocity control system was performed. One of the results is shown in Figure 10, which shows that the robot can track the reference velocity while keeping its balance. Figure 11 shows the experimental scene oil sloped rough ground. Tile result is shown in Figure 12. In spite of the roughness and slope of tile road, tile robot tracked the reference velocity while keeping balance of posture. This result shows that the proposed algorithm is robust enough for the indoor environment. 2. Experiments to verify our assumption on appling PWS method We assume that the steering control can be done independently without serious effects upon the balance of the body if the rotational angular velocity is limited to small values. We checked the validity of this assumption by examining the posture of the body at corners A and B when the trajectory as shown in Figure 13 is given for the robot. We found that the robot could track the given trajectory and maintain the stability of posture while performing a turning motion. The result is shown in Figure 14. 3. Experiments for trajectory tracking and position control Figure 15 shows the result of trajectory tracking control experiment when the desired straight line is given for the robot moving with velocity 25[cm/s] at point A and B on the x - y plane. Figure 16 shows one of tile experimental result using the reference velocity is calculated by algorithm as equation (21) for position control. 4. Experiments of indoor navigation

651

Figure 10. Experimental results for posture and velocity control

Figure 11. Experimental scene oil sloped rough ground

652

Figure 12. Experimental results for posture and velocity control on sloped rough ground

Corne~~.~ 200 cm -~IC~ rner A

20

\

.]

...................... "~'

,...........

.....

I ,o o,

~~__] MovingDirection J

Figure 13. The trajectory to test posture control in turning motion

653

Figure 14. Experimental results of posture stability in turning motion

We also performed an experiment in indoor navigation to test the performance of the total control sytem and the validity of the command system. The Intelligent Robot Laboratory at the University of Tsukuba was used for as the experimental environment. Figure 17 shows the robot moving in a real environment. 8. C o n c l u s i o n

In this paper, we proposed a control algorithm to make the inverse pendulum type mobile robot move in a plane, with the desired constant velocity, and while keeping the balance of posture by itself. The control algorithm was implemented and tested on the experimental robot. Through experiments using the robot, we concluded: 1. In spite of modelling errors, parameter variation, and the accumulated errors in the inclination angle, the robot can move robustly at the desired velocity and while keeping its balance. 2. It is possible to separate the steering control from the posture and velocity control without locomotion velocity and direction angular velocity being constrained too much.

654

Figure 15. Experimental results for trajectory tracking control

Figure 16. Experimental results for position control

655

Figure 17. Experiment in indoor navigation

3. Sensor based navigation, while keeping its balance, has been realized using the implemented command system. REFERENCES

1. K. Yamafllji and T. Kawamura, "Postural control of a monoaxial bicycle", Journal of tile Robotics Society of Japan, Vol.7 No.4 pp.74-79, 1988 (in Japanese). 2. O. Matsumoto, S. Kajita and K. Tani,"Attitude estimation of the the wheeled inverted pendulum using adaptive observer", Proc. of 9th Academic Conf. of tile Robotics Society of Japan, pp.909-910, 1991 (in Japanese). 3. E. Koyanagi, S. Iida, K. Kimoto and S. Yuta,"A wheeled inverse pendulum type self-contained mobile robot and its two-dimensional trajectory control", Proc.of ISMCR'92, pp.891-898, 1992. 4. J. Medanic and Z. Uskokovic, "The design of optimal output regulators for linear multivariable system with constant disturbances", Int. J. Control, Vol.37, No.4, pp.809830, 1983. 5. S. Iida and S. Yuta,"Vehicle command system and trajectory control for autonomous mobile robot", Proc. of the 1991 IEEE Int. Workshop on Intelligent Robots and Systems, pp.212-217, 1991. 6. S. Iida and S. Yuta, "Feedforward current control method using two-dimensional table for DC servo motor software servo system", Proc. of the 1988 IEEE Int. Conf. of Industrial Electronics, pp.466-471, 1988.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

An Outdoor All-purpuse Takao

OKUI

Robots System Platform and Yoshiaki

657

for

Autonomomous

SHINODA

FOURTH RESEARCH CENTER TECHNICAL RESEARCH and DEVELOPMENT JAPAN DEFENSE AGENCY 2-9-54

Mobile

Fuchinobe,Sagamihara,Kanagawa

INSTITUTE

229 J a p a n

1. A B S T R A C T A u t o n o m o u s g r o u n d robots for exploring n a t u r a l e n v i r o n m e n t s are developed in our r e s e a r c h center, along with t h e r e c o g n i t i o n of t h e c o n f i g u r a t i o n s and t h e obstacles. The t h r e e key technologies for such a robot are t h e s t u d y of t h e locomotion m e c h a n i s m , the a u t o n o m o u s s y s t e m a n d t h e m u l t i p l e robot control. This p a p e r will discuss the definition of r o u g h t e r r a i n a n d t h e e s t a b l i s h m e n t of t h e f u n d a m e n t a l precondition based on t h e r e s u l t s of t h e e x p e r i m e n t s we conducted. It will also propose an a u t o n o m o u s system for robots t a k i n g not only its c o n f i g u r a t i o n s into c o n s i d e r ation, b u t w e a t h e r , l i g h t i n g and t e m p e r a t u r e conditions as well.

2. 1 N ' I ~ O D U C T I O N

The goal of our work is the d e v e l o p m e n t of an a u t o n o m o u s s y s t e m t h a t does not depend on t h e controlled condition of i n d u s t r i a l s e t t i n g s , b u t functions in an u n s t r u c t u r e d outdoor e n v i r o n m e n t . The difficulties to r e a l i z e t h e a u t o n o m o u s system are not only that robots h a v e t h e l i m i t a t i o n of t h e c a p a c i t y of t h e h a r d w a r e i t self ( e . g . , u n c e r t a i n t y and error), b u t also that the architecture and t h e d a t a s t r u c t u r e for robots r e m a i n s unsolved. Recently, m a n y a r c h i t e c t u r e s for a u t o n o m o u s i n t e l l i g e n t robots are being proposed and d e m o n s t r a t e d . The C a r n e g i e Mellon N a v l a b is n a v i g a t e d on roads. [1] The C a r n e g i e Mellon A m b l e r was developed in r e s p o n s e to a u t o n o m o u s e x p l o r a t i o n of p l a n e t a r y a n d l u n a r surfaces. [2-4] The M I T M o b i l e Robot group d e m o n s t r a t e d complex control

658 s y s t e m s , o t h e r w i s e k n o w n as s u b s u m p t i o n a r c h i t e c t u r e , for a u g m e n t e d f i n i t e s t a t e m a c h i n e s . [5][6] H o w e v e r , a u t o n o m o u s r o b o t s h a v e y e t to be completed. Our approach

to t h e a u t o n o m o u s s y s t e m was

to f i r s t c o n s i d e r t h e

d e f i n i t i o n of r o u g h t e r r a i n . R o u g h t e r r a i n ( F i g . l ) c o n s i s t s of a v a r i e t y of a r t i f i c i a l a n d n a t u r a l objects, such as t r e e s and ditches, a n d also c h a n g e s in c o n f i g u r a t i o n , w e a t h e r , l i g h t a n d so on. We h a v e a n a l y z e d a n d defined t h e t e r r a i n o u t d o o r s as shown in Fig.2. It is obvious t h a t m a n y k i n d s of robots locomote on such a r o u g h terrain autonomously. Therefore, we h a v e e s t a b l i s h e d t h e f o l l o w i n g preconditions"

Figure 1: Rough Terrain Soil-Rock 9Kind of grain 9Water content 9Wet density

- - Configuration

Shape 9 9 9

9Ground bearing power Rough Terrain

Obstacle 9Natural obstacle (weed,forest,river etc.) -Artificial obstacle (building,wall,etc.) Figure 2: The Definition of Rough Terrain

659 1) T h a t t h e robot is as small and light as possible and has high mobility against rough terrain. Therefore, we i n t r o d u c e d t h e concept of giving the robot the ability to change p o s t u r e . 2) T h a t the robot has a map which describes the rough t e r r a i n between the s t a r t point and the goal point. However, the map is rough and does not h a v e complete information on obstacles, w a s h boards, changes in configurations, and so on. 3) T h a t the robot has a d a t a b a s e of its postures which p a r t i a l l y describes how to t r a v e r s e rough t e r r a i n . 4) T h a t a man indicates to the robot the s t a r t and goal points in which the robot will a u t o m a t i c a l l y create a global p a t h and will locomote along t h a t path. By u s i n g a model (Fig.3), we s t a r t e d the autonomous system, which does not depend on a locomotive mechanism, u n d e r the above p r e c o n d i tions. We h a v e n a m e d this robot the Autonomous Mobile A l l - p u r p s e Platform (AMAP). In this paper, we are proposing an a r c h i t e c t u r e for an a u t o n o m o u s systemfor AMAP predicated on the b e h a v i o r - b a s e d concept. This c o n cept is becoming widely adopted. We h a v e extended t h i s and are c o n s t r u c t i n g the b e h a v i o r - b a s e d intelligence h i e r a r c h y system. The r e m a i n d e r of this paper is organized as follows: In Section 3 t h e s y s t e m concept is described. The b e h a v i o r of robots t h a t l o c o mote on r o u g h t e r r a i n are discussed. In Section 4 s y s t e m s t r u c t u r e is p r e s e n t e d . This s t r u c t u r e was constructed by decomposing the a r c h i t e c t u r e into t a s k - a c h i e v i n g modules along the flow of the f u n c tion. Conclusions follow in Section 5.

Figure3:

AMAP

660 3. S Y S T E M

CONCEPT

A robot m u s t execute a diverse a r r a y of b e h a v i o r s in order to move on an o u t d o o r r o u g h t e r r a i n . (Fig.4) T h r o u g h our e x p e r i e n c e s of o p e r a t i n g AMAP on a r o u g h t e r r a i n model and a n a l y z i n g the r e s u l t i n g behaviors, we developed a concept for t he a r c h i t e c t u r e . As a result, we h a v e classified t he b e h a v i o r s into six t a s k s to create t h e a u t o n o m o u s s y s t e m as follows: 1) Global path planning P l a n s t h e global p a t h by u s i n g a map a n d m a k e s i m p r o v e m e n t s upon t h i s map. 2) Task management Selects from t h e following four t a s k s and sensor control for o b t a i n i n g t h e i n f o r m a t i o n t he robot r e q u e s t s . 3) Locomotion on unknown rugged configurations Recognizes a r o u g h t e r r a i n in f r ont of t h e robot a n d m a k e s m o t i o n p l a n n i n g for t r a v e r s i n g r u g g e d c o n f i g u r a t i o n s w h i c h are not described on t he robot's d a t a b a s e . R u g g e d c o n f i g u r a t i o n s are defined as t he t o p o g r a p h y t h a t t h e r o b o t in t h e basic p o s t u r e can not t r a v e r s e b u t by c h a n g i n g p o s t u r e can.

Figure 4: The Outdoor Autonomous Locomotion

661 Global path planning / / Task management \ ]Locomotion on unknown\ /rugged configurations \ ] Locomotion on known \ / rugged configurations ~

o = /

+~ +~= .~

Obstacle avoidance

/ Locomotion . on plain f!e!d s

I

Figure 5: B e h a v i o r - b a s e d Intelligence Hierarchy System 4) Locomotion on k n o w n rugged configurations Recognizes configurations in front of the robot and makes motion inferences by selecting from the database, which describes these rugged configurations and the way to traverse them. 5) Obstacle avoidance Detects and avoids rough t e r r a i n t h a t the robot can not locomote and traverse. 6) Locomotion on p l a i n fields Moves along the global path, conducts real time control, and detects changes from plain fields to rugged configurations or obstacles. Through analyzing these tasks (Fig.5), we have created a b e h a v i o r based intelligence h i e r a r c h y system t h a t deals with a diverse group of requested behaviors. [7]

4. S Y S T E M S T R U C T U R E

Before building the system flow on the basis of the previous system concept, we introduced five function stages. The Stages are sensing, processing, recognition, judgement and control. The sensing stage consists of sensors' hardware for obtaining the data about the environment around the robot. The processing stage is the extraction of signals from the sensors. The recognition stage is the generation of information for judgement. The judgement stage consists of motion planning to indicate where to go and what to do. The control stage constitutes motion control depending on the m e c h anism of the robot. In each t a s k and stage we have developed modules by considering the flow of data and distributivity. Therefore creating a system flow. (Fig.6) Each module is as follows:

662

Figure 6: The Flow of Autonomous System

1) Location ..... M a k e s an e s t i m a t i o n of the robot's c u r r e n t location a n d m a k e s a n y corrections to localize the robot's position. 2) Contact force Detects t h e a m o u n t of contact force for soft t e r r a i n . 3) Range image Acquires r a n g e images. This module is a k i n d of p i x e l - l e v e l fusion t h a t can be u s e d to i n c r e a s e the information content associated with each pixel in an image formed t h r o u g h a c o m b i n a tion of m u l t i p l e images. We are o b t a i n i n g t h e s e i m a g e s by u s i n g a t i m e - o f - f l i g h t - t y p e laser r a n g e finder (Fig.7) and C C D c a m e r a s . 4) Contact Touches c o n f i g u r a t i o n s and obstacles t h e e x t r a c t i o n of contours. We m a d e a r u b b e r a c t u a t o r t a c t i l e sensor w i t h e l a s t i c i t y to r e act to sudden changes in configuration. (Fig.8) 5) Edge extraction E x t r a c t s t h e b o u n d a r y b e t w e e n plain fields and otherwise, and sends t h e i n f o r m a t i o n to Caution. [7] 6) Region segmentation I S e g m e n t s i m a g e d a t a f r o m C C D c a m e r a s by a n a l y z i n g t h e t e x t u r e of t h e photo to detect t h e a r e a s w h e r e t h e robot can not locomote. 7) Pattern recognition Detects and recognizes moving objects.

663

Figure 7: Laser Range F i n d e r

F i g u r e 9: Region S e g m e n t a t i o n

Figure 8: Tactile Sensor

Figure 10: Polygon Approximation

8) Region segmentation II S e g m e n t s r a n g e d a t a into plain fields, convexs, concaves and blind spots. We h a v e defined s e g m e n t a t i o n by h e i g h t . (Fig.9)

[8][9] 9) Polygon approximation A p p r o x i m a t e s the boundaries of regions t h a t are obtained by Region s e g m e n t a t i o n I and lI to a convex hull (fig.10) and l a bels each region as "possible to move" or "impossible to move".

664

10) Polyhedron approximation A p p r o x i m a t e s the shape of configurations and obstacles by u s i n g inclinations in r e l a t i o n s h i p to a polyhedron. [9] 11) Data fusion Recognizes t h r e e dimensional shapes, moving objects and s o f t ness. [10] This module is a k i n d of f e a t u r e - l e v e l fusion t h a t can be used to both increase the likelihood t h a t a f e a t u r e , e x t r a c t e d from the i n f o r m a t i o n provided by t h e sensors, is r e c o g nized, and to c r e a t e a d d i t i o n a l composite f e a t u r e s for use by t h e system. It also is a s y m b o l - l e v e l fusion t h a t allows t h e i n f o r m a t i o n from t h e m u l t i p l e sensors to be effectively used t o g e t h e r at the h i g h e s t level of abstraction. 12) Adjancent detection Detects changes a r o u n d the robot by using tactile sensors and supersonic sensors, and produce obstacle fuzzy vectors t h a t e x press the distance and direction of obstacles. 13) Caution Detects a c t u a l changes in plain fields to m a k e a robot stop and produce caution fuzzy vector ( F i g . l l ) from obstacle fuzzy v e c tors. This module is a k i n d of s i g n a l - l e v e l fusion which r e f e r s to a c o m b i n a t i o n of signals from a group of sensors. Their objective is to provde a signal t h a t is u s u a l l y of t h e same form as the original signals, but it is of g r e a t e r quality. 14) Local path planning P l a n s a subgoal, which describes t h e position and t h e direction t h e robot should locomote, in the s e n s i n g a r e a of the robot and guides the robot to avoid obstacles with Reaction module. 15) Selection of action patterns Selects from t h e d a t a b a s e of actions (Fig.12) to t r a v e r s e r u g g e d configurations.

F i g u r e 11: Caution Fuzzy Vectors

F i g u r e 12: D a t a b a s e of Actions

665 16) Generation of action patterns Generates sequences of posture by using the posture database. 17) Task selection Makes a j u d g e m e n t of the task based on the ability of locomotion and the r e s u l t of recognition. 18) Map improvement Matches the global and local maps and makes i m p r o v e m e n t s based on the obtained e n v i r o n m e n t a l information. 19) Global path p l a n n i n g Generates a global path on the map. 20) Reaction Reacts to avoid rugged configurations and obstacles in a m a n n e r s i m i l a r to a h u m a n ' s unconditioned response by s y n t h e s i z i n g the planned path and the reaction vector. (Figl3) 21) Locomoting control Stabilizes the body of the robot in order to deal w i t h small changes in configuration. Locomotes by obeying speed and direction commands. 22) Posture control Changes control of posture by obeying the posture command. 23) Sensor selection Selects sensors to obtain information the robot needs. 24) Sensor control Controls the direction, the v i s u a l field and the m a g n i f i c a t i o n of the sensor to obtain the necessary information. We have constructed the e x p e r i m e n t a l system in our l a b o r a t o r y in order to study the system flow. (Fig.14) All of the devices, which belong to Sun's engineer work station, are connected by E t h e r n e t to condct p a r a l l e l d i s t r i b u t e d processing and to increase the e f f i c i e n cy of programming. The signals fromthe sensors are t a k e n in t h r o u g h f

I I_~nnedPath ~ , I fns';CF.uction

Y P

Robot [ / / -

I _ o ~

X

React ion Vector J

Figure 13" Reaction

666 Body External sensor

CCD image processing] ....I

9 CCD

9Laser range finder 9Tactile 9Ultrasonic .

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Internal sesor 9 G sensor

9 F.0.G. 9 Inclinometer 9 Enc0der Motor ammeter

Range imageprocessing Tactile data processing ]

i

Environmental recognition

~

~ ~

Ultrasonic data pr0cessingh

ii

" ] ~j -I

Self-localization

Robot model _-7 Locomotion control

Figure 14: The Experimental System t h e V M E - b u s a n d a r e p r o c e s s e d by devices i n s t a l l e d w i t h r e a l t i m e UNIX. On t h e c o n t r a r y , t h e s i g n a l s to t h e r o b o t are p r o c e s s e d by devices i n s t a l l e d w i t h r e a l t i m e U N I X a n d are t a k e n out.

5. C O N C L U S I O N In t h i s p a p e r , we d e f i n e d r o u g h t e r r a i n , d i s c u s s e d t h e a u t o n o m o u s s y t e m f o r r o b o t s w h i c h l o c o m o t e on r o u g h t e r r a i n , and proposed the behavior-based intelligence hierarchy system which shows t h a t d i v e r s e sets are feasible. We c r e a t e d t h e A u t o n o m o u s Mobile A l l - p u r p o s e P l a t f o r m ( A M A P ) w i t h t h e a b i l i t y to c h a n g e its p o s t u r e a n d e q u i p p e d A M A P w i t h e x t e r n a l s e n s o r s such as t h e CCD a n d t h e l a s e r r a n g e f i n d e r a n d i n t e r n a l s e n s o r . We e x p e r i m e n t e d on s e v e r a l m o d u l e s w h i c h i n c l u d e L o c a t i o n , R a n g e i m a g e , C o n t a c t , Edge e x t r a c t i o n , R e g i o n s e g m e n t a t i o n lI a n d P o l y h e d r o n a p p r o x i m a t i o n . We o b t a i n e d some i n t e r e s t i n g r e s u l t s . We h a v e t h r e e objectives for t h e f u t u r e . F i r s t , we p l a n to c r e a t e a p r o c e s s flow for each m o d u l e by doing more e x p e r i m e n t s w i t h A M A P . Second, we will s t u d y t h e p r o p e r t i e s a n d l i m i t a t i o n s of t h e a r c h i tecture. Finally, we w i l l c l a r i f y t h e t r u e a r c h i t e c t u r e and data s t r u c t u r e for a u t o n o m o u s robots.

667 REFERENCES

1. Thorpe.C. (editor), "Vision and Navigation", t h e C a r n e g i e Mellon Navlab, Kluwer,1990. 2. R.Simmons and T.M.Mitchell, " A Task Control A r c h i t e c t u r e for Mobile Robots", Proc AAAI Spring Symposium, Stanford, California, March 1989. 3. J . E . B a r e s and W . L . W h i t t a k e r , " W a l k i n g Robot with a circulating gait ", Proc. of IEEE Int. Workshop on I n t e l l i g e n t Robots and Systems, p p . 8 0 9 - 8 1 6 , 1990. 4. Bares J.et al., " A m b l e r - a n autonomous rover for p l a n e t a r y e x p l o ration", IEEE Computer, Vol.22, No.6, 1 8 - 2 6 , 1989. 5. Rodney A. Brooks, "A Robust Layered Control System for a Mobile Robot", I E E E J o u r n a l of Robotics and Automation, V o l . R A - 2 , No.l, 14-23,1986. 6. Rodney A. Brooks, "A robot t h a t walk; e m e r g e n t behavior from a carefully evolved network", Proc. of 1989 Int. Conf. on Robotics and Automation, 6 9 2 - 6 9 4 , 1989. 7. Okui et al., "Astudy on an Autonomous Ground Vehicle:The system and the o f f - r o a d Detection using CCD", Proc. of 1992 Conf., J a p a n Society of Mechanical Engineers, Vol.C, N o . 9 2 0 - 1 7 , 5 1 6 - 5 1 8 , 1 9 9 2 . 8. F u r u t a , Okui et al., " A S t u d y on Recognition of T e r r a i n using L a s e r Range F i n d e r : Discussion on Simple Expression of T e r r a i n and its perception Method", Proc, of 1992 Conf., Robotics Society of J a p a n , No.3, 9 6 7 - 9 6 8 , 1992. 9. Kouzuki, Okui et al., " A Study on Recognition of T e r r a i n using Laser Range Finder: Detection of Planes using 3 - D hough T r a n s f o r m a t i o n and a study on Expression of t e r r a i n ", Proc. of 1993 Conf., Robotics Society of J a p a n , No.2, 7 8 3 - 7 8 4 , 1 9 9 3 . 10. A.Abidi and C.gonzales (editor), "DATA F U S I O N i n Robotics and Machine Intelligence", Academic Press Inc., 1992.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

669

Mechanism and Control of Propeller Type Wall-Climbing Robot

A. N I S H I

and H. MIYAGI

Faculty of Engineering, Miyazaki University Gakuen-Kibanadai, Miyazaki 889-21, JAPAN Abstract Use of a wall-climbing robot for rescue or fire-fighting has been anticipated for a long time. Four quite different models have been developed in our laboratory. The present model can move on a wall by using the thrust force of a propeller and can fly whenever it is required. Its mechanism and control system are discussed. 1. I N T R O D U C T I O N A robot capable of moving on a vertical wall has been looked forward to for a long time. It could be used for rescue and/or fire-fighting in high-rise buildings. Four quite different types of wall-climbing robots have been developed in our laboratory over the last 20 years. The first model, shown in Fig. 1, has a large sucker and crawlers as a moving mechanism [ 1]. Recently, many varieties of this type model have been developed for wall inspections in Japan[2]-[4]. The second type, shown in Fig.2, is a biped walking model, which has a small sucker on each foot[5]. As this can be applied to almost all irregular wall surfaces, it has a wider applicability than the former. In general, the walking motion is not very quick so that the walking model takes much time to climb to higher places on the wall. Unfortunately, a robot would be required to climb to higher places on buildings in a short time for emergency purposes, such as carrying a rescue tool or fire-fighting in buildings. The third model, shown in Fig.3, aims at these purposes[6]-[ 10]. It has propellers, and their thrust forces are inclined a little toward the wall to make use of the frictional force between the wheels and the wall surface, as well as to support the robot itself. Sometimes unexpected strong winds occur on the walls of high-rise buildings. In such cases, the control system to compensate for the wind load is important to avoid having the robot falling from the surface. This situation has been previously discussed in detail in Refs.[6] and [7]. Often there are many obstacles at the lower part of buildings, such as trees, eaves, entrances, etc.. In such cases, it would be useful if the robot could fly over these obstacles and reach the upper wall surface. Moreover, if the robot fell from a higher place on the wall surface by accident, it would need to make a soft landing to avoid endangering itself or its surroundings. These maneuvers can be done by employing a mechanism and control system enabling it to fly. As the robot has enough thrust force to sustain itself in the air, it can fly or land. This is the fourth model and its mechanism and control system are discussed in this paper[ 10]-[ 13].

670

Fig. 1 Large sucker model. 2. M E C H A N I S M

Fig.2 Biped walking model. Fig.3 Propeller type model.

AND CONTROL

OF THE ROBOT

2.1 Thruster of the wall-climbing robot An example of a conceptual model is shown in Fig.4. A conventional thruster is a combination of an engine and a propeller. Although a small jet engine and rocket motor can be made available for emergency use, they are expensive and noisy compared with the conventional engine and propeller. There are two types of propellers: those with larger diameters, similar to helicopter rotors; and smaller ones, similar to light-plane propellers. The former can produce a larger thrust force for the same engine power. However, the wind load is larger, and it is difficult to design a wall-climbing robot using this type. Therefore, the latter has been used for the model concerned here. 2.2 Mechanism of the flight model To use a robot for carrying rescue tools or for fire-fighting, first, it is necessary for the robot to reach and attach itself to the wall surface at the lower part of buildings. However, often there are obstacles such as trees, eaves and entrances at the lower part of the buildings. In such cases, it may be difficult for the robot to attach itself to the wall directly. As the robot, concerned here, has enough thrust force to lii~ itself up in the air, it can fly over these obstacles employing a suitable mechanism and control system, while the wind is mild or there is a calm wind condition. Almost the same situation will occur at higher parts of the buildings, where there are eaves or steps on the wall surface which are difficult to get over using the conventional moving mechanism. Moreover, there is danger of the robot falling from the wall surface by an unexpected strong wind or accident. In such cases, the robot must make a soft landing, or the fall would be very dangerous for the robot itself and its surroundings. These situations can be avoided by designing the robot so that it is able to fly or land. These maneuverings are shown in Fig.5 schematically.

671

Fig.4 Conceptual model.

Fig.5 Maneuvering of the flight model.

However, its major purpose is not to fly like a helicopter, but to move on a wall surface using its wheels, since moving on a wall is safer. Therefore, it has a long body with many wheels, and its thrust force has a component toward the wall to produce frictional force between the wheels and the wall surface. The conceptual robot is designed for this requirement. Two propellers are installed at the top and bottom of the trapezoidal body. Many light-weight wheels are arranged on the outside of the body, so that two outer planes can touch the wall surface, and it can get over the irregularity on the wall surface, such as the window frames, eaves and so on. When the robot touches the wall, the thrust force axis is inclined a little toward the wall, and the frictional force of the wheels is produced automatically.

2.3 Control mechanism of the flight model Three mechanisms have been constructed and examined to produce the control force of the flight model in the air. The first one is inclining the thrust force directly. This method can produce a large force, however, it is not suitable to the flight robot, because of the heavy weight of the mechanism and hence the insensitive response. The second mechanism is directing the propeller slip stream with vanes. This control mechanism is easy to construct and lighter than the former. Therefore, it is supposed that this is better for the flight model. The slip stream of the main thruster propeller rotates and thus produces reaction torque on the body. Counter rotating propellers are used to decrease this torque, however, its magnitude is variable for changing the thrust force depending on the up and down motion of the robot. The regulating stream vanes are installed in the slip stream of the propeller, and a couple of the vanes are controlled to eliminate the reaction torque. The flight control is accomplished by two control blades. These are set in the down stream of the regulating vanes. These are set crosswise. Each of them are controlled independently to produce a side force in any direction. A considerable amount of drag force was produced by these three blade rows and this decreases the thrust force. Moreover, the interaction of forces produced by each row was observed, and a complicated control software was required. A laboratory model and a flight model of this type were constructed and tested. The laboratory model could be controlled easily, however, the flight model was difficult to control since a more complicated mechanism was required to decrease the non-linear effect of the control, caused by

672 the heavy weight of the control mechanism. So, this mechanism was abandoned for the third one. As the body is symmetrical along the thrust axis, a righting force is not produced by the thrust force, when the robot is inclined in the air. Therefore, the control thrusters are required to produce the force to control the robot in the air. This is the third method and is applied to the present model. Eight DC-motor driven propellers are employed as a control thrusters, which are set at the top and bottom of the robot body, shown in Fig.4. As the flight model has 5 degrees of freedom in the air except for the up and down motion, this mechanism is redundant. Each control thruster can be controlled independently, therefore the control is easy compared with the above two methods. For the mechanism, however, a powerful and light weight battery has to be installed on the robot. The tail rotor of the helicopter model was employed as a thruster, and it can produce both forward and backward forces depended on the blade pitch angle. The thrust forces to produce the inclination of x- and y-directions are given by the forces Fx and Fy respectively, and the rotational motion in z-direction is given by Fz. Then, the following relations are derived: Fx=F2-F4+ (Fs-F7) Fy=F~-F3 + (-F6 +F8) Fz = (FI +F2 +F3 +F4)-(Fs +F6 +F7 +Fs)

(1)

(2) (3)

where F ~ F s are the forces of thrusters, and these are shown in Fig.6. The following relations are assumed: F2-F4=F5-F7

(4)

FI-F3=-F6+Fs FI +F2+F3+F4=F5 +F6+F7+Fs

(5) (6) 7'

t-

{k. "( k,,..

i

?

_| (~

.-

y~

~x

I

.o

Fig.6 Schematic of control thruster.

Fig.7 Schematic of laboratory model.

673 Then, each thrust force can be given by Fx, F), and Fz as follows: F~ = (F~+FY2)/4 F2 = (Fy +FJ2)/4 F3: (-F~ +FJ2)/4 F4: (-Fy +F._/2)/4 F,:(-Fy-FJ2)/4 F6:(-F,~-Fz/2)/4 FT=(Fy-Fz/2)/4 Fs:(Fx-FY2)/4

3, M E C H A N I S M

AND TESTING OF THE LABORATORY

(7)

MODEL

3.1 Mechanism of the laboratory model. Before the construction of a flight model, an investigation was carried out using a laboratory model, shown in Fig.7 schematically. It had a control mechanism similar to the conceptual model, but did not have the main thrusters. It was designed to obtain experimental data of the control mechanism easily. The model was supported by a rigid bar at the center of gravity with a universal joint. The inclination angles of x- and y-directions are sensed by both potentiometers and inclination sensors, and the rotational motion in z-direction is measured by an another potentiometer. Eight thrusters were controlled independently, and the inclination and rotation of the model were controlled both automatically and manually. As each control was independent for the small increment of each direction, the equations of motion were simple. The state feedback method was used, angles and angular velocities were taken as the feedback variables in each direction.

3.2 Wireless communication system and its preliminary testing Wireless communication is required to control the flight robot in the air or at higher places on a wall. The radio controller of a model plane was employed to simulate the control system for the laboratory model. The sensor signals from the model were given by wire to the computer. The feedback signals to the model were given by radio signals. Eight channels of the radio controller were used for the servo-motors of the thrusters. The preliminary experiment demonstrated that this method was an effective and easy way to control the model. This can be developed for the two-way communication between the flight model in the air and the computer on the ground.

3.3 Experimental results The moment of inertia of the model was adjusted by adding the corresponding mass on the model in order to compensate the lack of main thrusters. Then, almost the same amount of control forces was required compared with the flight model. The equations of motion are given as follows: ~

Ix ~ =F~L+ To~

(8)

,.

b r =FyL+Toy

(9)

Iz "~=Y~l+Toz

(10)

674 ,

i

i

J

I

|

I

t~ .

tD

= 0.1

i

O

F

c~

I

0 c~ .,_, r

N 0

-0.I

21,

0

I'0

2'0

-03 t

30 40 Time (see)

I

10

,

1

20

I

I

I

40 Time (see) 30

(a) Inclination sensor. (b) Potentiometer Fig.8 Responses of the laboratory model. where .Ix, Iy and Iz are the moments of inertia in x-, y- and z-directions respectively, L the distance between upper and lower control thrusters, and 1 the breadth of the model shown in Fig.7. Tox, Toy and Toz are the resultant torques around x-, y- and z-directions, respectively, produced by the engine and propeller operations. The time lag of the inclination sensors are measured and given as r =0.38 s. The optimum feedback coefficients are determined by the pole assignment method and its simulations. Examples of the results are shown in Fig.8, for the different feedback signals from the potentiometers and inclination sensors. The model was inclined about -0.13 rad initially, and after a given start signal it moved to the neutral position automatically. After that, the angles were controlled manually in _ 0.13 rad. The response angle showed that a certain amount of overshoot and small oscillations appeared in the inclination sensor signals. Almost the same results were obtained for the y-direction. A satisfactory control could be expected by these results.

4. FLIGHT

SIMULATION

4.1 Computer simulation Before the construction of the flight model, the flight simulation was carried out to ascertain the moving of the model in the air. The maneuvering of the flight model is done by the control of inclination angle of the model. When the model is inclined at a small angle, the side force component is produced by inclining the main thrust force. Then, the side movement occurs with acceleration. The engine revolutional speed is adjusted a little to maintain the same height. To move the model back to the initial position, the inclination angle has to be changed to the opposite direction. These control and motions are simulated by the computer and a result is shown in Fig.9. This shows that the robot moved about 10m from the origin in 24s, after that it

675 moved to the opposite position. The required changes of the inclination angle and velocity are shown in the same figure for the movement. This was obtained by using the pole assignment method of analysis.

4.2 Manual control of movement using the laboratory model As the laboratory model is supported by a bar, it cannot move to the side direction. This motion was created by the computer and combined with the inclination of the laboratory model. A combination of step inputs and a few adjustments were given to the model manually, and the inclination angle of the model was obtained by the potentiometer signals with a certain amount of time delay. The distance and velocity of the side movement as a function of the inclination angle were calculated using a computer. These results are shown in Fig. 10. It is shown clearly that the flight model can be controlled easily by the manual control. 0 L

Distance L 0.1

Incl. Angle O n'~,

;

/

",J

\

,,",,

I \I

IoEE . J ~

----- V e l o c i t y V

',

L--:r.----~---..~. -..~---'---=-'t..,--- I

.t.-'~--~.~..~.~._.~.,

!,,/\ -10

-0.1 .

i

I

0

~

I

I0

!,

.

I

20

.....

L.

I

30

L

I

40

I

60

50

t (sec)

Fig.9 Computer simulation of the flight robot. O '

CD 0.I '/'~"

F ,/',/ /

I

/

i

'

~

I

F't-~"

~

i- - I \ "

9

,

10

'

----- Manual Input Distance L Incl. A n g l e 8

----veto~iw

EE ...I

v

\

o

L L.W L'.'XJ -0.I

f

0

,

i

10

t

i

n

2o

t (sec)

. . . . . . 10

Fig. 10 Manual control of the laboratory model.

30

>

676 5. F L I G H T

MODEL AND ITS TESTING

5.1 Dynamic behavior of the main thruster In order to obtain the dynamic behavior of the main thruster, a test apparatus was constructed. The purposes of this experiment are as follows: (1) to see whether enough thrust force could be produced to sustain the robot itself in the air, (2) to obtain the dynamic thrust force, the side forces in x- and y-directions, and the resultant torque in z-direction for a sudden change of engine operation, (3) to ascertain the effect of electronic noise of the ignition plug and other high voltage parts to the radio communication system. These tests are now progressing.

5.2 Flight model and its testing The flight simulations and above experiments show that the flight model must be controlled well in the air. The last stage of the development is the construction and testing of the flight model. This is now in preparation.

6. C O N C L U S I O N Three different types of wall-climbing robots have been developed in our laboratory. The first model had a large sucker and could be utilized on flat and/or wide wall surfaces. The second was a biped walking model, which had a small sucker on each foot. This could be applied to almost all irregular wall surfaces. The third model could move on a vertical wall at high speeds. It had propellers, and the thrust force supplied both the forces to lift the robot itself up and produce the frictional force between the wheels and wall surface to be safe on a wall. When the third model is employed for emergency use, at first it is necessary for it to reach and attach itself to the wall surface. Often there are obstacles at the lower parts of the buildings, such as eaves, trees, entrances, etc.. Therefore, it would be very useful if the robot could fly over these obstacles and reach the upper wall surface. Moreover, if the robot falls from the wall surface by accident, a soft-landing control has to be employed to avoid danger to the robot and the surroundings. For these purposes a flight model has been considered. The mechanism and control systems for such maneuvering have been discussed. A laboratory model was constructed and tested. As a result, it was demonstrated that the laboratory model could be controlled easily, and similar control methods can be applied to the flight model. The flight simulation was carried out to ascertain the control mechanism and system. As the results, it is shown that the flight robot can be controlled easily and the control mechanism is suitable for the flight robot. The next step is the construction of the flight model and its testing in the air. This is now in preparation.

677

REFERENCES [1] Nishi A., Wakasugi Y. and Watanabe K., Design of a Robot Capable of Moving on a Vertical Wall, Advanced Robotics 1-1, 33-45(1986). [2] Nagatsuka K., Vacuum adhering crawler system "VACS"(in Japanese),Robot 53,127135(1989). [3] Ikeda K., Nozaki T. and Shimada S., Development of wall-climbing robot (in Japanese), Proc. 1st Symp. on Construction Robotics in Japan, 107-116(1990). [4] Sato K., Tominaga I., and Fukagawa Y., Development of wall-climbing robot and its application to nuclear power plant (in Japanese), Piping Technology 29, 92-96(1987). [5] Nishi A., Biped Walking Robot Capable of Moving on a Vertical Wall, Mechatronics 2-6, 543-554(1992). [6] Nishi A. and Miyagi H., A wall-climbing robot using thrust force of propeller, (Mechanism and control in a mild wind), JSME Int. J, Series C, 36-3,361-367(1993). [7] Nishi A. and Miyagi H., A wall-climbing robot using thrust force of propeller, 2nd Report, Trans. Jpn. Soc. Mech. Eng. (in Japanese), 58-547C, 163-169(1992). [8] Nishi A., A wall-climbing robot for inspection use, Proc. 8th Int. Sym. on Automation and Robotics in Construction, 267-274(1991). [9] Nishi A., A wall-climbing robot using propulsive force of propeller, Proc. 5th Int. Conf. Advanced Robotics, 320-325( 1991). [10]Nishi A., Control of a wall-climbing robot using propulsive force of propeller, Proc. IEEE/RSJ Int. Workshop on Intelligent Robots and Systems IROS '91, 1561-1567(1991). [ 11 ] Nishi A., and Miyagi H., Propeller type wall-climbing robot for inspection use, Proc. 10th Int. Symp. on Automation and Robotics in Construction (ISARC),189-196 (1993). [12]Nishi A., and Miyagi H., Development of wall-climbing robot, Intelligent Automation and Soft Computing, Vol.2, TSI Press, 327-332 (1994). [13]Nishi A., and Miyagi H., Mechanism and control of propeller type wall-climbing robot, Proc. IEEE/RSJ/GI Int. Conf. on Intelligent Robots and Systems, IROS '94, 1724-1729 (1994).

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

679

Automated Guided Vehicles in Japan --Recent Trends in Advanced Research, Development, and Industrial Applications-TOSItlHIRO TSUMURA University of Osaka Prefecture Sakai, Osaka, 593 Japan ABSTRACT: Recent trends in advanced AGV arc reviewed and discussed, with examples. Special concern is given to automated position finding, guidance, multi-vehicle strategies, and communication systems, with the focus on engineering, electromcchanical and optical problcrns. In addition to conventional manufacturing and advanced industrial applications, service vehicles such as floor cleaning, blind aids and off-road uses are described. New electric power supply systems are also presented.

1. INTRODUCTION Recently, in developing filll-automated factory manufacturing systems, AGVs (automated guided vehicles), permitting automated guidance and operation, are used as a key means of transportation on road, construction site or other locations. Various means of transportation for factory operation have been put to use (see materials [ 1]-[41), including hoist, crane, earth-moving machine, terrain vehicle, railway carrier system, fork lift truck, and automated or manually controlled carrier with tires. In the past 10 years, automated guided wiretype vehicles have mainly been used for transportation in industry, service and clerical work. Tile world's first AGV was developed in 1963 by Daifuku Ltd. in Osaka. In developing the vehicle, WEB, a U.S. company, transferred AGV technology to Daifuku. Following Daifilku, more than 20 Japanese companies commenced AGV developnlcnt.

Fig. 1 Numbers of vehicles and systems

680 Presently, AGV is being developed by more than 35 companies, of which more than 30 are manufacturing AGV otl a co,nmcrcial basis. Figs. 1-3 sl~ow data on AGVs (source: Japan Industrial Vehicle Assn.) in Japatl. Almost all AGVs in J~q)an are used in factories. Recently, however, new AGV applications have been developed in various fields; tllese applications include office automation, public facilities, railway temaihal, sllopl~ing center, container yard, restaurant, medical care, agriculture, engineering work, and building

Fig. 2 Breakdown of systerns according to vehicle type

Fig. 3 Btcakdown of systems according to business type

681

construction. This paper discusses the recent trends in the research, development and commercial use of AGV. Specifically, it treats automated position finding, guidance, multi-vehicle strategy, and vehicle communication. Another focus is on clectromcchanical and optical techniques.

2. AUTOMATED GUIDANCE SYSTEM

2.1 Reflecting tape guidance (optical or magnetic tape guidance system) In this system, reflecting tape (white tape, metal tape, or tape painted with glass powder) or magnetic tape is placed on the vehicle route, to detect the vehicle's deviation from the predetermined route. Light emitted from the vehicle lower part is reflected by the tape; the reflected light is sensed as signal by the photosensor on the vehicle lower part. In the case of guidance using magnetic tape, a magnetic sensor must also be installed on the vehicle lower part. It is better to place tape over the entire route, than to place it on some parts of the route. This system is superior to the electromagnetic cable guidance system in that the former system permits flexible and easy installation. The tape guidance system is advantageous also because it can bc easily recognized by passers-by. In Japan, more than 10 companies recommend the use of the system for manufacturing parts or office machines. A system using a single tape permits satisfactory operation. To improve operation, however, the following types of tape have been developed: a) Two tapes installed in parallel, to enhance system reliability b) Tape on which bar codes or some kind of marks are printed, to store special data or road information c) Curved tapes, with various curvatures Fig. 4 shows a tape guidance system. 2.2 Laser guidance (laser beam guidance system) The following two laser guidance systems have been devised: a) Straight line guidance system: In this system, laser beam is fixed. b) Guidance system for freely chosen courses: In this system, using a laser scanner, a two-dimensional mirror controller reflects laser beam.

Fig. 4 Tape guidance system

Fig. 5 Laser beam controller for curved course

682

Fig. 5 shows a reflecting mirror controller for guidance on a fi-eely chosen curved course. (The vehicle detects laser-transmitted instructions for guidance.) Presently, straight line guidance systems are commercially available. These systems are superior to electromagnetic or reflecting tape guidance systems, especially when used for carriers in super-clean rooms. Also, the applications of guidance systems for freely chosen courses will soon expand.

2.3 Optical recognition guidance system Various image recognition systems for vehicular or moving targets have been studied and developed. These systems range from simple mark recognition systems to large-scale computerized three-dimensional image recognition systems. In my opinion, the commercialization of image recognition systems require the fillfilhnent of three tasks: reducing cost, enhancing reliability, and increasing processing speed. 2.3.1 Comer cube laser-scaiming system In this system, comer cubes (a kind of retroreflectors) are set above the route (e.g. on the ceiling). A laser scanner scans laser beam to determine the course. A photo-sensing array is installed on the vehicle upper part. Data detected by the array are sent to the AGV steerage circuit. Groups of corner cubes, each group containing two or more cubes, are placed on the ceiling along the route. A fan beam-type laser scanner with rotary encoders is installed on the AGV upper part. Laser beains reflected by a corner cube group are received by AGV's photo-sensor. Based on beam angles, a computer mounted on AGV determines deviation from the course. This guidance system also dctects direction deviation from the reference course. Fig. 6 shows the principle of the guidance system. (See material 111.) This principle can be applied to blind aids. 2.3.2 Bar code mark recognition system In this system, a series of bar code marks are placed on the route, to provide route information to the vehicle. Video camera systems are installed on the vehicle front and/or lower part. Optical data at bar

Corner cube CCa

i

,'///_L,~l,'/_,'///// ,~//_,~/,~/(L,

o

V

o

l

9

I

am scaimer with sensor

AGV ~i~,)l~iJl]lJiJ/~llllll, Y

Corner cube CCb

0

Corner cubes are set al predetermined posilion on ceiling

* 0 = 0 9AGV is on the right course. * CCa = CCb 9AGV is on a course parallel to tile right one

Fig. 6 Principle of a comer cube -laserscanning system

683

code pattern position are processed by a computer mounted on the vehicle, to provide positional and guidance data. This type of systems are commercially available in Japan. 2.3.3 Spot mark/video camera system In this system, small glass balls are set on the course as spot marks, and protected by hard plastic cases. A video camera system is installed on the vehicle front, to measure deviation from the course via CCD receiver. Obtained data are processed by a computer mounted on the vehicle, to provide steerage infornmtion to the vehicle. 2.3.4 Roadside cube guidance System (1) Fixed fan laser-type system Fig.7 shows this system, in which many corner cubes are set along the guard rail, arbitrarily or at predeter, mined positions. Two or three laser fan beams are installed on the vehicle side. Using a mathematical method, the mounted microprocessor determines vehicle position and direction in relation to guard rail, based on odometric data and information on laser beam reflected by the cube. Accuracy in determining vehicle position and direction can be improved by setting corner cubes on both sides of the road. Fig. 8 shows an experimental vehicle. The cube guidance system pemaits the quick and accurate determination of vehicle position and direction. Data, provided continuously by the system, are used for adjusting the guiding controller in the automatic position detenuination system.

Fig. 7 Road side corner cube guidance system

Fig.8 Experimental car for roadside corner cube guidance system

(2) Fan beam scan-type system In this system, corner cubes are set on both sides of the road, especially when the road is undulate or uneven. The vehicle is equipped with dead reckoning and laser scanning systems, to determine the present vehicle position and the predetermined vehicle direction. The laser scanning system scans retro-reflectors at predeternaincd positions to determine the angles of reflected laser beams. Fan beam laser scanning permits this operation even when the vehicle is on a slope. Fig. 9 shows the position deternlination hardware design of a vehicle using tile fan beam scan-type system. Tiffs system will prove usefill in aiding blind persons.

684

L. T. : Laser transceiver ( scanning via rotation)

Fig. 9 Position determination hardware design for fan beam scan type vehicle

2.3.5 Edge detection system This system is designed to detect road edge or guard rail using route infomlation. The system uses one or two CCD cameras mounted on the vehicle. It is usefi~l to mark the course by a white or dashed line. Fig. 10 shows a vehicle with a road edge detection system. 2.4 Laser beacon system Ill 2.4.1 One-beacon system This system uses a laser scanning system installed at a predetermined position. The scanning system has a precision digital encoder, and a transmitter sending angular data to the surface vehicle. The vehicle is cquippcd with three or more photo-sensors, each receiving rotating scanning laser beam. Upon reception, received angular data are sent to a processor. In Fig. 10 Guidance via edge detection system

685

Fig. 11 Experimental laser beacon system

geometric design, the one-beacon system is similar to the sonic beacon system. However, the former system, permitting the emission of sharper laser beams, ensures higher accuracy. The one-beacon system is suitable for vehicles on seaside construction sites, ship positioning and guiding systems, and indoor AGVs. Fig. 11 is the illustration of an experimental robot vehicle of simple design 18i. 2.4.2 Two-bcacon system for 3-dimensional positioning 191 This system has two laser scanning systems installed at predetermined positions. Each of these systems is equipped with a precision digital encoder. Tile photo-sensor, mounted on tile vehicle, receives laser beams cmitted by the cncoders, and determines vehicle position via geometrical analysis. Information on vcrtical position clement can be obtained by setting thc laser for horizontal scanning, and installing the photo-sensor in the vertical direction. The two-beacon system is uscfi~l in guiding robots for construction work (e.g. those for measuring undulation during golf course construction), v,hicll travel over wide areas. Fig. 12 shows the system's principle. Fig. 12 Two-beacon system for 3-D positioning

686

2.5 Corner cube system In this system, three or more comer cubes (or reflecting tapes) are set on the course or field. A laser scanner with a precision rotary cncoder is mounted on the vehicle. The vehicle determines its position and direction via computer. A variety of the system uses two comer cubes and four laser transmitters/receivers, but no scanners. These systems are called laser navigators. (see Fig. 13) A popular application of the automatic guidance system is the outdoor concrete cement finishing system, using automatic map guidance; in this system, the automatically measured vehicle position is checked accurately via laser navigator

II01. In any of the measurement methods explained so far, the guided vehicle uses the differential odometric method, or the dead reckoning system (e.g. an odometer with a gyroscope; it is to be noted that, recently, reasonably priced but highly efficient gyroscopes have become available). In my opinion, the development of transportation infrastructure using laser guidance and corner cube setting methods will prove advantageous, since such infrastructure can be used in heavy rain, snow or fog; or for waste land, grated floor, sloped surface, stairway, escalator, or moving road. These methods will prove effective in guiding not only AGVs, but caterpillar or walking type vehicles, humans and animals. Recent discussions on advanced AGV and its system have been reported in many newspapers specializing

Fig. 13 Laser navigator

687 in industry. According to the papers, recent engineering efforts related to AGV have been focused on the following points: 1) Flexible course setting 2) Dust-free system 3) Low-noise vehicle 4) Energy (battery) problem 5) Autonomous guidance 6) t lybrid mode guidance (combination of autonomous guidance with forced guidance) 7) Velficle-to-vehicle communication 8) System or traffic control for guiding a large group of independent vehicles 9) Man-machine interface 10) Lower cost, higher performance 11) Development of the present system to expand applications, in such fields as agriculture, engineering work, building construction, and office automation 12) Operation in clean environment (e.g. in semiconductor or LSI plants) 13) Safety problems 14) Automatic guidance of vehicles in a formation 15) Automatic guidance system using a single trajectory (called "dotetsu" in Japan) a) Dotctsu using a mechanically coupled trajectory b) Dotetsu using a trajectory coupled non-mechanically (via radio, optical means, map etc.) 3. MULTI-VEillCLE STRATEGY The guidance and control of more than one vehicles involve additional problems. Recently, Japanese researchers in AGV or moving robot have been interested in such subjects as multi-vehicle cooperation, formation control, and collision avoidance. The following points are important in studying these subjects from the engineering or general teclmical standpoint: 1) Adoption of positive (cooperative) or negative (collision avoidance) strategy 2) Vehicle-to-vehicle or vehicle-to-station communication 3) Relative but accurate position and direction determination Table 1 shows the outline of research fields related to collision avoidance. 4. VEHICULAR COMMUNICATION For mobile communication, radio wave serves as efficient, economical and conveniently available means. However, radio wave communication in factory, office or outdoor sites causes unexpected, obstructive noise, which adversely influences AGV operation. The recently popular spread spectrum method may also prove detrimental to other AGV systems. Optical communication involves the establishment of a wireless system linking communication partners directly via optical beana. The important characteristics of optical communication include the following: 1) Absence of electromagnetic interference or noise 2) Communication capacity larger than that of radio 3) Absence of multi-path interference 4) Little leakage of information

688 Table 1. Research on collision avoidance in Japan

Recently, two optical methods have been proposed in Japan for frce communication between moving vehicles. 4.1 Control system for mutual tracking and follow-up This system was dcvclopcd in a MITI special projcct from 1984 to 1991. 4.2 Method using laser and corner cube (retroreflector system) This system has basically proved successful. The system requires one target tracking system, and one comer cube pemfitting modulation via signal. The tracking and follow-up control in the rctroreflcctor system need not be accurate; inaccuracy in the control is compensated by the accurate reflection of tracking optical beam on the transmitter via comer cube. Fig. 14 shows the principle of optical twoway communication using one light beam

Operation (Dynamic)

Operation (Static) . . . . .

.Heuristic

9Gradient methods (Ishitani, 1983)

(Noborio, 1989) Global (Kondo, 1988) (Okutani, 1983) 9Dynamic Programmin~ 9Potential methods (Ikegami, 1980) 9Fuzzy method (Takeuchi, 1988) 9Multipurpose program Local (Miyake, 1987)

1111. 5. MOTOR CONTROL

9Laplace potential (Akishita, 1991) 9Vertual impedance (Arai, 1989) 9Fuzzy method (Koyama, 1991) (Ishikawa, 1993) 9Multipuq~ose program (Saito, 1990) -Others (Tsubouchi, 1994) ( Nagata, 1993)

Recently, a new type of AC servo motor has come to be used widely for AGV. The cur-

Laser transmitter

Laser modulator

/

[Interface

Comer cube with laser modulator

Beam splitteQ..._ J

Interface ]

I"+ Computer Laser transceiver

Computer Respondent device

Fig. 14 Principle of optical two-way communication using comer cube

689 rent AC scrvo control system requires the following: 1) Low-inertia motor 2) Maintenance-free operation 3) Precise positioning and quick response by virtue of speed pattern matching and higher stall torque Cost reduction will lead to the filrther expansion of demand for AC servo motors. 6. AGV ENERGY PROBLEM

Almost all AGVs need powcrfitl battcrics as energy sources. Some AGVs for indoor use obtain powcr via wire cable. Since cable is not suitable for guidance on a complex route, some connectors for releasing and catching power have bccn designed. Recently, an induction-type power transmission system has been developed for commercial use [12]. When high-frequency current flows in the power transmission line, electromotive force is generated in the pickup coil installed on the AGV bottom through electromagnetic induction, transferring electric power to AGV. This system is durable, quiet and spark-free. Accordingly, it cart bc operated at dusty, ',vet or humid locations. 7. CONCLUSION Many technical problems concerning AGV remain to be solved. The AGV system is highly interesting for cngincers specializing in mechanics, electrical systems, instrumentation or computer. However, researchors in AGV invariably face cost problems and resulting constraints. This paper has discussed solutions to many AGV-rclatcd problems, including those concerning automatic guidance and positioning; these solutions involve the use of optical methods and systems. REFERENCES 1. T. Tsnmura, 1986 IEEE Robotics and Automation (1986) 1329. 2. T. Tstlmura, Jour. Soc. of Instr. and Cont. Eng. (SICE), 30, 1 (1991) 1 ( in Japanese). 3. A. Mcystcl, Autononaous Mobile Robots, World Scientific Publishing (1991). 4 T. Tsumura, Jour. Systems, Control and Information, 37, 6 (1993) 327 ( in Japanese). 5 T. Takcda, Proc. '86 IEEE Robotics & Automation (1986) 1349. 6 T. Tsumura, Proc. "94 JAPAN-U.S.A. Symposiuna on Flexible Automation (1994) 293. 7 H. Mochizuki, Proc. 7th ADVANTY Syrup. (1994) 85 ( in Japanese). 8 Y. Nakahara, Proc. 3rd Struct. Robot Syrnp. (1993) 309. 9 T. Ocl~i, Proc. 7th ADVANTY Syrup. (1994) 57 ( in Japanese). 10. K. Nishide, Proc. '90 JAPAN-U.S.A. Syrup. on Flexible Automation (1990) 531. 11. T. Tsumura, Proc. Intelligent Vehicle '93 (1993) 366. 12. Catalog of DAIFUKU Co. Ltd. (1994).

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

691

R o b o t i c s in M e d i c i n e Paolo Dario, Eugenio Guglielmelli and Benedetto Allotta ARTS Lab - Advanced Robotics Technology & Systems Laboratory, Scuola Superiore Sant'Anna via Carducci 40, 56127 Pisa, Italy

This paper reports the current state-of-the-art in medical robotics. Rather than trying to enumerate all the possible applications of robots or robotic technologies to medicine, three general areas of advanced robotics are identified which, based on the current technological background and expertise, could potentially provide significant improvements to the state-of-the-art in medicine. These are: macro robotics, micro robotics and bio-robotics. Macro robotics includes the development of robots, wheelchairs, manipulators for rehabilitation as well as new more powerful tools and techniques for surgery; micro robotics could greatly contribute to the field of minimally invasive surgery as well as to the development of a new generation of miniaturised mechatronic tools for conventional surgery; bio-robotics deals with the problems of modelling and simulating biological systems in order to provide a better understanding of human physiology. According to this classification, a review on the most important past and ongoing research projects in the field is reported. Some commercial products already appeared on the market are also mentioned and a brief analysis of the economical potentialities of robotics in medicine which can be prefigured in the near future is presented. 91.

INTRODUCTION

The application of robots - or more generally of technologies and know-how derived from robotics research- to medicine has moved rapidly in the last few years from the speculation of a small group of "visionary" scientists to reality. Today robotics can be considered as a real opportunity, available to a range of operators in the medical field, as well as to industries which want to explore a market that can quickly become very attractive [1]. The growth of interest on medical application of robotics has been so rapid recently, that is already difficult to provide a "still" picture of this field. However, we propose a classification that may be helpful as a guideline to discuss the main applications or perspectives of robotics in medicine. This classification is presented in Figure 1.

692 Rather than trying to enumerate all the applications of robots or robotic technologies to medicine, we have identified three main areas of advanced robotics which, based on the current technological background and expertise, could potentially provide significant improvements to the state-of-the-art in medicine. Robots can find practical application in two main medical fields: surgery and assistance to disabled and elderly persons. Moreover, robotics can also be conceptually associated to biological systems in the area that we can broadly define as bio-robotics. In this area, robotics can be seen as a "metaphor" of biological systems, and robotics research as an important bridge between h u m a n and biological sciences, on one side, and artificial intelligence, on the other side.

Figure 1.

Main applications and perspectives of medical robotics

However, instead of being just speculative, research on bio-robotics may also lead to a number of practical applications for the substitution or augmentation of organs a n d / o r functions of humans. A distinction that can be helpful to clarify some basic concepts and to discuss applications of medical robotics is the one we propose between macro robotics and micro robotics. In fact, this distinction implies not only different size, b u t - and most important - different intrinsic design features and mode of operation. This is particularly clear in the area of surgery, which most scientists and industrialists

693 perceive as the most promising field of application for robotics, and where two substantially different approaches can be envisaged: the use of robots as rigid accurate and autonomous machine tools, on one hand, or the use of robot-like endoscopes, flexible and teleoperated by the surgeons, on the other. Somewhere in the "middle" is an area also very interesting, which some investigators refer to as "mechatronic tools for surgery". The aim of this branch of medical robotics is to broaden the concept of robotic device for surgery by taking advantage of methodologies and technologies directly derived from the state-of-theart in robotics. Considering the valuable progresses of the last decades in the field of micro-mechanics and mechatronics, this approach could potentially lead to a quick and wide diffusion into the market of innovative surgical tools and thus to clinical practice. Significant potentialities have been also identified in the field of rehabilitation robotics. For instance, the feasibility of desktop robotic assistants has been already demonstrated, even though under particular conditions [2, 3]. In spite of this, the potential of mobile robotic systems has not been clearly defined yet. In the last decade, the prototypes of robots dedicated to household tasks which have been commercialised did not encountered the favour of the market, mainly due to their high cost associated to poor performance with respect to the average expectation of the user. In addition, the market for rehabilitation robotics is far from being a single one, actually being still strongly dependent on the trends of the mass consumer market. However, the success of such systems will mainly rely on their modularity and easiness of use, which must be considered as key factors from the very beginning of system design. In fact, modular components and friendly user interfaces could represent the real link between the mass consumer market and the rehabilitation technology market. Assistance systems currently available on the market require heavy adaptation of the house by means of special building design, installation of centralised environmental control systems (for door, windows, appliances, etc.) and of fixed desktop workstations. These systems are rather expensive, especially if also the cost of building special apartments and residences is taken into account. On the contrary, mobile personal robots represent a highly attractive solution, even in economical terms, as they could significantly contribute to minimise the required degree of adaptation of the house. This fact will not only decrease the global cost for the installation of the assistance system but it will also have some functional and cultural implications: the house will be available for use to both able-bodied and disabled people, no specialised environment will be necessary apart from the rest of the house, and consequently no problems will arise if the disabled person needs to move to a new apartment. In this paper, an overview of the past and ongoing research projects as well as of the first commercial products already appeared on the market is reported for the areas of surgical and rehabilitation robotics. Finally, basic concepts and potential applications of bio-robotics will also be discussed with reference to current research activities in the field.

694 2.

ROBOTICS FOR SURGERY

Many different projects in this field have been carried out during the last ten years and few of them already generated industrial systems that are currently under clinical evaluation in hospitals [4, 5, 6]. The numerous applications to surgery can be classified in two main areas: those based on "image-guidance" and those aimed to obtaining minimal invasiveness. 2.1 Image-guided surgery

The basic concept behind image-guided surgery is the use of a robot workstation integrated into the operating theatre where some of the parts of the patient's body are fixed by means of suitable fittings. This scenario is easy to implement for orthopaedic surgery, where fixators are commonly used to fix bones and also for neurosurgery, where the stereotactic helmet, mounted on the patient's head, is quite popular to provide absolute matching between pre-operative and intra-operative reference frames. Vision-based surgery may be viewed as a robotic CAD-CAM system where diagnostic images (from CT, NMR, US, etc.) are used for off-line planning of the intervention. The robot is used in a CNC machine tool-like fashion for precise cutting, milling, drilling and other similar tasks. A better quality of the intervention results from better performance of robots with respect to the manual operation of a surgeon in terms of accuracy and repeatability. A clear demonstration of the superiority of robot cutting versus normal cutting is shown in Figure 2 for the case of bone milling for hip implant [5].

Figure 2.

A comparison between (a) robot's and (b) surgeon's performance in bone milling for hip replacement. Reprinted from [5].

Real-time images may also be used during the intervention in combination with diagnostic images and tool position/orientation data in order to provide the surgeon with feedback about the current state of the intervention. It is important to point out that the surgeon supervises the robot system during operation.

695 Among the obvious differences between an industrial robot application and a surgical one, the need for suitable matching procedures between diagnostic images and off-line intervention planning on one hand and real execution on the other hand is still a key issue. As mentioned before, the issue of matching has been addressed and solved in some cases (specifically in the case of bone cutting in orthopaedics), but many problems still remain open due to the fact that most interventions on parts of the human body involve soft tissues and large deformations may occur. This results in possible discrepancies between pre-operative and intra-operative images. Image-guided surgery includes orthopaedic surgery, spine surgery, neurosurgery, reconstructive/plastic surgery and ORL surgery. A very representative example of implementation of image-guided robotic surgery is the one proposed by R. H. Taylor et at. [6] which has been implemented in an industrial system (Robodoc, Orthodoc, ISS Inc., Sacramento, CA, USA) currently used in human trials for the automated implant of hip prostheses.

Figure 3.

A view of the hip replacement surgery system in the operational theatre. Reprinted from [5].

The main reason which motivated many researchers to explore the use of robotic devices to augment a surgeon's ability to perform geometrically precise tasks is the consideration that the precision of surgical planning often greatly exceeds that of surgical execution. The ultimate goal of this effort is a partnership between men (surgeons) and machines (computers and robots) that seek to exploit the capabilities of both to perform a task better than either can perform alone [5]. The architecture of the hip replacement surgery system depicted in Figure 3, consists of a CT-based presurgical planning component, shown in Figure 4, and of a surgical component. The surgical procedure includes manual guiding to approximate positions of pins pre-

696 operatively inserted into bones (which are fixated to the operating bed) and automatic tactile search for each pin. Then, the robot controller computes the appropriate transformation between CT and robot coordinates and uses this information to machine out the implant cavity. Finally, the pins are removed and the surgery proceeds in the conventional manual procedure.

Figure 4.

An example of the pre-operative planning procedure for the hip replacement system. Reprinted from [6].

Safety issues have been taken into high consideration. In the "Robodoc" system they include extensive checking [7] and monitoring of cutter force, and the possibility for either the surgeon or the controller to freeze all robot motion or to turn off manipulation and cutter power in response to recognized exceptions. Techniques which are essentially similar to the one described before, but which have been adopted to different operation tasks and scenarios, have been developed for the cases of total knee arthoplasty [8] (see Figure 5), percutaneous discectomy [9], spine surgery [10] (see Figure 6), neurosurgery [11, 12], prostate surgery [13], and eye surgery (by the group of Ian Hunter at the McGill University, Canada).

Figure 5.

Example of knee prosthesis implant : (a)before intervention; (b)after intervention; (c) detail of femur resection obtained by robot cutting. Reprinted from [8].

697 Similar approaches have been presented by many authors, for example by Finlay and Giorgi [14] for neurosurgery (see Figure 7), by Stfittem et al. for ORL surgery

[~5].

Figure 6.

An example (a) of computer assisted spine surgery for treating a case of scoliosis (b). Reprinted from [10].

Figure 7.

A stereotactic helmet equipped with passive arm for neurosurgery. Reprinted from [14].

Teleoperation, virtual reality environments and advanced man/machine interfaces will probably play a key role in the future of image-guided surgery. Teleoperation can be useful in some cases, such as when a patient must be operated urgently in a place where no specialised surgeon is available (for example, on a battle field or an ambulance), or when for safety reasons (patients with infective diseases, or long operations under X-ray) it is not appropriate for the surgeon to be within the operation field. Another critical problem which can be solved by means of

698 teleoperation is the unavoidable increasing requirements in terms of room for equipment in the neighbourhood of the patient's bed, so that it might become necessary for the surgeon to move away and operate remotely. Advanced man-machine interfaces and force replication devices might also play an important role in the framework of intervention simulation and surgeon training carried out in virtual environments featuring realistic 3D representations of body organs. Some examples of interfaces possessing the sophisticated features which are required for truly realistic simulations of surgical interventions are already existing, like the one developed by Bergamasco et al. [16] at the Scuola Superiore Sant'Anna (Pisa, Italy) and many more will probably appear in the near future, like the one under development at the McGill University (Canada) by Hunter et al.. 2.2.

M i n i m a l invasive surgery

Minimal invasive surgery (MIS), also called "endoscopic surgery", is gaining increased acceptance as a powerful technique beneficial to patient's integrity time of recovery and cost for assistance. At its current stage of development, MIS depends on three prerequisites: the availability of high quality video endoscopy, the ability of high precision surgical instruments and the manual skill of well-trained surgeons [17]. MIS requires accessing the organ to be operated through a small hole, and the surgeon, although directly responsible for the manipulation of the surgical tool, misses large part of the information necessary to control finely the end effector. At present MIS is a sort craftmanship, where operating surgeons has to compensate with their skills the fact that they can not touch and sense with their fingers for diagnostic purposes, they may not have 3D view of the workspace, the access to the workspace is restricted, they can not feel the forces/torques and pressure they are exerting at the end effector tip. A possible scenario of the next generation of MIS consists of a combination of telemanipulation and telerobotics technology, as depicted in Figure 8.

Figure 8.

A possible scenario for teleoperated surgery. Reprinted from [17].

699 Generally speaking, the contribution of robotics to significant decrease of the level of surgical invasiveness could involve three different fields: the first is laparoscopic surgery, which is based on stiff tools that the surgeon manipulates directly and by which can keep some (even if small) degree of "sensation" on the features of the operation workspace; the second is commonly referred as endoscopic surgery which makes use of flexible endoscopes and implies the virtual loss of any type of "sensation" for the surgeon; the third is not linked to any specific type of surgery and consists of an attempt of improving the performances of traditional macro surgical tools by applying mechatronic technologies aimed at decreasing the invasiveness of tool operation.

Figure 9.

Example of catheter tip with increased dexterity. Reprinted from [ 18].

Major developments efforts are needed in such areas of robotics technology as sensor integration, force reflection, miniaturisation of mechanisms and actuators, control. A very challenging approach to MIS is pursued in Japan, where the development of teleoperated micro-catheters capable of diagnostic and surgical interventions within brain blood vessels is currently underway. The micro-catheter will possess high dexterity at the tip like the macro one shown in Figure 9, and all along its length and will incorporate micro-fabricated tactile flow and pressure sensors at the tip, along with micronozzles and micropumps for local injection of drugs and solutions for dissolving thrombus [18]. A scheme of the system for brain blood vessels diagnosis and surgery is depicted in Figure 10. As outlined by Sato et al. [19] and by Morishita et al. [20] (see also Figure 11), the development of suitable instrumentation for MIS requires considerable research efforts both in miniaturisation of components and the adaptation of teleoperation techniques [19].

700

Figure 10. Scheme of the system for brain blood vessels diagnosis and surgery. Reprinted from [18].

Figure 11. Relations between micro world and human world. Reprinted from [20].

Figure 11. Relations between micro world and human world. Reprinted from [20].

701 An example of next generation micro endoscopes for MIS is given in Figure 12.

Figure 12. A next generation micro endoscope for MIS. Reprinted from [18].

3.

R O B O T I C S FOR R E H A B I L I T A T I O N

The possible role of robotics in the field of rehabilitation has been widely investigated in the last decades. Possible specific applicative areas which have been already identified range from the assistance to the disabled and the elderly, by means of robotic manipulators, intelligent wheelchairs and dedicated interfaces for household and vocational devices, through the restoration of impaired functions, by means of advanced prostheses, ortheses and electrical functional stimulation (FES), to the development of virtual environments for training and genuine rehabilitative therapies. Whatever is the selected approach, one of the key factors for the success of robotic a i d s i s certainly the potentiality to make these peculiar users still able to exert a complete control on their environment by using a robotic interface. In rehabilitation robotics, the term "environmental control" refers to a disabled user's capacity to actively interact with his or her external environment [21]. Although all of the sensory and motor functions are necessary for a complete environmental control, disabilities based on partial or total loss of upper limb

702 function are particularly serious, due to the consequent reduction in, or loss of, the manipulative function. This kind of disability is the most significant impediment in carrying out common everyday activities (e.g. personal hygiene, job, hobbies): the user receives the external stimuli but is then unable to respond to, or act on them (by modifying the external environment, for example). When lower limb function is also reduced (or lost), the physical (and psychological) loss of control is profound, and makes a disabled user dependent on others in virtually every respect. Numerous research teams, potential users and manufactures are already involved in developing techniques for environmental control, and for controlling robots in the context of rehabilitation and assistance for disabled users. At present, one of the commonest use of rehabilitation technology puts a general purpose computer at the hub of a multi-purpose 'cockpit', and the users operate all of their (specialised or adapted) products from that cockpit [22]. There are some advantages to this, especially for severely disabled or bed-ridden users, but less so for users with moderate, or age-related disabilities: the user is distanced from the task itself, the interaction style is based on the computer, rather than on the product being used and the task being performed. In such a context of use, and for these users, the computer remains an obtrusively technical device which tends to appear as the unique link between disabled users and their environment. 3.1

Manipulators in rehabilitation One of the primary objectives of rehabilitation robotics has always been to fully or partly restore the disabled user's manipulative function by placing a robot arm between the user and the environment. Some important factors must be considered in the design of such a peculiar environmental control system: the user's degree of disability (a system must be flexible enough to be adapted to each user's capabilities); modularity (system inputs and outputs must be easy to add or remove accorfling to each user's needs); reliability (a system must not let the user down); and cost (the system must be affordable). According to the state-of-the-art of rehabilitation robotics, three different configurations of robot systems, differently reflecting the above mentioned factors, have been considered as feasible for the assistance of severely and moderately disabled users. Historically, the first configuration which has been investigated is the bench- or table-mounted manipulator included in a completely structured desktop workstation [23, 24, 25, 26, 27, 28]. Even though various systems based on this approach were positively evaluated with users [2, 3, 28, 29, 30] and some commercial products already appeared on the market, such as the DEVAR system (Tolfa Corporation, Palo Alto, CA, USA) and the RAID system (Oxford Intelligent Machines Ltd., Oxford, Great Britain), yet they seem to be particularly suitable for assisting disabled employees for executing vocational tasks at their workplace. In fact, desktop workstations better reflect the type of organisation of space and time which is typical of vocational activities. Furthermore, this approach brings all the previously discussed drawbacks of using a 'cockpit' environmental control system. One of the first prototypes of desktop workstation, the MASTER (Manipulator Autonomous at

703 Service of Tetraplegics for Environment and Rehabilitation) system was developed in France by CEA (Paris) and is shown in Figure 13. The wheel-chair-mounted arm is particularly suited to users with upper limb disabilities, but its usefulness relies on the user being able to move (or control the movements of) the wheelchair, so that the robot can be taken to the area(s) of activity: the home environment must also be adapted to suit the robot's working height, making it an intrusive solution. This solution, depicted in Figure 14, is becoming much popular as it allows to the disabled or elderly to use the robot arm anywhere, not being necessarily related anymore to some fixed structured locations [31, 32] and is being widely experimented [33, 34].

Figure 13. The MASTER desktop workstation (CEA- France)

Figure 14. The wheelchair-mounted MANUS manipulator. Reprinted from [32].

However, some technical problems, mainly concerning the accuracy which can be obtained on grasping operations (the arm is fixed to the wheel-chair which is not properly a rigid structure) and the possibility to equip the wheel-chair with a friendly (and then complex) arm controller (the available space and the battery energy are limited) have still to be solved. Moreover, this solution does not actually address all disabled users, but only those with upper limb disabilities. Consequently, it seems to have limited concrete perspectives to become economically attractive by inducing mass market demand. In both previous cases, a severely disabled or bed-ridden user is not well catered for: the workstation dominates the user's home environment, while the wheelchairmounted robot is simply not an option. A third different solution is that of using an autonomous or semi-autonomous mobile vehicle equipped with a manipulator and additional sensor systems for autonomous or semi-autonomous operation. Its mobility and versatility make it particularly suitable for severely disabled or bed-ridden users, as long as the interface between the user and the robot is easy to use: the user should be able to

704 instruct the robot with a high-level language, via a bi-directional user interface offering appropriate methods of input and suitable output. This configuration has been first used in industrial applications (e.g. textile industries) and is surely the most sophisticated one, but it is also the most generally applicable, since the idea to have a robot in personal service can result attractive for both able-bodied and disabled users. A first prototype of a mobile robot for rehabilitative applications was developed by S. Tachi et al. of the MITI Japanese laboratories. This system, named MELDOG, was devised only to act as a robotic "dog" for blind patients, thus not having any possibility of manipulating or carrying objects. Generally speaking, the ideal system should include sub-systems dedicated to vision, fine manipulation, motion, sensory data acquisition, and system control. A sketch of a possible configuration for such a robot [1] is shown in Figure 15. Unfortunately, the cost of this solution is often prohibitive for all but the wealthiest of users, and the usability problems inherent in such sophisticated systems have yet to be satisfactorily solved for non-professional users. Various prototypes of autonomous or teleoperated mobile robots for the assistance to the disabled in different activities have been implemented and others are currently under development [35, 36, 37].

Figure 15. Concept of a mobile robotic aid. Reprinted from [1].

As a sample, Figures 16 and 17 illustrate two ongoing research projects in this field: namely, the american MOVAR system, mainly devised for vocational use, and the Italian URMAD system, mainly devised for household applications. Both prototypes have been mostly implemented. In particular, the URMAD system (the acronym stands for "Mobile Robotic Unit for the Assistance to the Disabled") will be

705 widely experimented with the final end users, i.e severely disabled patients, before the end of 1994.

Figure 16. The Mobile Vocational Assistant (MOVAR). Reprinted from [1 ].

One of the main unsolved technical problems is that the more the performance of the robot is enhanced the more its dimensions (imposed by the standard building norms), its autonomy of operation, and consequently its cost become excessive for prefiguring an effective commercial exploitation.

Figure 17. An overview of the URMAD system.

An attempt [38] to overcome these drawbacks and limitations is being carried out in the framework of the TIDE-MOVAID project by an European team coordinated by

706 the ARTS Lab of the Scuola Superiore S. Anna. As illustrated in Figure 18, the final objective of this project, in fact, is to develop a complete system, including a mobile robot, having globally functional performances similar to URMAD but still respecting the limitations imposed by a normal household environment so to avoid heavy adaptation of the house.

Figure 18. The MOVAID approachi a mobile base which fits into different activity workstations.

This goal is being pursued by partially distributing resources in the environment instead of concentrating them on the vehicle. As long as distribution of resources is well balanced and technologies are properly integrated in the domestic environment, such a realistic solution could represent a good compromise between the current state-of-the-art in advanced robotics and the ideal concept of the autonomous robotic assistant in a modern domestic scenario, thus hopefully favouring a rapid commercial exploitation. To this aim, the MOVAID project, rather than developing new basic hardware components, will take advantage of the results of previous research projects. As a sample, in Figure 19 the URMAD manipulator, which will be also used for the MOVAID mobile unit, is shown. Another important aspect of the MOVAID project is that the emphasis is on the friendliness of the system's interface and on a non-intrusive integration of technologies in the domestic environment. In this optics, MOVAID represents a potential opportunity to have a direct transfer of advanced technologies towards the home environment. This approach, which is strictly related to the increasing industrial interest in "domotics" (i.e. the development of a "smart" house accessible to all users, including the disabled and the elderly) could also have interesting implications in the medical field (telemedicine, home assistance vs. clinical assistance for the disabled and the elderly, etc).

707

3.2 Robotic systems for hospitals Mobile robots could become one answer to the current shortage of help in hospitals, as well as one solution to the problem of the diseases (lumbago, low back pain) which affect the personnel involved in heavy physical tasks such as lifting a patient and carrying h i m / h e r to the toilet or changing sheets in the bed.

Figure 19. The URMAD manipulator: an 8 d.o.f, redundant dextrous arm purposely developed for service applications (SM - Scienzia Machinale srl - Pisa, Italy).

An example of implementation of a hospital transport mobile robot has been presented recently by Transition Research Corporation (TRC Inc., Danbury, CT, USA). The robot, depicted in Figure 20, has been designed and built for addressing the need for assistance with such tasks as point to point delivery [39].

Figure 20. The HelpMate hospital transport mobile robot. Reprinted from [39].

708 The objective of the hospital transport mobile robot ("Help Mate") is to curry out such tasks as the delivery of off schedule meal trays, lab and pharmacy supplies and patient records. The navigation system of HelpMate, unlike many existing delivery systems in the industry which operate within a rigid network of wires buried or attached to the floor ("AGVs"), relies on sensor-based motion planning algorithms that specifically address the issue of navigation in a partially structured environment. The system is also able to handle sensor noise and sensor inaccuracy, errors in position estimation, and moving obstacles (e.g. people). HelpMates have been installed in several hospitals. In some hospitals HelpMate is in operation 24 hours per day and the hospitals are reporting an increase in productivity and efficiency. The HelpMate represents a useful and probably industrially valuable solution to some basic needs requiring the transportation of lightweight objects. The complexity of the system is deliberately kept low by eliminating automated manipulation (which is carried out by human operators), by assuming flat floor in the working environment (the robot uses elevators - which are controlled by means of elevator control computers activated by an infrared transceiver-, and doors), and by providing the robot with accurate geometric and topographical information about the hospital hallways, elevator lobbies and elevators. Further help to nurses could be provided by heavier robots, designed to execute tasks requiring hard muscular work. Japanese laboratories and industries have identified this field as very promising, and have invested substantial efforts in the development of fetch and carry robots for hospitals, usually hydraulically actuated and featured by high payload. An interesting example of such type of robot is the patient care robot named "MELKONG", that was developed a few years ago by the Mechanical Engineering Laboratory (MEL) in Japan [40]. The "MELKONG was intended to lift, hold and carry an adult patient (weighing up to about 100 kg) or a disabled child. The robot docked to the bed in the hospital room, lifted the patient in its arms from the bed, moved back still holding him/her in its arms and transferred him/her to the toilet, or bathroom, or dining room. Usually the robot was controlled by a nurse, but it was expected that at night the patient could also call the robot and control it by means of simple commands given by means of a joystick. Serious problems related to automatic docking, mobility, manipulation, actuation (by hydraulic actuators), energy supply, man/machine interfaces were addressed and solved. The MELKONG concept has evolved from the early prototype described above to more sophisticated versions incorporating functional and aesthetic improvements, such as the one depicted in Figure 21. A transfer-carrier vehicle based on an evolution of the MELKONG concept is commercially available in Japan from Sanyo Electric Co. Ltd. Furthermore a simple functional robot aimed at supporting the elderly and the disabled for independent living, in particular for evacuation (a function that the elderly wished to do by him/herself if an adequate support equipment is available), is also being developed in Japan jointly by the National Institute of Bioscience and

709 H u m a n Technology of AIST, Aprica Kassai Inc. and Hitachi Ltd, in the framework of the National Programme for Welfare State and Apparatus (Towards a New Society).

Figure 21. Recent version of a robot for lifting bedridden patients. Reprinted from [41].

3.3 Intelligent wheelchairs For a physically disabled person the main advantage of using a transport mechanism, like a wheelchair, is to allow h i m / h e r to achieve some degree of personal mobility. In the case of a wheelchair carrying a robot arm, the severely disabled can use the robot arm anywhere, and not only remotely or in fixed permanent locations. In order to increase the performance of ordinary wheelchairs (and of the robot manipulators possibly mounted on them), a number of approaches have been proposed based on robotic and mechatronic technologies. These approaches comprise attempts to develop autonomous vehicles which can be used to transport a person from one location to another with little or even without outside assistance, as well as attempts to increase the capability of the vehicle to locomote on unprepared surfaces and to overcome obstacles. An example of the first approach is represented by self-navigating wheelchairs, as the one proposed by Madarasz et al. a few years ago [42]. The vehicle, designed to function inside an office building, is able to plan its own path from its current location to a particular room in the building, and then to travel to that location. The system must also function with minimum impact on the building in which it will be used, that is the building cannot be equipped with a guidance mechanism, such as embedded wires in the floor or painted stripes that can be followed. Therefore, the wheelchair becomes substantially a sort of mobile robot with high degree of autonomy. In fact, the vehicle is self-contained: all of the sensing and decision making are performed by the on-board equipment. This approach relieves the disabled person from tasks h e / h e r may be unable to carry on, but a system for

710 supervised control is provided for high level commands and for other types of operations requiring direct guidance. In Europe, a recent example of a project aimed at developing a wheelchair featured by partly autonomous behaviour is represented by the European TIDEOMNI project. An italian manufacturer (TGR s.r.l., Ozzono Emilia, Italy) produces a wheelchair (named "Explorer") incorporating both wheels and tracks. This wheelchair, which has been also modified to host the Manus system in the framework of the European SPRINT-IMMEDIATE project, can not only run on regular terrain, but also go up and down stairs with the user on board. This approach represents an evolution towards the possible development of a new generation of vehicles designed to deeply enhance the mobility of the user. A very interesting example of this evolution is the adaptive mobility system proposed recently by Wellman et al. [43] at the University of Pennsylvania, Philadelphia, USA The design of the system, that is basically a hybrid vehicle incorporating wheels as well as two "arms" that can work both as manipulators and as legs, is based on the assumption that a legged vehicle allows locomotion in environments cluttered with obstacles where wheeled or tracked vehicles can not be used. A legged vehicle is inherently omnidirectional, provides superior mobility in difficult terrain or soil conditions (sand, clay, gravel, rocks, etc.) and provides an active suspension. The legs also give the chair versatility and allowed it to be re-configured. When stationary, one of the legs can be used as a manipulator in order to perform simple tasks such as reaching for objects or pushing open doors. 4.

F R O M A D V A N C E D P R O S T H E S E S A N D O R T H E S E S TO F.E.S.

There have been frequent intersections between robotics and limb prosthetic technologies in the past. Many devices, like artificial legs, artificial hands and arms, have evolved in the '60s and '70s both as prostheses for amputees and as possible components of advanced robots. Examples of these devices are the Belgrade hand [44] and the UTAH arm [45]. More recently the UTAH-MIT dextrous hand [46] was designed as a robotic hand by taking inspiration from the human hand, whereas new prostheses for amputees have been developed by exploiting last advances in robotic technology (like the one developed in the framework of the European TIDEMARCUS project). However, it is quite obvious and very clear to all of those working in the field of aids for disabled that, although disabled persons may accept "artificial" devices as assistants, their dream and ultimate goal is to be able to manipulate and walk again. Although this is out of reach for current medical capabilities, a few promising approaches are being pursued by some investigators which might ultimately lead to render that dream closer to reality. A first example of robotic device that has been developed to help patients with impaired walking capabilities to restore their functions is the one developed in Japan and illustrated in Figure 22.

711 An evolution of the above mentioned assistive device for "natural" walking is the active orthosis, whose development has been pioneered by Prof. Pierre Rabishong and his team of INSERM, in Montpellier, France [47]. This active orthosis, which is shown in Figure 23, was intended and used for clinical research and re-education. A further application of the system was the combination of the re-education apparatus with functional electric stimulation (F.E.S.) studies. This evolution is the goal of a EUREKA project, called CALIES (Computer-Aided Locomotion by Implanted Electro-Stimulation), which represents probably the most important coordinated effort presently carried out in the world for restoring autonomous locomotion in paralysed persons. The project investigates the possibility of implanting stimulating electrodes into lower limb muscles, or even nerves, and to obtain close to natural walking by providing appropriate stimulation patterns by means of an external portable computer.

Figure 22 Robotic device for walking rehabilitation. Reprinted from [48]. A key component for improving the performance of FES-based apparatus is the implanted electrode. Development in this field might not only allow to obtain better systems for computer-assisted manipulation and locomotion, but eventually even lead to achieve the dream of restoring natural manipulation and locomotion. In fact, some projects aim to develop implantable neural prostheses based on the capability of the peripheral nervous system to regenerate, which could establish a bidirectional electrical continuity between the nervous system and external devices [49, 50]. One of these projects, the European ESPRIT-INTER project [51], investigates an approach based on a combination of silicon microfabrication technology, polymer channel guidance and nerve growth factors release, in order to promote nerve regeneration through a pattern of microholes with electrodes, thus rendering it possible to pick up sensory signals and to selectively stimulate nervous fibers. Other projects are exploring the possibility of by-passing interruptions in the nervous system (even at the central level) by artificial nerve grafts, fabricated by conductive

712 polymer fibers. Some other projects try to obtain nerve regeneration at the spinal cord level by implanting fetal nervous cells. It is the hope of many researchers that this class of "hybrid" devices, which exploit the properties of artificial materials combined with those of biological factors (possibly modified by means of biotechnology), could eventually allow to restore the continuity of nervous pathways, and make the dream of paralysed persons to move their own limbs true.

Figure 23. Active orthosis: master-slave version. Reprinted from [47]. In fact, some projects aim to develop implantable neural prostheses based on the capability of the peripheral nervous system to regenerate, which could establish a bidirectional electrical continuity between the nervous system and external devices [49, 50]. One of these projects, the European ESPRIT-INTER project [51], investigates an approach based on a combination of silicon microfabrication technology, polymer channel guidance and nerve growth factors release, in order to promote nerve regeneration through a pattern of microholes with electrodes, thus rendering it possible to pick up sensory signals and to selectively stimulate nervous fibers. Other projects are exploring the possibility of by-passing interruptions in the nervous system (even at the central level) by artificial nerve grafts, fabricated by conductive polymer fibers. Some other projects try to obtain nerve regeneration at the spinal cord level by implanting fetal nervous cells. It is the hope of many researchers that this class of "hybrid" devices, which exploit the properties of artificial materials combined with those of biological factors (possibly modified by means of biotechnology), could eventually allow to restore the continuity of nervous pathways, and make the dream of paralysed persons to move their own limbs true.

713 5.

BIO-ROBOTICS

Biological systems are not only the recipient of the services of robots, but also the source of inspiration for components and behaviours of future robot systems. This not well defined, but intriguing and stimulating area of interest for robotics, can be called "bio-robotics" and is currently receiving an increasing amount of attention by many investigators. In general, biological systems are a living proof that some complex functions (both sensorimotor and "intellectual") that robotics researchers should like to realize in artificial systems can actually be implemented. Locomotion, manipulation, vision, touch are all functions which living beings execute seemingly without effort, but which turned out to be extremely difficult to replicate in artificial systems. In the recent past, many different groups have been active in this "borderline" area where the distinction between "robotics"and "bioengineering" becomes very subtle. A book discussing state-of-the-art results and perspectives in this field has been published recently [52]. Examples of components which are explicitly inspired to their biological counterparts (and which are intended to be the "core" of sensorimotor systems capable of replicating the function of their biological counterparts) are retina-like CCD sensors [53] and tactile sensors [54]. A photograph of the CCD vision sensor, whose geometry is inspired to the one of the human retina (including the high-acuity fovea-like central part), is reported in Figure 24. A scheme of a fingertip incorporating three different types of sensors which provide (in combination with appropriate sensorimotor acts) the robot controller with information on object geometry and material features comparable to those of the human fingertip, is given in Figure 25. A close-up of one of the sensors (a 256-element array sensor), which imitates the space-variant distribution of tactile receptors in the fingertip skin, thus emphasising the role of "attentive behaviour" in active touch, is also reported in the same figure.

Figure 24. The foveal structure of the CCD retina-like sensor. Reprinted from [53].

714

Figure 25. The ARTS Lab fingertip probe (a) and tactile sensor (b). Reprinted from [54].

Further applicationsin the fieldof bio-roboticsinvolve the use of artificialsystems as accurate models for investigating the physiology of biological systems. The laboratory which has pioneered this approach is the one headed by the late Prof. Ichiro Kato at Waseda University, which has developed robotic devices capable of playing different musical instruments such as organs, piano, violin and flute. A system developed for investigatingthe function of mastication in humans is depicted in Figure 26. Prof. Kato himself (ICNR '91) and other investigators (Coiffet [56], Rabischong [5Y]) have proposed even more intriguing speculations on the relations between human mind and robot mind. Based on these hypotheses the Waseda laboratory is currently investigating an approach to the assistance to the disabled and the elderly which involves not merely the concept of "service robots", but even the one of robot as "companions" of humans.

715

Figure 26. The mastication robot. Reprinted from [55].

6.

CONCLUSIONS

The reasons why robots did not gain immediate acceptance in the medical community are in part obvious, and in part more subtle. The obvious reasons are both psychological (robots may be perceived as "competitors" by physicians, and as potentially dangerous exotic machines by patients) and technical (industrial robots are reliable, but no real expertise exists in the world about robots working full time in the vicinity or even in contact with humans). The subtle reasons are related to a possible misconception of the very same notion of robots, which should probably be corrected in the interest of the robotic research community. In fact, most users perceive robots either as the industrial robot arm, or as an exotic and anthropomorphic creature. In the field of advanced robotics, and of medical robotics in particular, the robot leaves the factory floor and gets into physical contact with the h u m a n operator (the surgeon, the patient). In some cases, the robot will maintain the overall usual structure of an industrial robot (although new robot arms dedicated to medical application are being presented), but in most other cases robotic technologies will be embedded into tools which will not possess the traditional robotic "look". This shift should not be seen, in our opinion, as a problem, but rather as a very interesting and attractive opportunity for the robotics research community to extend their reach to a broader area (sometimes referred to as "mechatronics in medicine" or even "bio-mechatronics"). Nevertheless, the concrete experimental and clinical results achieved both in the fields of "macro" and "micro" medical robotics, and only partially reported in this paper, together with the economical and social motivations for using these new technological tools, permeated by robotic technologies, could certainly represent the best viaticum for a massive development of this new area of advanced robotics in the near future.

716 REFERENCES 1. J.F. ENGELBERGER: Robotics in service, Biddles Ltd., Guildford (GB), 1989. 2. J. HAMMEL, K. HALL, D. LEES, L. J. LEIFER, H. F. M. VAN DER LOOS, I. PERKASH, R. CRIGLER, Clinical evaluation of a desktop robotic assistant, Journal of Rehabilitation Research and Development, 2, 3, 1-16, 1989. 3. J. HAMMEL, H. F. M. VAN DER LOOS H. F. M., I. PERKASH, Evaluation of a vocational robot with a quadriplegic employee, Archive of Phys. Med. and Rehabil., 73, 683-693, 1992. 4. P. CINQUIN, J. DEMONGEOT, J. TROCCAZ, S. LAVALLI~E, G. CHAMPLEBOUX, L. BRUNIE, F. LEITNER, P. SAUTOT, B. MAZIER, A. Pt~REZ, M. DJAID, T. FORTIN, M. CHENIN, AND C. A. IGOR: Image guided operating robot: methodology medical applications, results. ITBM (Innovation and Technology in Biology and Medicine) - Special Issue on Robotic Surgery, 13(4):373-393, 1992. 5. H.A. PAUL,, B. MITTLESTADT, B. MUSITS, R. H. TAYLOR, P. KAZANZIDES, J. ZUHARS, B. WILLIAMSON AND W. HANSON: Development of a surgical robot for cementless total hip arthoplasty, Clinical Orthopaedics, December 1992, Vol. 285, pp. 57-66, USA. 6. P. ELURY, P. LOPEZ, D. GLAUSER, N. VILLOTTE, C. W. BURCKHARDT: Minerva, a robot dedicated to neurosurgery operations, Proc. of the 23th ISIR, Barcelona, Spain, 1992, pp. 729-733. 7. R. H. TAYLOR, H. A. PAUL, P. KAZANZIDES, B. MITTLESTADT, W. HANSON, J. ZUHARS, B. WILLIAMSON, B. MUSITS, E. GLASSMAN, W. L. BARGAR: Taming the bull: safety in a precise surgical robot, Proc. of '91 ICAR, International Coneference on Advanced Robotics, vol. 1, pp. 865-870, Pisa, June 1991 8. M. MARCACCI, M. FADDA, P. DARIO, S. MARTELLI, A. VISANI, T. WANG: The total knee arthoplasty project: a demonstration feasibility of the computer assisted approach, Proc. of 1st European. Conference on Medical Robotics (ROBOMED '94), in, June 20-22, 1994, pp. 59-66. 9. M. FADDA, S. MARTELLI, P. DARIO, M. MARCACCI, S. ZAFFAGNINI, A. VISANI: First steps towards robot-assisted discectomy and arthroscopy, ITBM (Innovation and Technology in Biology and Medicine) - Special Issue on Robotic Surgery, 13(4):394-408, 1992 10. P. SAUTOT, S. LAVALLI~E, J. TROCCAZ, P. CINQUIN: Computer assisted spine surgery, Proc. of 1st European. Conference on Medical Robotics (ROBOMED '94), June 20-22,1994, pp. 51-58. 11. Y. S. KWOH, J. HOU, E. A. JONCKHEERE, S. HAYATI: A robot with improved absolute positionning accuracy for CT guided stereotactic brain surgery, IEEE Transactions on Biomedical Engineering, Vol. 35, N. 2, February 1988, pp. 153-160. 12. N. VILLOTTE, D. GLAUSER, et at.: Conception of stereotactic instruments for the neurosurgical robot Minerva, Proc. of the 14th Annual Int. Conf. of the IEEE EMBS '92, 1089-1090, Paris, France.

717 13. B. DAVIES: Soft tissue surgery, Proc. of 1st European. Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 67-72 14. C. GIORGI, M. LUZZARA, D. S. CASOLINO, AND E. ONGANIA:. A computer controlled stereotactic arm: virtual reality in neurosurgical procedures, in Advances in stereotactic and functional neurosurgery edited by B. Meyerson, Springer Verlag, Vienna, NY, USA, 1993, pp. 75-76. 15. J. STUTTEM, B.KREMER, B. KORVES, L. KLIMEK, G. KROCKELS, R. MOSGES. Robotics in ear, nose and throat surgery, Proc. of 1st European. Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 31-35. 16. M. BERGAMASCO, B. ALLOTTA, L. BOSIO, L.FERRETTI, G. PARRINI, G. M. PRISCO, F. SALSEDO, G. SARTINI: An arm exoscheleton system for teleoperation and virtual environments applications, Proc. of the 1994 IEEE Int. Conference on Robotics and Automation, San Diego, CA, USA, May 8-13, 1994, pp. 1449-1455. 17. E. HOLLER, W. WEBER: System and control concepts for a telemanipulator system to be applied in minimally invasive surgery, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 111-120. 18. H. NARUMIYA: Micromachine technology (Intraluminal diagnostic & therapeutic system), Proc. of the 1993 Workshop on Micromachine Technologies and Systems, Tokyo, Japan, October 26-28, 1993, pp. 60-65. 19. T. SATO et al., Micro handling robot for micromachine assembly, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 138-146. 20. H. MORISHITA, Y. HATAMURA: Development of ultra precise manipulator system for future nanotechnology, Proc. of the 1st IARP Workshop on Micro Robotics and Systems, Karlsruhe, Germany, June 15-16, 1993, pp. 34-42. 21. L. LEIFER: Rehabilitative Robots, Robotics Age, May/June, 1981. 22. G. BIRCH. Technology and Homes: the Remote Gateway Project, The Neil Squire Foundation Journal, Technical section, 5-8, 1992. 23. H. ROESLER, V. PAESLAK: Medical Robot arms, Proc. of 1st CISM-IFToMM Symp. on Theory and Practice of Robots and Robot arms, Udine, Italy, 1974, pp. 158169. 24. J. GUITTED et al.: The Spartacus, Telethesis: robot arm control studies, Bulletin of Prosthetics Research, 32, 1979, pp. 69-105. 25. H. FUNABUKO et al: Application of robot arm for environmental control system of bed ridden patients in private house, Proc. of 11th Int. Symp. on Industrial Robots, Tokyo, 1981, pp. 71-78. 26. M. R. HILLMAN. A feasibility study of a robot manipulator for the disabled, Journal of Medical Engineering & Technology, 11, 4, 1987, pp. 160-165.

718 27. D. LEES et al.: A third-generation desktop robotic assistant for the severely physically disabled, Proc. 11th Annual Conference on Rehabilitation Technology (ICAART '88), Montreal, Canada, 1988. 28. W. SEAMONE, G. SCHMEISSER: Early clinical evaluation of a robot arm/worktable system for spinal-cord-injured persons, Journal Rehabilitation Research and Development, 22, 1, 1985, pp. 37-56. 29. L. HOLMBERG, Rehabilitation Robotics at HADAR, Rehabilitation Robotics Newsletter, Vol. 4, No. 4, Wilmington, DE, Fall, 1992 30. R. CAMMOUN, J. M. DETRICHE, F. LAUTURE, B. LESIGNE: Evaluation of the MASTER 1 robotic system and design of the new version, Proc. IEEE Int. Conf. on Advanced Robotics (ICAR '93), Tokyo, Japan, 1993, pp. 319-322. 31. S. D. PRIOR: A wheelchair mounted telemanipulator arm for use by the disabled, Research Project Report 1988, Middlesex Polytechnic, London. 32. H. KWEE, C. STANGER: The Manus robot arm, Rehabilitation Robotics Newsletter, 5, 2, 1993, pp. 1-2. 33. J. C. ROSIER, J. A. VAN WOERDEN, L. W. VAN DER KOLK, B. J. F. DRIESSEN, H. H. KWEE, J. J. DUIMEL, J. J. SMITS, A. A. TUINHOF DE MOED, G. HONDERD, P. M. BRUYN: Rehabilitation robotics: the MANUS concept, Proc. of '91 ICAR, vol. 1, pp. 893-898, Pisa, June 1991. 34. J. HENNEQUIN, R. G. S. PLATTS: IEE Colloqium Digest, No.1992/108. 35. Progress Report, Veterans Administration R & D Center, 1988, Palo Alto CA, USA. 36. M. A. REGALBUTO, T. A. KROUSKOP, J. B. CHEATHAM: Toward a practical mobile robotic aid system for people with severe physical disabilities, Journal of Rehabilitation Research and Development, 29,1, 1992, pp. 19-26. 37. P. DARIO, E. GUGLIELMELLI, C. MULE', M. DI NATALE: "URMAD: An Autonomous Mobile Robot System for the Assistance to the Disabled, 6th International Conference on Advanced Robotics ('93 ICAR), Tokyo, Japan, November 1-2, 1993, pp. 341-346. 38. P. DARIO, E. GUGLIELMELLh L'introduzione di nuove tecnologie nell'ambiente domestico. Atti del Convegno "Tecnologie, servizi ed edilizia innovativa per la terza et?z", SAIEDUE, Bologna, Italy, 1994. 39. J. EVANS et al.: Handling real-world motion planning: a hospital transport robot, IEEE Control Systems Magazine, 1992, pp. 15-19. 40. E. NAKANO et al.: First approach to the development of the patient care robot, Proc. of llth Int. Symp. on Industrial Robots, Tokyo, 1981, pp. 95-102. 41. S. HASHINO: Aiding Robots. Advanced Robotics, 7,1, 1991, pp. 97-103 42. R. L. MADARASZ et al.: The design of an autonomous vehicle for the disabled, IEEE J. of Robotics and Automation, Vol. RA-2, No. 3, 1986, pp. 117-126. 43. P. WELLMAN. An adaptative mobility system for the system. Proc. of the 1994 Int. Conf. on Robotics and Automation, San Diego, CA, 1994, 3, pp. 2006-2011

719 44. G. BEKEY et al.: Control architecture for the Belgrade-USC hand, Proc. of 1988 IEEE Int. Conf. on Robotics and Automation, Philadelfia, PA, USA, 1988. 45. S. C. JACOBSEN et al.: Development of the UTAH artificial arm, IEEE Transactions on Biomedical Engineering, BME-29(4), pp. 249-269. 46. S. C. JACOBSEN et al.: Design of the UTAH/MIT dextrous hand, Proc. of the 1986 IEEE Int. Conference on Robotics and Automation, San Francisco, CA, USA, pp. 1520-1532. 47. J. P. MICALEFF AND P. RABISHONG: Motorized assistance for the handicapped: control systems, vehicles, walking machine, Proceedings of the First European Workshop on Evaluation of Assistive Devices for Paralyzed Persons, 1984 pp. 125-132, Edited by A. Pedotti and R. Andrich, COMAC BME. 48. T. DOHI: Robot technology in medicine, Advanced Robotics, Vol. 7, No. 2, pp. 179187, 1993. 49. D. J. EDELL: A peripheral nerve information transducer for amputees: long-term multichannel recordings from rabbit peripheral nerves, IEEE Transactions on Biomedical Engineering, Vol. BME-33, no. 2, February 1986, pp.203-213. 50. G. T. A. KOVACS, J. M. ROSEN: Regeneration-type peripheral nerve interfaces for direct man/machine communication, in Robots and biological systems: towards a new bionics?, NATO ASI Series, Springer Verlag, Berlin, 1993, Vol. 102, pp. 637-666. 51. M. COCCO et al.: An Implantable Neural Connector Incorporating Microfabricated Components, Proceedings of MicroMechanics Europe, Neuchatel, Switzerland,, 1993, pp. 185-188 52. P.DARIO, G. SANDINI, P. AEBISCHER: Robots and biological systems: towards a new bionics?, NATO ASI Series, Springer Verlag, Berlin, 1993. 53. P. DARIO, M. DE MICHELI, G. SANDINI, M.TISTARELLI: Retina-like ccd sensor for active vision, NATO A R W on Robots and Biological Systems, I1 Ciocco, Tuscany, Italy, 1989. Springer-Verlag. 54. M. RUCCI, P. DARIO: Autonomous learning of tactile-motor coordination in robotics, Proc. of the 1994 IEEE International Conference on Robotics and Automation, San Diego, CA, May 8-13, pp. 3230-3236. 55. H. TAKANOBU, A. TAKANISHI, I. KATO: Design of a mastication robot mechanism using a human skull model, Proc. of '93 IROS, Yokohama, Japan, July 26-30, 1993, pp. 203-208. 56. P. COIFFET: Robot habilis and robot sapiens or some ideas about robot man links in robot design, Proc. of 1st European. Conference on Medical Robotics (ROBOMED "94), June 20-22, 1994, pp. 169-174. 57. P. RABISCHONG: Is man still the best robot, Proc. of 1st European. Conference on Medical Robotics (ROBOMED '94), June 20-22, 1994, pp. 157-159.

This Page Intentionally Left Blank

Intelligent Robots and Systems V. Graefe (Editor) 9 1995 Elsevier Science B.V. All rights reserved.

721

Planning, Calibration and Collision-Avoidance for Image-Guided Radiosurgery * Achim Schweikard *t, Mohan Bodduluri *, Rhea Tombropoulos * w and John R. Adler t * Robotics Laboratory, Department of Computer Science, Stanford, CA 94305 t Department of Neurosurgery, Stanford, CA 94305 * Accuray Inc., Santa Clara, CA 95054 w Section on Medical Informatics, School of Medicine, Stanford, CA 94305 In radiosurgery a moving beam of radiation is used as an ablative surgical instrument to destroy brain tumors. Classical radiosurgical systems rely on rigid skeletal fixation of the anatomic region to be treated. This fixation procedure is very painful for the patient and limits radiosurgical procedures to brain lesions. Furthermore, due to the necessity of rigid fixation, radiosurgical treatment with classical systems cannot be fractionated. A new camera-guided system capable of tracking patient motion during treatment has been built to overcome these problems. The radiation source is moved by a 6 degree-of-freedom robotic arm. In addition to offering a more cost effective, less invasive, and less painful treatment, the robotic gantry allows for arbitrary spatial motion of the radiation source. Based on this feature we can treat non-spherical lesions with accuracies unachievable with classical radiosurgical systems. The system introduces a new class of radiosurgical procedures, called non-stereotactic, or image-guided radiosurgery. At the heart of these procedures are algorithms for planning both a treatment and the corresponding beam motion, given the geometric description of the tumor shape and relative locations in the particular case. 1. I N T R O D U C T I O N In radiosurgery, brain tumors (and other brain lesions) are destroyed by moving an intense beam of radiation [2,6]. A necrotic dose is delivered to a tumor by cross-firing from multiple directions, to reduce the amount of energy deposited in healthy brain. The radiosurgical treatment consists of several phases. First, a precise 3D map of the anatomical structures in the area of interest (for instance, the brain) is constructed using computed tomography (CT) and magnetic resonance (MR). Next a motion path for the radiation beam is computed to deliver a dose distribution that the surgeon finds acceptable (taking into account a variety of medical constraints). Finally, a jointed mechanism moves the radiation source according to this path. *This research is funded in part by the Sheik Enany Fund, Lorraine Ulshafer Fund and by the National Library of Medicine (LM-05305 and LM-07033). The authors thank Joe Depp, Martin Murphy and Bo Preising of Accuray Corp. for comments and suggestions.

722

Figure 1. Robotic radiosurgical system, patient couch, on-line x-ray vision system

A new radiosurgical system has been built to overcome limitations of earlier systems. The new system is based on an on-line x-ray vision system and a 6 degree-of-freedom manipulator moving a compact and light weight linear accelerator. 2. S Y S T E M

OVERVIEW

The system consists of the following components (Figure 1): 9 R a d i a t i o n source: A novel compact and light-weight linear accelerator running at high frequency generates a 6 MV photon beam. 9 M e c h a n i c a l gantry: A GM-Fanuc 420 manipulator with 6 degrees-of-freedom moves the linear accelerator during treatment. 9 T r e a t m e n t planning: Geometric planning algorithms are at the heart of the new system. A beam path is computed given the constraints placed by the desired dose distribution. The beam path must observe additional constraints, i.e. the path must be collision-free and must not obstruct the surveillance and on-line vision subsystems. 9 Surveillance: An ultrasonic system surveys the position of the robot and the linear accelerator during treatment. 9 S t a t i c and D y n a m i c Imaging: CT and MR scanners acquire pre-operative images of the brain. An online vision system with two orthogonal x-ray cameras takes images of the patient's skull twice every second and computes the patient's position by correlating the images to precomputed radiographs. Small movements of the head are compensated for by a corresponding motion of the manipulator arm. Larger movements cause the radiation process to be interrupted. During treatment, the linear accelerator is moved in point-to-point mode through a series of configurations. At each configuration, the beam is activated for a small time interval St, while the arm is held still.

723 Currently, the beam has circular cross-section the radius of which remains constant throughout a given treatment. However, this radius is initially set between 5mm and 40mm by mounting a collimator (lead inset for beam focusation) to the radiation source. The new system offers the following advantages over earlier radiosurgical and radiotherapeutical systems (see e.g. [7,9,10]): 9 The tissue does not have to be fixed in space. In earlier radiosurgical systems this was done with stereotaxic frames. Besides being very painful for the patient, this fixation was only possible for the lesions in the head, so the use of radiosurgical methods was limited. 9 Treatment can be fractionated, i.e. the total dose can be delivered in a series of 2-30 treatments, where only a small dose is delivered during each treatment. Fractionation has turned out to be highly effective in radiation therapy, but could not be used in radiosurgery due to the necessity of rigidly fixing the tissue in space. 9 Based on geometric planning algorithms the radiation dose can be focused within the lesion with high accuracy so that healthy tissue can be protected, and side effects of radiation can be reduced dramatically. In particular, our algorithms allow for generating dose distributions with prescribed non-spherical isodose surfaces and high homogeneity. A prototype of the new system has been installed at Stanford Medical Center. The use of the system for clinical investigation was approved in February 1994. 3. G E O M E T R I C

TREATMENT

PLANNING

The objective of treatment planning is to find a motion suitable for treating the particular condition presented by a patient. This planning problem is illustrated in Figure 2. Figure 2-a schematically depicts an axial cross-section of the brain with a circular tumor T. If T is irradiated from only one direction r l , the healthy tissue along the beam absorbs approximately the same dose as the tumor. If, instead, we use two beam directions, r l and r2, the dose deposited in the tumor is approximately twice the dose in healthy tissue. Using more beam directions can lead to further improvements of the dose distribution, and a very sharp drop-off of the dose in the tissue surrounding the (spherical) tumor region can be achieved. In this type of treatment, the axes of all beams cross a single point in space, called the isocenter. However, for tumors of non-spherical shape this single-isocenter treatment procedure is problematic (Figure 2-b). Treatment planning has to take into account the following constraints. 9 The region receiving high dose should match the tumor region. To protect healthy tissue, sharp dose drop-off should be achieved in the healthy tissue surrounding the tumor. 9 Certain anatomic structures (e.g. optic nerves, arteries, and brain stem) are very critical and/or radiation sensitive. The dose in such critical regions should be very small.

724

Figure 2. Cross-firing at a tumor

9 Dose homogeneity: for certain lesions, it is essential that the dose inside the lesion be uniform; the dose in healthy non-critical tissue should also be uniform. Planning a radiosurgical or radiotherapeutical treatment is difficult and time-consuming. In many cases, the constraints characterizing an adequate dose distribution cannot be met simultaneously. Our method for computing a suitable treatment plan proceeds in two steps" 9 Beam selection: beams are selected according to the given tumor shape. The selection uses a reachability model of the manipulator workspace. 9 Beam weighting: an intensity or weight is assigned to each beam. The weight is given by the duration of beam activation. 4. S E L E C T I N G

BEAM

CONFIGURATIONS

4.1. Spherical T u m o r s First assume the tumor is spherical, with center p and radius r. We set the radius of all beams to r. The beam configurations are then selected in the following way. The axes of all beams cross the tumor center p (p is the isocenter). We consider beam directions determined by vertices of a regular (Platonic) polyhedron centered at p. The choice orientation of this polyhedron is problematic if there are critical healthy structures in the t u m o r vicinity. Furthermore there is no such regular polyhedron with more than 20 vertices. A more even distribution is obtained using more directions. The methods in [14] provide point grids on spheres which are adequate for our purposes. Our planning method thus uses a large set of beams (typically n >_ 300) for a single sphere, and addresses the presence of critical regions not during the beam selection phase, but during the beam weighting phase.

725

Figure 3. Isodose surface, beam directions. Region inside surface absorbs 50% or more of maximum dose

4.2. Ellipsoidal Shapes While the scheme in the previous paragraph gives both very sharp dose drop-off around a sphere-shaped lesion and sufficient homogeneity, generalizations of this scheme for treating non-spherical tumors are not obvious. A first attempt is to use two spheres instead of a single sphere, where each sphere is treated independently. However, it is generally difficult to cover an arbitrary tumor shape with spheres, such that the volume of overlap with the tumor environment is small, and the entire shape is covered. This is aggravated by the fact that the two spheres should not intersect, since this would yield very inhomogeneous dose distributions, i.e. high dose (so-called hot spots) in the intersection region of the spheres and/or insufficient coverage of the tumor region. Studies on complication rates show that hot spots, even inside the tumor, should be avoided if at all possible. However, a simple generalization of the above single sphere method is obtained in the following way. Instead of using a single isocenter, we place a series of isocenters evenly spaced along a line segment. Each such isocenter is treated in the same way as a single isocenter. Central to this approach is the fact that each sphere receives very low dose, and many spheres are overlaid. In this way we can avoid hot spots in the intersection of two spheres. The dose distribution resulting from such a motion as well as isocenter points and beam directions for generating this distribution are shown in Figure 3. Figure 3 shows a 50% isodose surface, i.e. the region inside the surface receives 50o-/; or more of the maximum dose. There are several problems with this method: 1. If we use the same set of directions for all isocenters thus chosen, many of the beams will be parallel. Accumulated dose fall-off from such beams will cause dose inhomogeneity in the surrounding tissue.

726

Figure 4. Sweep-motion (a) beam has circular cross-section (b) Beam cross-section is rectangular

2. Dose inhomogeneity in the tumor may occur. The dose in the tumor center will be much higher than the dose along the tumor boundary, as illustrated in Example 1 below. This inhomogeneity is in large part due to the cylindrical beam shape. To address the first problem, we choose rotated beam sets for each isocenter. In practice rotating the entire beam set aimed at one isocenter by a fixed amount is equivalent to choosing each individual beam direction from a finer spherical point grid. Furthermore, instead of using several beams for each isocenter, we can use a larger set of isocenters, evenly spaced on a line segment and a single beam (from the finer sphere grid) for each such isocenter. Thus we place a single beam through each isocenter point along the line segment. Each beam direction is then chosen (in random order) from a fine grid of points on the unit sphere. The second problem can be addressed in several ways. The following example shows that the mentioned problem is related to the collimator shape.

4.3. Example Figure 4-a shows the cross-section of a cylindrical beam swept along a line segment 1. The beam axis is orthogonal to the plane of cross-section. The beam cross-section is a circle c. The point p will absorb higher dose than point p' during this sweep-motion. The dose at these points will be proportional to the length of the segments s and s', thus causing substantial inhomogeneity. A beam with rectangular cross-section avoids this problem (Figure 4-b). 4.4. General Shapes The above approach is restricted to the pseudo-cylindrical shapes shown in Figure 3 and shapes obtained from bending such shapes. A direct generalization of this scheme is the following. With an algorithm in [14], we compute a grid of n points on the t u m o r

727 surface. Each such point is now treated as an isocenter. As above, a single beam is placed through each isocenter. This scheme addresses both the first and the second of the mentioned problems: Accumulation along certain principal directions is avoided by using a fine grid. Accumulation of high dose in the tumor center is avoided by placing isocenters on the tumor surface. To achieve homogeneity, the point grid on the input surface can be expanded or retracted interactively. A path thus generated is refined by computing beam weights. 5. C O M P U T I N G

BEAM

WEIGHTS

During the second planning step, weights are assigned to each of the previously selected beams. Let C 1 , . . . , Cn denote the cylindrical beams at the n selected configurations. Our method will assign a dose value wi >_ 0 to every Ci. wi is a factor specifying activation duration at the particular (static) beam configuration. The values W l , . . . , wn determine a distribution D, defined as follows. If p is a point and C i l , . . . , Cik are the cylinders containing p, then D(p) - Wil + " " + wik. (This definition provides a coarse model for photon beam characteristics. Refinements described in [12] allow for a more accurate representation of these characteristics.) We consider two disjoint regions T (for tumor) and H (for healthy tissue), with the following constraints: the dose delivered at each point in T must be larger than some value a, while the dose at each point of H must be below/3 (/3 < a). The n cylinders, T, and H define an arrangement of cells in space. Each cell is defined as a maximal connected set not containing any piece of the boundaries of regions T and H or the cylinders. For each cell we compute a label. A cell a in cylinders C i l , . . . , Cik has label 1 - { i l , . . . , ik}. The calculation of W l , . . . , w~ reduces to finding a point in the intersection of two ndimensional polyhedral sets: If ~ is in T and labeled by { i l , . . . , i k } then g determines the inequality: OL ~ Wil + ' ' " + Wik.

A r162 ,~' l~br

by { j ~ , . . . , j~,} il~ H gives-

/3 E wj~ + . . . + wjk,. The inequalities for all cells in T determine a convex polyhedral set P~. Similarly, the inequalities derived from the cells in H determine a polyhedron PZ. If P~ and PZ intersect, any point (Wl,..., w~) in the intersection gives a dose distribution that satisfies the given constraints. Otherwise, the problem admits no solution. More generally, we can specify several healthy critical or non-critical regions H 1 , . . . , Hq marked by distinct maximal doses/31,...,/3q. We then obtain polyhedra P~, PZ,,..., PZ~. Any point (Wl,..., w~) in the intersection of these polyhedra determines a dose distribution that satisfies the input constraints. The intersection of the polyhedral sets P~, P91,.-., PZq is another n-dimensional convex polyhedral set. Extreme points of this set can be computed, allowing us to deal directly with optmization criteria (e.g., in addition to satisfying the input constraints defined by c~,/31,...,/3q, minimize the dose delivered to some region Hi). We consider two inequalities deriving from two cells ~ and ~' in T. Let L - { i l , . . . , ik} and L ' - { j l , . . . ,jk,} be the labels of ~ and ~'. If L C L', then the inequality given by

728

Figure 5. Regions T, H, minimal and maximal cell nmi,~ resp.

I~ma x .

n implies the inequality given by n I, since all wi are positive or null. A T-cell is called minimal, if its label is not a superset of any other T-label (Figure 5). Similarly, a cell in Hi is called maximal, if its label is not a subset of any other label in Hi. Thus, in the above polyhedral intersection test, we need only to consider minimal T-cells and maximal H-cells. 6. K I N E M A T I C S

AND

COLLISION AVIODANCE

The beam configurations selected by the planning system must be organized into a sequence defining a collision-free path of the linear accelerator. For safety reasons, we currently do not use optimization methods for computing this sequence. Instead, we use a hard-coded sequence of beam configurations, called nodes. A fixed motion connecting the nodes provides a path template. An essential feature of the beam selection method is the following: beam configurations can be selected in such a way that the deviation between the hard-coded path template and the treatment beams calculated for an individual patient is small and upper-bounded. For all node configurations the beam axis crosses the work space origin. A node is thus simply given by a tool center point placement. We use tool center points on a sphere of appropriate radius centered at the origin. For each node we compute a local joint space. Thus, angle limits within which the arm and the accelerator can be moved without colliding or getting into the line of sight of a camera are associated with each node. To compute treatment beams given the particular tumor shape and location as described in section 3, we connect isocenter points to the tool center points of nodes. Thus each treatment beam is given by an isocenter point and the tool center point for a node. Given reasonable bounds for the tumor size we can assume that isocenter points are close to the workspace origin. Since the source-target distance for the beam is large, we can ensure that the angular deviation between a hard-coded node and associated treatment beam configurations remains below a fixed threshold, and within the local joint space of the node (Figure 6). The placement of a cylinder beam in space is kinematically redundant. The beam

729

Figure 6. Origin O, target region T, tool-center points

Vl,...

, Vr~

for hard-coded nodes

can be translated along and rotated about its axis without changing spatial placement. Thus a cylinder has 4 degrees-of-freedom. The range of beam directions should be as large as possible to allow for homogeneous distributions. A simulating program is used to determine tool center points for nodes covering a large portion of a half-sphere (Figure 7). The beam widens with source-target distance. By placing tool center points for all nodes and treatment points on a sphere we can ensure that all beams have nearly constant and equal radius in tissue. Effector orientation is represented by 3 Euler angles such that the last of these angles gives the rotation about the beam axis. For each node an appropriate value for this angle is computed in such a way that arm posture changes along the path can be avoided. For verification the path in Figure 7-b was executed and a photographic film in a water phantom (plexiglass reconstruction of a skull, placed at the end of the patient couch in Figure 1) was exposed to the generated radiation. Figure 8 shows the photographic film. During execution of the path in Figure 7-b an equal dose value was delivered at each node, with the beam axis aimed at the work space origin. The restrictions in the workspace result in a penumbra along the horizontal direction in Figure 8-c and -d. A total of 6000 cGy was delivered at 300 nodes, resulting in a total execution time of 1 hour. 7. CALIBRATION

The positioning accuracy of the mechanical gantry is much lower than the repeating accuracy. The radiation source must remain at sufficient distance from the tissue to be irradiated. Small inaccuracies in positioning the radiation source are thus amplified and may cause inaccuracies in target localization. Thus the hard-coded motion path was calibrated in the following way. An optical crystal was placed at the work space origin. A laser beam pointing along the beam central axis was placed in the radiation source. The crystal is activated when touched by the laser beam. Activation intensity can be read

730

Figure 7. Path template connecting node configurations

Figure 8. Photographic film phantom for spherical distribution (axial cross-sections)

from the crystal, thus giving a measurement for placement accuracy. The calibration process considers each node separately. The end effector is placed at grid points in a small local joint space surrounding the node. At each grid point the laser is activated and a measurement is taken from the crystal. For the given node we can thus find joint angle offsets placing the beam in such a way that the beam axis crosses the origin. The angular deviation between treatment points and hard-coded nodes is small, and we use the offsets thus computed as offsets for the treatment beams as well. A bound of 0.3 mm for inaccuracies was determined experimentally for isocenter points close to the origin. 8. P L A N N I N G

SYSTEM

A treatment planning system with 3D graphical user interface based on the above beam selection and weighting methods is incorporated into the new system. It is designed to enable the surgeon to make use of the system flexibility. The tumor and critical tissues are described as polyhedra. The regions of interest are delineated by polygons in the MR images showing axial, sagital, or coronal cross-

731

Figure 9. Sample case (Stanford Medical Center). (a) Axial CT with tumor delineated in black (b) 3D representation of tumor delineation (c) Treatment beams (d) Isodose surface overlaid on tumor reconstruction

sections of the brain. These polygons are thickened between two cross-sections, and a 3D reconstruction of the tumor shape is computed. The radius r of the (cylindrical) beam and the number (n) of beam configurations are given as input. The beam selection method in section 3 in combination with the above collision avoidance method is invoked to compute a set of beam directions corresponding to the input tumor shape and location. To compute beam weights, our implementation uses a modified simplex algorithm to find a point within the intersection region of k n - d i m e n s i o n a l polyhedra. A test series for the new system was carried out to evaluate the possibilities of generating prespecified dose distributions. In the test series, cases treated during a comparison period with an earlier system for frame-based stereotactic radiosurgery [7] were considered. In each case data for the actual treatment performed were retrieved. Plans for the same cases were then computed with our new planning system. The comparison is based on photographic film phantom studies in water, dose-volume histograms, as well as 2D and 3D dose visualization paradigms. The results of the test series and variants of the above methods are discussed in [12]. An example is shown in Figures 9 and 10. Figure 9-a shows an axial cross-section of the brain for a 69 year old patient presenting a large non-spherical tumor of the falx cerebri, delineated in the cross-section. Figure 9-b shows the 3D reconstruction of the tumor outline. A regular grid of 600 points was expanded on a surface reconstruction,

732

Figure 10. Film phantom for sample case in Fig. 9, showing the shape of the region receivinghigh dose. Film slices in the phantom are arranged in the same spatial structure as the tomographic images.

giving isocenter points. Figure 9-c shows the treatment beams used. Figure 9-d overlays the input tumor shape with the generated 50% isodose surface to illustrate the matching between input and output shapes. For verification, the motion was executed exposing a photographic film phantom (Figure 10). The phantom shows a close matching between the input tumor shape and the actual shape of the region receiving high radiation dose.

9. C O N C L U S I O N A prototype of the new system has been installed at Stanford Medical Center and the use of the system for clinical investigation has been approved. Systems will soon be installed at several major medical research institutions. The methods described divide treatment planning into two steps: (1) Selection of beam configurations (2) Computation of dose weights for these configurations. For the latter step, we proposed a general theoretical approach. The evaluation of the planning methods gives very encouraging results. Improved methods could include other collimator geometries and corresponding generalizations of the planning paradigms. The planning method relies on weighted arrangements in 3D space. This is not limited to cylinders, and seems appropriate for treatment planning in conventional radiation therapy as well. It is likely that the new systems will allow for radiosurgical treatment of tumors outside the brain, including tumors for which no treatment is currently available. In particular, extensions are desirable for very radiation-resistant tumors and cases in which conventional radiation therapy would cause too much damage to healthy tissue and is not applicable.

733 REFERENCES

1. Barth, N. H. An Inverse Problem in Radiation Therapy. Intern. d. Radiation Oncology Biol. Phys., 18:425-431, 1990. 2. Betty, O. O., Munari, C., and Rosler, R. Stereotactic Radiosurgery with the Linear Accelerator: Treatment of Arteriovenous Malformations. Neurosurgery, 24(3):311321, 1989. 3. Brahme, A. Optimization of stationary and moving beam radiation therapy techniques. Radiother. Oncol., 12:127-140, 1988. 4. Cormack, A. A problem in rotation therapy with x-rays. Int. J. Radiat. Oncol. Biol. Phys., 13:623-630, 1987. 5. Halperin, D. and Yap, C.K. Combinatorial Complexity of Translating a Box in Polyhedral 3-Space 9th Annual A CM Syrup. on Computational Geometry, 29-37, 1993. 6. Larsson, B. et al. The High Energy Proton Beam as a Neurosurgical Tool. Nature, 182:1222-1223, 1958. 7. Lutz, W., Winston, K. R., and Maleki, N. A System for stereotactic radiosurgery with a linear accelerator. Int. d. Radiation Oncology Biol. Phys., 14:373-381, 1988. 8. Murtagh, B. A., Saunders, M. A., Minos 5.4 user's guide. Technical Report SOL 83-20R, Dept. of Operations Research, Stanford University, 1983. 9. Podgorsak, E. B., et al. Dynamic Stereotactic Radiosurgery Intern. d. Radiation Oncology Biol. Phys., 14:115-126, 1988. 10. Rice, R. K., et al. Measurements of Dose Distributions in Small Beams of 6 MeV X-Rays. Phys. Med. Biol., 32:1087-1099, 1987. 11. Schweikard, A., Adler, J. R., and Latombe, J. C., Motion Planning in Stereotaxic Radiosurgery. Proc. IEEE Int. Conf. Robotics and Automation, Atlanta, GA, May 1993, 1909-1916. (Extended version to appear in IEEE Tr. Robotics and Automation.) 12. Schweikard, A., Tombropoulos, R. Z., Kavraki, L., Adler, J.R., and Latombe, J.C., Treatment Planning for a Radiosurgical System with General Kinematics. Proc. IEEE Int. Conf. Robotics and Automation, San Diego, CA, May 1994, 1720-1727. 13. Troccaz, J. et al. Conformal External Radiotherapy of Prostatic Carcinoma: Requirements and Preliminary Results, Rep. No. 912I, TIMC-IMAG, Facult6 de M6decine, Grenoble, 1993. 14. Turk, G. Re-Tiling Polygonal Surfaces. Computer Graphics, 26(3):55-64, 1992.

This Page Intentionally Left Blank