Human-Friendly Robotics 2021: HFR: 14th International Workshop on Human-Friendly Robotics (Springer Proceedings in Advanced Robotics, 23) 3030963586, 9783030963583

This book is a collection of research results in a wide range of topics related to human–robot interaction, both physica


English Pages 150 [148] Year 2022


Table of contents :
Series Editor Foreword
Organization
Preface
Contents
Combining Hybrid Genetic Algorithms and Feedforward Neural Networks for Pallet Loading in Real-World Applications
1 Introduction
2 Hybrid Genetic Algorithm
2.1 Chromosome Structure
2.2 Evaluation Process
3 Feedforward Neural Network
4 A New Two-Stage Algorithm
4.1 First Stage
4.2 Second Stage
5 Results
5.1 HGA Parameters
5.2 FNN Parameters
5.3 Input Instances and Graphical Results
6 Conclusions
References
Complete and Consistent Payload Identification During Human-Robot Collaboration: A Safety-Oriented Procedure
1 Introduction
2 Motion Planning with Dynamic Collision Avoidance
3 Physically Consistent Parameter Identification
4 Simulations
4.1 Software Setup
4.2 Results
5 Conclusions
References
Deep Learning and OcTree-GPU-Based ICP for Efficient 6D Model Registration of Large Objects
1 Introduction
2 Related Works
3 Initial Pose Estimation
4 Point Cloud Registration
5 Results
6 Conclusions
References
Design of a Collaborative Modular End Effector Considering Human Values and Safety Requirements for Industrial Use Cases
1 Introduction and Related Works
1.1 Human–Robot Collaboration in the Industry
1.2 Safety with Collaborative Robots
1.3 End-Effectors and Collaborative Robots
2 Materials and Methods
2.1 Design Methodology
2.2 Value Sensitive Design Evaluation
2.3 COVR Safety Assessment
3 Results
3.1 Value Sensitive Evaluation and Final Design
3.2 Field Deployment
3.3 Safety Assessment
3.4 Picking Performances
4 Discussion
5 Conclusion
References
Detecting Emotions During Cognitive Stimulation Training with the Pepper Robot
1 Introduction
2 Background and Motivations
2.1 Social Assistive Robots and Cognitive Stimulation Therapy
2.2 Emotion Recognition from Facial Expressions in Elderly People
3 An Overview of the Study
3.1 The Tasks
3.2 Measurements
3.3 Automatic Analysis of Emotions
4 Results
5 Conclusions and Future Work
References
Mapping Finger Motions on Anthropomorphic Robotic Hands: Two Realizations of a Hybrid Joint-Cartesian Approach Based on Spatial In-Hand Information
1 Introduction
2 Distance-Based Hybrid Mapping
2.1 Algorithm and Switching Condition
2.2 Transition and Online Algorithm
2.3 Simulation and Results
3 Workspace-Based Hybrid Mapping
3.1 Joint-Cartesian Mapping Transition
3.2 Cartesian Mapping
3.3 Experimental Evaluation
4 Conclusions
References
Robotized Laundry Manipulation With Appliance User Interface Interpretation
1 Introduction
2 Grasping Pose Detection
2.1 Wrinkle Detection
2.2 Extension to Washing Machines
2.3 Blob Detection
3 User Interface Interpretation
3.1 Point of Interest Coordinate Estimation
3.2 Display Recognition
4 Experimental Results
4.1 Laundry Grasping Task
4.2 Graspability Tests
4.3 Appliance User interface Interpretation
5 Conclusions and Future Work
References
Specification and Control of Human-Robot Handovers Using Constraint-Based Programming
1 Introduction
2 Related Work
2.1 Insights from Human Handover Studies
2.2 Motion Planning and Control of Human-Robot Handovers
3 Constraint-Based Programming of Handovers
3.1 Principle of Constraint-Based Programming in eTaSL
3.2 Reaching Phase
3.3 Object Passing Phase
3.4 Retraction Phase
4 Experiments
4.1 Experimental Setup and Parameter Choices
4.2 Results
5 Conclusions
References
Speed Maps: An Application to Guide Robots in Human Environments
1 Introduction
2 Related Works
2.1 Mapping and Localization
2.2 Person Detection and Tracking
2.3 Navigation in Human Environments
3 Elements for Navigation
3.1 State Machine
3.2 Maps
3.3 User Interface
3.4 Global Localization
3.5 Goal Location Determination
3.6 Speed Maps
4 Implementation
4.1 Person Detection and Tracking
4.2 Person Following
4.3 Person Guidance
5 Demonstration
5.1 Scenario: Point-to-Point Navigation
5.2 Scenario: Guiding a Person
6 Conclusion
References

Springer Proceedings in Advanced Robotics 23 Series Editors: Bruno Siciliano · Oussama Khatib

Gianluca Palli Claudio Melchiorri Roberto Meattini   Editors

Human-Friendly Robotics 2021 HFR: 14th International Workshop on Human-Friendly Robotics


Series Editors Bruno Siciliano Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione Università degli Studi di Napoli Federico II Napoli, Napoli, Italy

Oussama Khatib Robotics Laboratory Department of Computer Science Stanford University Stanford, CA, USA

Advisory Editors Gianluca Antonelli, Department of Electrical and Information Engineering, University of Cassino and Southern Lazio, Cassino, Italy Dieter Fox, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA Kensuke Harada, Engineering Science, Osaka University Engineering Science, Toyonaka, Japan M. Ani Hsieh, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA, USA Torsten Kröger, Karlsruhe Institute of Technology, Karlsruhe, Germany Dana Kulic, University of Waterloo, Waterloo, ON, Canada Jaeheung Park, Department of Transdisciplinary Studies, Seoul National University, Suwon, Korea (Republic of)

The Springer Proceedings in Advanced Robotics (SPAR) publishes new developments and advances in the fields of robotics research, rapidly and informally but with a high quality. The intent is to cover all the technical contents, applications, and multidisciplinary aspects of robotics, embedded in the fields of Mechanical Engineering, Computer Science, Electrical Engineering, Mechatronics, Control, and Life Sciences, as well as the methodologies behind them. The publications within the “Springer Proceedings in Advanced Robotics” are primarily proceedings and post-proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. Also considered for publication are edited monographs, contributed volumes and lecture notes of exceptionally high quality and interest. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by SCOPUS, SCIMAGO, WTI Frankfurt eG, zbMATH. All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/15556


Editors Gianluca Palli University of Bologna Bologna, Italy

Claudio Melchiorri University of Bologna Bologna, Italy

Roberto Meattini University of Bologna Bologna, Italy

ISSN 2511-1256 ISSN 2511-1264 (electronic) Springer Proceedings in Advanced Robotics ISBN 978-3-030-96358-3 ISBN 978-3-030-96359-0 (eBook) https://doi.org/10.1007/978-3-030-96359-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Series Editor Foreword

At the dawn of the century’s third decade, robotics is reaching an elevated level of maturity and continues to benefit from the advances and innovations in its enabling technologies. These all are contributing to an unprecedented effort to bringing robots to human environment in hospitals and homes, factories and schools; in the field for robots fighting fires, making goods and products, picking fruits and watering the farmland, saving time and lives. Robots today hold the promise for making a considerable impact in a wide range of real-world applications from industrial manufacturing to healthcare, transportation, and exploration of the deep space and sea. Tomorrow, robots will become pervasive and touch upon many aspects of modern life. The Springer Tracts in Advanced Robotics (STAR) was launched in 2002 with the goal of bringing to the research community the latest advances in the robotics field based on their significance and quality. During the latest fifteen years, the STAR series has featured publication of both monographs and edited collections. Among the latter, the proceedings of thematic symposia devoted to excellence in robotics research, such as ISRR, ISER, FSR, and WAFR, have been regularly included in STAR. The expansion of our field as well as the emergence of new research areas has motivated us to enlarge the pool of proceedings in the STAR series in the past few years. This has ultimately led to launching a sister series in parallel to STAR. The Springer Proceedings in Advanced Robotics (SPAR) is dedicated to the timely dissemination of the latest research results presented in selected symposia and workshops. This volume of the SPAR series brings a selection of the papers presented at the fourteenth edition of the International Workshop on Human-Friendly Robotics (HFR), which took place in Bologna, Italy from October 28 to 29, 2021. 
The volume edited by Gianluca Palli and Roberto Meattini is a collection of 9 contributions on human-robot coexistence including theories, methodologies, technologies, and experimental studies.


From its classical program with presentations by young scholars, the fourteenth edition of HFR culminates with this valuable reference on the current developments and new directions of human-friendly robotics—a genuine tribute to its contributors and organizers! Naples, Italy Stanford, USA December 2021

Bruno Siciliano Oussama Khatib SPAR Editors

Organization

The 14th International Workshop on Human-Friendly Robotics (HFR 2021) was organized by the University of Bologna, Italy.

Executive Committee
Conference Chair: Prof. Gianluca Palli, University of Bologna
Conference Co-Chair: Dr. Roberto Meattini, University of Bologna
Program and Publicity Chair: Dr. Davide Chiaravalli, University of Bologna
Program and Publicity Chair: Dr. Riccardo Zanella, University of Bologna
Local Arrangement Chair: Alessio Caporali, University of Bologna
Local Arrangement Chair: Wendwosen Bellete Bedada, University of Bologna
Local Arrangement Chair: Kevin Galassi, University of Bologna

Program Committee
Carmine Recchiuto, University of Genoa
Wilm Decre, KU Leuven
Davide Chiaravalli, University of Bologna
Mario Selvaggio, University of Naples Federico II
Raffaella Carloni, University of Groningen
Chenguang Yang, UWE Bristol
Matko Orsag, University of Zagreb
James Young, University of Manitoba
Alessio Caporali, University of Bologna
Matteo Saveriano, University of Innsbruck
Pauline Maurice, LORIA-CNRS
Roberto Meattini, University of Bologna
Salvatore Pirozzi, University of Campania
Kevin Galassi, University of Bologna
Thomas Eiband, DLR
Marcello Bonfè, University of Ferrara
Alessandro Rizzo, Politecnico di Torino
Lucia Migliorelli, Università Politecnica delle Marche
Riccardo Zanella, University of Bologna
Akansel Cosgun, Monash University
Bare Luka Zagar, Technical University of Munich
Andrea Maria Zanchettin, Politecnico di Milano
Gianni Borghesan, KU Leuven
Fabio Ruggiero, University of Naples Federico II
Giorgio Grioli, Italian Institute of Technology
Martina Lapresa, Campus Bio-Medico University of Rome
Wendwosen Bellete Bedada, University of Bologna
Valeria Villani, University of Modena and Reggio Emilia
Daniel Tozadore, University of Sao Paulo
Padmaja Kulkarni, TU Delft
Laura Corti, Campus Bio-Medico University of Rome
Berardina Nadja De Carolis, University of Bari

Sponsoring Company Franka Emika GmbH, Munich, Germany


Preface

The technological shift from classical industrial robots, which are safely kept away from humans in cages, to robots that are used in close collaboration with humans, is facing major challenges that need to be overcome. In this direction, the growing need to automate daily tasks, combined with new robot technologies, is driving the development of human-friendly robots, i.e., safe and dependable machines, operating in close vicinity to humans or directly interacting with them in a wide range of domains. This book describes the newest achievements in a wide range of topics related to human-robot interaction, both physical and cognitive, including theories, methodologies, technologies, and empirical and experimental studies.

The International Workshop on Human Friendly Robotics (HFR) is an annual meeting organized by young researchers, and is dedicated to research problems related to human–robot coexistence, like robot interaction control, robot learning, and human–robot co-working. Previous venues were Bologna (Italy), Innsbruck (Austria), Reggio Emilia (Italy), Shenzhen (China), Naples (Italy), Genova (Italy), Pisa (Italy), Tübingen (Germany), Twente (the Netherlands), Brussels (Belgium), Rome (Italy), Pontedera (Italy), Munich (Germany), Genova (Italy).

The 14th International Workshop on Human Friendly Robotics (HFR 2021) was held in Bologna, Italy, from October 28 to 29, 2021, and organized by the University of Bologna. The workshop was chaired by Prof. Gianluca Palli and co-chaired by Dr. Roberto Meattini. The meeting consisted of five keynote talks, a forum with open discussion, and seventeen contributed presentations in a single track. This is the fourth edition of the workshop with associated proceedings collected in a book of the SPAR series dedicated to HFR. The papers contained in the book have been selected on the basis of a peer-review process and describe the most original achievements in the field of human-robot interaction coming from the work and ideas of young researchers.

Bologna, Italy
November 2021

Prof. Gianluca Palli Dr. Roberto Meattini General Chairs HFR 2021

Contents

Combining Hybrid Genetic Algorithms and Feedforward Neural Networks for Pallet Loading in Real-World Applications
Gabriele Ancora, Gianluca Palli, and Claudio Melchiorri

Complete and Consistent Payload Identification During Human-Robot Collaboration: A Safety-Oriented Procedure
Saverio Farsoni and Marcello Bonfè

Deep Learning and OcTree-GPU-Based ICP for Efficient 6D Model Registration of Large Objects
Ismayil Ahmadli, Wendwosen B. Bedada, and Gianluca Palli

Design of a Collaborative Modular End Effector Considering Human Values and Safety Requirements for Industrial Use Cases
Matteo Pantano, Adrian Blumberg, Daniel Regulin, Tobias Hauser, José Saenz, and Dongheui Lee

Detecting Emotions During Cognitive Stimulation Training with the Pepper Robot
Giovanna Castellano, Berardina De Carolis, Nicola Macchiarulo, and Olimpia Pino

Mapping Finger Motions on Anthropomorphic Robotic Hands: Two Realizations of a Hybrid Joint-Cartesian Approach Based on Spatial In-Hand Information
Roberto Meattini, Davide Chiaravalli, Gianluca Palli, and Claudio Melchiorri

Robotized Laundry Manipulation With Appliance User Interface Interpretation
Wendwosen B. Bedada, Ismayil Ahmadli, and Gianluca Palli

Specification and Control of Human-Robot Handovers Using Constraint-Based Programming
Maxim Vochten, Lander Vanroye, Jeroen Lambeau, Ken Meylemans, Wilm Decré, and Joris De Schutter

Speed Maps: An Application to Guide Robots in Human Environments
Akansel Cosgun

Combining Hybrid Genetic Algorithms and Feedforward Neural Networks for Pallet Loading in Real-World Applications

Gabriele Ancora, Gianluca Palli, and Claudio Melchiorri

Abstract The “Distributor’s Pallet Packing Problem” in a real industrial scenario is addressed in this paper. The main goal is to develop a two-stage algorithm capable of providing the spatial coordinates of the vertices of the placed boxes as well as the optimal input sequence of boxes, while satisfying geometric, stability and fragility constraints and keeping the computational time low. Due to the NP-hard complexity of the problem, a hybrid genetic algorithm coupled with a feedforward neural network is used. In the first stage, the hybrid genetic algorithm is run several times on each order within a large set of packing instances, using a different fitness weight vector at each iteration and storing the best chromosomes to form a rich solution set. Once this set is generated, the best solution is chosen for each order by optimizing a new global weighted function. The global optimal weight vector is tuned by hand, relying on a graphical user interface that shows, in real time, the best solution as a function of the global weights. The dataset is then created, keeping track of both the local and the global weight vectors related to the optimal solution, and is used to train, validate and test the neural network. In the second stage, the trained neural network provides the optimal pair of fitness weight vectors, so that the hybrid genetic algorithm is run only once and the optimal solution in the set is selected directly. The proposed algorithm has been tested and validated on several packing instances provided by an industrial company.

Keywords Packing · Genetic algorithms · Metaheuristics · Machine learning · Neural networks

G. Ancora (corresponding author) · G. Palli · C. Melchiorri
University of Bologna, Bologna, BO 40136, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_1

1 Introduction

This paper addresses the real-world problem of packing and optimally arranging on a pallet a sequence of boxes of heterogeneous size, shape and weight, subject to constraints on geometry, stability and fragility. Using the typology defined in [1], the considered scenario is a 3/V/O/M cutting and packing problem, a class well treated in the literature using different mathematical approaches. These approaches fall into two main clusters: exact and heuristic/metaheuristic. Due to the NP-hardness of the problem [2], exact optimization techniques [3, 4] cannot be used in real-world applications, because these generally involve a large number of boxes and, consequently, a high computational burden. Heuristic techniques [5] used outside a metaheuristic framework are not suitable either, because they are often too greedy and usually get trapped in a local optimum very far from the global one. For these reasons, in [6] we dealt with this problem using a metaheuristic framework, which can try to escape from local minima and explore the solution space more thoroughly while reducing computational time. In particular, a Hybrid Genetic Algorithm (HGA) was proposed, using a weighted-sum mono-objective fitness function. In the case study analyzed, the proposed HGA proved to be very efficient and flexible for all input instances. However, it can be improved further: the weight vector of the fitness function was tuned by trial and error, without guaranteeing its optimality, and it was not a function of the input instances but was constant for each order. The aim of this work is to improve the aforementioned algorithm by learning the optimal weight vectors related to each weighted-sum mono-objective fitness function encountered during the optimization process, using a feedforward neural network and relying on a new algorithm structure.
This paper is organized as follows: in Sect. 2, a revisitation and a synthesis of the main arguments addressed in [6] is presented. In Sect. 3, a brief summary of feedforward neural networks is shown, discussing how they can be adapted to our problem. In Sect. 4, the complete structure of the new algorithm is presented. Finally, in Sect. 5, simulation results are presented and discussed, while in Sect. 6 some final considerations and plans for future activities are described.

2 Hybrid Genetic Algorithm

For clarity, this section reviews and summarizes the main arguments addressed in [6].


2.1 Chromosome Structure

Let $B = \{b_1, b_2, \ldots, b_s\}$ be an ordered set containing all possible box types, where each element is characterized by a height, width and depth value. Let $U = \{u_1, u_2, \ldots, u_m\}$ be a high-cardinality set of orders. Considering an order $u_i$ and defining $u_c^i$ as the number of boxes of type $b_c$ present in the order, it is possible to write $u_i = [u_1^i, u_2^i, \ldots, u_s^i]$. Moreover, let $n_i = \sum_{c=1}^{s} u_c^i$ be the total number of boxes in the order. Given an order $u_i$, an HGA chromosome $c_d^i \in \{c_1^i, \ldots, c_l^i\}$ represents an input box sequence and is composed of $n_i$ genes $x_{d,1}^i, \ldots, x_{d,n_i}^i$, where $x_{d,k}^i \in \{1, \ldots, n_i\}$ denotes the k-th box placed. To obtain a wider solution range, the initial population is composed of chromosomes sorted in decreasing-volume, increasing-volume and random order.
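The chromosome encoding and the seeding of the initial population can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0-based box indices (the text uses 1-based genes) and the choice of one decreasing-volume chromosome, one increasing-volume chromosome and random permutations for the rest are assumptions made for illustration.

```python
import random


def expand_order(order_counts):
    """Turn an order vector [u_1, ..., u_s] into one box-type index per box."""
    return [c for c, count in enumerate(order_counts) for _ in range(count)]


def initial_population(order_counts, box_dims, size, seed=0):
    """Build `size` chromosomes, each a permutation of the box indices 0..n-1.

    box_dims[c] is the (height, width, depth) of box type c (hypothetical layout).
    """
    rng = random.Random(seed)
    types = expand_order(order_counts)
    n = len(types)

    def volume(k):
        h, w, d = box_dims[types[k]]
        return h * w * d

    # Two deterministic seeds: decreasing and increasing volume.
    population = [
        sorted(range(n), key=volume, reverse=True),
        sorted(range(n), key=volume),
    ]
    # Remaining chromosomes are random permutations (assumed proportion).
    while len(population) < size:
        chromosome = list(range(n))
        rng.shuffle(chromosome)
        population.append(chromosome)
    return population[:size]
```

For an order with two boxes of the first type and one each of the other two, `initial_population([2, 1, 1], [(1, 1, 1), (2, 2, 2), (1, 2, 3)], size=5)` returns five permutations of the four box indices, the first two being the volume-sorted sequences.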

2.2 Evaluation Process

The chromosome evaluation process is performed by solving a set of constrained minimization problems, one for each box of the input sequence, with respect to both anchor points and box orientations. The anchor points are generated according to the heuristic procedure described below. Regarding orientations, only rotations about the main orthogonal axes of the box frame are allowed. Moreover, for some types of boxes, certain orientations may make the box fragile.

2.2.1 Heuristic Procedure

The initial set of candidate anchor points is a parameter of the algorithm. Given a box to be placed, a feasibility check is required for each pair (anchor point, orientation). It consists in satisfying a set of geometric, fragility and stability constraints, fully reported in our previous paper [6]. Among the feasible pairs only, the minimizer of a suitable fitness function is chosen. Once the box is placed, its vertices are added to the set of candidate anchor points for the next box of the sequence.

2.2.2 Fitness Function

The aforementioned fitness function can be written as:

$$f_k = w_1 f_{1,k} + w_2 f_{2,k} + w_3 f_{3,k} + w_4 f_{4,k} \tag{1}$$

where $w_h$ are normalized positive weights ($h \in \{1, \ldots, 4\}$) and $f_{h,k}$ are normalized fitness terms. In more detail, once the k-th box is placed:

– $f_{1,k}$ represents the maximum height of the boxes;
– $f_{2,k}$ denotes the height of the center of mass of the boxes;
– $f_{3,k}$ indicates the Euclidean norm between the chosen anchor point and the center of the pallet;
– $f_{4,k}$ represents the number of fragile boxes on the pallet.

In order to evaluate the whole chromosome, a fitness function $f^{i,d}$ is needed; it is obtained once all boxes are placed:

$$f^{i,d} = f_{n_i} \tag{2}$$
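As a small illustration of how the weighted sum in (1) drives the choice of a placement, the following Python sketch scores already-feasible candidates and keeps the minimizer. The function names and the candidate representation (a label plus its four normalized terms) are hypothetical; computing the terms themselves and the feasibility check are outside the scope of the sketch.

```python
def local_fitness(weights, terms):
    """Weighted sum of the four normalized fitness terms, as in Eq. (1)."""
    if len(weights) != 4 or len(terms) != 4:
        raise ValueError("expected four weights and four fitness terms")
    return sum(w * f for w, f in zip(weights, terms))


def choose_placement(weights, feasible_candidates):
    """Return the feasible candidate with the lowest local fitness.

    Each candidate is a (label, terms) pair, where `terms` holds the values
    (f1, f2, f3, f4) obtained if the box were placed there (assumed layout).
    """
    return min(feasible_candidates, key=lambda c: local_fitness(weights, c[1]))
```

With weights `[0.4, 0.3, 0.2, 0.1]`, a candidate with terms `(0.5, 0.4, 0.2, 0.0)` scores 0.36 and is preferred over one with terms `(0.3, 0.3, 0.9, 0.0)`, which scores 0.39.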

3 Feedforward Neural Network

An artificial neural network (ANN) contains a number of artificial neurons, and uses them to identify and store information [7]. It consists of input and output layers, as well as (in most cases) one or more hidden layers consisting of units that transform the input into something that the output layer can use. In Fig. 1, the complete structure of an artificial neuron is shown. Except for the neurons of the input layer, each neuron uses a nonlinear activation function. The most used nonlinear activation functions are the Rectified Linear Unit (ReLU), Sigmoid and Tanh activation functions [8]. To address our problem, a particular class of ANN, the feedforward neural network (FNN), is used. It is a fully connected network in which information moves only in the forward direction (no loopback). An example of an FNN with three hidden layers is shown in Fig. 2. For the training process, a supervised learning technique is utilized [9, 10]. The dataset creation is discussed in Sect. 4.1.1.

Fig. 1 Structure of an artificial neuron


Fig. 2 FNN with three hidden layers
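The forward pass of a fully connected feedforward network can be sketched in a few lines of plain Python. This is a generic illustration, not the network used in the paper: the layer sizes, the ReLU hidden activation and the linear output are assumptions, and training is not shown.

```python
def relu(z):
    """Rectified Linear Unit activation."""
    return z if z > 0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(inputs, layers):
    """Propagate `inputs` through `layers`, a list of (weights, biases, activation)."""
    activations = list(inputs)
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations
```

In the setting of this paper, the input would be the order vector and the output the nine predicted weights (four local, five global); the toy network in the test below is much smaller.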

4 A New Two-Stage Algorithm

The two-stage algorithm proposed in this work couples the previous HGA with an FNN. The latter is exploited to learn, as a function of the input instances, the optimal weight vector related to the local HGA fitness function and the optimal weight vector related to a new global weighted-sum objective function. The aim of the first stage is to create a rich dataset in order to train, validate and test the neural network. The HGA is run several times on each order within a large set of packing instances, using a different fitness weight vector at each iteration, with the aim of generating a rich set of solutions. Once the set is created, the best solution is chosen for each order by optimizing a new global weighted-sum objective function. Relying on a graphical user interface, it is possible to choose the optimal weight vector related to the global objective function, showing in real time how the optimal solution depends on the weight vector. Moreover, it is possible to keep track of the weight vector associated with the local HGA fitness that generated the chosen optimal solution.

4.1 First Stage

4.1.1 Dataset Creation

Let $W = \{w_1, w_2, \ldots, w_p\}$ be a set of p local weight vectors related to the local weighted-sum fitness function optimized in the chromosome evaluation process of the HGA. In particular, the j-th local weight vector is $w_j = [w_{j,1}, w_{j,2}, w_{j,3}, w_{j,4}]$, where $j \in \{1, 2, \ldots, p\}$ and $w_{j,h} \in [0, 1]$, $\forall h \in \{1, 2, 3, 4\}$. These vectors are found by trial and error and cover a wide range of configurations during the palletization process. Unlike our previous algorithm, in which only one solution (the optimal one) was generated as output, our aim now is to generate a rich set of solutions for each order in $U$, running the HGA p times, varying $w_j$ and keeping the $r_i$ best solutions found during the HGA process at each iteration. Notice that, for each solution, we keep track of the local weight vector that generated it. Considering the order $u_i$, the HGA fitness function in (1) for the local weight vector $w_j$ can be rewritten as:

$$f_{j,k} = w_{j,1} f_{1,k} + w_{j,2} f_{2,k} + w_{j,3} f_{3,k} + w_{j,4} f_{4,k} \tag{3}$$

This extended version of (1) is optimized when the k-th box of a generic chromosome $c_d^i$ has to be placed ($k \in \{1, 2, \ldots, n_i\}$), fixing $w_j$ as local weight vector. In order to evaluate the whole chromosome, an extended version of (2) is needed; it is obtained once all boxes are placed:

$$f_j^{i,d} = f_{j,n_i} \tag{4}$$

Once all p simulations related to the i-th order are completed, a set $S_i = \{c_{best,1}^i, c_{best,2}^i, \ldots, c_{best,b}^i\}$ containing the best $b = p \cdot r_i$ solutions is created. The optimal solution will be the one that minimizes a new global weighted fitness, calculated for each chromosome $c_{best,z}^i$ belonging to $S_i$. It can be written as:

$$g^i = w_1^i g_1 + w_2^i g_2 + w_3^i g_3 + w_4^i g_4 + w_5^i g_5 \tag{5}$$

where $g_t$ are new normalized fitness functions and $w_t^i$ are normalized positive global weights ($t \in \{1, \ldots, 5\}$). In particular:

– $g_1$ represents the ratio between the total volume of the palletized boxes and the volume of the associated convex hull;
– $g_2$ represents the number of fragile boxes;
– $g_3$ represents the stackability index of the pallet. It is calculated as:

$$g_3 = \beta \cdot (1 - A_{stack}) \cdot D \tag{6}$$

  where:
  – $\beta$ is a boolean value. It is set to 1 if all boxes at maximum height are not fragile. Otherwise, it is 0. In the case of $\beta = 1$, the fragility constraint ensures that no fragile boxes will be damaged by the stacking process.
  – $A_{stack}$ indicates the normalized area of the convex hull $C_{stack}$ of the set $S_{stack}$, containing the vertices of the top face of each box at maximum height;
  – $D$ represents the normalized Euclidean distance between the geometric center of the pallet plane and the projection of the $C_{stack}$ centroid on it.
– $g_4$ and $g_5$ represent the global strapping indices along the width and length directions of the loaded pallet. Let x, y and z denote the three orthogonal axes related to the width, length and height directions of the loaded pallet. Furthermore, let $S_{proj}$ be a set containing the projections onto the xy-plane of the vertices of the placed boxes and $C_{xy} = (C_x, C_y)$ the centroid of its convex hull. The two indices are calculated, respectively, as:

$$g_4 = 1 - \min(s_{x,1}, s_{x,2}, \ldots, s_{x,n_i}), \qquad g_5 = 1 - \min(s_{y,1}, s_{y,2}, \ldots, s_{y,n_i}) \tag{7}$$

  where $s_{x,k}$ and $s_{y,k}$ represent the normalized strapping indices along the x-axis and y-axis of the k-th box of the sequence. Considering the projection $B_{xy,k} = (B_{x,k}, B_{y,k})$ onto the xy-plane of the centroid of the k-th box, the index $s_{x,k}$ ($s_{y,k}$) can be calculated as follows:
  – Case $B_{x,k} \neq C_x$ ($B_{y,k} \neq C_y$): some contact forces act along the x-axis (y-axis) during the strapping process. Let $F_x$ ($F_y$) be the box face parallel to the yz-plane (xz-plane) whose centroid is at minimum distance from $C_{xy}$. Let $A_x$ ($A_y$) and $A_{touch,x}$ ($A_{touch,y}$) be, respectively, the area of $F_x$ ($F_y$) and the area of the portion of $F_x$ ($F_y$) touching other boxes. It is possible to write:

$$s_{x,k} = \frac{A_{touch,x}}{A_x}, \qquad s_{y,k} = \frac{A_{touch,y}}{A_y}.$$

  – Case $B_{x,k} = C_x$ ($B_{y,k} = C_y$): no contact forces act along the x-axis (y-axis) during the strapping process. Then, $s_{x,k} = 1$ ($s_{y,k} = 1$).

Relying on a graphical interface that allows the user to modify the global weights and shows in real time the four best solutions in $S_i$, it is possible to create a target vector $y_i$ related to the input vector $u_i$, to be included in a dataset used for the training, validation and testing phases of the neural network. In particular, let $\bar{W}_g^i = [\bar{w}_1^i, \bar{w}_2^i, \bar{w}_3^i, \bar{w}_4^i, \bar{w}_5^i]$ be the selected global weight vector related to the global fitness $g^i$ and $\bar{W}_l^i = [\bar{w}_{j,1}^i, \bar{w}_{j,2}^i, \bar{w}_{j,3}^i, \bar{w}_{j,4}^i]$ the local weight vector that generated the chosen optimal solution. The target vector for the input vector $u_i$ will be $y_i = [\bar{W}_l^i, \bar{W}_g^i]$. Once all these tasks are completed for all orders in $U$, the entire dataset can be created: $D = [(u_1, y_1), (u_2, y_2), \ldots, (u_m, y_m)]$.
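The selection of the optimal solution and the construction of one dataset target can be sketched as follows. This is an illustrative Python outline under stated assumptions: the five global terms are taken as precomputed numbers, each solution is a hypothetical dictionary with keys `"g"` and `"local_weights"`, and the geometric computations (convex hulls, contact areas) are not shown.

```python
def stackability_index(beta, a_stack, distance):
    """Stackability index g3 = beta * (1 - A_stack) * D, as in Eq. (6)."""
    return beta * (1.0 - a_stack) * distance


def strapping_index(per_box_indices):
    """Global strapping index along one axis: 1 - min over boxes, as in Eq. (7)."""
    return 1.0 - min(per_box_indices)


def global_fitness(global_weights, g_terms):
    """Weighted sum of the five global terms, as in Eq. (5)."""
    return sum(w * g for w, g in zip(global_weights, g_terms))


def select_optimal(global_weights, solution_set):
    """Return the solution of the set that minimizes the global fitness."""
    return min(solution_set, key=lambda s: global_fitness(global_weights, s["g"]))


def target_vector(global_weights, solution_set):
    """Concatenate the winning local weights and the chosen global weights."""
    best = select_optimal(global_weights, solution_set)
    return list(best["local_weights"]) + list(global_weights)
```

The resulting nine-element list plays the role of the target associated with one order; pairing it with the order vector gives one entry of the dataset.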

4.1.2 Training Process

The dataset D is randomly shuffled and divided into three subsets, using the following percentages:
– Training set $D_{train}$: 60%;
– Validation set $D_{valid}$: 20%;
– Test set $D_{test}$: 20%.

For the training process, the backpropagation algorithm is adopted, using the mean squared error (MSE) as the loss function to be minimized. To avoid overfitting, the early stopping criterion is adopted [11]: the MSE on the validation set is calculated after each training epoch, and the training process is stopped when this error starts to increase. When the training process is completed, the generalization performance of the FNN is calculated on the test set. The results are reported in Sect. 5.
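The shuffling, the 60/20/20 split and the early stopping rule can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed random seed and the `patience` parameter are hypothetical, and the paper does not specify how many non-improving epochs are tolerated before stopping.

```python
import random


def split_dataset(samples, train=0.6, valid=0.2, seed=0):
    """Shuffle a copy of the samples and split it into train/valid/test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train * len(shuffled))
    n_valid = round(valid * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def early_stopping_epoch(valid_mse, patience=1):
    """Return the 1-based epoch with the lowest validation MSE observed
    before the error failed to improve for `patience` consecutive epochs.

    `patience` is an assumption of this sketch, not a value from the paper.
    """
    best_epoch, best_mse, waited = 0, float("inf"), 0
    for epoch, mse in enumerate(valid_mse, start=1):
        if mse < best_mse:
            best_epoch, best_mse, waited = epoch, mse, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical usage with 100 dummy samples and a made-up validation curve.
d_train, d_valid, d_test = split_dataset(range(100))
stop_at = early_stopping_epoch([0.50, 0.40, 0.30, 0.35, 0.20])
```

In the made-up curve the validation error rises at the fourth epoch, so the sketch keeps the model of the third epoch even though a later epoch would have been better; a larger `patience` trades training time for robustness against such noise.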

4.2 Second Stage

Let $U_{test} = \{u_1^{test}, u_2^{test}, \dots, u_m^{test}\}$ be the set containing all the packing instances associated with the test set $D_{test}$. Once the test phase is successfully completed, the trained FNN can determine the optimal local weight vector $\bar{W}_l^i$ and the optimal global weight vector $\bar{W}_g^i$ for each order $u_i^{test} \in U_{test}$. As a result, the HGA runs only once, using the predicted local weight vector $\bar{W}_l^i$; once the solution set is generated, the optimal solution is chosen using the predicted global weight vector $\bar{W}_g^i$.

5 Results

The above algorithm is coded in Python 3.7 and runs on a 3.6 GHz Intel Core i9 octa-core processor with 64 GB of RAM. The set U of real packing instances, as well as the box-type and pallet-type databases, is provided by an industrial company. To fully exploit the processor, U is split into eight subsets that are simulated in parallel, one per core.

5.1 HGA Parameters

The HGA parameters are set as follows:
– Number of chromosomes = 30;
– Number of generations = 8;
– Number of chromosomes randomly selected in the selection process: $n_{rs} = 3$;
– Crossover probability: $P_c(g) = [0.3, 0.28, 0.25, 0.23, 0.2, 0.18, 0.15, 0.1]$;
– Mutation probability: $P_m(g) = [0.7, 0.65, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]$;
– Local weight vectors: see Table 1.
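Since both probability vectors have eight entries, one per generation, they can be read as schedules indexed by the generation number. The sketch below shows that lookup together with a tournament-style selection over $n_{rs} = 3$ randomly drawn chromosomes. The tournament rule, the "higher fitness is better" convention and all function names are assumptions of this sketch; the source only states the value of $n_{rs}$.

```python
import random

# Schedules taken from the parameter list: one value per generation.
P_CROSSOVER = [0.3, 0.28, 0.25, 0.23, 0.2, 0.18, 0.15, 0.1]
P_MUTATION = [0.7, 0.65, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]


def operator_probabilities(generation):
    """Return (Pc, Pm) for a 0-based generation index."""
    return P_CROSSOVER[generation], P_MUTATION[generation]


def tournament_select(population, fitness, n_rs=3, rng=random):
    """Draw n_rs distinct chromosomes at random and keep the fittest one.

    Assumption: a tournament rule with higher fitness being better.
    """
    candidates = rng.sample(range(len(population)), n_rs)
    winner = max(candidates, key=lambda idx: fitness[idx])
    return population[winner]


# Hypothetical usage: probabilities of the first generation and one selection.
pc, pm = operator_probabilities(0)
parent = tournament_select(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.3])
```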

Table 1 Local weight vectors

j    w_{j,1}  w_{j,2}  w_{j,3}  w_{j,4}
1    1.0      0.7      0.5      0.9
2    1.0      1.0      0.1      0.1
3    0.9      0.9      0.3      0.1
4    0.8      0.6      0.9      0.7
5    0.9      0.2      0.8      0.5
6    0.5      0.1      0.4      0.5
7    0.8      0.7      0.6      0.5
8    0.4      0.4      0.6      1.0
9    0.6      0.8      0.2      0.8
10   0.2      0.2      1.0      1.0
11   0.2      0.4      0.2      0.3
12   0.8      0.7      0.7      1.0
13   0.9      0.7      0.4      0.4
14   1.0      0.5      0.2      0.3
15   0.9      0.6      0.1      0.5
16   1.0      1.0      0.0      0.1
17   0.2      0.2      0.0      1.0
18   0.8      0.7      0.0      0.5
19   0.5      0.5      0.0      0.5

5.2 FNN Parameters

The FNN parameters are set as follows:
– Number of hidden layers: 1;
– Number of neurons in the input layer: 122 + bias;
– Number of neurons in the hidden layer: 500 + bias;
– Number of neurons in the output layer: 9;
– Activation function: ReLU;
– Loss function: Mean Squared Error;
– Stopping criterion: Early Stopping.
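The layer sizes listed above can be made concrete with a minimal, dependency-free forward pass. This is a sketch under stated assumptions rather than the authors' network: the weight initialization range, the linear output layer and every function name are hypothetical, and no training step is shown.

```python
import random

N_IN, N_HIDDEN, N_OUT = 122, 500, 9  # sizes from the parameter list


def init_layer(n_in, n_out, rng):
    """Small random weights plus one bias per output neuron (assumed init)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, params):
    """122 inputs -> 500 ReLU units -> 9 outputs."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(x, w1, b1))
    return dense(hidden, w2, b2)  # assumption: linear output layer


def mse(target, predicted):
    """Mean squared error used as the loss function."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)


rng = random.Random(0)
params = (init_layer(N_IN, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUT, rng))
u = [rng.random() for _ in range(N_IN)]   # hypothetical input vector
y_hat = forward(u, params)                # 4 local + 5 global weights
```

The nine outputs match the length of the target vector, which concatenates a four-element local weight vector and a five-element global weight vector.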

5.3 Input Instances and Graphical Results

The set U, containing real packing instances, is provided by an industrial company. The number of boxes in each instance varies from 1 to 83 and, on average, 15 pallets per hour should be arranged. Moreover, the frequency distribution of the different dimension box typologies of all instances is shown in Fig. 3. The dataset D, generated from U, contains 675 samples. The related subsets, $D_{train}$, $D_{valid}$ and $D_{test}$, contain 405, 135 and 135 samples, respectively.

5.3.1 Training and Validation Processes

The training process proves to be very efficient, with a training time of 1.81 s. Moreover, it turns out to be robust with respect to the introduction of new box typologies in B, which leads to an increase in the overall network complexity. The MSE evaluated on both $D_{train}$ and $D_{valid}$ is reported in Fig. 4 as a function of the number of epochs of the FNN training process. As expected, the error evaluated on $D_{train}$ is a decreasing function, while the error evaluated on $D_{valid}$ decreases until the 17th epoch and then starts to increase, highlighting an overfitting phenomenon. The early stopping criterion stops the training process once the 17th epoch is reached.

Fig. 3 Frequency distribution of the different dimension box typologies of all packing instances

Fig. 4 MSE evaluation on training set and validation set

5.3.2 Test Process

After the FNN training process is complete, the MSE evaluated on the test set $D_{test}$ is $MSE_{test} = 0.0258$. The comparison between the target weight vector $y_i = [\bar{W}_l^i, \bar{W}_g^i]$ and the predicted weight vector $\hat{y}_i = [\hat{W}_l^i, \hat{W}_g^i]$ is reported in Tables 2 and 3 for some relevant test instances. Notice that the target global weight vector is the same for each order in U, meaning that it fits very well the type of packaging carried out by the company. The optimal solutions of these instances are depicted in Fig. 5a–f. They are found by running the second stage of the algorithm directly on the related predicted fitness weight vectors. The red and blue box borders indicate the fragility or non-fragility of the box, respectively. The related simulation time (S), the number of different dimension box typologies (M) and the total number of boxes (N) are reported in Table 4, while useful simulation time statistics, calculated on the whole subset $D_{test}$, are shown in Table 5.

Table 2 Target versus predicted local weight vector

Order  Type       Local weight vector
1      Target     [0.9, 0.2, 0.8, 0.5]
1      Predicted  [0.86, 0.18, 0.61, 0.55]
2      Target     [1.0, 0.7, 0.5, 0.9]
2      Predicted  [0.89, 0.63, 0.54, 0.86]
3      Target     [0.6, 0.8, 0.2, 0.8]
3      Predicted  [0.56, 0.71, 0.34, 0.71]
4      Target     [1.0, 0.5, 0.2, 0.3]
4      Predicted  [1.28, 0.57, 0.14, 0.26]
5      Target     [0.8, 0.7, 0.7, 0.1]
5      Predicted  [0.83, 0.75, 0.59, 0.86]
6      Target     [0.9, 0.9, 0.3, 0.1]
6      Predicted  [0.99, 0.82, 0.39, 0.17]

Table 3 Target versus predicted global weight vector

Order  Type       Global weight vector
1      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
1      Predicted  [0.68, 0.49, 0.42, 0.38, 0.38, 0.97]
2      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
2      Predicted  [0.69, 0.49, 0.39, 0.39, 0.39, 0.99]
3      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
3      Predicted  [0.71, 0.47, 0.41, 0.40, 0.38, 0.99]
4      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
4      Predicted  [0.67, 0.51, 0.37, 0.38, 0.38, 0.98]
5      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
5      Predicted  [0.73, 0.52, 0.39, 0.42, 0.43, 1.01]
6      Target     [0.7, 0.5, 0.4, 0.4, 0.4, 1.0]
6      Predicted  [0.81, 0.57, 0.46, 0.45, 0.45, 1.14]

Table 4 Simulation time (S), number of different dimension box typologies (M) and total number of boxes (N) reported for some test instances

Order  M  N   S (min)
1      1  44  1.529
2      1  21  0.187
3      2  28  13.183
4      2  27  15.190
5      2  50  14.540
6      2  27  14.797

Table 5 Simulation time statistics on subset $D_{test}$

Mean value          1.654 min
Standard deviation  4.176 min
Minimum value       0.004 min
Maximum value       20.632 min

6 Conclusions

To solve the “Distributor’s Pallet Packing Problem” in a real-world scenario, an extension of our previous algorithm has been presented. In particular, an HGA coupled with an FNN has been used, taking into account geometric, stability and fragility constraints. One of the most important features of the new algorithm is its ability to efficiently predict, for each input instance, the optimal weight vectors of all the fitness functions that need to be optimized to solve the problem. Moreover, it provides as output the optimal input sequence of the boxes as well as the spatial coordinates of the vertices of the placed boxes. The proposed algorithm proves to be very efficient and flexible for all the input instances, as shown by the simulation results. Possible future work may involve the investigation of other machine learning techniques, in order to compare their performance. Moreover, new practical constraints and new fitness functions could be added to the model, further increasing the quality of the solutions.

Fig. 5 Optimal solution related to some relevant test instances: (a) Test Instance 1, (b) Test Instance 2, (c) Test Instance 3, (d) Test Instance 4, (e) Test Instance 5, (f) Test Instance 6

References

1. Dyckhoff, H.: A typology of cutting and packing problems. Eur. J. Oper. Res. 44, 145–159 (1990)
2. Yaman, H., Şen, A.: Manufacturer’s mixed pallet design problem. Eur. J. Oper. Res. 186(2), 826–840 (2008)
3. Bischoff, E., Janetz, F., Ratcliff, M.: Loading pallets with non-identical items. Eur. J. Oper. Res. 84(3), 681–692 (1995). Cutting and Packing
4. Junqueira, L., Morabito, R., Yamashita, D.S., Yanasse, H.H.: Optimization models for the three-dimensional container loading problem with practical constraints. In: Fasano, G., Pintér, J.D. (eds.) Modeling and Optimization in Space Engineering, Springer Optimization and Its Applications, ch. 0, pp. 271–293. Springer (2012)
5. Bischoff, E.E., Marriott, M.D.: A comparative evaluation of heuristics for container loading. Eur. J. Oper. Res. 44(2), 267–276 (1990). Cutting and Packing
6. Ancora, G., Palli, G., Melchiorri, C.: A hybrid genetic algorithm for pallet loading in real-world applications. IFAC-PapersOnLine 53(2), 10006–10010 (2020). 21st IFAC World Congress
7. Das, H., Roy, P.: A Deep Dive into Deep Learning Techniques for Solving Spoken Language Identification Problems in Speech Signal Processing, pp. 81–100 (2018)
8. Nwankpa, C., Ijomah, W., Gachagan, A., Marshall, S.: Activation functions: comparison of trends in practice and research for deep learning (2020)
9. von der Malsburg, C.: Frank Rosenblatt: Principles of neurodynamics: Perceptrons and the theory of brain mechanisms. Brain Theory, pp. 245–248 (1986)
10. Rumelhart, D.E., McClelland, J.L.: Learning Internal Representations by Error Propagation, pp. 318–362 (1987)
11. Morgan, N., Bourlard, H.: Generalization and parameter estimation in feedforward nets: some experiments. In: Touretzky, D. (ed.) Advances in Neural Information Processing Systems, vol. 2. Morgan-Kaufmann (1990)

Complete and Consistent Payload Identification During Human-Robot Collaboration: A Safety-Oriented Procedure

Saverio Farsoni and Marcello Bonfè

Abstract The paper proposes a procedure to provide a complete and physically consistent estimation of the mass, center of mass and inertia tensor of the payload attached to the end-effector of an industrial manipulator equipped with a force/torque sensor. The procedure involves the generation of an artificial potential field that allows the proper excitation of the payload inertial parameters while avoiding static and dynamic obstacles, thus ensuring a safe and collaborative scenario. The adopted identification algorithm consists of the solution of a constrained non-linear optimization problem that guarantees the physical consistency of the inertial parameters. The proposed approach has been validated by simulating a typical collaborative workcell where a Franka-Emika Panda robot performs the procedure while avoiding dynamic obstacles.

Keywords Payload identification · Collaborative robotics · Dynamic collision avoidance · Non-linear optimization

S. Farsoni (B) · M. Bonfè, Department of Engineering, University of Ferrara, Ferrara, Italy, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_2

1 Introduction

The collaboration between humans and robots is becoming more and more widespread in the manufacturing industry, in particular in terms of sharing common spaces during working activities that could also involve physical Human-Robot Interaction (pHRI) [3]. In this context, the safety of humans has to be strictly ensured, as required by the ISO 10218 standard [9] and more recently by the technical specification ISO/TS 15066 [10]. Furthermore, the above-mentioned regulations define four collaborative operating modes: Safety-rated Monitored Stop (SMS), Hand Guiding (HG), Power and Force Limiting (PFL) and Speed and Separation Monitoring (SSM). HG and PFL represent challenging tasks from a control engineering point of view because of the necessity to accurately estimate the external forces and torques due to contacts or collisions and to distinguish them from the forces generated by the robot motion and by the attached payload. Indeed, the inertial properties (i.e. mass, center of mass and inertia tensor) of the robot itself and of its payload introduce non-negligible effects, which have to be taken into account when implementing model-based or impedance control schemes, as addressed in [1]. However, while the inertial parameters of an industrial manipulator could be provided by the manufacturer calibration, the parameters of the payload are rarely available, especially when it is a heterogeneous assembly of parts with complex geometry. Although in some cases the payload parameters can be computed by means of CAD software tools, the experimental identification of such parameters is considered the most reliable way to cope with the uncertainties in geometry, mass distribution and material density. Overviews of experimental identification methods for rigid bodies can be found in [15] or, with a focus on robotic systems, in [20].

Several identification procedures estimate the inertial parameters by exploiting the measurements of a wrist-mounted Force/Torque (F/T) sensor, which are fused with the measured or estimated velocities and accelerations of the payload, typically by means of least-squares algorithms [6, 12]. It is worth observing that such methods involve the execution of robot motions that are specifically designed for the adopted identification algorithm. In particular, it has been demonstrated that sinusoidal trajectories with high frequency and amplitude provide the proper excitation of the inertial parameters [11].
As a consequence, such desired trajectories result in motions that are only limited by the mechanical capabilities of the robot and, if not carefully planned, they may cause unexpected dangerous collisions with obstacles and may even jeopardize the safety of human operators in collaborative workcells. As an example, industrial manipulators mounted on a mobile base are increasingly used in automated warehouses to optimize the transportation and distribution of goods [2]. In these cases, the robot requires an open workspace and is commonly programmed for general-purpose applications, involving the manipulation of a variety of objects whose weight and geometry are difficult to know in advance. A previous work by the authors addresses these issues by proposing an identification procedure that involves the use of a collision-free path planner and a sequence of specific test trajectories designed to excite the most significant inertial parameters one by one, thus resulting in more restrained and safer robot motions [4]. Furthermore, an interesting aspect that has been recently investigated is the physical consistency of the identified parameters. In [19] the authors propose a constrained optimization problem whose solution ensures that the set of identified inertial parameters can always belong to a rigid body.

In the light of these considerations, the aim of the paper is to improve the results of [4] by adopting a novel motion planning algorithm for the identification procedure, which makes it possible, on the one hand, to avoid collisions with moving entities, thus ensuring the safety of human operators, and, on the other hand, to generate specific test trajectories that take into account safety bounds on velocity and acceleration while providing the proper excitation of all the inertial parameters. Moreover, the adopted identification approach integrates the algorithm proposed in [19], which has been adapted here to estimate the payload inertia tensor while providing the physical consistency of all the identified parameters. The remainder of the paper is organized as follows: Sect. 2 describes the adopted motion planner that generates the reference trajectories for the robot controller; Sect. 3 presents the algorithm exploited to provide a physically consistent set of inertial parameters; Sect. 4 describes the setup used in the simulation experiments and shows the obtained results; Sect. 5 draws the final remarks and discusses possible further developments.

2 Motion Planning with Dynamic Collision Avoidance

In the context of a collaborative application, where humans can move inside the robot workspace in an unpredictable way, the robot should be able to plan its motion online, reacting to the obstacle movements to avoid undesired collisions. Therefore, a noteworthy assumption for dynamic motion planners is that at least the position of the moving entities is continuously monitored at a proper rate. For this purpose, collaborative workcells may be equipped with specific sensors to detect the presence of humans and track their positions [7]. The proposed motion planning algorithm relies on the use of artificial potential fields [17]. It is assumed that the reference signal provided to the robot controller is in the form of joint target velocities and that it mainly consists of the sum of two kinds of contributions, one deriving from a desired motion and the other from a repulsive potential field. The former contribution aims at generating the proper excitation trajectories for the inertial parameters, while the repulsive contribution creates a barrier potential for all the robot links near the convex regions that include the obstacles at the considered time step. In more detail, the excitation trajectories are designed as follows:

$$\dot{x}_{ref}(t) = [v_x(t), v_y(t), v_z(t), \omega_x(t), \omega_y(t), \omega_z(t)]_{ref} \tag{1}$$

with:

$$\begin{cases} v_x(t) = v_y(t) = v_z(t) = 0 \\ \omega_x(t) = \sum_{i=0}^{n} \left[ a_{x,i} \sin(i\omega t) + b_{x,i} \cos(i\omega t) \right] \\ \omega_y(t) = \sum_{i=0}^{n} \left[ a_{y,i} \sin(i\omega t) + b_{y,i} \cos(i\omega t) \right] \\ \omega_z(t) = \sum_{i=0}^{n} \left[ a_{z,i} \sin(i\omega t) + b_{z,i} \cos(i\omega t) \right] \end{cases} \tag{2}$$

where $\dot{x}_{ref}$ is the desired velocity twist of the end-effector frame, consisting of null linear velocities and periodic angular velocities; $a_i$, $b_i$, $\omega$ and $n$ are design parameters chosen on the basis of the robot features. The excitation trajectories are defined in the Cartesian space and have to be mapped into the robot joint space to be used as the target component of the robot motion. For this purpose, the Saturation in the Null Space (SNS) algorithm [8] has been adopted, as it provides a solution for the real-time generation of joint motion taking into account time-varying velocity and acceleration constraints. Such an algorithm is specifically designed for redundant robots and it integrates a task-scaling procedure executed when the original task is found to exceed the desired constraints.

Remark 1 The bounds on the joint velocities should be designed not only on the basis of the physical capabilities of the robot, but also considering a conservative scale factor that takes into account the additional repulsive contribution entering the final computation of the velocity command (see Eq. (6)).

Therefore, the target component due to the exciting potential field is computed as:

$${}^{e}\dot{q}_{ref}(t) = \mathrm{SNS}(\dot{x}_{ref}(t), q(t)) \tag{3}$$

where SNS indicates the Saturation in the Null Space method of [8] and $q(t)$ is the current robot joint configuration. In the following, the time dependence of the variables is omitted when the context is clear. The repulsive potential field generates an action that moves the part of the robot currently closest to the obstacles away from them. It is assumed that the region of the workcell corresponding to the obstacles has been partitioned into p convex components. For each of these convex regions, the repulsive component can be computed as explained e.g. in [17]:

$${}^{r}\dot{q}_i = \begin{cases} \dfrac{k_r}{\eta_i^2(q)} \left( \dfrac{1}{\eta_i(q)} - \dfrac{1}{\eta_{0,i}} \right)^{\lambda - 1} \nabla \eta_i(q) & \text{if } \eta_i(q) < \eta_{0,i} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, \dots, p \tag{4}$$

where $\eta_i(q)$ is the minimum distance between the i-th convex region containing a robot link or the payload and the obstacles. $\eta_{0,i}$ is the range of influence of the i-th region, so that if the corresponding minimum distance to the obstacles is greater than this threshold, the i-th repulsive component is null. $k_r > 0$ and $\lambda \geq 1$ are design parameters. $\nabla \eta_i(q)$ is the gradient vector of the minimum distances towards the i-th obstacle. Therefore, the total repulsive component is computed as the sum of the contributions generated by the p obstacles:

$${}^{r}\dot{q}_{ref} = \sum_{i=1}^{p} {}^{r}\dot{q}_i \tag{5}$$

Finally, the robot target velocities at the time instant t become the sum of the exciting and the repulsive components, as follows:

$$\dot{q}_{ref}(t) = {}^{e}\dot{q}_{ref}(t) + {}^{r}\dot{q}_{ref}(t) \tag{6}$$
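Equations (2) and (4)–(6) can be sketched in a few lines of Python. The sketch rests on several assumptions: the SNS mapping of Eq. (3) is not implemented, so the exciting joint velocity is taken as a given input; the function names, default gains and data layout are hypothetical; and a zero distance is not handled.

```python
import math


def excitation_twist(t, coeffs, omega):
    """End-effector twist of Eqs. (1)-(2): zero linear velocity and
    angular velocities given by a truncated Fourier series.

    coeffs maps each axis ('x', 'y', 'z') to a list of (a_i, b_i) pairs,
    with i = 0..n.
    """
    angular = []
    for axis in ("x", "y", "z"):
        value = sum(a * math.sin(i * omega * t) + b * math.cos(i * omega * t)
                    for i, (a, b) in enumerate(coeffs[axis]))
        angular.append(value)
    return [0.0, 0.0, 0.0] + angular


def repulsive_velocity(eta, eta0, grad_eta, k_r=1.0, lam=2.0):
    """Repulsive joint velocity of Eq. (4) for one convex region.

    Assumes eta > 0; the contribution vanishes outside the range eta0.
    """
    if eta >= eta0:
        return [0.0] * len(grad_eta)
    gain = k_r / eta ** 2 * (1.0 / eta - 1.0 / eta0) ** (lam - 1.0)
    return [gain * g for g in grad_eta]


def target_velocity(q_dot_exciting, regions, k_r=1.0, lam=2.0):
    """Eqs. (5)-(6): exciting component plus the sum of the repulsive ones.

    regions is a list of (eta, eta0, grad_eta) tuples, one per region.
    """
    total = list(q_dot_exciting)
    for eta, eta0, grad_eta in regions:
        rep = repulsive_velocity(eta, eta0, grad_eta, k_r, lam)
        total = [v + r for v, r in zip(total, rep)]
    return total


# Hypothetical two-joint example: only the first region is within range.
command = target_velocity([0.1, 0.2],
                          [(0.5, 1.0, [1.0, 0.0]), (2.0, 1.0, [1.0, 1.0])])
```

With these made-up numbers the first region lies inside its range of influence and dominates the first joint command, while the second region contributes nothing.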

It is worth observing that the robot, even if commanded with (6), may still fail to avoid collisions if the obstacles are moving too fast or are arranged too densely in the robot workspace. Therefore, in order to ensure the safety of the human operators and of the physical objects in the workcell, a watch-dog procedure has to be implemented to stop the robot motion when the minimum distance to the closest obstacle drops below a safety threshold.

3 Physically Consistent Parameter Identification

The dynamic model of the robot payload can be written linearly with respect to a compact representation of its inertial parameters, i.e. the mass $m$, the center of mass $c = [c_x, c_y, c_z]$ and the inertia tensor $I = \begin{bmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{xy} & I_{yy} & I_{yz} \\ I_{xz} & I_{yz} & I_{zz} \end{bmatrix}$, as follows:

$$F = V(a, \omega, \dot{\omega}, g)\, \phi \tag{7}$$

where $F = [F_x, F_y, F_z, \tau_x, \tau_y, \tau_z]$ is the vector containing the forces and torques to which the payload is subjected, and $V$ is a 6×10 matrix depending on the gravity vector $g$ and on the payload kinematic variables (i.e. the linear accelerations $a$, the angular velocities $\omega$ and the angular accelerations $\dot{\omega}$). The vector $\phi = [m, mc_x, mc_y, mc_z, I_{xx}, I_{yy}, I_{zz}, I_{xy}, I_{xz}, I_{yz}]$ groups the payload inertial parameters. The formulation (7) assumes that all the involved variables are expressed w.r.t. the payload frame. Classical identification techniques consider the parameters of a rigid body to be an element of the Euclidean space $\mathbb{R}^{10}$ and pose the identification problem as a linear least-squares optimization problem [16]. Such an approach, although computationally convenient, does not ensure the physical consistency of the provided results, as it neglects the fact that not all vectors of $\mathbb{R}^{10}$ can represent the set of inertial parameters of a rigid body. In more detail, a rigid body can generate a set of inertial parameters if the following conditions are satisfied [18]:
– the mass is positive: $m > 0$;
– the inertia tensor at the center of mass is symmetric and positive definite: $I_c = I_c^T > 0$.

Moreover, when $I_c$ is computed w.r.t. the principal axes of inertia, it becomes a diagonal matrix with positive elements $I_x$, $I_y$, $I_z$ that have to satisfy the following triangle inequalities:

$$I_x \leq I_y + I_z, \qquad I_y \leq I_x + I_z, \qquad I_z \leq I_y + I_x \tag{8}$$
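The consistency conditions listed above reduce to a handful of comparisons once the principal moments of inertia are available. The following sketch assumes that they are; the function name and the numerical values are hypothetical.

```python
def is_physically_consistent(mass, principal_moments):
    """Check m > 0, positive principal moments of inertia and the
    triangle inequalities of Eq. (8)."""
    ix, iy, iz = principal_moments
    if mass <= 0 or min(ix, iy, iz) <= 0:
        return False
    return ix <= iy + iz and iy <= ix + iz and iz <= ix + iy


# Hypothetical values: the second set violates Iz <= Ix + Iy.
ok = is_physically_consistent(3.0, (0.010, 0.010, 0.015))
bad = is_physically_consistent(3.0, (0.010, 0.010, 0.030))
```

An unconstrained least-squares estimate can fail this check even when it fits the measurements well, which is the motivation for the constrained formulation that follows.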

The identification method proposed in [19] provides a set of physically consistent parameters by changing the representation of the inertial parameters into a parametrization for which it is possible to define a constrained optimization problem whose solution satisfies the above-mentioned conditions. In particular, the proposed parametrization $\theta$ involves 16 parameters defined by:
– the mass $m$;
– the center of mass $c$;
– the rotation matrix $Q$ between the payload reference frame and the frame of the principal axes of inertia;
– the vector $I = [I_x, I_y, I_z]$ containing the elements of the diagonal inertia tensor expressed w.r.t. the principal axes of inertia.

Then, this different parametrization can be mapped into the corresponding vector $\phi$ of the payload parameters by means of the function $\pi(\theta)$ defined as follows:

$$\phi = \begin{bmatrix} m \\ mc \\ I_{xx} \\ \vdots \\ I_{yz} \end{bmatrix} = \pi(\theta) = \begin{bmatrix} m \\ mc \\ \mathrm{vech}\left( Q\, \mathrm{diag}(I)\, Q^T - m S(c) S(c) \right) \end{bmatrix} \tag{9}$$

where $\mathrm{vech}(\cdot)$ is the operator that serializes into a vector the elements of the upper-triangular part of a symmetric matrix, $\mathrm{diag}(\cdot)$ creates a diagonal matrix from the input vector and $S(\cdot)$ denotes the skew-symmetric matrix-valued operator, so that $S(u)v = u \times v$, with $u, v \in \mathbb{R}^3$. The physical consistency is then ensured by defining equality and inequality constraints on $\theta$, in particular:
– the mass must be positive: $m > 0$;
– $Q$ must be a rotation matrix: $QQ^T = I_{3 \times 3}$, $\det(Q) = 1$;
– the elements of $I$ must be positive: $I_x, I_y, I_z > 0$.

The use of the parametrization $\theta$, although it makes it easy to define conditions that ensure the physical consistency, causes the loss of linearity in the dynamic model of the payload. Indeed, the model of (7) becomes:

$$F = V(a, \omega, \dot{\omega}, g)\, \pi(\theta) \tag{10}$$

where the function $\pi(\theta)$ introduces non-linearities in the model. Therefore, the identification of the inertial parameters of the payload can be posed as a constrained non-linear optimization problem which considers the above-mentioned constraints on $\theta$ and provides the optimal $\hat{\theta}$ as follows:

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \left\| F_i - V(a_i, \omega_i, \dot{\omega}_i, g_i)\, \pi(\theta) \right\|^2 \tag{11}$$

with the parameter vector $\theta$ subject to the defined constraints. N is the number of observations in which the $F_i$ are measured by a wrist-mounted F/T sensor, while $a_i$, $\omega_i$, $\dot{\omega}_i$, $g_i$ may be measured by appropriate sensors or estimated, e.g. by means of a Kalman filter on the end-effector positions, as described in [5].
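The mapping of Eq. (9) from the 16 parameters (one for the mass, three for the center of mass, nine for the rotation matrix and three for the principal moments) to the ten inertial parameters can be written with plain 3×3 matrix products. The sketch below is illustrative only: the function names are hypothetical, the output follows the ordering $[m, mc_x, mc_y, mc_z, I_{xx}, I_{yy}, I_{zz}, I_{xy}, I_{xz}, I_{yz}]$ given in the text, and the constrained solver of Eq. (11) is not included.

```python
def skew(c):
    """Skew-symmetric matrix S(c) such that S(c) v = c x v."""
    cx, cy, cz = c
    return [[0.0, -cz, cy],
            [cz, 0.0, -cx],
            [-cy, cx, 0.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def inertial_vector(mass, com, rotation, principal_moments):
    """Map (m, c, Q, I) to the ten inertial parameters of the payload."""
    diag = [[principal_moments[i] if i == j else 0.0 for j in range(3)]
            for i in range(3)]
    rotated = matmul(matmul(rotation, diag), transpose(rotation))
    s = skew(com)
    ss = matmul(s, s)
    # Q diag(I) Q^T - m S(c) S(c): inertia tensor w.r.t. the payload frame.
    inertia = [[rotated[i][j] - mass * ss[i][j] for j in range(3)]
               for i in range(3)]
    return [mass] + [mass * ci for ci in com] + [
        inertia[0][0], inertia[1][1], inertia[2][2],
        inertia[0][1], inertia[0][2], inertia[1][2]]


# Hypothetical payload: principal axes aligned with the payload frame.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
phi = inertial_vector(3.0, (0.1, 0.1, 0.05), identity, (0.010, 0.010, 0.015))
```

The term $-m S(c) S(c)$ is the parallel-axis contribution, so an off-diagonal entry such as $I_{xy}$ equals $-m c_x c_y$ whenever the principal axes are aligned with the payload frame.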

Fig. 1 The block diagram of the overall safety-oriented identification procedure. The red group of blocks involves the acquisition of the required measurements from the workcell. The blocks inside the blue group represent the motion planning task with dynamic collision avoidance. The yellow group contains the blocks involved in the identification of the inertial parameters

Finally the block diagram of Fig. 1 shows the overall procedure that provides the inertial parameters while taking care of ensuring the human safety during the robot motions.

4 Simulations

To validate the presented payload identification method, a typical collaborative workcell has been simulated. The robot involved in the simulations is the 7-DOF Franka-Emika Panda collaborative robot, equipped with a wrist-mounted F/T sensor and carrying a payload of 3 kg, which has to be identified by executing the proposed procedure while avoiding moving obstacles.


4.1 Software Setup

The workcell has been simulated in CoppeliaSim [14], a robotic simulation framework whose main features are:
– Robot models: 3D and kinematic models of several manipulators can be included in a simulation;
– Scripting: Lua and C++ regular APIs can be used to create customizable scripts associated with the scene objects;
– Minimum distance calculation: distances between any scene objects can be efficiently computed;
– Remote interfacing: thanks to the Remote API framework, the simulation can establish a client-server TCP/IP socket to interact and exchange data with an external application such as a Matlab script.

The main idea is to use CoppeliaSim to create the 3D scene and then, online, to compute the minimum distances between links and obstacles and to acquire the measurements of the F/T sensor and the joint positions (i.e. the red group of blocks in the diagram of Fig. 1). Such data are transmitted to a Matlab script that computes the $\dot{q}_{ref}$ signal to command the robot motion and performs the identification procedure (i.e. the yellow group of blocks in Fig. 1). The Matlab Optimization Toolbox provides several functionalities to define and solve an optimization problem; in particular, the non-linear constrained problem (11) has been processed using the fmincon function, which implements, among others, the interior-point algorithm [13]. The procedure stops when all the identified values reach a steady state. The parameters of the exciting sine waves are changed every 5 s to increase the variability of the payload motions. It is worth observing that the computation of $\nabla \eta_i(q)$ required in (4) is performed by a CoppeliaSim script that simulates a small displacement of the considered joints and calculates the variation of the minimum distances to the i-th obstacle. Moreover, the current values of the payload velocities and accelerations, required to compute the matrix V in the optimization problem of Eq. (11), are estimated on the basis of the current joint positions by using a Kalman filter implemented as described in [5]. The Matlab script also implements the motion planning algorithm described in Sect. 2 (i.e. the blue group of blocks in Fig. 1). The workcell implemented in CoppeliaSim is depicted in Fig. 2, in which the main components of the simulation are highlighted.
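The displacement-based computation of $\nabla \eta_i(q)$ described above amounts to a finite-difference gradient. The following Python sketch illustrates the idea with a forward difference; the step size, the function names and the toy distance function standing in for the simulator's minimum-distance query are all assumptions, not part of the authors' CoppeliaSim script.

```python
def distance_gradient(distance_fn, q, delta=1e-4):
    """Forward-difference estimate of d(eta)/d(q_j) for every joint j.

    distance_fn is any callable returning the minimum distance for a
    joint configuration; delta is the small joint displacement.
    """
    base = distance_fn(q)
    gradient = []
    for j in range(len(q)):
        perturbed = list(q)
        perturbed[j] += delta
        gradient.append((distance_fn(perturbed) - base) / delta)
    return gradient


# Hypothetical stand-in for the simulator's minimum-distance query.
def toy_distance(q):
    return 1.0 + 0.5 * q[0] - 0.2 * q[1]


grad = distance_gradient(toy_distance, [0.3, -0.1, 0.0])
```

Each gradient evaluation costs one distance query per joint plus one for the unperturbed configuration, which is worth keeping in mind when the planner has to run online.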

4.2 Results

The simulation results show that, by properly tuning the coefficients of the repulsive potential field ${}^{r}\dot{q}_{ref}$, the robot could avoid the collision with a cuboid-shaped obstacle moving at 1 m/s (i.e. simulating a walking human). In more detail, as shown in Fig. 3, the most critical minimum distances, i.e. those between the payload and the obstacles and between the sixth robot link and the obstacles, are initially under the safety threshold set by the designer. Then, the repulsive component of the commanded velocities makes the minimum distances increase until they exceed the threshold. Afterward, the moving obstacle enters the robot workspace, causing the distances to fall below the threshold again, so that the motion planner has to react by imposing a dominant repulsive component in the velocity command. The graphs of Fig. 4 compare the exciting component, the repulsive component and the final joint velocity commands. It can be noticed that, when the obstacles are close to the robot, the repulsive effect prevails over the exciting one until the robot is driven to a safe configuration far away from the obstacles. Indeed, when all the minimum distances are greater than the thresholds, the final command overlaps with the exciting sinusoidal velocities.

The results of the payload identification are summarized in Table 1. The identified values become stable after 35 s of simulation. The percentage errors of the identified mass and center of mass are almost null (< 0.1%), while they increase in the identification of the inertia tensor parameters (< 10%). The results are comparable with those obtained experimentally in [4] for a similar payload but with a different identification algorithm that does not provide the estimation of $I_{xy}$, $I_{xz}$, $I_{yz}$ and does not address the problem of the physical consistency of the identified parameters. Finally, the graphs of Fig. 5 show the transient towards the steady state of the percentage error on the identified inertial parameters.

Fig. 2 The simulated Panda robot while performing the safety-oriented identification procedure

Fig. 3 The evolution of the most critical robot-obstacle minimum distances monitored during the identification procedure

Fig. 4 The final velocity command (red line) consisting of the sum of a repulsive component (black line), which prevails when the obstacles are close to the robot, and an exciting component (blue line), which prevails when the robot moves away from the obstacles. Note also that the discontinuities in the commanded velocities are due to the fact that every 5 s the coefficients of the exciting sinusoidal velocities are recomputed to increase the randomness of the signal

Table 1 Identification results

Parameter        Reference  Identified  Error (%)
m (kg)           3          2.9998      0.0045
mc_x (kg·m)      0.3        0.2997      0.07
mc_y (kg·m)      0.3        0.3003      0.10
mc_z (kg·m)      0.15       0.1500      0.0096
I_xx (kg·m²)     0.0463     0.0487      5.48
I_xy (kg·m²)     −0.03      −0.0329     9.83
I_xz (kg·m²)     −0.0150    −0.0141     5.61
I_yy (kg·m²)     0.0445     0.0454      2.058
I_yz (kg·m²)     −0.0150    −0.0156     4.48
I_zz (kg·m²)     0.0850     0.0816      3.9175

Fig. 5 The transient towards the steady-state of the percentage error on the identified inertial parameters (one panel per parameter: m, mc_x, mc_y, mc_z, I_xx, I_xy, I_xz, I_yy, I_yz, I_zz; horizontal axis t (s), vertical axis err (%))

5 Conclusions

The authors developed a novel procedure for the identification of the inertial parameters of the payload attached to an industrial manipulator equipped with an additional F/T sensor. This method integrates a dynamic motion planning scheme that avoids collisions between the robot and moving obstacles, thus ensuring the safe execution of the procedure also in a collaborative scenario where robot and humans share the workspace. The motion planning scheme makes use of artificial potential fields that generate, on one hand, an exciting component aimed at tracking specific test trajectories and, on the other hand, a repulsive component that moves the closest robot link, or the payload itself, away from the obstacles. The identification algorithm provides the payload parameters by solving a constrained non-linear optimization problem that ensures the physical consistency of the identified parameters.

The results obtained by simulating a collaborative workcell validate the potential feasibility of the proposed method and motivate further experiments on a real experimental setup, in which the parameters of the motion planner (i.e. the bounds on joint velocities in the SNS algorithm and the scaling factor of the repulsive potential components) have to be carefully tuned to improve the safety inside the shared workspace. In particular, the end-effector joints could perform faster motions, as they are more involved in the generation of the payload excitation trajectories, while the motion of the base joints should be slowed down, as their movement causes the robot to sweep a greater volume of the workspace, thus risking dangerous collisions.

A further concern that has to be addressed before implementing the procedure in an industrial workcell is the high computational time required to solve the non-linear optimization problem. Indeed, its solution cannot be computed in real time at the rate required by the motion planner to command the robot; however, it can be processed in parallel at a slower rate while the planner provides the velocity command to the robot controller at the maximum rate. As a future development, the authors aim to improve the motion planning by incorporating both the collision avoidance and the parameter excitation tasks into a constrained optimization problem, so that a feasible collision-free motion can be provided without using a conservative scale factor.

References

1. Aghili, F.: Robust impedance control of manipulators carrying a heavy payload. J. Dyn. Syst. Meas. Control 132(5) (2010)
2. Aleotti, J., Baldassarri, A., Bonfè, M., Carricato, M., Chiaravalli, D., Di Leva, R., Fantuzzi, C., Farsoni, S., Innero, G., Lodi Rizzini, D., et al.: Toward future automatic warehouses: an autonomous depalletizing system based on mobile manipulation and 3d perception. Appl. Sci. 11(13), 5959 (2021)
3. Cherubini, A., Passama, R., Crosnier, A., Lasnier, A., Fraisse, P.: Collaborative manufacturing with physical human-robot interaction. Robot. Comput. Int. Manuf. 40, 1–13 (2016)
4. Farsoni, S., Ferraguti, F., Bonfè, M.: Safety-oriented robot payload identification using collision-free path planning and decoupling motions. Robot. Comput. Int. Manuf. 59, 189–200 (2019)
5. Farsoni, S., Landi, C.T., Ferraguti, F., Secchi, C., Bonfè, M.: Compensation of load dynamics for admittance controlled interactive industrial robots using a quaternion-based kalman filter. IEEE Robot. Autom. Lett. 2(2), 672–679 (2017)
6. Farsoni, S., Landi, C.T., Ferraguti, F., Secchi, C., Bonfè, M.: Real-time identification of robot payload using a multirate quaternion-based kalman filter and recursive total least-squares. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 2103–2109. IEEE (2018)
7. Ferraguti, F., Landi, C.T., Costi, S., Bonfè, M., Farsoni, S., Secchi, C., Fantuzzi, C.: Safety barrier functions and multi-camera tracking for human–robot shared environment. Robot. Auton. Syst. 124, 103388 (2020)
8. Flacco, F., De Luca, A., Khatib, O.: Control of redundant robots under hard joint constraints: saturation in the null space. IEEE Trans. Robot. 31(3), 637–654 (2015). https://doi.org/10.1109/TRO.2015.2418582
9. ISO 10218-1:2011 Robots for Industrial Environments: Safety Requirements (Part 1: Robot). Standard, International Organization for Standardization (2011)
10. ISO/TS 15066:2016 Robots and robotic devices: Collaborative robots. Standard, International Organization for Standardization (2016)
11. Jin, J., Gans, N.: Parameter identification for industrial robots with a fast and robust trajectory design approach. Robot. Comput. Int. Manuf. 31, 21–29 (2015). https://doi.org/10.1016/j.rcim.2014.06.004. https://www.sciencedirect.com/science/article/pii/S0736584514000441
12. Kubus, D., Kroger, T., Wahl, F.M.: On-line estimation of inertial parameters using a recursive total least-squares approach. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3845–3852. IEEE (2008)
13. Potra, F.A., Wright, S.J.: Interior-point methods. Journal of Computational and Applied Mathematics 124(1), 281–302 (2000). https://doi.org/10.1016/S0377-0427(00)00433-7. Numerical Analysis 2000. Vol. IV: Optimization and Nonlinear Equations
14. Rohmer, E., Singh, S.P.N., Freese, M.: V-rep: A versatile and scalable robot simulation framework. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1321–1326 (2013). https://doi.org/10.1109/IROS.2013.6696520
15. Schedlinski, C., Link, M.: A survey of current inertia parameter identification methods. Mech. Syst. Signal Process. 15(1), 189–211 (2001)
16. Siciliano, B., Khatib, O., Kröger, T.: Springer Handbook of Robotics, vol. 200. Springer (2008)
17. Siciliano, B., Sciavicco, L., Villani, L., Oriolo, G.: Robotics: Modelling, Planning and Control. Springer Science & Business Media (2010)
18. Sousa, C.D., Cortesao, R.: Physical feasibility of robot base inertial parameter identification: a linear matrix inequality approach. Int. J. Robot. Res. 33(6), 931–944 (2014)
19. Traversaro, S., Brossette, S., Escande, A., Nori, F.: Identification of fully physical consistent inertial parameters using optimization on manifolds. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5446–5451 (2016). https://doi.org/10.1109/IROS.2016.7759801
20. Wu, J., Wang, J., You, Z.: An overview of dynamic parameter identification of robots. Robot. Comput. Int. Manuf. 26(5), 414–419 (2010)

Deep Learning and OcTree-GPU-Based ICP for Efficient 6D Model Registration of Large Objects

Ismayil Ahmadli, Wendwosen B. Bedada, and Gianluca Palli

Abstract In this paper, a shape alignment method is presented that handles large objects with very limited visibility and significant occlusions by utilizing an octree data structure on the GPU. The proposed algorithm relies on an offline computed 3D model of the object and on an initial estimate of its pose, obtained with a deep learning technique that detects key features of the object, in order to improve the accuracy and the speed of the registration process. The final aligned pose is obtained by running the iterative closest point algorithm on the GPU with an octree, starting from the initial estimated pose. To highlight the application of the proposed method, autonomous robotic tasks requiring interaction with a washing machine are discussed. Finally, the speed and accuracy of different implementations of the algorithm, on the CPU and on the GPU and with and without the augmented octree neighbourhood search, are reported.

Keywords 3D Vision · Point cloud registration · Object detection · Robotic perception

I. Ahmadli · W. B. Bedada (B) · G. Palli
DEI - Department of Electrical, Electronic and Information Engineering, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_3

1 Introduction

Shape registration nowadays finds application in multiple areas of robotic assistance. For robots interacting with unknown environments, a high priority is given to the collision avoidance task in order to prevent damage to surrounding objects and to the robot itself. In this respect, the main practical problem usually arises from the partial information acquired by range finders or 3D vision sensors because of occlusions. This problem becomes particularly relevant when the robot task is to interact in a specific way with certain objects in the environment, for which the robot has prior knowledge about their functionality but only limited information about their location, and which can also be considered obstacles from the point of view of the robot motion.

The scenario considered here is a mobile manipulator interacting with a washing machine in a household environment. Because of its size, the washing machine can be seen as an obstacle, but precise knowledge of its location is needed to interact with it and perform manipulation tasks. Due to the complex geometry of the drum's interior, the robotic arm requires an accuracy of up to 1 cm in the nearest-obstacle measurement when performing inspection and grasping inside the appliance drum, which can only be achieved through full model registration. Besides, laundry region segmentation is important for grasping pieces of laundry inside or on the exterior of the appliance. Our registration algorithm can contribute to the correct removal of the appliance points from the scene. The same principle can be applied to similar scenarios in which robots need to interact with other appliances or machines in general.

To solve this issue, shape registration algorithms provide the robot manipulator with information about the location of known objects, assuming that the robot has some representation of the object embodiment, such as a 3D CAD model. The main issue with relatively big objects is that only limited information can be obtained by 3D vision sensors. This is due to the limited field of view of devices such as 3D scanners, RGB-D cameras and LiDAR, and to the shape, size and occlusions of the target object. The issue is made harder by the fact that such objects very likely have few features that can be exploited by the vision system. In fact, many registration algorithms depend heavily on an accurate initial transformation provided by feature-based methods, which have been widely used over the years as a solution.
However, feature-based methods require sufficient shape texture on the given objects in order to calculate the local features.

In this paper, we provide an efficient algorithm to robustly align the 3D model of a domestic appliance with the point cloud provided by an RGB-D camera by combining object detection based on deep learning, a GPU-based implementation of the Iterative Closest Point (ICP) algorithm and an OcTree structure. The proposed scenario is selected because, in common household environments, usually only the front of the object is visible, but the robot needs complete knowledge of its spatial location and encumbrance during the interaction with the internal parts of the appliance. Therefore, improving the accuracy and speed of the registration by combining a featureless technique with OcTree-based ICP computation [5, 6] is the core novelty of our algorithm. A deep learning-based approach is implemented in order to obtain a better initialization and avoid the computational complexity of feature-based methods, as shown in Fig. 1.

The first step in our approach is to determine relevant points on the image plane using deep learning based object detection techniques. This result is exploited to obtain preliminary information about the appliance location in the camera frame, as reported in Sect. 3. The initial pose guess also allows us to apply adaptive filtering to the scene data, as explained in Sect. 4, while localizing the robot with respect to the appliance allows the robot to operate in the environment while satisfying safety conditions. The different registration approaches, with their drawbacks and advantages, are compared in Sect. 5. Finally, Sect. 6 reports the conclusions and future work.

Fig. 1 The workflow of shape registration

2 Related Works

Object Detection: Nowadays, deep learning based object detection is one of the key topics in computer vision. In recent years, one-stage models [15], in which localization and classification are conducted in a single step without initial region proposals, have usually provided faster and more efficient results than two-stage approaches [9, 20], although they sacrifice detection performance on small objects to gain speed.

Pose Estimation: The Gauss-Newton [2, 17], Levenberg-Marquardt [16, 25] and orthogonal iteration [18] methods are three commonly used non-linear optimization techniques for solving Perspective-n-Point (PnP) [7, 14] problems. However, the Gauss-Newton algorithm is highly dependent on the initial projection and prone to failure if poorly initialized. To avoid incorrect convergence and to obtain maximal precision, the non-iterative EPnP [14] algorithm can be used to initialize the non-linear approach, which yields both higher stability and faster convergence. The Levenberg-Marquardt method, on the other hand, can be considered an interpolation between steepest descent and the Gauss-Newton algorithm. When the current estimate is far from the solution, it converges slowly, behaving like a steepest descent method. When the estimate is close to the solution, it behaves like the Gauss-Newton method. This damping makes the method converge more reliably than Gauss-Newton alone, although, as with any local method, only to a local minimum.

Point Cloud Registration: Several approaches reported in the literature are comparable in terms of accuracy and convergence time. The ICP algorithm [3, 4] is a dominant registration method for the geometric alignment of three-dimensional models when an initial estimate of the relative pose is known. The ICP iteratively refines a closed-form solution, minimizing an error metric in order to maximize the agreement between the two data sets. The original ICP uses closest point matching, which results in an exhaustive search. Over the years, several improvements have been proposed to decrease the time needed for convergence and to improve the accuracy. These variants can be classified according to how they sample the data, match points, weight the corresponding pairs, reject certain matches based on a threshold value and define their error metric; they give different results depending on the level of noise and on the shape features in the three-dimensional data [22]. It is reported in the literature that the ICP may converge to a local minimum depending on the initial pose [26]. A probabilistic solution for both rigid and non-rigid registration problems, called the Coherent Point Drift (CPD) algorithm, is provided in [19]. In [12], the ICP is reinterpreted as a robust alignment of Gaussian mixtures in which a statistical discrepancy measure between the two corresponding mixtures is minimized. Initial correspondence matching is a necessary step for most registration algorithms, and the ICP in particular is highly dependent on correct correspondences. In [13], a randomized approach based on semi-definite relaxation is proposed for correspondence-free registration. To speed up the ICP algorithm, several search methods [5, 6, 24] for finding the correspondences have been proposed.

3 Initial Pose Estimation

There are four different regions of interest on a washing machine that can be targeted in the detection step, namely the Knob, the Detergent Box, the User Interface and the Glass Door. The points corresponding to the regions of interest on the washing machine are obtained in the two-dimensional image plane using deep learning based object detection. The accuracy of the detection is crucial for the performance of the point cloud registration process, as explained in the following section. The choice of the network is motivated by the need for accuracy at a tolerable detection speed: a single SSD architecture with a ResNet-50 backbone [10] has been considered a suitable solution to generate the 4 object labels.

The robot should localize the domestic appliance in order to interact with its parts and to carry out numerous tasks. Besides, our shape registration algorithm initially requires the estimated pose of the appliance with respect to the robot, as already shown in Fig. 1. For that aim, an iterative approach to the PnP problem is implemented in order to determine the position and orientation of a calibrated camera with respect to the world frame attached to the knob of the domestic appliance, as shown in Fig. 2, given four 3D-2D point correspondences and the intrinsic parameters. The main reason for choosing the iterative approach is its higher precision, despite it being slower than non-iterative solutions. An accurate initial projection can save computation time in the overall convergence. In addition, it has a large impact on the overall accuracy of the ICP algorithm [22]. Therefore, we feed the estimated pose of the appliance into the registration algorithm; the general procedure is explained in detail in Sect. 4.


Fig. 2 PnP estimation

To start estimating the pose, we measure the object points of interest on the appliance with respect to the world frame. Three of the four points are the centers of the bounding boxes of the Knob, the User Interface and the Glass Door respectively, shown in red in Fig. 2. The fourth point changes depending on the robot pose. Letting the fourth point vary adds stability to the pose estimation task: depending on the camera pose, a detection on the image plane may otherwise be mismatched with its corresponding point in the world frame, which causes significant pose estimation error and instability. The two possible options for the last point are indicated in blue in Fig. 2.

Given the homogeneous representation of the 4 known object points $p_k^w = [X_k \; Y_k \; Z_k \; 1]^T$, $k \in \{1, 2, 3, 4\}$, expressed in a reference frame attached to the observed object, the homogeneous image points $p_k = [u_k \; v_k \; 1]^T$ with respect to the camera frame, obtained by projecting the points $p_k^w$ onto the image plane by means of a generalized camera model, can be expressed as

$$p_k = \frac{1}{s} A \, [R_{init} \,|\, t_{init}] \, p_k^w \tag{1}$$

where

$$A = \begin{bmatrix} f_u & 0 & c_u \\ 0 & f_v & c_v \\ 0 & 0 & 1 \end{bmatrix} \tag{2}$$

is the camera matrix containing the camera intrinsic parameters, namely the focal length coefficients $f_u$, $f_v$ and the principal point coordinates $c_u$ and $c_v$, $s$ is a scalar parameter, and $R_{init}$ and $t_{init}$ are respectively the rotation matrix and the translation vector representing the relative pose of the object frame with respect to the camera. The rotation matrix $R_{init} \in SO(3)$ is orthogonal, i.e. it satisfies the constraints $R_{init}^T R_{init} = I_3$ and $\det(R_{init}) = 1$. The matrix $R_{init}$ can be specified by three consecutive rotations around the frame axes using the Euler angles $\{\theta_x, \theta_y, \theta_z\}$:

$$R_{init} = R_{init}(\theta_x) \, R_{init}(\theta_y) \, R_{init}(\theta_z) \tag{3}$$
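The camera model of Eqs. (1)-(3) can be illustrated with a minimal NumPy sketch. The intrinsic values, the pose and the function names `euler_to_rotation` and `project` are assumptions chosen only for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def euler_to_rotation(theta_x, theta_y, theta_z):
    """Rotation matrix R = Rx(theta_x) @ Ry(theta_y) @ Rz(theta_z), as in Eq. (3)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def project(points_w, A, R, t):
    """Project N x 3 object points to N x 2 pixel coordinates using Eq. (1)."""
    points_w = np.asarray(points_w, dtype=float)
    homogeneous = np.hstack([points_w, np.ones((len(points_w), 1))])  # N x 4
    Rt = np.hstack([R, np.reshape(t, (3, 1))])                        # 3 x 4, [R | t]
    uvs = (A @ Rt @ homogeneous.T).T                                  # rows are s * [u, v, 1]
    return uvs[:, :2] / uvs[:, 2:3]                                   # divide by the scale s

# Hypothetical intrinsics and pose (assumptions, not values from the paper).
A = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = euler_to_rotation(0.0, 0.0, 0.0)
t = np.array([0.0, 0.0, 2.0])
pixels = project([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]], A, R, t)
```

With the identity rotation, the object origin lies on the optical axis and therefore projects onto the principal point $(c_u, c_v)$.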

From Eq. (1) we can define the so-called homography matrix $H$ as

$$H = \frac{1}{s} A \, [R_{init} \,|\, t_{init}] = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ h_5 & h_6 & h_7 & h_8 \\ h_9 & h_{10} & h_{11} & h_{12} \end{bmatrix} \tag{4}$$

By defining the vector $\bar{h} = [h_1, \cdots, h_{12}]$ containing the elements of $H$, the projection of the 4 considered object points onto the image plane can be represented by the function

$$f(\bar{h}) = \mathrm{Blockdiag}([H, H, H, H]) \, [p_1^w \; p_2^w \; p_3^w \; p_4^w]^T \tag{5}$$

where $\mathrm{Blockdiag}([H, H, H, H])$ is the block diagonal matrix having four $H$ matrices along the diagonal. Considering that the intrinsic parameters in $A$ are known, the elements of $\bar{h}$ can be defined as a function of the pose vector $\bar{\theta} = [\theta_x \; \theta_y \; \theta_z \; t_x \; t_y \; t_z]^T$ containing the 3 Euler angles $\{\theta_x, \theta_y, \theta_z\}$ and the three components $\{t_x, t_y, t_z\}$ of the translation vector $t$, i.e. $\bar{h} = g(\bar{\theta})$. Assuming the vector $\bar{b}$ contains the 4 detected image points, the re-projection error can be computed as

$$E_{init}(\bar{\theta}) = \left\| \bar{b} - f(g(\bar{\theta})) \right\|^2 \tag{6}$$

An iterative solution based on the non-linear Levenberg-Marquardt optimization allows us to compute the rotation matrix $R_{init}$ and the translation vector $t_{init}$ that minimize the re-projection error, which is the sum of squared distances between the actual image points and the projected object points. To this end, we compute the Jacobian matrix $J$ of the projection $f(g(\bar{\theta}))$ that enters the re-projection error $E_{init}(\bar{\theta})$ by combining the two Jacobian matrices $J_f$ and $J_g$:

$$J = \frac{\partial f(g(\bar{\theta}))}{\partial \bar{\theta}} = \frac{\partial f}{\partial g} \frac{\partial g}{\partial \bar{\theta}} = J_f J_g \tag{7}$$

To minimize the re-projection error, the pose vector $\bar{\theta}$ is updated recursively at each step $k$ as

$$\bar{\theta}_{k+1} = \bar{\theta}_k + \left( J_k^T J_k + \lambda \, \mathrm{diag}(J_k^T J_k) \right)^{-1} J_k^T \left( \bar{b} - f(g(\bar{\theta}_k)) \right) \tag{8}$$

where $\bar{\theta}_k$ and $J_k$ are the estimated parameter vector $\bar{\theta}$ and the related Jacobian $J$ at the generic step $k$ respectively, $\lambda > 0$ is a damping factor and $\mathrm{diag}(J_k^T J_k)$ denotes the diagonal matrix formed by the diagonal entries of $J_k^T J_k$. We iterate the computation until the re-projection error becomes smaller than a given threshold, i.e. $E_{init}(\bar{\theta}_k) < \alpha$. Once the position and the orientation of the washing machine with respect to the robot have been estimated, the robot can feed this information into the point cloud registration algorithm.
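The update of Eq. (8) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Jacobian is approximated by forward differences instead of the analytic product $J_f J_g$, the damping factor is adapted with a simple accept/reject rule that the text does not specify, and the intrinsics, object points and poses are hypothetical.

```python
import numpy as np

def rotation(tx, ty, tz):
    """R = Rx(tx) @ Ry(ty) @ Rz(tz), the Euler convention of Eq. (3)."""
    cx, sx, cy, sy = np.cos(tx), np.sin(tx), np.cos(ty), np.sin(ty)
    cz, sz = np.cos(tz), np.sin(tz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def project(theta, points_w, A):
    """Stacked pixel coordinates f(g(theta)) of the object points for pose theta."""
    cam = points_w @ rotation(*theta[:3]).T + theta[3:]   # points in the camera frame
    uvw = cam @ A.T
    return (uvw[:, :2] / uvw[:, 2:3]).ravel()

def numerical_jacobian(theta, points_w, A, eps=1e-6):
    """Forward-difference Jacobian of the projection (assumed stand-in for J_f J_g)."""
    f0 = project(theta, points_w, A)
    J = np.zeros((f0.size, theta.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        J[:, i] = (project(theta + step, points_w, A) - f0) / eps
    return J

def levenberg_marquardt(theta0, points_w, b, A, lam=1e-3, alpha=1e-12, max_iter=100):
    theta = np.array(theta0, dtype=float)
    err = np.sum((b - project(theta, points_w, A)) ** 2)              # Eq. (6)
    for _ in range(max_iter):
        if err < alpha:                                               # E_init < alpha
            break
        J = numerical_jacobian(theta, points_w, A)
        JtJ = J.T @ J
        residual = b - project(theta, points_w, A)
        delta = np.linalg.solve(JtJ + lam * np.diag(np.diag(JtJ)), J.T @ residual)  # Eq. (8)
        new_err = np.sum((b - project(theta + delta, points_w, A)) ** 2)
        if new_err < err:      # accept the step and move towards Gauss-Newton
            theta, err, lam = theta + delta, new_err, lam / 10.0
        else:                  # reject the step and move towards steepest descent
            lam *= 10.0
    return theta, err

# Hypothetical data: intrinsics, four object points and a ground-truth pose.
A = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
points_w = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.0, 0.4, 0.0], [0.3, 0.4, 0.1]])
theta_true = np.array([0.10, -0.20, 0.05, 0.1, -0.05, 1.5])
b = project(theta_true, points_w, A)                 # simulated detections
theta_init = theta_true + np.array([0.02, -0.02, 0.02, 0.02, -0.02, 0.05])
theta_est, err = levenberg_marquardt(theta_init, points_w, b, A)
```

Starting from a perturbed pose, the iteration should drive the re-projection error of the simulated detections below the threshold and return a pose close to `theta_true`.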

4 Point Cloud Registration

When a robot interacts with a domestic appliance, collision avoidance must be considered to prevent possible damage to both the robot and the appliance. However, the 3D data acquired from the scene by the RGB-D camera mounted on the robot provide an incomplete representation of the washing machine due to environmental and self occlusions, as shown in Fig. 1. This lack of information may cause dramatic failures of the collision avoidance task. To avoid this problem, point cloud registration techniques can be used to complete the scene data. Registration is a recurring task in a number of applications, ranging from object modeling and tracking to simultaneous localization and mapping. The reliability of the initial transformation has a significant influence on the computation time and accuracy of the ICP algorithm. To this end, feature extraction algorithms are widely used for the initial guess, but they tend to be error-prone in case of large occlusions and in the presence of significant noise in the data. In the considered scenario, removing non-appliance points from the scene by filtering has a large impact on both the accuracy of the result and the computational complexity, because running the search through every point of the model data during correspondence matching between the scene and the model points is computationally expensive, in particular in the presence of noisy data. Therefore, the scene point cloud captured by the RGB-D camera of the robot is filtered using a Passthrough filter in order to decrease the computation time and increase the accuracy of the registration algorithm. To speed up the point cloud registration process and make it more reliable, we estimate the initial projection as described in the previous section instead of extracting features and computing correspondences.
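A pass-through filter keeps only the points whose coordinate along one axis lies within given limits. The sketch below is a plain NumPy illustration; how the authors derive the limits from the initial pose is not specified, so the axis-aligned box around a hypothetical initial position `t_init`, the `half_extent` values and the function names are assumptions.

```python
import numpy as np

def pass_through(points, axis, lower, upper):
    """Keep the points whose coordinate along `axis` lies in [lower, upper]."""
    points = np.asarray(points, dtype=float)
    keep = (points[:, axis] >= lower) & (points[:, axis] <= upper)
    return points[keep]

def crop_around(points, center, half_extent):
    """Apply one pass-through filter per axis around an estimated object position."""
    for axis in range(3):
        points = pass_through(points, axis,
                              center[axis] - half_extent[axis],
                              center[axis] + half_extent[axis])
    return points

# Hypothetical scene points (camera frame, metres) and initial position estimate.
scene = np.array([[0.0, 0.0, 1.5],
                  [0.2, 0.1, 1.6],
                  [2.0, 0.0, 1.5],    # too far to the side
                  [0.0, 0.0, 4.0]])   # too far behind the appliance
t_init = np.array([0.0, 0.0, 1.5])
half_extent = np.array([0.5, 0.5, 0.5])
filtered = crop_around(scene, t_init, half_extent)
```

Only the first two scene points fall inside the box, so the later correspondence search has fewer points to process.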
Assuming $B$ is a finite point set obtained by uniform sampling of the appliance CAD model, whose points in the object frame are denoted $b_j \in B$, $j \in \{1, \dots, N_m\}$, the initial transformation into the camera frame can be described as

$$o_j = R_{init} \, b_j + t_{init}, \quad j \in \{1, \dots, N_m\} \tag{9}$$

where the points $o_j$, $j \in \{1, \dots, N_m\}$, form the model point cloud $O_0$ in the camera frame after the initial transformation. After the initial transformation provided by Eq. (9), we iteratively refine the registration using the ICP algorithm. Let $S$ be the scene point cloud obtained by filtering the camera data, whose points are denoted $s_i$, $i \in \{1, \dots, N_s\}$. Since the scene point cloud is fixed, its center of mass is computed and a new point cloud with origin at the center of mass is created:

$$C_S = \frac{1}{N_s} \sum_{i=1}^{N_s} s_i, \quad \forall s_i \in S \tag{10}$$

$$S' := \{ s'_i : s'_i = s_i - C_S, \; \forall s_i \in S \} \tag{11}$$

Thereafter, the algorithm is initialised with the initial projection of Eq. (9), i.e. $O_k = O_0$ at the initial step. The model point cloud is then updated at each iteration $k$ in order to achieve a suitable alignment of the model with the point cloud provided by the 3D camera, which is considered fixed. In order to find the correspondences between the closest points of the model and scene point clouds, the conventional ICP algorithm computes the Euclidean distance between each point of the model set $o_j \in O_k$ and each scene point $s_i \in S$. This generates a new point cloud

$$M_k := \left\{ m_i : m_i = \arg\min_{o_j \in O_k} \| o_j - s_i \|_2^2, \; \forall s_i \in S \right\} \tag{12}$$

composed of the $N_s$ points of the model point cloud $O_k$ that have the smallest distance to their counterparts in the filtered scene point cloud $S$. As a result, the conventional ICP algorithm [3] looks through every model point in order to compare distances and find the nearest neighbor of a given scene point. This approach has a computational complexity in the order of $O(N_s N_m)$, i.e. the computation time grows proportionally to the product of the numbers of points in the two sets.

To improve the efficiency of the algorithm, the model point cloud is partitioned into an optimized data structure by exploiting an OcTree, a special type of space partitioning which provides faster nearest neighbor searches in many applications. The OcTree speeds up this process by recursively subdividing each node of a tree into eight children, so that the search only runs through the points inside the relevant octant. The average computation time of the nearest neighbor search using the OcTree structure is in the order of $O(N_s \log N_m)$.

To converge to the global minimum in the registration, one useful approach is to filter the correspondences [11] in order to reduce the number of false matches once the corresponding pairs of points have been determined. There are several policies for correspondence rejection, based on distance, duplicity, surface properties and statistics. In this work, the outliers are eliminated according to their distance in the GPU-based algorithms: matches with a distance larger than a given threshold are filtered out, as also formulated in the original ICP algorithm, see Fig. 3. The center of mass of the model cloud is then found and subtracted from each model point $m_i$ at each iteration:

$$C_M = \frac{1}{N_s} \sum_{i=1}^{N_s} m_i, \quad \forall m_i \in M_k \tag{13}$$

$$M'_k := \{ m'_i : m'_i = m_i - C_M, \; \forall m_i \in M_k \} \tag{14}$$
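The OcTree nearest neighbor search described above can be sketched in pure Python as follows. This is a CPU-only illustration, not the GPU implementation evaluated in this work; the class `OcTreeNode`, the helper functions and the parameters `leaf_size` and `max_depth` are hypothetical, and the random points merely stand in for the sampled CAD model.

```python
import random

class OcTreeNode:
    """Cubic cell that stores its points in a leaf or splits into up to eight children."""

    def __init__(self, points, center, half, leaf_size=8, depth=0, max_depth=16):
        self.center, self.half = center, half
        self.points, self.children = points, None
        if len(points) > leaf_size and depth < max_depth:
            octants = {}
            for p in points:
                # Three bits encode on which side of the centre the point lies.
                index = sum((p[a] >= center[a]) << a for a in range(3))
                octants.setdefault(index, []).append(p)
            h = half / 2.0
            self.children = [
                OcTreeNode(pts,
                           tuple(center[a] + (h if (index >> a) & 1 else -h) for a in range(3)),
                           h, leaf_size, depth + 1, max_depth)
                for index, pts in octants.items()
            ]
            self.points = None

    def box_dist2(self, q):
        """Squared distance from q to this cell (zero if q is inside the cell)."""
        return sum(max(abs(q[a] - self.center[a]) - self.half, 0.0) ** 2 for a in range(3))

def dist2(p, q):
    return sum((p[a] - q[a]) ** 2 for a in range(3))

def nearest(node, q, best=(float("inf"), None)):
    """Return (squared distance, point) of the stored point closest to q."""
    if node.box_dist2(q) >= best[0]:
        return best                      # the whole cell is too far away: prune it
    if node.children is None:
        for p in node.points:
            d2 = dist2(p, q)
            if d2 < best[0]:
                best = (d2, p)
        return best
    for child in sorted(node.children, key=lambda c: c.box_dist2(q)):
        best = nearest(child, q, best)   # visit the closest cells first
    return best

def build_octree(points):
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    center = tuple((lo[a] + hi[a]) / 2.0 for a in range(3))
    half = max(hi[a] - lo[a] for a in range(3)) / 2.0
    return OcTreeNode(list(points), center, half)

random.seed(0)
model = [tuple(random.random() for _ in range(3)) for _ in range(500)]  # stand-in model cloud
tree = build_octree(model)
scene_point = (0.2, 0.7, 0.4)
d2, closest = nearest(tree, scene_point)
```

Pruning whole cells by their distance to the query is what avoids comparing every scene point with every model point, at the cost of building the tree once for the model cloud.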


Fig. 3 Outliers filtering: rejection based on the distance between the points
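Distance-based rejection, as depicted in Fig. 3, amounts to discarding every matched pair whose points are farther apart than a threshold. The following pure-Python sketch illustrates the idea; the function name and the example pairs are hypothetical, and the threshold of 0.6 simply reuses the value reported for the GPU-based implementations.

```python
import math

def reject_by_distance(pairs, max_distance):
    """Keep only the (model_point, scene_point) pairs no farther apart than max_distance."""
    return [(m, s) for m, s in pairs if math.dist(m, s) <= max_distance]

# Hypothetical correspondences: (model point, scene point).
pairs = [((0.0, 0.0, 0.0), (0.1, 0.0, 0.0)),   # distance 0.1, kept
         ((1.0, 0.0, 0.0), (1.0, 2.0, 0.0)),   # distance 2.0, rejected as an outlier
         ((0.0, 1.0, 0.0), (0.0, 1.0, 0.5))]   # distance 0.5, kept
inliers = reject_by_distance(pairs, max_distance=0.6)
```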

The goal of the registration process at each $k$-th iteration is to determine the best-fit transformation $T_k$ between the point clouds, defined as

$$T_k = \begin{bmatrix} R_k & t_k \\ 0 \; 0 \; 0 & 1 \end{bmatrix} \tag{15}$$

that optimally matches the model point cloud with the one provided by the 3D camera. To achieve this goal, the following point-to-point error metric must be driven towards zero:

$$E_{icp}(R_k, t_k) = \frac{1}{N_s} \sum_{i=1}^{N_s} w_i \, \| m_i - R_k s_i - t_k \|^2 \tag{16}$$

where the weighting factor $w_i$ can be used to normalize the matches in the least squares formulation. The proposed algorithm then updates the rotation matrix $R_k$ and the translation vector $t_k$ in the transformation matrix $T_k$ by means of Singular Value Decomposition (SVD). To do so, the point clouds are organized in the matrices $\mathbf{M}_k$, $\mathbf{M}'_k$ and $\mathbf{S}'$, whose columns contain the points of $M_k$, $M'_k$ and $S'$ respectively. Then, to define the optimal transformation, the cross-covariance matrix is computed in order to apply the decomposition:

$$P = \mathbf{M}'_k \, \mathbf{S}'^T \tag{17}$$

By applying SVD to $P$

$$P = U \Sigma V^T \tag{18}$$

the rotation matrix $R_k$ and the translation vector $t_k$ can be computed as

$$R_k = U V^T \tag{19}$$

$$t_k = C_M - R_k C_S \tag{20}$$

Assuming $\mathbf{M}_k^h$ is the homogeneous representation of $\mathbf{M}_k$, the model point cloud can be aligned with the filtered scene point cloud using the inverse of $T_k$:

$$\mathbf{M}_{k+1}^h = T_k^{-1} \mathbf{M}_k^h \tag{21}$$


Considering the orthogonality condition of the rotation matrix, $R_k^{-1} = R_k^T$:

$$\mathbf{M}_{k+1}^h = \begin{bmatrix} R_k^T & -R_k^T t_k \\ 0 \; 0 \; 0 & 1 \end{bmatrix} \mathbf{M}_k^h \tag{22}$$
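One alignment step, from the centroids of Eqs. (10) and (13) to the inverse transformation of Eq. (22), can be sketched with NumPy as follows. The sketch assumes unit weights $w_i = 1$ and already matched point pairs stored as rows of N x 3 arrays (so the cross-covariance is written with the transpose on the left); the reflection guard and the function names are additions for illustration that do not appear in the text.

```python
import numpy as np

def best_fit_transform(M, S):
    """R, t minimising sum_i ||m_i - R s_i - t||^2 for matched N x 3 arrays M and S."""
    C_M, C_S = M.mean(axis=0), S.mean(axis=0)            # Eqs. (13) and (10)
    P = (M - C_M).T @ (S - C_S)                          # Eq. (17), 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(P)                          # Eq. (18)
    # Assumption: force det(R) = +1 so that a reflection is never returned.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                                       # Eq. (19)
    t = C_M - R @ C_S                                    # Eq. (20)
    return R, t

def apply_inverse(R, t, M):
    """Move the model points with T^{-1}, i.e. m -> R^T (m - t), as in Eq. (22)."""
    return (M - t) @ R

# Hypothetical data: scene points and a model cloud displaced by a known transformation.
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 0.1])
M = S @ R_true.T + t_true                                # m_i = R s_i + t

R_k, t_k = best_fit_transform(M, S)
M_next = apply_inverse(R_k, t_k, M)                      # model aligned with the scene
```

With exact correspondences a single step recovers the transformation; in the ICP loop the correspondences are re-estimated after every update, so the step is repeated.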

This procedure is iterated until a termination criterion is satisfied. Termination criteria [11] can be based on the maximum number of iterations, a minimum relative transformation threshold between iterations, the maximum number of similar iterations, and the absolute or relative value of the error metric, i.e. $E_{icp}(R_k, t_k) < \alpha$, where $\alpha$ is a positive threshold. Since this work aims to compare different variants of the ICP algorithm, the refinement is repeated until a given iteration count is reached.

5 Results Object Detection: Our data set contains 4200 images of a single washing machine. The samples has been captured from varying viewpoints under the regular lighting condition using different sources. Training images have been annotated in both automatic and manual ways. The former is based on the fiducial markers [8, 21] and annotation type with 2500 samples of training data in order to increase the number of samples and fasten labelling process. The rest of training images have been labelled manually on the markerless scenes to increase accuracy of detection and to avoid learning the marker itself. SSD architecture with ResNet-50 backbone is retrained on NVIDIA GeForce RTX 2080Ti graphics adapter over 65K steps using TensorFlow library [1]. The learning rate is initially assigned to 0.04. However, it adaptively decreases over training process relative to the number of an iteration in order to provide a smooth convergence of the cost function to zero value. We set the momentum optimizer coefficient γ to 0.9. All the input images are normalized and augmented by rotation, flipping and contrast variance. Table 1 indicates that training with our dataset shows satisfying results for large objects. Pose estimation: Once the localization of the corner points of the bounding boxes surrounding targeted appliance parts on the image plane through the detection module is achieved, four points on the image plane which are not preserving collinearity condition are selected. Those points of interest are directly fed into pose estimation

Table 1 mAP values after 65K steps of the training process. mAP_L: mAP for large objects, mAP_M: mAP for medium objects, mAP_S: mAP for small objects, mAP@.50IoU: mAP at a 50% IoU threshold, mAP@.75IoU: mAP at a 75% IoU threshold

mAP | @.50IoU | @.75IoU | mAP_L | mAP_M | mAP_S
0.3537 | 0.7562 | 0.2472 | 0.4891 | 0.3604 | 0.2343


Fig. 4 Washing machine pose

algorithm for each frame of the scene. This implementation avoids additional computation for feature extraction and correspondence matching between scene and model images. Given the three-dimensional points in the world frame and the selected points on the image plane, the SSD-integrated algorithm requires only four correspondences, which results in a reliable and stable pose estimation. The results of the pose estimation algorithm can be seen in Figs. 4 and 6. The pose accuracy is satisfactory owing to the iterative non-linear optimization. The stability of the pose allows us to compute the initial projection of the point cloud, which strongly influences the accuracy of the ICP registration algorithm. Besides, localizing a domestic appliance in three-dimensional space with respect to the robot allows us to perform manipulation tasks on the appliance.

Point cloud registration: We compared a variety of registration algorithms and show their differences in convergence time with respect to the device on which they are implemented, i.e. CPU or GPU. The open-source Point Cloud Library (PCL) is used for the CPU implementation, which runs on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz. GPU-based parallel programming is used to create a registration structure handling the ICP algorithms with conventional and OcTree search methods. The GPU algorithms run on an NVIDIA GeForce RTX 2080Ti graphics adapter. The 3D model and the scene point set are loaded using the PCL library and stored initially in host memory for the CPU operations; for the CUDA-based algorithms, they are transferred to the graphics adapter's memory. The threshold distance for correspondence rejection has been set to 0.6 in the GPU-based implementations. Multiple scenarios are considered to validate the efficiency of our algorithm,


Table 2 Comparison between the algorithms. Scene point cloud: 12599 points, Model point cloud: 117542 points

Algorithm | Initial projection [s] | ICP [s]
ICP CPU Feature-based | 10.857 | 5.522
ICP CPU PnP | 0.268 | 4.106
ICP GPU conventional | 0.266 | 1.597
ICP GPU OcTree | 0.258 | 0.184

Fig. 5 Projection of the model cloud after initial estimation. blue: filtered scene point cloud, red: the model point cloud after DL-based initial transformation, oj : the model point in the camera frame

i.e. with open and closed glass door cases and with the presence of occlusions in the scene. The differences between the CPU and GPU implementations are indicated in Table 2. The model point cloud contains 117542 three-dimensional points after uniform sampling of the mesh data and is shown in red in Fig. 5. Since the original scene point cloud, fetched directly by the robot camera, contains a high number of non-appliance points, the effect of filtering can clearly be seen in Fig. 5: it completely eliminates background points from the scene data. One of the main impacts of filtering in our experiment concerns the open glass door configuration of the appliance. Without prior filtering, the registration algorithms tend to converge to a local minimum when the glass door of the washing machine is present in the scene. The reason is the absence of the glass door in our model, which causes incorrect matches between the point clouds. With filtering, however, the algorithm converges to the desired solution in both the open and closed configurations of the glass door. In our experiments, a rough estimate of the transformation between the point clouds for the initial projection is computed without feature extraction. The pose is estimated with four 3D-2D point correspondences, where the image points are obtained by the deep learning based object detection method. The model point


Fig. 6 The results of the ICP algorithm. The closed glass door configuration

cloud after initial transformation is represented with green points in Fig. 5. The final alignment of the model onto the scene point set, for the closed glass door configuration of the appliance, is shown in Fig. 6. In order to see the effect of our solution on the accuracy of the ICP algorithms, we also implemented a feature-based calculation of the initial projection based on FPFH [23] descriptors. The results over multiple trials show the superiority of the deep learning based approach over the classical one: it avoids the computational cost of feature matching in the correspondence estimation step and gives higher accuracy than the feature-based method. Table 2 shows the difference in computation time between the feature-based approach and our algorithm. While determining the correspondences and estimating the initial transformation took 10.857 s on average over 6 trials with the feature-based approach, it took only 0.258 s and 0.266 s on average over the same number of trials in the GPU-based implementations. In addition, the feature-based approach is highly sensitive to camera noise and partial occlusions. In such cases, feature-based algorithms achieve a lower success rate over multiple trials due to the lack of information. In contrast, with our approach the 3D model could be aligned to the scene point set in every trial despite the presence of occlusions and the open glass door state, as shown in Fig. 7.
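The pose estimation step described in this section relies on four image points that are not collinear. A minimal sketch of such a check is given below. It assumes that the requirement is that no three of the four points lie on a common line, and the pixel tolerance is a hypothetical value; neither detail is stated in this work.

```python
from itertools import combinations


def doubled_triangle_area(p, q, r):
    """Twice the area of triangle pqr, from the 2D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))


def no_three_collinear(points, min_doubled_area=1.0):
    """True if every triple of image points spans a non-degenerate triangle.

    min_doubled_area is an assumed tolerance in squared pixels.
    """
    return all(
        doubled_triangle_area(p, q, r) > min_doubled_area
        for p, q, r in combinations(points, 3)
    )


# Corner points of a detected bounding box (illustrative pixel values).
box_corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
corners_are_usable = no_three_collinear(box_corners)
```

The four corners of a non-degenerate bounding box always pass this check, which is one reason why detector output is a convenient source of correspondences.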


Fig. 7 Different states of the washing machine: occluded (a), open glass door configuration (b)

6 Conclusions

We conclude that the registration algorithms implemented on the GPU run faster than their CPU-based counterparts owing to parallelism. The main advantage of the GPU-based implementation is shown together with the advantages provided by the OcTree structure. In this paper, we mainly aimed to exploit both the speed of the OcTree search method in the CUDA environment and the featureless character of the initial estimation of the object pose. Integrating the solution of the PnP problem into the registration allows us to eliminate the computational burden of the initial transformation. Besides, we can obtain accurate registration in the presence of noisy point clouds. While occlusion and noise can have a significant influence on the registration results of classical methods, our experiments revealed the superiority of our approach. Using the OcTree structure in the GPU-based implementation reduced the duration of the registration process by a factor of almost 22 compared with the CPU-based implementation. Currently, our approach is considered for a single washing machine. Future activities will be devoted to generalizing the process to different scenarios using multiple large objects and to the automatic definition of features for the initial alignment.
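The speed-up factor quoted above can be checked against the ICP timings of Table 2. The short sketch below recomputes the ratios; the dictionary and function names are illustrative, and the assumption that the factor of almost 22 refers to the PnP-initialized CPU variant is an inference from the numbers rather than something stated explicitly.

```python
# ICP durations in seconds, copied from the "ICP [s]" column of Table 2.
icp_seconds = {
    "ICP CPU Feature-based": 5.522,
    "ICP CPU PnP": 4.106,
    "ICP GPU conventional": 1.597,
    "ICP GPU OcTree": 0.184,
}


def speedup(baseline, candidate):
    """Ratio of the baseline ICP duration to the candidate ICP duration."""
    return icp_seconds[baseline] / icp_seconds[candidate]


# 4.106 s / 0.184 s is roughly 22.3
octree_vs_cpu_pnp = speedup("ICP CPU PnP", "ICP GPU OcTree")
```

With the same timings, the OcTree variant is about 8.7 times faster than the conventional GPU search and about 30 times faster than the feature-based CPU variant.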

References

1. Abadi, A. et al.: TensorFlow: Large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org
2. Araújo, H., Carceroni, R., Brown, C.M.: A Fully Projective Formulation for Lowe’s Tracking Algorithm (1996)
3. Besl, P.J., McKay, N.D.: A method for registration of 3-d shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 239–256 (1992)
4. Chen, Y., Medioni, G.: Object modeling by registration of multiple range images. In: Proceedings. 1991 IEEE International Conference on Robotics and Automation, vol. 3, pp. 2724–2729 (1991)
5. Elseberg, J., Borrmann, D., Nuchter, A.: Efficient processing of large 3d point clouds, 10 (2011)
6. Elseberg, J., Magnenat, S., Siegwart, R., Nuchter, A.: Comparison on nearest-neighbour-search strategies and implementations for efficient shape registration. J. Softw. Eng. Robot. (JOSER) 3, 2–12 (2012)
7. Gao, X.-S., Hou, X.-R., Tang, J., Cheng, H.-F.: Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 25(8), 930–943 (2003)
8. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F., Marín-Jiménez, M.: Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recogn. 47, 2280–2292 (2014)
9. Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2524 (2013)
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR, abs/1512.03385 (2015)
11. Holz, D., Ichim, A.E., Tombari, F., Rusu, R.B., Behnke, S.: Registration with the point cloud library: a modular framework for aligning in 3-d. IEEE Robot. Autom. Mag. 22, 110–124 (2015)
12. Jian, B., Vemuri, B.C.: Robust point set registration using gaussian mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1633–1645 (2011)
13. Le, H., Do, T.-T., Hoang, T., Cheung, N.-M.: SDRSAC: semidefinite-based randomized approach for robust point cloud registration without correspondences. CoRR, abs/1904.03483 (2019)
14. Lepetit, V., Moreno-Noguer, F., Fua, P.: Epnp: an accurate o(n) solution to the pnp problem. Int. J. Comput. Vis. 81, 02 (2009)
15. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C.-Y., Berg, A.C.: SSD: single shot multibox detector. CoRR, abs/1512.02325 (2015)
16. Lowe, D.G.: Fitting parameterized three-dimensional models to images. IEEE Trans. Pattern Anal. Mach. Intell. 13(5), 441–450 (1991)
17. Lowe, D.G.: Three-dimensional object recognition from single two-dimensional images. Artif. Intell. 31(3), 355–395 (1987)
18. Lu, C.P., Hager, G., Mjolsness, E.: Fast and globally convergent pose estimation from video images. IEEE Trans. Pattern Anal. Mach. Intell. 22, 610–622 (2000)
19. Myronenko, A., Song, X.: Point set registration: Coherent point drift. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2262–2275 (2010)
20. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR, abs/1506.01497 (2015)
21. Romero-Ramirez, F., Muñoz-Salinas, R., Medina-Carnicer, R.: Speeded up detection of squared fiducial markers. Image Vis. Comput. 76, 06 (2018)
22. Rusinkiewicz, S., Levoy, M.: Efficient variants of the icp algorithm. In: Proceedings Third International Conference on 3-D Digital Imaging and Modeling, pp. 145–152 (2001)
23. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (fpfh) for 3d registration. In: 2009 IEEE International Conference on Robotics and Automation, pp. 3212–3217 (2009)
24. Wehr, D., Radkowski, R.: Parallel kd-tree construction on the gpu with an adaptive split and sort strategy. Int. J. Parallel Program. 46, 12 (2018)
25. Weng, J., Ahuja, N., Huang, T.S.: Optimal motion and structure estimation. In: Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 144–152 (1989)
26. Yang, J., Li, H., Campbell, D., Jia, Y.: Go-icp: a globally optimal solution to 3d icp point-set registration. IEEE Trans. Pattern Anal. Mach. Intell. 38(11), 2241–2254 (2016)

Design of a Collaborative Modular End Effector Considering Human Values and Safety Requirements for Industrial Use Cases

Matteo Pantano, Adrian Blumberg, Daniel Regulin, Tobias Hauser, José Saenz, and Dongheui Lee

Abstract In times of volatile market demands and international competition, European companies must rely heavily on robots, especially collaborative robots, to enable flexible, agile, and resilient manufacturing. However, considering that 99% of the industry is composed of small and medium enterprises, the barriers to robot adoption need to be considered. Therefore, in this research work, we address two of the main barriers to the use of robots, safety and design, by proposing a modular end-effector for collaborative robots. In this work, an iterative design methodology using morphological structures and feedback from a user group was applied to obtain a design solution. Afterwards, the obtained solution, comprising several modules made with 3D printing and off-the-shelf components, was manufactured. Finally, the end-effector was tested for picking performance and safety. The results show that the reconfigurable end-effector can be easily adapted to grasp different parts, is well perceived by users, and meets the safety requirements for collaborative applications.

Keywords Modular gripper · Human robot collaboration · Value sensitive design · SMEs

M. Pantano (B) · A. Blumberg · D. Regulin · T. Hauser
Functional Materials and Manufacturing Processes, Technology Department, Siemens Aktiengesellschaft, 81739 Munich, Germany

M. Pantano · D. Lee
Human-Centered Assistive Robotics (HCR), Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, Germany

J. Saenz
Business Unit Robotic Systems, Fraunhofer Institute for Factory Operation and Automation IFF, 39106 Magdeburg, Germany

D. Lee
Institute of Robotics and Mechatronics, German Aerospace Center (DLR), 82234 Wessling, Germany

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_4


1 Introduction and Related Works

1.1 Human–Robot Collaboration in the Industry

Collaborative robots are a new breed of machines that are intended to “team up” with human operators and perform shared tasks without the need for safety fences. This concept was first introduced in General Motors manufacturing plants during the 1990s, where passive mechanical devices were used to reduce the strain on operators in certain assembly steps [1]. Recently, the field has been revitalized by the new industrial revolution, Industry 4.0 [2]. As a result, several innovations in controls and artificial intelligence (AI) have led to robotic systems which can execute complex motions and tasks. Due to these versatile applications, collaborative robotics solutions have already begun emerging in the market [3]. Despite the available solutions, the introduction of human collaborative robotics in manufacturing enterprises still causes difficulties [4]. Hence, only large enterprises in the automotive and electronics sectors are deploying collaborative robots at their facilities on a large scale [5]. Small and Medium Enterprises (SMEs) therefore still lag behind, although they represent 99% of European businesses [6]. This low adoption rate can be traced back to three main barriers: safety, interfaces and design methods [4]. To counteract this, researchers have lately been proposing methodologies to reduce these limiting factors. In the field of safety, research has been conducted since the first concepts of collaborative robots were introduced. Yamada et al. [7] were pioneers in this field and conducted the first tests to evaluate the hazards which a robot could pose to a human in case of a physical collision. Even though robots certified for collaborative usage are on the market nowadays, their safety is still perceived as less refined than their sophisticated levels of design and interface [8]. Therefore, safety is still a frequently investigated topic in industry and research [9, 10].

1.2 Safety with Collaborative Robots

Safety in human–robot collaboration (HRC) is radically different from that of systems employing conventional industrial robots. In the latter, operator safety is achieved through rigid fences which separate the robot workspace from the operator workspace. In the former, such physical barriers are eliminated. Hence, other risk reduction methods must be employed. Due to the increasing establishment of collaborative applications, recent standards have highlighted options for ensuring the safety of humans working collaboratively with robots. The operating modes which can be used to safely implement HRC systems are as follows: Power and Force Limiting (PFL), Speed and Separation Monitoring (SSM), Hand Guiding (HG) and Safety-Rated Monitored Stop (SRMS). The standards describing them are the Type C standards (i.e., standards that have been


designed especially for the field) and are as follows: ISO 10218 Part 1 [11], Part 2 [12] and ISO/TS 15066 [13]. These standards describe the operating modes cited above and report limits on the forces and pressures that can be applied in case of collisions between robot and operator. Therefore, by designing robotic applications that fulfill such conditions, the robot can be considered safe and usable. This always requires applying a risk assessment methodology according to ISO 12100 [14]. A risk assessment is a methodology which, starting from the application definition, identifies risks and classifies their severity. It reduces the risks through risk mitigation measures and determines whether the residual risks (i.e., pressures and forces applied if a collision happens) are acceptable. If not, the methodology must be iterated. This risk assessment, although well documented, is often overlooked in the industry [10]. Moreover, the pressure/force measurements are use case specific (e.g., End-Effector (EE) jaw tip). Therefore, companies lack information on how to apply this methodology in their applications. One effort to bridge this gap is the COVR project [15]. This project aims to explain methodologies for the safe introduction of collaborative robots, but to the best of the authors’ knowledge, examples of COVR case stories do not include EE design [16].
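As a rough illustration of the rating step within such a risk assessment, the sketch below computes the risk class used later in Table 1, where the class is the sum of the ratings for frequency, probability, and avoidability. The data structure and names are hypothetical, and the mapping from severity and risk class to a high, medium, or low risk level follows [36] and is deliberately not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class HazardRating:
    """Ratings for one hazard; the field names are illustrative."""
    severity: int
    frequency: int
    probability: int
    avoidability: int

    @property
    def risk_class(self):
        # Risk class C = F + P + A, as stated in the caption of Table 1.
        return self.frequency + self.probability + self.avoidability


# Example ratings of the kind listed in Table 1 for a gripping hazard,
# before and after a mitigation that lowers severity and probability.
before = HazardRating(severity=3, frequency=5, probability=3, avoidability=3)
after = HazardRating(severity=1, frequency=5, probability=2, avoidability=3)
```

In an iterated assessment, the ratings would be re-evaluated after each mitigation measure until the residual risk level is acceptable.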

1.3 End-Effectors and Collaborative Robots

The EE is responsible for interacting directly with the environment (e.g., grasping, moving, releasing) and it works through physical principles (e.g., impactive, contigutive) as described in ISO/TR 20218-1 [17]. Moreover, if collaborative robots are used, further constraints, such as the limits on applied pressure mentioned above, must be considered. Therefore, an EE needs to be designed in a way that respects the limits and requirements of a collaborative application. This is a central topic for the research community and is still in its infancy [18, 19]. Regarding these limits, the topic first appeared in research during the development of the Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR) lightweight robot (LWR) [20]. Haddadin et al. [21] were the first to analyze the correlation between EE design and levels of soft-tissue injury. In their work, they discovered that EE geometries (i.e., mechanical primitives) lead to different injuries and that these are related to the mass of the impacting object and its travel speed. Moreover, the authors concluded that standardization across the different robotic stakeholders (i.e., robot manufacturer, user, mechanical designer, and application designer) was essential for safely integrating a robot. Another approach for guaranteeing the safety of the EE was an active airbag system [22]. The authors proposed a safety module composed of an airbag system to conceal dangerous parts of the EE in case of collisions with the human body. However, unknown risks were still present in the design, since pressure drops in the airbags could still occur, rendering the safety protection insufficient. Regarding the application requirements, the EE still plays an important role due to the specificity of the task [18, 23]. A comprehensive overview of technologies for


EEs that perform grasping tasks can be found in [24]; therefore, an additional review would be superfluous. However, for this work it is important to mention some key technologies. When dealing with grasping, the design must integrate the following: a physical principle (e.g., mechanical force, thermal bond) establishing a temporary connection to the target part, EE mechatronic control for the actuation of one or more grasping principles, and integration of the mechatronic control with the robot perception [25]. Apart from these technical challenges, from the point of view of social acceptance, an EE for collaborative robots needs to be compelling to operators because of its influence on operator trust [26]. From a functional point of view, it must be reconfigurable for the different needs of flexible production lines [27]. To the best of the authors’ knowledge, the only existing example of an EE design satisfying these requirements can be found in [28]. The authors proposed an EE that can be reconfigured by the user through the jaw configuration to simplify grasping tasks. However, although the authors argue that the EE is safe, no evaluation according to the standards was performed. Therefore, this work proposes a reconfigurable EE which can be adapted by the operator through different gripping principles (e.g., vacuum, electromagnetic) while ensuring safety and user acceptance, in order to guarantee a successful implementation in industrial scenarios through well-documented testing. The paper is structured as follows. In Sect. 2 we describe the iterative method used for designing the EE and the evaluation strategy. Afterwards, in Sect. 3 we show the results achieved regarding the EE design and we illustrate the testing regarding safety and picking performance. Finally, in Sects. 4 and 5 we discuss our design and summarize our work.

2 Materials and Methods

2.1 Design Methodology

The EE design in this work is meant to result in a gripper which meets the safety requirements of collaborative applications and allows flexible usage through a modular design approach. To structure the design process, a known iterative design methodology was deployed [29], and a chronological structure of different design phases was followed. Firstly, the tasks were analyzed, and requirements were defined. Relevant standards and publications, such as [13, 17, 30], helped to identify requirements. Secondly, conceptual solutions to guarantee a flexible and reconfigurable EE were generated through the usage of exploratory design tables and the combination of design primitives. This included, for example, abstracting the functionalities and identifying suitable physical effects. Moreover, from the physical effects, morphological structures were established. These morphological structures helped to implement the basic physical effect in the desired application and to specify the first shapes and dimensions of a conceptual solution. Based on the morphological structures, gripper concepts were derived in the next step. Finally, the gripper concepts were


Fig. 1 Design methodology. The figure illustrates the most relevant phases of the design process and provides schematic results obtained within the process. The phases are performed sequentially; however, the backward-directed arrows indicate the possibility of iterations. The figure was adapted and translated by permission from Springer Nature Customer Service Centre GmbH: Springer, Kostengünstig Entwickeln und Konstruieren [29]

evaluated. A schematic representation of the process can be seen in Fig. 1. The shape of the figure is intended to highlight the number of solution alternatives during the design process. Throughout the concept design the variety increases, before a selective process during the final phases narrows down the variety.

2.2 Value Sensitive Design Evaluation

To ensure that the EE is well perceived by its stakeholders, a value-based analysis was selected. The value-based analysis was used to investigate the implications of the EE for direct and indirect stakeholders. Given the task, the value-oriented prototype method [31] from value sensitive design (VSD) was selected. This method assesses mockups on the values they display (e.g., human safety) through a rating from stakeholders. However, to also ensure a sound technical solution, the VSD analysis was combined with a technical evaluation. Therefore, the method proposed in VDI 2225 [32] was slightly adapted to consider technical and human aspects (i.e., evaluation features). According to the method in the standard, the rating should cover numbers from 0 to 4, where 4 is the best rating. Therefore, the evaluation was implemented with a Likert scale because it provided the possibility to assign a quantitative value to a qualitative answer [33]. The scale was integrated in a questionnaire with 12 participants. The participants were 12 males with different industry backgrounds and an average of 9.5 (SD = 10) years of experience in research or development. The questionnaire featured several statements regarding technical and human value aspects of the design concepts. Hence, the participants could express the level


Fig. 2 Example of the concept evaluation. The users were requested to evaluate every concept from a technical (Ts) and a human value (Vs) point of view. Afterwards, the final score was obtained by plotting the Ts and the Vs as suggested by VDI 2225 [32]

of acceptance of a statement by a rating according to the cited Likert scale. On the one hand, the criteria for evaluating the technical value were: manufacturing cost, weight, safety (i.e., mechanical and electrical hazards), performance (e.g., versatile use and successful pick rate), set-up time, reliability (e.g., maintenance intervals of 50 operation hours), and repairability (e.g., within one hour and with basic tools). On the other hand, for the human values, a set from [31] was taken. This set included: human welfare (i.e., physical wellbeing), universal usability (i.e., easy to use) and trust in the system. Moreover, the participants were asked to name an additional human value if necessary. Finally, to summarize the findings, the results for both aspects were represented through numerical indicators between 0 and 1, where 1 corresponds to an “imaginary ideal design”. A sample questionnaire entry and the finalization of the results can be found in Fig. 2.
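The normalization of the ratings to an indicator between 0 and 1 can be sketched as follows. The sketch assumes an unweighted rating in which the sum of the 0 to 4 ratings is divided by the score of the ideal design; whether the criteria were weighted in the actual evaluation is not stated here, and the function name is illustrative.

```python
def normalized_score(ratings, max_rating=4):
    """Map a list of 0..max_rating ratings to an indicator in [0, 1].

    A value of 1.0 corresponds to the ideal design, i.e. the best rating
    on every criterion. Equal weighting of criteria is an assumption.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > max_rating for r in ratings):
        raise ValueError("ratings must lie between 0 and max_rating")
    return sum(ratings) / (max_rating * len(ratings))


# Illustrative ratings for the seven technical criteria of one concept.
technical_score = normalized_score([4, 3, 2, 3, 4, 2, 3])  # 21 / 28 = 0.75
```

The same function can be applied to the human value ratings, which yields the second coordinate of the plot suggested by VDI 2225.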

2.3 COVR Safety Assessment

To ensure the safety of the proposed design in the target application, a safety validation had to be performed. Thus, a risk assessment followed by practical safety tests according to the methodologies reported by the COVR Toolkit [34] was applied. At first, the target application had to be defined. The task workflow was outlined employing the modelling approach of [35] because of its applicability to collaborative applications. The sequence comprised a collaborative assembly where the robot and the operator had to interact to snap two parts together. The resulting task sequence can be seen in Fig. 3. This sequence is used to identify potential hazards and derive risk mitigation measures for each individual task along the entire sequence. Following ISO 12100 [14], the identified hazards were assigned values for severity, frequency, probability, and avoidance. Afterwards, by combining these parameters, the risk levels of the hazards were obtained. Based on this final value, further actions such as risk mitigation


Fig. 3 Workflow of the assembly use case. The figure shows the different actions involved in the assembly and depicts the possibility that each of them could be taken either by an operator or by a robot as suggested by [35]. The workflow representation is necessary for the safety assessment to identify possible causes of hazards. For example, in the release phase an impact with a falling part can happen

measures were selected. Table 1 provides an excerpt of the mechanical hazards, risk levels and mitigations identified in the context of this research. However, other types of potential hazards could appear. In most cases, the objective is to reduce medium and high-level risks to ensure a safe application. Since collaborative applications require the operator to work alongside the robot, collisions could occur. Hence, the selected risk reduction technique was PFL. However, its correct implementation needed to be ensured. For this, the COVR protocols were considered. The COVR protocol GRI-LIE-1 recommends sufficient collision testing to identify suitable robot operation parameters (e.g., load, travel speed). To do so, a crash test dummy model was used (see https://www.pilz.com/en-INT/products/robotics/prms/prms). This model included human attributes such as tissue (a damping layer with defined thickness and hardness) and bone structure (a spring with defined spring rate). The specification of this crash test dummy model differs between body regions and should address all possible collisions which the application may cause, as specified in ISO/TS 15066 [13]. Through the crash test dummy model, collision forces could be monitored after a gripper collision. To initiate such testing, a calculation according to Eq. (1) regarding robot speed and transferred energy helped to identify suitable reference speed levels [13]

$$ v = \frac{p A}{\sqrt{k \left( \frac{1}{m_H} + \frac{1}{m_R} \right)^{-1}}} \qquad (1) $$

with p and A being, respectively, the maximum allowable pressure and the impact area for avoiding injuries, k being the spring coefficient of the impacted tissue, and m_H and m_R being the masses involved in the impact. Afterwards, through

Table 1 Excerpt of the mechanical hazards found for the application. The severity (S) combined with the risk class (C) determines the risk level of a hazard [36]. Within that approach, the sum of the ratings for frequency (F), probability (P), and avoidability (A) results in the risk class, where three risk levels are rated: (H) high, (M) medium and (L) low. The table shows both the hazards identified and the risk mitigation techniques applied in the use case, considering the step number. The step number refers to the previous BPMN. For the sake of readability, personal protective equipment has been abbreviated as PPE

Column groups: Hazard identification (Step #, Cause, Consequences) | Risk estimation (S, F, P, A, C, Risk) | Mitigation (Action) | Risk estimation (S, F, P, A, C, Risk)

Step # | Cause | Consequences | S | F | P | A | C | Risk | Action | S | F | P | A | C | Risk
1, 3 | Robot motion | Collision transient | 4 | 5 | 4 | 5 | 13 | H | PFL | 1 | 5 | 2 | 5 | 10 | L
4 | Robot motion | Collision quasistatic | 4 | 5 | 3 | 5 | 13 | H | PFL | 1 | 5 | 2 | 3 | 10 | L
2 | Impactive gripping | Clamping in closing jaws | 3 | 5 | 3 | 3 | 11 | H | Limiting grip force | 1 | 5 | 2 | 3 | 10 | L
3–6 | Loss gripping contact | Impact with falling parts | 2 | 5 | 2 | 3 | 10 | M | Limiting object mass | 1 | 5 | 2 | 3 | 10 | L
1–5 | Mechanical catching | Dragging, throw over | 3 | 5 | 2 | 1 | 8 | H | Design, PPE | 2 | 5 | 2 | 1 | 8 | M


the tests, the allowable operation speed was the one that led to forces and pressures below the thresholds.
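Equation (1) can be evaluated directly to obtain a reference speed before the collision tests are run. The following is a minimal sketch in SI units; the function names are illustrative, and the numerical values in the example are chosen only to make the arithmetic easy to follow. They are not limits taken from ISO/TS 15066 [13].

```python
import math


def reduced_mass(m_h, m_r):
    """Two-body reduced mass (1/m_H + 1/m_R)^-1 in kg."""
    return 1.0 / (1.0 / m_h + 1.0 / m_r)


def reference_speed(p, area, k, m_h, m_r):
    """Evaluate Eq. (1): v = p*A / sqrt(k * (1/m_H + 1/m_R)^-1).

    p in Pa, area in m^2, k in N/m, masses in kg; the result is in m/s.
    """
    if min(p, area, k, m_h, m_r) <= 0:
        raise ValueError("all inputs must be positive")
    return p * area / math.sqrt(k * reduced_mass(m_h, m_r))


# Illustrative numbers only: 1 MPa on 1 cm^2 gives a force of 100 N;
# with k = 10 kN/m and a reduced mass of 1 kg the speed is 1 m/s.
v_ref = reference_speed(p=1e6, area=1e-4, k=1e4, m_h=2.0, m_r=2.0)
```

The speed obtained in this way is only a starting point; as described above, the allowable operation speed is the one for which the measured forces and pressures stay below the thresholds.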

3 Results

3.1 Value Sensitive Evaluation and Final Design

Among the concepts presented to the user group, depicted in Fig. 4, the evaluation helped to identify two with satisfactory attributes regarding technical performance and human value aspects. The concepts receiving the highest scores were a vacuum gripper (Fig. 4c) and a modular two jaw gripper (Fig. 4a). In the human value dimension, the simple vacuum gripper scored highest. However, the two jaw gripper

Fig. 4 Proposed concepts with variations of morphological structures. Concept a proposes a design built upon the physical effects of friction force, contour pairing and vacuum suction. The friction force is implemented using the morphological structure of two jaws oriented vertically, a removable jaw tip and a translational movement. The contour pairing is enabled through specific jaw tips. Moreover, further morphological structures can be added through the additional module; in the figure, a vacuum module has been added. Concept b is based on friction force implemented through two jaws oriented vertically and a roto-translational movement. Concept c is based on vacuum suction provided by an individual suction cup fixed in a translational position. Concept d is based on vacuum suction and has an adjustable and interchangeable vacuum suction cup to accommodate different object diameters


Fig. 5 Final gripper design. The design includes two 3D-printed jaws ➀, with textured silicone jaw tips ➁ for dexterous gripping. The adapter ➂ enables the assembly of the components and connects the angular adjuster ➃ to the gripper. The complete assembly is presented with the vacuum module ➄ as an example. This vacuum module features a silencer ➅, internal vacuum components ➆ such as a vacuum nozzle, and interchangeable suction cups ➇. In addition, an electromagnetic module was designed. The figure shows the housing which contains the control components ➈ and the electromagnet ➉ at the tip of the module

was selected for further development due to its modular structure and its flexible use. Between the concept decision and the final design, an iterative development process supported by additive manufacturing (i.e., fused deposition modeling and stereolithography) was adopted. Moreover, since industrial-grade actuators are available on the market, the solution reused existing components. Therefore, following the numbering in Fig. 5, the final design comprised an adapter plate ➂ with mounting slots for an off-the-shelf two-jaw unit and an angular adjuster ➃ for additional self-made modules to ensure reconfigurability. Moreover, to exploit the physical principle of contour pairing, the design comprised 3D-printed interchangeable jaws ➀ with textured silicone tips ➁. Finally, two sample modules were designed. To ensure interoperability, the modules were designed with a constant distance from the robot flange, resulting in a constant Tool Center Point (TCP), and with tips adaptable as needed (i.e., vacuum suction cup diameter and material). Therefore, through this modular design, different configurations can be added according to the requirements of a manufacturing operator by 3D printing or by adapting tips.

3.2 Field Deployment

The proposed design was then implemented in the targeted use case, a flexible manufacturing cell with a Universal Robots (UR) 10. However, for the industrial integration, some further requirements for the scenario (i.e., control


of the impactive force) had to be considered. These were: flexibility of tool interchange and compatibility with a PLC (Programmable Logic Controller) for safety management. Therefore, the selected hardware was: a Zimmer® Impactive Gripper GEH6040IL-31-B with IO-Link interface, a CoreTigo® Master/Bridge combination for wireless IO-Link communication and a GRIP® G-SHW063 tool changer. The gripper module had a force measurement device, which permitted monitoring of the applied force, and used a well-known sensor/actuator communication protocol. The IO-Link communication permitted wireless communication between the PLC and the EE to ensure seamless tool interchange. The tool changer permitted manual tool changes to further ensure flexibility.

3.3 Safety Assessment

The results of the safety assessment contained measures to mitigate the previously identified hazards. Some hazards could be eliminated completely by modifying the gripper design, i.e., through the risk mitigation category of inherently safe design. For example, the hazard of magnetic catching was removed by decreasing the magnetic force of the module. However, not all hazards could be removed completely by design changes. Hence, technical protective measures or organizational safety measures were implemented where necessary. In collaborative applications, collisions cannot be completely ruled out. Thus, technical protective measures in the form of power and force limitation were added. Of utmost importance in this matter was the operation speed, due to its major influence on the forces and pressures transferred to the human operator in a collision. In this research, the allowable operation speeds were obtained by iterative collision tests at various speeds while monitoring the resulting pressure values. The tests were carried out by setting a precise speed in the robot controller and then letting the robot drive towards a fixed constraint carrying the crash test dummy model sample. To monitor the pressure values, Fujifilm® Prescale LLW sheets with a range from 50 to 250 N/cm² were used, as already identified by [22]. Figure 6 shows the results for two relevant regions of the presented gripper. The two selected regions were the ones with the smallest surface area according to the design (i.e., gripper contour and module tip). In this case, the operating speeds giving pressure values under the threshold were 100 and 50 mm/s, depending on the colliding area of the gripper.
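The speed selection described above can be sketched as follows. This is a minimal illustration and not the authors' tooling: the function names are hypothetical, the measurements in the usage lines are made-up placeholders, and the threshold is passed in as a parameter because the applicable limit has to be taken from ISO/TS 15066 [13]. Readings beyond the foil range are replaced by the a-priori value of 300 N/cm² mentioned in the caption of Fig. 6.

```python
from collections import defaultdict

FOIL_MAX = 250.0    # upper end of the foil range in N/cm^2 (from the text)
SATURATED = 300.0   # a-priori value for saturated readings (Fig. 6 caption)


def clip_reading(pressure):
    """Return the reading, or the a-priori value if the foil saturated.

    A saturated foil is represented by None or by a value above FOIL_MAX.
    """
    if pressure is None or pressure > FOIL_MAX:
        return SATURATED
    return pressure


def allowable_speed(tests, threshold):
    """Highest tested speed reached before the threshold is first violated.

    tests: iterable of (speed_mm_s, pressure_n_cm2 or None).
    Speeds are walked in ascending order and the search stops at the first
    speed whose worst reading is not below the threshold.
    Returns None if even the lowest tested speed is not admissible.
    """
    worst = defaultdict(float)
    for speed, pressure in tests:
        worst[speed] = max(worst[speed], clip_reading(pressure))
    allowed = None
    for speed in sorted(worst):
        if worst[speed] >= threshold:
            break
        allowed = speed
    return allowed


if __name__ == "__main__":
    # Placeholder measurements and threshold, for illustration only.
    example = [(50, 120.0), (50, 130.0), (100, 170.0), (150, 240.0), (200, None)]
    print(allowable_speed(example, threshold=190.0))
```

Stopping at the first violation is a conservative choice: a higher speed is never accepted just because a single test at that speed happened to produce a low reading.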

3.4 Picking Performances

To evaluate the performance of the proposed design for object picking, empirical tests were carried out. The tests consisted of using the gripper to pick the target parts by means of the different integrated physical effects. A total of ten industrial parts were


Fig. 6 Quasistatic pressure at various speed levels caused by two relevant regions of the gripper. The red line identifies the pressures generated by the first risk region near the gripper contour. The blue line identifies the pressures generated by the second risk region on the tip of the module. The gray area in the chart indicates where the measuring range of the deployed pressure measuring foil ends. The points in the gray area exceeded the foil limit and are thus assigned the a-priori value of 300 N/cm². The threshold for the quasistatic pressure at the hand region is taken from ISO/TS 15066 [13] and is represented by the gray dotted line. At the identified suitable operating speed level, three tests were done to confirm the suitability. Thus, the chart shows three points at the desired speed levels (i.e., 50 and 100 mm/s)

used for the evaluation. The picking tests were carried out with an automated industrial bin-picking system from Keyence®, which calculated the robot trajectories and speeds to actuate the grip. To prove the effectiveness of the gripper, several parts were placed under the sensor and, after specification of the gripping strategy (i.e., gripping point, physical principle), the gripping outcome was monitored. If one physical principle did not work, the process was repeated with a different physical principle. Results of the testing are summarized in Fig. 7. The vacuum principle showed the highest applicability, but it was not always the best principle to use. In those cases, two-jaw gripping based on friction force was useful for more complex geometries without plain surfaces. Finally, the magnetic module was deemed useful for small ferromagnetic parts.
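The test procedure and the qualitative legend of Fig. 7 can be written down as a short sketch. It assumes that "fair" means a successful grasp at the third attempt, which is one reading of the legend, and the callable `try_grasp` is a hypothetical stand-in for commanding the bin-picking system, not part of any Keyence® interface.

```python
RATINGS = {1: "excellent", 2: "good", 3: "fair"}


def rate_pick(outcomes):
    """Map a list of attempt outcomes (True = grasped) to the Fig. 7 legend."""
    for attempt, success in enumerate(outcomes[:3], start=1):
        if success:
            return RATINGS[attempt]
    return "poor"


def pick_with_fallback(part, principles, try_grasp, max_attempts=3):
    """Try each physical principle in order until one grasps the part.

    try_grasp(part, principle) -> bool is a hypothetical callable.
    Returns (principle, rating), or (None, "poor") if nothing worked.
    """
    for principle in principles:
        outcomes = []
        for _ in range(max_attempts):
            ok = try_grasp(part, principle)
            outcomes.append(ok)
            if ok:
                return principle, rate_pick(outcomes)
    return None, "poor"


if __name__ == "__main__":
    # Scripted outcomes standing in for real grasp attempts.
    scripted = {"vacuum": iter([False, False, False]),
                "friction": iter([False, True])}
    print(pick_with_fallback("part", ["vacuum", "friction"],
                             lambda part, p: next(scripted[p])))
```

In the scripted example, vacuum fails three times and friction succeeds at its second attempt, so friction is reported with the rating "good".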

4 Discussion

The result of this work is a modular gripper comprising a two-finger jaw unit with interchangeable jaws and an additional attachment, reconfigurable in position, for further modules (e.g., a vacuum module). The design, although it followed an independent iterative development methodology, has features similar to those of the grippers used by winning teams of the industrial Amazon® Robotics Challenge, as identified by [25], while ensuring safety for HRC. Therefore, high probabilities of success in pick


Fig. 7 Performance evaluation with ten industrial parts. The matrix shows the results of the empirical picking evaluation using the Keyence® bin-picking system. The ten industrial parts are listed on the left side, with the corresponding figures on the top. The different physical principles are shown on the bottom side. The qualitative performance legend is reported on the right: poor means that the part was not grasped, fair that the part was grasped after three attempts, good that it was grasped at the second attempt, and excellent that it was grasped at the first attempt

operations with unknown objects can be foreseen, as highlighted by the test performance. However, it should be noted that, apart from the technical feasibility, during the design process the conceptual model resembling only a vacuum suction cup without the two-finger jaws (Fig. 4c) received a higher score from the participants in the user study. Therefore, operators using the proposed design might have negative feelings towards the visual appearance of the EE, as identified by [26, 37], and thus the adoption of the EE might be endangered. Nevertheless, the two-finger jaw with the vacuum module scored second, leaving a low chance of ending up in such a case. Moreover, excluding the hardware for PLC-to-EE communication, since wireless communication was a non-essential design feature, the proposed design with vacuum and two-finger jaws has a manufacturing cost of around €3.5k, of which ca. 80% is allocated to the Zimmer® impactive gripper and the remainder to consumables for the different modules, mainly 3D printer filament and vacuum components. Therefore, if this solution is compared with a well-known two-finger EE like the Robotiq® 2F-85, which costs around €4.3k [38], the advantages of the design principles are twofold. On the one hand, multiple grasping principles are present and can be easily integrated by 3D printing. On the other hand, costs can be lower than those of market solutions.
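The cost figures quoted above can be unpacked with a short calculation. The inputs are the approximate values from the text (about €3.5k in total, about 80% for the off-the-shelf gripper, about €4.3k for the reference product), so the outputs are only as precise as those roundings; the function name is hypothetical.

```python
def cost_breakdown(design_cost, gripper_share, reference_cost):
    """Split the design cost and compare it with a reference product."""
    gripper_cost = design_cost * gripper_share
    consumables_cost = design_cost - gripper_cost
    saving = reference_cost - design_cost
    return {
        "gripper": gripper_cost,
        "consumables": consumables_cost,
        "saving": saving,
        "saving_pct": 100.0 * saving / reference_cost,
    }


if __name__ == "__main__":
    result = cost_breakdown(design_cost=3500.0, gripper_share=0.80,
                            reference_cost=4300.0)
    for name, value in result.items():
        print(f"{name}: {value:.1f}")
```

With these inputs, roughly €2.8k goes to the gripper and €0.7k to consumables, and the design is about €0.8k (close to 19%) cheaper than the reference product.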


5 Conclusion

In HRC, a reconfigurable EE that is certified for safe operation and visually appealing to operators is necessary for increasing the adoption of collaborative robots in the industrial sector. Therefore, the presented work showed the process of designing a modular gripper for human-robot collaborative workcells considering requirements for safety and perceived values. First, the review of the state of the art identified that a reconfigurable and tested gripper for collaborative robots was still missing. Second, an iterative design methodology which analyzed the relevant requirements and standards was followed and conceptual designs were obtained. Third, an evaluation which included both human values and technical aspects was proposed to a user group to gather feedback about the best conceptual design. Fourth, the selected conceptual design was investigated and implemented using a self-designed adapter plate with mounting slots for off-the-shelf units and additional modules with reconfigurable jaws, vacuum and magnetic effectors. Finally, the implemented design was tested for safety according to the most recent regulations for collaborative applications, and for picking performance through empirical tests. The tests showed that the proposed reconfigurable design can adapt to several use cases while respecting safety limitations and being well perceived by operators, therefore answering the need of small enterprises for an EE usable in different scenarios. However, further investigations with other unknown parts, a larger group of operators and more use cases should be performed to ensure the applicability of the reconfigurability. Therefore, future work will investigate the application of the presented EE with algorithms for bin picking in unknown environments.
Acknowledgements This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 873087 and the Helmholtz Association.

References

1. Wannasuphoprasit, W., Gillespie, R.B., Colgate, J.E., et al.: Cobot control. In: Proceedings of International Conference on Robotics and Automation, pp. 3571–3576. IEEE (1997)
2. Forschungsunion, A.: Recommendations for implementing the strategic initiative INDUSTRIE 4.0 securing the future of German manufacturing industry: Final report of the Industrie 4.0 working group (2013)
3. Bogue, R.: Europe continues to lead the way in the collaborative robot business. Indus. Robot: Int. J. 43, 6–11 (2016). https://doi.org/10.1108/IR-10-2015-0195
4. Villani, V., Pini, F., Leali, F., et al.: Survey on human–robot collaboration in industrial settings: safety, intuitive interfaces and applications. Mechatronics 55, 248–266 (2018). https://doi.org/10.1016/j.mechatronics.2018.02.009
5. Eurostat: 25% of large enterprises in the EU use robots (2019)
6. Union, E.: Unleashing the Full Potential of European SMEs. Publications Office of the European Union, Luxembourg (2020)
7. Yamada, Y., Hirasawa, Y., Huang, S.Y., et al.: Fail-safe human/robot contact in the safety space. In: Proceedings 5th IEEE International Workshop on Robot and Human Communication. RO-MAN’96, Tsukuba, pp. 59–64. IEEE (1996)
8. Gualtieri, L., Rauch, E., Vidoni, R.: Emerging research fields in safety and ergonomics in industrial collaborative robotics: a systematic literature review. Robot. Comput.-Integr. Manuf. 67, 101998 (2021). https://doi.org/10.1016/j.rcim.2020.101998
9. Pantano, M., Regulin, D., Lutz, B., et al.: A human-cyber-physical system approach to lean automation using an industrie 4.0 reference architecture. Procedia Manuf. 51, 1082–1090 (2020). https://doi.org/10.1016/j.promfg.2020.10.152
10. Aaltonen, I., Salmi, T.: Experiences and expectations of collaborative robots in industry and academia: barriers and development needs. Procedia Manuf. 38, 1151–1158 (2019). https://doi.org/10.1016/j.promfg.2020.01.204
11. International Organization for Standardization: ISO 10218-1:2011: Robots and robotic devices—safety requirements for industrial robots—Part 1: Robots (2011)
12. International Organization for Standardization: ISO 10218-2:2011: Robots and robotic devices—safety requirements for industrial robots—Part 2: Robot systems and integration (2011)
13. International Organization for Standardization: ISO/TS 15066:2016: Robots and robotic devices—Collaborative robots (2016)
14. International Organization for Standardization: ISO 12100:2010: Safety of machinery—General principles for design—Risk assessment and risk reduction (2010)
15. Bessler, J., Schaake, L., Bidard, C., et al.: COVR—towards simplified evaluation and validation of collaborative robotics applications across a wide range of domains based on robot safety skills. In: Carrozza, M.C., Micera, S., Pons, J.L. (eds.) Wearable Robotics: Challenges and Trends, pp. 123–126. Springer International Publishing, Cham (2019)
16. Valori, M., Scibilia, A., Fassi, I., et al.: Validating safety in human-robot collaboration: standards and new perspectives. Robotics 10, 65 (2021). https://doi.org/10.3390/robotics10020065
17. International Organization for Standardization: ISO 20218-1:2018: Robotics—safety design for industrial robot systems—Part 1: End-effectors (2018)
18. Zhang, H., Yan, Q., Wen, Z.: Information modeling for cyber-physical production system based on digital twin and AutomationML. Int. J. Adv. Manuf. Technol. 107, 1927–1945 (2020). https://doi.org/10.1007/s00170-020-05056-9
19. Birglen, L., Schlicht, T.: A statistical review of industrial robotic grippers. Robot. Comput.-Integr. Manuf. 49, 88–97 (2018). https://doi.org/10.1016/j.rcim.2017.05.007
20. Bischoff, R., Kurth, J., Schreiber, G., et al.: The KUKA-DLR Lightweight Robot arm—a new reference platform for robotics research and manufacturing. In: ISR 2010 (41st International Symposium on Robotics) and ROBOTIK 2010 (6th German Conference on Robotics), pp. 1–8 (2010)
21. Haddadin, S., Haddadin, S., Khoury, A., et al.: On making robots understand safety: embedding injury knowledge into control. Int. J. Robot. Res. 31, 1578–1602 (2012). https://doi.org/10.1177/0278364912462256
22. Weitschat, R., Vogel, J., Lantermann, S., et al.: End-effector airbags to accelerate human-robot collaboration. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 2279–2284. IEEE (2017)
23. Zsifkovits, H., Modrák, V., Matt, D.T.: Industry 4.0 for SMEs. Springer Nature (2020)
24. Monkman, G.J.: Robot Grippers. Wiley-VCH, Weinheim, Chichester (2010)
25. Fujita, M., Domae, Y., Noda, A., et al.: What are the important technologies for bin picking? Technology analysis of robots in competitions based on a set of performance metrics. Adv. Robot. 1–15 (2019). https://doi.org/10.1080/01691864.2019.1698463
26. Fink, J.: Anthropomorphism and human likeness in the design of robots and human-robot interaction. In: Ge, S.S., Khatib, O., Cabibihan, J.-J., et al. (eds.) Social Robotics, pp. 199–208. Springer, Berlin Heidelberg, Berlin, Heidelberg (2012)
27. de Souza, J.P.C., Rocha, L.F., Oliveira, P.M., et al.: Robotic grasping: from wrench space heuristics to deep learning policies. Robot. Comput.-Integr. Manuf. 71, 102176 (2021). https://doi.org/10.1016/j.rcim.2021.102176
28. Salvietti, G., Iqbal, Z., Hussain, I., et al.: The Co-Gripper: a wireless cooperative gripper for safe human robot interaction. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4576–4581. IEEE (2018)
29. Methodik und Organisation des Kostenmanagements für die Produktentwicklung. In: Ehrlenspiel, K., Kiewert, A., Lindemann, U. (eds.) Kostengünstig Entwickeln und Konstruieren, pp. 35–121. Springer Berlin Heidelberg, Berlin, Heidelberg (2007)
30. Salunkhe, O., Fager, P., Fast-Berglund, Å.: Framework for identifying gripper requirements for collaborative robot applications in manufacturing. In: Lalic, B., Majstorovic, V., Marjanovic, U., et al. (eds.) Advances in Production Management Systems. The Path to Digital Transformation and Innovation of Production Management Systems, pp. 655–662. Springer International Publishing, Cham (2020)
31. Friedman, B.: Value Sensitive Design: Shaping Technology with Moral Imagination. MIT Press (2019)
32. Verein Deutscher Ingenieure: VDI 2225 Blatt 3:1998-11: Konstruktionsmethodik—Technisch-wirtschaftliches Konstruieren—Technisch-wirtschaftliche Bewertung (1998)
33. Likert, R.: A technique for the measurement of attitudes. Arch. Psychol. 22(140), 55 (1932)
34. COVR toolkit. https://www.safearoundrobots.com/home. Accessed 11 Oct 2021
35. Froschauer, R., Lindorfer, R.: Workflow-based programming of human-robot interaction for collaborative assembly stations. Verlag der Technischen Universität Graz (2019)
36. International Organization for Standardization: ISO/TR 14121-2:2012: Safety of machinery—Risk assessment—Part 2: Practical guidance and examples of methods (2012)
37. Biermann, H., Brauner, P., Ziefle, M.: How context and design shape human-robot trust and attributions. Paladyn, J. Behav. Robot. 12, 74–86 (2021). https://doi.org/10.1515/pjbr-2021-0008
38. Bélanger-Barrette, M.: What is the price of collaborative robots. https://blog.robotiq.com/what-is-the-price-of-collaborative-robots. Accessed 14 Oct 2021

Detecting Emotions During Cognitive Stimulation Training with the Pepper Robot

Giovanna Castellano, Berardina De Carolis, Nicola Macchiarulo, and Olimpia Pino

Abstract Social robots have recently been used in therapeutic interventions for elderly people affected by cognitive impairments. In this paper, we report the results of a study exploring the affective reactions of seniors during cognitive stimulation therapy performed using a social robot. To this end, an experimental study was performed with a group of 8 participants in a 3-week program in which the group was trained on specific memory tasks with the support of the Pepper robot. To assess and monitor the results, each session was video-recorded for human and automatic analyses. Given that aging causes many changes in facial shape and appearance, we detected emotions by means of a model specifically trained to recognize facial expressions of elderly people. After testing the model accuracy and analyzing the differences with the human annotation, we used it to automatically analyze the collected videos. Results show that the model detected a low number of neutral emotions and a high number of negative emotions. However, seniors also showed positive emotions during the various sessions and, while these were much more frequent than negative ones in the human annotation, this difference was smaller in the automatic detection. These results encourage the development of a module to adapt the interaction and the tasks to the user’s reactions in real time. In both cases, some correlations emerged showing that seniors with a lower level of cognitive impairment, as measured with the Mini–Mental State Examination (MMSE), experienced fewer positive emotions than seniors with a more severe impairment. In our opinion, this could be due to the need for personalized cognitive stimulation therapy according

G. Castellano · B. De Carolis · N. Macchiarulo (✉)
Department of Computer Science, University of Bari, Bari, Italy

O. Pino
Department of Medicine and Surgery, University of Parma, Parma, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_5



to the senior’s MMSE, thus providing more stimulating tasks. However, a deeper investigation should be conducted to confirm this hypothesis.

Keywords Cognitive stimulation therapy · Social robots · Mild cognitive impairment

1 Introduction

Socially Assistive Robots (SARs) are expected to have increasing importance in human society. They are being used in several domains, such as assisting and supporting the learning of children with special needs, cognitive stimulation treatments for people with Mild Cognitive Impairment (MCI), physical exercise therapy and daily life support. MCI is an intermediate stage between the cognitive decline associated with typical aging and more severe forms of dementia. The aim of an MCI treatment is to reduce existing clinical symptoms or to delay the progression of cognitive dysfunction and prevent dementia. To slow down the progression of the decline, it is necessary to provide increasing assistance over time by offering people suffering from MCI timely and engaging cognitive training. There is growing evidence that cognitive interventions may be associated with small cognitive benefits for patients with MCI and dementia. Based on recent trials, computer training programs have positive effects on cognition and mood [1]. In particular, cognitive stimulation and rehabilitation therapy focus on protocols in which different types of tasks are used for recovering and/or maintaining cognitive abilities such as memory, orientation and communication skills [2]. Motor activities are also important to help individuals with dementia rehabilitate damaged functions or maintain their current motor skills in order to preserve autonomy over time. Such cognitive and physical training requires a trained therapist who, besides supporting the patient during the execution, has to give feedback during the therapeutic session and keep track of the user’s performance to monitor progress over time [3]. Recently, humanoid robots have appeared to offer valid support for this task [4] and are being used effectively in dementia care, cognitive stimulation and memory training [5–7].
Following these findings, in collaboration with a local association (“Alzheimer Bari” ONLUS), we set up an experimental study aimed at evaluating the effectiveness and acceptance of SAR technology in providing therapeutic interventions to people suffering from cognitive changes related to aging and dementia. In this pilot study, Pepper, a semi-humanoid robot developed by SoftBank Robotics [8], was employed to support psychotherapists in cognitive stimulation sessions. In total, we planned to run 4 sessions in which Pepper was to be used to convey the planned therapeutic intervention, but due to the COVID-19 emergency we had to suspend the experiment one session early. For the acceptance of social robots in this kind of intervention, a positive experience in the interaction with the robot is of major importance. Emotions are important signals


denoting the quality of the user experience in interacting with technology and can be used to quantify the effect of a social robot on the well-being of patients during training. Our purpose was to explore the participants' emotional response (e.g., positive or negative) in the different tasks in order to see whether they were positively engaged by the robot during the interaction. In this paper we report the results of a study that concerns the robot’s acceptance and its potential for eliciting positive emotions during Cognitive Stimulation Therapy (CST). While a previous study focused on the general results of the CST in terms of acceptance and task accomplishment [9], in this paper we focus on automatic emotion analysis of the participants during the therapeutic sessions with Pepper. We performed this task by analyzing spontaneous facial expressions with software for the automatic assessment of emotions occurring during the interaction and comparing its output with the annotation of three human raters. Since many studies in the literature have shown that the facial expressions of elderly people differ from those of young or middle-aged people for a number of reasons (e.g., wrinkles on the face and the functional decline of the muscles involved in facial expression [10]), we developed a Facial Expression Recognition (FER) model specialized in recognizing emotions in elderly faces. We validated the accuracy of the model with stratified cross-validation, and then evaluated its accuracy in analyzing the collected videos against the annotation of the human raters. From the analysis of the results, we can say that, in general, Pepper was quite effective in engaging participants in the CST program. Comparing the annotations of the model with the human annotations, the model generally detected fewer neutral emotions and more negative emotions than the human observers.
Conversely, for positive emotions, the FER model showed good agreement with the annotation made by humans. In both cases, correlations in the data also emerged, showing that seniors with a lower level of cognitive impairment experienced less happiness than seniors with a more severe impairment. This result may denote the need for a real-time analysis of the user’s behavior in order to adapt the interaction between the robot and the user. These results encourage us to continue the current work, also carrying out a comparison with a control group in which the same stimulation protocol will be executed without the use of the social robot. The paper is structured as follows. In Sect. 2 the background and motivations of the research are reported. Section 3 describes the performed study and Sect. 4 reports its results. Finally, in Sect. 5 conclusions and possible future work are discussed.
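The stratified cross-validation mentioned above can be illustrated with a small standard-library sketch. It is not the authors' pipeline: the number of folds, the seed and the label names are assumptions made for this example, and the function names are hypothetical. The idea is that each fold receives roughly the same share of every emotion class, so that no fold is evaluated on a distribution very different from the training data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of sample indices with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = stratified_folds(labels, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Illustrative labels only; the real class set and counts are not assumed.
    labels = ["happy"] * 6 + ["sad"] * 6 + ["neutral"] * 3
    for train_idx, test_idx in train_test_splits(labels, k=3):
        print(len(train_idx), sorted(labels[i] for i in test_idx))
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would typically be used; the hand-written version is shown only to make the mechanism explicit.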


2 Background and Motivations

2.1 Social Assistive Robots and Cognitive Stimulation Therapy

With the growing incidence of pathologies associated with aging, there will be an increasing demand for maintaining care systems and services for the elderly, with the imperative of economically cost-effective care provision. One of these pathologies is MCI, which is typically a progressive disorder that represents a stage between the expected cognitive decline of normal aging and the more serious decline of dementia. It is characterized by problems with memory, language, thinking or judgment. MCI interventions aim not so much at changing the underlying neuropsychological impairment as at reducing the disability deriving from the impairment (i.e., increasing quality of life and sustaining independent living [11]). In this view, CST for optimizing the cognitive functions of older adults with mild to moderate dementia is promising. The goal of CST is to stimulate participants through a series of themed activities designed to help them continue to learn and stay socially engaged. Socially Assistive Robots (SARs) are a class of robots that use their social interactive capabilities and their ability to engage human users in both social and emotional dimensions [12] to establish a relationship with the user that leads toward intended therapeutic goals [13]. The integration of robotics into both formal and informal MCI care opens up new possibilities for improving the life of patients and alleviating the burden on caregivers and healthcare services. Early studies have shown that SARs have the advantage of improving mood, social relationships among patients and the emotional expression of individual dementia sufferers [14]. Several investigations on the effects of social robot therapy have been conducted. In [6], good acceptance by older adults in assisted living facilities was also observed.
Recently, some studies [7] also investigated how patients with dementia relate to humanoid robots and perceive serious games accessed through them, as part of a training program aimed at improving their cognitive status. It has been observed that elders became more engaged with Pepper over the sessions and had a positive view of the interaction with it. In [2], NAO was used to demonstrate physical exercises to a group of seniors. NAO was also employed in individual and group therapy sessions [6, 15] to assist the therapist with speech, music and movement. As far as user engagement and experience are concerned, it has been argued that Pepper brings patients with dementia into a more positive emotional state and, in music sessions, stimulates patients to recall memories and talk about their past [16].


2.2 Emotion Recognition from Facial Expressions in Elderly People

Facial Expression Recognition (FER) is one of the active research areas in the field of Computer Vision due to its significant potential from both an academic and a commercial point of view. FER systems are mainly used to classify facial expressions into emotions (i.e., basic emotions: happy, sad, angry, surprise, disgust, fear) or according to dimensions such as valence and arousal [17]. While great progress has been made in this area, the recognition of facial expressions in elderly subjects remains a challenging task since aging has an effect on facial expressions. In particular, there are age-related structural changes in the face, such as wrinkles and facial muscle decay, that can be mistaken for emotions (usually the neutral expression of elderly people is recognized as sadness) [18]. In order to recognize facial expressions in elderly faces more effectively, automatic systems should be trained on datasets containing a large number of examples of emotions expressed by elderly faces, as in the case of FACES [19]. Guo et al. [20] conducted a study of the effect of human aging on facial expression recognition from a computational perspective. A computational study of emotion recognition was done within each age group and across age groups. Recently, deep learning methods have been attracting more and more researchers due to their great success in various computer vision tasks, mainly because they avoid a process of feature definition and extraction, which is often very difficult due to the wide variability of facial expressions. Caroppo et al. [21] used deep learning to fine-tune a pre-trained network and evaluated the approach on two benchmark datasets (FACES and Lifespan), which are the only ones that contain facial expressions of the elderly.
Their approach was found to increase the accuracy of FER in older adults by about 8%. Lopes et al. [22] studied the effects of aging on automatic facial emotion recognition and created an application that can detect three facial expressions of a selected person (Happiness, Sadness and Neutral), using the Lifespan image dataset as the source for the training and testing phases of an SVM. The elderly group, older than 60 years, obtained 90.32% accuracy when detecting neutral faces, 84.61% when detecting happy faces and 66.6% when detecting sad faces. In these studies, comparing the accuracies with those of models trained on facial expressions of younger age groups, there is evidence that aging influences facial emotion recognition. For this reason, we decided to train our model on the portion of the FACES dataset containing images of older adults.


G. Castellano et al.

3 An Overview of the Study

To investigate how seniors with MCI relate to and perceive the CST program performed with the aid of the social robot Pepper, we performed a study with 8 participants that lasted 3 weeks, with weekly meetings of about 35 min (the intervention was initially planned for 4 weeks, but we had to stop one week earlier due to the Covid-19 pandemic). Participants were selected for the experimental study among the members of the Alzheimer Bari Association according to the evaluation of their neurophysiological state, made with the MMSE (Mini-Mental State Examination) [23], and their willingness to take part in the study. The MMSE score was used to select seniors so as to have a group as homogeneous as possible (see Table 1 for a description), considering an MMSE score between 13 and 26.2, i.e., patients ranging from the beginning of MCI to a mild stage of dementia, since patients with these scores can make progress with CST. Before running the CST with Pepper, participants and their relatives received detailed information about the study and subsequently signed their consent to be video recorded during the experiments. These consents were also signed by their legal representatives.

The social robot platform used in the current study is Pepper, a semi-humanoid robot developed by SoftBank Robotics. Besides the capability to express multimodal behavior using speech, gestures and LEDs located around the eyes and on its shoulders, Pepper also has tactile sensors on the head and in the hands and a tablet on which to display videos, images and buttons, allowing the user to interact with it by touch.

3.1 The Tasks

The tasks to be performed during the CST program with Pepper were selected by the staff of specialized therapists of the Association and were adapted to Pepper's communicative capabilities. Four sets of cognitive stimulation tasks for the group were created, one for each planned session. Sessions were planned to run weekly and to last between 30 and 40 min. Table 2 reports the exercises planned for the three sessions that were actually carried out. The task opening each session was motor imitation, since the therapists judged it to be less stressful and more engaging. In Fig. 1a Pepper is showing some movements to be imitated by the seniors. Figure 1b shows an example of a Visual-Verbal Associative Memory task in which Pepper shows the image of a famous person on the tablet and asks for his/her name. In general, if the participant's answer was correct, Pepper provided positive feedback, showing a thumbs up on the tablet and performing body movements expressing how happy it was with that response (Fig. 1c). In the case of a wrong answer, Pepper encouraged the patient to try again without using negative words (e.g., bad, wrong). The interaction between the robot and the patients was vocal.

During the training program the patients were seated in front of the therapists and Pepper. Behind a wall there was a technician in charge of solving technical problems arising during the exercises with the robot. Pepper was positioned about one meter away from each patient, within the range in which it is able to engage with and perceive the people around it. Besides Pepper's internal video camera located inside its mouth (which made it possible to better capture the faces of the patients), another video camera was positioned in the room in order to have a front view of the patients' faces and to be able to analyze the behavior of the entire group. Figure 2 shows the setting of the environment. More details about the study and the tasks are described in [9].

Table 1 Participants' MMSE

ID   Gender   Age   MMSE
1    F        89    23.4
2    F        77    26.2
3    M        82    24.1
4    M        89    21.1
5    M        82    13
6    F        79    13.2
7    F        69    20
8    F        72    17

Table 2 Description of the tasks for each session

Session 1                   Session 2                   Session 3
Motor imitation             Motor imitation             Motor imitation
Word completion             Memory of pose              Visual-verbal associative memory
Verbal associative memory   Verbal associative memory   Memory of prose
                                                        Verbal associative memory

Fig. 1 a An example of physical exercises. b Memory training. c Positive feedback

Fig. 2 The environment settings

3.2 Measurements

We collected the videos of the three sessions and segmented them in order to have one video for each exercise. A first analysis of the material was performed by three expert human raters (two women and one man, with an average age of 37.67 years), who annotated the videos in terms of number of correct answers, eye contact and emotions (angry, disgust, fear, happy, sad, surprise, and neutral) experienced by each participant during each exercise. Emotions were annotated by observing the facial expressions of the participants every second. According to Fleiss' kappa [24], the raters had an almost perfect agreement, with an index of 0.83. The results of this analysis are reported in [9]: the patients participated actively in the experiment, with an engagement of 65% over the duration of the CST. In particular, the session in which participants were most engaged in the interaction with the robot was the third one, with attention directed toward Pepper for 82% of the total duration of the session. As far as emotions are concerned, the level of negative emotions experienced by the seniors during the entire duration of the experiment was quite low (0.59% for Session 1, 2.02% for Session 2 and 1.08% for Session 3). Besides the “neutral” state (on average 79.44% per day), seniors experienced more positive emotions (on average 19.33%) than negative ones. Analyzing the emotional experience for each task, the maximum “happy” rate (30.75%) was achieved during the visual-verbal associative memory task, followed by the one concerning motor imitation (25.32%).
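
The inter-rater agreement index mentioned above can be illustrated with a minimal Python sketch of Fleiss' kappa. This is not the authors' code: the function name `fleiss_kappa` and the small rating matrix are hypothetical and only show how the index is obtained when every item (here, a one-second frame) is labeled by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over the items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 3 raters, 5 frames, categories (neutral, happy, sad).
example_counts = [
    [3, 0, 0],
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 0, 3],
]
example_kappa = fleiss_kappa(example_counts)
```

For this toy matrix the index is 106/136, about 0.78. The single frame on which the raters split 2 to 1 is what keeps the value below 1.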


3.3 Automatic Analysis of Emotions

After this manual analysis, in order to detect emotions in a CST session automatically using the robot's camera, we developed a Facial Emotion Recognition (FER) model specialized in the real-time recognition of facial expressions in the faces of elderly people. Facial expressions are one of the most used channels to display emotions, and facial features are also the most commonly used for automatic emotion recognition, since their detection does not require expensive extra hardware: webcams are present on many devices. Therefore, we focus on facial features to detect affective states.

Facial expression recognition from images requires a pipeline that involves different modules. In this work we use the typical FER pipeline: after a pre-processing phase, the human faces in the input image are detected, and then the detected faces are cropped and registered. Then, features are extracted and used to classify emotions. To train the classification model, we looked for datasets containing images of elderly people's faces. The FACES dataset [19], as far as we know, is one of the few containing photos labeled by both age and expression. It involves 171 people showing six different expressions (angry, disgust, fear, happy, sad and neutral). The subjects are divided into three main age groups (young: 19–31 years old, middle-aged: 39–55 years old, older: 69–80 years old). For each subject, 2 examples of each expression are saved, so in total the dataset consists of 2052 frontal images. For the purpose of this study, only facial expressions of older adults were considered and pre-processed. Thus, only 684 images (57 older adults, each performing the six expressions twice) were used for training and testing our model. Figure 3 shows an example for each of the six emotions in the FACES dataset.

Fig. 3 An example of the six emotions in FACES dataset

The implemented pipeline takes as input a single facial image and converts it to grayscale. Subsequently, the face in the image is detected with a Multi-task Cascaded Convolutional Network (MTCNN) [25] and then cropped. A set of features extracted from the face region and characterizing the facial expression is the input to the classifier. As features we used the set of Action Units (AUs) describing facial expressions according to the Facial Action Coding System (FACS) [26], a scheme for manual facial expression coding that uses different muscle activations, 46 action units (AUs), and combinations of them to determine indications of emotions. For this software we decided to use as features the presence and intensity (expressed as a float from 0 to 5) of 17 facial AUs (AU01r, AU02r, AU04r, AU05r, AU06r, AU07r, AU09r, AU10r, AU12r, AU14r, AU15r, AU17r, AU20r, AU23r, AU25r, AU26r, AU45r) and the presence of the facial action unit AU28c. To extract these features, we used OpenFace 2.0 [27], a freely available tool capable of accurate facial landmark detection, recognition of a subset of Action Units (AUs), gaze tracking and head pose estimation. The selected AUs are those that can be estimated with the OpenFace software.

Due to the low number of examples for each class in the dataset, we did not consider approaches based on deep learning. We tested the performance of three machine learning algorithms for classification, namely Multi-class SVM [30], Random Forest [28], and Decision Tree [29]. We used two validation approaches: stratified k-fold cross-validation with k = 10 and the train-test split. The best performance was achieved by Random Forest with an accuracy of 97.6%, hence we adopted it as the final model. Table 3 shows the confusion matrix of this final model. Figure 4 shows an example of emotion recognition during a CST session using this model.

Fig. 4 Emotion recognition of six members of the group during a CST session

Table 3 Confusion matrix

Angry   Disgust   Fear   Happy   Neutral   Sad   Classified as
218     4         0      0       4         2     Angry
6       222       0      0       0         0     Disgust
0       0         226    0       2         0     Fear
0       0         0      228     0         0     Happy
2       0         0      0       224       2     Neutral
0       6         0      0       4         218   Sad
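
As a consistency check on Table 3, the overall accuracy can be recomputed as the sum of the diagonal divided by the sum of all entries. The Python sketch below does this with the values of the table; the function names are hypothetical, and the per-row rate is simply the diagonal entry over the row total for the class named in the last column, whichever axis that class label denotes.

```python
# Entries of Table 3, one row per class label in the last column of the table.
LABELS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad"]
CONFUSION = [
    [218, 4, 0, 0, 4, 2],
    [6, 222, 0, 0, 0, 0],
    [0, 0, 226, 0, 2, 0],
    [0, 0, 0, 228, 0, 0],
    [2, 0, 0, 0, 224, 2],
    [0, 6, 0, 0, 4, 218],
]


def overall_accuracy(matrix):
    """Fraction of samples on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_row_rate(matrix):
    """Diagonal entry divided by the row total, for each row."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


accuracy = overall_accuracy(CONFUSION)
row_rates = dict(zip(LABELS, per_row_rate(CONFUSION)))
```

The diagonal sums to 1336 out of 1368 entries, i.e. about 0.9766, in line with the 97.6% reported for Random Forest. Each row sums to 228, and Happy is the only row without off-diagonal entries.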

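The stratified 10-fold validation mentioned in Sect. 3.3 can be sketched in plain Python as follows. This is only an illustration of how stratification keeps the six expressions balanced in every fold: the function `stratified_k_fold` is hypothetical, the labels are generated from the dataset sizes given in the text (114 images per expression, 684 in total), and neither the feature extraction nor the classifier used by the authors is reproduced here.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs with balanced classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin across the k folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            index for j, fold in enumerate(folds) if j != i for index in fold
        )
        yield train, test


EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad"]
# 57 older adults x 2 examples = 114 images per expression, 684 in total.
labels = [emotion for emotion in EMOTIONS for _ in range(114)]
splits = list(stratified_k_fold(labels, k=10))
# Number of test images per expression in each fold.
test_class_counts = [Counter(labels[i] for i in test) for _, test in splits]
```

With 114 images per class and k = 10, every test fold contains 11 or 12 images of each expression, so no fold is dominated by a single emotion.
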

4 Results

First of all, we analyzed all the collected videos, processing one frame per second. In total, 23,322 faces were detected out of 23,994, i.e., a detection success rate of 97% over the whole corpus. As far as the undetected faces are concerned, we noticed that most of them were almost totally obscured or occluded, while the model was robust enough to detect partially occluded faces or profiles. In each frame where a face was detected, the model was able to recognize an emotion. Over the whole memory training program, emotions were recognized correctly in 83% of the cases across all sessions, taking the manual annotation as reference. Figure 5 shows the percentage of emotions, grouped by valence (NEU = neutral, POS = positive, NEG = negative), annotated manually by the human observers (NEU_MANUAL, POS_MANUAL, NEG_MANUAL) compared with those automatically detected by the software (NEU_AUTO, POS_AUTO, NEG_AUTO) in the three days of the CST. In general, the FER model detected fewer neutral faces and more negative emotions than the human observers. This was probably because the human observers considered the context (e.g., negative states were sometimes detected when participants were looking down).

Fig. 5 Emotions during the three days of the CST

As far as emotions with a negative valence are concerned, Angry was never detected by the FER model, while human observers detected it in 0.4% of the cases. Disgust was identified by the model in 11.4% of the cases, while it was detected by the human observers in only 0.6% of the cases. The FER model never detected fear, which was detected by humans in only 0.2% of the video corpus. Sadness was recognized in 0.6% of the cases by human observers and in 19.8% of the cases by the FER model. Happiness was identified in 16% of the cases when annotated by humans and in 28.6% when detected by the model. Surprise was recognized in 3.6% of the cases by humans and, since it was not included in the FACES dataset, it could not be detected automatically. In conclusion, we can say that, for positive emotions, the developed FER model is in agreement with the annotation made by humans, while for negative emotions there is a larger difference. Finally, the neutral expression was identified in 40% of the videos by the software and in 79% by human observers.

Analyzing the emotional experience for each task (Table 4), we observed that the visual-verbal associative memory task stimulated the maximum occurrence of positive emotions, followed by the task concerning motor imitation. The worst task, in terms of detected emotion valence, was the memory of prose. This is probably due to the nature of the story selected by the therapists, which in our opinion conveyed content eliciting negative emotions.

Table 4 Emotions detected in each task by raters (manual) and by the FER software (auto)

Task                               Source       Neutral   Happy   Surprise   Anger   Disgust   Fear   Sad
Motor imitation                    Manual (%)   79        16      5          0       0         0      0
Motor imitation                    Auto (%)     39        34      N/A        0       8         0      19
Word completion                    Manual (%)   81        14      3          1       0         0      1
Word completion                    Auto (%)     47        19      N/A        0       9         0      25
Visual associative memory          Manual (%)   81        17      2          0       0         0      0
Visual associative memory          Auto (%)     39        29      N/A        0       10        0      22
Memory of prose                    Manual (%)   87        7       1          1       2         1      1
Memory of prose                    Auto (%)     49        23      N/A        0       21        0      7
Visual-Verbal associative memory   Manual (%)   68        23      7          0       1         0      1
Visual-Verbal associative memory   Auto (%)     27        38      N/A        0       9         0      26

The Pearson coefficient was calculated in order to evaluate linear correlations between the results of the behavioral observations and the neuropsychological evaluation scores. Both for human annotation and automatic recognition, seniors with higher MCI tended to experience mostly neutral emotions (r = 0.70 in the first case and r = 0.69 in the second one) and were less happy (r = −0.80 in both cases) than patients with a lower MMSE score. In our opinion, this could be due to the need for sessions personalized according to the MMSE, thus providing more stimulating tasks for them. However, this topic requires a deeper investigation.
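
The comparison by valence can be made concrete with a short Python sketch that groups the overall per-emotion percentages quoted in this section into NEU, POS and NEG. The grouping of surprise with the positive emotions is an assumption made for this illustration, as are all the names used; the input values are the overall percentages given in the text, not the per-day values plotted in Fig. 5.

```python
NEGATIVE = ("angry", "disgust", "fear", "sad")
POSITIVE = ("happy", "surprise")  # assumption: surprise counted as positive

# Overall percentages quoted in the text for the human raters.
MANUAL = {"neutral": 79.0, "happy": 16.0, "surprise": 3.6,
          "angry": 0.4, "disgust": 0.6, "fear": 0.2, "sad": 0.6}
# Overall percentages quoted for the FER model (surprise is not available).
AUTO = {"neutral": 40.0, "happy": 28.6,
        "angry": 0.0, "disgust": 11.4, "fear": 0.0, "sad": 19.8}


def by_valence(percentages):
    """Sum per-emotion percentages into neutral, positive and negative."""
    return {
        "NEU": percentages.get("neutral", 0.0),
        "POS": sum(percentages.get(e, 0.0) for e in POSITIVE),
        "NEG": sum(percentages.get(e, 0.0) for e in NEGATIVE),
    }


manual_valence = by_valence(MANUAL)
auto_valence = by_valence(AUTO)
```

With these inputs the automatic analysis yields about 31.2% negative emotions against 1.8% for the human raters, which makes the gap discussed above explicit, while the positive share differs far less (28.6% against 19.6%).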

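For reference, the Pearson coefficient used in the correlation analysis of Sect. 4 can be computed as in the following Python sketch. The function name and the two short series are hypothetical and purely illustrative: they are not the participants' data, and the values of r reported in the text cannot be reproduced from them.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / spread


# Hypothetical, illustrative series (not the study's data).
scores = [1, 2, 3, 4, 5]
rates = [2, 4, 5, 4, 5]
r_example = pearson_r(scores, rates)
```

A value close to +1 or −1 indicates a strong linear relation between the two series, while the sign gives its direction. With only 8 participants, as in this study, such coefficients should be read with caution.
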
5 Conclusions and Future Work

In this paper we presented the results of an experimental study carried out in the context of rehabilitation interventions for reducing cognitive decline in elderly people with MCI and mild dementia, based on the use of a Social Assistive Robot, namely Pepper. The study aimed at investigating how this technology can be used to support therapists in training programs in order to improve patients' cognitive state. In general, the evaluation and feedback from participants and therapists were positive, showed that the system was easy to use, and suggested future research challenges that the community should address. The older adults approached the humanoid robot as a human and as a stimulus to go to the center to do the cognitive stimulation program. They accepted Pepper as part of their group: for example, participants greeted Pepper, talked to the robot as an entity having its own personality and asked the robot to sing with them.

For the automatic recognition of the emotional reactions of patients, we developed a FER model specialized in recognizing emotions from the facial expressions of elderly faces. The results obtained so far are encouraging; however, in order to make the FER model more robust in real time and in the wild, we will explore the use of deep learning. To this aim, we will collect a larger dataset of facial expressions of old people.

In the near future, we will try to overcome some limitations that emerged from this study. A first limitation is that we could not make a comparison with a control group without the use of Pepper, as planned, due to the COVID-19 emergency. A second limitation concerns the sample size, since the experiments were made with only one group of people. Future work should involve a larger sample and a greater number of trials extended over a greater number of sessions. Besides emotion recognition, a further task that we plan to develop in future work is the automatic analysis of engagement, with the purpose of improving the user experience and creating a more engaging cognitive stimulation program by adapting the robot behavior to the users. The final aim of the research work will then be to analyze in real time the holistic behavior of the user, allowing the robot to understand and adapt to the user's reactions during the interaction.


Acknowledgements The authors would like to thank the “Alzheimer Bari” ONLUS, the two therapists Claudia Lograno and Claudia Chiapparino for their support, and all the seniors who participated in the experiment.

References

1. Cooper, C., Mukadam, N., Katona, C., Lyketsos, C.G., Ames, D., Rabins, P., Engedal, K., de Mendonça Lima, C., Blazer, D., Teri, L., et al.: Systematic review of the effectiveness of nonpharmacological interventions to improve quality of life of people with dementia. In: Database of Abstracts of Reviews of Effects (DARE): Quality-assessed Reviews [Internet]. Centre for Reviews and Dissemination (UK) (2012)
2. Pino, O.: Memory impairments and rehabilitation: evidence-based effects of approaches and training programs. Open Rehabil. J. 8(1), 25–33 (2015). https://doi.org/10.2174/1874943720150601E001
3. Rouaix, N., Retru-Chavastel, L., Rigaud, A.-S., Monnet, C., Lenoir, H., Pino, M.: Affective and engagement issues in the conception and assessment of a robot-assisted psycho-motor therapy for persons with dementia. Front. Psychol. 8, 950 (2017)
4. Law, M., Sutherland, C., Ahn, H.S., MacDonald, B.A., Peri, K., Johanson, D.L., Vajsakovic, D.-S., Kerse, N., Broadbent, E.: Developing assistive robots for people with mild cognitive impairment and mild dementia: a qualitative study with older adults and experts in aged care. BMJ Open 9(9), e031937 (2019)
5. Pino, O., Palestra, G., Trevino, R., De Carolis, B.: The humanoid robot nao as trainer in a memory program for elderly people with mild cognitive impairment. Int. J. Soc. Robot. 12(1), 21–33 (2020)
6. Valenti-Soler, et al.: Social robots in advanced dementia. Front. Aging Neurosci. 7, 5 (2015)
7. Manca, M., Paterno, F., Santoro, C., Zedda, E., Braschi, C., Franco, R., Sale, A.: The impact of serious games with humanoid robots on mild cognitive impairment older adults. Int. J. Hum. Comput. Stud. 102509 (2020)
8. https://www.softbankrobotics.com/emea/en/pepper
9. De Carolis, B., Carofiglio, V., Grimaldi, I., Macchiarulo, N., Palestra, G., Pino, O.: Using the pepper robot in cognitive stimulation therapy for people with mild cognitive impairment and mild dementia. In: ACHI 2020, The Thirteenth International Conference on Advances in Computer-Human Interactions, Valencia, Spain, 21–25 November 2020
10. Ebner, N.C., Johnson, M.K.: Age-group differences in interference from young and older emotional faces. Cogn. Emot. 24(7), 1095–1116 (2010)
11. Kasper, S., Bancher, C., Eckert, A., Förstl, H., Frölich, L., Hort, J., Korczyn, A.D., Kressig, R.W., Levin, O., Paloma, M.S.M.: Management of Mild Cognitive Impairment (MCI): the need for national and international guidelines. World J. Biol. Psychiatry 21(8), 579–594 (2020)
12. Kim, G.H., Jeon, S., Im, K., Kwon, H., Lee, B.H., Kim, G.Y., Jeong, H., Han, N.E., Seo, S.W., Cho, H., et al.: Structural brain changes after traditional and robot-assisted multi-domain cognitive training in community-dwelling healthy elderly. PLoS One 10(4), e0123251 (2015)
13. Mataric, M.J., Scassellati, B.: Socially assistive robotics. In: Springer Handbook of Robotics, pp. 1973–1994. Springer (2016)
14. Vogan, A.A., Alnajjar, F., Gochoo, M., Khalid, S.: Robots, AI, and cognitive training in an era of mass age-related cognitive decline: a systematic review. IEEE Access 8, 18284–18304 (2020)
15. Martín, F., Agüero, C.E., Cañas, J.M., Valenti, M., Martínez-Martín, P.: Robotherapy with Dementia patients. Int. J. Adv. Rob. Syst. (2013). https://doi.org/10.5772/54765
16. De Kok, R., Rothweiler, J., Scholten, L., van Zoest, M., Boumans, R., Neerincx, M.: Combining social robotics and music as a non-medical treatment for people with dementia. In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 465–467 (2018)
17. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980)
18. Esme, B., Sankur, B.: Effects of aging over facial feature analysis and face recognition (2010)
19. Ebner, N., Riediger, M., Lindenberger, U.: Faces: a database of facial expressions in young, middle-aged, and older women and men: development and validation. Behav. Res. Methods 42, 351–62 (2010)
20. Guo, G., Guo, R.-X., Li, X.: Facial expression recognition influenced by human aging. IEEE Trans. Affect. Comput. 4, 291–298 (2013)
21. Caroppo, A., Leone, A., Siciliano, P.: Facial expression recognition in older adults using deep machine learning. AI*AAL@AI*IA (2017)
22. Lopes, N., et al.: Facial emotion recognition in the elderly using a SVM classifier. In: 2018 2nd International Conference on Technology and Innovation in Sports, Health and Wellbeing (TISHW), pp. 1–5 (2018). https://doi.org/10.1109/TISHW.2018.8559494
23. Folstein, M.F., Folstein, S.E., McHugh, P.R.: “mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12(3), 189–198 (1975)
24. Fleiss, J., Cohen, J.: The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ. Psychol. Measur. 33, 613–619 (1973)
25. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Sig. Process. Lett. 23(10), 1499–1503 (2016)
26. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto (1978)
27. Baltrušaitis, T., Zadeh, A., Lim, Y.C., Morency, L.-P.: OpenFace 2.0: facial behavior analysis toolkit. In: IEEE International Conference on Automatic Face and Gesture Recognition (2018)
28. Breiman, L.: Random Forests. Mach. Learn. 45, 5–32 (2001)
29. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
30. Weston, J., Watkins, C.: Multi-class support vector machine (1999)

Mapping Finger Motions on Anthropomorphic Robotic Hands: Two Realizations of a Hybrid Joint-Cartesian Approach Based on Spatial In-Hand Information

Roberto Meattini, Davide Chiaravalli, Gianluca Palli, and Claudio Melchiorri

Abstract In literature, two sub-problems are typically identified for the replication of human finger motions on artificial hands: the measurement of the motions on the human side, and the mapping method of human hand movements on the robotic hand. In this study, we focus on the second sub-problem. During human-to-robot hand mapping, ensuring natural motions and predictability for the operator is a difficult task, since it requires preserving both the Cartesian position of the fingertips and the finger shapes given by the joint values. Several approaches have been presented to deal with this problem, which is still unresolved in general. In this work, we propose an approach for combining joint and Cartesian mapping in a single method. More specifically, we exploit the spatial information available in-hand, in particular the information related to the thumb-fingers relative position. In this way, it is possible to perform both volar grasps (where the preservation of finger shapes is more important) and precision grips (where the preservation of fingertip positions is more important) during primary-to-target hand mappings, even if kinematic dissimilarities are present. We therefore report on two specific realizations of this approach: a distance-based hybrid mapping, in which the transition between joint and Cartesian mapping is driven by the approaching of the fingers to the current thumb fingertip position, and a workspace-based hybrid mapping, in which the joint-Cartesian transition is defined on the areas of the workspace in which thumb and finger fingertips can get in contact.

This work was supported by the European Commission's Horizon 2020 Framework Programme with the project REMODEL (Robotic technologies for the manipulation of complex deformable linear objects) under Grant 870133.

R. Meattini (B) · D. Chiaravalli · G. Palli · C. Melchiorri
Department of Electrical, Electronic and Information Engineering (DEI), University of Bologna, Viale del Risorgimento 2, 40136 Bologna, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_6


Keywords Multifingered hands · Human-centered robotics · Grasping · Dexterous manipulation · Telerobotics and teleoperation

1 Introduction

The correct mapping of human hand motions on anthropomorphic robotic hands is a quintessential problem of telerobotics, which remains unsolved [1]. Indeed, even considering anthropomorphic robotic hands, the presence of unavoidable kinematic dissimilarities makes the replication of human primary hand (PH) motions on the robotic target hand (TH) anything but straightforward. This means that it is not possible to precisely emulate the PH's phalanx motions and, consequently, to achieve a truly predictable and intuitive behaviour of the TH. Looking at related works, the problem of mapping human to robot hand motions can be divided into two sub-problems: (i) the technological challenge of tracking human hand motions with appropriate sensor equipment; (ii) the theoretical challenge of effectively and functionally mapping human finger motions on the robotic counterpart. In this study, we will focus on the second sub-problem, which is both conceptual and analytical, and still without a general solution [2]. Incidentally, the use of sensorized exoskeletons and gloves has already solved the first sub-problem with good results, e.g. see [3–6].

Many mapping methods have been presented in literature, among which the joint [7] and Cartesian [3] mappings are the most popular. In direct joint mapping, the PH joints are directly imposed on the robotic hand, making this method particularly appropriate for gestures and power grasps, where the preservation of the finger shapes is of primary importance [8]. On the other hand, Cartesian mapping imposes on the TH the fingertip poses, i.e. the fingertip positions and orientations [9], and the joint angles are then determined by computing the inverse kinematics. In this way, the preservation of the PH finger shapes is not guaranteed in general [10], and therefore direct Cartesian mapping is normally preferred for the mapping of precision grasps. A different, more advanced mapping method consists in the definition of virtual objects in the workspace of the PH, which are then reported on the TH side in order to obtain coordinated finger motions within the robotic hand workspace [11]. However, if precision of the fingertip positions is required, this method can produce unintuitive motions on the TH [12], even when extended to a generalized description of the shape of the virtual object [13]. Other types of mappings are based on recognizing a finite number of human hand postures related to predetermined TH configurations [14, 15]. In this case, the main issue consists in the generation of unexpected motions outside of the set of considered PH gestures. In general, the approaches proposed in previous works mostly disregard the importance of preserving intuitiveness and predictability of the motions during primary-to-target hand mapping, which instead can be properly achieved, to a certain extent, by preserving finger shapes and fingertip positions within the adopted method.


In this work, we propose a hybrid joint-Cartesian mapping approach, which exploits in-hand spatial information. Specifically, two realizations of this concept are reported and evaluated in this article: (i) a distance-based hybrid mapping, in which a transition between the joint and Cartesian mappings is enforced on the basis of the distance between the fingertips of the PH's thumb and opposite fingers, and (ii) a workspace-based hybrid mapping, which exploits the areas of the primary and target hand workspaces in which the thumb and finger fingertips can come into contact. In the following, these methods are presented and evaluated.

2 Distance-Based Hybrid Mapping

Most precision grasps are characterized by the close proximity of a finger's fingertip and the thumb endpoint. Based on this assumption, a hybrid control strategy that switches between joint and Cartesian mapping according to the PH thumb-finger distance is considered.

2.1 Algorithm and Switching Condition

Let us consider a virtual sphere of radius $r_1$ centered in the PH thumb endpoint ${}^{PH}p_T$. According to the position ${}^{PH}p_j$, with $j = \{I, M, R, P\}$, of the PH fingertips for the index, middle, ring and pinkie fingers respectively, a joint mapping

\[ {}^{TH}q = {}^{PH}q, \tag{1} \]

where ${}^{TH}q$ and ${}^{PH}q$ are the TH and PH joints, is enforced when the fingertips are placed outside the sphere and the inequality

\[ \left\| {}^{PH}p_T - {}^{PH}p_j \right\| > r_1 \tag{2} \]

is satisfied. Conversely, when the measured position of a finger $h \subset \{I, M, R, P\}$ is below the distance threshold

\[ \left\| {}^{PH}p_T - {}^{PH}p_h \right\| \le r_1, \tag{3} \]

then a switch in the control for that finger is implemented and a Cartesian mapping is exploited

\[ {}^{TH}p_{S_h} = c_s \left( {}^{PH}p_h - {}^{PH}p_T \right) + {}^{TH}p_{S_T}, \tag{4} \]

where ${}^{TH}p_{S_h}$ represents the TH's fingertip positions of the fingers satisfying Eq. (3), $c_s$ is a scaling factor given by the ratio of the TH and PH thumb lengths and ${}^{TH}p_{S_T}$ is the TH's thumb fingertip position.

A different Cartesian mapping is defined for the TH thumb in case Eq. (3) is satisfied for at least one finger. Let us consider the Cartesian components of the fingertip positions ${}^{PH}p_i = [\,{}^{PH}p_{i x}^{T} \;\; {}^{PH}p_{i y}^{T} \;\; {}^{PH}p_{i z}^{T}\,]^{T}$ and ${}^{TH}p_{S_i} = [\,{}^{TH}p_{S_i x}^{T} \;\; {}^{TH}p_{S_i y}^{T} \;\; {}^{TH}p_{S_i z}^{T}\,]^{T}$, with $i = \{T, I, M, R, P\}$. A scaling factor $d_{\{l,t\}}$, with $\{l, t\} = \{\{I, M\}, \{M, R\}, \{R, P\}\}$, characterizing the relative distance between the PH and TH fingers base frame can be defined

\[ d_{\{l,t\}} = \frac{\left\| {}^{TH}p_{O_{S_t} x} - {}^{TH}p_{O_{S_l} x} \right\|}{\left\| {}^{PH}p_{O_t x} - {}^{PH}p_{O_l x} \right\|} . \]

Given the thumb position with respect to a finger frame $m = \{I, M, R\}$, the Cartesian mapping for the TH thumb is defined in the x component as

\[ {}^{TH}p_{S_T x} = \begin{cases} {}^{TH}p_{S_T x 1}, & \text{if } {}^{PH}p_{T x} \ge {}^{PH}p_{O_I x} \\ {}^{TH}p_{S_T x 2}, & \text{if } {}^{PH}p_{O_M x} < {}^{PH}p_{T x} \le {}^{PH}p_{O_I x} \\ {}^{TH}p_{S_T x 3}, & \text{if } {}^{PH}p_{O_R x} < {}^{PH}p_{T x} \le {}^{PH}p_{O_M x} \\ {}^{TH}p_{S_T x 4}, & \text{if } {}^{PH}p_{O_P x} < {}^{PH}p_{T x} \le {}^{PH}p_{O_R x} \\ {}^{TH}p_{S_T x 5}, & \text{if } {}^{PH}p_{T x} < {}^{PH}p_{O_P x}, \end{cases} \tag{5} \]

where

\[ {}^{TH}p_{S_T x 1} = \left[ {}^{TH}T_{O_{S_I}} \, c_s \, {}^{O_I}p_T \right]_x , \]
\[ {}^{TH}p_{S_T x 2} = \left[ {}^{TH}T_{O_{S_I}} \, {}^{O_I}p_T \right]_x + d_{\{I,M\}} \left[ \tilde{p}_I \right]_x , \]
\[ {}^{TH}p_{S_T x 3} = \left[ {}^{TH}T_{O_{S_M}} \, {}^{O_M}p_T \right]_x + d_{\{M,R\}} \left[ \tilde{p}_M \right]_x , \]
\[ {}^{TH}p_{S_T x 4} = \left[ {}^{TH}T_{O_{S_R}} \, {}^{O_R}p_T \right]_x + d_{\{R,P\}} \left[ \tilde{p}_R \right]_x , \]
\[ {}^{TH}p_{S_T x 5} = \left[ {}^{TH}T_{O_{S_P}} \, c_s \, {}^{O_P}p_T \right]_x ; \]

with ${}^{TH}T_{O_{S_i}}$, $i = \{T, I, M, R, P\}$, the homogeneous linear transform between the TH $i$ finger frame $\{O_{S_i}\}$ and the TH base frame, and the notation $[h]_j$ describing the component of the vector $h$ along axis $j$. This specific mapping allows to preserve through the palm of the TH hand the scaled distance between the PH thumb endpoint and the base frame of the other fingers. Instead, for the y and z components the mapping is implemented according to the distance between the PH thumb endpoint and the PH index finger base frame:

\[ {}^{TH}p_{S_T y} = \left[ {}^{TH}T_{O_{S_I}} \, {}^{O_I}p_T \, c_s \right]_y \tag{6} \]

and

\[ {}^{TH}p_{S_T z} = \left[ {}^{TH}T_{O_{S_I}} \, {}^{O_I}p_T \, c_s \right]_z . \tag{7} \]


2.2 Transition and Online Algorithm

A transition function preserves the smoothness of the motion during the switch between the two mapping strategies. The input position for the TH fingertips $p_{input}$ is obtained as a blend

$$p_{input} = (I - K)\, p_Q + K\, p_C, \tag{8}$$

between the fingertip Cartesian position $p_Q = F_S(q)$, with $F_S(\cdot)$ the forward kinematic function of the TH, defined according to the joint mapping (Eq. (1)), and the fingertip Cartesian position $p_C$ defined according to the Cartesian mapping. The matrix $I$ is a $15 \times 15$ identity matrix while

$$K = \mathrm{diag}(\underbrace{k_{S_T}, \ldots, k_{S_T}}_{3\ \text{times}}, \ldots, \underbrace{k_{S_i}, \ldots, k_{S_i}}_{3\ \text{times}}, \ldots, \underbrace{k_{S_P}, \ldots, k_{S_P}}_{3\ \text{times}}),$$

where $i = \{T, I, M, R, P\}$ represents the thumb, index, middle, ring and pinkie finger respectively. $k_{S_i}$ defines the nonlinear sigmoid-like profile implementing the smooth transition when a PH finger moves between the two spheres centered in the thumb and characterized by radii $r_1$ and $r_2$ ($r_2 > r_1$)

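$$k_{S_i}(\delta_i) = \begin{cases}
1, & \text{if } \delta_i < r_1 \\
\frac{1}{2}\left(1 + \cos\left(\frac{(\delta_i - r_1)\,\pi}{r_2 - r_1}\right)\right), & \text{if } r_1 \le \delta_i \le r_2 \\
0, & \text{if } \delta_i > r_2,
\end{cases} \tag{9}$$

with $\delta_i = \left\| {}^{PH}p_T - {}^{PH}p_i \right\|$ and $i = \{I, M, R, P\}$. The middle branch equals 1 at $\delta_i = r_1$ and 0 at $\delta_i = r_2$, so the gain is continuous across the three branches.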
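The distance-based blend of Eqs. (4), (8) and (9) can be sketched for a single fingertip as follows. This is a minimal illustration, not the authors' implementation: the function names and all numeric values are hypothetical, and the middle branch of the gain is assumed to be the raised cosine $\frac{1}{2}(1 + \cos(\pi(\delta - r_1)/(r_2 - r_1)))$, chosen because it matches the outer branches at $r_1$ and $r_2$.

```python
import math

def transition_gain(delta, r1, r2):
    """Gain k of Eq. (9): 1 inside radius r1, 0 outside r2, raised cosine between.
    The raised-cosine form of the middle branch is an assumption of this sketch."""
    if delta < r1:
        return 1.0
    if delta > r2:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (delta - r1) / (r2 - r1)))

def cartesian_reference(ph_tip, ph_thumb, th_thumb, c_s):
    """Eq. (4): scaled PH thumb-to-fingertip offset applied at the TH thumb tip."""
    return [c_s * (p - t) + s for p, t, s in zip(ph_tip, ph_thumb, th_thumb)]

def blend(p_q, p_c, k):
    """Eq. (8) restricted to one fingertip: (1 - k) * p_Q + k * p_C."""
    return [(1.0 - k) * q + k * c for q, c in zip(p_q, p_c)]

# Hypothetical values (metres), for illustration only.
ph_thumb = [0.00, 0.00, 0.00]
ph_index = [0.03, 0.00, 0.00]
th_thumb = [0.10, 0.02, 0.05]
p_q = [0.16, 0.02, 0.05]  # fingertip given by forward kinematics of the joint mapping

delta = math.dist(ph_thumb, ph_index)
k = transition_gain(delta, r1=0.02, r2=0.04)
p_c = cartesian_reference(ph_index, ph_thumb, th_thumb, c_s=1.5)
p_input = blend(p_q, p_c, k)
```

With the fingertip halfway between the two spheres, the gain is one half and the input is the midpoint of the joint-mapping and Cartesian references. Stacking one gain per fingertip, repeated three times, gives the diagonal of $K$.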

2.3 Simulation and Results

We performed a simulation experiment in which a set of motions of a PH was mapped onto a simulated target robotic hand. As PH we used the paradigmatic hand model, simulated with the open-source SynGrasp Matlab toolbox [16]. As target robotic hand, for this test we used the simulator of the commercially available Allegro Hand [17], which is implemented in ROS with Rviz. Figures 2, 3 and 4 report the signals involved in the mapping of a tripodal motion on the Allegro Hand (visualized in Fig. 1). In these figures, the switching between the Cartesian reference $p_Q$ produced by the joint mapping (blue dotted line) and the Cartesian reference $p_C$ provided by the Cartesian mapping (black dotted line) can be appreciated, together with the sigmoid-like transition. The plots in the bottom two rows of Figs. 2, 3 and 4 report the related values of $K$ and of the shape and Cartesian errors, which are equal to zero in the related "joint-only" and "Cartesian-only" phases.


Fig. 1 Visual frames of the PH and the Allegro Hand simulator during the execution of a tripodal motion

Fig. 2 Plot of pC , p Q , pinput , k S , e Q (shape error) and eC (Cartesian error) for the thumb, index and middle fingers (columns from left to right) during the tripodal closure motion visually reported in Fig. 1

3 Workspace-Based Hybrid Mapping

3.1 Joint-Cartesian Mapping Transition

In the following, for the sake of simplicity, only the control for the thumb and index finger is presented; the same strategy can be applied to all the other fingers. Let us consider the convex hull ${}^{M}H$, characterized by the set of discretized points $B_{{}^{M}H} \supseteq {}^{M}H$ obtained by the intersection of the thumb and index finger configuration spaces. Such a subspace describes all possible fingertip contact points for the two fingers. Based on this assumption we want to impose a Cartesian


Fig. 3 Plot over time of pC , p Q , pinput , k S , e Q (shape error) and eC (Cartesian error) for the thumb, index and middle fingers (columns from left to right) during the tripodal closure motion visually reported in Fig. 1

Fig. 4 Plot over time of pC , p Q , pinput , k S , e Q (shape error) and eC (Cartesian error) for the thumb, index and middle fingers (columns from left to right) during the tripodal closure motion visually reported in Fig. 1


control inside the convex hull, in order to produce precise motion when the fingers come in contact with each other, and joint control outside, to preserve shape during wide motions. Moreover, we want to define a transition function to move smoothly between the two mapping strategies, guaranteeing the consistency of motion for the fingers. To this purpose, let us consider ${}^{M}H_s$ as the scaling by a factor $s > 1$ of the convex hull ${}^{M}H$ with respect to its centroid and $B_{{}^{M}H_s}$ as the region in PH space delimited by it. The region $B_H = B_{{}^{M}H_s} \setminus B_{{}^{M}H}$, containing all PH points that do not belong to both hulls, can be exploited to impose a transition function $f$ between the two mapping strategies:

$$f : B_H \subset \mathbb{R}^3 \rightarrow \mathbb{R} \tag{10}$$

that linearly interpolates the nodes $(h_{s,i}, 0)$ and $(h_i, 1)$, with $h_i \in {}^{M}H$ and $h_{s,i} \in {}^{M}H_s$. By proper normalization of the transition function, the points on the external hull ${}^{M}H_s$ are mapped as $f(h_{s,i}) = 0, \; \forall h_{s,i} \in {}^{M}H_s$ and those on the internal one ${}^{M}H$ as $f(h_i) = 1, \; \forall h_i \in {}^{M}H$. In between, a radial basis function (RBF) efficiently solves the multidimensional interpolation problem [18]. According to the respective thumb and index positions ${}^{M}p_T$ and ${}^{M}p_I$ with respect to the hull ${}^{M}H$, it is then possible to define two functions, a thumb transition $f_T = f({}^{M}p_T)$ and an index transition $f_I = f({}^{M}p_I)$, that combined determine the thumb-index transition function $f_{TI}(f_T, f_I)$, or simply $f_{TI}$:

$$f_{TI}(f_T, f_I) = \begin{cases} f_T, & \text{if } f_T \le f_I \\ f_I, & \text{if } f_T > f_I. \end{cases} \tag{11}$$
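The role of the transition function $f$ and of Eq. (11) can be illustrated with a deliberately simplified sketch. Two assumptions are made here that are not in the original method: an axis-aligned box stands in for the convex hull ${}^{M}H$, and a linear interpolation along rays from the centroid replaces the RBF interpolant. All names and values are hypothetical.

```python
def gauge(p, centroid, half_extents):
    """Scale factor at which a box centred on `centroid` would just reach p.
    Values <= 1 mean p is inside the (stand-in) hull."""
    return max(abs(pi - gi) / hi for pi, gi, hi in zip(p, centroid, half_extents))

def transition_f(p, centroid, half_extents, s):
    """f = 1 on or inside the hull, 0 on or outside the hull scaled by s > 1,
    linear in between (a simplification of the RBF interpolation)."""
    lam = gauge(p, centroid, half_extents)
    return min(1.0, max(0.0, (s - lam) / (s - 1.0)))

def thumb_index_transition(f_t, f_i):
    """Eq. (11): the smaller of the thumb and index transition values."""
    return f_t if f_t <= f_i else f_i

# Hypothetical hull: unit box around the origin, scaled hull with s = 2.
centroid, half_extents, s = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 2.0
f_thumb = transition_f([0.5, 0.0, 0.0], centroid, half_extents, s)  # inside the hull
f_index = transition_f([1.5, 0.0, 0.0], centroid, half_extents, s)  # between the hulls
f_ti = thumb_index_transition(f_thumb, f_index)
```

Because Eq. (11) takes the smaller value, the pair is treated as being as far outside the hull as its outermost fingertip.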

The transition function $f_{TI}$ allows a smooth motion between the joint and Cartesian mapping strategies. When joint mapping is applied, the joint references for the thumb and index fingers ${}^{S}q_{T,Q}$ and ${}^{S}q_{I,Q}$ are directly provided according to the PH joint values. Therefore the Cartesian positions ${}^{S}p_{T,Q}$, ${}^{S}p_{I,Q}$ and orientations ${}^{S}o_{T,Q}$, ${}^{S}o_{I,Q}$ of the two fingers are easily determined through standard forward kinematics ${}^{S}x_{T,Q} = {}^{S}F_T({}^{S}q_{T,Q})$ and ${}^{S}x_{I,Q} = {}^{S}F_I({}^{S}q_{I,Q})$. The Cartesian pose of the fingers is therefore described as

$${}^{S}x_{T,Q} = \begin{bmatrix} {}^{S}p_{T,Q} \\ {}^{S}o_{T,Q} \end{bmatrix} \quad \text{and} \quad {}^{S}x_{I,Q} = \begin{bmatrix} {}^{S}p_{I,Q} \\ {}^{S}o_{I,Q} \end{bmatrix}, \tag{12}$$

with the orientation vector characterized by any suitable representation, such as quaternions or Euler angles. In a similar way, when Cartesian mapping is applied, the positions ${}^{S}p_{T,C}$, ${}^{S}p_{I,C}$ and orientations ${}^{S}o_{T,C}$, ${}^{S}o_{I,C}$ of the thumb and index fingers of the TH are directly generated by the mapping and immediately available in the pose vectors ${}^{S}x_{T,C}$ and ${}^{S}x_{I,C}$:

$${}^{S}x_{T,C} = \begin{bmatrix} {}^{S}p_{T,C} \\ {}^{S}o_{T,C} \end{bmatrix} \quad \text{and} \quad {}^{S}x_{I,C} = \begin{bmatrix} {}^{S}p_{I,C} \\ {}^{S}o_{I,C} \end{bmatrix}. \tag{13}$$


Finally, the new hybrid mapping defines a trade-off between the two mapping contributions in the form

$${}^{S}p_{T,input} = (1 - k_{TI})\, {}^{S}p_{T,Q} + k_{TI}\, {}^{S}p_{T,C}, \qquad {}^{S}p_{I,input} = (1 - k_{TI})\, {}^{S}p_{I,Q} + k_{TI}\, {}^{S}p_{I,C}, \tag{14}$$

for the control of the position inputs ${}^{S}p_{T,input}$, ${}^{S}p_{I,input}$, with $k_{TI}$ implementing the sigmoidal smooth transition

$$k_{TI} = \begin{cases}
0, & \text{if } {}^{M}p_T, {}^{M}p_I \notin B_{{}^{M}H_s} \\
1, & \text{if } {}^{M}p_T, {}^{M}p_I \in B_{{}^{M}H} \\
\frac{1}{2}\left(1 - \cos(\pi f_{TI})\right), & \text{if } {}^{M}p_T, {}^{M}p_I \in B_H
\end{cases} \tag{15}$$

and

$${}^{S}o_{T,input} = \begin{cases} {}^{S}o_{T,Q}, & \text{if } k_{TI} = 0 \\ {}^{SM}o_T, & \text{otherwise} \end{cases} \tag{16}$$

$${}^{S}o_{I,input} = \begin{cases} {}^{S}o_{I,Q}, & \text{if } k_{TI} = 0 \\ {}^{SM}o_I, & \text{otherwise} \end{cases} \tag{17}$$

for the control of the orientation inputs ${}^{S}o_{T,input}$, ${}^{S}o_{I,input}$. In the previous equations the orientation inputs ${}^{SM}o_T$ and ${}^{SM}o_I$ are defined as the orientations of the PH fingers with respect to their base frames, consistently described with respect to the frame of the respective finger in the TH. In particular, let us consider the rotation matrices ${}^{M}R_{o_I}$ and ${}^{M}R_{o_T}$ describing the index and thumb finger orientations with respect to their base frames $o_I$ and $o_T$, and the rotation matrices $R_{SM_I}$ and $R_{SM_T}$ describing the change in orientation between the PH and TH finger frames. Then the TH reference orientations for the index and thumb fingers are obtained as ${}^{SM}R_{o_I} = R_{SM_I}\, {}^{M}R_{o_I}$ and ${}^{SM}R_{o_T} = R_{SM_T}\, {}^{M}R_{o_T}$. Finally, the input joint references for the TH index and thumb joints ${}^{S}q_I$ and ${}^{S}q_T$ are obtained through inverse kinematics

$${}^{S}q_T = {}^{S}I_T({}^{S}x_{T,input}), \qquad {}^{S}q_I = {}^{S}I_I({}^{S}x_{I,input}), \tag{18}$$

with

$${}^{S}x_{T,input} = \begin{bmatrix} {}^{S}p_{T,input} \\ {}^{S}o_{T,input} \end{bmatrix} \quad \text{and} \quad {}^{S}x_{I,input} = \begin{bmatrix} {}^{S}p_{I,input} \\ {}^{S}o_{I,input} \end{bmatrix}. \tag{19}$$

A spatial sigmoid-like profile without discontinuities is enforced through Eqs. (14)–(15), and a smooth transition is achieved when switching between the two mapping strategies.
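As a rough illustration of Eqs. (14)–(17), the sketch below computes the gain $k_{TI}$ from a transition value $f_{TI}$ and applies it to the position and orientation inputs of one finger. It assumes $f_{TI}$ has already been clipped to $[0, 1]$ (0 outside the scaled hull, 1 inside the inner hull), so the three branches of Eq. (15) collapse into one expression. The function names and numbers are hypothetical, and orientations are treated as opaque values.

```python
import math

def k_ti(f_ti):
    """Eq. (15) written through f_TI, assuming f_TI is already in [0, 1]."""
    return 0.5 * (1.0 - math.cos(math.pi * f_ti))

def hybrid_position(p_q, p_c, k):
    """Eq. (14) for one finger: (1 - k) * p_Q + k * p_C."""
    return [(1.0 - k) * q + k * c for q, c in zip(p_q, p_c)]

def hybrid_orientation(o_q, o_sm, k):
    """Eqs. (16)-(17): joint-mapping orientation only while k is exactly 0."""
    return o_q if k == 0.0 else o_sm

# Hypothetical references for the thumb, metres and quaternions (w, x, y, z).
p_q, p_c = [0.10, 0.00, 0.05], [0.12, 0.02, 0.05]
o_q, o_sm = (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)

k = k_ti(0.5)  # fingertips halfway through the transition region
p_input = hybrid_position(p_q, p_c, k)
o_input = hybrid_orientation(o_q, o_sm, k)
```

Note that the position is blended continuously, whereas the orientation reference switches to the PH-derived one as soon as the gain leaves zero.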


3.2 Cartesian Mapping

Let us consider the PH convex hull ${}^{M}H$ and its centroid ${}^{M}g$ described with respect to the PH world reference frame $\{{}^{M}O\}$. A different representation ${}^{M}H^*$ and ${}^{M}g^*$ can be obtained with respect to the reference frame placed in ${}^{M}g$, with the $y$-$z$ plane parallel to the PH index's frontal plane and the $x$-$z$ plane parallel to the PH index's sagittal median plane. It is then possible to define a transformation $T_{{}^{M}G}$ between the two frames $\{{}^{M}G\}$ and $\{{}^{M}O\}$. A similar operation can be performed in the TH by defining a new reference frame $\{{}^{S}G\}$ placed in the centroid ${}^{S}g$ of the convex hull ${}^{S}H$ and oriented with the $y$-$z$ plane parallel to the TH index's frontal plane and the $x$-$z$ plane parallel to the TH index's sagittal median plane. Now we can define the convex hull ${}^{S}H^*$, representing ${}^{S}H$ described with respect to $\{{}^{S}G\}$, and the transformation matrix $T_{{}^{S}G}$ between $\{{}^{S}G\}$ and the TH world frame. Therefore we can consider a new convex hull ${}^{SM}H^*$, characterized by the points of ${}^{M}H^*$ replicated in the TH space with respect to $\{{}^{S}G\}$, and a new set ${}^{SM}H$ describing ${}^{SM}H^*$ with respect to the TH world frame $\{{}^{S}O\}$. Moreover, a scaling factor $b$ is applied to ${}^{SM}H$, giving ${}^{SM}_{b}H$, where $b$ is chosen such that

$$\max_{[\hat{x}\ \hat{y}\ \hat{z}]^T \in\, {}^{SM}_{b}H^*} (|\hat{x}|) \; = \max_{[x\ y\ z]^T \in\, {}^{S}H^*} (|x|), \tag{20}$$

with ${}^{SM}_{b}H^*$ as ${}^{SM}_{b}H$ defined with respect to $\{{}^{S}G\}$. This choice is driven by the necessity to contain the range of values of ${}^{S}H^*$ within the range of values of ${}^{SM}_{b}H^*$ along the $x$ axis. On this basis the new Cartesian mapping is defined. Let us consider the TH thumb and index fingertip positions ${}^{SM}p_T$ and ${}^{SM}p_I$ obtained by: (i) considering the PH thumb and index fingertips ${}^{M}p_T$ and ${}^{M}p_I$ with respect to the frame $\{{}^{M}G\}$, then (ii) placing them in the TH with respect to the frame $\{{}^{S}G\}$, and finally (iii) evaluating them with respect to the world frame $\{{}^{S}O\}$. ${}^{SM}p_T$ and ${}^{SM}p_I$ are then defined as

$${}^{SM}p_T = T_{{}^{S}G}\, (T_{{}^{M}G})^{-1}\, {}^{M}p_T, \qquad {}^{SM}p_I = T_{{}^{S}G}\, (T_{{}^{M}G})^{-1}\, {}^{M}p_I. \tag{21}$$

Finally, the Cartesian positions of the TH thumb and index fingertips ${}^{S}p_{T,C}$ and ${}^{S}p_{I,C}$ are evaluated as

$${}^{S}p_{T,C} = b\,({}^{SM}p_T - {}^{S}g) + {}^{S}g, \qquad {}^{S}p_{I,C} = b\,({}^{SM}p_I - {}^{S}g) + {}^{S}g. \tag{22}$$
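The chain of Eqs. (20)–(22) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: each transform is stored as a rigid pair `(R, t)` that maps coordinates in the centroid frame to the corresponding world frame, the hulls are plain lists of points in $\{{}^{S}G\}$, and every name and number is hypothetical.

```python
def apply(T, p):
    """Apply a rigid transform T = (R, t) to a 3D point: R p + t."""
    R, t = T
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def inverse(T):
    """Inverse of a rigid transform: (R^T, -R^T t)."""
    R, t = T
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    return Rt, [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]

def scale_factor(sm_hull_star, s_hull_star):
    """Eq. (20): b makes the largest |x| of the scaled PH hull match the TH hull."""
    return max(abs(p[0]) for p in s_hull_star) / max(abs(p[0]) for p in sm_hull_star)

def cartesian_mapping(m_p, T_mg, T_sg, b, s_g):
    """Eqs. (21)-(22): express the PH fingertip in {M G}, replicate it in {S G},
    evaluate it in the TH world frame, then scale it about the centroid S g."""
    sm_p = apply(T_sg, apply(inverse(T_mg), m_p))
    return [b * (x - g) + g for x, g in zip(sm_p, s_g)]

# Hypothetical frames: identity rotations, centroids on the x axis (metres).
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T_mg = (I3, [0.1, 0.0, 0.0])
T_sg = (I3, [0.2, 0.0, 0.0])
b = scale_factor([[0.5, 0.0, 0.0], [-0.4, 0.0, 0.0]], [[1.0, 0.0, 0.0], [-0.8, 0.0, 0.0]])
p_c = cartesian_mapping([0.13, 0.01, 0.0], T_mg, T_sg, b, s_g=[0.2, 0.0, 0.0])
```

In this example the fingertip offset from the PH centroid, (0.03, 0.01, 0), is doubled and re-attached to the TH centroid.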


Fig. 5 Tips-to-tips motion of the PH (paradigmatic hand) mapped to the TH (colored simulated UBHand)

3.3 Experimental Evaluation

The experimental tests involved the simulated paradigmatic hand model as PH (available from the open-source SynGrasp Matlab toolbox [16]) and the University of Bologna Hand IV (UBHand), a fully actuated robotic hand [19].

3.3.1 Simulation Experiment

The first tests were performed offline in the simulation environment by prerecording a reference trajectory composed of four consecutive motions. During each motion the hand started completely open, a thumb-finger pair came into contact and then returned to the initial configuration, following the sequence thumb-index, thumb-middle, thumb-ring and thumb-pinkie as shown in Fig. 5. During the motion, the TH fingers outside a thumb-finger convex hull preserve the PH shape, since joint mapping is exploited. Conversely, the thumb-finger pair performing the tapping motion enters the respective convex hull and switches the control from joint to Cartesian mapping to obtain a precise motion and tapping of the fingers. Note the clear transition between the two mapping strategies in the bottom plot, where the gains related to the transition for the TH are reported.

3.3.2 Test with the Real UBHand

Finally, some tests with the physical TH device were performed. Figure 6 reports some frames of the hand motion when pouring a liquid into a container (volar grasp, see Fig. 6a) and drawing a line (precision grasp, Fig. 6b). The proposed control algorithm allows, at the same time, precision grasping through the Cartesian mapping inside the control convex hulls and shape preservation through joint mapping outside.


Fig. 6 a Volar grasp application: pouring mock-up liquid from a bottle. b Precision grasp application: drawing a line with a marker

4 Conclusions

In this article a hybrid joint-Cartesian approach based on spatial in-hand information has been proposed for the mapping of motions on anthropomorphic robotic hands. In particular, two specific realizations of this mapping concept, the distance-based and the workspace-based hybrid mappings, have been presented. Both preserved the PH finger shapes and produced a correct mapping of the TH fingertips through a transition between the joint and Cartesian control spaces. Future developments will explore the possibility of varying the shapes of the thumb-centered sphere (distance-based realization) and of the thumb-finger convex hulls (workspace-based realization) in an on-line fashion.

References

1. Shahbazi, M., Atashzar, S.F., Patel, R.V.: A systematic review of multilateral teleoperation systems. IEEE Trans. Haptics 11(3), 338–356 (2018)
2. Colasanto, L., Suárez, R., Rosell, J.: Hybrid mapping for the assistance of teleoperated grasping tasks. IEEE Trans. Syst. Man Cybern. Syst. 43(2), 390–401 (2012)
3. Rohling, R.N., Hollerbach, J.M.: Optimized fingertip mapping for teleoperation of dextrous robot hands. In: [1993] Proceedings IEEE International Conference on Robotics and Automation, pp. 769–775. IEEE (1993)
4. Erol, A., Bebis, G., Nicolescu, M., Boyle, R.D., Twombly, X.: A review on vision-based full dof hand motion estimation. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)-Workshops, p. 75. IEEE (2005)
5. Bergamasco, M., Frisoli, A., Avizzano, C.A.: Exoskeletons as man-machine interface systems for teleoperation and interaction in virtual environments. In: Advances in Telerobotics, pp. 61–76. Springer (2007)
6. Bianchi, M., Salaris, P., Bicchi, A.: Synergy-based hand pose sensing: optimal glove design. Int. J. Robot. Res. 32(4), 407–424 (2013)
7. Speeter, T.H.: Transforming human hand motion for telemanipulation. Presence: Teleoper. Virt. Environ. 1(1), 63–79 (1992)
8. Cerulo, I., Ficuciello, F., Lippiello, V., Siciliano, B.: Teleoperation of the schunk s5fh underactuated anthropomorphic hand using human hand motion tracking. Robot. Auton. Syst. 89, 75–84 (2017)
9. Rohling, R.N., Hollerbach, J.M.: Calibrating the human hand for haptic interfaces. Presence: Teleoper. Virt. Environ. 2(4), 281–296 (1993)
10. Chattaraj, R., Bepari, B., Bhaumik, S.: Grasp mapping for dexterous robot hand: a hybrid approach. In: 2014 Seventh International Conference on Contemporary Computing (IC3), pp. 242–247. IEEE (2014)
11. Gioioso, G., Salvietti, G., Malvezzi, M., Prattichizzo, D.: Mapping synergies from human to robotic hands with dissimilar kinematics: an approach in the object domain. IEEE Trans. Robot. 29(4), 825–837 (2013)
12. Meeker, C., Rasmussen, T., Ciocarlie, M.: Intuitive hand teleoperation by novice operators using a continuous teleoperation subspace. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–7. IEEE (2018)
13. Salvietti, G., Malvezzi, M., Gioioso, G., Prattichizzo, D.: On the use of homogeneous transformations to map human hand movements onto robotic hands. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 5352–5357. IEEE (2014)
14. Ekvall, S., Kragic, D.: Interactive grasp learning based on human demonstration. In: IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA'04, vol. 4, pp. 3519–3524. IEEE (2004)
15. Pedro, L.M., Caurin, G.A., Belini, V.L., Pechoneri, R.D., Gonzaga, A., Neto, I., Nazareno, F., Stücheli, M.: Hand gesture recognition for robot hand teleoperation. In: ABCM Symposium Series in Mechatronics, vol. 5, pp. 1065–1074 (2012)
16. Malvezzi, M., Gioioso, G., Salvietti, G., Prattichizzo, D.: Syngrasp: a matlab toolbox for underactuated and compliant hands. IEEE Robot. Autom. Mag. 22(4), 52–68 (2015)
17. WonikRobotics: Allegro hand by wonik robotics. http://www.simlab.co.kr/Allegro-Hand (2015)
18. Chen, C., Hon, Y., Schaback, R.: Scientific computing with radial basis functions. Department of Mathematics, University of Southern Mississippi, Hattiesburg, MS 39406 (2005)
19. Melchiorri, C., Palli, G., Berselli, G., Vassura, G.: Development of the ub hand iv: overview of design solutions and enabling technologies. IEEE Robot. Autom. Mag. 20(3), 72–81 (2013)

Robotized Laundry Manipulation With Appliance User Interface Interpretation

Wendwosen B. Bedada, Ismayil Ahmadli, and Gianluca Palli

Abstract In this paper, robotized laundry picking and insertion into a washing machine, followed by display interpretation and adjustment of the washing cycle for a complete robotic laundry operation, is described. To ensure the successful insertion of the laundry, recovery picking from the drum door region, in case large clothes remain partially outside the washing machine, is also evaluated. A pointcloud-based perception algorithm is proposed to detect wrinkles on the cloth surface, compute spline curves along the wrinkle-like structures and estimate grasping frames. Moreover, to ensure graspability in the absence of wrinkles, a blob detection approach is evaluated together with grasp pose quality ranking to aim for the optimal pose. A deep learning based algorithm for the detection and interpretation of the washing machine user interface is also developed, to fully automate the robotic laundry operation. Fully autonomous laundry handling and washing cycle setting on the appliance display is tested and validated extensively by performing multiple tasks using our robotic platforms (Tiago and Baxter) and an AEG washing machine.

Keywords Laundry automation · Deformable objects · User interface interpretation · Deep learning

W. B. Bedada (B) · I. Ahmadli · G. Palli
DEI - Department of Electrical, Electronic and Information Engineering, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_7

1 Introduction

The need for automating various stages of laundry washing and related operations has many useful applications in domestic as well as industrial scenarios. In particular, appliance manufacturers conduct accelerated life tests (ALT) on a small fraction of appliances statistically sampled from the production line, and these tests are supervised by human operators. The test cycle usually involves loading and unloading clothes and adjusting the washing cycle through the display interface. In this scenario, the deployment of a robotic system requires the autonomous capability to detect the graspable regions of the cloth and the target appliance, together with robust user-interface interpretation.

The detection and manipulation of deformable objects (DOs) like clothes has a wide range of applications in domestic chores as well as in industrial scenarios for cloth washing, ironing and folding tasks [1]. Clothes are DOs characterized by having one dimension considerably smaller than the other two (i.e. the thickness of the fabric) [2]. The challenge of manipulating and sensing DOs is massive due to their intrinsic property of being deformable: their shape and appearance change over time. This implies that the vast majority of the approaches and algorithms developed for rigid objects need to be modified or are not applicable at all to DOs. A comprehensive review of the literature on sensing and manipulation of DOs is provided in a recent survey [2]. Extensive work has also emerged in the literature specifically related to clothes: state estimation of clothes [3]; grasp point detection [4] and [5]; manipulation tasks such as grasping for garment picking-up [6] and [7]; and manipulation for garment reconfiguration [8] and [9]. These previous works either target a particular region of the cloth or try to extract particular features, such as color, to identify a grasp. In contrast, our work operates directly on a live pointcloud and therefore does not rely on these specific features for grasp pose detection.

In this paper, two core algorithms are developed. The first is a pointcloud-based algorithm for the identification of optimal grasping poses based on wrinkles and blobs in a set of clothes inside a bin or during recovery, while the second interprets the user interface for washing cycle setting using deep neural networks. By combining these algorithms, a fully autonomous robotized laundry loading and washing cycle setting has been achieved.

2 Grasping Pose Detection

We present two approaches for grasp pose identification for cloth-like objects. The first method is wrinkle detection in the 3D scene. The second approach identifies graspable regions based on blobs of graspable points in the pointcloud. To improve grasp success, both the wrinkle-based and the blob-based approaches are used to generate a set of graspable poses G. A cost-function-based pose scoring is finally used to rank all the poses in G.


Fig. 1 Wrinkle detection algorithm. The input pointcloud is segmented, retrieving only the interior points of the bin (yellow). The entropy map is built using the knowledge embedded in the convexity, curvature and combined (depth and edges) maps. The grayscale image is shown for visualization purposes only

2.1 Wrinkle Detection

When dealing with the problem of grasping cloth-like objects, wrinkles are distinctive features for grasping. The information about the wrinkledness is embedded in the 3D surface topology, and a 3D sensor is therefore required to capture it. The wrinkle detection algorithm presented here targets the tasks of bin picking and recovery grasping, and consists of four distinct steps. In the first step, the source pointcloud is segmented by identifying the bin in the scene and keeping only the regions of the pointcloud that are associated with the clothes. The second step applies a wrinkledness measure combined with additional cues (convexity and depth) in order to find the areas of the segmented pointcloud with wrinkle-like structures. The third step is responsible for the creation of a graph structure and for path building in each detected wrinkle area. The last step estimates a grasping pose for each piecewise wrinkle path. Starting from the live scene pointcloud, the algorithm produces poses along wrinkles described by piecewise curves (see Fig. 1). In the following, the main components of the algorithm are discussed.

The segmented pointcloud is processed in the second step in order to detect graspable regions. A graspability measure is employed to locate highly wrinkled areas in the cloth. This information is encoded into an entropy map. A depth map and a convexity map are used as auxiliary cues to make the detection of the wrinkled areas more robust.

Normals Estimation The first step in the detection of the regions is based on low-level features, namely the surface normals of the pointcloud. They are computed with the Moving Least Squares algorithm [10], which smooths the pointcloud surface by fitting a polynomial curve before estimating the surface normals. This approach reduces the noise in the estimation process. The normal vectors obtained are expressed in Cartesian coordinates as $(n_x, n_y, n_z)$. They are transformed into spherical coordinates, since only two components are relevant. The transformation involved is documented in [11]. The pair of angles (φ, θ)


are calculated as:

$$\phi = \operatorname{atan}\left(\frac{n_z}{n_y}\right), \qquad \theta = \operatorname{atan}\left(\frac{\sqrt{n_z^2 + n_y^2}}{n_x}\right).$$

The angle $\phi$ is denoted as the azimuth angle and $\theta$ as the inclination angle.

Convexity Map Detecting the concavity or convexity of a local area is important in order to remove, from the highly wrinkled areas, the regions that are not easily graspable. The method presented here provides this information with a very small computational footprint. The procedure determines whether a point's neighborhood is convex or concave [12]. Given the input pointcloud and the pointcloud of normal vectors in Cartesian coordinates, a local neighborhood is found for each point by choosing local patches composed of 9 points. Focusing on a given patch, let $p_1$ denote the $(x, y, z)$ coordinates of the considered point and $n_1$ its normal. Let $p_2$ and $n_2$ be, in turn, one of its neighborhood points and the associated normal. The distance vector between $p_1$ and $p_2$ is computed as $d = p_1 - p_2$. Then, the angle $\alpha_1$ between $n_1$ and $d$ is compared with the angle $\alpha_2$ between $n_2$ and $d$. A convex connection between $p_1$ and $p_2$ is defined if $\alpha_1$ is smaller than $\alpha_2$. The condition can be expressed as:

$$\alpha_1 < \alpha_2 \;\Rightarrow\; \cos(\alpha_1) - \cos(\alpha_2) > 0 \;\Longleftrightarrow\; n_1 \cdot \hat{d} - n_2 \cdot \hat{d} > 0, \qquad \hat{d} = \frac{p_1 - p_2}{\| p_1 - p_2 \|}. \tag{1}$$
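Condition (1) can be sketched as below. This is a minimal illustration with hypothetical names: normals are assumed to be unit vectors, the 10 degree threshold on the normal difference angle is an arbitrary placeholder, and a point with no neighbour passing that threshold is reported as non-convex, which is a choice of this sketch rather than something stated in the text.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_convex_connection(p1, n1, p2, n2):
    """Condition (1): n1 . d_hat - n2 . d_hat > 0, with d_hat along p1 - p2."""
    d = [a - b for a, b in zip(p1, p2)]
    norm = math.sqrt(dot(d, d))
    d_hat = [x / norm for x in d]
    return dot(n1, d_hat) - dot(n2, d_hat) > 0.0

def is_convex_point(p, n, neighbours, min_angle=math.radians(10.0)):
    """neighbours: list of (p_k, n_k). Neighbours whose normal differs from n by
    less than min_angle (hypothetical threshold) are skipped; the point is convex
    if every remaining connection is convex."""
    checked = []
    for p_k, n_k in neighbours:
        angle = math.acos(max(-1.0, min(1.0, dot(n, n_k))))
        if angle < min_angle:
            continue
        checked.append(is_convex_connection(p, n, p_k, n_k))
    return bool(checked) and all(checked)

s = 1.0 / math.sqrt(2.0)
# Ridge (roof shape): normals lean away from each other.
ridge = is_convex_connection([-1.0, 0.0, -1.0], [-s, 0.0, s], [1.0, 0.0, -1.0], [s, 0.0, s])
# Valley (V shape): normals lean towards each other.
valley = is_convex_connection([-1.0, 0.0, 1.0], [s, 0.0, s], [1.0, 0.0, 1.0], [-s, 0.0, s])
```

The ridge pair is classified as convex and the valley pair as concave, which is the behaviour a wrinkle detector needs in order to keep raised folds and discard creases.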

If condition (1) is not satisfied, the two points exhibit a concave connection. The computation is performed for all the neighborhood points that satisfy a check based on the normal vectors difference angle: the convex connectivity is calculated only if the two normal vectors $n_1$ and $n_2$ have a significant angle difference between them. The original point is set to be convex if all of its neighbors exhibit a convex connection with it. Figure 2 displays the convex and concave conditions. The results show that this simple approach is able to detect convex regions such as wrinkles and edges in the clothes, at least approximately. The combination of the entropy filter with the convexity check allows the robust detection of convex wrinkles only.

Entropy Filter The entropy filter is employed in order to quantify how much information exists in a given local region. In particular, the goal is to discover regions of the clothes with a sparse distribution of normals, which result in a high value of the entropy measure. Regions with normals mostly aligned with each other are instead characterized by a low value of the entropy measure. For each point in the input pointcloud, a local region is considered and a two-dimensional histogram is constructed from the two spherical components of the surrounding normal vectors. Hence, it models the distribution of the spherical coordinate angles. The entropy measure is defined as:

$$H(x) = -w_x \sum_{i=1}^{n} p(x_i) \log p(x_i) \tag{2}$$


Fig. 2 Drawing showing the convex and concave conditions

where $x$ is the point considered, to which a two-dimensional histogram of orientation angles in spherical coordinates (azimuth and inclination) is associated. The histogram is made of $n$ bins for each dimension. The parameter $w_x$ is the weight related to point $x$. With $x_i$ we denote the $i$-th bin of the histogram and with $p(x_i)$ its associated value. The weight factor comes from the depth map: $w_x$ represents the intensity value in the depth map of the point considered. The farther the point is from the reference plane, the larger its associated intensity value and the larger the weight factor. As a result, the points that are more distant from the plane are preferred. If the unweighted version of the formula is needed, $w_x$ can be set equal to one.

Wrinkles as Interpolated Splines To improve the original piecewise curve fitting on the nodes reported in [13], the new idea exploited here is to build an undirected graph $G = (V, E)$ where the nodes ($V$) are interconnected by the edges ($E$) based on some properties. These nodes are ordered according to a predefined sequence and a spline curve is interpolated through them. The supervoxel clustering algorithm [14] is used to create the graph structure, by applying it to the projected segmented entropy map (i.e. the output of the Entropy Filter step described in Sect. 2.1). The algorithm provides, for each supervoxel, its centroid point and the set of adjacent neighbors. The supervoxel centroids are the nodes of the graph, and the adjacency information is used to introduce the edges. Each node is also augmented with an intensity attribute related to the wrinkledness score of the corresponding area in the entropy map. Figure 3a displays the result of the clustering with the generated graph. The obtained graph is clustered based on connected components, resulting in a set of clusters (subsets) $C_i$, $i = 1, \ldots, c$, where $c$ is the number of clusters, such that $G = \cup\, C_i$.
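The weighted entropy measure of Eq. (2), together with the spherical-angle conversion it relies on, can be sketched as follows. The sketch is illustrative only: `atan2` is used instead of the plain `atan` of the text so that zero denominators are handled (an assumption that also changes the quadrant convention), natural logarithms are used, and the number of bins and all names are hypothetical.

```python
import math
from collections import Counter

def spherical_angles(n):
    """(phi, theta) of a normal, following the azimuth and inclination definitions
    in the text but computed with atan2 (an assumption of this sketch)."""
    nx, ny, nz = n
    phi = math.atan2(nz, ny)                     # in [-pi, pi]
    theta = math.atan2(math.hypot(nz, ny), nx)   # in [0, pi]
    return phi, theta

def weighted_entropy(normals, weight=1.0, bins=8):
    """Eq. (2): H = -w * sum_i p_i log p_i over a bins x bins histogram of the
    (phi, theta) angles of the normals in a local region."""
    counts = Counter()
    for n in normals:
        phi, theta = spherical_angles(n)
        i = min(int((phi + math.pi) / (2.0 * math.pi) * bins), bins - 1)
        j = min(int(theta / math.pi * bins), bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return -weight * sum((c / total) * math.log(c / total) for c in counts.values())

flat = [(0.0, 0.0, 1.0)] * 9                     # aligned normals: flat patch
wrinkled = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
h_flat = weighted_entropy(flat)
h_wrinkled = weighted_entropy(wrinkled)
```

A flat patch puts every normal in one bin and yields zero entropy, while scattered normals spread over several bins and yield a positive value that scales with the depth-derived weight.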
The path building process reported in Algorithm 1 builds a path $P_i = \{v_{i,1} \ldots v_{i,l}\}$ over the generic $i$-th cluster $C_i$ as an ordered set of distinct alternating nodes and


Fig. 3 Graph structure (a) and interpolated spline (b). In (a), the points in the graph denote the supervoxels centroids and the segments connecting them resemble the graph edges. The different colors in the nodes highlight their different intensities values. In (b), the segmented entropy map is shown in the background with the spline in yellow

edges, where $l$ is the total number of nodes in $P_i$. First, the node of $C_i$ exhibiting the maximum intensity value is selected as $v^*$ (line 1). The adjacent nodes of $v^*$ are stored in $N$ via the function $\mathrm{adj}(v^*)$ (line 2), and the two nodes with the largest intensity values are selected from $N$ (lines 3 and 5). Thus, two partial paths are built (lines 8–20), each starting from $v^*$ and one of its adjacent nodes with the highest intensity (line 8). Then, an iterative procedure adds nodes to $P_t$ based on a score $s_j$ computed as the product of the intensity and curvature scores (line 14), considering the last path node and each of its adjacent nodes. In fact, given a sequence of node positions, i.e. $\{v_{i-1}, v_i, v_{i+1}\}$, the angle between the two 3D vectors $d_i = v_{i-1} - v_i$ and $d_{i+1} = v_{i+1} - v_i$ is $\gamma_i = \angle(\{v_{i-1}, v_i, v_{i+1}\})$, obtained from the dot product $d_i \cdot d_{i+1}$. Given a von Mises distribution $M(\cdot)$ centered on $\pi$ and with variance 1, the curvature score is defined as $s^c_j = M(\gamma_i)$. The sequence $\{v_{l-1}, v_l, v\}$, denoting the last two elements of the path and the candidate node under test, is considered in the computation of the curvature score (line 13). Since $M(\cdot)$ is concentrated around the value of $\pi$, a straight-line path is more likely to be selected. The node with the greatest product score, $v_{j^*}$, is chosen as the next node and added to the path (line 16). If all the candidate nodes $v \in N$ have a product score of zero, the path is closed and the loop stops. If only one neighbor is present in $N$, the method simply checks its curvature and intensity, similarly to a flood-fill algorithm with a threshold. Finally, the path $P_i$ is obtained by merging $\mathrm{rev}(P_1)$, i.e. the reversed set obtained from $P_1$, and $P_2$, using $v^*$ as junction point (line 20).

Spline Interpolation The node centroids are translated from the projected space back to the original 3D space. Then, the nodes of each $P_i$ are fed as a set of control points to be interpolated by a spline of degree 2 in the 3D space.
Notice that this is an approximate solution and different interpolation strategies can be implemented to refine it. The result of the interpolation for a sample pointcloud is shown in Fig. 3b as a yellow curve.


Algorithm 1: Wrinkle Path Building

Input: Ci
Output: Pi
 1  v∗ ← arg max_v ({I(v), ∀v ∈ Ci})
 2  N ← adj(v∗)
 3  v1n ← arg max_v ({I(v), ∀v ∈ N})
 4  N ← N \ v1n
 5  v2n ← arg max_v ({I(v), ∀v ∈ N})
 6  Pt ← ∅  ∀t ∈ {1, 2}
 7  for Pt do
 8      Pt ← {v∗, vtn}
 9      N ← {adj(vtn) \ Pt}
10      do
11          S ← ∅
12          for v ∈ N do
13              sc = M(∠({vl−1, vl, v}))
14              s = I(v) sc
15              S ← S ∪ s
16          j∗ = arg max_j (S)
17          Pt ← Pt ∪ vj∗
18          N ← adj(vj∗) \ Pt
19      until N = ∅ or sj∗ = 0
20  Pi ← rev(P1) ∪ P2
21  return Pi
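The greedy procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is passed as plain dictionaries (`intensity`, `adj` and `pos` are hypothetical names), the angle is computed as the arccosine of the normalized dot product, and the von Mises weight is left unnormalized with a concentration `kappa = 1` standing in for the unit variance mentioned in the text.

```python
import math


def curvature_score(prev, last, cand, kappa=1.0):
    """Unnormalised von Mises weight, centred on pi, for the angle at `last`.

    Assumption: the angle is the arccosine of the normalised dot product of
    (prev - last) and (cand - last); kappa = 1 is an illustrative choice.
    """
    d1 = [p - l for p, l in zip(prev, last)]
    d2 = [c - l for c, l in zip(cand, last)]
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(x * x for x in d2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    cos_g = sum(a * b for a, b in zip(d1, d2)) / (n1 * n2)
    gamma = math.acos(max(-1.0, min(1.0, cos_g)))
    # Equals 1 for a straight continuation (gamma = pi), smaller otherwise.
    return math.exp(kappa * (math.cos(gamma - math.pi) - 1.0))


def build_path(intensity, adj, pos):
    """Grow two partial paths from the brightest node and merge them."""
    seed = max(intensity, key=intensity.get)
    starts = sorted(adj[seed], key=intensity.get, reverse=True)[:2]
    halves = []
    for start in starts:
        path = [seed, start]
        while True:
            cands = [v for v in adj[path[-1]] if v not in path]
            if not cands:
                break  # no unvisited neighbour left
            scores = {
                v: intensity[v]
                * curvature_score(pos[path[-2]], pos[path[-1]], pos[v])
                for v in cands
            }
            best = max(scores, key=scores.get)
            if scores[best] == 0.0:
                break  # every candidate scored zero: close the path
            path.append(best)
        halves.append(path)
    if not halves:
        return [seed]
    if len(halves) == 1:
        return halves[0]
    first, second = halves
    # Reverse the first half and join both at the seed node.
    return first[::-1] + second[1:]
```

On a straight chain of five nodes whose brightest node is in the middle, the sketch returns the nodes in chain order, because the curvature weight equals 1 for collinear points and the intensity alone drives the selection.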

2.2 Extension to Washing Machines

Robotized insertion of clothes into the washing machine drum is a complex task. Depending on the initial grasp point and on the topology and dimensions of the cloth, a portion of the object may lie outside the opening of the washing machine drum after the insertion. A possible strategy for identifying and removing clothes lying outside the drum door is discussed here by extending the approach presented in Sect. 2 for bin segmentation. The idea is to use a pre-computed pointcloud model PO of the washing machine to calculate a new pointcloud, named the difference map PD, as the difference between PO and the current scene pointcloud PS. The obtained PD can be used to determine whether a misplaced cloth is present. Then, the grasping pose identification algorithm presented in Sect. 2 can be employed to compute the recovery poses for inserting it. Figure 4 provides a summary view of the approach described in this section.

Fig. 4 Schema of the vision approach for the washing machine recovery picking. Grayscale image shown for clarity

Pointclouds Registration In order to calculate PD, PS and PO must be aligned (i.e. registered). In particular, we need to find the transformation that brings PO to overlap PS, obtaining the aligned model pointcloud PÔ. Notice that PO is computed offline and represents a portion of a washing machine with the door open. The alignment operation can be split into (1) determining the initial (rough) transformation between PO and PS, and (2) optimizing the alignment. The initial alignment is obtained with the Sample Consensus Prerejective (SCP) method [15]. It evaluates the correspondences between feature points, for which we select Fast Point Feature Histograms (FPFH) [16], and provides the initial guess for the transformation. This guess is refined by the Iterative Closest Point (ICP) algorithm [11], which minimizes the Euclidean distance error metric between the overlapping areas of the pointclouds.

Difference Map A reference plane pref is estimated from PÔ and both pointclouds are projected onto this plane. Each point in PS is matched with a point in PÔ and the point-to-plane distances of the two points from pref are computed. The difference between the two distances is then stored in PD. Figure 4 provides an example of a difference map where the point intensity values are encoded in the colormap (reddish means close to zero). Thus, PD highlights the regions of the washing machine where PÔ and PS differ the most. PD is segmented by discarding all the points with a negligible difference (distance) based on a user-defined threshold. Notice that if no point in the difference map satisfies the threshold, this can be interpreted as the absence of a misplaced cloth.
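The difference map computation can be illustrated with a short Python sketch. It assumes that the model pointcloud is already registered to the scene, that the reference plane is given by a point and a unit normal, and that the matching between scene and model points is a brute-force nearest-neighbour search on the projected points; the function names and the threshold value are hypothetical and not taken from the paper.

```python
import math


def plane_distance(point, plane_point, normal):
    """Signed distance of `point` from the plane (unit `normal` assumed)."""
    return sum((p - q) * n for p, q, n in zip(point, plane_point, normal))


def project(point, plane_point, normal):
    """Orthogonal projection of `point` onto the reference plane."""
    d = plane_distance(point, plane_point, normal)
    return tuple(p - d * n for p, n in zip(point, normal))


def difference_map(scene, model, plane_point, normal, threshold):
    """Return the scene points whose plane distance differs from the model.

    Each scene point is matched to the model point that is closest after
    projection; the absolute difference of the two point-to-plane distances
    is kept only if it exceeds `threshold`.
    """
    model_proj = [
        (project(m, plane_point, normal), plane_distance(m, plane_point, normal))
        for m in model
    ]
    kept = []
    for s in scene:
        s_proj = project(s, plane_point, normal)
        d_s = plane_distance(s, plane_point, normal)
        _, d_m = min(model_proj, key=lambda pm: math.dist(pm[0], s_proj))
        diff = abs(d_s - d_m)
        if diff > threshold:
            kept.append((s, diff))
    return kept
```

An empty result corresponds to the case described above in which no point passes the threshold, i.e. no misplaced cloth is detected. A real pipeline would replace the brute-force search with a spatial index.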

2.3 Blob Detection

Blobs are another important feature that can easily be computed to augment scenarios where the wrinkledness criterion produces a very small number of graspable poses. This happens in situations where clothes form a flat top with other laundry piled up underneath. By combining the wrinkledness measure and blob detection, the laundry grasping algorithm achieves a higher success rate. Due to the deformable nature of clothes, a robot can form the grasp shape by pushing against the laundry where there is enough surface to make contact, particularly when parallel grippers are used. In this regard, a set of densely populated clusters of points, i.e. blobs, can offer another alternative.

Fig. 5 Grasp pose detection using blob regions. The colored sets of points represent the blob clusters with their principal and normal directions computed for grasp pose estimation

To extract the blobs from the segmented pointcloud, we first search for the points that are local maxima with respect to the other points within a predefined radius, measured along the normal direction from the surface of the container. To create graspable regions, we perform clustering on the local maxima, effectively culling regions that are not suitable for grasping. For each cluster, the centroid and principal component directions are computed to determine the grasp point and its orientation (see Fig. 5).

By integrating the wrinkle- and blob-based techniques, a number of graspable points are obtained, of which the optimal one is selected for grasp execution by the robot. To identify the optimal grasp pose, a score is computed for each grasp point based on its location in the container and the direction of the wrinkle or blob with respect to the container. In the case of a recovery grasp, the poses are ranked by considering the point along the outlying part of the laundry.
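The blob extraction steps (local maxima, clustering, centroid and principal direction) can be sketched as follows. This is only an illustrative reading of the description above: the container normal is assumed to coincide with the z axis, the clustering is a simple greedy single-linkage scheme, and the principal direction is computed in the xy-plane from the closed-form orientation of a 2D covariance matrix. All function names and parameters are hypothetical.

```python
import math


def local_maxima(points, radius):
    """Points not exceeded in height (z) by any neighbour within `radius`.

    Neighbourhoods are measured in the xy-plane, assuming the container
    surface normal is aligned with z.
    """
    return [
        p for p in points
        if all(q[2] <= p[2] for q in points
               if math.dist(p[:2], q[:2]) <= radius)
    ]


def cluster(points, link):
    """Greedy single-linkage clustering with linking distance `link`."""
    clusters = []
    for p in points:
        near = [c for c in clusters
                if any(math.dist(p, q) <= link for q in c)]
        far = [c for c in clusters if not any(c is n for n in near)]
        merged = [p] + [q for c in near for q in c]
        clusters = far + [merged]
    return clusters


def grasp_from_cluster(c):
    """Centroid and in-plane principal direction (yaw, radians) of a cluster."""
    n = len(c)
    cx = sum(p[0] for p in c) / n
    cy = sum(p[1] for p in c) / n
    cz = sum(p[2] for p in c) / n
    sxx = sum((p[0] - cx) ** 2 for p in c) / n
    syy = sum((p[1] - cy) ** 2 for p in c) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in c) / n
    # Orientation of the major axis of the 2D covariance matrix.
    yaw = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy, cz), yaw
```

For a ridge of three equally high points along the x axis with lower points beside it, the sketch keeps only the ridge points, groups them into one cluster, and returns the ridge centre with a yaw of zero.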

3 User Interface Interpretation

Another important problem for autonomous robotic laundry operation is the robust interpretation of the digital display and the setup of the washing cycle. Performing user interface interpretation from a mobile robotic platform using conventional 2D cameras has its own challenges. The viewing angle, the lighting conditions and the reflections from the glass screen all vary during the operation. These factors make the traditional (feature-based) computer vision approach difficult to use in our scenario, so a deep neural network based approach is proposed.

The first step is to identify important coordinate points on the display. This requires matching points between the scene image and the model image of the display by computing the homography matrix H. The output of this step is the coordinates of the digits, symbols and LEDs that localize the washing program. The second task is to recognize the values of the digits and the programs, functions and options with which the washing machine is working. For this, a MobileNetV2 convolutional neural network architecture is used to recognize the 10 digits and two additional symbols ('-', 'h') that may appear on the appliance's digital display.

3.1 Point of Interest Coordinate Estimation

These are points that must be robustly identified in the scene image (i.e. the image captured during the actual robotic execution) to perform user interface interpretation by matching them with the model image. The model image of the user interface and each desired point to be projected onto the scene are shown in Fig. 6. Solving the homography problem between the scene and model images yields the homography matrix H. The first step is to use an efficient feature detector and descriptor (in our case SURF [17]) to determine the keypoints in the reference and scene images (see Fig. 7a and b). The second step is to determine the correspondences between the keypoints in the scene and model images with a two-step procedure. First, an approximate matching is computed by a nearest neighbour search [18] to rapidly generate pairs of matching keypoints, which might include false matches. Then, a homography matrix is computed by applying RANSAC [19] to the matching points from the previous step, thereby eliminating false positives, as shown in Fig. 7c. This two-step procedure eliminates outliers (false matches) while remaining consistent with the inliers. It also makes it possible to find the desired points of the user interface in the scene image with high accuracy from various camera positions and orientations, without the need for prior camera calibration.
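Once H is available, the points of interest marked on the model image can be transferred to the scene image. The sketch below shows only this projection step in plain Python, assuming that H is a 3 × 3 matrix (nested lists) mapping model-image pixels to scene-image pixels; the function name is hypothetical. The estimation of H itself is not shown; one common option, which the paper does not name, is OpenCV's `cv2.findHomography` with the `cv2.RANSAC` method.

```python
def apply_homography(H, points):
    """Map 2D model-image points to the scene image with a 3x3 homography.

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the third component.
    """
    projected = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        projected.append((u / w, v / w))
    return projected
```

For example, with a homography that scales by 2 and translates by (10, 20), the model point (5, 5) is mapped to (20, 30) in the scene.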

Fig. 6 The points of interest on the reference image (red, green and blue dots marking the coordinates of programs, functions and digits respectively)

Fig. 7 Keypoints from the SURF descriptor. (a) Keypoints for the reference image (user interface). (b) Keypoints from the robot scene view. (c) Final keypoint matches after eliminating the incorrect matches

3.2 Display Recognition

A MobileNetV2 convolutional neural network is used, which takes input tensors of size 96 × 96 with three channels and gives an output of shape (32, 12), where 12 is the number of classes in the logits layer and 32 is the batch size set during training. Considering the small number of classes to be classified (12), there was no need to train the neural network from scratch. Therefore, two dense layers and ten convolutional layers of the MobileNetV2 network are fine-tuned in the training process. Once the homography has been estimated, the scene frame captured by the camera and the coordinates of the regions of interest are sent to the recognition step. The regions of interest on the user interface are divided into three parts according to their functionality (see Fig. 8a). The different classes of target areas (digits, functions, programs) are enclosed by rectangles in the corresponding colours (blue, green, red).
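The last stage of the recognition, turning the 12-class logits of each cropped region into a display reading, can be sketched as follows. The class ordering (digits 0 to 9 followed by '-' and 'h') and the function names are assumptions made for illustration; the network itself is not reproduced here.

```python
# Assumed label order: ten digits followed by the two extra symbols.
CLASS_LABELS = [str(d) for d in range(10)] + ["-", "h"]


def decode_logits(batch_logits):
    """Return the most likely label for each row of a (batch, 12) output."""
    labels = []
    for logits in batch_logits:
        if len(logits) != len(CLASS_LABELS):
            raise ValueError("expected 12 logits per sample")
        best = max(range(len(logits)), key=logits.__getitem__)
        labels.append(CLASS_LABELS[best])
    return labels


def read_display(batch_logits):
    """Concatenate the decoded symbols of consecutive digit regions."""
    return "".join(decode_logits(batch_logits))
```

With four digit regions whose highest logits fall on the classes 1, 4, 0 and 0, `read_display` returns the string "1400".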


Fig. 8 Experimental layout. (a) The overall system layout: detection, control and UI. (b) Clothes test set with rulers in centimeters for scale

4 Experimental Results

4.1 Laundry Grasping Task

To validate the approach proposed in Sects. 2 and 3, the algorithms are implemented in the ROS environment with the 7-DoF Baxter robot arm, while a 3D PicoFlexx camera in an eye-in-hand configuration is employed for the vision system. The schematic representation of the proposed cyber-physical system is reported in Fig. 8a. An external PC is dedicated to running the cloth and UI vision algorithms, which provide poses as output. A task-priority based control algorithm is then used to execute the grasp. The experiments test the performance of the robot when grasping with the poses provided by the proposed method. The five clothes shown in Fig. 8b are used as the test set. These clothes are selected to increase the variance in both the dimensions of the item and the type of fabric (e.g. "harder" or "softer"), with the aim of providing a comprehensive analysis. In this regard, the shirt has the softest fabric in the set, whereas the jeans have the hardest.


4.2 Graspability Tests

The experiments are carried out for the two tasks introduced in this paper: bin picking and washing machine recovery picking.

Bin Picking Task This test is performed with a single cloth from Fig. 8b in a bin. At the beginning of the experiment, the cloth is arranged in a random configuration. Then, the picking task is executed. In sequence, the vision system provides a new target pose (as explained in Sect. 2), and the robot attempts the grasp, lifts the item by 1 m and releases it. The cloth thus falls back into the bin, assuming a new random configuration. The picking sequence is repeated 10 times for each cloth.

Washing Machine Recovery Picking Task This test is carried out similarly to the bin picking one. The clothes of Fig. 8b are tested one at a time. For each cloth, 10 grasp trials are performed. The cloth is placed along the drum door, partially hanging out of it, to simulate an incorrect insertion. The vision system computes a target pose as described in Sect. 2.2. The robot attempts the grasp, moves the grasped cloth vertically toward the drum center, and then releases it. In this case, after each grasp, the cloth is manually rearranged in a new configuration along the drum door. In Fig. 9, the sequence of actions performed for each task is depicted for clarity.

The success rates of the grasps for the two tasks are shown to improve when the wrinkle- and blob-based approaches are combined. The average success rate in the bin picking task is 0.850 with the proposed approach and 0.7 with the wrinkle-based approach alone. Similarly, a success rate of 0.87 is achieved during recovery grasps from the washing machine, compared with 0.8 for the wrinkle-only approach. Considering the total set of 158 grasps attempted, our algorithm provides a target pose that results in a successful grasp in 84% of the cases. The grasp failures are mostly related to wrong orientations of the target frame resulting from an erroneous identification of the wrinkle path. To a lesser extent, failures can also be associated with measurement errors of the camera, calibration errors of the eye-in-hand setup and the accuracy of the robot arm itself.

4.3 Appliance User Interface Interpretation

Figure 10 shows the UI homography estimation for a sequence of images taken from various angles by a camera mounted on a robotic arm. As the robotic arm moves to various poses, the alignment is successfully computed under varying glass reflectance and lighting conditions. Two dense layers and ten convolutional layers of the MobileNetV2 neural network are fine-tuned (instead of training from scratch) for display recognition on 1280 training samples. The resulting neural network was able to correctly classify the 12 classes of the display (a sample output is shown in Fig. 11). The three categories of outputs shown on the display, i.e. digits, functions and programs, are all recognized accurately.


Fig. 9 Sequence of motion during laundry manipulation from Bin and Washing Machine (WM)

Fig. 10 UI homography estimation for recognition from sharp angles and varying glass reflection

Fig. 11 UI interpretation for robotic vision: For instance Spin speed and washing temperature are set to 1400 and 30 respectively as indicated in the interpretation output


5 Conclusions and Future Work

In this paper, a pointcloud-based approach for the perception of clothes, aimed at the robotized insertion of clothes into a washing machine, is proposed. A perception pipeline for washing machine display interpretation is also developed to set the washing cycle autonomously. The approach is validated extensively for both the bin picking and the washing machine recovery picking tasks. The grasping performance is satisfactory, with success in 84% of the attempts, while the user interface interpretation works correctly under varying environmental conditions and sharp viewing angles.

References

1. Jiménez, P., Torras, C.: Perception of cloth in assistive robotic manipulation tasks. Nat. Comput. 1–23 (2020)
2. Sanchez, J., Corrales, J.-A., Bouzgarrou, B.-C., Mezouar, Y.: Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey. Int. J. Robot. Res. 37(7), 688–716 (2018)
3. Kita, Y., Kanehiro, F., Ueshiba, T., Kita, N.: Clothes handling based on recognition by strategic observation. In: Proceedings of the IEEE ICRA (2011)
4. Ramisa, A., Alenya, G., Moreno-Noguer, F., Torras, C.: Using depth and appearance features for informed robot grasping of highly wrinkled clothes. In: Proceedings of the IEEE ICRA (2012)
5. Yamazaki, K.: Grasping point selection on an item of crumpled clothing based on relational shape description. In: Proceedings of the IEEE/RSJ IROS (2014)
6. Shibata, M., Ota, T., Hirai, S.: Wiping motion for deformable object handling. In: Proceedings of the IEEE ICRA (2009)
7. Monsó, P., Alenyà, G., Torras, C.: POMDP approach to robotized clothes separation. In: Proceedings of the IEEE/RSJ IROS (2012)
8. Cusumano-Towner, M., Singh, A., Miller, S., O'Brien, J.F., Abbeel, P.: Bringing clothing into desired configurations with limited perception. In: Proceedings of the IEEE ICRA (2011)
9. Doumanoglou, A., Kargakos, A., Kim, T.-K., Malassiotis, S.: Autonomous active recognition and unfolding of clothes using random decision forests and probabilistic planning. In: Proceedings of the IEEE ICRA (2014)
10. Alexa, M., Behr, J., Cohen-Or, D., Fleishman, S., Levin, D., Silva, C.T.: Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 9(1), 3–15 (2003)
11. Rusu, R.B.: Semantic 3D object maps for everyday manipulation in human living environments. KI-Künstliche Intelligenz (2010)
12. Christoph Stein, S., Schoeler, M., Papon, J., Worgotter, F.: Object partitioning using local convexity. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311 (2014)
13. Caporali, A., Palli, G.: Pointcloud-based identification of optimal grasping poses for cloth-like deformable objects. In: Proceedings of the IEEE International Conference on Factory Automation and Emerging Technologies (2020)
14. Papon, J., Abramov, A., Schoeler, M., Worgotter, F.: Voxel cloud connectivity segmentation - supervoxels for point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2013)
15. Buch, A.G., Kraft, D., Kamarainen, J.-K., Petersen, H.G., Krüger, N.: Pose estimation using local structure-specific shape and appearance context. In: Proceedings of the IEEE ICRA (2013)
16. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: Proceedings of the IEEE ICRA (2009)
17. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: European Conference on Computer Vision, pp. 404–417. Springer (2006)
18. Muja, M., Lowe, D.G.: Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1) 2(2), 331–340 (2009)
19. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

Specification and Control of Human-Robot Handovers Using Constraint-Based Programming

Maxim Vochten, Lander Vanroye, Jeroen Lambeau, Ken Meylemans, Wilm Decré, and Joris De Schutter

Abstract This paper introduces a constraint-based specification of an object handover in order to simplify the implementation of reactive handover controllers. A number of desired robot behaviors are identified in different phases of the handover and specified as constraints. The degrees-of-freedom in the reaching motion towards the tracked object, such as rotational symmetry in the object, are easily expressed with constraints and used by the controller. During the physical transfer, a desired force interaction and compliance can be specified. Deviations from the nominal behavior are also dealt with, such as breaking of the handover intent, a moving object and disturbance forces. The controller is validated on a real robot setup showcasing a bidirectional handover. Thanks to the modular approach of combining constraints, the developed task specification can be easily extended with more reactive behaviors in the future.

Keywords Human-robot handover · Constraint-based programming · Reactive control · Object handover · Task specification

M. Vochten (B) · L. Vanroye · J. Lambeau · K. Meylemans · W. Decré · J. De Schutter, Department of Mechanical Engineering, KU Leuven, 3001 Leuven, Belgium
M. Vochten · L. Vanroye · W. Decré · J. De Schutter, Core Lab ROB, Flanders Make, 3001 Leuven, Belgium
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_8

1 Introduction

Object handovers such as the one depicted in Fig. 1 are challenging to implement on a robot since many different aspects need to be taken into account [15]. For a human-to-robot handover, the intent to start the handover must be detected and ideally a prediction is made of where the Object Transfer Point (OTP) will be in space and time. The generated reaching motions towards the OTP must be safe and comfortable for the human, while a suitable grasp configuration must be planned for the particular geometry of the object. During the physical transfer of the object, the evolution of the grip force should be accurately timed, while some compliance is required to make the physical interaction feel more natural. The robot must react to unforeseen changes during the execution, such as breaking off the intent to start the handover, and must avoid joint limits and collisions.

Fig. 1 Example of a handover between human and robot of a small cylindrical object

Combining all these behaviors and ensuring reactivity is challenging to achieve with conventional control approaches. In this paper, we instead make use of constraint-based programming. Different behaviors that are relevant to the handover are described and specified as constraints, both nominal behaviors and deviating reactive behaviors to deal with disturbances.

The main contribution of this paper is the implementation of the different phases of a bidirectional human-robot handover in the constraint-based task specification and control framework of eTaSL, enabling a detailed and accurate specification of the desired robot behavior throughout the handover. When the robot is reaching towards the object, any rotational symmetry in the object or other degrees-of-freedom can be expressed and exploited by the controller. The robot's workspace is easily limited in joint space or in task space. When human and robot are both holding the object, a desired force interaction and impedance behavior can be specified. The different phases of the handover are implemented using a finite state machine where measurements of the object's motion and the contact force are used to decide when to transition to a new phase.

The remainder of this paper is structured as follows. Section 2 discusses insights from human handover studies in literature and lists different approaches to control human-robot handovers. Section 3 elaborates the proposed constraint-based controller for implementing the handover. Section 4 shows practical results on a real robot setup, while Sect. 5 finishes with conclusions.


2 Related Work

This section first reviews studies on human handovers in order to extract characteristic behaviors that can be implemented in the proposed controller. Afterwards, existing approaches for controlling human-robot handovers are discussed. For a broader overview of handovers we refer to a recent survey in [15].

2.1 Insights from Human Handover Studies

Human handovers are typically divided into different phases [12, 15]. In this paper, we distinguish the following four phases: (i) the communication phase where an intent to start the handover is communicated and recognized, (ii) the approach phase where the giver and receiver are jointly moving their hands towards the intended object transfer point, (iii) the passing phase where physical contact is made between giver and receiver and the object is transferred, and (iv) the retraction phase where both actors move away from the transfer point.

During the communication phase the intent for starting a handover can be communicated in multiple ways such as eye gaze, body stance and speech. Recognizing these cues may help the robot to decide when the human is ready to start the interaction [21].

During the approach phase the reaching motions are typically smooth [2] and exhibit a minimum-jerk characteristic [9]. Implementing these minimum-jerk motions on the robot as a giver was found to improve the human's reaction and reduce the handover duration compared to trapezoidal profiles [9]. The average and peak robot movement speeds preferably remain below human speeds in order to avoid perceived discomfort [8, 16]. Besides motion, a suitable grasping configuration must be planned based on the identified object geometry [21, 24].

The Object Transfer Point (OTP) is the point in space and time where the object is passed between the actors. Generally, the OTP is located in the middle between the two actors with only small deviations [14]. It is therefore mainly dependent on the interpersonal distance and height. When the robot is the giver, a reaching motion and OTP should be planned that ensure comfort and safety to the human, as well as visibility of the object during the approach [19]. When the robot is the receiver, a prediction of the OTP helps to make the interaction more reactive. On-line prediction of the OTP can be achieved by measuring the human's motion and predicting the remainder using identified models such as minimum-jerk models or models learned from human demonstrations [14, 23].

In the passing phase the giver and receiver are both holding the object. Initially the giver has the highest load share, which is then gradually transferred to the receiver. The timing of this transition is mostly determined through visual and force feedback [4, 7]. The grip force evolves almost linearly with the estimated load share and is therefore the main driver for the evolution of the passing phase [4, 12]. Both participants also move their hands during the transfer [12]. The receiver slightly raises their hand while the giver lowers their hand. This helps to signal to the giver that the receiver is increasing their load share.

2.2 Motion Planning and Control of Human-Robot Handovers

Methods for generating the reaching motion towards the OTP are commonly divided into planning-based and controller-based approaches [12, 15]. Planning-based approaches optimize a trajectory over a certain horizon and can range from fully pre-planned to repeated online replanning to ensure adaptability. Combinations of different planning strategies are also possible, such as in [11, 16] where a global plan for grasping the object was combined with local replanning during execution. As mentioned earlier, the generated motion trajectories are preferred to be smooth, minimum-jerk trajectories. In [16], minimum-jerk motion profiles were applied in the trajectory planner based on Bézier curves.

Controller-based approaches calculate the desired robot motion at each controller time instant throughout the execution to ensure a reactive robot behavior. This is typically achieved using a goal-directed controller that always drives the robot to the estimated location of the tracked object, either with pure feedback such as visual servoing [3, 13] or with a dynamical system approach [12, 17]. The latter combines goal feedback with feed-forward terms that shape the trajectory according to a human model. These models are typically learned from human demonstrations. Other learning by demonstration approaches use probability distributions to model and execute the motions [14]. An alternative learning approach based on reinforcement learning was used in [10] where the reward function during policy search was based on feedback of the human's preference.

The generated motions are executed by low-level joint controllers which may take additional requirements into account, for example exploiting redundancy to avoid joint limits and collisions during motion and to obtain more human-like joint configurations [18]. Interaction controllers such as impedance and admittance controllers can help to deal with unexpected contact forces during execution and to achieve a compliant behavior during the passing phase [12, 16]. For the robot hand itself, a force control strategy for the grip force as a function of the estimated load force was proposed in [5]. This approach was also implemented successfully on a compliant under-actuated hand [6] that is position-controlled.

This section and the previous one listed many of the desired robot behaviors and control aspects. Implementing all of these together in a conventional controller can be very challenging. Therefore, in the next section, we propose using a constraint-based approach to specify different behaviors separately and combine them automatically in a corresponding controller.


3 Constraint-Based Programming of Handovers

This section first introduces the general principle of constraint-based programming and control using the expression graph-based Task Specification Language (eTaSL) [1]. Afterwards, the desired behaviors for each phase of the handover are defined and specified as constraints.

3.1 Principle of Constraint-Based Programming in eTaSL

The general principle of constraint-based programming is explained below.¹ For each behavior, a scalar-valued task function $e$ is defined that must go to zero:

$$e(\mathbf{q}, t) \rightarrow 0, \tag{1}$$

in which $\mathbf{q}$ are the robot joint variables and $t$ is the time. The task function is forced to evolve as a first-order system with time constant $k^{-1}$ [s]:

$$\frac{d}{dt}\, e(\mathbf{q}, t) = -k\, e(\mathbf{q}, t). \tag{2}$$

After expanding the time-derivative, the resulting control law can be written as:

$$J_e(\mathbf{q})\, \dot{\mathbf{q}} = -k\, e(\mathbf{q}, t) - \frac{\partial e(\mathbf{q}, t)}{\partial t}, \tag{3}$$

using the definition of the Jacobian $J_e(\mathbf{q}) = \partial e / \partial \mathbf{q}$. The first term on the right side of the control law (3) can be seen as a proportional feedback term with feedback gain $k$, while the second term is a feed-forward velocity term to improve the error tracking. The Jacobian $J_e(\mathbf{q})$ is constructed automatically by eTaSL using algorithmic differentiation on the analytically defined task function $e(\mathbf{q}, t)$, using information from the kinematic model of the robot as defined in a Universal Robot Description Format (URDF) file.

In constraint-based programming, the controller is defined by the combination of different task functions $e_i$, where $i$ signifies the task number. These task functions define different behaviors and sensor-based reactions such as trajectory following, obstacle avoidance and physical interaction control. Because combining these task functions might result in conflicting joint velocities, a slack variable $\epsilon_i$ is added to each control law (3). The values of $\epsilon_i$ are a measure of the deviation from the ideal first-order system behavior of each control law.
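The first-order behavior imposed by (2) and the role of the feed-forward term in (3) can be checked numerically on a toy problem. The sketch below is not eTaSL code: it assumes a single joint, the hypothetical task function e(q, t) = q − q_des(t), for which the Jacobian equals 1, and a simple explicit Euler integration.

```python
def simulate_task(q0, k, dt, steps,
                  q_des=lambda t: 0.5, dq_des=lambda t: 0.0):
    """Integrate one joint under the control law (3) with J_e = 1.

    Task function (illustrative assumption): e(q, t) = q - q_des(t).
    Returns the list of task errors observed at each time step.
    """
    q, t = q0, 0.0
    errors = []
    for _ in range(steps):
        e = q - q_des(t)
        de_dt = -dq_des(t)          # partial derivative of e w.r.t. time
        q_dot = -k * e - de_dt      # feedback term plus feed-forward term
        errors.append(e)
        q += q_dot * dt             # explicit Euler step
        t += dt
    return errors
```

With a constant target, each step multiplies the error by (1 − k·dt), which is the discrete counterpart of the exponential decay with time constant 1/k. With a moving target and a zero initial error, the feed-forward term keeps the error at zero.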

¹ For a more detailed discussion on the eTaSL framework in its most general form we refer to the original eTaSL implementation paper [1].

[Figure 2: state diagram with the states Idle, Reaching, Passing and Retracting for the human-to-robot handover and for the robot-to-human handover. Transition labels: intent, no intent, contact, no contact, trajectory end, time delay.]

Fig. 2 Visualization of phases and transitions for a bidirectional handover

To determine the desired robot joint velocities for the combined control laws, eTaSL solves the following quadratic program:

$$\underset{\dot{\mathbf{q}},\, \boldsymbol{\epsilon}}{\text{minimize}} \quad \|\boldsymbol{\epsilon}\|^2_{W} + \mu\, \|\dot{\mathbf{q}}\|^2_{W_{\dot{q}}} \tag{4a}$$

$$\text{subject to} \quad J_e\, \dot{\mathbf{q}} = -\mathbf{k} \circ \mathbf{e} - \frac{\partial \mathbf{e}}{\partial t} + \boldsymbol{\epsilon}, \tag{4b}$$

$$\mathbf{L} \leq \dot{\mathbf{q}} \leq \mathbf{U}. \tag{4c}$$

All control laws are now summarized in (4b), where $\mathbf{k}$, $\mathbf{e}$ and $\boldsymbol{\epsilon}$ are vectors stacking respectively the feedback constants $k_i$, the task function values $e_i$, and the slack variables $\epsilon_i$ of each task, and $\circ$ denotes the element-wise product. The inequality constraints (4c) are used to enforce the robot joint position and velocity limits, explained in more detail in [22]. The first term in the minimized cost function (4a) contains the weighted sum of the squared slack variables $\epsilon_i$ of each task. In case of conflicting constraints, the chosen weights $w_i$ in the diagonal weighting matrix $W$ determine the weighting of the constraints in the solution. The second term in (4a) is a regularization term to deal with kinematic redundancies and remaining degrees-of-freedom when the system is not fully constrained. The regularization parameter $\mu$ should be chosen small so that the regularization of joint velocities has a negligible influence on the task function constraints. The weighting matrix $W_{\dot{q}}$ can be used to decide which joints receive more regularization than others. The solution of the quadratic program (4) results in joint velocities $\dot{\mathbf{q}}$ that are sent as control inputs to the robot. The quadratic program is solved repeatedly at each time step (typically running at 200–400 Hz) using the most recent sensor and system information.

The remainder of this methodology section explains the specification of the task functions that result in the necessary robot behavior during the three different phases of the handover (reaching, passing, retracting). Figure 2 provides an overview of the evolution of the different phases for the bidirectional handover developed in Sect. 4. Starting in the top-left idle state, the human-to-robot handover is initiated as soon as an intent for a handover is detected from the human. The intent recognition was simplified here by defining a certain workspace in Cartesian space and checking whether the object has entered this workspace.
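The way the weights arbitrate between conflicting constraints in the quadratic program (4) can be illustrated on the smallest possible case: one joint and several scalar tasks. In that case the unconstrained minimizer has a closed form, and because the problem is one-dimensional and convex, clamping it to the velocity bounds gives the constrained optimum. The sketch below is an illustration only; it does not use eTaSL or a QP solver, and all names and default values are hypothetical.

```python
def solve_single_joint(tasks, mu=1e-6, w_reg=1.0, lower=-1.0, upper=1.0):
    """Solve a one-joint instance of the quadratic program (4).

    tasks: list of (J_i, rhs_i, w_i), where rhs_i = -k_i * e_i - de_i/dt
    is the right-hand side of (4b) without the slack variable.
    Minimises sum_i w_i * (J_i * q_dot - rhs_i)**2 + mu * w_reg * q_dot**2
    subject to lower <= q_dot <= upper.
    Returns the joint velocity and the slack value of each task.
    """
    num = sum(w * J * rhs for J, rhs, w in tasks)
    den = sum(w * J * J for J, rhs, w in tasks) + mu * w_reg
    q_dot = max(lower, min(upper, num / den))
    slacks = [J * q_dot - rhs for J, rhs, w in tasks]
    return q_dot, slacks
```

For two tasks that request velocities of +1 and −1 with weights 3 and 1, the result is close to 0.5: the task with the larger weight is followed more closely and the slack variables absorb the remaining conflict.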


During the reaching phase, the robot moves towards the estimated object transfer point. This phase can be interrupted if the human decides to move the object outside the robot’s workspace, returning again to the idle state. When the reaching phase progresses normally, contact between human and robot will be made after which the passing phase starts and the object is handed over from human to robot. Contact is detected when the measured force at the robot’s wrist lies above a certain threshold. The passing phase finishes when the contact force is below another threshold. The robot now holds the object and moves towards a given location during the retracting phase, after which the handover is finished. The robot-to-human handover progresses similarly, except that the robot now initiates the handover by presenting the object at a given location.
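The phase logic described above for the human-to-robot direction can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the names `Phase` and `next_phase`, the boolean inputs and the threshold arguments are hypothetical, and the robot-to-human direction is omitted.

```python
from enum import Enum, auto

class Phase(Enum):
    IDLE = auto()
    REACHING = auto()
    PASSING = auto()
    RETRACTING = auto()

def next_phase(phase, object_in_workspace, contact_force, retract_done,
               contact_threshold, release_threshold):
    """Return the next handover phase (human-to-robot direction only)."""
    if phase is Phase.IDLE and object_in_workspace:
        return Phase.REACHING
    if phase is Phase.REACHING:
        if not object_in_workspace:
            return Phase.IDLE            # human withdrew the object
        if contact_force > contact_threshold:
            return Phase.PASSING         # contact detected at the wrist
    if phase is Phase.PASSING and contact_force < release_threshold:
        return Phase.RETRACTING          # human released the object
    if phase is Phase.RETRACTING and retract_done:
        return Phase.IDLE
    return phase

# Placeholder thresholds in newtons, chosen only for this illustration.
CONTACT, RELEASE = 1.0, 0.5
p = next_phase(Phase.IDLE, True, 0.0, False, CONTACT, RELEASE)
p_interrupted = next_phase(p, False, 0.0, False, CONTACT, RELEASE)
p_contact = next_phase(p, True, 1.5, False, CONTACT, RELEASE)
p_released = next_phase(p_contact, True, 0.2, False, CONTACT, RELEASE)
p_done = next_phase(p_released, True, 0.0, True, CONTACT, RELEASE)
```

Keeping the transition rule a pure function of the current phase and the latest measurements makes it easy to evaluate once per control cycle.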

3.2 Reaching Phase

During the reaching phase the robot moves towards the object transfer point in free space. Fig. 3 sketches the situation, where {w} is the fixed world reference frame, {tf} is a task frame attached to the robot's end effector, {obj} represents the reference frame for the object transfer point and {traj} represents the current set point on the trajectory. Ideally the generated reaching trajectory takes many aspects into account, for example smoothness, human-likeness and safety (see Sect. 2). Since the focus of this work is on control, the trajectory generation is simplified by calculating linear trajectories between the current pose (i.e. position and orientation) of the robot {tf} and the measured pose of the object {obj}. The trajectories are continuously updated while the object is moving. The generated position and orientation trajectories are represented by the position vector p_traj(t) and the rotation matrix R_traj(t) with respect to the fixed world frame {w} and as a function of time t.

To follow the reference position trajectory, the following task function is added to the controller expressing the error between the position of the robot task frame p_tf(q), which is a function of the joint angles q, and the position of the set point on the trajectory p_traj(t):

$$e_{transl} = p_{tf}(q) - p_{traj}(t). \tag{5}$$

Fig. 3 Trajectory following towards object transfer point during reaching phase

For the orientation trajectory, a similar task function is defined that expresses the error between the rotation matrix of the task frame R_tf(q) and the rotation matrix of the trajectory set point R_traj(t):

$$e_{rot} = \omega_{error}(q,t) = \log\left(R_{traj}^{T}(t)\,R_{tf}(q)\right)^{\vee}, \tag{6}$$

where ∨ stands for the operator that maps the skew-symmetric matrix, resulting from the matrix logarithm, to the three-dimensional rotation error vector ω_error in the axis-angle representation [20]. In special cases, the trajectory following constraints can be relaxed in order to exploit additional degrees-of-freedom in the application. For example, as shown in Fig. 3, there is rotational symmetry of the object around its Z-axis. This rotational redundancy can be encoded by transforming the error vector ω_error to the object frame {obj} and only constraining the X and Y components:

$$e_{rot,x} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} R_{obj}^{T}\,\omega_{error}, \tag{7}$$

$$e_{rot,y} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} R_{obj}^{T}\,\omega_{error}. \tag{8}$$

For the handover this means that the robot may approach the object from any direction rotated around the Z -axis of the object frame.
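The rotation error (6) and its relaxed form (7)–(8) can be illustrated numerically. The following is a minimal sketch that follows the equations literally; the helper names `log_vee`, `rot_z` and `symmetric_rotation_error` are hypothetical, and the closed-form logarithm used here is only valid away from rotation angles of π.

```python
import numpy as np

def log_vee(R):
    """Axis-angle vector of a rotation matrix, i.e. vee(log(R))."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    if angle < 1e-9:
        return np.zeros(3)
    # Not valid near angle = pi, where sin(angle) vanishes.
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return angle * axis

def rot_z(a):
    """Rotation matrix for a rotation of a radians about the Z-axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def symmetric_rotation_error(R_tf, R_traj, R_obj):
    omega = log_vee(R_traj.T @ R_tf)   # equation (6)
    omega_obj = R_obj.T @ omega        # error vector expressed in {obj}
    return omega_obj[0], omega_obj[1]  # equations (7) and (8)

# The task frame differs from the set point by a pure rotation about Z.
R_obj = np.eye(3)
R_traj = np.eye(3)
R_tf = rot_z(0.7)
omega_full = log_vee(R_traj.T @ R_tf)
e_x, e_y = symmetric_rotation_error(R_tf, R_traj, R_obj)
```

In this toy configuration the full error vector has a non-zero Z component, while the two constrained components vanish, so the controller would not try to remove the rotation about the symmetry axis.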

3.3 Object Passing Phase

During the passing phase, the human and robot make contact to transfer the object. To keep the physical interaction from feeling stiff, and therefore make it more natural, a spring-like impedance behavior is implemented. This is achieved in eTaSL through the combination of position tracking and force tracking constraints.

Object Translation and Rotation Tracking. Pose tracking constraints ensure that the robot remains at the position p_target and orientation R_target of the object as determined during the transition from reaching to passing phase. The translation and rotation constraints are implemented similarly as in (5)–(6):

$$e_{transl} = p_{tf}(q) - p_{target}, \tag{9}$$

$$e_{rot} = R_{tf}(q)\,\log\left(R_{tf}^{T}(q)\,R_{target}\right)^{\vee}, \tag{10}$$

except that the rotation error has now been explicitly expressed in the world frame through left-multiplication with Rtf . This is required for later combining with force constraints in the same reference frame.


Force and Moment Tracking. Since eTaSL works with velocity controllers, force and moment tracking constraints cannot be applied directly. Instead, the measured force and moment F_meas and M_meas from the force sensor are related to the motion of the robot using a specified model. In this case, the force is related to the robot's motion via a compliance model, ultimately resulting in the following admittance control laws that become part of the set of control laws in (4b):

$$\frac{d}{dt}\,p_{tf}(q) = -k_f\,C_f\left(F_{meas} - F_{des}\right), \tag{11}$$

$$\frac{d}{dt}\,\log\left(R_{tf}(q)\right)^{\vee} = -k_m\,C_m\left(M_{meas} - M_{des}\right), \tag{12}$$

where C_f and C_m are the specified compliance matrices for force and moment, k_f and k_m are the proportional feedback control gains for force and moment, and F_des and M_des are the desired force and moment values. The left-hand side of equations (11) and (12) corresponds to the translational and rotational velocity of the robot end effector respectively. In this way, force and moment errors are transformed into a desired velocity of the robot end effector. Note that all coordinates are expressed in the world frame {w}.

Resulting Impedance Behavior. When the position and force tracking constraints (9)–(12) are combined in the quadratic program (4), the conflicting constraints result in an impedance behavior as visualized in Fig. 4. The corresponding equivalent spring constants may be determined by deriving the necessary conditions for stationarity. These are found by substituting each slack variable ε_i in (4a) using its corresponding control law (3) and afterwards setting the gradient of the cost function (4a) with respect to the robot joint velocities q̇ to zero. Assuming that the inequalities (4c) are inactive and that the robot is in a non-singular joint configuration, and neglecting the influence of the regularization term by setting μ = 0, the following relations can be derived as the necessary conditions for stationarity:

$$F_{meas} - F_{des} = K_{eq,f}\left(p_{tf}(q) - p_{target}\right), \tag{13}$$

$$M_{meas} - M_{des} = K_{eq,m}\,R_{tf}(q)\,\log\left(R_{tf}^{T}(q)\,R_{target}\right)^{\vee}. \tag{14}$$

In these relations, K_eq,f and K_eq,m are the equivalent spring constants for translation and rotation and they are expressed as a function of previously defined parameters as follows:

$$K_{eq,f} = C_f^{-1}\,\frac{w_t\,k_t}{w_f\,k_f}, \qquad K_{eq,m} = C_m^{-1}\,\frac{w_r\,k_r}{w_m\,k_m}, \tag{15}$$

with w_t, w_r, w_f, w_m the weights of the translation, rotation, force and moment constraints respectively, while k_t, k_r, k_f, k_m are the corresponding feedback constants.

Fig. 4 Conflicting position and force constraints result in a spring-like impedance behavior during the passing phase

3.4 Retraction Phase

After the object has been handed over to the robot (or when the reaching phase is interrupted), the robot moves the object towards a given location. In our implementation, this was achieved as a linear trajectory q_traj(t) in joint space between the current robot configuration and a predefined joint configuration. The corresponding task function is given by:

$$e_{joint} = q - q_{traj}(t).$$
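The joint-space task function of the retraction phase can be sketched as follows. This is a minimal sketch: the linear time scaling, the `duration` parameter, the function names and the example joint values are assumptions made for illustration only.

```python
import numpy as np

def joint_trajectory(q_start, q_goal, t, duration):
    """Linear joint-space set point between q_start and q_goal at time t."""
    s = np.clip(t / duration, 0.0, 1.0)   # progress along the trajectory
    return q_start + s * (q_goal - q_start)

def joint_error(q, q_start, q_goal, t, duration):
    """Task function e_joint = q - q_traj(t)."""
    return q - joint_trajectory(q_start, q_goal, t, duration)

# Hypothetical 7-DoF configurations in radians.
q_start = np.array([0.0, -0.5, 0.0, -2.0, 0.0, 1.5, 0.8])
q_goal = np.array([0.4, -0.8, 0.2, -2.2, 0.1, 1.6, 0.6])
q_mid = joint_trajectory(q_start, q_goal, t=1.0, duration=2.0)
e_mid = joint_error(q_mid, q_start, q_goal, t=1.0, duration=2.0)
q_end = joint_trajectory(q_start, q_goal, t=5.0, duration=2.0)
```

Clipping the progress variable keeps the set point at the goal configuration once the nominal duration has elapsed.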

4 Experiments

The proposed constraint-based framework for bidirectional handovers is now tested on a real robot setup.

4.1 Experimental Setup and Parameter Choices

Figure 5 visualizes the experimental setup. It consists of a 7-DoF Franka Emika Panda robot arm on which a qb SoftHand from qbrobotics has been mounted. This compliant hand is intended to mimic human hand grasping and consists of 19 flexible joints in the fingers driven by a single actuator. The considered object is a cylindrical rod whose position and orientation are measured online by an HTC Vive measurement system using a tracker attached to the object. The contact force and moment during the passing phase are estimated using the internal joint torque sensors of the robot and filtered in a preprocessing step using a low-pass filter (cut-off frequency: 5 Hz) to attenuate the effect of noise.

Fig. 5 Overview of the experimental setup. A Franka Emika Panda robot holds an object using a qb SoftHand. The object consists of a cylindrical rod with an HTC Vive tracker mounted on top. The position and orientation of this tracker are measured by two static HTC Vive base stations placed on tripods. (a) Robot with HTC Vive. (b) Close-up of object

Table 1 lists the values of the feedback constants k_i for each constraint controller of Sect. 3. In addition, the stiffness matrices C_f^{-1} and C_m^{-1} for the force and moment constraints are provided, with coordinates expressed in reference frame {w} as depicted in Fig. 6b. The stiffness in the Y-direction has been set lower than in X and Z since a more compliant behavior in that direction was found to improve human-likeness. Since the conflicting constraints have the same feedback constants and the weights w_i are all set to 1, the equivalent spring constants in (15) are the same as C_f^{-1} and C_m^{-1}. Finally, in the force and moment tracking constraints (11)–(12), all components of the desired force and moment F_des and M_des are set to zero except for the Z-component of F_des, which is set to 10 N. This was done to accomplish the upward motion of the receiver during the passing phase as mentioned in Sect. 2, which helps signal to the human that the robot is supporting the load.

Table 1 Chosen values for the controller feedback constants k_i in each handover phase and the stiffness matrices C_f^{-1} and C_m^{-1} in the passing phase

Phase                     Constraint     Feedback gain k_i [1/s]
Reaching (Sect. 3.2)      Translation    2
                          Rotation       2
Passing (Sect. 3.3)       Translation    1.5
                          Rotation       1
                          Force          1.5
                          Moment         1
Retracting (Sect. 3.4)    Joints         3

Impedance values:

$$C_f^{-1} = \begin{bmatrix} 200 & 0 & 0 \\ 0 & 70 & 0 \\ 0 & 0 & 170 \end{bmatrix}\ \mathrm{N/m}, \qquad C_m^{-1} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\ \mathrm{Nm/rad}$$
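As a small worked example of the parameter choices, the equivalent translational stiffness (15) and the velocity requested by the force law (11) can be evaluated with the values from Table 1. The variable names are hypothetical, and the velocity computed here is the one the force constraint alone would ask for when no force is measured; in the full quadratic program it is opposed by the position tracking constraint.

```python
import numpy as np

# Values from Table 1 (passing phase).
C_f_inv = np.diag([200.0, 70.0, 170.0])   # stiffness C_f^{-1} in N/m
C_f = np.linalg.inv(C_f_inv)              # compliance in m/N
k_t, k_f = 1.5, 1.5                       # translation and force gains in 1/s
w_t, w_f = 1.0, 1.0                       # constraint weights

# Equation (15): equivalent translational spring constant.
K_eq_f = C_f_inv * (w_t * k_t) / (w_f * k_f)

# Equation (11) on its own, with zero measured force.
F_des = np.array([0.0, 0.0, 10.0])        # desired force in N
F_meas = np.zeros(3)
v_force = -k_f * C_f @ (F_meas - F_des)   # requested velocity in m/s
```

With equal gains and unit weights, `K_eq_f` equals the stiffness matrix itself, and the only non-zero velocity component points along +Z (about 0.088 m/s), in line with the upward motion mentioned in the text.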

Fig. 6 a Successful (blue) and failed (red) attempts at handovers, drawn within the workspace of the robot (green). Each short line piece indicates the object orientation when starting the transfer. b The same workspace shown with respect to the robot

4.2 Results

To validate the framework, a total of 28 handovers were executed. In each trial the human operator offers the object at a different position and orientation in the workspace of the robot, which is defined as a quarter-sphere with a radius of 0.8 m as visualized in Fig. 6a.

Figure 7 shows snapshots of one of the experiments, where the different phases during the handover are each indicated with a number. The corresponding distance and contact force between object and robot throughout the experiment are given by the graphs in Fig. 8. The robot starts in the idle phase ①, waiting for the object to enter the robot's workspace. In ②, the object was briefly brought inside the workspace, which started the reaching motion, but the motion was interrupted when the object was removed again in ③. Another approach was initiated in ④, causing the robot to reach for the object again. The passing phase with interaction control starts in ⑤ when a contact force above 1 N is detected, closing the hand. When the hand is closed and the contact force drops below 2 N, the robot retracts with the object in hand ⑥. After fully retracting, the human-to-robot handover is finished. Next, the robot-to-human handover starts with the robot presenting the object to the human ⑦. After detecting contact, the passing phase occurs again ⑧. After the passing phase the robot retracts ⑨ and waits for a new object approach, restarting the human-to-robot handover. A video of one of the experiments can be found in the link in the footnote.²

Fig. 7 Overview of different phases during a practical bidirectional handover. The human's motion is indicated with red arrows, the robot's motion with green arrows and the estimated contact force direction in blue. For the human-to-robot handover, ①–④ corresponds to the reaching phase, ⑤ is the passing phase and ⑥ is the retraction phase. ⑦–⑨ is the robot-to-human handover

The success rate for all 28 handovers together is shown in Fig. 6a. Success was evaluated based on whether the human had to correct their motion in response to the robot's actions. Most of the handovers succeeded except for the four on the left side of the robot's workspace. This happened because the reaching motion is calculated as a linear trajectory between the current hand pose and the object. When the object is offered to the left behind the back of the robot's hand, a collision will therefore occur. This can be remedied by better trajectory planning algorithms that take this aspect into account.

The timing of the handover was measured between the start of the approach phase and the start of the retraction phase (③–⑤). An average human-to-robot handover took 2.87 s (averaged over 12 regular handovers), with a standard deviation of 0.98 s.

² Video of one of the experiments: https://youtu.be/JdbtuFxH6QA.

5 Conclusions

The objective of this paper was to specify the control of bidirectional human-robot handovers using the formalism of constraint-based programming. The different phases of the handover (reaching, passing, retracting) were characterized and implemented with a corresponding set of position and force tracking constraints. The resulting method was successfully tested on the Franka Panda robot for a large number of handovers in different locations.


Fig. 8 Position and force evolution during a bidirectional handover. Top: distance between robot hand position and object position (orange) and distance between robot base and object (blue). Bottom: norm of the filtered contact force (blue)

The main advantage of our approach is that with a simple set of constraints, many human-like behaviors during the handover are already achieved, such as taking the object's rotational symmetry into account during the reaching phase and compliance in the passing phase. Behaviors are implemented with velocity-resolved controllers. This means that no dynamical models of the robot are required, which helps to transfer the task specification to other robot platforms. Since the constraint-based approach is modular, it can be readily extended with additional reactive behaviors to deal with disturbances. An example could be to add on-line obstacle avoidance or to deal with physical interaction during the reaching phase.

Acknowledgements This result is part of a project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement No. 788298).

References

1. Aertbeliën, E., De Schutter, J.: ETASL/ETC: a constraint-based task specification language and robot controller using expression graphs. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1540–1546. IEEE (2014)
2. Basili, P., Huber, M., Brandt, T., Hirche, S., Glasauer, S.: Investigating human-human approach and hand-over. In: Human Centered Robot Systems, pp. 151–160. Springer (2009)
3. Bdiwi, M., Kolker, A., Suchý, J.: Transferring model-free objects between human hand and robot hand using vision/force control. In: 2014 IEEE 11th International Multi-Conference on Systems, Signals Devices (SSD14), pp. 1–6 (2014)
4. Chan, W.P., Parker, C.A., Van der Loos, H.M., Croft, E.A.: Grip forces and load forces in handovers: implications for designing human-robot handover controllers. In: Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, pp. 9–16 (2012)
5. Chan, W.P., Parker, C.A., Van der Loos, H.M., Croft, E.A.: A human-inspired object handover controller. Int. J. Robot. Res. 32(8), 971–983 (2013)
6. Chan, W.P., Kumagai, I., Nozawa, S., Kakiuchi, Y., Okada, K., Inaba, M.: Implementation of a robot-human object handover controller on a compliant underactuated hand using joint position error measurements for grip force and load force estimations. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 1190–1195 (2014)
7. Controzzi, M., Singh, H.K., Cini, F., Cecchini, T., Wing, A., Cipriani, C.: Humans adjust their grip force when passing an object according to the observed speed of the partner's reaching out movement. Exp. Brain Res. 236, 3363–3377 (2018)
8. Fujita, M., Kato, R., Tamio, A.: Assessment of operators' mental strain induced by hand-over motion of industrial robot manipulator. In: 19th International Symposium in Robot and Human Interactive Communication, pp. 361–366 (2010)
9. Huber, M., Rickert, M., Knoll, A., Brandt, T., Glasauer, S.: Human-robot interaction in handing-over tasks. In: RO-MAN 2008 - The 17th IEEE International Symposium on Robot and Human Interactive Communication, pp. 107–112 (2008)
10. Kupcsik, A., Hsu, D., Lee, W.S.: Learning dynamic robot-to-human object handover from human feedback. In: Robotics Research, pp. 161–176. Springer (2018)
11. Marturi, N., Kopicki, M., Rastegarpanah, A., Rajasekaran, V., Adjigble, M., Stolkin, R., Leonardis, A., Bekiroglu, Y.: Dynamic grasp and trajectory planning for moving objects. Auton. Robot. 43(5), 1241–1256 (2019)
12. Medina, J.R., Duvallet, F., Karnam, M., Billard, A.: A human-inspired controller for fluid human-robot handovers. In: 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pp. 324–331. IEEE (2016)
13. Micelli, V., Strabala, K., Srinivasa, S.: Perception and control challenges for effective human-robot handoffs. In: Proceedings of 2011 Robotics: Science and Systems, Workshop RGB-D (2011)
14. Nemlekar, H., Dutia, D., Li, Z.: Object transfer point estimation for fluent human-robot handovers. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 2627–2633. IEEE (2019)
15. Ortenzi, V., Cosgun, A., Pardi, T., Chan, W.P., Croft, E., Kulić, D.: Object handovers: A review for robotics. IEEE Trans. Robot. 1–19 (2021)
16. Pan, M.K., Knoop, E., Bächer, M., Niemeyer, G.: Fast handovers with a robot character: Small sensorimotor delays improve perceived qualities. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6735–6741. IEEE (2019)
17. Prada, M., Remazeilles, A., Koene, A., Endo, S.: Implementation and experimental validation of dynamic movement primitives for object handover. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2146–2153 (2014)
18. Rasch, R., Wachsmuth, S., König, M.: Combining cartesian trajectories with joint constraints for human-like robot-human handover. In: 2019 IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids), pp. 91–98 (2019)
19. Sisbot, E.A., Marin-Urias, L.F., Broquere, X., Sidobre, D., Alami, R.: Synthesizing robot motions adapted to human presence. Int. J. Soc. Robot. 2(3), 329–343 (2010)
20. Solà, J., Deray, J., Atchuthan, D.: A micro lie theory for state estimation in robotics (2018). arXiv:1812.01537
21. Strabala, K., Lee, M.K., Dragan, A., Forlizzi, J., Srinivasa, S.S., Cakmak, M., Micelli, V.: Toward seamless human-robot handovers. J. Human-Robot Interact. 2(1), 112–132 (2013)
22. Vergara Perico, C.A.: Combining modeled and learned information to rapidly deploy human-robot collaborative tasks. PhD thesis, Arenberg Doctoral School, KU Leuven. Faculty of Engineering Science (2020)
23. Widmann, D., Karayiannidis, Y.: Human motion prediction in human-robot handovers based on dynamic movement primitives. In: 2018 European Control Conference (ECC), pp. 2781–2787 (2018)
24. Yang, W., Paxton, C., Mousavian, A., Chao, Y.W., Cakmak, M., Fox, D.: Reactive human-to-robot handovers of arbitrary objects. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE (2021)

Speed Maps: An Application to Guide Robots in Human Environments

Akansel Cosgun

Abstract We present the concept of speed maps: speed limits for mobile robots in human environments. Static speed maps allow for faster navigation in corridors while limiting the speed around corners and in rooms. Dynamic speed maps put limits on speed around humans. We demonstrate the concept for a mobile robot that guides people to annotated landmarks on the map. The robot keeps a metric map for navigation and a semantic map to hold planar surfaces for tasking. The system supports automatic initialization upon the detection of a specially designed QR code. We show that speed maps can not only reduce the impact of a potential collision but also reduce navigation time.

1 Introduction

Efficient collision-free navigation has been a longstanding goal of mobile robotics. As mobile robots are used more and more in human environments, they will need to exhibit interactive behaviors as well, such as following a person and guiding a person to a location. Such robots will be useful in a variety of settings. For example, in an airport the robot can carry the luggage of a passenger, and in amusement parks the robot can guide people to attraction venues. Robots should demonstrate safe and socially acceptable navigation behaviors near humans. Recent research offers different approaches to this problem; however, many of those approaches focus on finding only the path. In addition to the path, the speed of a mobile robot should also depend on the context. For example, a robot should move slowly and carefully in a hospital room, while it is acceptable to navigate at high speed in an empty corridor.

In this work, we present a one-time setup robotic system that exhibits social awareness for navigation. The setup requires an expert to map the environment with a robot and place speed limits according to the context. Any robot that detects a specially designed QR code can acquire this knowledge about the environment. We describe a system that can track people, follow them, and guide them to locations. These locations are specified and labeled by the user. The robot calculates a goal point towards the landmark or waypoint and navigates under the speed limits.

The contributions of this paper are (1) a method for guiding a person to a location and (2) the introduction of "speed maps" for standardization of allowable robot speeds in human environments.

A. Cosgun (B) Monash University, Melbourne, Australia e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 G. Palli et al. (eds.), Human-Friendly Robotics 2021, Springer Proceedings in Advanced Robotics 23, https://doi.org/10.1007/978-3-030-96359-0_9

2 Related Works

In this section, we briefly review existing literature on localization, person detection/tracking, and human-aware navigation.

2.1 Mapping and Localization

In mobile robotics, the standard practice for mapping and localization is as follows. When the robot is first taken to a new environment, it has to map the environment. There is an extensive literature on Simultaneous Localization and Mapping (SLAM). The usual output is a binary 2D grid map where 1's represent obstacles and 0's represent free space. Once the map is created, the robot can localize itself in the map using on-board sensing. Every time the robot is restarted, it has to start with an initial estimate of its location. Although global localization methods have been developed in the community, the usual practice is that a robotics expert manually provides an approximate initial location of the robot, after which the localization method iteratively corrects the estimate as the robot moves in the environment.

2.2 Person Detection and Tracking

Using laser scanners for detecting and tracking people is common practice in robotics. Early works [1, 2] focused on tracking multiple legs using particle filters. Legs are typically distinguished in laser scans using geometric features such as arcs [3], and boosting can be used to train a classifier on a multitude of features [4]. Topp and Christensen [5] demonstrate that leg tracking in cluttered environments is prone to false positives. For robust tracking, some efforts fused information from multiple lasers. Carballo et al. [6] use a second laser scanner at torso level. Glas et al. [7] use a network of laser sensors at torso height in hall-type environments to track the position and body orientation of multiple people. Several works used different sensor modalities to further improve robustness. Some applications [8, 9] combine leg detection and face tracking in a multi-modal tracking framework.


2.3 Navigation in Human Environments

Among the early works on guide robots, museum tour-guide robots are the most common [10, 11]. Burgard et al. [12] discuss the experiences obtained in a museum over six days. Bennewitz et al. [13] focus on interaction with multiple people for museum robots. Pacchierotti et al. [14] present the deployment of a guide robot in an office environment and analyse interactions with bystanders. Kanda et al. [15] describe a guide robot that was deployed in a shopping mall for 25 days. Some works considered evacuation scenarios where robots need to guide people to safety [16, 17].

Person following has been extensively studied in the literature. Loper et al. [18] present a system that is capable of responding to verbal and non-verbal gestures and following a person. A common method to follow a group of people is to choose and follow a leader [19]. Some papers addressed the problems encountered during following, such as the robot not keeping up with the person [20] and how to recover when the robot loses track of the person [21].

3 Elements for Navigation

In this section, we provide an overview of our navigation system. During regular operation, the robot waits at a base location for new users. When a person gets close to the robot, the robot rotates towards the user and listens for input from the GUI. The user can either send the robot to a previously saved waypoint or planar landmark, or have the robot guide him/her to that location. We use the Robot Operating System (ROS) Navigation stack for basic point-to-point navigation and provide different goal points according to the task. The speed of the robot is adjusted using the context of the environment. For example, near blind corners and in small rooms the robot's speed is limited, whereas in open corridors the robot is allowed to navigate faster.

Our system requires only a one-time setup by a robotics expert. A navigation-capable robot that is restarted or is entering the environment for the first time can acquire the map of the environment whenever it reads a special QR code that is placed at a known location in the environment. The QR code includes links to the map and landmarks, speed limits, as well as the location of the QR pattern in the map. Therefore the robot can automatically infer its initial location upon detection of the QR code, without external intervention. The environment is initially mapped during operation using the gmapping SLAM package, and the robot is localized by the amcl package of ROS.

This section is organized as follows. We present the state machine used for this work in Sect. 3.1, followed by a brief description of the robot maps in Sect. 3.2 and the tablet GUI in Sect. 3.3. The robot needs global initialization the first time it is started. In Sect. 3.4, we present our QR-based approach and how the robot finds its own position upon detection of the QR code. Then we discuss how the goal locations for guidance are determined in Sect. 3.5. When a plane is labeled instead of a waypoint, the robot first needs to find a goal location that is near the labeled landmark. In Sect. 3.6, we introduce the speed maps, which give the robot awareness of how fast it is permitted to navigate. The speed maps are calculated off-line and depend on the context.

3.1 State Machine

The robot's higher level actions are governed by a finite state machine (Fig. 1). When the robot is first initialized, it is in the NON-LOCALIZED state. When the initial pose is given, either manually or by detection of the QR code, the robot switches to the WAITING state. This is the state where the robot actively looks for input from users. When a person is close by, the robot approaches the person so that its tablet GUI faces the person. If the person inputs a guide request, the robot displays the destination and asks for confirmation. If confirmed, the robot switches to the GUIDING PERSON state. When guiding is completed, the robot waits for a while and goes back to the base. At any time, a user can cancel the operation and enter new commands.

Fig. 1 Finite state machine for the guide robot
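The transitions named in the text can be sketched as a table-driven state machine. This is only a sketch under stated assumptions: the text names the NON-LOCALIZED, WAITING and GUIDING PERSON states, whereas the `RETURNING_TO_BASE` state, the event strings and the `step` function are hypothetical placeholders, and the full machine of Fig. 1 may contain further states.

```python
from enum import Enum, auto

class State(Enum):
    NON_LOCALIZED = auto()
    WAITING = auto()
    GUIDING_PERSON = auto()
    RETURNING_TO_BASE = auto()   # hypothetical name, not taken from the text

# (current state, event) -> next state; event names are placeholders.
TRANSITIONS = {
    (State.NON_LOCALIZED, "initial_pose_received"): State.WAITING,
    (State.WAITING, "guide_request_confirmed"): State.GUIDING_PERSON,
    (State.GUIDING_PERSON, "guiding_completed"): State.RETURNING_TO_BASE,
    (State.RETURNING_TO_BASE, "base_reached"): State.WAITING,
    (State.GUIDING_PERSON, "cancelled"): State.WAITING,
    (State.RETURNING_TO_BASE, "cancelled"): State.WAITING,
}

def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

s0 = State.NON_LOCALIZED
s1 = step(s0, "initial_pose_received")
s2 = step(s1, "guide_request_confirmed")
s3 = step(s2, "guiding_completed")
s4 = step(s3, "base_reached")
```

A lookup table keeps the permitted transitions in one place, so adding a state from the figure only requires new table entries.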


3.2 Maps

We use both a metric and a semantic map. To generate semantic maps, we use our previous work [22]. Maps that include semantic information such as labels for structures, objects, or landmarks can provide context for navigation and planning of tasks, and facilitate interaction with humans. Planar surfaces can be detected from point cloud data and represented in a consistent map coordinate frame. Maps composed of such features can represent the locations and extents of landmarks such as walls, tables, shelves, and counters. We represent each plane by the 3D coordinates of its convex hull.

3.3 User Interface

Using an on-board tablet interface, the user can also enable person following. A screenshot from the GUI for saving waypoints is shown in Fig. 2.

Fig. 2 Tablet GUI for commanding the robot to navigate to various goal positions


3.4 Global Localization This section describes how robots can acquire knowledge of the environment without doing the mapping themselves. The proposed system can be used for 2 purposes: 1. A robot enters to an environment the first time and does not have the map 2. Robot has the map but does not know its initial position in the map System designers can strategically place QR code tags to the walls/floors in the environment. These places can be walls at the entrance of an indoor space, or mostly visited areas. QR codes in general contain links, text and various other data. In our application, the QR pattern includes data in XML format, which includes the link to the map and speed map files, and the position of the QR pattern in the map (Fig. 3). The position provided in the pattern is the position and orientation (in quaternions) of the QR tag in the map frame. Using this information, a robot locates itself in the map, where: r obot

Tcam is the transformation from the robot base frame to the camera frame on the robot and is known. cam TQ R is the pose of the QR code in the camera frame and is available upon detection of the QR code.

Fig. 3 A robot can acquire map information and localize itself against the map upon detection of a specially designed QR code

Speed Maps: An Application to Guide Robots in Human Environments

129

${}^{map}T_{QR}$ is the pose of the QR code in the map frame and is read from the data embedded in the QR code.
${}^{map}T_{robot}$ is the pose of the robot in the map and is unknown.

$${}^{map}T_{robot} = ({}^{map}T_{QR}) \, ({}^{QR}T_{robot})$$

$${}^{QR}T_{robot} = ({}^{QR}T_{cam}) \, ({}^{cam}T_{robot}) = ({}^{cam}T_{QR})^{-1} \, ({}^{robot}T_{cam})^{-1}$$

Therefore:

$${}^{map}T_{robot} = ({}^{map}T_{QR}) \, ({}^{cam}T_{QR})^{-1} \, ({}^{robot}T_{cam})^{-1}$$
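The chain of transformations above can be sketched in a few lines of plain Python. This is only an illustration, assuming each pose is given as a translation plus a unit quaternion and is stored as a 4x4 homogeneous matrix; the helper names and the numeric values are made up for the example and do not come from the system described here.

```python
def quat_to_rot(qx, qy, qz, qw):
    """3x3 rotation matrix (nested lists) from a unit quaternion."""
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def make_transform(translation, quaternion):
    """4x4 homogeneous transform from (x, y, z) and (qx, qy, qz, qw)."""
    r = quat_to_rot(*quaternion)
    t = translation
    return [r[0] + [t[0]], r[1] + [t[1]], r[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def compose(a, b):
    """Matrix product a * b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert(t):
    """Inverse of a rigid transform: transposed rotation, translation -R^T p."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[0] + [p[0]], rt[1] + [p[1]], rt[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def robot_in_map(map_T_qr, cam_T_qr, robot_T_cam):
    """map_T_robot = map_T_qr * (cam_T_qr)^-1 * (robot_T_cam)^-1."""
    return compose(compose(map_T_qr, invert(cam_T_qr)), invert(robot_T_cam))


# Illustrative values only: all frames share the same orientation.
IDENTITY_Q = (0.0, 0.0, 0.0, 1.0)
map_T_qr = make_transform((5.0, 2.0, 1.0), IDENTITY_Q)     # from the QR payload
cam_T_qr = make_transform((2.0, 0.0, 0.5), IDENTITY_Q)     # from tag detection
robot_T_cam = make_transform((0.1, 0.0, 0.5), IDENTITY_Q)  # from calibration
map_T_robot = robot_in_map(map_T_qr, cam_T_qr, robot_T_cam)
# The translation column is approximately (2.9, 2.0, 0.0).
print([row[3] for row in map_T_robot[:3]])
```

With identical orientations the result reduces to subtracting the camera and tag offsets from the tag position, which makes the example easy to check by hand.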

The planar pose $(x, y, \theta)$ of the robot in the map frame is required for initialization. The position $(x, y)$ is read directly from the translation component of the transformation. The orientation $\theta$ is found by projecting the axes onto the map floor plane. After the initial pose is provided, the amcl package handles localization using the laser scanner readings. We use the visp_auto_tracker ROS package to extract the pose of the QR code and the content embedded in the pattern. To interpret the XML data, we use the libXML++ package.
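As a rough illustration of reading the QR payload and reducing an orientation to a planar heading, the sketch below uses Python's standard xml.etree.ElementTree module as a stand-in for libXML++. The element and attribute names (qr_tag, map, speed_map, pose) and the URLs are hypothetical, since the actual schema is not given here.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical payload layout; the real schema is not specified in the text.
EXAMPLE_PAYLOAD = """
<qr_tag>
  <map url="http://example.org/maps/office.yaml"/>
  <speed_map url="http://example.org/maps/office_speed.yaml"/>
  <pose x="5.0" y="2.0" z="1.0" qx="0.0" qy="0.0" qz="0.7071068" qw="0.7071068"/>
</qr_tag>
"""


def parse_payload(xml_text):
    """Extract map links and the tag pose (map frame) from the XML payload."""
    root = ET.fromstring(xml_text.strip())
    pose = root.find("pose").attrib
    return {
        "map_url": root.find("map").get("url"),
        "speed_map_url": root.find("speed_map").get("url"),
        "position": tuple(float(pose[k]) for k in ("x", "y", "z")),
        "quaternion": tuple(float(pose[k]) for k in ("qx", "qy", "qz", "qw")),
    }


def yaw_from_quaternion(qx, qy, qz, qw):
    """Heading of the frame's x-axis after projection onto the floor plane."""
    return math.atan2(2.0 * (qw * qz + qx * qy), 1.0 - 2.0 * (qy * qy + qz * qz))


tag = parse_payload(EXAMPLE_PAYLOAD)
tag_heading = yaw_from_quaternion(*tag["quaternion"])  # close to pi / 2 here
```

The same yaw formula can be applied to the orientation of the robot pose in the map frame to obtain the heading needed for initialization.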

3.5 Goal Location Determination We want the robot to be able to navigate to semantic landmarks. If the label is attached to an explicit coordinate (using the current position of the robot during labeling), then the goal is simply that coordinate. Otherwise, we consider two cases when calculating a goal point for a given label: the requested label may correspond to one landmark or to multiple landmarks. If it corresponds to one landmark, the label refers to a plane somewhere in the environment; if multiple landmarks share the label, we assume the entity corresponds to a volume. For example, a user may label all four walls of a room, so that the extent of the room is represented. The robot should infer that a volume is encoded by the labels provided to it, and find a proper goal location accordingly.

3.5.1 Only One Plane Has the Requested Label

In this case, we assume that the robot should navigate to the closest edge of the plane, so we select the vertex on the landmark’s boundary that is closest to the robot’s current position. We compute the line between this vertex and the robot’s current position, and navigate to the point on this line that is 1 m away from the vertex, facing the vertex. This method is suitable for both horizontal planes such as tables and vertical planes such as doors (Fig. 4).
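The single-plane rule can be sketched as follows. This is a minimal illustration, assuming that distances are measured in the floor plane and that the hull is given as a list of (x, y, z) vertices; the function name and the example table are hypothetical.

```python
import math


def goal_for_plane(hull_vertices, robot_xy, standoff=1.0):
    """Return (x, y, theta): a point `standoff` metres from the closest hull
    vertex, on the line towards the robot, oriented to face that vertex."""
    rx, ry = robot_xy
    # Closest boundary vertex, measured in the floor plane (an assumption).
    vx, vy = min(((v[0], v[1]) for v in hull_vertices),
                 key=lambda p: math.hypot(p[0] - rx, p[1] - ry))
    dist = math.hypot(rx - vx, ry - vy)
    if dist == 0.0:
        raise ValueError("robot is exactly on the vertex; direction undefined")
    # Unit vector pointing from the vertex towards the robot.
    ux, uy = (rx - vx) / dist, (ry - vy) / dist
    gx, gy = vx + standoff * ux, vy + standoff * uy
    theta = math.atan2(vy - gy, vx - gx)  # heading that faces the vertex
    return gx, gy, theta


# Hypothetical table top, 0.7 m high, with the robot at the origin.
table = [(2.0, 0.0, 0.7), (3.0, 0.0, 0.7), (3.0, 1.0, 0.7), (2.0, 1.0, 0.7)]
goal = goal_for_plane(table, (0.0, 0.0))
```

For this table the closest vertex is (2, 0), so the goal lies 1 m in front of it on the line back to the robot, with the robot facing along the positive x-axis.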


Fig. 4 Our system allows two types of semantic features: a Planar surfaces and b Rooms

3.5.2 Multiple Planes Have the Requested Label

In this case, we project the points of all planes with this label to the ground plane, and compute the convex hull. For the purposes of navigating to this label, we simply navigate to the centroid of the hull. While navigating to a labeled region is a simple task, this labeled region could also be helpful in the context of more complex tasks, such as specifying a finite region of the map for an object search task.
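A minimal sketch of this computation is given below. It assumes that "centroid" means the area centroid of the hull polygon and uses Andrew's monotone chain algorithm for the hull, which is one of several possible choices; the function names and the example room are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_centroid(poly):
    """Area centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        c = x0 * y1 - x1 * y0
        area2 += c
        cx += (x0 + x1) * c
        cy += (y0 + y1) * c
    if area2 == 0.0:
        raise ValueError("degenerate hull: the projected points are collinear")
    return cx / (3.0 * area2), cy / (3.0 * area2)


def goal_for_region(planes):
    """Project all plane vertices to the ground and return the hull centroid."""
    ground = [(x, y) for plane in planes for (x, y, _z) in plane]
    return polygon_centroid(convex_hull(ground))


# Hypothetical 4 m x 6 m room whose four walls share one label.
H = 2.5
walls = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 0.0, H), (0.0, 0.0, H)],
    [(4.0, 0.0, 0.0), (4.0, 6.0, 0.0), (4.0, 6.0, H), (4.0, 0.0, H)],
    [(4.0, 6.0, 0.0), (0.0, 6.0, 0.0), (0.0, 6.0, H), (4.0, 6.0, H)],
    [(0.0, 6.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, H), (0.0, 6.0, H)],
]
room_goal = goal_for_region(walls)  # the middle of the room, (2.0, 3.0)
```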

3.6 Speed Maps In this section, we introduce speed maps, which impose speed limits on robots. A speed map can be static, so that it defines speed zones as in Fig. 5, or it can be dynamic, so that the robot can adjust its speed around humans. We claim that the use of such speed maps can make robots safer, more predictable and possibly more efficient. Speed maps can be used in a layered manner. At the bottom is the static speed map, which originates from the layout. Dynamic speed maps, which change with time, can build on top of the static speed map, for example to impose speed constraints around humans. We define three zones for static speed maps:
– Green zone: It is safe to speed up
– Yellow zone: Relatively safe, but human interaction possible
– Red zone: Human encounter likely, move slowly
The speed map shown in Fig. 5 depicts an office environment. This speed map is designed by hand using the following rules: Spaces corresponding to rooms and cubicles are covered as Red Zones. Blind corners are covered with a Red Zone


Fig. 5 Static speed map for an office environment. The robot has to be slow in red zones, can have moderate speed in yellow zones and can speed up to a limit in green zones

close to the corner and a Yellow Zone enclosing the Red Zone. Corridors are covered as Green Zones and the rest is covered as Yellow Zones. Although we manually labeled this speed map, there are several approaches in the literature that perform automatic room categorization. Room segmentation has been proposed in an interactive fashion by Diosi et al. [23], as well as automatically, especially for creating topological maps [24]. An example of a dynamic speed map is given in Sect. 4.3, where it is discussed in the context of speed control for guidance.
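One possible way to encode a static speed map and layer dynamic limits on top of it is sketched below. The axis-aligned rectangles are an assumed representation chosen for brevity (a hand-drawn map would more likely be an image or occupancy-grid layer), the zone coordinates are invented, and the speed values are the ones reported for the experiments in Sect. 5.1.

```python
# Speed limits in m/s, as used in the experiments of Sect. 5.1.
SPEED_LIMITS = {"green": 1.5, "yellow": 0.5, "red": 0.15}


class StaticSpeedMap:
    """Static speed zones stored as labelled axis-aligned rectangles."""

    def __init__(self, zones, default="yellow"):
        # zones: list of (label, xmin, ymin, xmax, ymax) in the map frame.
        # Anything not covered falls back to `default` ("the rest is Yellow").
        self.zones = list(zones)
        self.default = default

    def zone_at(self, x, y):
        hits = [label for (label, x0, y0, x1, y1) in self.zones
                if x0 <= x <= x1 and y0 <= y <= y1]
        if not hits:
            return self.default
        # Where zones overlap, the most restrictive one wins.
        return min(hits, key=lambda label: SPEED_LIMITS[label])

    def limit_at(self, x, y):
        return SPEED_LIMITS[self.zone_at(x, y)]


def effective_limit(static_limit, dynamic_limits=()):
    """Layering: dynamic maps can only lower the static limit, never raise it."""
    return min([static_limit, *dynamic_limits])


# Hypothetical corridor with a blind corner at its far end.
speed_map = StaticSpeedMap([
    ("green", 0.0, 0.0, 10.0, 2.0),   # corridor
    ("yellow", 8.0, 0.0, 10.0, 2.0),  # buffer enclosing the corner
    ("red", 9.0, 0.0, 10.0, 2.0),     # blind corner
])
```

A controller would query limit_at with the current robot position and pass the result, together with any dynamic limits, to effective_limit before commanding a velocity.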

4 Implementation In this section, we discuss how the following and guidance behaviors are obtained for a mobile robot. In Sect. 4.1, we describe how the multimodal person detection and tracking is performed. In Sect. 4.2 we discuss the person following behavior and in Sect. 4.3 we discuss person guidance.


4.1 Person Detection and Tracking In order to interact with people, the robot has to be able to track them. We use a person tracking system that fuses detections from two sources: the leg detector and the torso detector. Using multiple detectors increases the robustness of the tracking and provides 360-degree coverage around the robot. Brief descriptions of the person detection methods are given below.

4.1.1 Leg Detector

An ankle-height, front-facing laser scanner (Hokuyo UTM 30-LX) is used for this detector. We trained a classifier for detection using three geometric features: Width, Circularity and Inscribed Angle Variance [3]. For each segment in the laser data, we compute a distance score as the weighted sum of the distances to each feature, and threshold this score for detection.
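The scoring step can be illustrated with the sketch below. It assumes that the three features have already been computed for each segment, that the reference value of a feature is its mean over positive training examples, and that each weight is the inverse of the feature's spread; these choices, the feature values, and the threshold are assumptions for illustration, not the trained classifier itself.

```python
from statistics import mean, pstdev

FEATURES = ("width", "circularity", "iav")


def fit_reference(training_segments):
    """Per-feature reference value and weight from positive (leg) examples.
    Assumption: reference = mean, weight = 1 / standard deviation."""
    ref, weights = {}, {}
    for name in FEATURES:
        values = [seg[name] for seg in training_segments]
        ref[name] = mean(values)
        spread = pstdev(values)
        weights[name] = 1.0 / spread if spread > 0 else 1.0
    return ref, weights


def distance_score(segment, ref, weights):
    """Weighted sum of absolute distances to the reference features."""
    return sum(weights[n] * abs(segment[n] - ref[n]) for n in FEATURES)


def is_leg(segment, ref, weights, threshold):
    """A segment is accepted when its distance score is below the threshold."""
    return distance_score(segment, ref, weights) <= threshold


# Made-up feature values, purely to exercise the functions.
training = [
    {"width": 0.12, "circularity": 0.90, "iav": 0.10},
    {"width": 0.14, "circularity": 0.80, "iav": 0.20},
]
ref, weights = fit_reference(training)
leg_like = {"width": 0.13, "circularity": 0.85, "iav": 0.15}
wall_like = {"width": 1.50, "circularity": 0.10, "iav": 0.90}
```

With a threshold of 3.0, the first test segment is accepted and the second, much wider one is rejected.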

4.1.2 Torso Detector

A torso-height, back-facing laser scanner (Hokuyo UTM 30-LX) is used for this detector. We model the human torso as an ellipse and fit an ellipse to each segment in the laser scan by solving a generalized eigensystem [25]. The axis lengths of the fitted ellipse are then compared to the ellipse parameters obtained from the training data. More information on this detector is provided in [26].

4.1.3 Person Tracking

The detections from these different sources are fused using a Kalman filter with a constant-velocity model. We use a simple nearest-neighbor approach for associating detections with tracks. Whenever a track is not observed for a couple of seconds, it is deleted from the list.
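The tracking loop can be sketched as below. This is a simplified illustration, not the implemented tracker: x and y are filtered independently, association is greedy, and the noise variances, the 0.8 m gate and the 2 s timeout are assumed values.

```python
import math


class AxisFilter:
    """1D constant-velocity Kalman filter with state (position, velocity)."""

    def __init__(self, position, pos_var=0.25, vel_var=1.0):
        self.p, self.v = position, 0.0
        # Symmetric covariance [[a, b], [b, c]] stored as three scalars.
        self.a, self.b, self.c = pos_var, 0.0, vel_var

    def predict(self, dt, accel_var=0.5):
        self.p += self.v * dt
        # P <- F P F^T + Q, with F = [[1, dt], [0, 1]] and white-noise acceleration.
        a = self.a + 2 * dt * self.b + dt * dt * self.c
        b = self.b + dt * self.c
        self.a = a + accel_var * dt ** 4 / 4
        self.b = b + accel_var * dt ** 3 / 2
        self.c = self.c + accel_var * dt ** 2

    def update(self, z, meas_var=0.05):
        s = self.a + meas_var                  # innovation variance
        k_p, k_v = self.a / s, self.b / s      # Kalman gain for H = [1, 0]
        r = z - self.p
        self.p += k_p * r
        self.v += k_v * r
        a, b, c = self.a, self.b, self.c
        self.a, self.b, self.c = (1 - k_p) * a, (1 - k_p) * b, c - k_v * b


class Track:
    def __init__(self, x, y, stamp):
        self.fx, self.fy = AxisFilter(x), AxisFilter(y)
        self.last_seen = stamp

    @property
    def position(self):
        return self.fx.p, self.fy.p


class PersonTracker:
    def __init__(self, gate=0.8, timeout=2.0):
        self.tracks, self.gate, self.timeout = [], gate, timeout
        self.stamp = None

    def step(self, detections, stamp):
        """Fuse (x, y) detections from any detector at time `stamp` (seconds)."""
        dt = 0.0 if self.stamp is None else stamp - self.stamp
        self.stamp = stamp
        for t in self.tracks:
            t.fx.predict(dt)
            t.fy.predict(dt)
        for (x, y) in detections:
            nearest = min(self.tracks, default=None,
                          key=lambda t: math.dist(t.position, (x, y)))
            if nearest is not None and math.dist(nearest.position, (x, y)) <= self.gate:
                nearest.fx.update(x)
                nearest.fy.update(y)
                nearest.last_seen = stamp
            else:
                self.tracks.append(Track(x, y, stamp))
        # Drop tracks that have not been observed for longer than the timeout.
        self.tracks = [t for t in self.tracks if stamp - t.last_seen <= self.timeout]
```

Because association is greedy, a leg and a torso detection of the same person in one step both update the same track, which is how the two sources end up fused.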

4.2 Person Following When the robot is following a person, it constantly aims to keep a fixed distance from the person. In our experiments, we found that keeping 0.9 m was far enough for the person to feel safe, but close enough to keep track of the target. At every control step (10 Hz), the robot updates its goal position, which lies at the following distance from the target. This goal point is fed to the ROS Navigation Stack, which calculates the


Fig. 6 Speed profile of the robot for guiding a person

velocity commands to keep the robot on the path. If the goal position happens to intersect an obstacle, a new goal position is found by raytracing towards the robot and finding the first non-colliding position.
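The goal update, including the raytracing fallback, might look like the sketch below. The is_occupied callback is a hypothetical stand-in for a costmap query, the 5 cm step is an assumed resolution, and holding position when the robot is already within the following distance is our assumption.

```python
import math


def following_goal(robot_xy, person_xy, is_occupied, follow_dist=0.9, step=0.05):
    """Goal point `follow_dist` from the person on the line towards the robot.
    If that point is in collision, walk along the ray towards the robot and
    return the first free point."""
    rx, ry = robot_xy
    px, py = person_xy
    gap = math.hypot(rx - px, ry - py)
    if gap <= follow_dist:
        return robot_xy  # already close enough: hold position (assumption)
    # Unit vector pointing from the person towards the robot.
    ux, uy = (rx - px) / gap, (ry - py) / gap
    d = follow_dist
    while d <= gap:
        candidate = (px + d * ux, py + d * uy)
        if not is_occupied(*candidate):
            return candidate
        d += step
    return robot_xy  # nothing free along the ray: stay where we are


# Example: robot at the origin, person 3 m ahead, no obstacles.
free_goal = following_goal((0.0, 0.0), (3.0, 0.0), lambda x, y: False)


# Example: a hypothetical obstacle covering 1.5 m <= x <= 2.5 m.
def blocked(x, y):
    return 1.5 <= x <= 2.5


shifted_goal = following_goal((0.0, 0.0), (3.0, 0.0), blocked)
```

In the second example the nominal goal at x = 2.1 m is blocked, so the goal slides back along the ray until it leaves the obstacle.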

4.3 Person Guidance In previous work, we developed a system to guide blind individuals to a goal position using a haptic interface [26]. In this work we describe a more general approach. To guide a person to a location, the robot has to plan a path for itself and try to keep the person engaged in the guidance task. This is a hard problem if the robot also plans a separate path for the user; we instead use a simpler approach for guidance, which produces plausible results. The robot executes the path calculated by ROS navigation, but modulates its speed according to the distance to the user. This corresponds to a dynamic speed map that is used in conjunction with the static speed map described in Sect. 3.6. We define a speed profile as a function of the distance between the robot and the human (Fig. 6). The robot moves at a low speed $v_{safe}$ if the human is too close. The speed peaks at distance $d_{peak}$, and the robot stops if the distance is larger than $d_{guide}$, which may indicate that the human is not interested in being guided. Note that this speed profile is subject to the static speed limits, i.e. $v_{peak}$ is capped by the static speed map limit.
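A speed profile with these properties can be sketched as a piecewise function. The exact shape of the curve in Fig. 6 is not reproduced here: the linear ramps between the breakpoints are an assumption, and the default parameter values are the ones reported for the guidance experiment in Sect. 5.2.

```python
def guidance_speed(d, static_limit=float("inf"),
                   d_safe=0.1, d_peak=0.9, d_guide=1.7,
                   v_safe=0.1, v_peak=1.0):
    """Robot speed (m/s) as a function of robot-human distance d (m).

    Assumed shape: constant v_safe up to d_safe, linear rise to the peak at
    d_peak, linear fall to zero at d_guide, and zero beyond d_guide.
    """
    peak = min(v_peak, static_limit)  # v_peak is capped by the static speed map
    low = min(v_safe, peak)
    if d > d_guide:
        return 0.0                    # human too far behind: stop and wait
    if d <= d_safe:
        return low                    # human too close: creep at the safe speed
    if d <= d_peak:
        return low + (peak - low) * (d - d_safe) / (d_peak - d_safe)
    return peak * (d_guide - d) / (d_guide - d_peak)


# Sample the profile at a few distances, without and with a static limit.
samples = [guidance_speed(d) for d in (0.05, 0.5, 0.9, 1.3, 2.0)]
capped = guidance_speed(0.9, static_limit=0.5)
```

Passing the static zone limit as static_limit is one simple way to combine the dynamic profile with the static speed map.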

5 Demonstration The system is implemented on a Segway RMP-200 robot with casters (Fig. 7). We use ROS for mapping, localization and path planning. We evaluate the system in two scenarios: corridor navigation and guidance.


Fig. 7 The Segway robot platform used in our experiments

In the corridor navigation scenario, the robot has a goal point and is not guiding anybody. In the first condition, we used standard ROS Navigation, and in the second condition, we used ROS Navigation with the speed modulation method based on static speed maps (Sect. 3.6). A person stands right around the corner and is invisible to the robot until it actually starts turning the corner. The evaluation criteria are the execution time and the actual robot positions along the path. In the guidance scenario, there is no person around the corner and the robot guides the person to the same goal. In the first condition, we use ROS Navigation, but the robot stops if the distance to the human is over a threshold. In the second condition, the dynamic speed method is used. The instantaneous speeds of the robot and the human are logged for comparison.

5.1 Scenario: Point-to-Point Navigation The speed map shown in Fig. 5 is used for the experiments. The speed limits were set as $v_{green} = 1.5$, $v_{yellow} = 0.5$, $v_{red} = 0.15$, all in m/s. We compared our approach with a fixed maximum speed of 1.0 m/s. The paths taken by the robot are compared in Fig. 8. With our approach, the robot slowed down before turning, which allowed for early detection of the human, and it was able to give more space to the human. Our approach was also slightly more efficient, reaching the goal in 28 s as opposed to 29.1 s.


Fig. 8 Corridor scenario. The robot is given a fixed goal location, and it is not guiding a user. A bystander stands right around the corner and is not visible to the robot until it turns the corner. Points mark robot positions taken at fixed time intervals. a ROS Navigation. Note that the distance between consecutive robot positions is mostly constant. The robot gets very close to the bystander because it is moving relatively fast when it turns the corner. b Our approach using the static speed map. Green points mark positions in Green Zones, where the robot can move fast; yellow points mark positions where the robot is limited to moderate speed; and red points mark positions in Red Zones, where the robot has to move relatively slowly. The robot’s positions show that this approach was more gracious in turning the corner and respecting the human’s personal space. It also reached the goal faster

5.2 Scenario: Guiding a Person In this scenario, the robot is given a fixed goal to which it guides a person. We used the velocity profile in Fig. 6 with $d_{guide} = 1.7$ m, $d_{peak} = 0.9$ m, $d_{safe} = 0.1$ m, $v_{safe} = 0.1$ m/s, $v_{peak} = 1.0$ m/s. We compared our approach with a simple strategy: if the human is closer than $d_{guide}$, the navigation continues with a fixed maximum speed; otherwise the robot stops and waits. The comparison of robot speeds is given in Fig. 9a for the fixed maximum speed and in Fig. 9b for the speed profile method. Between t = 0 and t = 9 s, the accelerations are steeper for the fixed maximum speed case. Robots that exhibit high accelerations will likely be perceived as unsafe; therefore, our approach exhibits more socially acceptable behavior. Moreover, after the person started following (t > 9 s), our approach is better at matching the speed pattern of the human.

6 Conclusion In this work, we presented a context-aware robot that accepts user-annotated landmarks as goals. By reading special QR codes, the robot acquires knowledge about the environment and its global position. We showed that speed maps can not only reduce the impact of a potential collision, but also reduce task execution time.


Fig. 9 Comparison of robot and human speeds with respect to time. a Standard ROS navigation. b Our approach, which employs the dynamic speed adjustment for guidance: accelerations are less steep than in (a)

Robots that obey speed maps can also potentially be perceived as safer, as the speed limits serve as traffic rules. We also showed that our guidance approach results in more gracious motions compared to standard ROS Navigation. Future work includes the manual design of speed maps in Augmented Reality [27], incorporating the speed maps concept with human-aware navigation [28], perhaps for a different application such as a photographer robot [29], and conducting user studies to evaluate the usability of the system with metrics tailored for navigating among people [30].

References
1. Montemerlo, M., Thrun, S., Whittaker, W.: Conditional particle filters for simultaneous mobile robot localization and people-tracking. In: IEEE International Conference on Robotics and Automation (ICRA) (2002)
2. Schulz, D., Burgard, W., Fox, D., Cremers, A.B.: Tracking multiple moving targets with a mobile robot using particle filters and statistical data association. In: IEEE International Conference on Robotics and Automation (ICRA) (2001)
3. Xavier, J., Pacheco, M., Castro, D., Ruano, A., Nunes, U.: Fast line, arc/circle and leg detection from laser scan data in a player driver. In: IEEE International Conference on Robotics and Automation (ICRA) (2005)
4. Arras, K.O., Mozos, O.M., Burgard, W.: Using boosted features for the detection of people in 2d range data. In: IEEE International Conference on Robotics and Automation, vol. 2007, pp. 3402–3407 (2007)
5. Topp, E.A., Christensen, H.I.: Tracking for following and passing persons. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2005)
6. Carballo, A., Ohya, A., et al.: Fusion of double layered multiple laser range finders for people detection from a mobile robot. In: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (2008)
7. Glas, D.F., Miyashita, T., Ishiguro, H., Hagita, N.: Laser-based tracking of human position and orientation using parametric shape modeling. Adv. Robot. 23(4), 405–428 (2009)


8. Kleinehagenbrock, M., Lang, S., Fritsch, J., Lomker, F., Fink, G.A., Sagerer, G.: Person tracking with a mobile robot based on multi-modal anchoring. In: International Workshop on Robot and Human Interactive Communication (2002)
9. Bellotto, N., Hu, H.: Multisensor-based human detection and tracking for mobile service robots. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 39(1), 167–181 (2009)
10. Nourbakhsh, I.R., Kunz, C., Willeke, T.: The mobot museum robot installations: a five year experiment. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2003)
11. Thrun, S., Bennewitz, M., Burgard, W., Cremers, A.B., Dellaert, F., Fox, D., Hahnel, D., Rosenberg, C., Roy, N., Schulte, J., et al.: Minerva: a second-generation museum tour-guide robot. In: IEEE International Conference on Robotics and Automation (ICRA) (1999)
12. Burgard, W., Cremers, A.B., Fox, D., Hähnel, D., Lakemeyer, G., Schulz, D., Steiner, W., Thrun, S.: Experiences with an interactive museum tour-guide robot. Artif. Intell. 114(1), 3–55 (1999)
13. Bennewitz, M., Faber, F., Joho, D., Schreiber, M., Behnke, S.: Towards a humanoid museum guide robot that interacts with multiple persons. In: IEEE-RAS International Conference on Humanoid Robots (2005)
14. Pacchierotti, E., Christensen, H., Jensfelt, P., et al.: Design of an office-guide robot for social interaction studies. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2006)
15. Kanda, T., Shiomi, M., Miyashita, Z., Ishiguro, H., Hagita, N.: An affective guide robot in a shopping mall. In: Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction, pp. 173–180. ACM (2009)
16. Kim, Y.-D., Kim, Y.-G., Lee, S.H., Kang, J.H., An, J.: Portable fire evacuation guide robot system. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2009)
17. Robinette, P., Howard, A.M.: Incorporating a model of human panic behavior for robotic-based emergency evacuation. In: IEEE RO-MAN, pp. 47–52 (2011)
18. Loper, M.M., Koenig, N.P., Chernova, S.H., Jones, C.V., Jenkins, O.C.: Mobile human-robot teaming with environmental tolerance. In: ACM/IEEE International Conference on Human Robot Interaction (HRI) (2009)
19. Stein, P.S., Santos, V., Spalanzani, A., Laugier, C., et al.: Navigating in populated environments by following a leader. In: International Symposium on Robot and Human Interactive Communication (Ro-MAN) (2013)
20. Cai, J., Matsumaru, T.: Robot human-following limited speed control. In: IEEE RO-MAN, pp. 81–86 (2013)
21. Ota, M., Ogitsu, T., Hisahara, H., Takemura, H., Ishii, Y., Mizoguchi, H.: Recovery function for human following robot losing target. In: Annual Conference of the IEEE Industrial Electronics Society, pp. 4253–4257 (2013)
22. Trevor, A.J.B., Cosgun, A., Kumar, J., Christensen, H.I.: Interactive map labeling for service robots. In: IROS Workshop on Active Semantic Perception (2012)
23. Diosi, A., Taylor, G., Kleeman, L.: Interactive slam using laser and advanced sonar. In: IEEE International Conference on Robotics and Automation (ICRA) (2005)
24. Mozos, O.M., Triebel, R., Jensfelt, P., Rottmann, A., Burgard, W.: Supervised semantic labeling of places using information extracted from sensor data. Robot. Auton. Syst. 55(5), 391–402 (2007)
25. Fitzgibbon, A., Pilu, M., Fisher, R.B.: Direct least square fitting of ellipses. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (1999)
26. Cosgun, A., Sisbot, E.A., Christensen, H.I.: Guidance for human navigation using a vibrotactile belt interface and robot-like motion planning. In: IEEE International Conference on Robotics and Automation (ICRA) (2014)
27. Gu, M., Cosgun, A., Chan, W.P., Drummond, T., Croft, E.: Seeing thru walls: visualizing mobile robots in augmented reality (2021). arXiv preprint arXiv:2104.03547


28. Cosgun, A., Sisbot, E.A., Christensen, H.I.: Anticipatory robot path planning in human environments. In: 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 562–569 (2016)
29. Newbury, R., Cosgun, A., Koseoglu, M., Drummond, T.: Learning to take good pictures of people with a robot photographer. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 11268–11275 (2020)
30. Wang, J., Chan, W.P., Carreno-Medrano, P., Cosgun, A., Croft, E.: Metrics for evaluating social conformity of crowd navigation algorithms. In: Workshop on Novel and Emerging Test Methods and Metrics for Effective HRI (2021)