In-Hand Object Localization and Control: Enabling Dexterous Manipulation with Robotic Hands (ISBN 3031069668, 9783031069666)

This book introduces a novel model-based dexterous manipulation framework which, thanks to its precision and versatility, …


English · 212 [213] pages · 2022


Table of contents:
Series Editor’s Foreword
Preface
Contents
List of Symbols and Abbreviations
List of Symbols
Abbreviations
List of Figures
List of Tables
1 Introduction
1.1 Dexterous Manipulation
1.2 Contribution
1.3 Organization of this Work
References
2 Related Work
2.1 Dexterous Robotic Hands
2.1.1 A Brief History of Robotic Hands
2.1.2 DLR David
2.2 Dexterous Manipulation
2.2.1 Overview
2.2.2 Grasp State Estimation
2.2.3 Impedance-Based Object Control
2.2.4 Learning-Based Methods
References
3 Grasp Modeling
3.1 Definitions
3.2 Kinematics
3.2.1 Forward Kinematics
3.2.2 Grasp Matrix
3.2.3 Hand Jacobian
3.2.4 Contact Model
3.3 Dynamics
3.3.1 Rigid Body Dynamics
3.3.2 Grasp Dynamics
3.3.3 Contact Dynamics
3.4 Grasp Subspaces
3.5 Types of Grasps
References
4 Grasp State Estimation
4.1 Introduction
4.1.1 Concept
4.1.2 Problem Statement
4.2 Probabilistic Grasp State Estimation
4.2.1 Fundamentals
4.2.2 Particle Filter
4.2.3 Extended Kalman Filter
4.2.4 Filter Selection
4.3 Contact Detection and Localization
4.3.1 Collision Detection
4.3.2 Joint Torque Measurements
4.3.3 Contact Point Localization
4.4 State Estimation from Finger Position Measurements
4.4.1 Grasp State Definition
4.4.2 Motion Model
4.4.3 Measurement Model
4.4.4 Extensions
4.5 Data Fusion with Fiducial Markers
4.5.1 AprilTag
4.5.2 Measurement Model
4.5.3 Camera Localization
4.5.4 Target Tracking
4.6 Data Fusion with Contour Features
4.6.1 Feature Extraction
4.6.2 Measurement Model
4.7 Data Fusion with Visual Object Tracking
4.7.1 Multi-Modality Visual Object Tracking
4.7.2 Measurement Model
4.8 Data Fusion Under Measurement Delays
4.9 Experimental Validation
4.9.1 Grasp Acquisition
4.9.2 Pick-and-Place
4.9.3 In-Hand Manipulation
4.10 Summary
References
5 Impedance-Based Object Control
5.1 Introduction
5.1.1 Concept
5.1.2 Problem Statement
5.2 Controller Design
5.2.1 Object Impedance
5.2.2 Force Distribution
5.2.3 Architecture Overview
5.3 Object Impedance Control
5.3.1 Object Positioning
5.3.2 Maintaining the Grasp Configuration
5.4 Internal Forces
5.4.1 Force Distribution
5.4.2 Quadratic Optimization
5.4.3 Extensions
5.5 Torque Mapping
5.5.1 Force Mapping
5.5.2 Nullspace Control
5.6 Grasp Reconfiguration
5.6.1 Adding and Removing Contacts
5.6.2 Grasp Acquisition
5.7 Enabling In-Hand Manipulation
5.7.1 Finger Gaiting Interface
5.7.2 Contact Point Relocation
5.8 Experimental Validation
5.8.1 Tracking Performance
5.8.2 Stabilizing the Grasp Acquisition
5.8.3 Finger Gaiting
5.9 Summary
References
6 Conclusion
6.1 Summary and Discussion
6.2 Outlook
References

Springer Tracts in Advanced Robotics 149

Martin Pfanne

In-Hand Object Localization and Control: Enabling Dexterous Manipulation with Robotic Hands

Springer Tracts in Advanced Robotics Volume 149

Series Editors Bruno Siciliano, Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione, Università degli Studi di Napoli Federico II, Napoli, Italy Oussama Khatib, Artificial Intelligence Laboratory, Department of Computer Science, Stanford University, Stanford, CA, USA Advisory Editors Nancy Amato, Computer Science & Engineering, Texas A&M University, College Station, TX, USA Oliver Brock, Fakultät IV, TU Berlin, Berlin, Germany Herman Bruyninckx, KU Leuven, Heverlee, Belgium Wolfram Burgard, Institute of Computer Science, University of Freiburg, Freiburg, Baden-Württemberg, Germany Raja Chatila, ISIR, Paris cedex 05, France Francois Chaumette, IRISA/INRIA, Rennes, Ardennes, France Wan Kyun Chung, Robotics Laboratory, Mechanical Engineering, POSTECH, Pohang, Korea (Republic of) Peter Corke, Queensland University of Technology, Brisbane, QLD, Australia Paolo Dario, LEM, Scuola Superiore Sant’Anna, Pisa, Italy Alessandro De Luca, DIAGAR, Sapienza Università di Roma, Roma, Italy Rüdiger Dillmann, Humanoids and Intelligence Systems Lab, KIT - Karlsruher Institut für Technologie, Karlsruhe, Germany Ken Goldberg, University of California, Berkeley, CA, USA John Hollerbach, School of Computing, University of Utah, Salt Lake, UT, USA Lydia E. Kavraki, Department of Computer Science, Rice University, Houston, TX, USA Vijay Kumar, School of Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, PA, USA Bradley J. Nelson, Institute of Robotics and Intelligent Systems, ETH Zurich, Zürich, Switzerland

Frank Chongwoo Park, Mechanical Engineering Department, Seoul National University, Seoul, Korea (Republic of) S. E. Salcudean, The University of British Columbia, Vancouver, BC, Canada Roland Siegwart, LEE J205, ETH Zürich, Institute of Robotics & Autonomous Systems Lab, Zürich, Switzerland Gaurav S. Sukhatme, Department of Computer Science, University of Southern California, Los Angeles, CA, USA

The Springer Tracts in Advanced Robotics (STAR) publish new developments and advances in the fields of robotics research, rapidly and informally but with a high quality. The intent is to cover all the technical contents, applications, and multidisciplinary aspects of robotics, embedded in the fields of Mechanical Engineering, Computer Science, Electrical Engineering, Mechatronics, Control, and Life Sciences, as well as the methodologies behind them. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops, as well as selected PhD theses. Special offer: For all clients with a print standing order we offer free access to the electronic volumes of the Series published in the current year. Indexed by SCOPUS, DBLP, EI Compendex, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

Martin Pfanne

In-Hand Object Localization and Control: Enabling Dexterous Manipulation with Robotic Hands

Martin Pfanne Institut für Robotik und Mechatronik Deutsches Zentrum für Luft- und Raumfahrt Oberpfaffenhofen-Weßling, Germany

ISSN 1610-7438 ISSN 1610-742X (electronic) Springer Tracts in Advanced Robotics ISBN 978-3-031-06966-6 ISBN 978-3-031-06967-3 (eBook) https://doi.org/10.1007/978-3-031-06967-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Series Editor’s Foreword

At the dawn of the century’s third decade, robotics is reaching an elevated level of maturity and continues to benefit from the advances and innovations in its enabling technologies. All of these are contributing to an unprecedented effort to bring robots to the human environment in hospitals and homes, factories, and schools; in the field, robots are fighting fires, making goods and products, picking fruits, watering the farmland, and saving time and lives. Robots today hold the promise of making a considerable impact in a wide range of real-world applications, from industrial manufacturing to health care, transportation, and exploration of the deep space and sea. Tomorrow, robots will become pervasive and touch upon many aspects of modern life.

The Springer Tracts in Advanced Robotics (STAR) is devoted to bringing to the research community the latest advances in the robotics field on the basis of their significance and quality. Through wide and timely dissemination of critical research developments in robotics, our objective with this series is to promote more exchanges and collaborations among the researchers in the community and contribute to further advancements in this rapidly growing field.

The monograph by Martin Pfanne is based on the author’s doctoral thesis. The contents are focused on dexterous manipulation and control of objects with robotic hands and are organized into six chapters. The challenge of localization and in-hand manipulation of various objects is tackled by resorting to a novel grasp state estimation method that integrates information from tactile sensing, proprioception, and vision into a common formulation. Rich with examples developed by means of extensive experimentation on the DLR humanoid robot David in a range of grasping scenarios, this volume is a very fine addition to the STAR series!

Naples, Italy
April 2022

Bruno Siciliano STAR Editor


Preface

The findings that are presented in this manuscript are the result of my research at the Institute of Robotics and Mechatronics at the German Aerospace Center (DLR). I conducted this work as a graduate student between 2015 and 2021 in cooperation with the Center for Cognitive Interaction Technology (CITEC) at Bielefeld University. As two of the leading figures at these institutions and the principal supervisors of my doctoral studies, I would like to express my deepest gratitude to Prof. Alin Albu-Schäffer and Prof. Helge Ritter for providing me this opportunity and for their continued support.

I would also like to thank Dr. Freek Stulp for his guidance during the early years of my research. Thank you to Jens Reinecke for initially opening the doors to DLR for me as a diploma student in 2012 and for his friendship in the many years since then. My deep gratitude goes to Dr. Maxime Chalon for his invaluable support of my research and the many fruitful discussions on all aspects of dexterous manipulation. Many thanks to Manuel Stoiber, my former master student, whose work greatly contributed to the algorithmic framework that is presented in this book.

The DLR humanoid robot David was the primary research platform for the work during my studies. Therefore, I would like to thank the entire David team for enabling me to conduct this research in the first place. Thank you to Dr. Bastian Deutschmann, Dr. Maxime Chalon, Dr. Daniel Leidner and Dr. Freek Stulp for their helpful advice and comments during the writing of my dissertation, on which this book is based. Moreover, I am very thankful to Prof. Antonio Bicchi for agreeing to be part of the evaluation committee of my doctoral examination.

Finally, I would like to very much thank my family. In particular, I am eternally grateful to my wife, Carolina, without whom this work would not have been possible. And a very special thank you goes to my daughter, Eva Marie, for being my inspiration during the final years of the creation of this book.

Landsberg am Lech, Germany
July 2022

Martin Pfanne


Contents

1 Introduction ... 1
1.1 Dexterous Manipulation ... 4
1.2 Contribution ... 9
1.3 Organization of this Work ... 12
References ... 14

2 Related Work ... 15
2.1 Dexterous Robotic Hands ... 15
2.1.1 A Brief History of Robotic Hands ... 15
2.1.2 DLR David ... 18
2.2 Dexterous Manipulation ... 20
2.2.1 Overview ... 20
2.2.2 Grasp State Estimation ... 22
2.2.3 Impedance-Based Object Control ... 25
2.2.4 Learning-Based Methods ... 26
References ... 27

3 Grasp Modeling ... 33
3.1 Definitions ... 33
3.2 Kinematics ... 37
3.2.1 Forward Kinematics ... 37
3.2.2 Grasp Matrix ... 39
3.2.3 Hand Jacobian ... 41
3.2.4 Contact Model ... 43
3.3 Dynamics ... 46
3.3.1 Rigid Body Dynamics ... 46
3.3.2 Grasp Dynamics ... 48
3.3.3 Contact Dynamics ... 49
3.4 Grasp Subspaces ... 50
3.5 Types of Grasps ... 52
References ... 55

4 Grasp State Estimation ... 57
4.1 Introduction ... 57
4.1.1 Concept ... 58
4.1.2 Problem Statement ... 61
4.2 Probabilistic Grasp State Estimation ... 63
4.2.1 Fundamentals ... 63
4.2.2 Particle Filter ... 64
4.2.3 Extended Kalman Filter ... 67
4.2.4 Filter Selection ... 69
4.3 Contact Detection and Localization ... 70
4.3.1 Collision Detection ... 70
4.3.2 Joint Torque Measurements ... 71
4.3.3 Contact Point Localization ... 72
4.4 State Estimation from Finger Position Measurements ... 75
4.4.1 Grasp State Definition ... 76
4.4.2 Motion Model ... 76
4.4.3 Measurement Model ... 79
4.4.4 Extensions ... 81
4.5 Data Fusion with Fiducial Markers ... 86
4.5.1 AprilTag ... 86
4.5.2 Measurement Model ... 88
4.5.3 Camera Localization ... 89
4.5.4 Target Tracking ... 91
4.6 Data Fusion with Contour Features ... 92
4.6.1 Feature Extraction ... 93
4.6.2 Measurement Model ... 96
4.7 Data Fusion with Visual Object Tracking ... 97
4.7.1 Multi-Modality Visual Object Tracking ... 98
4.7.2 Measurement Model ... 99
4.8 Data Fusion Under Measurement Delays ... 100
4.9 Experimental Validation ... 105
4.9.1 Grasp Acquisition ... 105
4.9.2 Pick-and-Place ... 113
4.9.3 In-Hand Manipulation ... 116
4.10 Summary ... 120
References ... 122

5 Impedance-Based Object Control ... 125
5.1 Introduction ... 125
5.1.1 Concept ... 126
5.1.2 Problem Statement ... 128
5.2 Controller Design ... 131
5.2.1 Object Impedance ... 131
5.2.2 Force Distribution ... 134
5.2.3 Architecture Overview ... 135
5.3 Object Impedance Control ... 136
5.3.1 Object Positioning ... 136
5.3.2 Maintaining the Grasp Configuration ... 138
5.4 Internal Forces ... 139
5.4.1 Force Distribution ... 140
5.4.2 Quadratic Optimization ... 144
5.4.3 Extensions ... 145
5.5 Torque Mapping ... 147
5.5.1 Force Mapping ... 147
5.5.2 Nullspace Control ... 148
5.6 Grasp Reconfiguration ... 149
5.6.1 Adding and Removing Contacts ... 150
5.6.2 Grasp Acquisition ... 151
5.7 Enabling In-Hand Manipulation ... 153
5.7.1 Finger Gaiting Interface ... 153
5.7.2 Contact Point Relocation ... 155
5.8 Experimental Validation ... 157
5.8.1 Tracking Performance ... 158
5.8.2 Stabilizing the Grasp Acquisition ... 160
5.8.3 Finger Gaiting ... 167
5.9 Summary ... 169
References ... 170

6 Conclusion ... 173
6.1 Summary and Discussion ... 173
6.2 Outlook ... 177
References ... 180

List of Symbols and Abbreviations

In this thesis, a range of symbols is used to represent the relevant quantities for the model-based in-hand localization and object control. Unless stated otherwise, they are used consistently throughout this work. The meaning of a symbol is further defined by the chosen nomenclature. Plain letters (e.g. n, f_⊥, μ) denote scalar quantities, while bold symbols (e.g. x_des, f_c, G) represent vectors or matrices. A left-side superscript (e.g. ᵒx_p and ᶜT_{o,t}) specifies the coordinate frame in which the quantity is expressed; the lack of such a superscript implies that the symbol is described w.r.t. the inertial frame. The right-side subscript t (e.g. q_t, H_{p,t}) denotes that a quantity is time-dependent. It is sometimes omitted if the time dependence of the quantity is apparent or not relevant in the given context. Finally, a dot (e.g. ẋ, ċ_null) represents the derivative w.r.t. time. In the following, the most important quantities and their meanings are listed. Subsequently, a summary of the abbreviations used in this work is presented.
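A worked example of reading this notation (the quantity shown is chosen purely for illustration and is not an equation quoted from the book):

```latex
% Reading the conventions above (illustrative example only):
%   {}^{c}T_{o,t} : pose of the object frame {O}, expressed in the camera frame {C}, at time t
%   \dot{x}_t     : time derivative of the object pose x at time t
\[
  {}^{c}\boldsymbol{T}_{o,t} \in \mathbb{R}^{4\times4},
  \qquad
  \dot{\boldsymbol{x}}_t = \frac{\mathrm{d}}{\mathrm{d}t}\,\boldsymbol{x}_t .
\]
```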

List of Symbols

0 ∈ R^3 – Origin point
0_{a×b} ∈ R^{a×b} – Matrix of zeros of the indicated size
1_{a×b} ∈ R^{a×b} – Matrix of ones of the indicated size
a^[i] ∈ R^3 – 1st tangential vector of a surface at contact i
a_a^[i] ∈ R – Activation function for the addition of contact i
a_r^[i] ∈ R – Activation function for the removal of contact i
A ∈ R^{3n×3n} – Weighting matrix of the optimization
b^[i] ∈ R^3 – 2nd tangential vector of a surface at contact i
b_f ∈ R^m – Vector of the velocity product terms of the fingers
b_l ∈ R^{5n} – Lower bound vector of the optimization
b_{l,friction} ∈ R^{4n} – Lower bound for the friction constraint of the optimization
b_{l,range} ∈ R^n – Lower bound for the range constraint of the optimization
b_o ∈ R^6 – Vector of the velocity product terms of the object
b_u ∈ R^{5n} – Upper bound vector of the optimization
b_{u,friction} ∈ R^{4n} – Upper bound for the friction constraint of the optimization
b_{u,range} ∈ R^n – Upper bound for the range constraint of the optimization
B ∈ R^{5n×3n} – Mapping matrix of the constraints of the optimization
B_friction ∈ R^{4n×3n} – Mapping of the friction constraint of the optimization
B_range ∈ R^{n×3n} – Mapping of the range constraint of the optimization
bel – Belief of the state
bel – Predictive belief of the state
c ∈ R^{3n} – Vector of all n contact positions
c^[i] ∈ R^3 – Position of contact i
c_des ∈ R^{3n} – Desired contact positions
c_des^[i] ∈ R^3 – Desired position of contact i
c_{des,s}^[i] ∈ R^3 – Desired position of contact i in step s
c_f^[i] ∈ R^3 – Position of contact i on the surface of the finger
c_init ∈ R^{3n} – Initial contact positions
c_init^[i] ∈ R^3 – Initial position of contact i
c_l^[i] ∈ R^3 – Lifted position of contact i
c_null ∈ R^{3n} – Nullspace contact positions
c_o^[i] ∈ R^3 – Position of contact i on the surface of the object
c_u ∈ R – 1st image coordinate of the optical center
c_v ∈ R – 2nd image coordinate of the optical center
C ∈ R^{3×4} – Camera matrix
d^[i] ∈ R – Smallest distance or penetration depth of two bodies
d_a ∈ R – Size of an AprilTag
d_depth ∈ R^L – Vector of all L predicted feature depths
d_depth^[l] ∈ R – Predicted depth of a feature of index l
d_l ∈ R – Finger lift height during contact relocation
D_c ∈ R – Damping of the contact impedance
D_null ∈ R – Damping of the nullspace impedance
D_x ∈ R – Damping of the object impedance
e ∈ R – Finger index
e ∈ R^{3n} – Parameter vector of the optimization
f ∈ R^{6+m} – Motion model
f_a^[i] ∈ R – Force component of contact i in the direction of a^[i]
f_b^[i] ∈ R – Force component of contact i in the direction of b^[i]
f_c ∈ R^{3n} – Vector of all n contact forces
f_c^[i] ∈ R^3 – Force transmitted through contact i
f_{c,null} ∈ R^{3n} – Nullspace contact forces
f_d ∈ R^{3n} – User-specified contact forces
f_d^[i] ∈ R – User-specified normal force for contact i
f_d^[i] ∈ R^3 – User-specified force for contact i
f_des ∈ R^{3n} – Desired contact forces
f_des^[i] ∈ R^3 – Desired force of contact i
f_i ∈ R – Time step of the filter update from a visual measurement
f_int ∈ R^{3n} – Internal contact forces
f_int^[i] ∈ R^3 – Internal force of contact i
f_{int,a}^[i] ∈ R – Component of the internal force in the direction of a^[i]
f_{int,b}^[i] ∈ R – Component of the internal force in the direction of b^[i]
f_{int,∥}^[i] ∈ R – Tangential component of the internal contact force
f_{int,⊥}^[i] ∈ R – Normal component of the internal contact force
f_min ∈ R – Minimal desired contact force
f_max ∈ R – Maximal desired contact force
f_n^[i] ∈ R – Normal force component of contact i
f_u ∈ R – Focal length of the 1st image coordinate
f_v ∈ R – Focal length of the 2nd image coordinate
f_x ∈ R^{3n} – Contact forces of the object impedance
f_⊥ ∈ R – Force component perpendicular to the surface
f_∥ ∈ R – Force component tangential to the surface
f* ∈ R^{3n} – Argument of the internal force optimization
F ∈ R^{(6+m)×(6+m)} – Motion derivative matrix
g ∈ R – Gravitational acceleration
G ∈ R^{6×3n} – Grasp matrix
G^[i] ∈ R^{6×3} – Partial grasp matrix for contact i
G ∈ R^{6×3n̂} – Reduced grasp matrix
G_palm ∈ R^{6×3n} – Palm grasp matrix
h ∈ R – Number of the selected components of the contact twist
h ∈ R^{3n} – Measurement model
h_p ∈ R^{2L} – Feature model
h_palm ∈ R^{2L} – Palm feature model
h_q ∈ R^m – Joint stabilization model
h_tr ∈ R^6 – Tracker model
H ∈ R^{3n×(6+m)} – Measurement derivative matrix
H_c ∈ R^{h×6} – Selection matrix for the twist components of the contacts
H_c^[i] ∈ R^{h×6} – Selection matrix for the twist components of contact i
H_p ∈ R^{2L×(6+m)} – Feature derivative matrix
H_palm ∈ R^{2L×(12+m)} – Palm feature derivative matrix
H_q ∈ R^{m×(6+m)} – Joint stabilization derivative matrix
H_tr ∈ R^{6×(6+m)} – Tracker derivative matrix
i ∈ R – Contact index
I_{a×a} ∈ R^{a×a} – Identity matrix of the indicated size
I_o ∈ R^{3×3} – Matrix of the moments of inertia of the object
j ∈ R – Joint index
J ∈ R^{3n×m} – Hand Jacobian matrix
J^[i] ∈ R^{3×m} – Partial hand Jacobian matrix for contact i
J ∈ R^{3n̂×m} – Reduced hand Jacobian matrix
k ∈ R – Link index
k_q ∈ R – Virtual stiffness of the joint stabilization
K ∈ R^{(6+m)×3n} – Kalman gain
K_c ∈ R – Stiffness of the contact impedance
K_null ∈ R – Stiffness of the nullspace impedance
K_x ∈ R – Stiffness of the object impedance
l ∈ R – Feature index
L ∈ R – Number of features
m ∈ R – Number of joints
m_c^[i] ∈ R^3 – Moment transmitted through contact i
m_i ∈ R – Time step of the capturing of a visual measurement
m_o ∈ R – Mass of the object
m_p ∈ R – Particle index
M ∈ R – Number of links of the hand
M_f ∈ R^{m×m} – Inertia matrix of the fingers
M_o ∈ R^{6×6} – Inertia matrix of the object
M_p ∈ R – Number of particles
n ∈ R – Number of contacts
n ∈ R^{3n} – Vector of all n contact normal directions
n^[i] ∈ R^3 – Normal direction of a surface at contact i
n̂ ∈ R – Reduced number of contacts
n_0^[i] ∈ R^3 – Normal direction of contact i at time 0
n_des^[i] ∈ R^3 – Normal direction of contact i at the desired position
N ∈ R – Sample size
N(A) – Nullspace of a matrix A
p^[l] ∈ R^2 – Image coordinates of a feature of index l
p̄^[l] ∈ R^2 – Predicted image coordinates of a feature of index l
p_palm^[l] ∈ R^2 – Image coordinates of a palm feature of index l
p̄_palm^[l] ∈ R^2 – Predicted image coordinates of a palm feature of index l
p_target^[l] ∈ R^2 – Image coordinates of a target feature of index l
p_u^[l] ∈ R – 1st image coordinate of a feature of index l
p_v^[l] ∈ R – 2nd image coordinate of a feature of index l
P ∈ R^{(6+m)×(6+m)} – Covariance of the grasp state
P̄ ∈ R^{(6+m)×(6+m)} – Predictive covariance of the grasp state
P_0 ∈ R^{(6+m)×(6+m)} – Initial covariance of the grasp state
q ∈ R^m – Vector of all m joint positions
q^[j] ∈ R – Position of joint j
q̃ ∈ R^m – Estimated joint position biases
q̄ ∈ R^m – Corrected joint positions
q̄^[j] ∈ R – Corrected position of joint j
q̃_0 ∈ R^m – Initial estimate of the joint position biases
q_des ∈ R^m – Desired joint positions
q_e ∈ R^4 – Joint positions of finger e
q_e^[j] ∈ R – Position of joint j of finger e
q_null ∈ R^m – Nullspace joint positions
Q ∈ R^{3n×3n} – Covariance of the measurement disturbance
Q_p ∈ R^{2L×2L} – Covariance of the feature disturbance
Q_palm ∈ R^{2L×2L} – Covariance of the palm feature disturbance
Q_q ∈ R^{m×m} – Covariance of the joint stabilization disturbance
Q_tr ∈ R^{6×6} – Covariance of the tracker disturbance
R – Set of real numbers
R ∈ R^{(6+m)×(6+m)} – Covariance of the motion disturbance
R^[i] ∈ R^{3×3} – Rotation matrix of the contact frame {C^[i]}
s ∈ R – Step index of a finger gaiting sequence
S ∈ R^{3×3} – Cross-product matrix
t ∈ R – Time or time step
t_a ∈ R – Time of the addition of a contact
t_l ∈ R – Time of the relocation of a finger on the object
t_o ∈ R – Time of the transition to the object controller
t_r ∈ R – Time of the removal of a contact
t_s ∈ R – Time of the transition to the desired object pose
T_a ∈ R^{4×4} – Transformation matrix of the AprilTag-fixed frame {A}
T_c ∈ R^{4×4} – Transformation matrix of the camera-fixed frame {C}
T_f^[k] ∈ R^{4×4} – Transformation matrix of the link-fixed frame {F^[k]}
T_o ∈ R^{4×4} – Transformation matrix of the object-fixed frame {O}
T_p ∈ R^{4×4} – Transformation matrix of the palm-fixed frame {P}
T_s^[k] ∈ R^{4×4} – Transformation matrix of the joint-fixed frame {S^[k]}
u ∈ R – 1st image coordinate
u ∈ R^m – Control vector
v ∈ R – 2nd image coordinate
v ∈ R^3 – Translational velocity of the object
w ∈ R^6 – Object wrench
w^[m_p] ∈ R – Weight of a particle of index m_p
w_c ∈ R^6 – Object wrench applied through the contacts
w_ext ∈ R^6 – Object wrench from external loads
w_g ∈ R^6 – Object wrench generated by gravity
w_des ∈ R^6 – Desired object wrench
w_dyn ∈ R^6 – Object wrench from dynamical loads
W ∈ R^{6×6} – Mapping matrix of the object twist
W_palm ∈ R^{6×6} – Mapping matrix of the palm twist
x ∈ R – 1st coordinate of the position of the object
x ∈ R^6 – Estimated object pose
x_0 ∈ R^6 – Initial estimate of the object pose
x_c ∈ R^6 – Estimated camera pose
x_{c,0} ∈ R^6 – Initial estimate of the camera pose
x_des ∈ R^6 – Desired object pose
x_{des,s} ∈ R^6 – Desired object pose in step s
x_p ∈ R^{3L} – Vector of all L feature positions
x_p^[l] ∈ R^3 – Position of a feature of index l
x_palm ∈ R^6 – Pose of the palm
x_target ∈ R^6 – Pose of the target
x_tr ∈ R^6 – Tracker pose
X_tr ∈ R^{6×6} – Covariance of the tracker pose
y ∈ R – 2nd coordinate of the position of the object
y ∈ R^{6+m} – Estimated grasp state
ȳ ∈ R^{6+m} – Predicted grasp state
y ∈ R^{6+m} – Actual state
y^[m_p] ∈ R^6 – Discrete hypothesis of the state of index m_p
ȳ^[m_p] ∈ R^6 – Predicted state of a particle of index m_p
y_0 ∈ R^{6+m} – Initial estimate of the grasp state
Y_0 – Initial set of M_p particles
Y – Set of all M_p particles
z ∈ R – 3rd coordinate of the position of the object
z ∈ R^{3n} – Measurement vector
z̄ ∈ R^{3n} – Predicted measurement vector
z_des ∈ R – Desired object position in z
z_p ∈ R^{2L} – Feature measurement vector
z_palm ∈ R^{2L} – Palm feature measurement vector
z_q ∈ R^m – Joint stabilization measurement vector
z_tr ∈ R^6 – Tracker measurement vector
δ ∈ R^{3n} – Measurement disturbance
ε ∈ R^{6+m} – Motion disturbance
θ ∈ R – 2nd Euler angle of the orientation of the object
λ_c^[i] ∈ R^h – Vector of the transmitted loads of contact i
μ ∈ R – Friction coefficient
ν ∈ R^6 – Twist of the object
ν_{c,f} ∈ R^{6n} – Vector of all contact twists on the surface of the fingers
ν_{c,f}^[i] ∈ R^6 – Twist of contact i on the surface of the finger
ν_{c,o} ∈ R^{6n} – Vector of all contact twists on the surface of the object
ν_{c,o}^[i] ∈ R^6 – Twist of contact i on the surface of the object
τ ∈ R^m – Vector of all m joint torques
τ^[j] ∈ R – Torque of joint j
τ_act ∈ R^m – Joint torques generated by the actuators
τ_c ∈ R^m – Joint torques applied through the contacts
τ_c^[j] ∈ R – Torque of joint j applied through a contact
τ_{c,null} ∈ R^m – Nullspace contact torques
τ_cmd ∈ R^m – Commanded joint torques
τ_cmd^[j] ∈ R – Commanded torque of joint j
τ_e ∈ R^4 – Joint torques of finger e
τ_{e,cmd} ∈ R^4 – Commanded joint torques of finger e
τ_{e,null} ∈ R^4 – Nullspace torques of finger e
τ_ext ∈ R^m – Joint torques from external loads
τ_f^[j] ∈ R – Estimated friction torque of joint j
τ_g ∈ R^m – Joint torques generated by gravity
τ_null ∈ R^m – Nullspace torques
τ_o^[j] ∈ R – Commanded joint torque from the object controller
τ_q^[j] ∈ R – Commanded joint torque from the joint controller
φ ∈ R – 1st Euler angle of the orientation of the object
ψ ∈ R – 3rd Euler angle of the orientation of the object
ψ_des ∈ R – Desired object orientation around ψ
ω ∈ R^3 – Angular velocity of the object
{A} – Coordinate frame fixed to an AprilTag
{C} – Coordinate frame fixed to the camera
{C^[i]} – Coordinate frame fixed to contact i
{F^[k]} – Coordinate frame fixed to a link k of the hand
{I} – Inertial coordinate frame
{O} – Coordinate frame fixed to the object
{P} – Coordinate frame fixed to the palm of the hand
{S^[k]} – Coordinate frame fixed to the joint on a link k

Abbreviations

2D – Two-dimensional
3D – Three-dimensional
ca. – Circa
CSO – Configuration space obstacle
DLR – Deutsches Zentrum für Luft- und Raumfahrt (German Aerospace Center)
DoF – Degree(s) of freedom
e.g. – Exempli gratia (for example)
EDAN – EMG-controlled daily assistant
EKF – Extended Kalman filter
EMG – Electromyography
Eq. – Equation
Eqs. – Equations
Fig. – Figure
Figs. – Figures
GJK – Gilbert–Johnson–Keerthi (distance algorithm)
HIT – Harbin Institute of Technology
i.e. – Id est (that is)
ICP – Iterative closest point
IIT – Istituto Italiano di Tecnologia (Italian Institute of Technology)
IPC – Intrinsically passive controller
LED – Light-emitting diode
PDF – Probability density function
PWP3D – Pixel-wise posterior 3D
QP – Quadratic programming
RANSAC – Random sample consensus
RGB – Red, green, blue
RGB-D – Red, green, blue and depth
Sec. – Section
Secs. – Sections
SIPC – Static intrinsically passive controller
SLAM – Simultaneous localization and mapping
w.r.t. – With respect to

List of Figures

Fig. 1.1 At the age of 18 months, my daughter is developing her fine motor skills. Solving this stacking game requires hand-eye coordination and the ability to move a grasped object inside of her hand with precision (p. 2)
Fig. 1.2 Accomplishing the stacking game with a humanoid robot challenges its dexterous manipulation abilities. The robot has to be able to accurately position the grasped pieces w.r.t. the wooden board, requiring the in-hand control of the object (p. 3)
Fig. 1.3 Illustration of the main components of a dexterous manipulation framework. Observations from the environment are perceived through sensors of the robot and processed by the perception component. The control component generates commands, which are executed by the actuators of the robot. Both components rely on an internal model, which describes the behavior of the hand-object system (p. 5)
Fig. 1.4 The knowledge of the scene is internally represented by a model, which describes the kinematic and dynamic relations of the grasp system (p. 5)
Fig. 1.5 For the stacking game, the primary purpose of the perception system is to determine the in-hand pose of a grasped object, as well as the board, utilizing different sensing modalities (p. 7)
Fig. 1.6 The control component is tasked with positioning the object inside of the hand, such that the arm can reach the desired target location above the board (p. 8)
Fig. 1.7 Structure of this book. Chapters 3 to 5 correspond to the main components of the proposed dexterous manipulation framework (p. 13)

Fig. 2.1 Overview of the most relevant references in the context of this work (p. 16)
Fig. 2.2 A selection of robotic hands, which illustrate the progress and research trends of the past 40 years. a The Salisbury hand is the first robotic manipulator that was specifically designed for dexterous manipulation [1]. b Miniaturization allowed to integrate the joint drives of the DLR Hand II inside of the manipulator [2]. c The anthropomorphic design of the Shadow Dexterous Hand approximates the kinematics of the human hand [3]. d Ongoing algorithmic challenges motivated the creation of simplified and soft manipulators, such as the Pisa/IIT SoftHand [4]. (p. 17)
Fig. 2.3 The DLR humanoid robot David has been the primary research platform for the developed algorithms. Because of its anthropomorphic design and mechanical compliance, David is close to a human in size and performance. a David consists of two arms and an actuated neck, including an anthropomorphic right hand and a left hand gripper [13]. b The fully-actuated, tendon-driven hand of David enables human-like dexterous manipulation [13] (p. 18)
Fig. 2.4 Illustration of the coupling between two joints of the hand and the corresponding four tendons [18] (p. 19)
Fig. 2.5 Block diagram of the pre-existing joint controller, which provides a torque interface and allows the compliant positioning of the fingers [15] (p. 20)
Fig. 2.6 Illustrations of a small sample of past work on a range of dexterous manipulation topics, including perception, control and learning. a Reinforcement learning was used to train this robotic hand to reorient a cube [37]. b Contact measurements allowed to estimate the pose of the grasped object [38]. c Depth measurements and physical constraints were fused to track both the object and the hand [39]. d An object-level impedance controller realized the compliant positioning of the grasped disc [40] (p. 24)
Fig. 3.1 The David hand, holding an object with its five fingers in a fingertip grasp (p. 34)
Fig. 3.2 The grasp modeling is concerned with describing the behavior of the fingers and the object, and how they relate to each other through the contact points. Here, the main quantities and relevant coordinate frames are illustrated (p. 34)
Fig. 3.3 The orientation of the object is defined by the Euler angles φ, θ and ψ, which describe three elemental rotations that transform the initial frame {I} (red) to the body frame {O} (blue) in the x-y'-z'' sequence (p. 35)
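As a point of reference for the caption of Fig. 3.3: for an intrinsic x-y'-z'' sequence, the rotation from {I} to {O} is usually composed as below (standard convention, not a formula quoted from the book).

```latex
\[
  \boldsymbol{R}_{IO}(\varphi,\theta,\psi)
  = \boldsymbol{R}_x(\varphi)\,\boldsymbol{R}_y(\theta)\,\boldsymbol{R}_z(\psi),
  \qquad
  \boldsymbol{R}_x(\varphi) =
  \begin{bmatrix}
    1 & 0 & 0\\
    0 & \cos\varphi & -\sin\varphi\\
    0 & \sin\varphi & \cos\varphi
  \end{bmatrix}.
\]
```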

Fig. 3.4 The same contact can be described by either a point on the surface of the object (c_o^[1]) or on the corresponding finger link (c_f^[1]). Expressing the latter involves calculating the forward kinematics of the finger, which are obtained through a series of transformations (p. 37)
Fig. 3.5 The dynamics of the grasp relate the contact forces, f_c, to the corresponding joint torques, τ_c, and object wrench, w_c (p. 46)
Fig. 3.6 The friction constraint of a contact c can be described by a cone around the contact normal n, with its opening angle being defined by the friction coefficient μ. Whether a force f lies inside of the cone is determined by the ratio of its normal (f_⊥) and tangential (f_∥) components. a If the force is inside of the cone, static friction is maintained. b If the force lies outside of the cone, the surfaces will start to slide relative to each other (p. 50)
Fig. 3.7 The grasp matrix G and hand Jacobian matrix J relate the kinematic and dynamic quantities of the grasp to one another. Through the pseudo-inverse relation (+), the least-square solution is obtained. a Kinematics: G maps the object twist to the contact velocities; J maps the joint velocities to the contact velocities. b Dynamics: G maps the contact forces to the object wrench; J maps the contact forces to the joint torques (p. 51)
Fig. 3.8 If the number of contacts is at least three (n ≥ 3) and the DoF of the fingers are greater than those of the affected contact points (m > 3n), there exist the illustrated nullspaces in the mappings between the kinematic and dynamic quantities of the grasp. a Kinematics: The nullspace of G^T allows to realize an additional joint velocity, which does not influence the object twist; similarly, moving the joints in the nullspace of J^+ is not affecting the contact velocities. b Dynamics: Applying contact forces in the nullspace of G^+ is not influencing the object wrench; an added torque, which is projected through the nullspace of J^T, has no effect on the contact forces (p. 53)
Fig. 3.9 The type of grasp affects some of the aspects of the grasp model. In the context of this work, the distinction between precision and power grasps is most relevant. a Example of a precision grasp, which involves three fingertip contacts between the object and the hand. b In this power grasp, the object is enveloped by the hand, resulting in a large number of contacts (p. 54)
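For reference, the mappings sketched in the captions of Figs. 3.6–3.8 are commonly written as follows; the placement of the transposes reflects the usual grasp-analysis convention and may differ in detail from the definitions used in the book.

```latex
\[
  \boldsymbol{w}_c = \boldsymbol{G}\,\boldsymbol{f}_c, \qquad
  \boldsymbol{\tau}_c = \boldsymbol{J}^{T}\boldsymbol{f}_c, \qquad
  \boldsymbol{\nu}_{c,o} = \boldsymbol{G}^{T}\boldsymbol{\nu}, \qquad
  \boldsymbol{\nu}_{c,f} = \boldsymbol{J}\,\dot{\boldsymbol{q}},
\]
\[
  \text{static friction at contact } i:\qquad
  \bigl\lVert \boldsymbol{f}^{[i]}_{\parallel} \bigr\rVert \;\le\; \mu\, f^{[i]}_{\perp}.
\]
```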

Fig. 4.1 Objects may move during the grasp acquisition in ways that were not planned or foreseen. In this example, the ketchup bottle tilts, when being picked up. a Hand and object before the grasp. b The power grasp moves the bottle inside of the hand (p. 58)
Fig. 4.2 The measurement of the joint positions allows to identify inconsistencies between the fingers and the estimated object pose. By correcting the pose of the object and/or fingers, collisions between the two can be resolved. Furthermore, the localization of contact points allows to predict object displacements from the motion of the fingers. a Illustration of the estimated (solid, orange) and the actual (dashed) pose of the object, as well as inconsistencies in the assumed grasp state (red). b Resolving the inconsistencies improves the pose estimation. Identified contact points are marked in red. c Prediction of the object displacement from the motion of the fingers, using the identified contacts (p. 59)
Fig. 4.3 The detection or inference of contacts between the object and the fingers allows to further constrain the pose, thereby improving the estimation. a Sensor measurements identified a contact on the finger link that is marked in red. b The pose of the object is corrected in order to align it with the finger (p. 60)
Fig. 4.4 Even sparse visual information allows to further improve the estimation quality. Individual features can be used to align the camera view of an object and the estimation, similar to the contact points. a A characteristic object feature is extracted from the camera view of the grasp. b The position of the same feature is identified at the estimated object pose. c By correcting the difference between the two positions, the object estimation is aligned with the real pose (p. 61)
Fig. 4.5 Illustration of the inputs that are used in the proposed method. The estimation from finger measurements relies on the availability of the hand and object geometries, as well as an initial object pose, and incorporates joint position measurements. Additionally, joint torque measurements or tactile sensors allow to detect contacts. Finally, a camera, which is mounted to the head of the robot, provides views of the scene. a 3D visualization of the available information at the beginning of the estimation. b The corresponding view of a head-mounted RGB camera (p. 62)

Fig. 4.6 Illustration of the main steps of the particle filter framework, which was proposed in [2]. a Each particle in the filter represents one hypothesis of the 6 DoF pose of the object. b The first set of particles is randomly sampled around the initial assumption of the object pose, marked by the yellow square. c In the prediction step of the filter, the displacement of the object is inferred from the motion of the fingers. d The effect of the prediction is calculated for each particle in the set. e The weight of a particle represents the consistency of the object pose with the finger measurements. f Poses, which are heavily colliding with the fingers are assigned small weights, while collision-free particles are weighted highly. g The particle with the highest weight represents the current best estimate of the object pose. h In the re-sampling step, particles are redistributed according to their weights (p. 67)
Fig. 4.7 Diagram of the main components of an extended Kalman filter (p. 68)
Fig. 4.8 A contact of index i between a finger link and the object is characterized by the points c_o^[i] and c_f^[i], which lie on the surface of the two bodies. While, in the colliding case (left), c_o^[i] and c_f^[i] describe the location of the deepest penetration, in the non-colliding case (right), they represent the closest points on the two surfaces (p. 70)
Fig. 4.9 Because of joint friction, it may not be possible to infer the location of a contact from joint torque measurements alone. In the illustrated examples, measured torques, which are bigger than the estimated joint friction, τ_f^[j], are highlighted in red. a Depending on the contact force, f_c, only τ^[1] may be greater than the joint friction torque, while τ^[2] and τ^[3] are smaller. b The same may be the case in this configuration, making it impossible to reliably distinguish the two, based on torque measurements alone (p. 72)
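The steps enumerated in the caption of Fig. 4.6 map onto the generic predict/weight/resample cycle of a particle filter. A minimal sketch is given below; `predict_pose` and `collision_consistency` stand in for the finger-motion model and the collision-based weighting and are illustrative names, not the book's implementation.

```python
# Minimal sketch of the predict / weight / resample cycle illustrated in Fig. 4.6.
import numpy as np

def particle_filter_step(particles, finger_motion, predict_pose, collision_consistency, rng=None):
    """One update over a list of object-pose hypotheses (particles)."""
    rng = rng or np.random.default_rng()
    n = len(particles)

    # Prediction: displace every pose hypothesis according to the finger motion.
    predicted = [predict_pose(p, finger_motion) for p in particles]

    # Weighting: poses that collide heavily with the fingers receive small weights.
    weights = np.array([collision_consistency(p) for p in predicted], dtype=float)
    weights /= weights.sum()

    # Current best estimate: the most consistent hypothesis.
    best = predicted[int(np.argmax(weights))]

    # Resampling: redistribute particles according to their weights.
    idx = rng.choice(n, size=n, p=weights)
    return [predicted[i] for i in idx], best
```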

Fig. 4.10 Considering the current estimate of the object pose, the likeliest contact location, which is able to explain the torque measurements, is chosen. The selected finger link is highlighted in red. a Assuming this object pose, it is most likely that the contact is located on the distal phalanx. b Similarly, for this estimate of the object location, the proximal phalanx is inferred to be in contact. c If all three measured torques are greater than the joint friction, only a contact on the distal phalanx explains the measurements. d Note that this would still be the case, even if the current estimate of the object pose is closer to another phalanx (p. 73)
Fig. 4.11 A−B describes the Minkowski difference of two convex sets, A and B, such as the polygon meshes of two bodies. The location of the origin point w.r.t. this set relates to the collision state of the two geometries. a If the bodies A and B are not colliding, the origin lies outside of A−B. b If the bodies A and B are colliding, the origin lies inside of A−B (p. 74)
Fig. 4.12 In order to determine the collision state, the goal of the GJK algorithm is to grow a simplex, which contains the origin. In the non-intersecting case, this will not be possible, since the origin lies outside of the geometry. a Starting from a random vertex (v_0), the simplex is grown towards the origin point, selecting the vertex (v_1), which lies furthest in its direction. b If the simplex can no longer be grown beyond the origin, before enclosing it, it must lie outside of the geometry. c By further evolving the simplex towards the origin, eventually, the closest point on the surface (c) can be identified (p. 74)
Fig. 4.13 If the two bodies are colliding, the origin lies inside of the geometry. Therefore, it has to be possible to enclose it in a simplex of vertices. For the purpose of this work, this process was extended in order to identify the exact position of the deepest penetration. a As before, the simplex is grown in the direction of the origin, beginning from a randomly selected vertex (v_0). b In the colliding case, the origin will eventually be contained in a simplex (v_0, v_1, v_2). c By further growing the simplex towards the surface, it is possible to identify the closest element (i.e. face, edge or vertex) to the origin, as well as the exact point on this element (c) (p. 75)
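The GJK iterations of Figs. 4.11–4.13 are driven by the support function of the Minkowski difference A − B. A minimal sketch for convex vertex sets follows (illustrative only; it does not include the penetration-depth extension mentioned in Fig. 4.13).

```python
# Support point of the Minkowski difference A - B in a given direction:
# the farthest vertex of A along d minus the farthest vertex of B along -d.
# Vertex sets are given as (N, 3) numpy arrays.
import numpy as np

def support(vertices_a: np.ndarray, vertices_b: np.ndarray, direction: np.ndarray) -> np.ndarray:
    farthest_a = vertices_a[np.argmax(vertices_a @ direction)]
    farthest_b = vertices_b[np.argmax(vertices_b @ -direction)]
    return farthest_a - farthest_b  # GJK grows its simplex from such support points
```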

Fig. 4.14 When predicting the motion of the object from the displacement of a collision contact, this contact should only be considered if it moves towards the object, thereby further penetrating the object. In the opposite direction, this type of contact should be neglected. a c^[1] marks a collision contact that has been identified between the object and a finger link. b If the finger moves towards the object, predicting a corresponding object displacement prevents the two bodies from penetrating. c However, if the finger is removed from the object, no displacement is predicted, since the two bodies may not be physically in contact (p. 77)
Fig. 4.15 A sensed contact represents a point, where the object and a finger link are known to be touching. Therefore, an object displacement, which corresponds to the motion of the contact point, should be predicted in all cases, both towards and away from the object. a Here, c^[1] marks a sensed contact that has been measured between the object and a finger link. b Similar to the collision contact, moving the finger towards the object results in a predicted object motion. c However, in contrast to before, when moving the finger away, the object pose is predicted to maintain the contact, since the two bodies are known to be in contact (p. 78)
Fig. 4.16 Illustration of the two types of contacts in the context of the grasp state estimation. a The update step resolves the identified collision contact by minimizing the distance between c_o^[1] and c_f^[1]. b For sensed contacts, both the collision of contact c^[1], as well as the separation of contact c^[2] are considered in the update (p. 80)
Fig. 4.17 Moving the palm leads to significant misalignments in the estimated grasp state. However, this is avoided by including the palm motion in the prediction of the object pose. a Estimation of the grasp state before the palm motion. b Moving the palm results in large object motions, which are not considered in the pose estimation. c Including the palm motion in the motion model allows to predict the corresponding object displacement (p. 82)
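One way to read the rule illustrated in Figs. 4.14 and 4.15 is sketched below; the field names and the sign convention of the contact normal are assumptions made for the example, not the book's data structures.

```python
# Hedged sketch of the contact-selection rule of Figs. 4.14-4.15: a collision
# contact only contributes to the motion prediction while the finger pushes
# towards the object, a sensed contact contributes in both directions.
# Assumption: `contact_normal` points from the object towards the finger.
import numpy as np

def use_contact_in_prediction(kind: str, contact_normal: np.ndarray,
                              finger_velocity: np.ndarray) -> bool:
    pushing_inwards = float(np.dot(finger_velocity, -contact_normal)) > 0.0
    if kind == "sensed":
        return True              # known physical contact: always maintained
    return pushing_inwards       # collision contact: only when penetrating further
```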

Fig. 4.18 Bodies in the environment of the manipulator can be considered as additional kinematic constraints in order to inform the estimation. a Similar to the fingers, the collision between the object and an additional body can be identified and described by the corresponding points on the surfaces. b By considering this additional contact in the update step of the EKF, the object pose is corrected to resolve the collision (p. 83)
Fig. 4.19 The estimation of the joint biases does not constrain the finger positions to one specific configuration, allowing them to drift over time. This can be prevented by introducing an additional joint stabilization component in the update step, which minimizes the biases. a The joint position biases may take on different values, which increasingly distort the estimated finger configuration. b The joint stabilization constrains the biases by keeping the estimated joint configuration as close as possible to the measured one (transparent) (p. 85)
Fig. 4.20 AprilTags, which are rigidly attached to objects, allow to easily extract artificial features from camera images (p. 87)
Fig. 4.21 From a camera image, the coordinates of the corners of an AprilTag can be extracted. Correspondingly, based on the current estimate of the object pose, the predicted position of the corners can be calculated. The difference between these two sets of positions describes the misalignment of the pose estimation, which is subsequently corrected in the EKF update. a Illustration of the camera view of the AprilTag and the extracted corner positions. b Prediction of the corner positions, based on the current estimate of the object pose (p. 87)
Fig. 4.22 The precise location of the head-mounted camera of David cannot be determined kinematically because of the continuum-elastic mechanism that is used to actuate its neck. a The 6 DoF pose of the head is determined by the length of four tendons and a continuum-elastic element [11]. b The head houses an Intel RealSense D435 camera, which provides RGB and depth images [12] (p. 90)
Fig. 4.23 AprilTags, which are attached to the upper and lower side of the David hand, allow to localize the camera w.r.t. the palm. a Upper-side AprilTag attached to the David hand. b Lower-side AprilTag attached to the David hand (p. 90)
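The corner prediction of Fig. 4.21 amounts to transforming the tag corners with the current pose estimate and projecting them with a pinhole model; a hedged sketch (all names illustrative, not the book's implementation):

```python
# Project the four tag corners, given in the object frame, into the image using
# the current camera-from-object pose estimate and the camera matrix K (3x3).
import numpy as np

def predict_corners(T_cam_obj: np.ndarray, corners_obj: np.ndarray, K: np.ndarray) -> np.ndarray:
    """corners_obj: (4, 3) corner positions in the object frame; returns (4, 2) pixels."""
    homogeneous = np.hstack([corners_obj, np.ones((corners_obj.shape[0], 1))])
    in_camera = (T_cam_obj @ homogeneous.T).T[:, :3]       # corners in the camera frame
    pixels = (K @ in_camera.T).T
    return pixels[:, :2] / pixels[:, 2:3]                  # perspective division

def corner_residual(detected: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    return (detected - predicted).ravel()                  # stacked 2D reprojection errors
```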

Fig. 4.24 Illustration of the image processing steps for the extraction and matching of the contour features. a Camera image and the rendered depth image based on the current estimate. b Edge images, generated using the Canny edge detector. c Extraction of prominent features. d Gradient images, which encode the direction of the edges. e Matching of contour pieces from the rendered image with the directed edges in the camera image. f Selection of good matches (red+green) and filtering of the final set (green) using RANSAC (p. 94)
Fig. 4.25 Extraction of the corresponding depth values from the rendered image (p. 95)
Fig. 4.26 The Iterative Closest Point (ICP) algorithm minimizes the distance between two point clouds, one of which represents the object model, and the other is obtained from depth measurements [18] (p. 99)
Fig. 4.27 Illustration of the visual object tracker, which was proposed in [18]. a Camera overlay of the tracking output. b The gray-level of the image represents the background probability of a pixel, clearly distinguishing the object from its surrounding. The colored lines are the correspondence rays, which are used to align the contour of the object (p. 99)
Fig. 4.28 Measurements from vision-based modalities incur significant delays before they are incorporated in the EKF. If the object was moved in the meantime, fusing these measurements without taking the delay into account will impair the estimation. However, by calculating a correction relative to the state at the time of measurement and subsequently applying it w.r.t. the object frame, this is avoided. a The red object represents a pose, which was estimated based on visual input. b Because of delays, the measurement is only available after the in-hand motion of the object occurred, subsequently distorting the estimation. c By calculating the correction w.r.t. the initial state and applying it in object coordinates, the pose is correctly updated (p. 101)
Fig. 4.29 Illustration of the sequence of measurements and EKF updates. Measurements from the joints (green) are immediately available and can be incorporated in the same step. However, visual measurements (red) incur significant and varying delays, even larger than the sampling rate of the image sensor (p. 102)

Fig. 4.30 If the measurement delay is larger than the time between two measurements, updates would be applied without considering the effect of the previous correction. Potentially, this would cause the same correction to be applied multiple times. a Before applying the first correction, another measurement of the object pose is made. b The calculated correction would be similar to the previous one, resulting in an overcorrection of the object pose, once applied. c By considering all corrections, which occurred during the measurement delay, this can be avoided
Fig. 4.31 Ground truth setup that was used for the grasp acquisition experiments. a Two sets of tracking LEDs allow to calculate the 6 DoF pose of the object w.r.t. the palm of the robot. b The K610 visual tracking system from Nikon triangulates the 3D positions of the LEDs
Fig. 4.32 Error graphs for the grasp acquisition experiments. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Ketchup bottle. b Brush. c Shampoo. d Water bottle
Fig. 4.33 Ketchup bottle: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line

Fig. 4.34 Brush: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line
Fig. 4.35 Shampoo: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line
Fig. 4.36 Water bottle: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line

Fig. 4.37 Illustration of the initial and target poses on the table for the two pick-and-place tasks. Each experiment consisted of eight trials. a Ketchup bottle: The object was placed twice at each of the four initial poses. b Stacking game: The object was placed at one of four initial positions, oriented in one of two different directions
Fig. 4.38 Pick-and-place of a ketchup bottle. During the grasp acquisition, the object tilts inside of the hand. If this tilt is not compensated, the bottle may fall over, when being placed at the target location. Using the output of the grasp state estimation, the final positioning of the hand is adjusted to account for the in-hand displacement. a Hand and object before the grasp. b The object tilts inside of the hand before settling in a stable power grasp. c Hand positioning at the target location, without considering the in-hand motion. d Corrected hand placement, based on the estimated grasp state. e Estimated displacement of the object during a successful trial of the pick-and-place task, enabled by the grasp state estimation from joint measurements. (a), (b), and (d) mark the moments in time of the corresponding images
Fig. 4.39 Stacking of a pentagon-shaped object on the pins of a wooden board. The three-finger precision grasp moves the object slightly, when it is picked up. Placing the object on the board requires high precision, making high demands on the quality of the pose estimation. a Hand and object before the grasp. b The object moves slightly during the grasp acquisition. c Without correcting the in-hand motion, the object will not be aligned with the pins. d The correct estimation of the object pose allows to stack the object successfully. e Estimated displacement of the object during a successful trial of the stacking game, enabled by the grasp state estimation from joint measurements and visual object tracking. (a), (b), and (d) mark the moments in time of the corresponding images
Fig. 4.40 Three-finger precision grasp of a triangular object, which was continuously rotated, back and forth, around its vertical axis
Fig. 4.41 Estimated displacement of the triangle object during the continuous in-hand rotation. The colors denote the different estimation variants: Prediction only (black), joint measurements (red), joint measurements + visual object tracker (green)

Fig. 4.42 The combination of finger measurements and visual data enabled the estimation of the in-hand pose of grasped objects, such as this rectangular piece of the stacking game
Fig. 5.1 The kinematics of a robotic arm may not provide sufficient range of motion in order to move a statically grasped object to the desired target pose. Reaching it requires the capability of reorienting the object inside of the hand. a Hand and object before the grasp. b The kinematic constraints of the robotic arm do not allow to stack the object on the pins
Fig. 5.2 Moving the object to the desired in-hand pose involves the coordinated repositioning of the fingers that are in contact with the object. a Illustration of the initial (solid, yellow) and the desired (dashed) pose of the object. b Displacement of the contacts with the fingers, which generates the desired object motion
Fig. 5.3 Maintaining a stable grasp configuration requires applying internal forces on the object. a Exemplary force distribution, which balances the object. b Redistributed forces following the reconfiguration of the grasp
Fig. 5.4 The grasp state estimation provides the current object pose, $x$, contact positions, $c$, and normals, $n$, and corrected joint positions, $\hat{q}$. The desired object pose, $x_{des}$, is specified by the user
Fig. 5.5 Illustration of the surface friction constraint. If the applied contact force lies outside of the friction cone, the finger starts to slide on the surface of the object. The opening angle of the cone is defined by the friction coefficient, $\mu$, which depends on the interacting materials. a If $f^{[1]}_{int,\parallel} \le \mu f^{[1]}_{int,\perp}$, the finger sticks on the surface of the object. b If $f^{[1]}_{int,\parallel} > \mu f^{[1]}_{int,\perp}$, the finger starts to slide on the surface
Fig. 5.6 The force distribution problem involves finding a set of internal forces, $f_{int}$, which are as close as possible to the user-defined forces, $f_d$, while balancing dynamic loads on the object, $w_{dyn}$, and considering the friction constraints

Fig. 5.7 Generating the impedance behavior involves relating a pose error to a force quantity. For the in-hand object positioning, this can be realized on three different levels, entailing dissimilar considerations. a Object-level: The impedance behavior is realized by relating the desired object displacement to an object wrench, which is mapped to a set of contact forces, using the inverse of the grasp matrix, and subsequently to the joint torques. b Contact-level: The object displacement is first related to corresponding changes in the positions of the contact points. Subsequently, the contact forces from the impedance law are mapped to the joint torques. c Joint-level: Here, the compliant behavior is generated for a desired joint displacement, which is obtained from the change in the contact positions, using the inverse of the hand Jacobian matrix
Fig. 5.8 A number of different approaches for the force distribution of an object impedance controller have been proposed in literature. Their behavior can be illustrated as a set of virtual springs, which generate the internal forces on the object. a In the virtual linkage model, attracting forces are generated between each pair of fingertips [2]. b For the dynamic IPC controller, the springs connect to a virtual object [3]. c The static IPC represents a simplification of the dynamic variant, in which the contact forces are directed towards a virtual grasp center [4]
Fig. 5.9 Block diagram of the proposed control architecture
Fig. 5.10 The desired object positioning is realized by generating an impedance for the corresponding contact displacements, $c_x$
Fig. 5.11 An additive impedance force between the current contact location, $c^{[1]}$, and its position in the initial configuration, $c^{[1]}_{init}$, counteracts the sliding of the finger
Fig. 5.12 The friction cone is approximated by a pyramid to allow for a linear formulation of the friction constraint. It is described by the contact normal, $n$, as well as the vectors $a$ and $b$, which are two arbitrary, perpendicular vectors that are tangential to the surface at $c$. $f_a$, $f_b$ and $f_\perp$ denote the corresponding components of a force vector $f$
Fig. 5.13 The force distribution is adapted to generate the desired object force $f_{ext}$ on the tip of the pen at $c_{ext}$

Fig. 5.14 While the palm cannot actively apply a force on the object, including it in the calculation of the internal forces will redistribute them, such that a desired palm force (blue) is generated by the other fingers through the object
Fig. 5.15 Because of the nullspace in the mapping from the contact forces to the joint torques, the joint position may drift into mechanical limits or singularities over time. Controlling the nullspace motion allows to prevent this undesirable behavior. a The unconstrained nullspace motion moves the finger in an undesirable joint configuration. b The proposed nullspace controller couples the third and fourth joint of the finger, thereby maintaining a preferable configuration
Fig. 5.16 The inclusion of activation functions, $a_{a,t}$ and $a_{r,t}$, allows to gradually add and remove contacts, thereby avoiding jumps in the commanded joint torques. a The activation function for adding a contact to the configuration. b The activation function for removing a contact from the configuration
Fig. 5.17 When switching controllers, the commanded joint torque, $\tau^{[j]}_{cmd,t}$, transitions from $\tau^{[j]}_q$, the output of the joint-level controller, to $\tau^{[j]}_o$, the output of the object-level controller, within a specified duration of $t_o$
Fig. 5.18 Achieving large object displacements inside of the hand requires the reconfiguration of the object in order to deal with the physical limitations of the workspace of the fingers. a A circular object shall be continuously rotated around its symmetry axis. b Eventually, kinematic constraints will limit the range of motion. c By repositioning the fingers, the movability of the object is restored. d The object can be rotated further, until the next reconfiguration is required
Fig. 5.19 When repositioning a finger, the contact point is moved along a two-part trajectory, which lifts the finger off of the object, before placing it again. a Illustration of the trajectory (red) of the contact point on the finger. b The desired contact position, $c_{des,t}$, transitions from the start point, $c_0$, through the intermediate point, $c_l$, to the desired end point, $c_{des}$
Fig. 5.20 The objects and initial grasp configurations used in the evaluation of the tracking accuracy of the object controller. a A pentagon shape, grasped with three fingers. b A tennis ball, grasped with four fingers. c A brush, grasped with five fingers

Fig. 5.21 Pentagon: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis
Fig. 5.22 Tennis ball: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis
Fig. 5.23 Brush: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis
Fig. 5.24 Brush: Illustration of the in-hand motion of the object during the grasp acquisition, both with and without the application of the object controller. a Initial pose of the object and hand before the grasp. b Terminal pose of the grasped object, if no object controller is used. c Terminal pose of the grasped object, when using the proposed object controller. d Change in position and orientation of the object during the grasp execution without any object control (black), as well as with the proposed controller being active (red). The dashed line marks the moment, at which the controller was activated, enabled by the detection of the third contact

Fig. 5.25 Water bottle: Illustration of the in-hand motion of the object during the grasp acquisition, both with and without the application of the object controller. a Initial pose of the object and hand before the grasp. b Terminal pose of the grasped object, if no object controller is used. c Terminal pose of the grasped object, when using the proposed object controller. d Change in position and orientation of the object during the grasp execution without any object control (black), as well as with the proposed controller (red). The dashed line marks the moment, at which the controller was activated, enabled by the detection of the third contact
Fig. 5.26 Illustration of the sequence of steps for the extended rotation of the grasped tennis ball. a Initial pose of the object and the fingers. b Grasp configuration after rotating the object 30° around ψ. c Removal of the thumb (green) from the grasp configuration. d Placement of the thumb at the new location. e Fully reconfigured grasp that allows to rotate the object further. f Object pose after the second rotation
Fig. 5.27 Actual and desired position of the thumb contact during the first relocation
Fig. 5.28 Redistribution of the absolute contact forces of the fingers during the consecutive reconfiguration of the tennis ball
Fig. 5.29 Change in position and orientation of the tennis ball during the full revolution around ψ
Fig. 5.30 Applying the developed in-hand object controller to the stacking game allows to overcome the kinematic limitations of the robotic arm. a Before, kinematic constraints prevented the robot from stacking the game piece. b The ability to reorient the object inside of the hand allows to align it with the pins
Fig. 6.1 Besides the humanoid robot David, the developed in-hand localization method was integrated on two additional robotic systems at DLR. a The mobile robot EDAN and its under-actuated DLR HIT hand [1]. b The humanoid robot Rollin' Justin, equipped with the DLR Hand II [2]
Fig. 6.2 Additional sensor measurements will allow to further improve the quality of the grasp state estimation and in-hand object control. a Tactile sensors, which are integrated in the skin of the fingers, provide reliable contact information. b Visually tracking the fingers and palm of the hand increases the accuracy of the estimated contact locations

List of Tables

Table 1.1 List of publications in the context of this work
Table 4.1 Comparison of good (+), bad (−) and neutral (/) properties of the particle filter and the extended Kalman filter according to [5]. The main advantage of the EKF is its computational efficiency
Table 4.2 Mean and standard deviation of the terminal absolute errors in position and orientation for the grasp acquisition experiments (sample size of N = 10)
Table 4.3 Success rate of the pick-and-place tasks for the different pose estimation variants
Table 4.4 Average drift of the pose estimation variants during the in-hand rotation of the triangle
Table 5.1 Mean and standard deviation of the terminal absolute position and orientation errors of the tracking evaluation (sample size of N = 20)
Table 5.2 Mean and standard deviation of the terminal and maximum errors in position and orientation of the grasp acquisition experiments (sample size of N = 10)

Chapter 1

Introduction

Futurists envision a not too distant world, in which robots are ubiquitous and an inherent part of our everyday life. Designed to assist us in undesirable tasks, these autonomous systems will integrate seamlessly in our environment. In our homes, they will clean, prepare meals or do laundry. In dangerous scenarios, such as disaster response, they will be the first on the scene. On extraterrestrial worlds, they will set up infrastructure in preparation for subsequent human missions. However, this vision is not yet reality. So far, most robotic systems are confined to industrial settings, in which the environment is highly structured and adapted to the individual robot. Realizing the illustrated vision of the future requires robots, which operate in settings that have not been designed for them. Instead, in large part, they have to function in environments that are primarily adapted to humans.

Operating in a typical human environment, such as a household setting, involves interacting with a variety of objects and tools that have not been specifically created for the robot. Indeed, the ideal home robot is expected to be able to use existing appliances and tools, such as vacuum cleaners, drills or kitchen utensils, which have all been designed for humans. Consequently, interacting with this wide range of objects requires human-like manipulation skills.

The ability of humans to precisely manipulate objects with their hands was made possible by the evolution of the opposable thumb. Moreover, the resulting dexterity of the hand enabled humans to develop and use tools. Indeed, the capabilities of the human hand were essential for the technological development of our species [1]. Therefore, it is not surprising that roboticists have long dreamed of creating artificial manipulators that are as capable as the human hand. The development of such robotic hands saw great progress in the last several decades, culminating in the creation of anthropomorphic manipulators, which come close to the human hand in size and performance [2]. However, achieving dexterity also requires the development of the corresponding cognitive capabilities. Humans are not born with the ability to manipulate objects with precision. It is one of the skills, which have to be mastered by a child in the first years of its life. Through playful interactions with the environment, it learns about the functioning of the world and its body at the same time [3].

Fig. 1.1 At the age of 18 months, my daughter is developing her fine motor skills. Solving this stacking game requires hand-eye coordination and the ability to move a grasped object inside of her hand with precision

Stacking games, such as the one in Fig. 1.1, allow to develop and test the fine motor skills of a child. Accomplishing this task involves picking up the individual pieces and placing them on the corresponding pins of the wooden board. W.r.t. the dexterous manipulation capabilities of the child, this poses a number of challenges, such as hand-eye coordination and the ability to reposition a grasped object inside of the hand. So far, accomplishing manipulation tasks, which require this level of dexterity, with robotic systems has proven difficult [4]. In fact, most practical applications of robots have been limited to simple pick-and-place scenarios.

Addressing this challenge, this manuscript introduces a novel model-based dexterous manipulation framework, which enables the precise control of grasped objects. Principally, the proposed system consists of the three main algorithmic components that are required to facilitate skillful in-hand manipulation:

• A model, which provides an internal representation of the hand-object system.
• A perception system, which infers the state of the manipulated object from inaccurate sensor measurements.
• A control component, which generates actions in order to regulate the in-hand pose of the grasped object.

The robot requires an internal model of the scene to be able to reason about the interactions between an object and its own hand. Specifically, this requires the modeling of the kinematics and dynamics of the grasp system. The kinematic description


of the grasp includes the in-hand location of the object, the positions of the fingers, as well as the contacts between them. The dynamics model of the grasp describes the physical behavior of the hand-object system, such that the effect of actions on the state of the grasp can be anticipated. Accordingly, the formulation of a suitable grasp representation is fundamental to this manuscript. Since the state of the grasp cannot be measured directly, the role of the perception component is to infer it from sensor measurements. In simple manipulation scenarios, grasps are usually executed “open loop,” i.e. without considering sensor feedback during the operation. Consequently, unforeseen motions of the object during the grasp are not observed. This may be acceptable in scenarios, which do not require high precision. Accomplishing more challenging tasks, such as the stacking game, is not possible based on this approach. The accurate positioning of the game pieces w.r.t. the board is required. Robots typically rely on computer vision for the localization of objects. However, in the context of manipulation, occlusions by the hand impair this measurement. A consistent integration of different sensing modalities is needed in order to infer the in-hand pose of a grasped object. Humans take advantage of their sense of touch, proprioception and visual perception to accomplish manipulation tasks. Robots have to incorporate similar sensor fusion approaches to obtain the best possible estimate of the grasp state, as is elaborated in this book. Finally, the ability to control the object is required to accomplish advanced manipulation tasks. While positioning the hand allows to move the object, the workspace of the arm may be insufficient to reach the desired target pose. Figure 1.2 shows the DLR humanoid David, tasked with stacking one of the pieces of the children’s game.

Fig. 1.2 Accomplishing the stacking game with a humanoid robot challenges its dexterous manipulation abilities. The robot has to be able to accurately position the grasped pieces w.r.t. the wooden board, requiring the in-hand control of the object


Unable to move the base of the robot, the kinematics of the arm do not allow to align the object with the corresponding pins of the board. Faced with this situation, humans use their fingers to reorient the game piece inside of their hands. Robots need the same skill in order to accomplish this task, which is one of the main objectives of this book. The grasp model, perception system and in-hand control method represent the three main components of a dexterous manipulation framework. This book is concerned with the detailed formulation of such a system and proposes novel approaches, which extend the capabilities of previous algorithms. The first section of this introductory chapter further explores the topic of dexterous manipulation, detailing the challenges of each component, as well as the relations between them. Subsequently, the specific contributions of this book are outlined. Finally, the organization of this work is presented.

1.1 Dexterous Manipulation

In industrial scenarios, most robots interact with their environment using simple manipulators, such as grippers. In these highly structured settings, the application and the robot are adapted to each other, thereby reducing the demands and uncertainty of the task. However, moving robotic systems off the factory floor and into our homes requires manipulators, which are much more versatile. In these unstructured and unpredictable environments, robots have to be able to interact with a wide variety of objects in unforeseen configurations. Inspired by the human hand, multi-fingered robotic hands have been developed as dexterous manipulators, which enable complex interactions with the environment. Anthropomorphic robotic hands, like the one of the DLR robot David, have come close to their human counterparts, w.r.t. size and performance [5]. Moreover, the integration of elastic elements makes them robust to collisions, which are unavoidable in unstructured environments. Finally, the miniaturization of tactile sensing technologies allowed to equip robotic hands with a sense of touch. Altogether, dexterous robotic manipulators have progressed considerably in the past decades. However, enabling these hands to manipulate objects with precision also requires advanced algorithms, which allow to handle the complexity of the system.

Controlling the in-hand pose of a grasped object demands the accurate coordination of all fingers that are in contact with the object. The large number of degrees of freedom (DoF), as well as the dynamics of the contacts between the fingers and the object, make it challenging to model the physical behavior of the grasp system. Moreover, the control of a grasped object relies on knowledge about its in-hand location, which has to be inferred from a combination of sensor measurements, each affected by inaccuracies and noise. The integration of these elements into a common dexterous manipulation framework is illustrated in Fig. 1.3. In the following, the specifics of each algorithmic component are described.


Fig. 1.3 Illustration of the main components of a dexterous manipulation framework. Observations from the environment are perceived through sensors of the robot and processed by the perception component. The control component generates commands, which are executed by the actuators of the robot. Both components rely on an internal model, which describes the behavior of the hand-object system

Model

Fig. 1.4 The knowledge of the scene is internally represented by a model, which describes the kinematic and dynamic relations of the grasp system

The purpose of the grasp model is to provide a mathematical representation of the physical interactions that occur during the manipulation of an object with a robotic hand (see Fig. 1.4). These descriptions are essential for both the estimation of the grasp state from perceptual inputs and the control of the in-hand pose of the object. Analyzing the grasp allows to relate the various quantities of the system. Principally, the hand-object system is defined by three main components, which are the fingers of the manipulator, the object and the contact points between them. Motions of the joints of the fingers affect the position of the contacts and subsequently cause a displacement of the object. An appropriate kinematics model of the grasp mathematically relates these different quantities, allowing to predict the movement of the object from changes in the joint positions. W.r.t. the dynamics of the system, grasping an object involves the application of forces on the object, which are generated by the actuators of the joints and transmitted through the contact points. Controlling the loads on the object requires a model, which describes the relation between the various dynamics quantities of the grasp. Finally, the contacts between the fingers and the object are subject to kinematic and dynamic constraints, such as unilaterality and friction, which have to be considered in the grasp modeling.
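To make these couplings concrete, the standard grasp-analysis relations can be written in the generic form used, for example, in [12]. The equations below are a sketch rather than the specific formulation of Chapter 3, and the symbol names are chosen here only for illustration:

$$\dot{c}_{h} = J(q)\,\dot{q}, \qquad \dot{c}_{o} = G^{T}(x)\,\nu, \qquad J(q)\,\dot{q} = G^{T}(x)\,\nu \quad \text{(maintained contacts)},$$
$$w = G(x)\,f_{c}, \qquad \tau = J^{T}(q)\,f_{c}, \qquad \|f^{[i]}_{c,\parallel}\| \le \mu\, f^{[i]}_{c,\perp} \quad \forall i.$$

Here $q$ denotes the joint positions, $x$ the object pose and $\nu$ its twist, $c_h$ and $c_o$ the contact points on the hand and object side, $f_c$ the contact forces, $w$ the resulting object wrench, $\tau$ the joint torques, $J$ the hand Jacobian, $G$ the grasp matrix and $\mu$ the friction coefficient.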

Perception

To be able to accomplish challenging manipulation operations, knowledge of the pose of the object inside of the hand is fundamental. In fact, even simple pick-and-place tasks may fail if the assumed location of the object is unreliable. Typically, in robotic manipulation scenarios, the object pose before the interaction is known or determined visually, which allows to plan the grasp of the object. However, when executing the grasp, approximations in the planning model, deviations in the motion of the fingers, or various other sources of inaccuracies may result in unpredicted motions of the object. While the object may still reach a stable grasp configuration, neglecting the unforeseen in-hand displacement may impair the outcome of a task. For example, in the stacking game, even small errors in the knowledge of the object pose make it impossible to place a game piece successfully. Consequently, a means to determine the location of a grasped object inside of the hand is required.

While pure camera-based methods may be sufficient to locate the object before being picked up, within the hand, occlusions by the manipulator severely restrict the utility of this sensing modality. In contrast, tactile sensing allows to infer information about the hand-object state, once the fingers are in contact. However, depending on the grasp configuration, the fingers may not fully constrain the estimated object pose. When manipulating objects, humans rely on a combination of vision, proprioception and their sense of touch in order to perceive the state of the grasp. Enabling precise manipulation with robotic hands requires a similar multi-modality approach, which consistently integrates different measurement modalities, such as joint measurements, tactile sensing and visual data, into a common framework.

For dexterous manipulation scenarios, in which the in-hand pose of the object is controlled, a continuous real-time estimation is required. Therefore, the motion of the object has to be predicted from displacements of the fingers. For a controller to be able to regulate the forces on the object, the perception system has to provide the complete grasp state. This comprises not only the object pose, but also the location of the contact points, as well as finger positions, which are consistent with the location of the object.

Fig. 1.5 For the stacking game, the primary purpose of the perception system is to determine the in-hand pose of a grasped object, as well as the board, utilizing different sensing modalities

Finally, to be able to accurately place an object, the hand has to be properly positioned w.r.t. a specific target. For the geometric pieces of the children's game to be stacked correctly, the objects have to be moved precisely above the pins, before releasing them. Beyond the in-hand pose of the object, this requires knowledge about the location of the board relative to the hand, as depicted in Fig. 1.5. Even if the global position of the target is known, inaccuracies in the kinematics of the arm of the robot will cause a misalignment. Therefore, the target has to be located relative to the hand to allow for the necessary precision.

Since the state of the grasp cannot be measured directly, it has to be inferred from related quantities, which are provided by different sensors. In a real-world system, all of these measurements are subject to inaccuracies and noise. The perception system has to account for the resulting uncertainties in order to determine the best estimate of the state.
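As an illustration of how such a probabilistic fusion can be organized, the following minimal Python sketch shows the generic predict/update structure of an extended Kalman filter. It is not the estimator developed in Chapter 4; the class, the toy one-dimensional state and all numerical values are invented for illustration, and the motion and measurement models would in practice be derived from the grasp model.

import numpy as np

class ExtendedKalmanFilter:
    """Minimal EKF skeleton for fusing several measurement modalities."""

    def __init__(self, x0, P0):
        self.x = np.asarray(x0, dtype=float)   # state estimate (e.g. object pose and joint biases)
        self.P = np.asarray(P0, dtype=float)   # state covariance

    def predict(self, f, F, Q):
        """Propagate the state with a (possibly nonlinear) motion model f and its Jacobian F."""
        self.x = f(self.x)
        self.P = F @ self.P @ F.T + Q

    def update(self, z, h, H, R):
        """Correct the state with a measurement z, measurement model h and its Jacobian H."""
        y = z - h(self.x)                       # innovation
        S = H @ self.P @ H.T + R                # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P

# Toy usage: a 1-D "object position" nudged by finger motion and corrected by vision.
ekf = ExtendedKalmanFilter(x0=[0.0], P0=[[1.0]])
finger_displacement = 0.01                      # from joint encoders (prediction step)
ekf.predict(f=lambda x: x + finger_displacement,
            F=np.array([[1.0]]), Q=np.array([[1e-4]]))
visual_pose = np.array([0.013])                 # slower, noisy camera measurement
ekf.update(z=visual_pose, h=lambda x: x,
           H=np.array([[1.0]]), R=np.array([[1e-3]]))
print(ekf.x, ekf.P)

The essential point is that finger motion drives the prediction at a high rate, while slower or delayed modalities, such as vision, only enter through additional update steps with their own noise models.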

Control

For many manipulation scenarios, the freedom of the robotic arm is sufficient to move an object from its initial location to a target pose. However, the potential object poses that can be reached by the arm are limited by its kinematic constraints. Indeed, the limitations of the David arm do not allow to move all the pieces of the stacking game from arbitrary initial orientations to the corresponding target locations on the board, as illustrated in Fig. 1.6. Faced with similar situations, humans use their fingers to rotate objects inside of their hands, thereby extending the range of reachable orientations. Robotic hands require a similar capability to be able to accomplish such tasks. Moreover, some applications require the explicit in-hand repositioning of a grasped object, e.g. when unscrewing the lid of a bottle or writing with a pen. Therefore, the ability to control the in-hand object pose is fundamental.

Fig. 1.6 The control component is tasked with positioning the object inside of the hand, such that the arm can reach the desired target location above the board

Relocating a grasped object inside of the hand involves the coordinated motion of all fingers that are in contact with the object. Additionally, the forces on the object have to be controlled, such that the fingers remain in contact. In order to avoid the unintended sliding of the fingers on the surface of the object, friction constraints have to be considered as well.

The ability to control the in-hand pose of the object also allows to correct the unintended displacement of the object during the acquisition of the grasp. However, in certain applications, it is imperative to prevent the unintended object motion altogether. For instance, when grasping a full glass of water, tilting the object at any point in time will cause some of its contents to be spilled. To avoid spilling, the controller has to be able to stabilize the object during the execution of the grasp. In contrast to the previously described scenarios, during the acquisition of the grasp, the object is not yet maintained in a set configuration, i.e. involving a fixed number of contacts. Instead, as the fingers approach the object, new contacts are successively established. The object controller has to account for the changing grasp configuration by appropriately redistributing the forces on the object as the grasp acquisition progresses.

When moving a grasped object inside of the hand, the range of motion is constrained by the kinematic limitations of the fingers. Sequentially relocating the fingers w.r.t. the object allows to further extend the workspace. Similar to the grasp stabilization, finger gaiting requires the repeated reconfiguration of the grasp. Moreover, a controller, which realizes the repositioning of the fingers, is needed.
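In its simplest form, the behavior sketched above combines an impedance law with a constrained force distribution. The relations below are a generic illustration of these two ingredients, not the control law derived in Chapter 5; the stiffness and damping matrices $K$ and $D$ and the symbol names are placeholders:

$$w_{imp} = K\,(x_{des} - x) - D\,\nu,$$
$$G(x)\,f_{c} = w_{imp} + w_{dyn}, \qquad f^{[i]}_{c,\perp} \ge 0, \qquad \|f^{[i]}_{c,\parallel}\| \le \mu\, f^{[i]}_{c,\perp} \quad \forall i.$$

Because the grasp matrix $G$ has a nullspace for multi-contact grasps, the contact forces are only determined up to a set of internal forces; this freedom is what a force distribution can exploit to keep every contact inside its friction cone while the impedance wrench positions the object.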


1.2 Contribution

The goal of this book is the realization of the individual algorithmic components required for dexterous manipulation, as well as their integration into a unified architecture. This consists of the formulation of a common grasp model, which describes the kinematic and dynamic behavior of the hand-object system, providing the basis for the perception and control algorithms. The development of a novel grasp state estimation method realizes the combination of tactile sensing, proprioception, as well as vision into a consistent probabilistic framework. Finally, an impedance-based controller, which allows to regulate the in-hand pose of a grasped object, is proposed. The specific contributions to each of these elements are summarized in this section. Table 1.1 lists the most important publications, which resulted from the work in the context of this book.

Table 1.1 List of publications in the context of this work

• Journal [6]: Pfanne, M., Chalon, M., Stulp, F. and Albu-Schäffer, A. (2018). Fusing joint measurements and visual features for in-hand object pose estimation. IEEE Robotics and Automation Letters, 3(4):3497–3504
• Journal [7]: Pfanne, M., Chalon, M., Stulp, F., Ritter, H. and Albu-Schäffer, A. (2020). Object-level impedance control for dexterous in-hand manipulation. IEEE Robotics and Automation Letters, 5(2):2987–2994
• Journal [8] (under review): Lange, F., Pfanne, M., Steinmetz, F., Wolf, S. and Stulp, F. (2020). Friction estimation for tendon-driven robotic hands. Submitted to IEEE Robotics and Automation Letters
• Conference [9]: Chalon, M., Pfanne, M. and Reinecke, J. (2013). Online in-hand object localization. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2977–2984. IEEE
• Conference [10]: Pfanne, M. and Chalon, M. (2017). EKF-based in-hand object localization from joint position and torque measurements. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2464–2470. IEEE
• Conference [11]: Stoiber, M., Pfanne, M., Strobl, K., Triebel, R. and Albu-Schäffer, A. (2020). A sparse Gaussian approach to region-based 6 DoF object tracking. In 2020 Asian Conference on Computer Vision (ACCV). Springer
• Workshop [Pfanne and Chalon 2016]: Pfanne, M. and Chalon, M. (2016). EKF-based in-hand object localization from tactile sensing. Workshop on Tactile Sensing for Manipulation: New Progress and Challenges, at the 2016 IEEE-RAS International Conference on Humanoid Robots (Humanoids)


Grasp Modeling

The grasp modeling is fundamental for the realization of the grasp state estimation and in-hand control techniques. Based on previous work on grasp analysis, in particular [12], this work presents a model of the kinematic and dynamic relations of the grasp. Choosing an appropriate grasp model involves weighing trade-offs between accuracy and computational performance. For the selected model, which relies on the rigid body assumption, the grasp matrix and hand Jacobian matrix are identified as the most important tools for the mapping between different kinematic and dynamic quantities of the grasp. Moreover, the presented subspace formulations allow to take advantage of the nullspaces of these mappings, which are exploited in the design of the in-hand controller. Another consideration in the description of the grasp is the selection of the contact model. A discussion of different options yields the hard finger model under consideration of friction constraints as the most applicable choice. Finally, limitations of the presented model w.r.t. different types of grasps are discussed, which ultimately determine the applicability of the perception and control methods.

Grasp State Estimation

For the object localization, a multi-modality approach is proposed, which probabilistically estimates a consistent grasp state, comprising the in-hand object pose, contact locations, as well as errors in the joint position measurements. The main concepts of the in-hand localization method have been described in [6]. The specific contributions are as follows:

• Based on an extended Kalman filter, a flexible estimation framework is presented, which allows to integrate a range of different sensor measurements, while at the same time explicitly considering the uncertainty of these inputs. The formulation allows for an efficient implementation, which supports the utilization of the system as part of a real-time action-perception loop.
• The proposed method relies on joint position values as the minimal set of sensor measurements. They allow to identify and subsequently correct inconsistencies between the finger positions and the assumed location of the object. Moreover, using the kinematic grasp model, displacements of the object are predicted from the motion of the fingers. The estimation is enabled by a novel extension to the Gilbert–Johnson–Keerthi distance algorithm, which is used to precisely determine the location of the deepest penetration between the object and a finger link.
• Tactile sensing information from the fingers is utilized to detect contacts with the object. Consequently, the estimated grasp state is updated, such that the corresponding bodies are in alignment. Additionally, joint torque measurements are used to infer contacts if no direct sensing capability is available (a toy illustration of such torque-based contact detection follows this list).


• In addition to the measurements from the hand, the proposed framework allows to fuse different types of visual information in the estimation. Artificial features from fiducial markers on the object allow to improve the estimated object pose. Foregoing the need for fiducial markers, a novel visual perception pipeline is proposed, which extracts naturally occurring features on the contour of the object from monocular RGB or grayscale camera images. Artificial and natural features are similarly fused in the state estimation, enabling both to be used complementarily. Finally, the integration with a visual object tracker illustrates the probabilistic fusion with an independent estimate of the 6 DoF object pose. Delays in the sensor measurements are explicitly considered in the fusion based on a novel extrapolation approach.
• The developed in-hand localization system is demonstrated for various objects and in different grasping scenarios, using the anthropomorphic robotic hand of the DLR humanoid David. These experiments include the previously discussed stacking game, thereby validating the method in a challenging manipulation task.
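The following toy function illustrates only the basic idea behind inferring contacts from joint torques: a contact is flagged when the measured torques deviate noticeably from the torques expected for a freely moving finger. It is a sketch under simplifying assumptions, not the detection scheme of Chapter 4, and the threshold value and finger model are invented placeholders.

import numpy as np

def detect_contact(tau_measured, tau_expected, threshold=0.05):
    """Flag a probable contact when the external torque residual
    (measured minus model-predicted joint torques) exceeds a threshold.

    tau_measured : joint torques from the torque sensors [Nm]
    tau_expected : torques predicted by a finger model (gravity, friction, dynamics) [Nm]
    threshold    : detection threshold [Nm]; an invented placeholder value
    """
    residual = np.asarray(tau_measured) - np.asarray(tau_expected)
    return np.linalg.norm(residual) > threshold, residual

# Example: a small unexpected torque on the distal joints suggests a fingertip contact.
in_contact, residual = detect_contact([0.02, 0.11, 0.08], [0.02, 0.03, 0.01])
print(in_contact, residual)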

Impedance-Based Object Control

Enabled by the grasp model and grasp state estimation, the proposed controller realizes the compliant in-hand positioning of the object, while balancing the forces on it. The main contributions to this topic have been published in [7] and are further elaborated in this book:

• The object controller consists of two main components. First, the impedance-based positioning controller regulates the 6 DoF object pose. Realizing the compliant behavior on contact-level ensures the stability of the fingers, even in the case of contact loss. Moreover, it allows to actively maintain the desired grasp configuration, thereby limiting the slippage of fingers.
• The second principal component of the controller is the distribution of the internal forces on the object. The proposed method explicitly considers friction constraints in order to prevent the sliding of contacts. At the same time, the contact forces are chosen to balance any dynamical loads on the object. The resulting optimization problem is solved using quadratic programming techniques (a small sketch of such a quadratic program follows this list).
• For anthropomorphic fingers, the mapping of the desired contact force on a fingertip to the corresponding joint torques contains a non-trivial nullspace, which allows joints to drift into mechanical limits or singular configurations. An additional nullspace controller exploits this subspace, thereby preventing the undesirable drift.
• Beyond the positioning of an object in a static grasp configuration, the proposed framework is extended to also support the reconfiguration of the grasp, i.e. the addition and removal of contacts. The inclusion of continuous activation functions realizes the gradual redistribution of the contact forces. The ability to reconfigure the grasp is demonstrated in two scenarios. First, the object controller allows to stabilize the object pose during the grasp acquisition, thereby reducing any unintended displacements. Second, the presented method is applicable in finger gaiting scenarios, which involve the active reconfiguration of the grasp. Moreover, a Cartesian finger controller is described, which is capable of repositioning fingers on the surface of the object.
• A series of dexterous manipulation experiments demonstrate the capabilities of the system. Utilizing the David hand, these scenarios include the in-hand positioning of different objects, the stabilization of the grasp acquisition and the full in-hand revolution of a tennis ball, using finger gaiting.
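As a pointer to what such a quadratic program can look like, the following sketch distributes contact forces with a generic convex solver. It is not the formulation of Chapter 5: the grasp matrix is a random stand-in, the desired forces and friction coefficient are invented, and the cvxpy dependency is simply one convenient way to state the problem.

import numpy as np
import cvxpy as cp

# Toy example: three point contacts, forces expressed per contact as
# (tangential a, tangential b, normal n). All numbers are invented.
n_contacts = 3
mu = 0.5                                                          # assumed friction coefficient
G = np.random.default_rng(0).normal(size=(6, 3 * n_contacts))     # stand-in grasp matrix
w_dyn = np.zeros(6)                                               # dynamic/external load to balance
f_des = np.tile([0.0, 0.0, 2.0], n_contacts)                      # preferred squeezing forces

f = cp.Variable(3 * n_contacts)
constraints = [G @ f == -w_dyn]                # contact forces balance the object load
for i in range(n_contacts):
    fa, fb, fn = f[3 * i], f[3 * i + 1], f[3 * i + 2]
    constraints += [fn >= 0,                   # unilateral contact
                    cp.abs(fa) <= mu * fn,     # linearized friction pyramid
                    cp.abs(fb) <= mu * fn]

problem = cp.Problem(cp.Minimize(cp.sum_squares(f - f_des)), constraints)
problem.solve()
print("optimal contact forces:", f.value)

Linearizing the friction cone as a pyramid, in the spirit of Fig. 5.12, keeps all constraints linear, so the force distribution remains a standard quadratic program.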

1.3 Organization of this Work

This section describes the structure of the remainder of this manuscript. Figure 1.7 illustrates the organization of the chapters. In particular, it highlights the correspondence between the components of the dexterous manipulation framework in Fig. 1.3 and the main chapters of this book. In summary, the outline of this work is as follows:

• Chapter 2 provides an overview of related work on the topic of dexterous manipulation. Following a resume of the history of dexterous robotic hands, a brief introduction to the DLR robot David is presented. The remainder of the chapter is concerned with a survey of the state of the art in dexterous manipulation algorithms. In particular, past work on the topics of grasp state estimation and impedance-based object control is discussed.
• Chapter 3 formulates the grasp model, which is fundamental to the development of the subsequent in-hand localization and control methods. Moreover, it introduces the most important quantities for the description of the hand-object system. The kinematic and dynamic relations of the system are both shown to be defined by the grasp matrix and hand Jacobian matrix. The formulation of the corresponding subspaces illustrates how under-constrained mappings can be exploited to realize secondary objectives.
• Chapter 4 presents the proposed grasp state estimation framework. Following a conceptual description of the method, the design of the probabilistic filter is discussed. Consequently, the incorporation of the various measurement modalities is described, starting with the contact detection and localization. Next, the state estimation from joint position measurements is derived, which represents the baseline variant of the filter. The subsequently explained data fusion approaches for different types of visual information, including fiducial markers, contour features and object tracking, represent optional extensions to the system. A range of experiments compare different combinations of sensor inputs and demonstrate the applicability of the algorithm in various grasping scenarios.


Fig. 1.7 Structure of this book. Chapters 3 to 5 correspond to the main components of the proposed dexterous manipulation framework


• Chapter 5 describes the novel impedance-based object controller. A discussion of different impedance concepts yields the design of the developed positioning controller. Similarly, the comparison of previous force distribution methods reveals their various shortcomings, which are addressed by the proposed formulation. The balancing of the internal forces, under consideration of dynamic object loads and friction constraints, is shown to produce a quadratic optimization problem. Moreover, the control framework is extended to support the reconfiguration of the grasp during the grasp acquisition and in finger gaiting scenarios. The experimental validation assesses the performance of the system in a series of dexterous manipulation tasks.
• Chapter 6 concludes this work by summarizing its results and contributions. A discussion of the proposed framework highlights strengths and limitations of the developed methods. Moreover, an outlook on open research questions provides an insight into potential future work, which could address the limitations of the proposed concept and further expand its applicability.

References

1. Mary W. Marzke and Robert F. Marzke. Evolution of the human hand: Approaches to acquiring, analysing and interpreting the anatomical evidence. Journal of Anatomy, 197(1):121–140, 2000.
2. Markus Grebenstein. Approaching human performance: The functionality-driven Awiwi robot hand. PhD thesis, ETH Zurich, 2012.
3. R. Jason Gerber, Timothy Wilks, and Christine Erdie-Lalena. Developmental milestones: Motor development. Pediatrics in Review, 31(7):267–277, 2010.
4. C. Piazza, G. Grioli, M. G. Catalano, and A. Bicchi. A century of robotic hands. Annual Review of Control, Robotics, and Autonomous Systems, 2:1–32, 2019.
5. Markus Grebenstein, Maxime Chalon, Werner Friedl, Sami Haddadin, Thomas Wimböck, Gerd Hirzinger, and Roland Siegwart. The hand of the DLR hand arm system: Designed for interaction. The International Journal of Robotics Research, 31(13):1531–1555, 2012.
6. Martin Pfanne, Maxime Chalon, Freek Stulp, and Alin Albu-Schäffer. Fusing joint measurements and visual features for in-hand object pose estimation. IEEE Robotics and Automation Letters, 3(4):3497–3504, 2018.
7. Martin Pfanne, Maxime Chalon, Freek Stulp, Helge Ritter, and Alin Albu-Schäffer. Object-level impedance control for dexterous in-hand manipulation. IEEE Robotics and Automation Letters, 5(2):2987–2994, 2020.
8. Friedrich Lange, Martin Pfanne, Franz Steinmetz, Sebastian Wolf, and Freek Stulp. Friction estimation for tendon-driven robotic hands. IEEE Robotics and Automation Letters, 2020.
9. Maxime Chalon, Martin Pfanne, and Jens Reinecke. Online in-hand object localization. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2977–2984. IEEE, 2013.
10. Martin Pfanne and Maxime Chalon. EKF-based in-hand object localization from joint position and torque measurements. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2464–2470. IEEE, 2017.
11. Manuel Stoiber, Martin Pfanne, Klaus Strobl, Rudolph Triebel, and Alin Albu-Schäffer. A sparse Gaussian approach to region-based 6 DoF object tracking. In 2020 Asian Conference on Computer Vision (ACCV). Springer, 2020.
12. Domenico Prattichizzo and Jeffrey C. Trinkle. Grasping. In Springer Handbook of Robotics, chapter 28, pages 671–700. Springer, 2008.

Chapter 2

Related Work

The goal of this chapter is to provide a concise overview of the state of the art in dexterous manipulation, both in hardware and software. In the first section, the progress in the development of robotic hands, which are able to support the execution of advanced in-hand manipulation tasks, is reviewed. It culminates in the presentation of the DLR humanoid robot David and, in particular, its anthropomorphic hand, which was the primary research platform for the work in this manuscript. Following the introduction of the relevant robotic hardware, the second section discusses the current state of dexterous manipulation w.r.t. algorithmic advances in the field. Subsequently, an overview of related work on the topic of grasp state estimation allows to place the proposed algorithm in the context of recent research. Finally, past solutions to the in-hand control problem that have been presented in literature are discussed. Figure 2.1 summarizes the most relevant references.

2.1 Dexterous Robotic Hands

2.1.1 A Brief History of Robotic Hands

The work on robotic hands, which are designed for dexterous manipulation, has been ongoing for almost 40 years. The first hand that was specifically developed for this purpose was the Salisbury Hand [1], shown in Fig. 2.2a. Its three fingers each consist of three joints, allowing it to control the 6 DoF pose of a grasped object. The mathematical considerations, which led to the creation of the Salisbury Hand, were fundamental for the research on robotic grasping that followed. However, w.r.t. the mechanical design, the development of subsequent hands aimed at creating manipulators that were closer to the human hand, starting with the Utah/MIT Hand [5]. This four-fingered manipulator is driven by tendons and has touch-sensing capabilities.

2.1 Dexterous Robotic Hands 2.1.1 A Brief History of Robotic Hands The work on robotic hands, which are designed for dexterous manipulation, has been ongoing for almost 40 years. The first hand that was specifically developed for this purpose was the Salisbury Hand [1], shown in Fig. 2.2a. Its three fingers each consist of three joints, allowing it to control the 6 DoF pose of a grasped object. The mathematical considerations, which led to the creation of the Salisbury Hand, were fundamental for the research on robotic grasping that followed since. However, w.r.t. the mechanical design, the development of subsequent hands aimed at creating manipulators that were closer to the human hand, starting with the Utah/MIT Hand [5]. This four-fingered manipulator is driven by tendons and has touch-sensing © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. Pfanne, In-Hand Object Localization and Control: Enabling Dexterous Manipulation with Robotic Hands, Springer Tracts in Advanced Robotics 149, https://doi.org/10.1007/978-3-031-06967-3_2

15

16

2 Related Work

Fig. 2.1 Overview of the most relevant references in the context of this work

capabilities. Another consideration in the early research on robotic hands was the choice of their kinematics. Designed to resemble the glove of an astronaut, the Robonaut Hand is a five-fingered manipulator with 14 independent DoF [6]. Both of these hands are remotely actuated, since it was not possible at the time to integrate the actuators inside of the manipulators. However, subsequent advances allowed for the development of modular hands, such as the DLR Hand II [2], which contains the drives of the joints within the hand (see Fig. 2.2b). An in-depth overview of the progress in the development of dexterous robotic hands in the previous century is provided in [7]. Since then, robotic manipulators have further advanced towards their human inspiration. On the one hand, the integration of elastic actuators aimed at creating robotic hands, which are robust to unforeseen collisions with the environment [8, 9]. On the other hand, the anthropomorphic designs of these robotic manipulators have come ever closer to approximating the human hand [10], as exemplified by the popular Shadow Hand, shown in Fig. 2.2c [3]. The work on human-inspired robotic hands at DLR culminated in the creation of the Hand-Arm-System, which was later renamed to David. This robot integrates variable stiffness-actuation to realize increased robustness and performance. It was also the primary robotic platform for the research in the context of this manuscript. Therefore, a more detailed presentation follows in the subsequent part of this section. While robotic hands have advanced considerably in the last decades, many algorithmic challenges in the context of dexterous manipulation remain. Many of these longstanding problems have not seen the same progress as the hardware, on which they are implemented. This situation is illustrated by a novel trend towards simplified robotic hand designs. Because of the great complexity of fully-actuated dexterous hands, robotic manipulators with a reduced set of capabilities, such as the soft RBO Hand 2 [11], have been more practical to apply in real-world applications. The Pisa/IIT SoftHand, which is shown in Fig. 2.2d, exploits adaptive synergies to position its 19 joints using only a single actuator [4]. A review of this current work is

Fig. 2.2 A selection of robotic hands, which illustrate the progress and research trends of the past 40 years. a The Salisbury hand is the first robotic manipulator that was specifically designed for dexterous manipulation [1]. b Miniaturization allowed to integrate the joint drives of the DLR Hand II inside of the manipulator [2]. c The anthropomorphic design of the Shadow Dexterous Hand approximates the kinematics of the human hand [3]. d Ongoing algorithmic challenges motivated the creation of simplified and soft manipulators, such as the Pisa/IIT SoftHand [4].

This trend highlights the need for novel solutions, which are able to handle the complexity of dexterous robotic hands and fully take advantage of these advanced manipulators.


2.1.2 DLR David

The DLR robot David, formerly known as the Hand-Arm-System, was developed as a research platform for the study of advanced robotic manipulation [14]. It is shown in Fig. 2.3a. All components of this system follow the common design principle of mechanical compliance. That means elastic elements, such as mechanical springs, are integrated in the joints of the robot, resulting in improved robustness compared to mechanically stiff robots. Initially, the robot consisted of two main components, its right arm and hand. The arm of the robot houses seven variable stiffness actuators. Two motors per joint allow the position and stiffness of each joint to be controlled independently. Moreover, the spring that is integrated in the mechanism of the joint is able to store energy, allowing the robot to move faster than a stiff joint of the same size. The second component of the initial setup is its anthropomorphic hand [15] (see Fig. 2.3b). The five-fingered, tendon-driven manipulator was developed to approach the human hand, both in size and performance. To reach this goal, the actuators of the hand are fully integrated in the forearm. Following the design principle of the system, the joints of the fingers are mechanically compliant. Spring mechanisms along the routing of the tendons allow the stiffness of each joint to be controlled. In total, 36 motors are used to actuate the same number of tendons.

Fig. 2.3 The DLR humanoid robot David has been the primary research platform for the developed algorithms. Because of its anthropomorphic design and mechanical compliance, David is close to a human in size and performance. a David consists of two arms and an actuated neck, including an anthropomorphic right hand and a left hand gripper [13]. b The fully-actuated, tendon-driven hand of David enables human-like dexterous manipulation [13]

Later, the humanoid robot was extended with additional hardware components, starting with a left arm. However, instead of a complete left hand, only a two-finger gripper was mounted as the end effector. The integration of an actuated neck enabled the controlled positioning of the head of the robot. Here, the mechanical compliance is realized by an elastic continuum mechanism, which is controlled by four tendons [16]. The head houses an Intel RealSense D435 camera, which provides RGB and depth images [17].

The David Hand

The five fingers of the right hand of David are fully actuated, consisting of four independent positional DoF per finger. Each joint is antagonistically actuated by multiple tendons. Figure 2.4 illustrates the coupling of two finger joints and the corresponding four tendons. The position of the joints, $\boldsymbol{q}$, is determined by the motor positions, $\boldsymbol{\theta}_{motor}$, as well as the elongation of the tendons, $\boldsymbol{l}_{td}$. The length of a tendon is affected by the position of the spring mechanism, which results from the tendon force, $\boldsymbol{f}_{td}$. Since there are no link-side position or torque sensors, the respective quantities have to be inferred from the motor positions and tendon force measurements. Consequently, the calculated joint values are affected by a number of inaccuracies, including the limited precision of the tendon measurements and the inability to observe friction in the joints of the fingers.

Fig. 2.4 Illustration of the coupling between two joints of the hand and the corresponding four tendons [18]

Based on these quantities, a cascaded joint controller has been developed. The structure of the control architecture is shown in Fig. 2.5. The joint impedance controller enables the compliant positioning of the fingers, specified by the desired joint positions, $\boldsymbol{q}_{des}$. The output of the impedance controller is a set of desired joint torques, $\boldsymbol{\tau}_{des}$, which can be mapped to the corresponding tendon forces, $\boldsymbol{f}_{td,des}$, based on the coupling description. Subsequently, the regulation of the desired tendon forces is realized by the force controller. The antagonistic actuation of a finger with flexible tendons constitutes a non-linear control problem. For the David hand, a backstepping approach was developed, which enables the reliable control of the tendon forces. The design of this method was described in [18]. Finally, the desired motor torques, $\boldsymbol{\tau}_{motor}$, which are generated by the force controller, are commanded to the robot. While the fingers are typically positioned using the joint impedance interface, the existing architecture also allows desired joint torques to be commanded directly, thereby bypassing the positioning controller. Moreover, when grasping an object, the calculated joint torques can be used to regulate the strength of the applied grip.
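To make the data flow of this cascade more concrete, the following sketch outlines the three stages in a strongly simplified form. It is only an illustration under assumptions: the impedance law is reduced to a proportional-derivative term, the coupling between joint torques and tendon forces is represented by a constant coupling matrix, the inner force loop is a plain proportional correction instead of the actual backstepping controller, and all gains, dimensions and numerical values are hypothetical rather than taken from the real David controller.

```python
import numpy as np

def joint_impedance(q, dq, q_des, K, D):
    """Simplified joint impedance law: desired torque from stiffness and damping."""
    return K @ (q_des - q) - D @ dq

def torque_to_tendon_forces(tau_des, P, f_pre):
    """Map desired joint torques to tendon forces via the coupling matrix P
    (tau = P @ f_td); the pretension is projected into the nullspace of P so
    that it keeps the tendons taut without changing the realized joint torque."""
    P_pinv = np.linalg.pinv(P)
    return P_pinv @ tau_des + (np.eye(P.shape[1]) - P_pinv @ P) @ f_pre

def tendon_force_controller(f_td_des, f_td_meas, k_f):
    """Placeholder inner force loop: proportional correction of the tendon forces."""
    return k_f * (f_td_des - f_td_meas)

# Hypothetical two-joint, four-tendon finger
K = np.diag([2.0, 1.5])            # joint stiffness [Nm/rad]
D = np.diag([0.1, 0.1])            # joint damping [Nms/rad]
P = np.array([[0.010, -0.010, 0.008, -0.008],
              [0.000,  0.000, 0.009, -0.009]])   # moment-arm (coupling) matrix [m]
f_pre = 5.0 * np.ones(4)           # tendon pretension [N]

q, dq = np.zeros(2), np.zeros(2)
q_des = np.array([0.4, 0.2])
tau_des = joint_impedance(q, dq, q_des, K, D)
f_td_des = torque_to_tendon_forces(tau_des, P, f_pre)
motor_cmd = tendon_force_controller(f_td_des, np.zeros(4), k_f=0.5)
```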

2.2 Dexterous Manipulation

2.2.1 Overview

Generally, the term dexterous manipulation has a broad definition. In the context of robotics, it is usually understood as the ability of a manipulator to control the pose of a grasped object within the workspace of the hand. Equipping robotic hands with this capability has been a longstanding goal of the research community. Beyond the complexity of the robotic hardware, dexterous manipulation is particularly challenging because of its interdisciplinary nature, demanding the integration of perception, control and planning algorithms (Fig. 2.5). Moreover, describing the kinematics and dynamics of the hand-object system requires the modeling of complex physical behaviors, such as the sliding and rolling of contacts. Therefore, a large body of work has addressed the various aspects of this topic in the past. Summaries of accomplishments and open challenges in the context of dexterous manipulation research have been reported in [7, 19] and [20].

Fig. 2.5 Block diagram of the pre-existing joint controller, which provides a torque interface and allows the compliant positioning of the fingers [15]


The first practical approach for changing the in-hand pose of an object was regrasping [21, 22]. Here, the object is fully released, i.e. placed on a table or passed to another manipulator, before being grasped again in a different configuration. However, the range of object grasps that are reachable through this strategy is limited. The planning problem of generating both finger and object trajectories in order to reach a desired grasp configuration has been addressed in [23, 24] and [25].

Alternatively, the object can be repositioned while being grasped through in-hand manipulation and finger gaiting [26, 27]. In this context, in-hand manipulation refers to the ability to move an object inside of the hand while maintaining a static grasp configuration. Kinematic constraints restrict the range of the reachable object poses. However, finger gaiting allows this limitation to be overcome. It involves the sequential reconfiguration of the fingers of the grasp. A finger gaiting approach, which relies on reusable closed-loop controllers, was proposed in [28]. In [29], the reconfiguration of the grasped object was realized through a sequence of controllers, which consider wrench closure constraints. These and other methods were enabled by previous work that focused on the analysis and modeling of the grasp system, e.g. in [30, 31] and [32]. A formal analysis of grasp stability was presented in [33], from which a grasp and orientation controller for a simulated two-finger setup was derived. In [34], the authors showed that the distribution of the grasp forces, under consideration of friction constraints, can be formulated as a convex optimization problem. The planning of finger gaiting sequences was the focus of the work in [35] and [36].

Most in-hand manipulation approaches try to avoid the slippage of the fingers on the surface of the object because of the difficulty of modeling the underlying physical behavior. However, exploiting the sliding of contacts can provide additional flexibility, as is often utilized by humans. A number of approaches have been proposed that involve controlled slippage [41, 42]. They conceptually demonstrated the application of active sliding, but the integration in an advanced manipulation task has not yet been reported. Similarly, the consideration of the rolling of contacts allows the in-hand behavior of manipulated objects to be described more accurately, at the cost of increased mathematical complexity [43]. Models that allow the rolling of contacts to be predicted have been introduced in [44, 45] and [46]. A robotic hand that was specifically designed to take advantage of rolling was presented in [47]. In [48], a framework was proposed that integrates rolling and finger gaiting. However, the validation was limited to simulated scenarios. Finally, a planning method that combines rolling and sliding was introduced in [49].

Beyond these broad categories, there exists a range of subproblems and specific considerations in the context of dexterous manipulation that have been addressed by the research community. These include the use of underactuated and elastic hands [11, 50] or the manipulation under environmental constraints [51]. Moreover, advanced in-hand manipulation strategies such as pivoting [52] and tracking [53], or the use of palm grasps [54], have been investigated. Recently, the effect of postural synergies on the controllability of the grasp forces has been studied [55, 56].
In [57], the utilization of adaptive synergies in a range of dexterous manipulation tasks was demonstrated.


In the last decade, a number of machine learning approaches have been applied to dexterous manipulation problems as well [58, 59]. These range from supervised methods focused on object perception [60, 61], to reinforcement learning approaches used for the in-hand control of grasped objects [37, 62]. A brief overview of learning-based methods in the context of dexterous manipulation is provided in Sect. 2.2.4.

The review of these publications illustrates the progress that has been made in the past 40 years in advancing dexterous manipulation with robotic hands. However, as has been recently pointed out in [12], various shortcomings of these methods have so far limited the number of real-world applications. Typically, the proposed algorithms rely on restrictive assumptions, which are not representative of practically relevant problems. Consequently, many approaches could only be demonstrated under highly controlled conditions, for specific objects or in simulated environments. Two of the most significant limitations are addressed by the methods that are proposed in this work, namely the estimation of the grasp state and the compliant in-hand control of the manipulated object. In the following, previous work on these specific topics is discussed.

2.2.2 Grasp State Estimation

Many of the dexterous manipulation algorithms that have previously been presented rely on the availability of the in-hand pose of the object and the location of the contacts with the fingers. While this information may be easily obtainable in dedicated setups or simulations, in the general case, determining a high-quality estimate of the grasp state requires the integration of various perceptual inputs.

In the past, perception in the context of dexterous manipulation has typically been approached by considering different sensor modalities individually. On the one hand, visual data is used to localize the object before grasping, such as in the work in [63], which relies on monocular camera images and knowledge about the geometry of the object. In [64], the object is subsequently tracked during the acquisition of the grasp. However, vision-only approaches are severely limited in their ability to localize a grasped object, since large parts of it may be occluded by the hand. On the other hand, the research area of tactile sensing is concerned with using measurements from sensors, which are embedded in the fingers, to infer information about the grasp. Reviews of the current state of the art in tactile sensing were presented in [65] and [66]. Primarily, this modality is used to provide information about the contact forces between the object and the fingers. The grasping controller that was introduced in [67] takes advantage of these measurements in order to improve the robustness of grasps of unknown objects. However, tactile sensing has also been applied to the object localization problem. In [68], it was used to determine the position of stationary objects with a force-controlled robot, based on a Monte Carlo filter. This approach was extended in [69], enabling it to estimate the full 6 DoF pose of the object.


The utilization of contact information for the purpose of localizing a manipulated object was proposed in [38, 70]. It involved the generation of a number of hypotheses of possible contact configurations, which were then matched with a pre-computed database to determine the most likely object pose (see Fig. 2.6b). The simultaneous estimation of the pose and shape of a manipulated object was presented in [71]. Here, the tactile sensing information was integrated in a particle filter. Foundational work for a learning approach that predicts the interaction of pushed objects was reported in [72], but has not been extended to complex manipulation scenarios. More recent methods have started to also include the dynamics of the object in the state estimation [73, 74]. In [75], parameters of the dynamics model were tracked simultaneously with the object pose, and in [76], different contact models were compared in their ability to predict the motion of a manipulated object. However, both of these investigations have been limited to simple scenarios.

The integration of visual information and tactile sensing into a common perception framework has only received significant attention in the recent past. In [77, 78], the authors presented methods that fuse visual, tactile and joint measurements to estimate the object pose. A particle filtering approach was proposed in [79], in which tactile sensing is used to bridge phases of visual occlusion, when the camera-based tracking is impaired. In [80], measurements from the GelSight contact sensor are combined with RGB-D images in a point-cloud-based algorithm that tracks the object. Depth-only visual input is integrated with kinematic constraints in [39] in order to localize the manipulator and the object, as shown in Fig. 2.6c. Finally, the method that was described in [81] relied on stereo vision, as well as force-torque and joint position measurements, for the pose estimation.

The state estimation approach proposed in this work differs from these methods in that it only relies on joint position measurements as the minimal set of sensor inputs. Depending on the manipulation scenario, the estimation from this data may already be sufficient to solve a task that otherwise would fail. If available, tactile sensing information (from joint torque measurements or tactile sensors) is used to further constrain the estimated grasp state, and several types of visual information are leveraged in order to improve the estimated object pose, even if significant parts of the object are occluded.

Probabilistic Filtering

Since the pose of a manipulated object cannot be measured directly, it has to be inferred from sensor measurements, such as camera images or tactile information. However, all measurements are subject to inaccuracies. In order to explicitly account for the resulting uncertainties, probabilistic methods have become a popular means of state estimation in robotics. An in-depth description of the most important techniques was presented in [82].

Fig. 2.6 Illustrations of a small sample of past work on a range of dexterous manipulation topics, including perception, control and learning. a Reinforcement learning was used to train this robotic hand to reorient a cube [37]. b Contact measurements allowed the pose of the grasped object to be estimated [38]. c Depth measurements and physical constraints were fused to track both the object and the hand [39]. d An object-level impedance controller realized the compliant positioning of the grasped disc [40]

The Kalman filter, which was introduced in [83], has been the most widely applied probabilistic method. In robotics, variants of the Kalman filter have been used for the localization of mobile robots from landmark observations [84, 85]. The development of the extended Kalman filter further expanded the applicability of the method to nonlinear tasks, such as the simultaneous localization and mapping (SLAM) problem [86]. Another popular probabilistic algorithm is the particle filter. Among other things, it has been used to estimate the movement of people [87, 88] and objects [89]. In the context of mobile robotics, a particle filter enabled the localization in extended outdoor environments [90].
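As a brief reminder of the mechanics behind these filters, the snippet below shows a generic linear Kalman filter predict/update step. It is a textbook illustration only, not the estimator developed later in this work; the matrices A, C, Q, R and the one-dimensional constant-velocity example are arbitrary placeholders.

```python
import numpy as np

def kalman_predict(x, P, A, Q):
    """Propagate mean and covariance through the linear motion model x' = A x."""
    return A @ x, A @ P @ A.T + Q

def kalman_update(x, P, z, C, R):
    """Correct the prediction with a measurement z = C x + noise."""
    S = C @ P @ C.T + R                  # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ (z - C @ x)
    P_new = (np.eye(len(x)) - K @ C) @ P
    return x_new, P_new

# Placeholder 1D constant-velocity example: state = (position, velocity)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])               # only the position is measured
Q, R = 1e-4 * np.eye(2), 1e-2 * np.eye(1)
x, P = np.zeros(2), np.eye(2)
x, P = kalman_predict(x, P, A, Q)
x, P = kalman_update(x, P, z=np.array([0.05]), C=C, R=R)
```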


2.2.3 Impedance-Based Object Control

The second topic that is addressed in this work is the compliant in-hand control of a grasped object. As was discussed in the beginning of this section, this capability is fundamental in order to reposition an object inside of the hand while maintaining a stable grasp. Moreover, it is a prerequisite of many of the advanced manipulation strategies, such as finger gaiting. So far, there has been a lack of satisfying solutions to the in-hand control problem. Consequently, work on the modeling and planning aspects of these advanced techniques could only be demonstrated in simulation or simplified scenarios.

While the success of in-hand control methods has been limited, the control of objects with multiple manipulators, which can be considered a generalization of this problem, has seen considerably more progress in the past decades [91, 92]. In [93], object impedance control was introduced as a means of regulating the pose of an object, as well as the internal forces, using cooperative manipulators. Follow-up work in [94] proposed the virtual linkage model as an approach to obtain the joint forces and torques that allow an object to be manipulated with multiple robotic arms. Subsequently, Lyapunov stability of such an internal-force-based impedance controller was proven in [95]. In [96], the impedance control of the object was extended to the contact points, thereby guaranteeing stability of each individual manipulator, even in the case of contact loss. Utilizing visual and tactile feedback, [97] outlined a feedback controller for the manipulation of unknown objects. In [98], a dual-arm impedance controller was demonstrated in a valve turning task. Finally, based on projected inverse dynamics, a Cartesian impedance controller for the positioning of an object with multiple robotic arms was proposed in [99].

Applied to the in-hand control problem, an impedance-based approach was presented in [40, 100]. Here, a virtual object frame was utilized in order to determine approximate contact forces, which maintain the grasp while positioning the object. However, the method was only demonstrated for the fixed grasp of a disc, as shown in Fig. 2.6d, under the assumption that the contact points between the object and the fingers do not move. The controller that is proposed in this work overcomes this limitation by integrating with the grasp state estimation, which estimates the slipping and rolling of the contact points. Moreover, the presented method explicitly considers friction constraints to improve grasp stability, and supports the reconfiguration of the grasp.

Impedance Control

In the past decades, impedance control has become the preferred approach to realize the compliant behavior of a manipulator when interacting with the environment. Early work on this topic was presented in [91]. More recently, related methods have become popular for the control of manipulators with flexible joints [101]. These advances allowed the applicability of this method to be extended to novel scenarios such as the whole-body control of wheeled humanoids [102] or the balancing of a bipedal robot [103], the latter of which is fundamentally similar to the grasping of an object.

2.2.4 Learning-Based Methods

This work is focused on the development of model-based approaches for the dexterous manipulation of objects with robotic hands. However, in recent years, model-free and learning-based methods have gained in popularity as a means to address the challenges of this problem using data. Research on this topic has been concerned with both the perception and control aspects of dexterous manipulation.

In [60], deep learning was used to train a neural network called DeepIM to visually estimate the 6D pose of known objects. By matching the input image to a rendered view of the object, the pose is iteratively refined. The BundleTrack method, which was proposed in [61], extends this work by also being able to track novel objects without the need for 3D models. A deep neural network realizes the segmentation and feature extraction, and is applicable both under significant occlusions and object motions. In order to reduce the need for large hand-labeled data sets, [104] introduced a self-supervised learning architecture to train object segmentation and 6D object pose estimation algorithms. Using a robot manipulator, real-world images are autonomously labeled with accurate 6D object poses.

Beyond the use of visual data, a number of learning-based approaches for the utilization of tactile information have been proposed. In [105], the authors presented a data-driven system to track the in-hand pose of manipulated objects from contact feedback. Relying on a physics simulation and sample-based optimization, the state and parameters of the simulation are jointly determined. Both haptic and visual feedback in the context of contact-rich manipulation are considered in [106]. Using self-supervision, the proposed method learns a compact, multimodal representation of both sensory inputs.

Besides perception, learning methods have also been applied in the context of in-hand object control, such as the work in [58]. Here, the desired manipulation stiffness of an object impedance controller was inferred from data obtained by human demonstration. One of the earliest approaches to apply reinforcement learning to the problem of in-hand manipulation was [59]. Without requiring analytic dynamics or kinematics models, the system acquired tactile manipulation skills purely from data. In [107], optimal control was combined with learned linear models, which were trained using model-based reinforcement learning, thereby increasing the sample efficiency compared to purely model-free methods. Similarly, learned dynamics models together with on-line model-predictive control were proposed in [62]. The system required only four hours of real-world data in order to learn to coordinate multiple free-floating objects. In contrast, [108] relied on a combination of simulation and human demonstration in order to learn complex dexterous manipulation skills. However, the resulting control policy was only demonstrated in simulated experiments.


The integration of a learned visual state estimator and control policy was applied to an object reorientation task in [37] (see Fig. 2.6a). Using automatic domain randomization, this system was entirely trained in simulation with reinforcement learning and subsequently transferred to the real robot. Follow-up work, presented in [109], demonstrated solving a Rubik's cube using a similar approach.

References 1. Salisbury, J.K., and B. Roth. 1983. Kinematic and force analysis of articulated mechanical hands. Journal of Mechanisms, Transmissions, and Automation in Design 105: 35. 2. Jörg Butterfaß, Markus Grebenstein, Hong Liu, and Gerd Hirzinger. DLR-Hand II: Next generation of a dextrous robot hand. In 2001 IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 109–114. IEEE, 2001. 3. Shadow Robot Co. Shadow Dexterous Hand. https://www.shadowrobot.com/products/ dexterous-hand/, 2003. 4. Manuel G. Catalano, Giorgio Grioli, Edoardo Farnioli, Alessandro Serio, Cristina Piazza, and Antonio Bicchi. Adaptive synergies for the design and control of the Pisa/IIT SoftHand. The International Journal of Robotics Research, 33 (5):768–782, 2014. 5. Steve C Jacobsen, John E Wood, DF Knutti, and Klaus B Biggers. The UTAH/MIT dextrous hand: Work in progress. The International Journal of Robotics Research, 3(4):21–50, 1984. 6. CS Lovchik and Myron A Diftler. The Robonaut Hand: A dexterous robot hand for space. In 1999 IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 907–912. IEEE, 1999. 7. Bicchi, Antonio. 2000. Hands for dexterous manipulation and robust grasping: A difficult road toward simplicity. IEEE Transactions on Robotics and Automation 16 (6): 652–662. 8. Myron A. Diftler, J.S. Mehling, Muhammad E. Abdallah, Nicolaus A. Radford, Lyndon B. Bridgwater, Adam M. Sanders, Roger Scott Askew, D. Marty Linn, John D. Yamokoski, F.A. Permenter, et al. Robonaut 2: The first humanoid robot in space. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 2178–2183. IEEE, 2011. 9. Eduardo Torres-Jara. Obrero: A platform for sensitive manipulation. In 2005 IEEE-RAS 5th International Conference on Humanoid Robots (Humanoids), pages 327–332. IEEE, 2005. 10. Matsuoka, Yoky, Pedram Afshar, and Oh. Michael. 2006. On the design of robotic hands for brain-machine interface. Neurosurgical Focus 20 (5): 1–9. 11. Deimel, Raphael, and Oliver Brock. 2016. A novel type of compliant and underactuated robotic hand for dexterous grasping. The International Journal of Robotics Research 35 (1– 3): 161–185. 12. Piazza, C., G. Grioli, M.G. Catalano, and A. Bicchi. 2019. A century of robotic hands. Annual Review of Control, Robotics, and Autonomous Systems 2: 1–32. 13. DLR. DLR—Institute of Robotics and Mechatronics—David. https://www.dlr.de/rm/en/ desktopdefault.aspx/tabid-11666/, 2020. 14. Markus Grebenstein, Alin Albu-Schäffer, Thomas Bahls, Maxime Chalon, Oliver Eiberger, Werner Friedl, Robin Gruber, Sami Haddadin, Ulrich Hagn, Robert Haslinger, et al. The DLR hand arm system. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 3175–3182. IEEE, 2011. 15. Grebenstein, Markus, Maxime Chalon, Werner Friedl, Sami Haddadin, Thomas Wimböck, Gerd Hirzinger, and Roland Siegwart. 2012. The hand of the DLR hand arm system: Designed for interaction. The International Journal of Robotics Research 31 (13): 1531–1555. 16. Jens Reinecke, Bastian Deutschmann, and David Fehrenbach. A structurally flexible humanoid spine based on a tendon-driven elastic continuum. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4714–4721. IEEE, 2016.


17. Intel. Intel RealSense Depth Camera D435. https://www.intelrealsense.com/depth-camerad435/, 2018. 18. Maxime Chalon and Brigitte d’Andrèa Novel. Backstepping experimentally applied to an antagonistically driven finger with flexible tendons. IFAC Proceedings Volumes, 47 (3):217 – 223, 2014. 19. Allison M Okamura, Niels Smaby, and Mark R Cutkosky. An overview of dexterous manipulation. In 2000 IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 255–262. IEEE, 2000. 20. Raymond R Ma and Aaron M Dollar. On dexterity and dexterous manipulation. In 2011 IEEE 15th International Conference on Advanced Robotics (ICAR), pages 1–7. IEEE, 2011. 21. Nikhil Chavan Dafle, Alberto Rodriguez, Robert Paolini, Bowei Tang, Siddhartha S Srinivasa, Michael Erdmann, Matthew T Mason, Ivan Lundberg, Harald Staab, and Thomas Fuhlbrigge. Extrinsic dexterity: In-hand manipulation with external forces. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1578–1585. IEEE, 2014. 22. Pierre Tournassoud, Tomás Lozano-Pérez, and Emmanuel Mazer. Regrasping. In 1987 IEEE International Conference on Robotics and Automation (ICRA), volume 4, pages 1924–1928. IEEE, 1987. 23. Maximo A. Roa and Raul Suarez. Regrasp planning in the grasp space using independent regions. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1823–1829. IEEE, 2009. 24. Sascha A. Stoeter, Stephan Voss, Nikolaos P. Papanikolopoulos, and Heiko Mosemann. Planning of regrasp operations. In 1999 IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 245–250. IEEE, 1999. 25. Katharina Hertkorn, Maximo A Roa, and Christoph Borst. Planning in-hand object manipulation with multifingered hands considering task constraints. In 2013 IEEE International Conference on Robotics and Automation (ICRA), pages 617–624. IEEE, 2013. 26. RONALDS Fearing. Simplified grasping and manipulation with dextrous robot hands. IEEE Journal on Robotics and Automation, 2 (4):188–195, 1986. 27. Okada, Tokuji. 1979. Object-handling system for manual industry. IEEE Transactions on Systems, Man, and Cybernetics 9 (2): 79–89. 28. Manfred Huber and Roderic A Grupen. Robust finger gaits from closed-loop controllers. In 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1578–1584. IEEE, 2002. 29. R Platt, Andrew H Fagg, and Roderic A Grupen. Manipulation gaits: Sequences of grasp control tasks. In 2004 IEEE International Conference on Robotics and Automation (ICRA), pages 801–806. IEEE, 2004. 30. Li Han, Jeffrey C Trinkle, and Zexiang Li. Grasp analysis as linear matrix inequality problems. In 1999 IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1261–1268. IEEE, 1999. 31. Jeffrey C. Trinkle. On the stability and instantaneous velocity of grasped frictionless objects. IEEE Transactions on Robotics and Automation, 8 (5):560–572, 1992. 32. Domenico Prattichizzo and Jeffrey C Trinkle. Grasping. In Springer Handbook of Robotics, chapter 28, pages 671–700. Springer, 2008. 33. Suguru Arimoto, Kenji Tahara, J-H. Bae, and Morio Yoshida. A stability theory of a manifold: Concurrent realization of grasp and orientation control of an object by a pair of robot fingers. Robotica, 21 (2):163–178, 2003. 34. Martin Buss, Hideki Hashimoto, and John B. Moore. Dextrous hand grasping force optimization. IEEE Transactions on Robotics and Automation, 12 (3):406–418, 1996. 35. Moëz Cherif and Kamal K. Gupta. 
Planning quasi-static fingertip manipulations for reconfiguring objects. IEEE Transactions on Robotics and Automation, 15 (5):837–848, 1999. 36. Jean-Philippe Saut, Anis Sahbani, Sahar El-Khoury, and Véronique Perdereau. Dexterous manipulation planning using probabilistic roadmaps in continuous grasp subspaces. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2907– 2912. IEEE, 2007.


37. Andrychowicz, Marcin, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. 2020. Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39 (1): 3–20. 38. Steffen Haidacher and Gerd Hirzinger. Estimating finger contact location and object pose from contact measurements in 3D grasping. In 2003 IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 1805–1810. IEEE, 2003. 39. Tanner Schmidt, Katharina Hertkorn, Richard Newcombe, Zoltan Marton, Michael Suppa, and Dieter Fox. Depth-based tracking with physical constraints for robot manipulation. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 119–126. IEEE, 2015. 40. Wimböck, Thomas, Christian Ott, Alin Albu-Schäffer, and Gerd Hirzinger. 2012. Comparison of object-level grasp controllers for dynamic dexterous manipulation. The International Journal of Robotics Research 31 (1): 3–23. 41. David L. Brock. Enhancing the dexterity of a robot hand using controlled slip. In 1988 IEEE International Conference on Robotics and Automation (ICRA), pages 249–251. IEEE, 1988. 42. Jian Shi, J. Zachary Woodruff, Paul B. Umbanhowar, and Kevin M. Lynch. Dynamic in-hand sliding manipulation. IEEE Transactions on Robotics, 33 (4):778–795, 2017. 43. Zoe Doulgeri and Leonidas Droukas. On rolling contact motion by robotic fingers via prescribed performance control. In 2013 IEEE International Conference on Robotics and Automation (ICRA), pages 3976–3981. IEEE, 2013. 44. Chunsheng Cai and Bernard Roth. On the spatial motion of a rigid body with point contact. In 1987 IEEE International Conference on Robotics and Automation (ICRA), volume 4, pages 686–695. IEEE, 1987. 45. David J Montana. The kinematics of contact and grasp. The International Journal of Robotics Research, 7(3):17–32, 1988. 46. Li Han, Yi-Sheng Guan, Z.X. Li, Q. Shi, and Jeffrey C. Trinkle. Dextrous manipulation with rolling contacts. In 1997 IEEE International Conference on Robotics and Automation (ICRA), volume 2, pages 992–997. IEEE, 1997. 47. Antonio Bicchi and Raffaele Sorrentino. Dexterous manipulation through rolling. In 1995 IEEE International Conference on Robotics and Automation (ICRA), volume 1, pages 452– 457. IEEE, 1995. 48. Li Han and Jeffrey C Trinkle. Dextrous manipulation by rolling and finger gaiting. In 1998 IEEE International Conference on Robotics and Automation (ICRA), pages 730–735. IEEE, 1998. 49. Moëz Cherif and Kamal K. Gupta. Global planning for dexterous reorientation of rigid objects: Finger tracking with rolling and sliding. The International Journal of Robotics Research, 20 (1):57–84, 2001. 50. Lael U Odhner and Aaron M Dollar. Dexterous manipulation with underactuated elastic hands. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 5254–5260. IEEE, 2011. 51. Parra-Vega, Vicente, Alejandro Rodríguez-Angeles, Suguru Arimoto, and Gerd Hirzinger. 2001. High precision constrained grasping with cooperative adaptive handcontrol. Journal of Intelligent & Robotic Systems 32 (3): 235–254. 52. Yasumichi Aiyama, Masayuki Inaba, and Hirochika Inoue. Pivoting: A new method of graspless manipulation of object by robot fingers. In 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 1, pages 136–143. IEEE, 1993. 53. Daniela Rus. Dexterous rotations of polyhedra. 
In 1992 IEEE International Conference on Robotics and Automation (ICRA), pages 2758–2759. IEEE, 1992. 54. Yunfei Bai and C. Karen Liu. Dexterous manipulation using both palm and fingers. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1560–1565. IEEE, 2014. 55. Gabiccini, Marco, Antonio Bicchi, Domenico Prattichizzo, and Monica Malvezzi. 2011. On the role of hand synergies in the optimal choice of grasping forces. Autonomous Robots 31 (2–3): 235.


56. Prattichizzo, Domenico, Monica Malvezzi, Marco Gabiccini, and Antonio Bicchi. 2013. On motion and force controllability of precision grasps with hands actuated by soft synergies. IEEE Transactions on Robotics 29 (6): 1440–1456. 57. Cosimo Della Santina, Cristina Piazza, Giorgio Grioli, Manuel G Catalano, and Antonio Bicchi. Toward dexterous manipulation with augmented adaptive synergies: The Pisa/IIt SoftHand 2. IEEE Transactions on Robotics, 34 (5):1141–1156, 2018. 58. Miao Li, Hang Yin, Kenji Tahara, and Aude Billard. Learning object-level impedance control for robust grasping and dexterous manipulation. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 6784–6791. IEEE, 2014. 59. Herke Van Hoof, Tucker Hermans, Gerhard Neumann, and Jan Peters. Learning robot inhand manipulation with tactile features. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), pages 121–127. IEEE, 2015. 60. Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, and Dieter Fox. Deepim: Deep iterative matching for 6d pose estimation. In 2018 European Conference on Computer Vision (ECCV), pages 683–698, 2018. 61. Bowen Wen and Kostas Bekris. Bundletrack: 6D pose tracking for novel objects without instance or category-level 3D models. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8067–8074. IEEE, 2021. 62. Anusha Nagabandi, Kurt Konolige, Sergey Levine, and Vikash Kumar. Deep dynamics models for learning dexterous manipulation. In 2020 Conference on Robot Learning (CoRL), pages 1101–1112, 2020. 63. Markus Ulrich, Christian Wiedemann, and Carsten Steger. CAD-based recognition of 3D objects in monocular images. In 2009 IEEE International Conference on Robotics and Automation (ICRA), pages 1191–1198. IEEE, 2009. 64. Peter K. Allen, Aleksandar Timcenko, Billibon Yoshimi, and Paul Michelman. Automated tracking and grasping of a moving object with a robotic hand-eye system. IEEE Transactions on Robotics and Automation, 9 (2):152–165, 1993. 65. Yousef, Hanna, Mehdi Boukallel, and Kaspar Althoefer. 2011. Tactile sensing for dexterous in-hand manipulation in robotics: A review. Sensors and Actuators A: Physical 167 (2): 171– 187. 66. Kappassov, Zhanat, Juan-Antonio. Corrales, and Véronique. Perdereau. 2015. Tactile sensing in dexterous robot hands. Robotics and Autonomous Systems 74: 195–220. 67. Taro Takahashi, Toshimitsu Tsuboi, Takeo Kishida, Yasunori Kawanami, Satoru Shimizu, Masatsugu Iribe, Tetsuharu Fukushima, and Masahiro Fujita. Adaptive grasping by multi fingered hand with tactile sensor based on robust force and position control. In 2008 IEEE International Conference on Robotics and Automation (ICRA), pages 264–271. IEEE, 2008. 68. Klaas Gadeyne and Herman Bruyninckx. Markov techniques for object localization with force-controlled robots. In 2001 IEEE 10th International Conference on Advanced Robotics (ICAR). IEEE, 2001. 69. Anna Petrovskaya, Oussama Khatib, Sebastian Thrun, and Andrew Y Ng. Bayesian estimation for autonomous object manipulation based on tactile sensors. In 2006 IEEE International Conference on Robotics and Automation (ICRA), pages 707–714. IEEE, 2006. 70. Steffen Haidacher. Contact point and object position from force/torque and position sensors for grasps with a dextrous robotic hand. PhD thesis, Technische Universität München, 2004. 71. Craig Corcoran and Robert Platt. A measurement model for tracking hand-object state during dexterous manipulation. 
In 2010 IEEE International Conference on Robotics and Automation (ICRA), pages 4302–4308. IEEE, 2010. 72. M. Kopicki, R. Stolkin, S. Zurek, and T. Morwald. Learning to predict how rigid objects behave under simple manipulation. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 5722 – 5729. IEEE, 2011. 73. Shuai Li, Siwei Lyu, and Jeff Trinkle. State estimation for dynamic systems with intermittent contact. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 3709–3715. IEEE, 2015.


74. Posa, Michael, Cecilia Cantu, and Russ Tedrake. 2014. A direct method for trajectory optimization of rigid bodies through contact. The International Journal of Robotics Research 33 (1): 69–81. 75. Li Zhang, Siwei Lyu, and Jeff Trinkle. A dynamic Bayesian approach to real-time estimation and filtering in grasp acquisition. In 2013 IEEE International Conference on Robotics and Automation (ICRA), pages 85–92. IEEE, 2013. 76. Roman Kolbert, Nikhil Chavan-Dafle, and Alberto Rodriguez. Experimental validation of contact dynamics for in-hand manipulation. In 2016 International Symposium on Experimental Robotics (ISER), pages 633–645. Springer, 2016. 77. Bimbo, Joao, Petar Kormushev, Kaspar Althoefer, and Hongbin Liu. 2015. Global estimation of an object’s pose using tactile sensing. Advanced Robotics 29 (5): 363–374. 78. Joao Bimbo, Lakmal D. Seneviratne, Kaspar Althoefer, and Hongbin Liu. Combining touch and vision for the estimation of an object’s pose during manipulation. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4021–4026. IEEE, 2013. 79. Li Zhang and Jeffrey C. Trinkle. The application of particle filtering to grasping acquisition with visual occlusion and tactile sensing. In 2012 IEEE International Conference on Robotics and Automation (ICRA), pages 3805–3812. IEEE, 2012. 80. Gregory Izatt, Geronimo Mirano, Edward Adelson, and Russ Tedrake. Tracking objects with point clouds from vision and touch. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 4000–4007. IEEE, 2017. 81. Paul Hebert, Nicolas Hudson, Jeremy Ma, and Joel Burdick. Fusion of stereo vision, forcetorque, and joint sensors for estimation of in-hand object location. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 5935–5941. IEEE, 2011. 82. Thrun, S., W. Burgard, and D. Fox. 2005. Probabilistic Robotics. Intelligent Robotics and Autonomous Agents series: MIT Press. 83. Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82 (1): 35–45. 84. Jensfelt, P., and S. Kristensen. 2001. Active global localization for a mobile robot using multiple hypothesis tracking. IEEE Transactions on Robotics and Automation 17 (5): 748– 760. 85. Leonard, J.J., and H.F. Durrant-Whyte. 1991. Mobile robot localization by tracking geometric beacons. IEEE Transactions on Robotics and Automation 7 (3): 376–382. 86. G. Zunino and H.I. Christensen. Simultaneous localization and mapping in domestic environments. In 2001 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pages 67–72. IEEE, 2001. 87. J. Deutscher, A. Blake, and I. Reid. Articulated body motion capture by annealed particle filtering. In 2000 IEEE International Conference on Computer Vision and Pattern Recognition, pages 126–133, 2000. 88. L. Liao, D. Fox, J. Hightower, H. Kautz, and D. Schulz. Voronoi tracking: Location estimation using sparse and noisy sensor data. In 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 723–728, 2003. 89. Hue, C., J.P. Le Cadre, and P. Pérez. 2002. Sequential Monte Carlo methods for multiple target tracking and data fusion. IEEE Transactions on Signal Processing 50 (2): 309–325. 90. M. Montemerlo and S. Thrun. Simultaneous localization and mapping with unknown data association using FastSLAM. In 2003 IEEE International Conference on Robotics and Automation (ICRA), pages 1985–1991, 2003. 91. N Hogan. 
Impedance control: An approach to manipulation. I-Theory. II-Implementation. III-Applications. Journal of Dynamic Systems, Measurement, and Control, 107:1–24, 1985. 92. Christian Smith, Yiannis Karayiannidis, Lazaros Nalpantidis, Xavi Gratal, Peng Qi, Dimos V. Dimarogonas, and Danica Kragic. Dual arm manipulation: A survey. Robotics and Autonomous Systems, 60 (10):1340–1353, 2012. 93. Stanley A. Schneider and Robert H. Cannon. Object impedance control for cooperative manipulation: Theory and experimental results. IEEE Transactions on Robotics and Automation, 8 (3):383–394, 1992.


94. David Williams and Oussama Khatib. The virtual linkage: A model for internal forces in multigrasp manipulation. In 1993 IEEE International Conference on Robotics and Automation (ICRA), pages 1025–1030. IEEE, 1993. 95. R.C. Bonitz, and Tien C. Hsia. Internal force-based impedance control for cooperating manipulators. IEEE Transactions on Robotics and Automation, 12 (1):78–89, 1996. 96. Stefano Stramigioli and Vincent Duindam. Variable spatial springs for robot control applications. In 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1906–1911. IEEE, 2001. 97. Qiang Li, Christof Elbrechter, Robert Haschke, and Helge Ritter. Integrating vision, haptics and proprioception into a feedback controller for in-hand manipulation of unknown objects. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2466–2471. IEEE, 2013. 98. Arash Ajoudani, Jinoh Lee, Alessio Rocchi, Mirko Ferrati, Enrico Mingo Hoffman, Alessandro Settimi, Darwin G Caldwell, Antonio Bicchi, and Nikos G Tsagarakis. A manipulation framework for compliant humanoid coman: Application to a valve turning task. In 2014 IEEERAS 14th International Conference on Humanoid Robots (Humanoids), pages 664–670. IEEE, 2014. 99. Hsiu-Chin Lin, Joshua Smith, Keyhan Kouhkiloui Babarahmati, Niels Dehio, and Michael Mistry. A projected inverse dynamics approach for multi-arm Cartesian impedance control. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–5. IEEE, 2018. 100. Thomas Wimboeck, Christian Ott, and Gerhard Hirzinger. Passivity-based object-level impedance control for a multifingered hand. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4621–4627. IEEE, 2006. 101. Albu-Schäffer, Alin, Christian Ott, and Gerd Hirzinger. 2007. A unified passivity-based control framework for position, torque and impedance control of flexible joint robots. The International Journal of Robotics Research 26 (1): 23–39. 102. Alexander Dietrich, Thomas Wimböck, and Alin Albu-Schäffer. Dynamic whole-body mobile manipulation with a torque controlled humanoid robot via impedance control laws. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3199– 3206. IEEE, 2011. 103. Bernd Henze, Máximo A Roa, and Christian Ott. Passivity-based whole-body balancing for torque-controlled humanoid robots in multi-contact scenarios. The International Journal of Robotics Research, 35 (12):1522–1543, 2016. 104. Xinke Deng, Yu Xiang, Arsalan Mousavian, Clemens Eppner, Timothy Bretl, and Dieter Fox. Self-supervised 6d object pose estimation for robot manipulation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 3665–3671. IEEE, 2020. 105. Jacky Liang, Ankur Handa, Karl Van Wyk, Viktor Makoviychuk, Oliver Kroemer, and Dieter Fox. In-hand object pose tracking via contact feedback and gpu-accelerated robotic simulation. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 6203– 6209. IEEE, 2020. 106. Michelle A Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, and Jeannette Bohg. Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks. In 2019 IEEE International Conference on Robotics and Automation (ICRA), pages 8943–8950. IEEE, 2019. 107. Vikash Kumar, Emanuel Todorov, and Sergey Levine. 
Optimal control with learned local models: Application to dexterous manipulation. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 378–383. IEEE, 2016. 108. Rajeswaran, Aravind, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine. 2018. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Robotics: Science and Systems (RSS). 109. Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, et al. Solving rubik’s cube with a robot hand. arXiv:1910.07113, 2019.

Chapter 3

Grasp Modeling

This work is concerned with the development of methods for the localization and control of manipulated objects. However, the discussion of the proposed algorithms first requires a common model of the hand-object system, on which both components can be built. This chapter presents the utilized grasp model. The first section introduces the most essential definitions and quantities in the context of grasping, and specifies some of the notations that are used in the remainder of this work. Subsequently, the kinematic model of the grasp system is elaborated, which relates the velocities of the finger joints to the motion of the manipulated object. The dynamics model, which is outlined in the following section, equivalently describes the relation between loads on the object and the corresponding joint torques. The formulation of the grasp subspaces illustrates how the nullspaces of these mappings can be exploited. Finally, the applicability and limitations of the presented model for different types of grasps are discussed.

3.1 Definitions

The purpose of the grasp model is to provide a mathematical framework that describes the physical behavior of a manipulated object. In the context of this work, it shall support the development of the in-hand localization and control methods. In order to be applicable to both of these problems, the grasp model has to cover the kinematics as well as the dynamics of the hand-object system.

A typical grasping scenario is depicted in Fig. 3.1. It shows an object, in this case a brush, which is held by the robotic hand of the DLR robot David. Mathematically modeling such a scenario involves the definition of a number of quantities, as illustrated in Fig. 3.2.

Fig. 3.1 The David hand, holding an object with its five fingers in a fingertip grasp

Fig. 3.2 The grasp modeling is concerned with describing the behavior of the fingers and the object, and how they relate to each other through the contact points. Here, the main quantities and relevant coordinate frames are illustrated

Principally, the grasp system consists of three main components. First, there is the object, which is assumed to be a rigid body. Second, there are the fingers of the robotic hand, each of which consists of an actuated kinematic chain of rigid links. And third, there are the contact points, which connect the first two components. Here, the contacts are modeled as unique, infinitesimally small points on the surface of the object and one of the finger links.


The pose of an object-fixed coordinate frame {O} w.r.t. a global inertial frame {I} shall be described by the vector $\boldsymbol{x} \in \mathbb{R}^6$. It consists of the three positional DoF of the object, as well as three Euler angles, which specify its orientation:

$$\boldsymbol{x} = \begin{pmatrix} x \\ y \\ z \\ \phi \\ \theta \\ \psi \end{pmatrix} \quad (3.1)$$

The Euler angles describe the magnitude of three elemental rotations, which transform the orientation of {I} to that of {O} in the x-y'-z'' sequence. That means the frame is first rotated around the initial x-axis, subsequently around the resulting y'-axis, and finally around the object-fixed z''-axis. Figure 3.3 illustrates the description of the orientation. Equivalently, the pose of the object can be expressed as a transformation matrix, $\boldsymbol{T}_o \in \mathbb{R}^{4\times4}$:

Fig. 3.3 The orientation of the object is defined by the Euler angles φ, θ and ψ, which describe three elemental rotations that transform the initial frame {I} (red) to the body frame {O} (blue) in the x-y'-z'' sequence


$$
\boldsymbol{T}_o(\boldsymbol{x}) = \begin{pmatrix}
c_\theta c_\psi & c_\theta s_\psi & -s_\theta & x \\
s_\phi s_\theta c_\psi - c_\phi s_\psi & s_\phi s_\theta s_\psi + c_\phi c_\psi & s_\phi c_\theta & y \\
c_\phi s_\theta c_\psi + s_\phi s_\psi & c_\phi s_\theta s_\psi - s_\phi c_\psi & c_\phi c_\theta & z \\
0 & 0 & 0 & 1
\end{pmatrix} \quad (3.2)
$$

with:

$$s_\phi = \sin(\phi), \quad c_\phi = \cos(\phi) \quad (3.3)$$

$$s_\theta = \sin(\theta), \quad c_\theta = \cos(\theta) \quad (3.4)$$

$$s_\psi = \sin(\psi), \quad c_\psi = \cos(\psi) \quad (3.5)$$

Using $\boldsymbol{T}_o$, a position $\boldsymbol{p} \in \mathbb{R}^3$, which is described w.r.t. {O}, can be transformed into {I}:

$$\begin{pmatrix} \boldsymbol{p} \\ 1 \end{pmatrix} = \boldsymbol{T}_o \begin{pmatrix} {}^o\boldsymbol{p} \\ 1 \end{pmatrix} \quad (3.6)$$
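For illustration, the following sketch builds the homogeneous transformation exactly as printed in Eq. (3.2) and applies it to a point as in Eq. (3.6). It is a minimal numerical example; the pose values and the point are arbitrary placeholders, and the function names are not taken from this work.

```python
import numpy as np

def pose_to_transform(pose):
    """Homogeneous transformation T_o(x) of Eq. (3.2) from the pose
    vector x = (x, y, z, phi, theta, psi) of Eq. (3.1)."""
    x, y, z, phi, theta, psi = pose
    sp, cp = np.sin(phi), np.cos(phi)
    st, ct = np.sin(theta), np.cos(theta)
    ss, cs = np.sin(psi), np.cos(psi)
    return np.array([
        [ct * cs,                ct * ss,                -st,     x],
        [sp * st * cs - cp * ss, sp * st * ss + cp * cs, sp * ct, y],
        [cp * st * cs + sp * ss, cp * st * ss - sp * cs, cp * ct, z],
        [0.0,                    0.0,                    0.0,     1.0],
    ])

def transform_point(T_o, p_obj):
    """Project a point given in {O} into {I}, following Eq. (3.6)."""
    return (T_o @ np.append(p_obj, 1.0))[:3]

T_o = pose_to_transform([0.1, 0.0, 0.2, 0.0, np.pi / 6, np.pi / 4])
p_inertial = transform_point(T_o, np.array([0.05, 0.0, 0.0]))
```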

In Eq. (3.6), the left-side superscript $o$ denotes that $\boldsymbol{p}$ is expressed in {O}, while the lack of such a superscript implies that the quantity is described w.r.t. {I}.

The position of the fingers of the robotic hand is determined by the angles of its joints. The vector $\boldsymbol{q} \in \mathbb{R}^m$ contains the positions of all $m$ joints. The angle of one specific joint of index $j$ is denoted by $q^{[j]}$:

$$\boldsymbol{q} = \begin{pmatrix} q^{[1]} & q^{[2]} & \cdots & q^{[m]} \end{pmatrix}^T \quad (3.7)$$

Similar to the object, the pose of a finger link can be described by a link-fixed frame {F^[k]}, where $k$ is the index of the link among a set of size $M$. The transformation matrix $\boldsymbol{T}_f^{[k]} \in \mathbb{R}^{4\times4}$ expresses its translation and rotation w.r.t. {I}. The relation between the positions of the joints and the poses of the links is obtained from the forward kinematics of the hand and will be addressed in the following section.

Finally, the position of a contact between the object and a finger link, expressed in {I}, is represented by $\boldsymbol{c}^{[i]} \in \mathbb{R}^3$, with $i$ being the index of the contact. The vector of all $n$ contact positions is denoted by $\boldsymbol{c} \in \mathbb{R}^{3n}$:

$$\boldsymbol{c} = \begin{pmatrix} \boldsymbol{c}^{[1]T} & \boldsymbol{c}^{[2]T} & \cdots & \boldsymbol{c}^{[n]T} \end{pmatrix}^T \quad (3.8)$$

Located at the contact point, the frame {C^[i]} describes the orientation of the surface. It is defined by the normal vector $\boldsymbol{n}^{[i]} \in \mathbb{R}^3$, as well as the perpendicular tangential vectors $\boldsymbol{a}^{[i]} \in \mathbb{R}^3$ and $\boldsymbol{b}^{[i]} \in \mathbb{R}^3$. The normal vector is directed towards the object; a small sketch of one possible construction of such a contact frame is given below. Using these definitions for the object, fingers and contact points, the following section will elaborate the kinematics, which describe the velocity relations of the respective quantities.
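The tangential directions for a given normal are not unique. One common way to construct an orthonormal contact frame from the normal alone is sketched here; this is a generic recipe for illustration only, not necessarily the construction used later in this work.

```python
import numpy as np

def contact_frame(n):
    """Build perpendicular tangential vectors a, b for a contact normal n,
    so that (n, a, b) spans an orthonormal frame at the contact point."""
    n = n / np.linalg.norm(n)
    # Pick a helper axis that is not (nearly) parallel to the normal
    helper = np.array([1.0, 0.0, 0.0])
    if abs(n @ helper) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    a = np.cross(n, helper)
    a /= np.linalg.norm(a)
    b = np.cross(n, a)
    return n, a, b

n, a, b = contact_frame(np.array([0.0, 0.0, 1.0]))
```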


3.2 Kinematics

The kinematics of the grasp relate the motions of its different components, i.e. the object, the fingers and the contact points, to one another. This section will outline the mathematics behind these relations, which are fundamental for addressing the in-hand localization and control problems in the subsequent chapters. Following a brief explanation of the relevant forward kinematics, the main focus of this section is on the derivation of the velocity kinematics of the grasp, including the grasp matrix and the hand Jacobian matrix, which represent the most important quantities of the presented model. Finally, the choice of the contact model is discussed.

3.2.1 Forward Kinematics

The previous section defined a contact of the grasp as a point in 3D space at which the surface of the object and one of the finger links connect. Consequently, the contact position can also be described w.r.t. these two bodies, as shown in Fig. 3.4. The vector {}^o c_o^{[i]} ∈ R³ shall denote the position of a contact of index i on the surface of the object, expressed in {O}. Using Eq. (3.6), the vector can be projected into the inertial frame, thereby obtaining c_o^{[i]}:

\[ \begin{pmatrix} \boldsymbol{c}_o^{[i]} \\ 1 \end{pmatrix} = \boldsymbol{T}_o \begin{pmatrix} {}^o\boldsymbol{c}_o^{[i]} \\ 1 \end{pmatrix} \tag{3.9} \]

Fig. 3.4 The same contact can be described by either a point on the surface of the object (c_o^{[1]}) or on the corresponding finger link (c_f^{[1]}). Expressing the latter involves calculating the forward kinematics of the finger, which are obtained through a series of transformations


Similarly, the position of the contact on the surface of the corresponding finger link shall be represented by {}^f c_f^{[i]} ∈ R³. Here, the global position of the contact point, c_f^{[i]}, can be obtained using the pose of a body-fixed frame {F^{[k]}} w.r.t. {I}. The respective transformation shall be denoted T_f^{[k]}, with k being the link index:

\[ \begin{pmatrix} \boldsymbol{c}_f^{[i]} \\ 1 \end{pmatrix} = \boldsymbol{T}_f^{[k]} \begin{pmatrix} {}^f\boldsymbol{c}_f^{[i]} \\ 1 \end{pmatrix} \tag{3.10} \]

However, since the position of the fingers is described by the vector of the joint angles, q, the transformation T_f^{[k]} has to be calculated using the forward kinematics of the fingers. Figure 3.4 illustrates how the pose of a link results from a chain of transformations along the finger. Let T_p be the transformation between the inertial frame and a palm-fixed frame {P}. With l being the length of the chain, the index of the first joint is k − l. The constant transformation between {P} and a frame that is fixed to the first joint, {S^{[k−l]}}, shall be denoted {}^f T_s^{[k−l]}. It is defined by the kinematic parameters of the hand. Using T_p, its global transformation is obtained:
\[ \boldsymbol{T}_s^{[k-l]} = \boldsymbol{T}_p\, {}^f\boldsymbol{T}_s^{[k-l]} \tag{3.11} \]

Calculating the transformation of the first link, T_f^{[k−l]}, involves rotating T_s^{[k−l]} according to the angle of the first joint, q^{[k−l]}. The transformation matrix T_q^{[k−l]}(q^{[k−l]}) realizes the corresponding rotation:
\[ \boldsymbol{T}_f^{[k-l]} = \boldsymbol{T}_p\, {}^f\boldsymbol{T}_s^{[k-l]}\, \boldsymbol{T}_q^{[k-l]}(q^{[k-l]}) \tag{3.12} \]

Matrix T_q^{[k−l]} depends on the rotation axis of the joint, as described in {S^{[k−l]}}. For example, if the joint rotates around the z-axis of {S^{[k−l]}}, the transformation can be expressed as follows:
\[
\boldsymbol{T}_q^{[k-l]}(q^{[k-l]}) =
\begin{pmatrix}
\cos(q^{[k-l]}) & -\sin(q^{[k-l]}) & 0 & 0 \\
\sin(q^{[k-l]}) & \cos(q^{[k-l]}) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} \tag{3.13}
\]

Following this procedure, the constant and joint-dependent transformations for all l links in the chain can successively be calculated and multiplied, ultimately yielding the desired transformation of link k:
\[ \boldsymbol{T}_f^{[k]} = \boldsymbol{T}_p \prod_{\kappa=k-l}^{k} {}^f\boldsymbol{T}_s^{[\kappa]}\, \boldsymbol{T}_q^{[\kappa]}(q^{[\kappa]}) \tag{3.14} \]
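The chain of Eq. (3.14) can be evaluated by alternately multiplying the constant joint-frame offsets with the joint-dependent rotations. The sketch below is a simplified illustration for a two-link finger with z-axis joints; the palm pose, link offsets and joint angles are hypothetical values chosen only to show the structure of the computation.

```python
import numpy as np

def rot_z(q):
    """Joint transformation T_q of Eq. (3.13) for a revolute z-axis joint."""
    c, s = np.cos(q), np.sin(q)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def translation(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def link_pose(T_p, offsets, q):
    """Forward kinematics of one finger, cf. Eq. (3.14): T_f = T_p * prod(T_s * T_q)."""
    T = T_p.copy()
    for T_s, q_k in zip(offsets, q):
        T = T @ T_s @ rot_z(q_k)
    return T

# Hypothetical two-link finger: palm pose and constant offsets between joint frames
T_palm = translation(0.0, 0.0, 0.1)
offsets = [translation(0.02, 0.0, 0.0), translation(0.04, 0.0, 0.0)]
q = np.deg2rad([20.0, 35.0])
print(link_pose(T_palm, offsets, q)[:3, 3])   # position of the distal link frame
```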


3.2.2 Grasp Matrix

The following two sections focus on the velocity kinematics of the grasp, which describe the relation between changes of different quantities of the hand-object system. First, the grasp matrix is introduced as a means of mapping the twist of the object to the velocities of the contact points. The notation used in the following formulations is based on [1], which provides a concise overview of the fundamentals of grasp analysis. The twist vector ν ∈ R⁶ comprises the translational velocity, v ∈ R³, and the angular velocity, ω ∈ R³, of the object:
\[ \boldsymbol{\nu} = \begin{pmatrix} \boldsymbol{v} \\ \boldsymbol{\omega} \end{pmatrix} \tag{3.15} \]

It is described in {I}. It is important to note that the twist of the object is unequal to the time derivative of its pose, ẋ ∈ R⁶:
\[ \boldsymbol{\nu} \neq \dot{\boldsymbol{x}} \tag{3.16} \]
with:
\[ \dot{\boldsymbol{x}} = \frac{d\boldsymbol{x}}{dt} \tag{3.17} \]
where t ∈ R is the time. While the translational velocity maps directly to the derivative of the position, the angular velocity has to be converted to the corresponding changes in the Euler angles. The matrix which maps ν to ẋ shall be denoted W ∈ R^{6×6} and can be expressed as follows:
\[ \dot{\boldsymbol{x}} = \boldsymbol{W}\boldsymbol{\nu} \tag{3.18} \]

with:
\[ \boldsymbol{W} = \begin{pmatrix} \boldsymbol{I}^{3\times3} & \boldsymbol{0}^{3\times3} \\ \boldsymbol{0}^{3\times3} & \boldsymbol{W}_{rot} \end{pmatrix} \tag{3.19} \]
where:
\[ \boldsymbol{W}_{rot} = \begin{pmatrix} 1 & \sin(\phi)\tan(\theta) & \cos(\phi)\tan(\theta) \\ 0 & \cos(\phi) & -\sin(\phi) \\ 0 & \sin(\phi)/\cos(\theta) & \cos(\phi)/\cos(\theta) \end{pmatrix} \tag{3.20} \]
Here, I^{3×3} ∈ R^{3×3} and 0^{3×3} ∈ R^{3×3} represent the identity and zero matrices of the indicated sizes.
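As a short numerical sketch of the mapping in Eqs. (3.18)–(3.20): given an object twist ν, the pose derivative ẋ follows from W(x). The pose and twist values below are arbitrary and only illustrate the computation.

```python
import numpy as np

def euler_rate_matrix(phi, theta):
    """W_rot of Eq. (3.20): maps the angular velocity to Euler angle rates."""
    return np.array([
        [1.0, np.sin(phi) * np.tan(theta), np.cos(phi) * np.tan(theta)],
        [0.0, np.cos(phi),                 -np.sin(phi)],
        [0.0, np.sin(phi) / np.cos(theta), np.cos(phi) / np.cos(theta)]])

def twist_to_pose_rate(x, nu):
    """x_dot = W nu, cf. Eqs. (3.18) and (3.19)."""
    W = np.block([[np.eye(3), np.zeros((3, 3))],
                  [np.zeros((3, 3)), euler_rate_matrix(x[3], x[4])]])
    return W @ nu

x = np.array([0.1, 0.0, 0.2, 0.1, 0.3, -0.2])     # pose (position + Euler angles)
nu = np.array([0.0, 0.01, 0.0, 0.0, 0.0, 0.05])   # object twist
print(twist_to_pose_rate(x, nu))
```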


The purpose of the grasp matrix is to relate the twist of the object to the motion of the contact points on its surface. The twist of a contact point with index i w.r.t. {I} shall be denoted ν_{c,o}^{[i]} ∈ R⁶. It is related to the twist of the object using the matrix P^{[i]} ∈ R^{6×6}:
\[ \boldsymbol{\nu}_{c,o}^{[i]} = \boldsymbol{P}^{[i]T}\boldsymbol{\nu} \tag{3.21} \]

Since the object is assumed to be rigid, the angular velocity of the contact point is equivalent to that of the object. However, to obtain the translational velocity, the displacement between the contact point and the origin of the body-fixed frame has to be considered:
\[ \boldsymbol{P}^{[i]} = \begin{pmatrix} \boldsymbol{I}^{3\times3} & \boldsymbol{0}^{3\times3} \\ \boldsymbol{S}(\boldsymbol{c}_o^{[i]} - \boldsymbol{x}_{pos}) & \boldsymbol{I}^{3\times3} \end{pmatrix} \tag{3.22} \]
with x_pos ∈ R³ being the position of the object:

\[ \boldsymbol{x}_{pos} = \begin{pmatrix} x & y & z \end{pmatrix}^T \tag{3.23} \]

and S ∈ R^{3×3} being the cross-product matrix, which, for a vector r = (r_x, r_y, r_z)^T, can be formulated as follows:
\[ \boldsymbol{S}(\boldsymbol{r}) = \begin{pmatrix} 0 & -r_z & r_y \\ r_z & 0 & -r_x \\ -r_y & r_x & 0 \end{pmatrix} \tag{3.24} \]

In order to express the contact twist w.r.t. the contact-fixed frame {C^{[i]}}, both its translational and angular velocity have to be rotated according to the orientation of {C^{[i]}}:
\[ {}^c\boldsymbol{\nu}_{c,o}^{[i]} = \bar{\boldsymbol{R}}^{[i]T}\boldsymbol{\nu}_{c,o}^{[i]} \tag{3.25} \]

where:
\[ \bar{\boldsymbol{R}}^{[i]} = \begin{pmatrix} \boldsymbol{R}^{[i]} & \boldsymbol{0}^{3\times3} \\ \boldsymbol{0}^{3\times3} & \boldsymbol{R}^{[i]} \end{pmatrix} \tag{3.26} \]

The rotation matrix R^{[i]} ∈ R^{3×3} of the contact frame is obtained from its defining vectors:
\[ \boldsymbol{R}^{[i]} = \begin{pmatrix} \boldsymbol{n}^{[i]} & \boldsymbol{a}^{[i]} & \boldsymbol{b}^{[i]} \end{pmatrix} \tag{3.27} \]


Finally, the partial grasp matrix G̃^{[i]} ∈ R^{6×6} for contact i can be formulated:
\[ \tilde{\boldsymbol{G}}^{[i]T} = \bar{\boldsymbol{R}}^{[i]T}\boldsymbol{P}^{[i]T} \tag{3.28} \]
It maps the object twist in {I} to the contact twist in {C^{[i]}}:
\[ {}^c\boldsymbol{\nu}_{c,o}^{[i]} = \tilde{\boldsymbol{G}}^{[i]T}\boldsymbol{\nu} \tag{3.29} \]
Stacking the n partial matrices yields the complete grasp matrix, G̃ ∈ R^{6×6n}:
\[ \tilde{\boldsymbol{G}} = \begin{pmatrix} \tilde{\boldsymbol{G}}^{[1]} & \tilde{\boldsymbol{G}}^{[2]} & \cdots & \tilde{\boldsymbol{G}}^{[n]} \end{pmatrix} \tag{3.30} \]

which relates the object twist to the velocities of all contacts, denoted by the vector {}^cν_{c,o} ∈ R^{6n}:
\[ {}^c\boldsymbol{\nu}_{c,o} = \tilde{\boldsymbol{G}}^T\boldsymbol{\nu} \tag{3.31} \]
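To illustrate the construction of Eqs. (3.22)–(3.31), the following sketch assembles the partial matrices P^{[i]} and stacks them into a grasp matrix, here without the contact-frame rotations, i.e. the inertial-frame variant used later in Eq. (3.54). The contact positions and object position are made-up values for illustration.

```python
import numpy as np

def skew(r):
    """Cross-product matrix S(r) of Eq. (3.24)."""
    return np.array([[0, -r[2], r[1]], [r[2], 0, -r[0]], [-r[1], r[0], 0]])

def partial_grasp_matrix(c_o, x_pos):
    """P^[i] of Eq. (3.22) for one contact point c_o on the object."""
    P = np.eye(6)
    P[3:, :3] = skew(c_o - x_pos)
    return P

def grasp_matrix(contacts, x_pos):
    """Stack the partial matrices, cf. Eq. (3.30), in the inertial frame."""
    return np.hstack([partial_grasp_matrix(c, x_pos) for c in contacts])

# Three hypothetical fingertip contacts around an object centred at x_pos
x_pos = np.array([0.0, 0.0, 0.1])
contacts = [np.array([0.03, 0.0, 0.1]),
            np.array([-0.02, 0.02, 0.1]),
            np.array([-0.02, -0.02, 0.1])]
G_tilde = grasp_matrix(contacts, x_pos)            # 6 x 18
nu = np.array([0.0, 0.0, 0.01, 0.0, 0.0, 0.2])     # object twist
print((G_tilde.T @ nu).reshape(3, 6))              # one contact twist per row, cf. Eq. (3.31)
```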

3.2.3 Hand Jacobian

Similar to how the grasp matrix maps the object twist to the contact velocities, the hand Jacobian matrix relates the joint velocities to the twist of the contacts. For a contact i on the surface of the corresponding finger link, the vector ν_{c,f}^{[i]} ∈ R⁶ denotes its twist w.r.t. {I}. It can be calculated from the vector of joint velocities, q̇ ∈ Rᵐ, using the matrix Z^{[i]} ∈ R^{6×m} [1]:
\[ \boldsymbol{\nu}_{c,f}^{[i]} = \boldsymbol{Z}^{[i]}\dot{\boldsymbol{q}} \tag{3.32} \]

with:
\[ \dot{\boldsymbol{q}} = \frac{d\boldsymbol{q}}{dt} \tag{3.33} \]

Matrix Z^{[i]} is defined by the kinematics of the fingers and can be expressed as follows:
\[ \boldsymbol{Z}^{[i]} = \begin{pmatrix} \boldsymbol{d}^{[i,1]} & \boldsymbol{d}^{[i,2]} & \cdots & \boldsymbol{d}^{[i,m]} \\ \boldsymbol{l}^{[i,1]} & \boldsymbol{l}^{[i,2]} & \cdots & \boldsymbol{l}^{[i,m]} \end{pmatrix} \tag{3.34} \]


The values of d^{[i,j]} ∈ R³ and l^{[i,j]} ∈ R³ depend on whether a joint j is affecting a contact i. If the contact is not affected by the respective joint, both vectors are zero:
\[ \boldsymbol{d}^{[i,j]} = \boldsymbol{0}^{3\times1} \tag{3.35} \]
\[ \boldsymbol{l}^{[i,j]} = \boldsymbol{0}^{3\times1} \tag{3.36} \]

Otherwise, d^{[i,j]} is obtained as follows:
\[ \boldsymbol{d}^{[i,j]} = \boldsymbol{S}(\boldsymbol{c}_f^{[i]} - \boldsymbol{\zeta}^{[j]})^T\hat{\boldsymbol{z}}^{[j]} \tag{3.37} \]

where S denotes the cross-product matrix of Eq. (3.24). The vector ζ^{[j]} describes the position of the origin of the joint-fixed coordinate frame {S^{[j]}}, while ẑ^{[j]} is the direction of the rotation axis of joint j. Both vectors are expressed in the inertial frame {I}. Vector l^{[i,j]} is obtained from the direction of the joint axis:
\[ \boldsymbol{l}^{[i,j]} = \hat{\boldsymbol{z}}^{[j]} \tag{3.38} \]

These formulations only apply to revolute joints, which are the focus of this work. However, d^{[i,j]} and l^{[i,j]} can also be modified to represent other joint types, such as prismatic joints. As before, the contact twist shall be described w.r.t. the contact-fixed frame {C^{[i]}}, which involves rotating the global vector using R̄^{[i]} of Eq. (3.26):

\[ {}^c\boldsymbol{\nu}_{c,f}^{[i]} = \bar{\boldsymbol{R}}^{[i]T}\boldsymbol{\nu}_{c,f}^{[i]} \tag{3.39} \]
Combining the matrices Z^{[i]} and R̄^{[i]} yields the partial hand Jacobian matrix J̃^{[i]} ∈ R^{6×m}:
\[ \tilde{\boldsymbol{J}}^{[i]} = \bar{\boldsymbol{R}}^{[i]T}\boldsymbol{Z}^{[i]} \tag{3.40} \]

It relates the joint velocities to the twist of contact i, expressed in {C^{[i]}}:
\[ {}^c\boldsymbol{\nu}_{c,f}^{[i]} = \tilde{\boldsymbol{J}}^{[i]}\dot{\boldsymbol{q}} \tag{3.41} \]

Finally, stacking the partial matrices produces the complete hand Jacobian matrix J̃ ∈ R^{6n×m}, which maps the joint velocities to the vector of all contact twists, {}^cν_{c,f} ∈ R^{6n}:
\[ {}^c\boldsymbol{\nu}_{c,f} = \tilde{\boldsymbol{J}}\dot{\boldsymbol{q}} \tag{3.42} \]


with:
\[ \tilde{\boldsymbol{J}} = \begin{pmatrix} \tilde{\boldsymbol{J}}^{[1]T} & \tilde{\boldsymbol{J}}^{[2]T} & \cdots & \tilde{\boldsymbol{J}}^{[n]T} \end{pmatrix}^T \tag{3.43} \]
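The column-wise construction of Z^{[i]} in Eqs. (3.34)–(3.38) can be sketched as follows. The joint origins ζ, joint axes ẑ and the flags marking which joints affect the contact are hypothetical example data.

```python
import numpy as np

def skew(r):
    return np.array([[0, -r[2], r[1]], [r[2], 0, -r[0]], [-r[1], r[0], 0]])

def contact_jacobian(c_f, joint_origins, joint_axes, affects):
    """Z^[i] of Eq. (3.34) for one contact at c_f.
    affects[j] is True if joint j lies in the kinematic chain of the contact."""
    m = len(joint_origins)
    Z = np.zeros((6, m))
    for j in range(m):
        if not affects[j]:
            continue                                # Eqs. (3.35)/(3.36): zero columns
        zeta, z_hat = joint_origins[j], joint_axes[j]
        Z[:3, j] = skew(c_f - zeta).T @ z_hat       # Eq. (3.37)
        Z[3:, j] = z_hat                            # Eq. (3.38)
    return Z

# Hypothetical two-joint finger touching the object at c_f
c_f = np.array([0.05, 0.0, 0.12])
joint_origins = [np.array([0.0, 0.0, 0.10]), np.array([0.03, 0.0, 0.11])]
joint_axes = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0, 0.0])]
Z = contact_jacobian(c_f, joint_origins, joint_axes, affects=[True, True])
q_dot = np.array([0.1, -0.2])
print(Z @ q_dot)        # contact twist in {I}, cf. Eq. (3.32)
```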

3.2.4 Contact Model

The previous elaborations introduced two means of describing the twist of a contact point. First, the object twist is related to {}^cν_{c,o}^{[i]} through the grasp matrix. And second, the hand Jacobian matrix maps the joint velocities to {}^cν_{c,f}^{[i]}. However, since a contact point does not represent a rigid connection between the object and a finger, not all DoF of the contact twist may be transferred:
\[ {}^c\boldsymbol{\nu}_{c,o}^{[i]} \neq {}^c\boldsymbol{\nu}_{c,f}^{[i]} \tag{3.44} \]

In fact, which components of the twist are transmitted depends on the specific contact model that is chosen to describe the in-hand behavior. The contact model defines the entries of a matrix H_c^{[i]} ∈ R^{h×6}, which selects the h components of the relative twist that are transferred:
\[ \boldsymbol{H}_c^{[i]}\left({}^c\boldsymbol{\nu}_{c,o}^{[i]} - {}^c\boldsymbol{\nu}_{c,f}^{[i]}\right) = \boldsymbol{0}^{h\times1} \tag{3.45} \]

Practically, there are three contact models that are most relevant in the context of grasping [1]. Each shall be briefly introduced, before discussing the choice of one of them for the remainder of this work:

Point Contact without Friction: In this model, only the normal component of the translational velocity of the contact twist is transmitted. Therefore, the corresponding H_c^{[i]} matrix has only one row:
\[ \boldsymbol{H}_c^{[i]} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} \tag{3.46} \]

Neither the tangential components, nor any of the elements of the angular velocity are transferred. The point-contact-without-friction model is most applicable for grasps, in which fingers may easily slide on the object. This may be the case if there is very low friction or if the contact surface is particularly small.


Hard Finger: The hard finger model is suitable when there is considerable friction between the surface of the fingers and the object, but the contact area is still relatively small. In this case, all components of the translational velocity are transmitted, which is realized by the following selection matrix:
\[ \boldsymbol{H}_c^{[i]} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix} \tag{3.47} \]

Since the small contact surface does not allow to transfer any significant moments, the angular velocity components are still omitted.

Soft Finger: In contrast to the previous two models, the soft finger model assumes a sufficiently large contact area, such that a friction moment around the normal direction of the contact can be transmitted as well. This is expressed by the following H_c^{[i]} matrix:
\[ \boldsymbol{H}_c^{[i]} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} \tag{3.48} \]

The tangential components of the angular velocity are still not transferred. Rotations in these DoF typically cause the contact to roll on the surface, without generating any considerable friction moment.
In the context of this work, the hard finger model was chosen to represent the contact behavior. For the dexterous manipulation scenarios that are the focus of this work, this model proved most applicable. The skin of the David hand, which is made of silicone, provides enough surface friction to transmit all three translational velocity components. However, when interacting with rigid objects, the contact areas are typically too small to transfer any significant friction moments.
Based on this choice, the selection of the transmittable DoF can be applied to the grasp and hand Jacobian matrices. First, the selection matrix of Eq. (3.47) is extended to realize the chosen contact behavior for all n contacts:
\[ \boldsymbol{H}_c = \begin{pmatrix} \boldsymbol{H}_c^{[1]} & \boldsymbol{0}^{3\times6} & \cdots & \boldsymbol{0}^{3\times6} \\ \boldsymbol{0}^{3\times6} & \boldsymbol{H}_c^{[2]} & \cdots & \boldsymbol{0}^{3\times6} \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol{0}^{3\times6} & \boldsymbol{0}^{3\times6} & \cdots & \boldsymbol{H}_c^{[n]} \end{pmatrix} \tag{3.49} \]


Multiplying H_c ∈ R^{3n×6n} with the grasp and hand Jacobian matrices yields the corresponding selected matrices G_H ∈ R^{6×3n} and J_H ∈ R^{3n×m}:
\[ \boldsymbol{G}_H^T = \boldsymbol{H}_c\tilde{\boldsymbol{G}}^T \tag{3.50} \]
\[ \boldsymbol{J}_H = \boldsymbol{H}_c\tilde{\boldsymbol{J}} \tag{3.51} \]

These matrices relate the object twist and joint velocities to the translational contact velocities. Based on the hard finger model, all components of the contact velocities are transmitted. Therefore, the following equality can be expressed:
\[ \boldsymbol{G}_H^T\boldsymbol{\nu} = {}^c\dot{\boldsymbol{c}}_o = {}^c\dot{\boldsymbol{c}}_f = \boldsymbol{J}_H\dot{\boldsymbol{q}} \tag{3.52} \]

with {}^cċ_o and {}^cċ_f being the vectors of the n translational contact velocities, each expressed in the corresponding {C^{[i]}}. This equation remains valid as long as the contact configuration is maintained, i.e. the fingers do not slide on the object or separate from it.
The hard finger model allows the formulation in Eq. (3.52) to be simplified further. Since, in this model, all three velocity components are transmitted, the equality can be expressed in any reference frame. This includes the inertial frame {I}, such that:
\[ \dot{\boldsymbol{c}}_o = \dot{\boldsymbol{c}}_f \tag{3.53} \]

In order to obtain the contact velocities w.r.t. {I}, the grasp and hand Jacobian matrices are calculated without considering the rotation matrix R̄^{[i]}. Therefore, Eqs. (3.28) and (3.40) reduce to:
\[ \boldsymbol{G}^{[i]} = \boldsymbol{P}^{[i]} \tag{3.54} \]
\[ \boldsymbol{J}^{[i]} = \boldsymbol{Z}^{[i]} \tag{3.55} \]

Following this simplification, the final grasp matrix, G ∈ R^{6×3n}, and hand Jacobian matrix, J ∈ R^{3n×m}, shall relate the object twist and joint velocities, respectively, to the velocities of the contact points, ċ, expressed in {I}:
\[ \dot{\boldsymbol{c}} = \boldsymbol{G}^T\boldsymbol{\nu} \tag{3.56} \]
\[ \dot{\boldsymbol{c}} = \boldsymbol{J}\dot{\boldsymbol{q}} \tag{3.57} \]

These relations identify the grasp matrix and hand Jacobian matrix as the most important quantities for describing the velocity kinematics of the grasp. Moreover, these matrices are just as relevant for the modeling of the grasp dynamics, as will be discussed in the next section.


3.3 Dynamics

Following the derivation of the kinematics model of the grasp, this section is concerned with modeling the dynamic behavior of the hand-object system. The formulations in the previous section related the twists and velocities of the different quantities to one another. In contrast, the dynamics model of the grasp describes the relation between the various forces, torques and wrenches that are relevant to the grasp analysis. This section begins by formulating the rigid body dynamics of both the object and the fingers of a robotic hand. Subsequently, the dynamics of the grasp are explained, which involves relating the contact forces to the joint torques and object wrench. Finally, the friction dynamics of the contacts are discussed.

3.3.1 Rigid Body Dynamics

Modeling the dynamics of the grasp system involves describing the motion of both the object and the fingers. Figure 3.5 illustrates the most important quantities. For the object, the model is expressed by its rigid body dynamics. The motion of the object is characterized by the Newton-Euler equations. When formulated w.r.t. the center of mass of the object, the equations can be expressed as follows:
\[ \boldsymbol{M}_o(\boldsymbol{x})\dot{\boldsymbol{\nu}} + \boldsymbol{b}_o(\boldsymbol{x},\boldsymbol{\nu}) + \boldsymbol{w}_g = \boldsymbol{w} \tag{3.58} \]

Fig. 3.5 The dynamics of the grasp relate the contact forces, f c , to the corresponding joint torques, τ c , and object wrench, w c


with M_o ∈ R^{6×6} being the inertia matrix of the object, b_o ∈ R⁶ the vector of the velocity product terms and w_g ∈ R⁶ the wrench that is generated by gravity:
\[ \boldsymbol{M}_o(\boldsymbol{x}) = \begin{pmatrix} m_o\boldsymbol{I}^{3\times3} & \boldsymbol{0}^{3\times3} \\ \boldsymbol{0}^{3\times3} & \boldsymbol{I}_o \end{pmatrix} \tag{3.59} \]
\[ \boldsymbol{b}_o(\boldsymbol{x},\boldsymbol{\nu}) = \begin{pmatrix} \boldsymbol{0}^{3\times1} \\ \boldsymbol{\omega} \times \boldsymbol{I}_o\boldsymbol{\omega} \end{pmatrix} \tag{3.60} \]
\[ \boldsymbol{w}_g = \begin{pmatrix} 0 & 0 & -m_o g & 0 & 0 & 0 \end{pmatrix}^T \tag{3.61} \]

where m_o ∈ R is the mass, I_o ∈ R^{3×3} the matrix of the moments of inertia of the object and g ∈ R the gravitational acceleration. The wrench vector w ∈ R⁶ contains all additional forces and moments acting on the object. In the context of grasping, they can be separated into two distinct contributions:
\[ \boldsymbol{M}_o(\boldsymbol{x})\dot{\boldsymbol{\nu}} + \boldsymbol{b}_o(\boldsymbol{x},\boldsymbol{\nu}) + \boldsymbol{w}_g + \boldsymbol{w}_c = \boldsymbol{w}_{ext} \tag{3.62} \]

Here, w_ext ∈ R⁶ denotes any external wrenches on the object. Vector w_c ∈ R⁶ is the wrench which is applied through the contacts by the fingers of the hand. The formulation of this term will be addressed later in this section.
The dynamics of the fingers are described by the equations of motion of the multi-body system. Based on the previous Newton-Euler formulation, the dynamics of the kinematic chain can be expressed as follows:
\[ \boldsymbol{M}_f(\boldsymbol{q})\ddot{\boldsymbol{q}} + \boldsymbol{b}_f(\boldsymbol{q},\dot{\boldsymbol{q}}) + \boldsymbol{\tau}_g(\boldsymbol{q}) = \boldsymbol{\tau} \tag{3.63} \]

Here, M_f ∈ R^{m×m} denotes the inertia matrix, b_f ∈ Rᵐ the vector of the velocity product terms of the fingers and τ_g ∈ Rᵐ the gravitational torques. Determining these quantities for a multi-body system is non-trivial. Therefore, various methods have been proposed in the literature, which are typically derived from either the Newton-Euler equations [2, 3] or the Lagrange formulation [4, 5]. For the modeling of the grasp system, the vector of any additional torques, τ ∈ Rᵐ, is further divided:
\[ \boldsymbol{M}_f(\boldsymbol{q})\ddot{\boldsymbol{q}} + \boldsymbol{b}_f(\boldsymbol{q},\dot{\boldsymbol{q}}) + \boldsymbol{\tau}_g(\boldsymbol{q}) + \boldsymbol{\tau}_c = \boldsymbol{\tau}_{ext} + \boldsymbol{\tau}_{act} \tag{3.64} \]

with τ ext ∈ Rm being the vector of any external loads on the joints of the fingers. Additionally, the equation contains the vector τ act ∈ Rm , which denotes the torques that are generated by the actuators of the fingers. Finally, τ c ∈ Rm represents the effect of the grasp forces on the joints, which will be elaborated next.


3.3.2 Grasp Dynamics

When an object is grasped by a robotic hand, the fingers apply loads on the object, which are transmitted through the contact points. The components of the force and moment which are transferred through a contact of index i depend on the specific contact model and are obtained using the selection matrix H_c^{[i]} of Eq. (3.45):
\[ {}^c\boldsymbol{\lambda}_c^{[i]} = \boldsymbol{H}_c^{[i]} \begin{pmatrix} {}^c\boldsymbol{f}_c^{[i]} \\ {}^c\boldsymbol{m}_c^{[i]} \end{pmatrix} \tag{3.65} \]

c h 3 where c λ[i] c ∈ R is the vector of the transmitted contact loads, f c ∈ R is the contact c 3 force and mc ∈ R is the contact moment. All vectors are described in the contact frame {C [i] }. In the previous section, the hard finger model was selected as the contact model, which is used in the context of this work. This simplifies the expression of the transmitted loads, since only the contact force is retained: c

c [i] λ[i] c = fc

(3.66)

Similar to the contact velocities, this vector can also be expressed w.r.t. the inertial frame {I}:
\[ \boldsymbol{\lambda}_c^{[i]} = \boldsymbol{f}_c^{[i]} \tag{3.67} \]

Because of the equality of the vector of the transmitted components and the contact force, only the force vector f_c^{[i]} shall be used further to describe the relevant contact loads. The vector of all n contact forces shall be denoted f_c ∈ R^{3n}:
\[ \boldsymbol{f}_c = \begin{pmatrix} \boldsymbol{f}_c^{[1]T} & \boldsymbol{f}_c^{[2]T} & \cdots & \boldsymbol{f}_c^{[n]T} \end{pmatrix}^T \tag{3.68} \]

When applied through the contact points on the object, they generate the object wrench w_c ∈ R⁶. The relation between these two quantities is described by the grasp matrix, G, of Eq. (3.56) [1]:
\[ \boldsymbol{w}_c = \boldsymbol{G}\boldsymbol{f}_c \tag{3.69} \]

The same matrix, which maps the object twist to the contact velocities, is applicable to relate these dynamic quantities, demonstrating the close relation between the kinematic and dynamic behavior of the object. Similarly, the mapping between the contact forces and the corresponding torques on the joints of the fingers is expressed by the hand Jacobian matrix, J, of Eq. (3.57):


\[ \boldsymbol{\tau}_c = \boldsymbol{J}^T\boldsymbol{f}_c \tag{3.70} \]

Here, the same matrix which relates the joint velocities to the contact velocities is used to derive the joint torques from the contact forces. Inserting Eq. (3.69) into Eq. (3.62) and Eq. (3.70) into Eq. (3.64), the complete dynamics model of the grasp can be expressed:
\[ \boldsymbol{M}_o(\boldsymbol{x})\dot{\boldsymbol{\nu}} + \boldsymbol{b}_o(\boldsymbol{x},\boldsymbol{\nu}) + \boldsymbol{w}_g + \boldsymbol{G}\boldsymbol{f}_c = \boldsymbol{w}_{ext} \tag{3.71} \]
\[ \boldsymbol{M}_f(\boldsymbol{q})\ddot{\boldsymbol{q}} + \boldsymbol{b}_f(\boldsymbol{q},\dot{\boldsymbol{q}}) + \boldsymbol{\tau}_g + \boldsymbol{J}^T\boldsymbol{f}_c = \boldsymbol{\tau}_{ext} + \boldsymbol{\tau}_{act} \tag{3.72} \]

This model is valid, as long as the conditions for the hard finger model are met. Principally, this involves ensuring that the contact forces are fully transmitted through the contacts. The corresponding dynamics of the contact are addressed next.
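As a small numerical sketch of the dual mappings in Eqs. (3.69) and (3.70): for the hard finger model, each contact contributes a 6×3 block to G, and the same contact forces map to joint torques through J^T. All values below are hypothetical; the hand Jacobian would be assembled as in Sect. 3.2.3.

```python
import numpy as np

def skew(r):
    return np.array([[0, -r[2], r[1]], [r[2], 0, -r[0]], [-r[1], r[0], 0]])

def hard_finger_grasp_matrix(contacts, x_pos):
    """G of Eq. (3.56): each 6x3 block maps a contact force to its
    contribution to the object wrench, cf. Eq. (3.69)."""
    blocks = [np.vstack((np.eye(3), skew(c - x_pos))) for c in contacts]
    return np.hstack(blocks)                      # 6 x 3n

# Hypothetical three-contact grasp and stacked contact forces (in {I})
x_pos = np.array([0.0, 0.0, 0.1])
contacts = [np.array([0.03, 0.0, 0.1]),
            np.array([-0.02, 0.02, 0.1]),
            np.array([-0.02, -0.02, 0.1])]
f_c = np.array([-1.0, 0.0, 0.5,   0.7, -0.7, 0.5,   0.7, 0.7, 0.5])

G = hard_finger_grasp_matrix(contacts, x_pos)
print(G @ f_c)          # object wrench w_c, Eq. (3.69)
# With a hand Jacobian J (3n x m), the joint torques follow as
# tau_c = J.T @ f_c, cf. Eq. (3.70).
```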

3.3.3 Contact Dynamics

The hard finger model assumes that the contact force that is generated by a finger is equivalent to the one that is applied to the object. For this to be the case, the two surfaces have to remain in a static friction regime. If the finger were to slide on the object, this would no longer be the case. Moreover, the kinematic assumption, i.e. that the velocity of the contact point on the object is the same as the one on the finger link, would also no longer be valid. The dynamics of the contact friction are described by the Coulomb model, which formulates the static friction constraint:
\[ f_\parallel \leq \mu f_\perp \tag{3.73} \]

Here, f_⊥ ∈ R and f_∥ ∈ R denote the normal and tangential components of a force f ∈ R³, which is transmitted between two surfaces. The scalar μ represents the coefficient of friction, which is defined by the interacting materials. When applied to the grasp system, the friction constraint can be visualized as a cone around the normal direction of the contact. For the contact force, f_c, to be fully transmitted through the contact, the force vector has to lie inside of the friction cone. If the contact force lies outside of the friction cone, the finger starts to slide on the object surface. Both cases are illustrated in Fig. 3.6. By describing the contact force w.r.t. {C^{[i]}}, the normal and tangential components of the vector are obtained:

\[ {}^c\boldsymbol{f}_c^{[i]} = \begin{pmatrix} f_n^{[i]} & f_a^{[i]} & f_b^{[i]} \end{pmatrix}^T \tag{3.74} \]



Fig. 3.6 The friction constraint of a contact c can be described by a cone around the contact normal n, with its opening angle being defined by the friction coefficient μ. Whether a force f lies inside of the cone is determined by the ratio of its normal (f_⊥) and tangential (f_∥) components. a If the force is inside of the cone, static friction is maintained. b If the force lies outside of the cone, the surfaces will start to slide relative to each other

They allow to express the friction constraint of Eq. (3.73) for contact i:
\[ \sqrt{f_a^{[i]2} + f_b^{[i]2}} \leq \mu f_n^{[i]} \tag{3.75} \]

with μ ∈ R being the friction coefficient for the pair of materials of the finger and the object.
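A small sketch of the friction cone test of Eq. (3.75), for a contact force already expressed in the contact frame {C^{[i]}} (normal, tangential a, tangential b). The force values and the friction coefficient are illustrative assumptions only.

```python
import numpy as np

def inside_friction_cone(f_contact, mu):
    """Check the Coulomb constraint of Eq. (3.75).
    f_contact = (f_n, f_a, f_b) in the contact frame."""
    f_n, f_a, f_b = f_contact
    return np.hypot(f_a, f_b) <= mu * f_n

mu = 0.8                                                       # made-up friction coefficient
print(inside_friction_cone(np.array([2.0, 0.5, 0.3]), mu))     # True: contact sticks
print(inside_friction_cone(np.array([0.5, 0.6, 0.3]), mu))     # False: finger would slide
```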

3.4 Grasp Subspaces

The previous two sections identified the grasp and hand Jacobian matrices as the most important tools for relating the various kinematic and dynamic quantities of the hand-object system. On the one hand, the grasp matrix maps the twist of the object to the velocity of the contact points, as well as the contact forces to the object wrench. On the other hand, the hand Jacobian matrix describes the relation between the joint velocities and the contact velocities, as well as between the contact forces and the joint torques. However, these relations are not sufficient to describe all aspects of the grasp behavior that might be of interest. For example, they do not answer which set of contact forces would produce a desired object wrench. Similarly, the joint velocities which would generate a given contact velocity cannot be determined. Describing these relations requires the inverse mapping of the grasp quantities. However, in general, these calculations do not have a clear solution, since they represent


problems, which are either over- or under-constrained. There exist different options for the determination of the inverse relations. Using the Moore-Penrose inverse for this operation, the resulting vectors represent the least-square solution to the respective problem:
\[ \hat{\boldsymbol{\nu}} = \boldsymbol{G}^{T+}\dot{\boldsymbol{c}} \tag{3.76} \]
\[ \hat{\dot{\boldsymbol{q}}} = \boldsymbol{J}^{+}\dot{\boldsymbol{c}} \tag{3.77} \]
\[ \hat{\boldsymbol{f}}_c = \boldsymbol{G}^{+}\boldsymbol{w}_c \tag{3.78} \]
\[ \hat{\boldsymbol{f}}_c = \boldsymbol{J}^{T+}\boldsymbol{\tau}_c \tag{3.79} \]

where the superscript + denotes the pseudoinverse of a matrix and the hat (ˆ) the least-square solution. Figure 3.7 summarizes both the forward and inverse mappings of the grasp quantities. In Eqs. (3.76) and (3.79), the left-hand side vectors typically represent the closest possible solution, since these problems are often over-constrained. For example, for grasps which involve three or more fingers, the velocities of the contact points can generally not be mapped to an equivalent object twist, since it only has 6 DoF. Therefore, the vector ν̂ represents the object twist which best matches the given contact velocities. Similarly, a vector of four or more joint torques cannot be bijectively related to a contact force with 3 DoF. In contrast, under the same conditions, the mappings in Eqs. (3.77) and (3.78) are under-constrained. This means that the left-hand side vectors represent only one possible solution to the problem. For instance, if the number of joints exceeds the DoF of a contact, the same contact velocity may be generated by different combinations of

Fig. 3.7 The grasp matrix G and hand Jacobian matrix J relate the kinematic and dynamic quantities of the grasp to one another. Through the pseudo-inverse relation (+), the least-square solution is obtained. a Kinematics: G maps the object twist to the contact velocities; J maps the joint velocities to the contact velocities. b Dynamics: G maps the contact forces to the object wrench; J maps the contact forces to the joint torques


joint velocities. Equally, different sets of contact forces may result in the same object wrench.
Mathematically, this additional freedom in the solution space of an under-constrained problem can be expressed by the nullspace of the mapping matrix:
\[ N(\boldsymbol{A}) = (\boldsymbol{I} - \boldsymbol{A}\boldsymbol{A}^{+}) \tag{3.80} \]

where N is the nullspace of a matrix A. Extending Eqs. (3.77) and (3.78) with their respective nullspaces yields the subspace formulations of these mappings:
\[ \dot{\boldsymbol{q}} = \boldsymbol{J}^{+}\dot{\boldsymbol{c}} + (\boldsymbol{I} - \boldsymbol{J}^{+}\boldsymbol{J})\dot{\boldsymbol{q}}_{null} \tag{3.81} \]
\[ \boldsymbol{f}_c = \boldsymbol{G}^{+}\boldsymbol{w}_c + (\boldsymbol{I} - \boldsymbol{G}^{+}\boldsymbol{G})\boldsymbol{f}_{c,null} \tag{3.82} \]

By projecting the vector q̇_null onto the nullspace of J^+, additional joint velocities may be generated without influencing the motion of the contact points. Similarly, f_{c,null} represents a set of internal contact forces which do not affect the object wrench after being projected onto the nullspace of G^+.
For the over-constrained mappings that have been discussed, there exist no exploitable nullspaces. However, if the inverse relation of two quantities is over-constrained, then the forward mapping is under-constrained and therefore contains a non-trivial nullspace:
\[ \dot{\boldsymbol{c}} = \boldsymbol{G}^{T}\boldsymbol{\nu} + (\boldsymbol{I} - \boldsymbol{G}^{T}\boldsymbol{G}^{T+})\dot{\boldsymbol{c}}_{null} \tag{3.83} \]
\[ \boldsymbol{\tau}_c = \boldsymbol{J}^{T}\boldsymbol{f}_c + (\boldsymbol{I} - \boldsymbol{J}^{T}\boldsymbol{J}^{T+})\boldsymbol{\tau}_{c,null} \tag{3.84} \]

Here, the nullspace projection of the additional contact velocities c˙ null does not influence the twist of the object, and the added nullspace torque τ c,null modifies the joint loads, without affecting the contact forces. Figure 3.8 illustrates the subspace components of the mappings in Eqs. (3.81) to (3.84).
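As a numerical illustration of Eq. (3.82), the sketch below projects a set of desired internal (squeezing) forces onto the nullspace of G^+ and verifies that they do not change the object wrench. G is built as in the dynamics section; all numbers are hypothetical.

```python
import numpy as np

def skew(r):
    return np.array([[0, -r[2], r[1]], [r[2], 0, -r[0]], [-r[1], r[0], 0]])

def grasp_matrix(contacts, x_pos):
    return np.hstack([np.vstack((np.eye(3), skew(c - x_pos))) for c in contacts])

x_pos = np.array([0.0, 0.0, 0.1])
contacts = [np.array([0.03, 0.0, 0.1]),
            np.array([-0.02, 0.02, 0.1]),
            np.array([-0.02, -0.02, 0.1])]
G = grasp_matrix(contacts, x_pos)                 # 6 x 9
G_pinv = np.linalg.pinv(G)

# Particular solution for a desired object wrench, first term of Eq. (3.82)
w_des = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])
f_particular = G_pinv @ w_des

# Internal forces projected onto the nullspace of G, second term of Eq. (3.82)
f_null_des = np.array([-1.0, 0.0, 0.0,  0.5, -0.5, 0.0,  0.5, 0.5, 0.0])
f_internal = (np.eye(9) - G_pinv @ G) @ f_null_des

f_c = f_particular + f_internal
print(np.allclose(G @ f_internal, 0.0))           # internal forces produce no wrench
print(G @ f_c)                                    # total forces still realize w_des
```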

3.5 Types of Grasps

When describing the behavior of the hand-object system, there are several types of grasps which are important to distinguish. In the context of the problems in this work, two classes are most relevant: the precision grasp and the power grasp. Therefore, the relevant considerations w.r.t. the modeling of these two configurations shall be discussed.


Fig. 3.8 If the number of contacts is at least three (n ≥ 3) and the DoF of the fingers are greater than those of the affected contact points (m > 3n), there exist the illustrated nullspaces in the mappings between the kinematic and dynamic quantities of the grasp. a Kinematics: The nullspace of G^T allows to realize an additional joint velocity, which does not influence the object twist; similarly, moving the joints in the nullspace of J^+ is not affecting the contact velocities. b Dynamics: Applying contact forces in the nullspace of G^+ is not influencing the object wrench; an added torque, which is projected through the nullspace of J^T, has no effect on the contact forces

Precision Grasp

A precision grasp denotes a hand-object configuration in which the object is only in contact with the fingertips of the hand. Figure 3.9a shows an example of such an arrangement. A precision grasp relies on force closure in order to maintain the grasp of the object. To be able to generate an object wrench in all of its 6 DoF, at least three contacts are required. Under these conditions, the grasp model which was presented in this chapter is fully applicable.


Fig. 3.9 The type of grasp affects some of the aspects of the grasp model. In the context of this work, the distinction between precision and power grasps is most relevant. a Example of a precision grasp, which involves three fingertip contacts between the object and the hand. b In this power grasp, the object is enveloped by the hand, resulting in a large number of contacts

Power Grasp

In a power grasp, the fingers are wrapped around the object, enclosing it inside of the hand. The power grasp of a bottle is depicted in Fig. 3.9b. In this type of grasp, contacts do not only exist on the fingertips, but also on the other phalanges of the fingers, as well as on the palm of the hand. As a partial form closure, a power grasp relies on restricting the possible motion of the object, thereby creating a very robust hand-object configuration.
W.r.t. the grasp modeling, the contact configuration of a power grasp limits the applicability of some aspects of the presented formulations. In particular, the existence of contacts on multiple phalanges of a finger can cause a grasp to become hyperstatic. In this case, there are internal hand forces which cannot be determined or controlled. Consequently, the contact forces which act on the object can no longer be fully described. This problem in the modeling of the grasp is caused by limitations of the rigid body assumption [1]. Moreover, because of the kinematic configuration of a power grasp, it may be impossible to generate object twists in arbitrary directions. This is most apparent for contacts with the palm of the hand. Since the palm is fixed w.r.t. the hand, the object cannot move in the normal direction of the surface while still maintaining the contact.


Of course, the grasp of an object may be characterized as neither a precision nor a power grasp. However, these intermediate grasps may be affected by the same limitations as a power grasp. Therefore, in the context of this work, they are treated similarly. The grasp state estimation method which is presented in the next chapter was designed to be applicable for all types of grasps. That means it can be used for precision, intermediate, as well as power grasps. For the incorporation of kinematic information in the in-hand localization, the rigid body model is a sufficient approximation of the hand-object system. In contrast, the in-hand object controller, which is proposed in Chap. 5, may only be used for precision grasps. The large number of contacts of a power grasp makes it impossible to freely position an object inside of the hand, which is the purpose of the controller. Moreover, in the case of a hyperstatic grasp, the desired forces on the object can no longer be fully controlled, because of the limitations of the rigid body model. If a grasped object is perceived to be in such a configuration, only one contact per finger is considered by the controller.

References

1. Domenico Prattichizzo and Jeffrey C. Trinkle. Grasping. In Springer Handbook of Robotics, Chapter 28, pages 671–700. Springer, 2008.
2. Gianluca Garofalo, Christian Ott, and Alin Albu-Schäffer. On the closed form computation of the dynamic matrices and their differentiations. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2364–2359. IEEE, 2013.
3. David E. Orin, R.B. McGhee, M. Vukobratović, and G. Hartoch. Kinematic and kinetic analysis of open-chain linkages utilizing Newton-Euler methods. Mathematical Biosciences, 43(1–2):107–130, 1979.
4. Wayne J. Book. Recursive Lagrangian dynamics of flexible manipulator arms. The International Journal of Robotics Research, 3(3):87–101, 1984.
5. John M. Hollerbach. A recursive Lagrangian formulation of manipulator dynamics and a comparative study of dynamics formulation complexity. IEEE Transactions on Systems, Man, and Cybernetics, 10(11):730–736, 1980.

Chapter 4

Grasp State Estimation

This chapter introduces the proposed method for the estimation of the grasp state of a manipulated object. It combines different sensor modalities in order to provide a robust estimate of the object pose, contact configuration and joint position errors. The chapter begins by giving an intuition of how different types of measurements can be used in the in-hand localization process and by formally defining the problem. Subsequently, a brief comparison of state estimation methods in the context of in-hand manipulation is presented. The majority of this chapter focuses on the description of the two main components of the in-hand localization method. First, the detection of contacts between the hand and the object, as they are inferred from various measurements, is described. Second, the fusion of a range of sensor modalities as part of the object pose estimation is explained, ranging from joint position measurements to fiducial markers, contour features and visual object tracking. In conclusion of this chapter, the experimental validation of the estimation method is presented, including a comparison of the capabilities of the various sensor measurements in a range of manipulation scenarios, as well as a discussion of the results.

4.1 Introduction

Currently, even simple manipulation tasks, such as pick-and-place operations, may fail because of a lack of knowledge about the state of the object inside of the hand. Grasping an object usually involves closing the fingers of the robotic hand along pre-planned trajectories. However, inaccuracies in the planning model, incorrect assumptions about the grasp conditions, unknown physical properties or many other effects will cause the object motion to be different than anticipated. Although the grasp may still be qualitatively successful, i.e. resulting in a stable grasp of the object,


Fig. 4.1 Objects may move during the grasp acquisition in ways that were not planned or foreseen. In this example, the ketchup bottle tilts, when being picked up. a Hand and object before the grasp. b The power grasp moves the bottle inside of the hand

not knowing the final grasp state may cause subsequent tasks to fail. For instance, the ketchup bottle in Fig. 4.1 unintentionally tilted, when it was grasped. This tilt would need to be compensated in order to prevent the bottle from falling over, when placing it. However, without any knowledge of the in-hand motion of the object during the grasp, this is not possible. Potentially, many sources of information can be used to inform such an estimation, when trying to infer the state of the grasp. In the case of the ketchup bottle grasp, just examining the measured finger positions will reveal inconsistencies with the assumed, upright pose of the object, since the fingers would need to be inside of the bottle. If available, tactile sensors or torque measurements of the joints allow to infer contacts between the fingers and the object. Finally, visual information, although often impaired by occlusions, may additionally constrain the estimated grasp state.

4.1.1 Concept

In the following, an illustrative example is used to give an intuition of how different sensor modalities can be combined in order to produce a consistent estimation of the grasp state. Figure 4.2a depicts an object held by a manipulator with three fingers. The object is drawn at its assumed pose, which differs from the ground truth.

Kinematic Measurements

Kinematic measurements are used to determine the position of the finger links, e.g. from joint position sensors. Combined with knowledge of the finger and object geometries, these measurements allow to identify collisions between the two. As it is


Fig. 4.2 The measurement of the joint positions allows to identify inconsistencies between the fingers and the estimated object pose. By correcting the pose of the object and/or fingers, collisions between the two can be resolved. Furthermore, the localization of contact points allows to predict object displacements from the motion of the fingers. a Illustration of the estimated (solid, orange) and the actual (dashed) pose of the object, as well as inconsistencies in the assumed grasp state (red). b Resolving the inconsistencies improves the pose estimation. Identified contact points are marked in red. c Prediction of the object displacement from the motion of the fingers, using the identified contacts

physically not possible for two bodies to occupy the same space, these collisions point to an error in the assumed poses of the object and/or fingers, which are themselves calculated from inaccurate joint position measurements. Figure 4.2a illustrates the case, in which two of the fingertips are in collision with the object. By moving the estimated pose of the object and fingers, these collisions are resolved, as depicted in Fig. 4.2b. This corrected grasp configuration, while not yet fully aligned with the ground truth, already improves the estimation considerably. Additionally, collisions in the estimation can be used to infer contact points between the fingers and the object (see Fig. 4.2b). This allows to predict displacements of the object from motions of the fingers. Chapter 3 described how joint velocities can be related to object displacements, using the hand Jacobian and grasp matrices. Figure 4.2c illustrates how the rotation of the object is inferred from the finger motions.

Contact Detection

Contacts between the object and the finger links can be determined through a number of different sensor modalities. These range from joint torque measurements, which can be used to indirectly infer contacts, to tactile sensors, integrated in the skin of the robotic hand, which provide the precise location of the contact point and even the direction and magnitude of the applied forces. In any case, w.r.t. the in-hand localization problem, the detection of a contact allows to further constrain the estimated grasp state. For instance, if a contact is detected on the tip of the right-side finger in Fig. 4.3a, the estimated object and finger poses can be corrected to satisfy


Fig. 4.3 The detection or inference of contacts between the object and the fingers allows to further constrain the pose, thereby improving the estimation. a Sensor measurements identified a contact on the finger link that is marked in red. b The pose of the object is corrected in order to align it with the finger

this constraint, as depicted in Fig. 4.3b. Moreover, the detection of additional contacts will improve the prediction of the object motion from the joint velocities, as outlined before.

Visual Information

As with the contact detection, visual information may be provided and utilized in various ways to inform the estimated grasp state. A fiducial marker that is placed on the object and observed by a camera can be used as a set of image coordinates that describe its pose. Knowledge about the contour of the object allows characteristic features to be detected, even if large parts of the object are occluded. And the combination of RGB and depth images makes it possible to independently track the object and probabilistically fuse the output of this method.
Even sparse visual information can help to constrain the grasp state estimation when combined with kinematic and contact measurements. The principal error that remains in the illustrated example, after considering the previous measurements, is along the larger dimension of the object, since this direction is not constrained by the finger measurements. However, the detection of a single visual feature, such as the upper-right corner, as illustrated in Fig. 4.4a, allows the object to be aligned. Here, the pixel coordinates of the detected feature are related to its estimated position (Fig. 4.4b) and subsequently corrected to resolve the difference. Figure 4.4c shows the final estimation.


Fig. 4.4 Even sparse visual information allows to further improve the estimation quality. Individual features can be used to align the camera view of an object and the estimation, similar to the contact points. a A characteristic object feature is extracted from the camera view of the grasp. b The position of the same feature is identified at the estimated object pose. c By correcting the difference between the two positions, the object estimation is aligned with the real pose

4.1.2 Problem Statement

The illustrated example demonstrates how different sensor modalities can be utilized and combined in order to inform the grasp state estimation. Formally, this grasp state is described by the following quantities:
• The pose of the object
• Errors in the joint measurements of the fingers
• Contact positions on the surface of the object and the finger links
Subsequently, the goal of the in-hand localization method is to determine these quantities. As presented, the posed problem can be divided into three components:
• Contact detection
• State estimation from kinematic data
• Fusion with visual data
The integration of each of these components shall be based on specific assumptions and available data:

Contact Detection

Contacts can be detected or inferred through various sensor modalities. Determining the contact points based on finger position measurements, as illustrated in Fig. 4.2b, assumes the availability of the following information:
• (Inaccurate) measurements of the joint positions of the fingers
• An (inaccurate) initial estimate of the pose of the object


Fig. 4.5 Illustration of the inputs that are used in the proposed method. The estimation from finger measurements relies on the availability of the hand and object geometries, as well as an initial object pose, and incorporates joint position measurements. Additionally, joint torque measurements or tactile sensors allow to detect contacts. Finally, a camera, which is mounted to the head of the robot, provides views of the scene. a 3D visualization of the available information at the beginning of the estimation. b The corresponding view of a head-mounted RGB camera

• That the object is rigid and of known geometry
• That the finger links are rigid and of known geometry
The initial object pose can potentially be provided by a classical vision system. During the manipulation, occlusions of the object by the hand make it very challenging to localize purely from vision. However, before the hand approaches the object, this is not yet a problem. The poses of the finger links are calculated from the joint angles, using a kinematic description of the fingers, as described in Chap. 3. Figure 4.5a depicts a 3D visualization of the information and inputs which are required for the initialization of the estimation.
Additionally, contacts between the fingers and the object can be sensed. The considered means of contact detection shall include:
• (Inaccurate) measurements of the joint torques of the fingers
• (Inaccurate) measurements from tactile or contact sensors

State Estimation from Kinematic Data

Using the determined contact points, the goal of the second component is to estimate the object pose and errors in the joint positions, such that the grasp state is consistent with the measurements. Additionally, the method shall provide a measure of the uncertainty of the estimation.


Fusion with Visual Data

In order to improve the grasp state, visual data shall be probabilistically fused into the estimation. For this, three types of visual information are considered:
• Detection of fiducial markers attached to the object
• Contour features extracted from monocular camera images
• The estimated object pose from a visual object tracker
Figure 4.5b shows an exemplary view from a head-mounted camera, from which this visual information can be extracted.

4.2 Probabilistic Grasp State Estimation

Since the grasp state cannot be observed directly, it has to be inferred from sensor data. However, all sensor measurements are corrupted by noise or other inaccuracies. Probabilistic state estimation is a means of recovering the state by explicitly representing uncertainties [1]. Following an introduction of the fundamentals of probabilistic filtering, this section presents two popular approaches and discusses them in the context of the in-hand localization problem: first, a particle filter solution, as an example of a non-parametric filter, and second, an extended Kalman filter (EKF), which is a variant of a Gaussian filter. Finally, a discussion of their respective advantages and disadvantages in the context of the grasp state estimation yields the EKF as the preferable choice for this work.

4.2.1 Fundamentals

In probabilistic robotics, state estimation describes the process of indirectly inferring relevant quantities of a system from sensor data, which may only provide partial information about the state and is subject to inaccuracies and noise. The state ŷ at time t is denoted ŷ_t. In order to explicitly consider the uncertainty in the sensor measurements, probabilistic algorithms calculate belief distributions over possible states, instead of computing just a best guess. Hereby, the belief bel represents the internal knowledge about the state. Formally, the belief describes the posterior probability over the state conditioned on the sensor data. The base algorithm for calculating the belief is the Bayes filter. It estimates the evolution of the state from measurement and control data. The Bayes filter is a recursive algorithm, which computes the belief at time t, bel(ŷ_t), from the prior belief at time t − 1, bel(ŷ_{t−1}). Therefore, an initial belief distribution at time t = 0, bel(ŷ_0), has to be provided.


Starting from the initial belief, the Bayes filter recursively estimates the state in two steps. In the first step, the predictive belief, \(\overline{bel}(ŷ_t)\), is calculated from the previous belief, bel(ŷ_{t−1}), and the current control data, u_t:
\[ \overline{bel}(\hat{\boldsymbol{y}}_t) = \int p(\hat{\boldsymbol{y}}_t \mid \hat{\boldsymbol{y}}_{t-1}, \boldsymbol{u}_t)\, bel(\hat{\boldsymbol{y}}_{t-1})\, d\hat{\boldsymbol{y}} \tag{4.1} \]

Controls represent actions executed by a robot, which actively change the state. The conditional probability distribution p(ŷ_t | ŷ_{t−1}, u_t) is called the state transition probability. It describes the probability of the system being at state ŷ_t, under the condition that the previous state was ŷ_{t−1} and the most recent control inputs were u_t. Therefore, it represents the motion model of the system. The calculation of \(\overline{bel}(ŷ_t)\) is called the prediction step, or just prediction.
The second step of the Bayes filter consists of the measurement update, or update step, of the state belief. Here, the posterior belief, bel(ŷ_t), is calculated from the predictive belief, \(\overline{bel}(ŷ_t)\), and the measurement data, z_t:
\[ bel(\hat{\boldsymbol{y}}_t) = \eta_t\, p(\boldsymbol{z}_t \mid \hat{\boldsymbol{y}}_t)\, \overline{bel}(\hat{\boldsymbol{y}}_t) \tag{4.2} \]

The measurement probability p(z t | ˆyt ) describes the probability of observing a measurement z t under the condition that the state is ˆyt . It represents the measurement model. The scalar ηt denotes a normalizing constant. The Bayes filter is the most general algorithm for probabilistic state estimation. However, it is not a practical algorithm. As stated, the filter could only be computed for very simple problems. Derived from the Bayes filter, there exist a number of practical filter implementations. Relying on different assumptions, these algorithms vary in their way of representing the belief and describing the motion and measurement models. In the following, the two most popular variants of the Bayes filter, the particle filter and the extended Kalman filter, are presented.

4.2.2 Particle Filter

The particle filter is a discrete, non-parametric implementation of the Bayes filter algorithm. It describes the belief by a finite set of state samples, called particles. Each of these particles, y_t^{[m_p]}, represents a concrete hypothesis of the state at time t. The set of all M_p particles is denoted by Y_t:
\[ \mathcal{Y}_t = \{\boldsymbol{y}_t^{[1]}, \boldsymbol{y}_t^{[2]}, \cdots, \boldsymbol{y}_t^{[M_p]}\} \tag{4.3} \]

The particles are drawn from the posterior distribution in order to approximate the belief. Starting from an initial set of particles, Y0 , the particle filter recursively estimates the state from measurement and control data. This consists of three steps,


beginning with the prediction step. Here, for each particle in the set, the predicted state, y_t^{[m_p]}, is generated based on the control data, u_t:
\[ \boldsymbol{y}_t^{[m_p]} \sim p(\hat{\boldsymbol{y}}_t \mid \boldsymbol{y}_{t-1}^{[m_p]}, \boldsymbol{u}_t) \tag{4.4} \]

The predicted state is sampled from the state transition probability, which is described by the motion model of the system.
In the second step, for each particle, a scalar weight, w_t^{[m_p]}, is calculated from the predicted state and the measurement data, z_t:
\[ w_t^{[m_p]} = p(\boldsymbol{z}_t \mid \boldsymbol{y}_t^{[m_p]}) \tag{4.5} \]

These weights represent the probability that a given measurement was made at the sampled states. Using the measurement probability, p(z_t | y_t^{[m_p]}), the weighted particles describe the posterior belief, bel(ŷ_t). The final step of the particle filter is the re-sampling, in which a new set of particles is drawn from the old set according to the previously calculated weights. Since the size of the set remains the same, particles with high weights are likely to be drawn multiple times, while particles with low weights are discarded. Consequently, the distribution of particles in the new set represents the posterior belief.
Because of the discrete sampling, particle filters are able to represent any belief distribution. In contrast to other implementations of the Bayes filter, this allows to describe separate, distinct hypotheses of the state, such as in multi-modal distributions. The accuracy of the particle filter depends on the number of samples. The computational complexity of the algorithm grows with the size of the set. Furthermore, because of the nature of the algorithm, approximation errors and the effect of random sampling may impair the convergence behavior of the filter.
The development of an in-hand localization method based on a particle filter was the topic of the diploma thesis of the author of this manuscript [2], as well as a subsequent publication in [3]. In the proposed framework, each sample of the filter represents a concrete hypothesis of the 6 DoF pose of the object, which is weighted based on its consistency with the finger positions. Subsequently, the particles are re-sampled based on their weights. Iteratively, the estimation converges to the object pose which best explains the finger measurements. An overview of the developed algorithm is presented in Fig. 4.6. Practically, the method was limited by the computational effort of the calculation of the particle weights, which involves determining the distances between the object and all finger links for each hypothesis of the object pose. Therefore, only a relatively small number of particles could be used, while still meeting the real-time requirements of the framework. In turn, the particle filter implementation suffered from a slow convergence rate, making it impracticable for most applications, in particular those involving fast in-hand motions.
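The following sketch shows one prediction-weighting-resampling cycle of Eqs. (4.3)–(4.5) for a generic, toy 1-D state with additive motion and a direct noisy measurement. It is only meant to illustrate the structure of the algorithm, not the grasp-specific weighting of [2, 3]; all noise parameters are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, u, z, motion_noise=0.05, meas_noise=0.1):
    """One cycle of prediction (Eq. 4.4), weighting (Eq. 4.5) and re-sampling."""
    # Prediction: sample each particle from the motion model
    predicted = particles + u + rng.normal(0.0, motion_noise, size=particles.shape)
    # Weighting: likelihood of the measurement under each hypothesis
    weights = np.exp(-0.5 * ((z - predicted) / meas_noise) ** 2)
    weights /= np.sum(weights)
    # Re-sampling: draw a new set according to the weights
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]

particles = rng.normal(0.0, 1.0, size=500)            # initial belief
for u, z in [(0.1, 0.12), (0.1, 0.25), (0.1, 0.33)]:  # controls and measurements
    particles = particle_filter_step(particles, u, z)
print(particles.mean())                                # converges towards the measurements
```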


Fig. 4.6 Illustration of the main steps of the particle filter framework, which was proposed in [2]. a Each particle in the filter represents one hypothesis of the 6 DoF pose of the object. b The first set of particles is randomly sampled around the initial assumption of the object pose, marked by the yellow square. c In the prediction step of the filter, the displacement of the object is inferred from the motion of the fingers. d The effect of the prediction is calculated for each particle in the set. e The weight of a particle represents the consistency of the object pose with the finger measurements. f Poses, which are heavily colliding with the fingers are assigned small weights, while collision-free particles are weighted highly. g The particle with the highest weight represents the current best estimate of the object pose. h In the re-sampling step, particles are redistributed according to their weights

4.2.3 Extended Kalman Filter

The most widely used implementation of the Bayes filter is the extended Kalman filter (EKF). It represents the belief as a multivariate Gaussian distribution, which is characterized by its first two moments, the mean, y_t, and the covariance, P_t. The EKF framework assumes that the evolution of the actual state, ŷ_t, can be described by a motion model with additive Gaussian noise:
\[ \hat{\boldsymbol{y}}_t = f(\hat{\boldsymbol{y}}_{t-1}, \boldsymbol{u}_t) + \boldsymbol{\epsilon}_t \tag{4.6} \]

Here, f is the motion model, which depends on the previous state, ŷ_{t−1}, and the most recent control inputs, u_t, and ε_t is the zero-mean Gaussian motion disturbance. This formulation approximates the state transition probability, p(ŷ_t | ŷ_{t−1}, u_t), as a Gaussian distribution. Similarly, by expressing measurements as the sum of a measurement model and Gaussian noise, the measurement probability, p(z_t | ŷ_t), is approximated by a Gaussian:
\[ \boldsymbol{z}_t = h(\hat{\boldsymbol{y}}_t) + \boldsymbol{\delta}_t \tag{4.7} \]

with h being the state-dependent measurement model and δ_t being the Gaussian measurement disturbance with zero mean. For systems where f and h are linear functions in their arguments, the algorithm reduces to the original implementation of the Kalman filter. The extended Kalman filter was developed as an extension of the linear variant, specifically to make it applicable to non-linear Gaussian systems.
Based on the previous assumptions, the extended Kalman filter algorithm is computed recursively in two steps. Figure 4.7 provides an overview of the method. The estimation starts from the initial belief about the state, represented by the mean y_0 and the covariance P_0. In the prediction step of the EKF, the predicted mean, ȳ_t, and covariance, P̄_t, are computed from the previous moments, y_{t−1} and P_{t−1}, and the controls u_t, using the motion model f:


Fig. 4.7 Diagram of the main components of an extended Kalman filter

\[ \bar{\boldsymbol{y}}_t = f(\boldsymbol{y}_{t-1}, \boldsymbol{u}_t) \tag{4.8} \]
\[ \bar{\boldsymbol{P}}_t = \boldsymbol{F}_t\boldsymbol{P}_{t-1}\boldsymbol{F}_t^T + \boldsymbol{R}_t \tag{4.9} \]

Here, R_t is the covariance of the motion disturbance, ε_t, and F_t is the partial derivative of f:
\[ \boldsymbol{F}_t = \left.\frac{\partial f}{\partial \boldsymbol{y}}\right|_{\boldsymbol{y}_{t-1},\, \boldsymbol{u}_t} \tag{4.10} \]

In the update step, the posterior mean, y_t, and covariance, P_t, are corrected by incorporating the measurements z_t:
\[ \boldsymbol{y}_t = \bar{\boldsymbol{y}}_t + \boldsymbol{K}_t\left(\boldsymbol{z}_t - h(\bar{\boldsymbol{y}}_t)\right) \tag{4.11} \]
\[ \boldsymbol{P}_t = (\boldsymbol{I} - \boldsymbol{K}_t\boldsymbol{H}_t)\bar{\boldsymbol{P}}_t \tag{4.12} \]

where H_t is the partial derivative of h:
\[ \boldsymbol{H}_t = \left.\frac{\partial h}{\partial \boldsymbol{y}}\right|_{\bar{\boldsymbol{y}}_t} \tag{4.13} \]

The Kalman gain, K_t, represents the degree to which the measurements are incorporated and is calculated as follows:
\[ \boldsymbol{K}_t = \bar{\boldsymbol{P}}_t\boldsymbol{H}_t^T\left(\boldsymbol{H}_t\bar{\boldsymbol{P}}_t\boldsymbol{H}_t^T + \boldsymbol{Q}_t\right)^{-1} \tag{4.14} \]

with Q t being the covariance of the measurement disturbance, δ t . Because of the analytical representation of the belief and the models, the extended Kalman filter is computationally very efficient, in particular in comparison to discrete


filter variants. However, the quality of the estimation depends on approximation errors that are introduced by the linearization of the models. Furthermore, describing the belief by a unimodal Gaussian distribution may be an insufficient representation of the probability distribution. To address these limitations, several extensions to the EKF have been proposed, such as the unscented Kalman filter [4], which employs a more accurate linearization technique, or the multi-hypothesis filter, which is able to track a mixture of Gaussians.
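A generic prediction/update cycle of Eqs. (4.8)–(4.14) is sketched below for a small linear toy system, so that F_t and H_t are constant; the grasp-specific motion and measurement models are developed in the following sections. All models and noise values are hypothetical.

```python
import numpy as np

def ekf_step(y, P, u, z, f, h, F, H, R, Q):
    """One EKF cycle: prediction (Eqs. 4.8/4.9) and measurement update (Eqs. 4.11-4.14)."""
    # Prediction
    y_bar = f(y, u)
    P_bar = F @ P @ F.T + R
    # Update
    K = P_bar @ H.T @ np.linalg.inv(H @ P_bar @ H.T + Q)
    y_new = y_bar + K @ (z - h(y_bar))
    P_new = (np.eye(len(y)) - K @ H) @ P_bar
    return y_new, P_new

# Toy 2-D constant-velocity system: only the position is measured
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
f = lambda y, u: F @ y
h = lambda y: H @ y
R = 1e-3 * np.eye(2)          # motion noise covariance
Q = 1e-2 * np.eye(1)          # measurement noise covariance

y, P = np.zeros(2), np.eye(2)
for z in [0.11, 0.19, 0.32, 0.41]:
    y, P = ekf_step(y, P, None, np.array([z]), f, h, F, H, R, Q)
print(y)                       # estimated position and velocity
```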

4.2.4 Filter Selection

During the initial development of an in-hand localization method in [2], a number of considerations led to the selection of the particle filter as the basis of the implementation. In [5], a comparison of a number of different filter techniques was presented, including the particle filter and extended Kalman filter. A classification of their most important characteristics is shown in Table 4.1. Considering the qualities of the particle filter, it was the clear choice for the developed method. However, practically, the limited efficiency of this approach greatly restricted its applicability. For the grasp state estimation, the computation of the measurement update involves the execution of a number of geometric distance calculations. Since these have to be evaluated for each of the state samples individually, the practical number of particles is severely limited.
The EKF avoids this problem by describing the state by its first two moments. Consequently, the measurement model only has to be evaluated once per time step, which significantly reduces the necessary computations. For complex measurement models, this makes the standard implementation of the EKF also preferable over other variants, such as the unscented Kalman filter, which relies on a small number of state samples. In addition, the probabilistic representation of the uncertainty of the state and measurements using the covariance enables the gradient-based formulation of the update step. This means that the estimation converges more efficiently towards the most

Table 4.1 Comparison of good (+), bad (−) and neutral (/) properties of the particle filter and the extended Kalman filter according to [5]. The main advantage of the EKF is its computational efficiency

| | Particle filter | Extended Kalman filter |
|----------------|-----------------|------------------------|
| Distribution | Discrete | Unimodal |
| Accuracy | + | + |
| Robustness | + | / |
| Sensor variety | + | − |
| Efficiency | / | + |
| Implementation | + | / |


likely state, compared to the particle filter, which relies on random sampling. For these reasons, the extended Kalman filter was chosen for the realization of the proposed grasp state estimation method.

4.3 Contact Detection and Localization In order for the extended Kalman filter to be able to estimate the most likely grasp state, in each step, it is first necessary to describe the contact configuration between the hand and the object. Section 4.1 conceptually illustrated how various sensor measurements shall be used to inform the estimation. From this, contacts can be divided into two categories: collision contacts and sensed contacts. While collision contacts are inferred from areas of penetration between a finger link and the estimated object, sensed contacts are directly measured through additional sensing modalities, such as torque measurements or tactile sensors. In addition to the detection of contacts, the full description of the contact configuration requires locating the positions of the contact points, both on the surface of the object and on the respective finger links. This section presents how these different aspects of the contact state detection are realized.

4.3.1 Collision Detection Figure 4.8 illustrates the estimated grasp state between the hand and an object. When describing potential contacts between the finger links and the object, two types have to be distinguished. The contact may either exist for two colliding bodies, which

Fig. 4.8 A contact of index $i$ between a finger link and the object is characterized by the points $c_o^{[i]}$ and $c_f^{[i]}$, which lie on the surfaces of the two bodies. While, in the colliding case (left), $c_o^{[i]}$ and $c_f^{[i]}$ describe the location of the deepest penetration, in the non-colliding case (right), they represent the closest points on the two surfaces


penetrate each other, as shown on the left. In the other case, there is a contact between a pair of bodies, which are not in collision, as depicted on the right.
Geometrically, a contact of index $i$ is described by the positions $c_o^{[i]} \in \mathbb{R}^3$ and $c_f^{[i]} \in \mathbb{R}^3$ on the surface of the object and the finger link, respectively. For the non-colliding case, $c_o^{[i]}$ and $c_f^{[i]}$ shall be defined as the points of the smallest distance between the two bodies. For the colliding case, $c_o^{[i]}$ and $c_f^{[i]}$ characterize the penetration depth, i.e. the vector $c_f^{[i]} - c_o^{[i]}$ represents the smallest possible object displacement by which the collision would be resolved. The scalar $d^{[i]} \in \mathbb{R}$ describes the smallest distance or penetration depth of the two bodies, having a negative sign in the colliding case:

$d^{[i]} = \begin{cases} \|c_f^{[i]} - c_o^{[i]}\| & \text{if not colliding} \\ -\|c_f^{[i]} - c_o^{[i]}\| & \text{if colliding} \end{cases}$  (4.15)

Collision Contacts For any collision that is detected between the object and a finger link, a collision contact is added to the description of the contact configuration. Subsequently, the state estimation will be updated in order to resolve this collision, since it is not physically possible for two rigid bodies to be in penetration. Mathematically, this can be expressed by the constraint d [i] ≥ 0. As soon as the two bodies are no longer colliding, the contact will be removed from the estimation. Collision contacts can be inferred solely based on kinematic data, i.e. finger position measurements and geometric descriptions of the hand and the object.

Sensed Contacts Sensed contacts are determined from additional sensor measurements. Most directly, this is possible through tactile contact sensors, which are embedded in the skin of the robotic hand and able to detect whether a finger link is touching the object. In contrast to collision contacts, sensed contacts are added to the contact configuration both in the colliding and the non-colliding case. In both cases, the state estimation aims to align the two bodies, either by resolving a collision or by moving the finger link and the object closer together, subsequently satisfying the constraint d [i] = 0.
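As an illustration of this bookkeeping, the following Python sketch shows one possible representation of the contact configuration and the signed distance of Eq. (4.15). The Contact class and the structure of the per-link query results are assumptions made for this example, not part of the original implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Contact:
    c_o: np.ndarray   # contact point on the object surface
    c_f: np.ndarray   # contact point on the finger link
    sensed: bool      # True for sensed contacts, False for collision contacts

def signed_distance(c_o, c_f, colliding):
    """Signed distance d[i] of Eq. (4.15): negative when the bodies penetrate."""
    d = np.linalg.norm(np.asarray(c_f) - np.asarray(c_o))
    return -d if colliding else d

def collect_contacts(link_queries):
    """Build the contact configuration from per-link distance queries.
    'link_queries' is assumed to be a list of (c_o, c_f, colliding, sensed) tuples,
    e.g. produced by the GJK-based queries of Sect. 4.3.3."""
    contacts = []
    for c_o, c_f, colliding, sensed in link_queries:
        # collision contacts are only kept while the bodies penetrate,
        # sensed contacts are kept in both cases
        if sensed or colliding:
            contacts.append(Contact(np.asarray(c_o), np.asarray(c_f), sensed))
    return contacts
```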

4.3.2 Joint Torque Measurements For torque controlled robotic hands, joint torque measurements represent an easily accessible sensor modality, from which sensed contacts can be inferred, without the need for additional hardware extensions. However, the torque measurements are


Fig. 4.9 Because of joint friction, it may not be possible to infer the location of a contact from joint torque measurements alone. In the illustrated examples, measured torques that are bigger than the estimated joint friction, $\tau_f^{[j]}$, are highlighted in red. a Depending on the contact force, $f_c$, only $\tau^{[1]}$ may be greater than the joint friction torque, while $\tau^{[2]}$ and $\tau^{[3]}$ are smaller. b The same may be the case in this configuration, making it impossible to reliably distinguish the two based on torque measurements alone

affected by joint friction. Therefore, only measured torques that are greater than the estimated joint friction, $\tau_f^{[j]}$, should be considered when inferring contacts. This can make it difficult to precisely locate a contact on the finger. Figure 4.9a illustrates a scenario in which an object is in contact with the distal phalanx of a finger, applying a force through the contact. In turn, this results in a measurable torque at the joints of the finger. Depending on the magnitude of the force, only the torque that is applied to the metacarpophalangeal joint ($\tau^{[1]}$) may be above the joint friction threshold. However, this would also be the case if the object was in contact with the proximal instead of the distal phalanx, as depicted in Fig. 4.9b. This example demonstrates that it is not generally possible to locate the finger link that is in contact with the object solely from torque measurements, which are affected by joint friction. However, given the current best estimate of the grasp state, it is possible to make an educated guess about the contact location. If a set of torque measurements does not constrain the contact to one link, the most likely explanation is that the contact lies on the link that is closest to the object at its currently estimated pose. Figure 4.10 depicts several different combinations. Similar to the ambiguity of the contact point location, it is not possible to reliably identify multiple contacts on one finger based on torque measurements. Therefore, for the purpose of robustness, not more than one sensed contact per finger is added from this modality.
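A minimal Python sketch of this heuristic is given below, assuming one link per joint, both ordered from proximal to distal; the function name and its inputs are chosen for illustration only.

```python
def infer_contact_link(tau_meas, tau_friction, link_distances):
    """Heuristic contact-link selection from joint torques (cf. Sect. 4.3.2).

    tau_meas       : measured joint torques of one finger, proximal to distal
    tau_friction   : estimated friction torque threshold per joint
    link_distances : current distance of each link to the estimated object

    A torque above the friction threshold at joint j implies a contact on
    link j or a more distal one; among those candidates, the link that is
    closest to the currently estimated object pose is selected.
    """
    active = [j for j, (t, t_f) in enumerate(zip(tau_meas, tau_friction))
              if abs(t) > t_f]
    if not active:
        return None                       # no sensed contact on this finger
    first_candidate = max(active)         # most distal joint above the threshold
    candidates = range(first_candidate, len(link_distances))
    return min(candidates, key=lambda j: link_distances[j])
```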

4.3.3 Contact Point Localization As was illustrated, the determination of the collision state between two geometric bodies is essential in order to fully identify the contact configuration between the hand and the object. This problem arises also in a range of other applications, such


Fig. 4.10 Considering the current estimate of the object pose, the likeliest contact location, which is able to explain the torque measurements, is chosen. The selected finger link is highlighted in red. a Assuming this object pose, it is most likely that the contact is located on the distal phalanx. b Similarly, for this estimate of the object location, the proximal phalanx is inferred to be in contact. c If all three measured torques are greater than the joint friction, only a contact on the distal phalanx explains the measurements. d Note that this would still be the case, even if the current estimate of the object pose is closer to another phalanx

as physics engines. Therefore, a number of potential methods have been proposed in the past. Additionally, as part of the grasp state estimation, the employed collision algorithm also has to be able to provide the location of the contact points, $c_o^{[i]}$ and $c_f^{[i]}$, as they were previously introduced.
The Gilbert–Johnson–Keerthi (GJK) distance algorithm is a widely used method for determining the collision state of two bodies [6]. It is popular for its computational efficiency and stability. Furthermore, it is able to compute the Euclidean distance between two non-intersecting bodies. The GJK algorithm, as presented in [7], was the basis for the method that was used in this work. However, the algorithm in [7] is only able to provide upper and lower bounds for the penetration depth of two colliding bodies. Therefore, the method had to be extended to produce the exact penetration depth and associated contact points.
The GJK algorithm iteratively processes two convex sets, determining their minimum distance. For the grasp state estimation, these sets consist of polygon meshes representing the geometry of the finger links and the object. The GJK algorithm reduces the complexity of computing the distance between two shapes to finding the distance between one shape and the origin. This single set, called the configuration space obstacle (CSO), is constructed by computing the Minkowski difference of the original two sets. Figure 4.11 illustrates the mapping, both in the intersecting and


Fig. 4.11 A−B describes the Minkowski difference of two convex sets, A and B, such as the polygon meshes of two bodies. The location of the origin point w.r.t. this set relates to the collision state of the two geometries. a If the bodies A and B are not colliding, the origin lies outside of A−B. b If the bodies A and B are colliding, the origin lies inside of A−B

the non-intersecting case. If the two bodies are intersecting, the origin lies inside of the CSO, otherwise it lies outside.
In order to determine the collision state, the goal of the GJK algorithm is to iteratively construct a simplex of vertices inside of the CSO, which contains the origin. In 2D space, this will require a triangle of vertices. In contrast, for 3D problems, the simplex will be a tetrahedron. If it is not possible to enclose the origin point, the two sets are not in collision. Once the collision state has been determined, the simplex is further developed in order to find the distance or penetration depth. For 3D geometries this consists of finding the vertex, edge or face of the CSO that is closest to the origin. For the non-intersecting case, this was described in [7] and is shown in Figure 4.12. For the intersecting case, the algorithm had to be extended as part of this work in order to provide this capability. It involves efficiently growing the simplex inside of the CSO until the closest surface element to the origin is identified. Figure 4.13 depicts the process.


Fig. 4.12 In order to determine the collision state, the goal of the GJK algorithm is to grow a simplex, which contains the origin. In the non-intersecting case, this will not be possible, since the origin lies outside of the geometry. a Starting from a random vertex (v o ), the simplex is grown towards the origin point, selecting the vertex (v 1 ), which lies furthest in its direction. b If the simplex can no longer be grown beyond the origin, before enclosing it, it must lie outside of the geometry. c By further evolving the simplex towards the origin, eventually, the closest point on the surface (c) can be identified


Fig. 4.13 If the two bodies are colliding, the origin lies inside of the geometry. Therefore, it has to be possible to enclose it in a simplex of vertices. For the purpose of this work, this process was extended in order to identify the exact position of the deepest penetration. a As before, the simplex is grown in the direction of the origin, beginning from a randomly selected vertex (v o ). b In the colliding case, the origin will eventually be contained in a simplex (v o , v 1 , v 2 ). c By further growing the simplex towards the surface, it is possible to identify the closest element (i.e. face, edge or vertex) to the origin, as well as the exact point on this element (c)

The vertex, edge or face of the CSO that is closest to the origin is subsequently mapped to the corresponding simplices in the original two body geometries. Finally, the precise contact points on the object and finger link, $c_o^{[i]}$ and $c_f^{[i]}$, can be calculated.
As noted, the GJK algorithm is only applicable for convex bodies. While this restriction is generally acceptable for finger links, requiring the object to be convex considerably limits the applicability of the method. However, non-convex geometries can be supported by decomposing their meshes into convex sub-components. Subsequently, the distance calculation is computed for each component. This allows identifying the component with the smallest distance or deepest penetration, which is equivalent to the distance of the complete object.
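The following Python sketch illustrates how such a convex decomposition can be queried. Here, gjk_query stands in for a GJK-based distance routine returning the signed distance and contact points of Eq. (4.15); it is an assumed interface rather than the implementation used in this work.

```python
def object_distance(link_mesh, convex_parts, gjk_query):
    """Distance or penetration query of a finger link against a non-convex object.

    'convex_parts' are the convex sub-components of the decomposed object mesh.
    For each component the GJK-based query returns (d, c_o, c_f); the component
    with the smallest (or most negative) distance represents the whole object.
    """
    best = None
    for part in convex_parts:
        d, c_o, c_f = gjk_query(part, link_mesh)
        if best is None or d < best[0]:
            best = (d, c_o, c_f)
    return best
```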

4.4 State Estimation from Finger Position Measurements Section 4.2 introduced the extended Kalman filter as a method for probabilistic state estimation. Here, the design of such a filter in the context of grasping is presented. The minimum amount of information that is necessary for the grasp state estimation is finger position measurements. This section defines the grasp state and proposes motion and measurement models for the integration of this data in the filter. Moreover, it presents extensions, which further expand the applicability of the system. Building on this framework, the subsequent sections describe the data fusion with additional sensing modalities, in particular vision, in order to further improve the quality of the estimation.


4.4.1 Grasp State Definition The EKF represents the belief distribution over the estimated state by its mean and covariance. In the context of the grasp state estimation, at time $t$, these shall be denoted $y_t$ and $P_t$, respectively. The estimated state of the grasp is defined by two components. The object pose, $x \in \mathbb{R}^6$, comprises the translational position as well as three Euler angles. The second component is the vector of the joint position biases, $\tilde{q}_t \in \mathbb{R}^m$, which describes the estimated error on the joint position measurements. The complete state vector, $y \in \mathbb{R}^{6+m}$, is expressed as follows:

$y_t = \begin{pmatrix} x_t \\ \tilde{q}_t \end{pmatrix}$  (4.16)

Both the mean and covariance of the state at time $t = 0$ have to be provided in order to initialize the EKF. Typically, in robotics applications, the initial object pose, $x_0$, can be obtained by a visual object detection system. Before the object is being grasped, it is not yet occluded by the hand. Therefore, traditional vision-based systems can be applied to determine the initial object location. In this case, the initial covariance would be set to account for the expected inaccuracy of the used system. The initial joint biases, $\tilde{q}_0$, are typically set to zero if there is no additional information available about the errors on the joint position measurements. The corresponding covariance is initialized according to the estimated inaccuracy of the measurements, which may vary depending on the robotic hand or even individual joints.
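A possible initialization of the belief, written as a small Python sketch under the assumption of uncorrelated initial uncertainties, could look as follows; the standard deviations are free parameters that have to reflect the quality of the detector and the joint sensors.

```python
import numpy as np

def initialize_grasp_state(x0, m, sigma_pose, sigma_bias):
    """Initial belief of the grasp state, Eq. (4.16).

    x0         : initial object pose from an external detector (6 DoF)
    m          : number of hand joints
    sigma_pose : expected standard deviation of the initial pose (per DoF)
    sigma_bias : expected standard deviation of the joint position errors
    """
    y0 = np.concatenate([np.asarray(x0, dtype=float), np.zeros(m)])  # biases start at zero
    P0 = np.diag(np.concatenate([np.full(6, sigma_pose**2),
                                 np.full(m, sigma_bias**2)]))
    return y0, P0
```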

4.4.2 Motion Model In each iteration, the first step of the EKF is the prediction. Here, the control vector, ut , is applied to the previously estimated mean, yt−1 , and covariance, P t−1 , in order to obtain the predicted state, yt and P t . As elaborated in Chap. 3, the displacement of the object can be predicted from the motion of the fingers if the contact points are known. Therefore, the vector of joint velocities, q˙ t , is considered as the control input: ut = q˙ t

(4.17)

If the joint velocities cannot be measured directly, they are calculated from the discrete difference of $q_t$ and $q_{t-1}$:

$\dot{q}_t = \dfrac{q_t - q_{t-1}}{\Delta t}$  (4.18)


The motion model f describes how to obtain yt from yt−1 and ut : yt = f ( yt−1 , ut )

(4.19)

Chapter 3 introduced the hand Jacobian matrix, $J$, for hard-finger contacts, which relates joint velocities to contact point velocities. Using this mapping, for each of the detected contact points, an approximate displacement can be predicted:

$\Delta c_t = J \dot{q}_t \Delta t$  (4.20)

with $\Delta t$ being the time between two steps. Similarly, the grasp matrix, $G$, relates the contact point motion to the object twist. However, not all contacts should be considered in this calculation. Section 4.3 introduced two types of contacts, collision contacts and sensed contacts. While sensed contacts are indicative of physically existing contacts between the hand and the object, collision contacts merely identify inconsistencies between the estimated positions of the object and the fingers. Therefore, as part of the prediction step, collision contacts should only be considered if the respective finger moves deeper into the object. If the finger moves away from the object, thereby tending to resolve the collision, no object motion should be predicted. Figures 4.14 and 4.15 illustrate both cases for the two types of contacts.


Fig. 4.14 When predicting the motion of the object from the displacement of a collision contact, this contact should only be considered if it moves towards the object, thereby further penetrating the object. In the opposite direction, this type of contact should be neglected. a $c^{[1]}$ marks a collision contact that has been identified between the object and a finger link. b If the finger moves towards the object, predicting a corresponding object displacement prevents the two bodies from penetrating. c However, if the finger is removed from the object, no displacement is predicted, since the two bodies may not be physically in contact


Fig. 4.15 A sensed contact represents a point, where the object and a finger link are known to be touching. Therefore, an object displacement, which corresponds to the motion of the contact point, should be predicted in all cases, both towards and away from the object. a Here, c[1] marks a sensed contact that has been measured between the object and a finger link. b Similar to the collision contact, moving the finger towards the object results in a predicted object motion. c However, in contrast to before, when moving the finger away, the object pose is predicted to maintain the contact, since the two bodies are known to be in contact

Analytically, these cases can be distinguished by the direction of the contact point motion w.r.t. the surface normal direction of the finger link:

$\alpha^{[i]} = \Delta c_t^{[i]T} n^{[i]}$  (4.21)

where $n^{[i]}$ is the normal direction of the surface of the finger link at the position of contact $i$. According to this definition, only collision contacts for which $\alpha^{[i]} > 0$ should be considered in the prediction.
The number of all such collision contacts, together with all sensed contacts, shall be denoted $\hat{n}$. Similarly, the corresponding hand Jacobian and grasp matrices shall be denoted $\hat{J} \in \mathbb{R}^{3\hat{n} \times m}$ and $\hat{G} \in \mathbb{R}^{6 \times 3\hat{n}}$, respectively. Finally, the complete relation between the joint velocities and the change in the object pose can be expressed:

$\dot{x}_t = W \hat{G}^{T+} \hat{J} \dot{q}_t$  (4.22)

The second component of the state vector, the joint position biases, is not adjusted during the prediction step. Thus, the full motion model can be expressed as follows:

$f(y_{t-1}, u_t) = y_{t-1} + \begin{pmatrix} W \hat{G}^{T+} \hat{J} u_t \Delta t \\ 0^{m \times 1} \end{pmatrix}$  (4.23)


The predicted state covariance, $\bar{P}_t$, is computed according to Eq. (4.9):

$\bar{P}_t = F_t P_{t-1} F_t^T + R_t$  (4.24)

Matrix $F_t \in \mathbb{R}^{(6+m) \times (6+m)}$ is the partial derivative of $f$ w.r.t. $y$:

$F_t = \left.\dfrac{\partial f}{\partial y}\right|_{y_{t-1}, u_t}$  (4.25)

It can be obtained analytically. However, because of the complexity of the full expression, the utilization of computer-aided symbolic differentiation techniques is advisable. Using the hand Jacobian and grasp matrices for the prediction of the object motion implicitly makes the assumption that the grasp configuration, i.e. the positions of the contacts on the surface of the finger links and the object, is constant. However, because of the rolling and sliding of fingers on the object, this is inaccurate. Moreover, the calculation of discrete differences introduces approximation errors, which compound over time. Therefore, the size of the motion disturbance, Rt ∈ R6+m×6+m , has to be chosen appropriately to represent the growing uncertainty that arises from the prediction. Inconsistencies in the grasp state, such as collisions between finger links and the object, are not addressed by the prediction step. They are corrected in the update step, which is described next.
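The prediction step described above can be summarized in the following Python sketch. The per-contact fields (sensed, normal, rows) are assumptions about how the contact bookkeeping is organized, and the pseudoinverse is used for the mapping of Eq. (4.22); the sketch is not the exact implementation of this work.

```python
import numpy as np

def predict_object_motion(contacts, J, G, W, q_dot, dt):
    """Object displacement prediction, Eqs. (4.20)-(4.22).

    Each contact is assumed to provide: 'sensed' (bool), 'normal' (link surface
    normal n[i]) and 'rows' (indices of its three rows in J / columns in G).
    """
    dc = J @ q_dot * dt                        # contact displacements, Eq. (4.20)
    selected = []
    for c in contacts:
        # Eq. (4.21): collision contacts only count if the finger moves into the object
        if c.sensed or dc[c.rows] @ c.normal > 0:
            selected.append(c.rows)
    if not selected:
        return np.zeros(6)
    rows = np.concatenate(selected)
    J_hat, G_hat = J[rows, :], G[:, rows]
    x_dot = W @ np.linalg.pinv(G_hat.T) @ (J_hat @ q_dot)   # Eq. (4.22)
    return x_dot * dt                          # predicted change of the object pose
```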

4.4.3 Measurement Model The second step of the EKF is the update, in which a set of measurements $z_t$ is incorporated in order to obtain a corrected state estimation. For the grasp state estimation, the goal of the update step is to resolve incorrect alignments between the fingers and the object, as depicted in Fig. 4.16. On the one hand, these inconsistencies may arise from errors in the joint position measurements or the initial object pose. On the other hand, simplifications in the motion model may cause finger links to penetrate or separate from the object surface by neglecting the rolling and slipping of contacts. The measurement model shall describe the constraints that are inherent to contacts. The penetration of a finger link and the object is physically impossible if both are rigid. Moreover, if there is a sensed contact between a finger link and the object, the two bodies have to align.
Analytically, this constraint can be expressed using the contact points, $c_o^{[i]}$ and $c_f^{[i]}$, on the surfaces of the object and the link, respectively. If the two bodies are in fact aligned, the difference between the two points has to be zero:

$0^{3 \times 1} = c_f^{[i]} - c_o^{[i]}$  (4.26)


Fig. 4.16 Illustration of the two types of contacts in the context of the grasp state estimation. a The update step resolves the identified collision contact by minimizing the distance between $c_o^{[1]}$ and $c_f^{[1]}$. b For sensed contacts, both the collision of contact $c^{[1]}$ and the separation of contact $c^{[2]}$ are considered in the update

Otherwise, the difference vector $c_f^{[i]} - c_o^{[i]}$ describes the smallest possible displacement that may either resolve a collision or bring the two bodies in contact. This type of constraint can be expressed as a perfect measurement in the EKF [8]. For a contact of index $i$, the measurement model shall describe the difference between $c_o^{[i]}$ and $c_f^{[i]}$. If the finger link and the object are indeed in contact, the measurable distance has to be $z_t^{[i]} = 0^{3 \times 1}$. This formulation can be expressed for all $n$ contact constraints, which yields the complete measurement model at time $t$:

$h(y_t) = c_f - c_o$  (4.27)

as well as the full measurement vector:

$z_t = 0^{3n \times 1}$  (4.28)

While the vector $h(y_t) \in \mathbb{R}^{3n}$ can be easily obtained based on the contact point locations, the calculation of the update step also requires the partial derivative of $h(y_t)$:

$H_t = \left.\dfrac{\partial h}{\partial y}\right|_{\bar{y}_t}$  (4.29)

Since $c_f$ and $c_o$ are computed by the iterative GJK algorithm, this cannot be analytically formulated. However, the partial derivative of the contact positions in $h(y_t)$ w.r.t. the state can be expressed using the hand Jacobian and grasp matrices. The motion of the contact points on the object can be related to changes in the object pose as follows:

$\dfrac{\partial c_o}{\partial x} = G^T W^{-1}$  (4.30)


Similarly, adjusting the joint position biases in the state vector affects the position of the contact points on the finger links, as described by the hand Jacobian:

$\dfrac{\partial c_f}{\partial \tilde{q}} = J$  (4.31)

The complete matrix $H_t \in \mathbb{R}^{3n \times (6+m)}$ is obtained by stacking the two partial derivatives:

$H_t = \begin{pmatrix} -G^T W^{-1} & J \end{pmatrix}$  (4.32)

Subsequently, the Kalman gain, $K_t \in \mathbb{R}^{(6+m) \times 3n}$, can be computed, as described in Eq. (4.14):

$K_t = \bar{P}_t H_t^T \left(H_t \bar{P}_t H_t^T + Q_t\right)^{-1}$  (4.33)

In principle, the measurement disturbance $Q_t \in \mathbb{R}^{3n \times 3n}$ could be set to zero, since the perfect measurement of $z_t = 0^{3n \times 1}$ is not subject to any noise. However, with regard to the numerical stability of the filter, it is preferable to choose a small, non-zero value. Finally, the mean and covariance of the grasp state can be updated:

$y_t = \bar{y}_t + K_t \left(z_t - h(\bar{y}_t)\right)$  (4.34)

$P_t = (I - K_t H_t)\,\bar{P}_t$  (4.35)
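Combining Eqs. (4.27)–(4.35), the contact-constraint update can be sketched in Python as follows. The contact objects are assumed to carry the matched surface points, and the small diagonal Q replaces the exact zero-noise "perfect measurement" for numerical stability; the sketch is illustrative, not the exact implementation.

```python
import numpy as np

def contact_update(y_pred, P_pred, contacts, J, G, W, q_noise=1e-6):
    """Contact-constraint measurement update, Eqs. (4.27)-(4.35)."""
    n = len(contacts)
    h = np.concatenate([c.c_f - c.c_o for c in contacts])   # Eq. (4.27)
    z = np.zeros(3 * n)                                      # Eq. (4.28)
    H = np.hstack([-G.T @ np.linalg.inv(W), J])              # Eq. (4.32)
    Q = q_noise * np.eye(3 * n)                              # small, non-zero disturbance
    S = H @ P_pred @ H.T + Q
    K = P_pred @ H.T @ np.linalg.inv(S)                      # Eq. (4.33)
    y = y_pred + K @ (z - h)                                 # Eq. (4.34)
    P = (np.eye(len(y_pred)) - K @ H) @ P_pred               # Eq. (4.35)
    return y, P
```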

4.4.4 Extensions So far, this section presented the two main components of the grasp state estimation from joint position measurements. First, the motion model allows to predict the object motion from the joint velocities. Second, the measurement model helps to align the fingers on the object. Building on these core capabilities, the filter can be extended by a number of additional components, which improve the estimation quality under practical considerations.

Palm Prediction Previously, only the hand-object system was considered in the grasp state estimation. However, practically, robotic hands are usually not fixed at a static location. Instead, the palm of the hand is attached to a robotic arm, which can freely position the manipulator. This aspect is also relevant to the in-hand localization problem. When describing the object pose in a fixed inertial coordinate system, moving the palm


Fig. 4.17 Moving the palm leads to significant misalignments in the estimated grasp state. However, this is avoided by including the palm motion in the prediction of the object pose. a Estimation of the grasp state before the palm motion. b Moving the palm results in large object motions, which are not considered in the pose estimation. c Including the palm motion in the motion model allows to predict the corresponding object displacement

will also change the pose of a grasped object. W.r.t. the estimation filter, moving the palm will create a misalignment between the hand and the object, which will need to be resolved by the measurement update. However, depending on the velocity with which the arm is moved, this misalignment may be rather large, even between two time steps. In turn, this may cause considerable estimation errors. Figure 4.17 illustrates this problem, showing the displacement of an object caused by the motion of the hand.
Examining the scenario in Fig. 4.17b, it can be observed that the resulting contact point misalignment is equivalent to the displacement of the palm. This means, the motion of the contact points could have been predicted from the palm movement, instead of subsequently estimating a correction. Therefore, the motion of the palm shall be considered in the prediction step as an extension to the motion model. The pose of the palm at time step $t$ shall be denoted $x_{palm,t}$ and is calculated using the forward kinematics of the robotic arm. Its velocity, $\dot{x}_{palm,t}$, is measured or computed from discrete differences. It is related to the twist of the palm, $\nu_{palm,t}$, using the mapping matrix $W_{palm}$. The relation between the palm twist and the contact point velocities can be expressed using a grasp matrix, similar as for the object. However, this grasp matrix has to be formulated w.r.t. $x_{palm}$ instead of the object pose, $x$. Therefore, it shall be denoted $G_{palm}$. The change in the contact positions from the palm motion, $\Delta c_{palm,t}$, can thus be expressed as follows:

$\Delta c_{palm,t} = G_{palm}^T W_{palm}^{-1} \dot{x}_{palm,t} \Delta t$  (4.36)


This motion contributes to the contact point displacement from the finger joints, $\Delta c_t$, as expressed in Eq. (4.20). The combined $\Delta c_t$ can be written as:

$\Delta c_t = \begin{pmatrix} J & G_{palm}^T W_{palm}^{-1} \end{pmatrix} u_t \Delta t$  (4.37)

with:

$u_t = \begin{pmatrix} \dot{q}_t \\ \dot{x}_{palm,t} \end{pmatrix}$  (4.38)

being the extended control vector. All the subsequent operations of the prediction, including the contact point selection, are unchanged from the previous formulations in this section.

Considering Additional Bodies In most manipulation scenarios, the grasped object is not the only relevant body in the environment of the robot. At the very least, the object is typically placed on a surface, such as a table, before the interaction. W.r.t. the object localization, this provides an additional constraint, since the object has to be above the surface. Therefore, the estimated object pose can be improved by considering additional geometric bodies, as shown in Fig. 4.18.


Fig. 4.18 Bodies in the environment of the manipulator can be considered as additional kinematic constraints in order to inform the estimation. a Similar to the fingers, the collision between the object and an additional body can be identified and described by the corresponding points on the surfaces. b By considering this additional contact in the update step of the EKF, the object pose is corrected to resolve the collision


A body in the environment can be taken into account similarly to the finger links. Using the GJK algorithm, the geometries of the body and the estimated object can be checked for collisions. If they are indeed colliding, an additional collision contact is added to the existing list of contacts. The positions of this contact on the object and on the respective body are appended to $c_o$ and $c_f$, creating the extended vectors $\hat{c}_o \in \mathbb{R}^{3\hat{n}}$ and $\hat{c}_f \in \mathbb{R}^{3\hat{n}}$. Here, $\hat{n}$ denotes the size of the extended list of contacts:

$\hat{n} = n + 1$  (4.39)

By considering this additional contact point in the update step of the EKF, the estimated object pose will be corrected, such that the identified collision is resolved. All the necessary calculations are equivalent to the previous descriptions, using $\hat{c}_o$ and $\hat{c}_f$ instead of $c_o$ and $c_f$. Of note are only the modifications to the grasp and Jacobian matrices in Eqs. (4.30) and (4.31). While the grasp matrix is extended to include the relation between the motion of the object and the added contact point, three rows consisting of zeros are added to the hand Jacobian matrix, since none of the joints affects the pose of the body in the environment. Consequently, the joint position biases will not be directly influenced by the measurement update from this type of contact. Of course, more than one additional body can be considered using this approach. In fact, even contacts between the object and the palm of the hand can be incorporated in this way.

Joint Stabilization As formulated, the in-hand localization method will estimate a consistent hand-object state, which considers the contact constraints. The EKF corrects both the object pose and the joint position biases by converging to an estimate, where the fingers are aligned on the surface of the object. However, typically, there is not just one unique solution to this problem. Figure 4.19a shows a finger at two different configurations, both of which satisfy the contact constraints. Over extended periods of time, this freedom in the solution space may cause the joint position biases to drift. Eventually, this would also affect the estimated object pose, which is correlated with the joint positions. To prevent the drift of the joint position biases, an additional extension to the filter is proposed. Essentially, the unconstrained joint position biases allow the fingers to freely move on the surface of the object, when estimating their position. A preferable approach is to estimate the smallest possible bias, which satisfies the constraint. An intuitive analog is a virtual spring between the measured and the estimated finger position, which allows to move the fingers, i.e. to resolve a collision with the object, but which continuously pulls the estimation towards the original, measured value. As part of the EKF, this can be realized by an extension to the measurement model of the update step. The additional component, hq ( yt ) ∈ Rm , shall be equivalent to the vector of the joint position biases, which is part of the state:


Fig. 4.19 The estimation of the joint biases does not constrain the finger positions to one specific configuration, allowing them to drift over time. This can be prevented by introducing an additional joint stabilization component in the update step, which minimizes the biases. a The joint position biases may take on different values, which increasingly distort the estimated finger configuration. b The joint stabilization constrains the biases by keeping the estimated joint configuration as close as possible to the measured one (transparent)

$h_q(y_t) = \tilde{q}_t$  (4.40)

The corresponding measurement vector, $z_{q,t} \in \mathbb{R}^m$, shall be zero in order to minimize the biases:

$z_{q,t} = 0^{m \times 1}$  (4.41)

Partially deriving $h_{q,t}$ yields the matrix $H_{q,t} \in \mathbb{R}^{m \times (6+m)}$:

$H_{q,t} = \begin{pmatrix} 0^{m \times 6} & I^{m \times m} \end{pmatrix}$  (4.42)

with $I$ being the identity matrix. Finally, the measurement disturbance, $Q_{q,t} \in \mathbb{R}^{m \times m}$, has to be specified:

$Q_{q,t} = \dfrac{1}{k_q} I^{m \times m}$  (4.43)

where the scalar factor kq corresponds to the stiffness of the virtual springs. This additional measurement can be integrated in the existing operations, extending the respective vectors and matrices. Alternatively, it can be processed in a separate measurement update step.
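As a minimal sketch, the joint stabilization can be realized as a separate pseudo-measurement update. The assumption here is that the joint position biases occupy entries 6 to 6+m of the state vector, with any further state components (e.g. the camera pose of Sect. 4.5.3) appended behind them.

```python
import numpy as np

def joint_stabilization_update(y, P, m, k_q):
    """Virtual-spring update of the joint biases, Eqs. (4.40)-(4.43)."""
    dim = len(y)
    H = np.hstack([np.zeros((m, 6)), np.eye(m), np.zeros((m, dim - 6 - m))])  # Eq. (4.42)
    Q = (1.0 / k_q) * np.eye(m)                                               # Eq. (4.43)
    z = np.zeros(m)                                                           # Eq. (4.41)
    S = H @ P @ H.T + Q
    K = P @ H.T @ np.linalg.inv(S)
    y_new = y + K @ (z - y[6:6 + m])          # h_q(y) = joint biases, Eq. (4.40)
    P_new = (np.eye(dim) - K @ H) @ P
    return y_new, P_new
```

A larger k_q corresponds to stiffer virtual springs, i.e. the estimated joint configuration is kept closer to the measured one.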


4.5 Data Fusion with Fiducial Markers The previous section presented a probabilistic framework for the estimation of the grasp state, based on an extended Kalman filter. The proposed motion and measurement models incorporate joint position measurements in order to kinematically constrain the object pose and predict object displacements from the motion of the joints. Both models rely on the contact points between the object and the hand, which are identified using joint torque and position measurements or through tactile sensing. While these measurements allow to reduce the state space of the estimation, they are usually not sufficient to completely constrain the object pose. For example, the position of a grasped bottle along its vertical axis cannot be determined from finger measurements alone. However, in such cases, even very sparse visual information can significantly contribute to the estimation. Being able to identify a corner of the bottle, combined with the finger measurements, allows to fully locate the object. Visual information may be represented differently, depending on how it is obtained. Correspondingly, the means of integrating the information in the grasp state estimation varies. The following sections will present three different sources for vision-based data, as well as the respective data fusion approaches. The incorporation of fiducial markers is introduced in this section.

4.5.1 AprilTag Fiducial markers are objects, which are added to a scene in order to provide artificial visual features. Typically, these artificial features are easier to extract and associate than naturally-occurring features, such as corners or edges. This makes fiducial markers a source of robust and reliable visual data. For this work, the AprilTag fiducial system was utilized [9]. AprilTags are one of the most popular designs of fiducial markers used in the context of robotics. Figure 4.20 shows some examples. Compared to other solutions, AprilTags are efficient to detect, provide high accuracy and are robust to partial occlusions. The available software implementation allows the system to be easily integrated into custom applications. When localizing an AprilTag in an image, the processing determines the 2D image coordinates of the four corners of the tag, as well as an identifier. Knowing the size of the AprilTag and using the pinhole camera model, it is possible to calculate the 3D positions of the corners w.r.t. a camera-fixed coordinate system. Subsequently, the 6 DoF pose of the tag can be computed. Figure 4.21 illustrates these quantities. While for most applications only the final 6 DoF pose of the AprilTag is relevant, when integrating the output in a probabilistic filter, fusing the raw 2D corner coordinates is advantageous. Ref. [10] introduced the concept of loose, tight and ultra-tight coupling in the context of visual state estimation. Here, loose coupling describes using the 6 DoF pose of the marker, tight coupling represents fusing the


Fig. 4.20 AprilTags, which are rigidly attached to objects, allow to easily extract artificial features from camera images


Fig. 4.21 From a camera image, the coordinates of the corners of an AprilTag can be extracted. Correspondingly, based on the current estimate of the object pose, the predicted position of the corners can be calculated. The difference between these two sets of positions describes the misalignment of the pose estimation, which is subsequently corrected in the EKF update. a Illustration of the camera view of the AprilTag and the extracted corner positions. b Prediction of the corner positions, based on the current estimate of the object pose

3D corner positions, and ultra-tight coupling means that the 2D image coordinates of the corners are transferred to the filter. In the case of the AprilTag fusion, a tighter coupling increases the accuracy of the estimation because more information about the uncertainty of the measurement can be retained. When determining the pose of an AprilTag, not all degrees of freedom are localized to the same accuracy. As a result of the measurement principle, the position of the AprilTag in the image plane


will be more precise than the distance to the camera. In the case of the loose coupling, in which the 6 DoF pose of the tag is fused, this relation will be lost. On the other end of the spectrum, for the ultra-tight coupling, describing the uncertainty in image coordinates will allow to propagate it correctly for each DoF in the EKF. Therefore, the measurement of an AprilTag at time $t$ will be integrated as four pairs of image coordinates, $p_t^{[l]} = (p_{u,t}^{[l]}, p_{v,t}^{[l]})^T$, with $l = \{1, 2, 3, 4\}$, and shall be denoted $z_{p,t} \in \mathbb{R}^8$:

$z_{p,t} = \begin{pmatrix} p_t^{[1]T} & p_t^{[2]T} & p_t^{[3]T} & p_t^{[4]T} \end{pmatrix}^T$  (4.44)

4.5.2 Measurement Model In order to inform the grasp state estimation, an AprilTag is rigidly attached to the object, as depicted in Fig. 4.20. Consequently, the pose of the tag, described in object coordinates, is constant and can be measured. The fixed transformation of an AprilTag-fixed frame, {A}, w.r.t. the object frame, {O}, shall be denoted $^o T_a$. Given the size of the AprilTag, $d_a$, the positions of the four corners w.r.t. {A} can be expressed as follows:

$^a x_p^{[1]} = \begin{pmatrix} -0.5\,d_a \\ 0.5\,d_a \\ 0 \end{pmatrix} \quad {}^a x_p^{[2]} = \begin{pmatrix} 0.5\,d_a \\ 0.5\,d_a \\ 0 \end{pmatrix} \quad {}^a x_p^{[3]} = \begin{pmatrix} 0.5\,d_a \\ -0.5\,d_a \\ 0 \end{pmatrix} \quad {}^a x_p^{[4]} = \begin{pmatrix} -0.5\,d_a \\ -0.5\,d_a \\ 0 \end{pmatrix}$  (4.45)

Using $^o T_a$, the positions of the corners in object coordinates are obtained:

$\begin{pmatrix} ^o x_p^{[l]} \\ 1 \end{pmatrix} = {}^o T_a \begin{pmatrix} ^a x_p^{[l]} \\ 1 \end{pmatrix}$  (4.46)

The vector of all four corner positions shall be denoted $^o x_p \in \mathbb{R}^{12}$:

$^o x_p = \begin{pmatrix} ^o x_p^{[1]T} & ^o x_p^{[2]T} & ^o x_p^{[3]T} & ^o x_p^{[4]T} \end{pmatrix}^T$  (4.47)

Knowing the location of the corners w.r.t. the object, the estimated object pose can be used to predict their image coordinates as well. Comparing the vector of predicted corner coordinates, $\bar{p}_t$, with the measurements, $z_{p,t}$, allows to calculate a correction, which is fused in the update step of the EKF to improve the estimation. The measurement model, $h_p(y_t) \in \mathbb{R}^8$, describes the relation between the object pose and the image coordinates of the corners:

$h_p(y_t) = \bar{p}_t$  (4.48)


A point in 3D space can be projected onto a 2D image plane using the pinhole camera model. The image coordinates of an AprilTag corner, $\bar{p}_t^{[l]}$, are obtained by projecting its position, $^c x_{p,t}^{[l]}$, described in the camera-fixed frame, {C}:

$s \begin{pmatrix} \bar{p}_t^{[l]} \\ 1 \end{pmatrix} = C \begin{pmatrix} ^c x_{p,t}^{[l]} \\ 1 \end{pmatrix}$  (4.49)

where $s$ is a scaling factor and $C \in \mathbb{R}^{3 \times 4}$ is the camera matrix:

$C = \begin{pmatrix} f_u & 0 & c_u & 0 \\ 0 & f_v & c_v & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$  (4.50)

Here, $f_u$ and $f_v$ describe the focal lengths of the camera, and $c_u$ and $c_v$ the coordinates of the optical center of the image. When describing the corner position w.r.t. the object frame, {O}, the projection can be expressed as follows:

$s \begin{pmatrix} \bar{p}_t^{[l]} \\ 1 \end{pmatrix} = C\, T_c^{-1}\, T_{o,t} \begin{pmatrix} ^o x_p^{[l]} \\ 1 \end{pmatrix}$  (4.51)

with $T_c$ being the transformation between the inertial frame and the camera-fixed frame, {C}. The object transformation, $T_{o,t}$, is calculated from the current estimate of the pose, $x_t$. Matrix $H_{p,t} \in \mathbb{R}^{8 \times (6+m)}$ is obtained by partially deriving $\bar{p}_t^{[l]}$ w.r.t. $y_t$, which can be done analytically:

$H_{p,t} = \left.\dfrac{\partial \bar{p}_t^{[l]}}{\partial y_t}\right|_{\bar{y}_t}$  (4.52)

The measurement disturbance, Q p,t ∈ R8×8 , represents the inaccuracy of the corner coordinates in number of pixels. A small integer number appropriately parameterizes the precision of the AprilTag system.
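The projection of Eqs. (4.49)–(4.51) and the resulting innovation for the EKF update can be sketched as follows; the 4×3 array of corner positions and the transformation matrices are assumed inputs, and the function is illustrative rather than the exact implementation.

```python
import numpy as np

def predict_tag_corners(C, T_c, T_o, corners_obj):
    """Predicted image coordinates of the four tag corners, Eqs. (4.48) and (4.51).

    C           : 3x4 camera matrix, Eq. (4.50)
    T_c, T_o    : 4x4 camera and object transformations w.r.t. the inertial frame
    corners_obj : 4x3 array of corner positions in object coordinates, Eq. (4.46)
    """
    p_bar = []
    for x in corners_obj:
        x_h = np.append(x, 1.0)                      # homogeneous corner position
        s_p = C @ np.linalg.inv(T_c) @ T_o @ x_h     # scaled image point, Eq. (4.51)
        p_bar.append(s_p[:2] / s_p[2])               # divide out the scale factor s
    return np.concatenate(p_bar)                     # h_p(y_t) of Eq. (4.48)

# The innovation fused in the EKF update is z_p - predict_tag_corners(...),
# with z_p being the stacked corner detections of Eq. (4.44).
```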

4.5.3 Camera Localization The previous formulation of the measurement model for the incorporation of visual features implicitly made the assumption that T c , the transformation of the camera, is known. However, this is not necessarily true for every robotic system. Moreover, the camera pose would need to be quite accurate, since even small errors can have a large impact on the estimated object pose. This is not the case for the head-mounted camera of David. The orientation of its head is controlled by an elastic continuum


Fig. 4.22 The precise location of the head-mounted camera of David cannot be determined kinematically because of the continuum-elastic mechanism that is used to actuate its neck. a The 6 DoF pose of the head is determined by the length of four tendons and a continuum-elastic element [11]. b The head houses an Intel RealSense D435 camera, which provides RGB and depth images [12]

mechanism, which is shown in Fig. 4.22 [11]. While the resulting pose of the head can be estimated from the tendon-driven actuation, the provided precision is not sufficient to accurately locate manipulated objects inside of the hand of the robot. Instead of relying on the availability of the camera pose, it shall be estimated from additional visual information. Since the goal of the in-hand localization method is to determine the hand-object state, the estimation would most benefit from visually localizing the hand and inferring the camera pose from it. Similar to the object pose, AprilTags provide an accessible solution to determine the location of the hand w.r.t. the camera. AprilTags can be mounted to the robotic hand, as illustrated in Fig. 4.23. Another advantage of visually localizing the hand is that errors in the kinematics of the arm will be compensated as well.


Fig. 4.23 AprilTags, which are attached to the upper and lower side of the David hand, allow to localize the camera w.r.t. the palm. a Upper-side AprilTag attached to the David hand. b Lower-side AprilTag attached to the David hand


The camera localization is fully integrated in the grasp state estimation to ensure the consistency of the hand-object state. The estimated camera pose, $x_{c,t} \in \mathbb{R}^6$, extends the state vector, $y_t \in \mathbb{R}^{12+m}$, as it was introduced in Eq. (4.16):

$y_t = \begin{pmatrix} x_t \\ \tilde{q}_t \\ x_{c,t} \end{pmatrix}$  (4.53)

For the initialization of the EKF, an initial camera pose, $x_{c,0}$, has to be available. The initial covariance of this component is set according to the quality of its source. The addition of a palm-mounted AprilTag allows to measure the image coordinates of its corners, concatenated in the vector $z_{palm,t} \in \mathbb{R}^8$:

$z_{palm,t} = \begin{pmatrix} p_{palm,t}^{[1]T} & p_{palm,t}^{[2]T} & p_{palm,t}^{[3]T} & p_{palm,t}^{[4]T} \end{pmatrix}^T$  (4.54)

The correction from this measurement is similar to the fusion of the object-mounted AprilTag. Measuring the constant transformation of the palm tag, $^p T_a$, w.r.t. a palm-fixed frame, {P}, allows to calculate the positions of the corners, $^p x_p$, in palm coordinates. These can be projected to the corresponding image coordinates using the pinhole camera model of Eq. (4.51):

$s \begin{pmatrix} \bar{p}_{palm,t}^{[l]} \\ 1 \end{pmatrix} = C\, T_{c,t}^{-1}\, T_p \begin{pmatrix} ^p x_p^{[l]} \\ 1 \end{pmatrix}$  (4.55)

where $T_p$ is the transformation of the palm frame, {P}, w.r.t. the inertial frame. It is calculated using the forward kinematics of the arm. Vector $\bar{p}_{palm,t}$ represents the measurement model, $h_{palm}(y_t) \in \mathbb{R}^8$, of this input:

$h_{palm}(y_t) = \bar{p}_{palm,t}$  (4.56)

In Eq. (4.51), the camera transformation, T c , was understood to be fixed. However, by including the camera pose in the estimation, the camera transformation becomes dependent on the grasp state as well. Subsequently, the derivative of T c w.r.t. x c,t has to be considered in the calculation of H palm,t ∈ R8×12+m .

4.5.4 Target Tracking Integrating the localization of the camera in the estimation of the grasp state allows to accurately determine the relative pose of the object w.r.t. the hand, since the fiducial markers on both the object and the palm are considered in a consistent manner. This


capability of the grasp state estimation can be extended even further by also including the localization of a target object. In many manipulation scenarios, the grasped object has to be placed w.r.t. some element in the environment of the robot. For instance, solving the stacking game, which was introduced in Chap. 1, requires the precise positioning of the game pieces above a wooden board. Because of inaccuracies in the kinematics of the robot, this is only feasible by localizing the target relative to the grasped object. Including the target pose in the grasp state estimation allows to determine a consistent description of the complete hand-object-target system. Similar to the camera localization, the state vector is extended to include the estimated target pose, $x_{target,t} \in \mathbb{R}^6$:

$y_t = \begin{pmatrix} x_t \\ \tilde{q}_t \\ x_{c,t} \\ x_{target,t} \end{pmatrix}$  (4.57)

One of the means of determining the pose of the target is the fusion of artificial features from an AprilTag, which has been rigidly attached to it. As before, the image coordinates of the four corners of the tag, $p_{target}^{[l]}$, are used, concatenated in a vector in $\mathbb{R}^8$. The fusion of these target measurements in the EKF is equivalent to the incorporation of an object-mounted AprilTag, substituting the object transformation, $T_{o,t}$, with the target transformation, $T_{target,t}$, in the corresponding formulations.

4.6 Data Fusion with Contour Features The previous section presented the use of fiducial markers as a source of visual information that is fused with the finger measurements in order to improve the grasp state estimation. Artificial markers allow the reliable and precise extraction of features from an image. However, they require the object to be physically modified before they can be used for the localization. This is not desirable or even possible in all applications. For example, small objects do not offer the space to attach a marker. Furthermore, the fiducial has to be in view of the camera. It cannot be occluded by the hand or facing in the opposite direction. Adding multiple markers to an object to alleviate this problem only increases the intrusiveness of the fiducial system. Under these circumstances, a marker-less solution is preferable. While the problem of extracting naturally occurring image features, such as corners and edges, has been the focus of extensive research, the proposed method has been specifically developed to be used as part of the in-hand localization framework. This means, the conditions and requirements for this feature extraction method differ from classical vision-only formulations. Since the image features are fused with the grasp state estimation from finger measurements, the vision system is not required to fully constrain the object pose by itself. Combined with the geometric constraints, even a


single image feature, such as the corner of an object, may be sufficient to localize the object. This section presents the developed feature extraction method. Subsequently, the fusion of the image features in the EKF is described.

4.6.1 Feature Extraction The goal of the proposed method is to extract naturally occurring object features from an image, which can be fused like the artificial features of an AprilTag. Similar to the AprilTag system, the presented algorithm shall rely on a monocular RGB or grayscale image. Furthermore, only the object geometry shall be required for the application of the method. The process of integrating natural features into the grasp state estimation is similar to the incorporation of the AprilTag markers. First, the image coordinates of the features have to be extracted. Second, the expected position of the features in the image based on the current estimation of the object pose has to be determined. In the case of the AprilTag system, the second step is trivial, since the artificial features are pre-determined, i.e. the corners of the tag. However, when automatically extracting the features from the image, the association with points on the object is not necessarily evident. To overcome this challenge, the proposed method not only extracts features from the camera image, but also from a virtual image, which represents the current state of the estimation. Features that are extracted from the virtual image can be easily associated with points on the object, since the pose of the object in this image is known. By extracting characteristic features on the contour of the object in both the real and virtual image, it is possible to match the same feature in both images. The difference in the pixel coordinates of a pair of image features describes the misalignment of the object. The extraction and matching of the two sets of image features is realized in a series of image processing steps, which are illustrated in Fig. 4.24: (a) Using the geometric description of the object and the current estimate of the object and camera pose, a virtual depth image of the scene is rendered. The real image is provided by the camera sensor. (b) The Canny edge detector is applied to both the real and the virtual image to produce edge images [13]. If necessary, the real image from the camera is converted from RGB to grayscale beforehand. Contour vectors, describing the edges in both images, are extracted based on the algorithm in [14]. Edges with short lengths are removed. (c) Prominent corners are detected in both images using the method described in [15]. For each corner in the virtual image, the surrounding contour pieces are extracted.



Fig. 4.24 Illustration of the image processing steps for the extraction and matching of the contour features. a Camera image and the rendered depth image based on the current estimate. b Edge images, generated using the Canny edge detector. c Extraction of prominent features. d Gradient images, which encode the direction of the edges. e Matching of contour pieces from the rendered image with the directed edges in the camera image. f Selection of good matches (red+green) and filtering of the final set (green) using RANSAC


(d) The application of the Sobel operator to the edge images computes their gradients. The derivatives for both image axes are combined to generate a new image, where the pixel value describes the orientation of the edge at this point.
(e) Each pair of corners in the real and virtual image, if they are closer than a set threshold, is checked for matches. For this, the similarity of the contour piece around the virtual corner and the pixel orientations around the real corner is calculated. If more than a set number of the contour points can be matched, the pair is selected. This procedure is similar to the computation of the generalized Hough transform, but for contour pieces instead of the complete object contour [16].
(f) To filter outliers from the set of matched pairs, a perspective transformation between the selected corners of the real image and the virtual image is computed using the RANSAC algorithm [17].
Subsequently, using the rendered depth image of the estimated object, the distance to the camera of each selected virtual corner is determined, as depicted in Fig. 4.25.
In summary, the proposed image processing pipeline extracts matching pairs of characteristic contour features from both a camera image and a rendered image, which represents the current estimation of the scene. Corresponding to the definition in the previous section, the points $p_t^{[l]} \in \mathbb{R}^2$ and $\bar{p}_t^{[l]} \in \mathbb{R}^2$ denote the image coordinates of the corner of a contour feature in the real image and in the virtual image, respectively. Stacking the positions of all $L$ extracted features yields the vectors $p_t \in \mathbb{R}^{2L}$ and $\bar{p}_t \in \mathbb{R}^{2L}$:

$p_t = \begin{pmatrix} p_t^{[1]T} & p_t^{[2]T} & \cdots & p_t^{[L]T} \end{pmatrix}^T$  (4.58)

$\bar{p}_t = \begin{pmatrix} \bar{p}_t^{[1]T} & \bar{p}_t^{[2]T} & \cdots & \bar{p}_t^{[L]T} \end{pmatrix}^T$  (4.59)

Fig. 4.25 Extraction of the corresponding depth values from the rendered image


Additionally, the image processing generates for each feature in the virtual image the corresponding distance to the camera, $d_{depth,t}^{[l]} \in \mathbb{R}$. The vector of all $L$ depth values shall be denoted $d_{depth,t} \in \mathbb{R}^L$:

$d_{depth,t} = \begin{pmatrix} d_{depth,t}^{[1]} & d_{depth,t}^{[2]} & \cdots & d_{depth,t}^{[L]} \end{pmatrix}^T$  (4.60)
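For illustration, a heavily simplified version of steps (b)–(f) can be written with OpenCV as below; the thresholds, the corner matching function match_fn and the handling of the rendered depth image are assumptions made for this sketch and do not reproduce the exact pipeline of this work.

```python
import cv2
import numpy as np

def match_contour_features(camera_img, rendered_depth, match_fn):
    """Simplified contour-feature extraction and matching (cf. steps (b)-(f))."""
    gray = cv2.cvtColor(camera_img, cv2.COLOR_BGR2GRAY)
    depth8 = cv2.normalize(rendered_depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    edges_real = cv2.Canny(gray, 50, 150)              # step (b): edge images
    edges_virt = cv2.Canny(depth8, 50, 150)

    c_real = cv2.goodFeaturesToTrack(edges_real, 50, 0.01, 10)   # step (c): corners
    c_virt = cv2.goodFeaturesToTrack(edges_virt, 50, 0.01, 10)
    if c_real is None or c_virt is None:
        return []
    c_real, c_virt = c_real.reshape(-1, 2), c_virt.reshape(-1, 2)

    grad_dir = np.arctan2(cv2.Sobel(edges_real, cv2.CV_64F, 0, 1),   # step (d): edge
                          cv2.Sobel(edges_real, cv2.CV_64F, 1, 0))   # orientations

    pairs = [(pv, pr) for pv in c_virt for pr in c_real              # step (e): matching
             if np.linalg.norm(pv - pr) < 30 and match_fn(pv, pr, grad_dir)]

    if len(pairs) >= 4:                                              # step (f): RANSAC
        src = np.float32([p[0] for p in pairs])
        dst = np.float32([p[1] for p in pairs])
        _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if inliers is not None:
            pairs = [p for p, ok in zip(pairs, inliers.ravel()) if ok]
    return pairs
```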

4.6.2 Measurement Model The integration of the AprilTag detections in the grasp state estimation was based on measuring the image coordinates of the corners of the tag and comparing them to their expected positions, based on the estimated object pose. The marker-less vision pipeline was developed to allow for a similar integration. While the positions of the contour features in the camera image represent the measurement vector, $z_{p,t} \in \mathbb{R}^{2L}$:

$z_{p,t} = p_t$  (4.61)

the location of the points in the virtual image corresponds to the values of the measurement model, $h_p(y_t) \in \mathbb{R}^{2L}$, given the current state estimation:

$h_p(y_t) = \bar{p}_t$  (4.62)

However, the calculation of the update step of the EKF also requires the formulation of the derivative of the measurement model, $H_{p,t}(y_t) \in \mathbb{R}^{2L \times (6+m)}$, in order to express the relation between the state vector and the measurement. For the AprilTag integration, this was realized by describing the projection of the corner positions in object coordinates onto the image plane. In the case of the contour features, the respective point on the object is not known beforehand. However, knowing the image coordinates and distance to the camera of a point allows to calculate its 3D position. Using the camera matrix $C$, a ray $r$ can be obtained, which describes a 3D vector from the camera towards the image point:

$s \begin{pmatrix} r^{[l]} \\ 1 \end{pmatrix} = C^{-1} \begin{pmatrix} p_{u,t}^{[l]} \\ p_{v,t}^{[l]} \\ 1 \end{pmatrix}$  (4.63)

with s being a scaling factor. This ray represents the direction of a vector from the camera origin to the 3D point, described w.r.t. the camera frame, {C}. The magnitude of the vector does not correspond to the euclidean distance of the point to the camera.


However, this distance, $d_{depth,t}^{[l]}$, was previously determined from the rendered depth image. Therefore, by scaling $r \in \mathbb{R}^3$ with $d_{depth,t}^{[l]}$, the position of the feature in camera coordinates is obtained:

$^c x_{p,t}^{[l]} = d_{depth,t}^{[l]} \dfrac{r}{|r|}$  (4.64)

Using the pose of the camera and the object, the position of the feature w.r.t. the object frame, {O}, can be expressed:

$\begin{pmatrix} ^o x_{p,t}^{[l]} \\ 1 \end{pmatrix} = T_{o,t}^{-1}\, T_{c,t} \begin{pmatrix} ^c x_{p,t}^{[l]} \\ 1 \end{pmatrix}$  (4.65)

The vector of all $L$ feature positions shall be denoted $^o x_{p,t} \in \mathbb{R}^{3L}$:

$^o x_{p,t} = \begin{pmatrix} ^o x_{p,t}^{[1]T} & ^o x_{p,t}^{[2]T} & \cdots & ^o x_{p,t}^{[L]T} \end{pmatrix}^T$  (4.66)

This vector represents the positions of the features on the object, similar to the positions of the four AprilTag corners, as described in Eq. (4.47). Therefore, determining $^o x_{p,t}$ allows to fuse the extracted contour features in exactly the same way as the AprilTag features. Besides following the same formulations as in the previous section, this means that artificial features and natural features can be incorporated equivalently and even simultaneously within the proposed framework, using a consistent, probabilistic description. Moreover, the described process may also be used to extract and fuse features of a target object, thereby providing a marker-less method for its localization. Similar to the pose estimation of the object, this requires the availability of the target geometry.
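The back-projection of a matched feature into object coordinates, Eqs. (4.63)–(4.65), can be sketched as follows; here the 3×3 intrinsic block of the camera matrix is used to form the ray (written $C^{-1}$ in Eq. (4.63)), and the argument names are illustrative.

```python
import numpy as np

def feature_position_on_object(p_uv, d_depth, K, T_c, T_o):
    """Back-projection of one contour feature into object coordinates.

    p_uv    : image coordinates of the feature in the virtual image
    d_depth : distance of the feature to the camera from the rendered depth image
    K       : 3x3 intrinsic block of the camera matrix of Eq. (4.50)
    T_c, T_o: 4x4 camera and object transformations of the current estimate
    """
    r = np.linalg.inv(K) @ np.array([p_uv[0], p_uv[1], 1.0])    # ray towards the point, Eq. (4.63)
    x_cam = d_depth * r / np.linalg.norm(r)                     # position in camera frame, Eq. (4.64)
    x_obj = np.linalg.inv(T_o) @ T_c @ np.append(x_cam, 1.0)    # position in object frame, Eq. (4.65)
    return x_obj[:3]
```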

4.7 Data Fusion with Visual Object Tracking The image processing method that was presented in the previous section allowed the extraction of marker-less, natural object features that can be fused in the grasp state estimation in the same way as the artificial features of a fiducial marker. However, the proposed algorithm only utilizes characteristic contour features of the object, neglecting other visual information, which could additionally inform the localization. For example, separating the object surface from the background, based on its color, may further constrain its position in the image. Beyond taking advantage of more of the monocular RGB data, additional optical modalities, such as depth sensing or stereo vision, have the potential to improve the pose estimation even further. These considerations motivated the work on a novel visual object tracker. The development of this method was carried out as part of a master’s thesis [18], under the supervision of the author of this manuscript. Manuel Stoiber, the student who worked on this topic, was tasked with integrating both RGB and depth information into an


object tracking system, which complements the grasp state estimation. This involved estimating the object pose, as well as its uncertainty, to allow for the probabilistic fusion with the EKF. Furthermore, the method would need to be robust to significant occlusions, since its main application is the use in the context of grasping and in-hand manipulation. This section will briefly introduce the developed visual object tracker, as well as describe how the output of the tracker is integrated in the grasp state estimation. While the first part was the content of the master's thesis of Manuel Stoiber, the latter part was implemented in the context of this work.

4.7.1 Multi-Modality Visual Object Tracking

Utilizing measurements from an RGB-D camera, the proposed method incorporates both a depth-based and a region-based modality into a common probabilistic framework, which is able to provide the estimated object pose and uncertainty. While the integration of the depth measurements is based on the Iterative Closest Point (ICP) algorithm, RGB information is processed using a novel sparse contour model. By considering contour information only along a limited number of correspondence lines, the computational performance is significantly improved compared to other region-based methods. As before, the visual object tracker only relies on the availability of the geometry of the object. Additionally, information about the position of the hand is used to mask occluded image regions, further improving the robustness of the system.

As illustrated in Fig. 4.26, the purpose of the ICP algorithm is to minimize the difference between two point clouds, one being generated from the measurements of the depth camera, and the other representing the object at the estimated pose. An error metric describes the misalignment of a set of correspondences between the point clouds. For the proposed object tracker, the error metric is modified to represent a Gaussian distribution to be compatible with the probabilistic formulation of the overall framework.

The integration of the RGB information is realized by a novel region-based modality [19]. Based on the PWP3D tracker, the method uses color histograms in order to segment the object from its surrounding, assigning a foreground and background probability to each pixel. Using this probability distribution, the joint probability that the estimated object pose matches the image can be described. However, instead of calculating this probability over the entire image, the proposed implementation computes the metric only along a limited number of correspondence lines perpendicular to the contour of the object. The resulting model is proven to have a Gaussian distribution as well. Figure 4.27 illustrates the background probability and the correspondence lines for a grasped object.

Utilizing the probability density functions (PDF) from both modalities, Newton-Raphson optimization is used to determine the most likely object pose, as well as



Fig. 4.26 The Iterative Closest Point (ICP) algorithm minimizes the distance between two point clouds, one of which represents the object model, and the other is obtained from depth measurements [18]


Fig. 4.27 Illustration of the visual object tracker, which was proposed in [18]. a Camera overlay of the tracking output. b The gray-level of the image represents the background probability of a pixel, clearly distinguishing the object from its surrounding. The colored lines are the correspondence rays, which are used to align the contour of the object

its covariance. The returned object pose, c x tr,t ∈ R6 , and covariance, c X tr,t ∈ R6×6 , are described w.r.t. the camera coordinate system, {C}.
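As a rough illustration of how two Gaussian modalities can be combined, consider the information-form fusion below. This is a simplified sketch, not the optimization actually used by the tracker in [18, 19]: it assumes that both modalities already provide a mean and covariance in a common local parameterization of the pose, in which case the most likely estimate reduces to an information-weighted average.

```python
import numpy as np

def fuse_gaussian_modalities(mu_a, S_a, mu_b, S_b):
    """Information-weighted fusion of two Gaussian pose estimates."""
    I_a = np.linalg.inv(S_a)            # information matrix of modality A (e.g. depth-based)
    I_b = np.linalg.inv(S_b)            # information matrix of modality B (e.g. region-based)
    S = np.linalg.inv(I_a + I_b)        # fused covariance
    mu = S @ (I_a @ mu_a + I_b @ mu_b)  # fused mean
    return mu, S
```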

4.7.2 Measurement Model

Similar to the other sources of visual information, the estimated pose from the visual object tracker is fused in the update step of the EKF. While the integration of both


the AprilTag features and the contour features was realized as an ultra-tight coupling, fusing the full 6 DoF object pose from the tracker represents a loose coupling. However, the covariance of that pose, which is calculated from the probability distributions of the modalities, properly describes the effect of the uncertainty of the respective measurements. Therefore, the full probabilistic information is transferred, even for this loose coupling. The provided object pose and covariance represent the measurement vector, $z_{tr,t} \in \mathbb{R}^6$, and measurement disturbance matrix, $Q_{tr,t} \in \mathbb{R}^{6 \times 6}$, for this modality:

$$ z_{tr,t} = {}^{c}x_{tr,t} \tag{4.67} $$

$$ Q_{tr,t} = {}^{c}X_{tr,t} \tag{4.68} $$

The measurement model, $h_{tr}(y_t) \in \mathbb{R}^6$, is expressed by the current estimation of the object pose, described w.r.t. the camera frame, {C}:

$$ h_{tr}(y_t) = {}^{c}x_t \tag{4.69} $$

It is obtained from the current estimate of the pose of the object, $x_t$, and the camera, $x_{c,t}$, using the equivalent transformation notation for both poses:

$$ {}^{c}T_{o,t} = T^{-1}_{c,t} \, T_{o,t} \tag{4.70} $$

Partially deriving $h_{tr,t}(y_t)$ w.r.t. the state vector yields the matrix $H_{tr,t} \in \mathbb{R}^{6 \times (6+m)}$:

$$ H_{tr} = \left. \frac{\partial \, {}^{c}x_t}{\partial y} \right|_{y_t} \tag{4.71} $$

Similar to the previous two fusion approaches, the application of the visual object tracker may be extended, utilizing it for the tracking of a target object as well. Assuming the availability of the respective geometry, a separate instance of the tracker is able to provide an estimate of the target pose and covariance, expressed in camera coordinates. The fusion of these quantities is analogous to the previous formulations for the grasped object.
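Since the coupling is loose, the fusion reduces to a standard EKF update with the 6 DoF pose as the measurement. The sketch below illustrates Eqs. (4.67)–(4.71) under the simplifying assumption that the pose residual can be computed component-wise; in practice, the orientation part has to be composed on the rotation group. All names are illustrative.

```python
import numpy as np

def ekf_update_tracker_pose(y, P, z_tr, Q_tr, h_tr, H_tr):
    """EKF update with the 6 DoF pose measurement from the visual tracker.

    y, P        : current state estimate and covariance
    z_tr, Q_tr  : measured pose in {C} and its covariance (Eqs. 4.67, 4.68)
    h_tr, H_tr  : measurement model and its Jacobian at y (Eqs. 4.69, 4.71)
    """
    S = H_tr @ P @ H_tr.T + Q_tr               # innovation covariance
    K = P @ H_tr.T @ np.linalg.inv(S)          # Kalman gain
    y_new = y + K @ (z_tr - h_tr(y))           # corrected state
    P_new = (np.eye(len(y)) - K @ H_tr) @ P    # corrected covariance
    return y_new, P_new
```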

4.8 Data Fusion Under Measurement Delays

The various sensor modalities that were introduced in this chapter all contribute to a common probabilistic estimation framework. The formulation of specific motion and measurement models allows to fuse the different inputs appropriately in the EKF. However, the discussion of the integration of the different sensor modalities in the previous sections omitted one aspect of the data fusion, which is of great


Fig. 4.28 Measurements from vision-based modalities incur significant delays before they are incorporated in the EKF. If the object was moved in the meantime, fusing these measurements without taking the delay into account will impair the estimation. However, by calculating a correction relative to the state at the time of measurement and subsequently applying it w.r.t. the object frame, this is avoided. a The red object represents a pose, which was estimated based on visual input. b Because of delays, the measurement is only available after the in-hand motion of the object occurred, subsequently distorting the estimation. c By calculating the correction w.r.t. the initial state and applying it in object coordinates, the pose is correctly updated

practical relevance. Some of the measurements are subject to considerable time delays before they can be incorporated in the grasp state estimation. In particular, visual information is not immediately available because of communication and processing times. Not considering these delays may significantly degrade the estimated grasp state. Figure 4.28 illustrates a scenario, in which a visual observation is made prior to the in-hand rotation of the object. If the fusion of this measurement was delayed until after the in-hand motion, without considering the time delay, the resulting update would worsen the estimation. The consideration of delays in the fusion of measurements in a Kalman filter has been the subject of previous research in the field [20]. Typically, the proposed methods rely on the validity of specific assumptions or require a trade-off between approximation errors and computational effort. If the output of the Kalman filter is not required in real-time, the simplest solution is to delay all measurements and control inputs by the same amount of time as the slowest measurement, thereby synchronizing them. Another optimal solution, if delays to the state estimation are not acceptable, is to recompute all prediction and update steps in the delay period, as soon as the measurement becomes available. However, depending on the complexity of the motion and measurement models and the number of steps, this may be prohibitively expensive to compute. Similarly, for a small number of delayed steps, augmentations to the state vector have been proposed [21]. Other approaches focus on scenarios, where no additional measurements are fused in the delay period [22]. In [23], the authors present methods that are applicable if there is a fixed, predictable delay in the measurements. Finally, an approximate solution, which relies on extrapolating the measurement, is proposed in [24]. First, using the state at the time that


Fig. 4.29 Illustration of the sequence of measurements and EKF updates. Measurements from the joints (green) are immediately available and can be incorporated in the same step. However, visual measurements (red) incur significant and varying delays, which can even exceed the sampling period of the image sensor

the measurement was made, a correction is calculated. Subsequently, the correction is extrapolated, according to the current state, and applied through the update step. In the case of the grasp state estimation, the data fusion problem is subject to a number of specific considerations. While there is a large delay for vision-based inputs, the finger torque and position measurements can be considered lag-free, arriving at a high rate. Consequently, in the time between two vision measurements, several prediction and update steps from finger inputs will be computed. While the prediction step is easy to compute, the update from the finger positions is computationally expensive. Therefore, it is infeasible to recalculate a large number of these update steps. Because of the principles of the image processing algorithms, the delay of the visual measurements is not deterministic. When considering delays from the camera communication and image processing, the time duration between capturing an image and the fusion in the EKF may potentially be even longer than the time between two consecutive images. This means, by the time that the information from an image is incorporated in the estimation, several more images might have been captured already. Figure 4.29 illustrates the sequence of events. Time step m i represents the moment in time, at which a visual measurement of index i is captured. Between time step m i and m i+1 , a number of intermediate steps occur, at which the grasp state is predicted and updated using the finger position measurements. Because of the communication and processing delays, a measurement that was observed at time step m i will not be available to be fused in the EKF until time step f i . Depending on the delays, it may even be the case that f i > m i+1 . Given the previous considerations in the context of the grasp state estimation, it is not feasible to recompute all updates from the finger measurements for steps m i through f i , in order to optimally consider the measurement as it becomes available. Instead, inspired by the method in [24], a past correction, using the measurement and state at time step m i , is computed and subsequently applied to the state at time step f i .
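One practical prerequisite, implied by the discussion above, is that the filter keeps a short history of past states, so that the state at the capture time m_i is still available when the corresponding image result arrives at f_i. A minimal sketch of such a buffer follows; it is purely illustrative and not the data structure used in this work.

```python
import bisect

class StateBuffer:
    """Keep a short history of filter states so that a delayed measurement can be
    evaluated against the state at its capture time m_i."""

    def __init__(self, horizon=2.0):
        self.horizon = horizon   # seconds of history to keep
        self.times = []          # ascending timestamps
        self.states = []         # stored states (and covariances) at those times

    def push(self, t, state):
        self.times.append(t)
        self.states.append(state)
        # drop entries that are older than the horizon
        while self.times and self.times[0] < t - self.horizon:
            self.times.pop(0)
            self.states.pop(0)

    def at(self, t_capture):
        # return the stored state closest to, but not later than, the capture time
        i = bisect.bisect_right(self.times, t_capture) - 1
        return self.states[max(i, 0)]
```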


Utilizing the measurement model for the respective visual modality, i.e. fiducial markers, contour features or visual object tracking, a correction to the state, $\Delta y_{m_i}$, is calculated from the measurement $z_{m_i}$ and the state $\bar{y}_{m_i}$:

$$ \Delta y_{m_i} = K_{m_i} \left( z_{m_i} - h(\bar{y}_{m_i}) \right) \tag{4.72} $$

where $K_{m_i}$ is the Kalman gain, which is calculated from the state covariance, measurement disturbance and derivative of the measurement model at time step $m_i$. Before, this correction was directly applied to the predicted state, $\bar{y}_{m_i}$, in order to obtain the updated state, $y_{m_i}$:

$$ y_{m_i} = \bar{y}_{m_i} + \Delta y_{m_i} \tag{4.73} $$

For the illustrated example, this correction would produce the correct object pose at time step $m_i$, as shown in Fig. 4.28a. However, applying the same correction to the state at time step $f_i$ would be incorrect, because of the estimated change in the object pose between $m_i$ and $f_i$. Instead, by describing the correction w.r.t. the object coordinate system, {O}, not globally, the correction becomes independent of the current object pose:

$$ {}^{o}\Delta T_{o,m_i} = \bar{T}^{-1}_{o,m_i} \, T_{o,m_i} \tag{4.74} $$

with ${}^{o}\Delta T_{o,m_i}$, $\bar{T}_{o,m_i}$ and $T_{o,m_i}$ being equivalent to ${}^{o}\Delta x_{m_i}$, $\bar{x}_{m_i}$ and $x_{m_i}$, respectively, written in the transformation notation. Subsequently, the correction in object coordinates, ${}^{o}\Delta T_{o,m_i}$, can be applied to the estimated object pose at time step $f_i$:

$$ T_{o,f_i} = \bar{T}_{o,f_i} \, {}^{o}\Delta T_{o,m_i} \tag{4.75} $$

Figure 4.28c depicts this correction, when applied to the illustrated example. Equation (4.75) described the incorporation of a delayed measurement into the estimation of the object pose $x_{f_i}$. Of course, the state vector $y_{f_i}$ also contains the estimated finger position biases $\tilde{q}_{f_i}$ and, depending on the available data, the camera pose $x_{c,f_i}$. Since the finger biases are not directly affected by the vision-based measurements, no correction has to be calculated. The estimated camera pose, on the other hand, is only influenced by visual inputs. This means, the prediction and update steps from finger measurements will not affect the camera pose. Therefore, the correction $\Delta x_{c,m_i}$ can be directly applied to the estimation at $f_i$:

$$ x_{c,f_i} = \bar{x}_{c,f_i} + \Delta x_{c,m_i} \tag{4.76} $$
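In terms of homogeneous transformation matrices, the delay compensation of Eqs. (4.74) and (4.75) amounts to two matrix products. A minimal sketch, with hypothetical argument names, could look as follows:

```python
import numpy as np

def apply_delayed_correction(T_pred_mi, T_corr_mi, T_est_fi):
    """Transfer a delayed vision-based correction to the current estimate.

    T_pred_mi : 4x4 object pose estimate at the capture time m_i, before the vision update
    T_corr_mi : 4x4 object pose at m_i after applying the vision update (Eq. 4.73)
    T_est_fi  : 4x4 current object pose estimate at the fusion time f_i
    """
    # Eq. (4.74): correction expressed w.r.t. the object frame {O}
    dT_obj = np.linalg.inv(T_pred_mi) @ T_corr_mi
    # Eq. (4.75): apply the object-frame correction to the pose estimate at f_i
    return T_est_fi @ dT_obj
```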

The proposed method is applicable as long as a visual measurement is observed after the previous measurement has already been incorporated. Figure 4.30 illustrates the same scenario as before, but additionally depicts the observation of a second measurement at time step m 2 , which occurs before the fusion of the first measurement


Fig. 4.30 If the measurement delay is larger than the time between two measurements, updates would be applied without considering the effect of the previous correction. Potentially, this would cause the same correction to be applied multiple times. a Before applying the first correction, another measurement of the object pose is made. b The calculated correction would be similar to the previous one, resulting in an overcorrection of the object pose, once applied. c By considering all corrections, which occurred during the measurement delay, this can be avoided

at $f_1$. As the corresponding correction is calculated at time step $f_2$, the update from the first measurement is not yet considered in the calculation of ${}^{o}\Delta x_{m_2}$. Consequently, the same correction will be computed and applied twice to the estimation of the object pose, once at $f_1$ and another time at $f_2$. This over-correction has to be avoided in order to maintain an accurate estimate of the grasp state. To prevent this behavior, the correction from the first measurement has to be considered before integrating the second one. At time step $f_2$, when the second measurement will be incorporated, the first correction, ${}^{o}\Delta x_{m_1}$, which was computed at $f_1$, is already available. Therefore, it can be applied to the estimated state at $m_2$, similar to the correction at $f_1$ in Eq. (4.75):

$$ \hat{T}_{o,m_2} = T_{o,m_2} \, {}^{o}\Delta T_{o,m_1} \tag{4.77} $$

Here, $\hat{T}_{o,m_2}$ denotes the transformation notation of an object pose $\hat{x}_{m_2}$, which includes the correction from the previous measurement. Calculating the update from the second measurement based on this pose avoids any over-correction. Subsequently, Eq. (4.72) is reformulated as follows:

$$ \Delta y_{m_i} = K_{m_i} \left( z_{m_i} - h(\hat{y}_{m_i}) \right) \tag{4.78} $$

where ˆym i describes the state, which includes all corrections that were fused during the delay period.


4.9 Experimental Validation

This chapter proposed a flexible framework for the estimation of the grasp state of a manipulated object. It allows the consideration of various sensor modalities, ranging from finger position and contact measurements to different visual information. The experimental validation of the method is presented in this section. It consisted of several manipulation scenarios, each highlighting a different aspect of the system. The platform for these experiments was the DLR humanoid robot David, which was introduced in Chap. 2.

The evaluation covered three typical manipulation tasks. First, the quality of the pose estimation during the grasp acquisition of four different objects was tested. The utilization of an external ground truth allowed to quantify the localization error, as well as compare the strengths and limitations of the various sensing modalities. Next, the effect of the grasp state estimation on the success rate of two pick-and-place scenarios was evaluated. The first task involved the relocation of a bottle, which tilted during the grasp. The second scenario comprised the stacking game for children that was introduced in Chap. 1. Finally, the stability of the estimation during an extended in-hand manipulation operation was examined. Here, an object was continuously rotated back and forth inside of the hand, testing the long-term drift of different localization approaches.

4.9.1 Grasp Acquisition

The goal of the first set of experiments was to quantify the quality of the object pose estimation and to compare the contribution of the different measurement modalities. To this end, a ground truth measurement of the object pose was required. This was provided by the K610 visual tracking system from Nikon [25]. It allowed to accurately determine the 3D positions of a set of active LEDs. Tracking the 6 DoF pose of an object required three of these LEDs to be rigidly attached to the object. Furthermore, in order to measure the pose of the object w.r.t. the hand of the robot, another set of three LEDs had to be attached to its palm. Figure 4.31 illustrates the setup, as it was utilized for the grasp acquisition experiments.

Using the K610 tracking system as the ground truth for grasping experiments posed a number of challenges. First, the need to attach the LED holder limited the selection of suitable objects. For small objects, it is not possible to mount the LEDs without impairing their graspability. Additionally, the attachment alters the appearance of the object, which may affect the vision-based modalities of the pose estimation. Furthermore, to ensure successful tracking, an LED has to be in line of sight of all three cameras of the K610 system. That means, occlusions of any of the six LEDs have to be avoided at all times, further limiting the possible grasp


Fig. 4.31 Ground truth setup that was used for the grasp acquisition experiments. a Two sets of tracking LEDs allow to calculate the 6 DoF pose of the object w.r.t. the palm of the robot. b The K610 visual tracking system from Nikon triangulates the 3D positions of the LEDs

configurations. Finally, reconstructing the pose of the object from the LED positions requires knowledge about the position of the LEDs w.r.t. the object coordinate system. While these positions are constant, they have to be measured initially. Therefore, these measurements represent a possible source of error, which may degrade the quality of the ground truth pose.

Considering these constraints, the first set of experiments consisted of the grasp acquisition of four different objects. For each trial, the object was placed on a table in front of the robot. The hand of the robot was positioned at a fixed location next to the object. Subsequently, the fingers were commanded to move to a predefined configuration, utilizing the joint impedance controller that was mentioned in Chap. 2. If the measured joint torques exceeded a given value before reaching the desired position, the respective joints were stopped. For each object, a total number of N = 10 grasps were executed. During each iteration, the joint position and torque measurements, RGB-D camera images, as well as the ground truth were recorded at a rate of 60 Hz. The head of robot David houses an Intel RealSense D435 depth camera, which was used as the visual input source in the grasp state estimation. Recording all relevant measurements allowed to subsequently recompute and compare the pose estimation based on different combinations of sensing modalities. For the initialization of the EKF, the object pose from the ground truth was used. In total, five variants of the grasp state estimation, each using a different combination of measurements, were evaluated:

1. Joint position measurements: Only using collision contacts and the EKF implementation, as described in Sect. 4.4.
2. Joint position and torque measurements: Considering additional sensed contacts inferred from the joint torques.
3. Both joint measurements + AprilTag features: Fusion of the corner positions of an AprilTag, which is rigidly attached to the object (see Sect. 4.5).


Table 4.2 Mean and standard deviation of the terminal absolute errors in position and orientation for the grasp acquisition experiments (sample size of N = 10)

| Object | Error | Joint Positions | Positions + Torques | AprilTag Features | Contour Features | Object Tracker |
|---|---|---|---|---|---|---|
| Ketchup bottle | Position in [mm] | 16.9 ± 2.8 | 13.1 ± 6.5 | 3.6 ± 0.8 | 5.2 ± 2.0 | 7.2 ± 1.2 |
| | Orientation in [deg] | 16.3 ± 3.3 | 14.4 ± 3.5 | 1.2 ± 0.4 | 8.5 ± 5.0 | 6.0 ± 1.1 |
| Brush | Position in [mm] | 71.0 ± 2.1 | 48.7 ± 13.2 | 6.2 ± 0.8 | 12.7 ± 3.4 | 7.9 ± 0.7 |
| | Orientation in [deg] | 62.9 ± 15.9 | 48.2 ± 11.5 | 3.3 ± 0.5 | 14.0 ± 3.6 | 5.7 ± 0.5 |
| Shampoo | Position in [mm] | 18.5 ± 3.2 | 15.0 ± 3.3 | 8.6 ± 1.7 | 11.0 ± 3.4 | 12.7 ± 3.0 |
| | Orientation in [deg] | 7.7 ± 2.4 | 7.7 ± 1.8 | 4.0 ± 0.5 | 9.0 ± 4.0 | 5.9 ± 2.1 |
| Water bottle | Position in [mm] | 14.3 ± 3.2 | 15.1 ± 1.6 | 7.2 ± 0.4 | 13.3 ± 3.2 | 13.6 ± 0.3 |
| | Orientation in [deg] | 20.2 ± 1.1 | 14.8 ± 1.4 | 5.2 ± 0.2 | 13.3 ± 3.4 | 10.3 ± 0.5 |

4. Both joint measurements + contour features: Fusion with natural features, which are extracted from the contour of the object (see Sect. 4.6).
5. Both joint measurements + visual object tracker: Fusion with the 6 DoF object pose from the RGB-D based object tracker (see Sect. 4.7).

The results of all grasp executions are summarized in Fig. 4.32 and Table 4.2. They present the terminal errors in position and orientation, as well as their standard deviations, for each object and combination of measurement modalities. The four objects were chosen to test the pose estimation under a range of conditions:

1. Ketchup bottle: The ketchup bottle was picked up using a power grasp, causing the object to tilt inside of the hand. Figure 4.33 depicts the bottle before and after grasping.
2. Brush: The brush that is shown in Fig. 4.34 was grasped using only the fingertips, resulting in a five-finger precision grasp.
3. Shampoo: While the first two objects can be described by simple, geometric shapes (a cylinder and a cuboid), the shampoo bottle in Fig. 4.35 has a freeform geometry. Furthermore, the manipulation of the shampoo involved neither a typical precision nor power grasp, but an intermediate grasp.
4. Water bottle: As depicted in Fig. 4.36, the water bottle is an example of a mostly transparent object, which challenges classical vision-based localization methods.


Fig. 4.32 Error graphs for the grasp acquisition experiments. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Ketchup bottle. b Brush. c Shampoo. d Water bottle

To illustrate the object displacements, as well as the performance of the different sensing modalities, Figs. 4.33–4.36 show the progress of the grasp state estimation for one of the ten executions for each object. Additionally, overlays over the camera view of the robot show the estimated object pose at the end of the grasp for the same trial. The results demonstrate the strengths and limitations of the different pose estimation variants. For the power grasp of the ketchup and water bottle, joint position measurements alone allow to approximately estimate the displacement of the object.


Fig. 4.33 Ketchup bottle: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line

Both the errors in position and orientation are significantly reduced, compared to when no in-hand object motion is assumed. The additional inclusion of joint torque measurements, which allow to infer sensed contacts between the fingers and the object, further improves the estimation. However, not all degrees of freedom of the object pose are constrained by the finger measurements. In the case of the bottle grasps,


Fig. 4.34 Brush: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line

both the position along and the orientation around the symmetry axis, i.e. z and ψ, cannot be observed from this measurement modality. The effect of this limitation is even more apparent for the grasps of the shampoo and brush. Since these grasps rely mainly on fingertip contacts, the objects are even less constrained by the geom-


Fig. 4.35 Shampoo: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line

etry of the hand. In turn, this results in significant estimation errors, in particular around φ. The incorporation of visual information allows to greatly improve the pose estimation. Across the different vision-based variants, as well as objects, the fusion of


Fig. 4.36 Water bottle: Illustration of the pose estimation during one of the grasp trials. The colors denote the different combinations of measurements as follows: No estimation (black), joint positions (red), joint positions and torques (blue), both joint measurements + AprilTag features (orange), both joint measurements + contour features (magenta), and both joint measurements + visual object tracker (green). a Camera view before the grasp, including an overlay of the initial estimate of the object pose (black). b Camera view after grasping, including overlays of the pose estimates from the different measurement variants. c Estimated change in position and orientation during the execution of the grasp. The ground truth pose is represented by the dashed line.

these additional measurements markedly reduces the terminal error. Among the three modalities, the fusion with artificial features from an AprilTag demonstrates the best overall estimation quality, reducing the error below 10 mm in position and 10◦ in rotation. The incorporation of the RGB-D based visual object tracker proved to be the preferable marker-less variant, demonstrating comparable performance to the April-


Tag modality in most cases. While not as limited as the finger measurements, the visual object tracker is also not able to identify all degrees of freedom of rotationally symmetrical objects, such as the ketchup or water bottle. Since the algorithm only relies on the geometry of the object, the orientation around the symmetry axis, ψ, cannot be determined. This is also the case for the contour features method, which, on average, demonstrated a marginally worse estimation quality compared to the visual object tracker. Often, the orientation of an object around its symmetry axis is not relevant for the task, e.g. when pouring water from a bottle. Therefore, an estimation error in this DoF will not impede the successful execution. However, in applications where this knowledge is required, additional information would need to be considered in the localization, such as from the texture of the object. Standalone results using only the visual object tracking modality have been reported in [19].

4.9.2 Pick-and-Place

The purpose of the second part of the experimental validation was to demonstrate the practical relevance of the grasp state estimation and how its incorporation may improve the success rate of different manipulation tasks. This evaluation consisted of two pick-and-place scenarios. In the first application, the robot was tasked with picking up a ketchup bottle from different positions on a table and subsequently placing it at a desired target location. The task in the second scenario was to stack the pentagon-shaped object of the game that was introduced in Chap. 1. This game is designed to develop the fine motor skills of small children; its goal is to correctly place the geometric pieces on the pins of a wooden board. These two tasks vary greatly in the precision that is required to successfully accomplish them.

Based on the results from the previous experiment, three variants of the grasp state estimation were tested, using different modalities:

1. No in-hand estimation
2. Joint position and torque measurements
3. Both joint measurements + visual object tracker

Here, the use of joint position and torque measurements is considered the baseline, since they are always available, when using a torque-controlled robotic hand, such as the one of David. In contrast, the visual object tracker requires images from a camera that observes the scene. Among the tested vision-based modalities, the RGB-D based object tracker was shown to be the preferable marker-less solution. At least for the second task, the use of an AprilTag is not possible, because of the small size of the object and occlusions by the hand. Therefore, only these variants were compared in the pick-and-place scenarios.

For the first task, the ketchup bottle was approximately placed at one of four initial locations on the table in front of David (see Fig. 4.37a). Following the initialization of the grasp state estimation, the robotic hand was moved towards the object. The approach pose of the hand w.r.t. the ketchup bottle was pre-defined. Subsequently,


the bottle was grasped similarly to the procedure that was described for the previous experiments. When power grasping the ketchup bottle, the object considerably tilted inside of the hand. Based on the current knowledge of the in-hand object pose, the motion of the arm towards the desired target location was planned. Any observed motion of the object during the grasp would be compensated by the positioning of the hand. Finally, the hand was opened, placing the ketchup bottle on the table. The task was considered successful if the bottle remained standing after release. In order to evaluate the contribution of the proposed pose estimation, the task was first executed without any in-hand localization method. In this case, the tilt of the ketchup bottle inside of the hand would not be known. For each of the three variants, i.e. no in-hand localization, estimation from finger measurements and fusion with the visual object tracker, the task was repeated eight times, twice for each of the four initial positions. Figure 4.38 illustrates the individual steps of the task.

Broadly, the execution of the second task followed similar steps. Initially, the pentagon-shaped object was placed flat at one of four locations on the table, with its orientation around the vertical axis being one of two possible angles (see Fig. 4.37b). Subsequently, the object was picked up by the robot, using a three-finger precision grasp, thereby moving it slightly during the acquisition. As before, the estimated in-hand object pose was used to plan the positioning of the hand w.r.t. the target. Once the hand reached the final pose, the fingers were opened, dropping the object on the pins of the board. If the pentagon was indeed stacked on the board, the task was considered a success. For each pose estimation variant, the task was repeated eight times, once for each possible combination of the initial positions and orientations. The series of steps of the task execution are shown in Fig. 4.39. The success rates of the three estimation variants for both tasks are summarized in Table 4.3. Additionally, Figures 4.38e and 4.39e illustrate the continuous pose estimation from finger measurements and visual object tracking for one successful trial each.


Fig. 4.37 Illustration of the initial and target poses on the table for the two pick-and-place tasks. Each experiment consisted of eight trials. a Ketchup bottle: The object was placed twice at each of the four initial poses. b Stacking game: The object was placed at one of four initial positions, oriented in one of two different directions


Fig. 4.38 Pick-and-place of a ketchup bottle. During the grasp acquisition, the object tilts inside of the hand. If this tilt is not compensated, the bottle may fall over, when being placed at the target location. Using the output of the grasp state estimation, the final positioning of the hand is adjusted to account for the in-hand displacement. a Hand and object before the grasp. b The object tilts inside of the hand before settling in a stable power grasp. c Hand positioning at the target location, without considering the in-hand motion. d Corrected hand placement, based on the estimated grasp state. e Estimated displacement of the object during a successful trial of the pick-and-place task, enabled by the grasp state estimation from joint measurements. (a), (b), and (d) mark the moments in time of the corresponding images

The experiment demonstrated the required precision in order to accomplish two different pick-and-place tasks. In the case of the ketchup bottle scenario, without any in-hand localization method, only two out of eight operations could be finished successfully. This means, in 75% of the trials the ketchup bottle fell over after opening the hand. In contrast, incorporating the estimated pose from finger measurements ensured the proper placement of the object in every trial. Similarly, the fusion with the visual object tracker resulted in a perfect success rate for this task. Considering the results of the previous experiment, this means even moderate estimation quality is sufficient to accomplish certain operations. Even without access to visual information about the object, the pose estimation from finger measurements alone may greatly improve the success rate of a robotic manipulation task.

Accomplishing the second task posed a much greater challenge w.r.t. the placement precision. Although there was only a slight displacement during the grasp execution, without any in-hand localization capability, the task was accomplished zero out of eight times. Using only finger measurements for the object pose estimation allowed to occasionally finish the task correctly, resulting in a total success rate of 25%. Finally, the incorporation of visual data from the object tracker yielded success in six out of eight trials, a significant improvement. While not resulting in a perfect success rate, the fusion of finger measurements and visual data enabled the execution of this challenging manipulation task, which demands high precision.

4.9.3 In-Hand Manipulation

The third experiment was designed to test the stability of the pose estimation during extended in-hand manipulation tasks. When grasping, the object typically reaches its final in-hand position in less than a second. However, in the case of in-hand manipulation tasks, the object may be repeatedly repositioned inside of the hand. For an in-hand localization method to be applicable in these scenarios as well, the estimation is not allowed to drift significantly over time.

To validate the stability of the proposed grasp state estimation, it was tested in an extended in-hand manipulation scenario. The triangle-shaped object in Fig. 4.40 was continuously rotated back and forth inside of the hand for one minute, returning


Fig. 4.39 Stacking of a pentagon-shaped object on the pins of a wooden board. The three-finger precision grasp moves the object slightly, when it is picked up. Placing the object on the board requires high precision, making high demands on the quality of the pose estimation. a Hand and object before the grasp. b The object moves slightly during the grasp acquisition. c Without correcting the in-hand motion, the object will not be aligned with the pins. d The correct estimation of the object pose allows to stack the object successfully. e Estimated displacement of the object during a successful trial of the stacking game, enabled by the grasp state estimation from joint measurements and visual object tracking. (a), (b), and (d) mark the moments in time of the corresponding images

Table 4.3 Success rate of the pick-and-place tasks for the different pose estimation variants

Ketchup bottle:

| Variant | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | Success rate |
|---|---|---|---|---|---|---|---|---|---|
| No estimation | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | 2/8 |
| Joint measurements | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 8/8 |
| Joint + visual data | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 8/8 |

Stacking game:

| Variant | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | Success rate |
|---|---|---|---|---|---|---|---|---|---|
| No estimation | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | 0/8 |
| Joint measurements | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✗ | 2/8 |
| Joint + visual data | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | 6/8 |

to its initial pose after each iteration. The experiment compared three variants of the pose estimation system:

1. Prediction only: Merely the prediction step of the EKF is executed, inferring object twists from joint velocities. This is comparable to the pose estimation in related work in the context of in-hand manipulation such as in [26].
2. Joint position and torque measurements: Estimation from both finger measurements, including the measurement update as described in Sect. 4.4.
3. Both joint measurements + visual object tracker: The pose estimation from finger measurements is fused with the output of the visual object tracker.

Figure 4.41 illustrates the progress of the pose estimation for all three variants. Additionally, the relative drift in position and orientation of the object for each localization method is shown in Table 4.4.

When comparing the results of the three variants, purely predicting the object pose resulted in the most significant drift, both in position and orientation, on average 5 mm and 5° per minute. Since this variant only relies on integrating the joint motions,


Fig. 4.40 Three-finger precision grasp of a triangular object, which was continuously rotated, back and forth, around its vertical axis

Fig. 4.41 Estimated displacement of the triangle object during the continuous in-hand rotation. The colors denote the different estimation variants: Prediction only (black), joint measurements (red), joint measurements + visual object tracker (green)

Table 4.4 Average drift of the pose estimation variants during the in-hand rotation of the triangle

| Variant | x [mm/min] | y [mm/min] | z [mm/min] | Position [mm/min] | φ [deg/min] | θ [deg/min] | ψ [deg/min] | Orientation [deg/min] |
|---|---|---|---|---|---|---|---|---|
| Prediction only | 0.7 | –4.4 | 1.8 | 4.8 | –0.1 | 3.1 | –3.7 | 4.8 |
| Joint measurements | –0.1 | 0.2 | 0.3 | 0.4 | –3.2 | 0.2 | –1.9 | 3.8 |
| Joint + visual data | 0.0 | –1.0 | –0.3 | 1.1 | 1.2 | 0.2 | 0.3 | 1.3 |

numerical errors will compound over time, leading to a continually increasing error. In contrast, the proposed state estimation from finger measurements, which includes the update step, greatly reduces the positional drift of the object. However, progressive drift in φ remains, since this DoF is not fully constrained by the finger positions.


Overall, the fusion of finger measurements and visual data results in the smallest compounding error over time. For this variant, the drift was contained to ca. 1 mm and 1◦ per minute, representing a 5x improvement over the pure prediction approach, which was used in previous in-hand manipulation frameworks.

4.10 Summary

In this chapter, the development and validation of a novel grasp state estimation method was described. The proposed framework enables the integration of a range of sensing modalities into a common probabilistic framework, which allows to determine the in-hand pose and contact configuration of a grasped object (see Fig. 4.42). Based on an extended Kalman filter, the system efficiently estimates the mean and covariance of the grasp state, explicitly considering the uncertainty of the sensor measurements.

Utilizing geometric descriptions of the object and the hand, inconsistencies in the assumed grasp configuration are detected. Since two bodies cannot occupy the same space, collisions between the fingers and the object indicate an error in the estimation. An extension to the popular GJK distance algorithm enabled the localization of a pair of points on the surfaces of two intersecting bodies, which precisely describe the penetration depth. Based on these points, the update step of the EKF corrects the grasp state, such that any collisions between the object and the fingers are resolved. This involves adjusting the object pose, as well as estimating errors on the positions of the fingers. Moreover, utilizing the grasp model from Chap. 3, displacements of the object are predicted from the motion of the fingers.

The capabilities of the in-hand localization method were extended by the consideration of tactile sensing information, either from dedicated sensors or inferred from joint torque measurements. The detection of a contact between the object and

Fig. 4.42 The combination of finger measurements and visual data enabled the estimation of the in-hand pose of grasped objects, such as this rectangular piece of the stacking game


one of the fingers allows to further constrain the estimated grasp state. Similar to the collision contacts, the estimated object pose and finger positions are updated in the measurement update of the EKF, such that the touching bodies are aligned. Extensions to the filter further enabled the consideration of additional bodies in the environment, such as tables, the prediction of the object motion from displacements of the palm, as well as the stabilization of the estimated finger corrections. Beyond the estimation of the grasp state from finger measurements, this chapter elaborated how different types of visual information can be incorporated in the proposed system. First, the data fusion of artificial image features from fiducial markers realized the ultra-tight coupling of this measurement input and the Kalman filter. In addition to the estimation of the object pose, this allowed to integrate the localization of the camera and the tracking of a target object into the same probabilistic framework. Next, a procedure for the extraction of naturally occurring features on the contour of the object was presented, eliminating the need for artificial markers. Finally, the integration of the output of a visual object tracker, which independently estimates the full 6 DoF pose of the object, was discussed. The presentation of a novel method for the consideration of measurement delays concluded the description of the design of the grasp state estimation. Subsequently, a series of experiments demonstrated the capabilities of the perception system. They involved the grasp acquisition of different objects, the execution of two pick-and-place tasks and the extended in-hand rotation of a triangle. In all scenarios, different combinations of measurements were compared, illustrating their respective strengths and limitations. Examining the various sensing modalities, estimating the grasp state solely from finger measurements, i.e. joint position and torque measurements, allowed to significantly improve the knowledge of the grasp state. Objects may move considerably during the acquisition of the grasp. Consequently, as demonstrated by the pick-and-place experiments, the lack of any in-hand localization capability results in low success rates. However, in applications that do not require high accuracy, the estimation from finger measurements can be sufficient to enable the successful execution of a manipulation task. At the same time, the evaluation highlighted the limitations of these measurements. If the pose of the object is not largely constrained by the fingers, such as in precision grasps, visual information may be required to contain the estimation error. The grasp acquisition experiments illustrated that each of the visual modalities is capable of significantly improving the estimation of the grasp state. In particular, the performance of a combination of finger measurements and visual object tracking was demonstrated in the second pick-and-place task. The stacking game, which was first introduced in Chap. 1, requires the precise positioning of the game pieces. By incorporating all available information in the estimation, a high success rate could be achieved. While the grasp state estimation enabled the placement of the pentagon object of the stacking game, a practical limitation prevented solving the complete task. Determining the in-hand position of one of the grasped game pieces, as shown in Fig. 
4.42, allows to plan the positioning of the manipulator, such that the object would be aligned with the corresponding pins of the board. However, kinematic constraints


of the arm can make it impossible to actually reach this pose. The ability to move the object inside of the hand, in particular reorient it, would allow to overcome this limitation. The realization of this capability will be the focus of the following chapter.

References

1. Thrun, S., W. Burgard, and D. Fox. 2005. Probabilistic Robotics. Intelligent Robotics and Autonomous Agents series: MIT Press.
2. Martin Pfanne. In-hand object pose estimation from kinematic data and tactile sensing. Diploma thesis, Technische Universität Dresden, 2013.
3. Maxime Chalon, Martin Pfanne, and Jens Reinecke. Online in-hand object localization. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2977–2984. IEEE, 2013.
4. E.A. Wan and R. Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In 2000 IEEE Adaptive Systems for Signal Processing, Communications, and Control Symposium (AS-SPCC), pages 153–158. IEEE, 2000.
5. Fox, V., J. Hightower, L. Liao, D. Schulz, and G. Borriello. 2003. Bayesian filtering for location estimation. IEEE Pervasive Computing 2 (3): 24–33.
6. Elmer G. Gilbert, Daniel W. Johnson, and S. Sathiya Keerthi. A fast procedure for computing the distance between complex objects in three-dimensional space. IEEE Journal on Robotics and Automation, 4 (2):193–203, 1988.
7. Stephen Cameron. Enhancing GJK: Computing minimum and penetration distances between convex polyhedra. In 1997 IEEE International Conference on Robotics and Automation (ICRA), volume 4, pages 3112–3117. IEEE, 1997.
8. Porrill, John. 1988. Optimal combination and constraints for geometrical sensor data. The International Journal of Robotics Research 7 (6): 66–77.
9. Edwin Olson. AprilTag: A robust and flexible visual fiducial system. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 3400–3407. IEEE, 2011.
10. Andreas Tobergte, Mihai Pomarlan, Georg Passig, and Gerd Hirzinger. An approach to ultra-tightly coupled data fusion for handheld input devices in robotic surgery. In 2011 IEEE International Conference on Robotics and Automation (ICRA), pages 2424–2430. IEEE, 2011.
11. Jens Reinecke, Bastian Deutschmann, and David Fehrenbach. A structurally flexible humanoid spine based on a tendon-driven elastic continuum. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 4714–4721. IEEE, 2016.
12. Intel. Intel RealSense Depth Camera D435. https://www.intelrealsense.com/depth-camera-d435/, 2018.
13. Canny, John. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 6: 679–698.
14. Sklansky, Jack. 1982. Finding the convex hull of a simple polygon. Pattern Recognition Letters 1 (2): 79–83.
15. Christopher G. Harris, Mike Stephens, et al. A combined corner and edge detector. In Alvey Vision Conference, volume 15, pages 10–5244, 1988.
16. Dana H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13 (2):111–122, 1981.
17. Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24 (6):381–395, 1981.
18. Manuel Stoiber. Real-time in-hand object tracking and sensor fusion for advanced robotic manipulation. Master's thesis, Technische Universität München, 2019.


19. Manuel Stoiber, Martin Pfanne, Klaus Strobl, Rudolph Triebel, and Alin Albu-Schäffer. A sparse Gaussian approach to region-based 6DoF object tracking. In 2020 Asian Conference on Computer Vision (ACCV). Springer, 2020.
20. Ajit Gopalakrishnan, Niket S. Kaisare, and Shankar Narasimhan. Incorporating delayed and infrequent measurements in extended Kalman filter based nonlinear state estimation. Journal of Process Control, 21 (1):119–129, 2011.
21. Eugenius Kaszkurewicz and Amit Bhaya. Discrete-time state estimation with two counters and measurement delay. In 1996 35th IEEE Conference on Decision and Control (CDC), volume 2, pages 1472–1476. IEEE, 1996.
22. Thomopoulos, Stelios CA. 1994. Decentralized filtering and control in the presence of delays: Discrete-time and continuous-time case. Information Sciences 81 (1–2): 133–153.
23. Sang Jeong Lee, Seok-Min Hong, and Graham C. Goodwin. Loop transfer recovery for linear systems with delays in the state and the output. International Journal of Control, 61 (5):1099–1118, 1995.
24. Thomas Dall Larsen, Nils A. Andersen, Ole Ravn, and Niels Kjølstad Poulsen. Incorporation of time delayed measurements in a discrete-time Kalman filter. In 1998 37th IEEE Conference on Decision and Control (CDC), volume 4, pages 3972–3977. IEEE, 1998.
25. Nikon. K610 Optical CMM. https://www.metric3d.de/images/PDF/Optical_CMM_EN.pdf, 2010.
26. Wimböck, Thomas, Christian Ott, Alin Albu-Schäffer, and Gerd Hirzinger. 2012. Comparison of object-level grasp controllers for dynamic dexterous manipulation. The International Journal of Robotics Research 31 (1): 3–23.

Chapter 5

Impedance-Based Object Control

Enabled by the grasp state estimation method, the developed in-hand object controller is presented in this chapter. The impedance-based method allows the compliant positioning of a grasped object inside of the hand, while at the same time regulating the internal forces on the object. Following some introductory remarks, this chapter begins by introducing the concept of an in-hand object controller. In the subsequent section, the design of the controller is presented, including an overview of the developed architecture. The following sections focus on the realization of the main components of the system, which are the object impedance controller, the distribution of the internal forces and the torque mapping. Subsequently, the applicability of the method is extended by allowing the reconfiguration of the grasp, e.g. during the grasp acquisition. This capability is further developed in the context of in-hand manipulation, providing a complete interface for active finger gaiting. This chapter concludes by presenting a series of experiments, which validate the performance of the proposed framework in different manipulation scenarios.

5.1 Introduction

The previous chapter introduced a framework for the estimation of the grasp state of manipulated objects. The experimental validation demonstrated how the in-hand localization method allowed to successfully accomplish tasks, which would fail without it. In these scenarios, the object was grasped and subsequently remained in the same in-hand pose until it was placed. However, not all practical manipulation tasks can be accomplished through this simple pick-and-place operation. Many scenarios require the robot to reposition the object inside of the hand first.


Fig. 5.1 The kinematics of a robotic arm may not provide sufficient range of motion in order to move a statically grasped object to the desired target pose. Reaching it requires the capability of reorienting the object inside of the hand. a Hand and object before the grasp. b The kinematic constraints of the robotic arm do not allow to stack the object on the pins

Figure 5.1 shows David trying to place the rectangular object of the stacking game that was introduced in Chap. 1. In the experiments of the previous chapter, the robot was shown to be able to stack the pentagon-shaped object. However, unlike the pentagon, the rectangle only presents two, not five, possible orientations for placing it on the pins. Therefore, a much greater range of motion may be required in order to reach the target position. In the case of the human-inspired robot David, the necessary kinematic freedom cannot be provided by the robotic arm alone. The ability to move the object inside of the hand is required. Conceptually, the problem of moving a grasped object inside of the hand can be separated into two components. On the one hand, it involves the positioning of all fingers, which are in contact, in a way that causes the desired object displacement. On the other hand, the forces, which are applied to the object, have to be regulated, such that the slippage or loss of contacts is avoided. Both elements of this problem will be further explored and subsequently specified in the remainder of this section.

5.1.1 Concept

Object Positioning

The main purpose of an in-hand controller is to regulate the object pose w.r.t. the palm. Starting from an initial grasp, this consists of the coordinated repositioning of the finger links, which are in contact with the object, through the control of the respective joint positions. Figure 5.2 illustrates the problem. Assuming a fixed grasp configuration during the motion, the desired displacement of the object can be related to the required change in position of the contact points between the hand and the


Fig. 5.2 Moving the object to the desired in-hand pose involves the coordinated repositioning of the fingers that are in contact with the object. a Illustration of the initial (solid, yellow) and the desired (dashed) pose of the object. b Displacement of the contacts with the fingers, which generates the desired object motion

object. Similarly, the motion of the contact points can be mapped to the corresponding joint positions. Practically, of course, the grasp configuration may change during the manipulation, since the fingers are not fixed to the object. Displacements of the object are induced by forces, which are applied through the contacts with the finger links. Therefore, they rely on the friction between the two surfaces. If the tangential component of the applied force is greater than the static friction, the finger will start to slide on the surface of the object. In order to maintain the initial grasp configuration, the sliding of the finger links has to be actively restricted by the controller.

Internal Forces

The maintenance of an object grasp involves the application of contact forces, which are generated by the joints. However, in order to avoid unintended object movements, the distribution of the forces among all contacts has to be chosen such that they are in balance. For example, applying the same normal contact force with all fingers may generate a wrench on the object, which would cause the object to move. Additionally, dynamical loads and external forces may affect the balance of the object. Therefore, the contact forces have to be adjusted in order to ensure that no unintended wrench is generated. Figure 5.3a illustrates a balanced force distribution for an exemplary grasp. As before, an additional constraint is given by the friction characteristics of the contact surfaces. When choosing the balancing forces on the object, the sliding of fingers has to be avoided by ensuring that the static friction regime is maintained.


Fig. 5.3 Maintaining a stable grasp configuration requires applying internal forces on the object. a Exemplary force distribution, which balances the object. b Redistributed forces following the reconfiguration of the grasp

Finally, the distribution of the contact forces has to account for changes in the grasp configuration. When adding or removing a contact, the internal forces have to be adjusted, such that the object remains in balance. Figure 5.3b shows the redistributed forces after removing a contact from the grasp.

5.1.2 Problem Statement

The goal of the proposed controller is to regulate the pose of an object, which is held in a precision grasp consisting of three or more fingertip contacts. Information about the grasp state is provided by the estimation method that was presented in Chap. 4. It consists of the current pose of the object, $x_t \in \mathbb{R}^6$, the corrected joint positions, $\hat{q}_t \in \mathbb{R}^m$, as well as the vector of contact point positions, $c_t \in \mathbb{R}^{3n}$, with $n$ being the number of contacts. Based on the positions of the contacts, the geometric description of the finger geometries also allows to determine the normal directions of the surfaces, $n_t \in \mathbb{R}^{3n}$, at the points of contact. Altogether, the localization method provides a consistent estimate of the grasp state, in which the contacting finger links are properly aligned with the object, as depicted in Fig. 5.4. The corrected joint position vector, $\hat{q}_t$, is calculated from the measured joint positions, $q_t$, and the estimated error biases, $\tilde{q}_t$:

$$\hat{q}_t = q_t + \tilde{q}_t \tag{5.1}$$

Fig. 5.4 The grasp state estimation provides the current object pose, $x$, contact positions, $c$, and normals, $n$, as well as the corrected joint positions, $\hat{q}$. The desired object pose, $x_{des}$, is specified by the user

Figure 5.4 also shows the desired object pose, $x_{des} \in \mathbb{R}^6$, which is provided to the controller. Formally, the task of the object controller is to move the object to $x_{des}$. At the same time, the initial grasp configuration shall be maintained. It is described by the initial positions of the contact points, $c_{init} \in \mathbb{R}^{3n}$. Expressed w.r.t. the object frame, $\{O\}$, this configuration shall be actively preserved throughout the manipulation.

The second part of the problem is the determination of the internal forces, which balance the object. Using the hard-finger model, the goal is to find a set of contact forces, $f_{int} \in \mathbb{R}^{3n}$, which compensate any external and dynamical loads on the object. In order to avoid the sliding of fingers on the surface of the object, the chosen contact forces have to respect the friction constraints. Using the Coulomb friction model, which was explained in Chap. 3, the admissible region of a force vector can be described by the friction cone, which requires that:

$$f_{int,\parallel}^{[i]} \leq \mu f_{int,\perp}^{[i]} \tag{5.2}$$

Here, $f_{int,\parallel}^{[i]} \in \mathbb{R}$ and $f_{int,\perp}^{[i]} \in \mathbb{R}$ denote the tangential and normal components of a contact force vector. The scalar friction coefficient, $\mu \in \mathbb{R}$, depends on the materials of both the object and the finger link in contact. If it is not precisely known, it should be chosen conservatively. Figure 5.5 illustrates the constraint.

Evidently, the determination of the internal forces does not have one unique solution. In particular, the grasp strength, i.e. how firmly an object is held, represents a free parameter. Moreover, depending on the grasp configuration, only a subset of the fingers may be necessary to maintain a balanced grasp. In order to provide access to this freedom to the user, a desired normal contact force for each contact, $f_d^{[i]} \in \mathbb{R}$, is considered in the force distribution:

$$\boldsymbol{f}_d^{[i]} = f_d^{[i]} \, n^{[i]} \tag{5.3}$$

Additionally, the user may specify a minimal and maximal force in the normal direction:

$$f_{int,\perp}^{[i]} \geq f_{min} \tag{5.4}$$

$$f_{int,\perp}^{[i]} \leq f_{max} \tag{5.5}$$

Fig. 5.5 Illustration of the surface friction constraint. If the applied contact force lies outside of the friction cone, the finger starts to slide on the surface of the object. The opening angle of the cone is defined by the friction coefficient, $\mu$, which depends on the interacting materials. a If $f_{int,\parallel}^{[1]} \leq \mu f_{int,\perp}^{[1]}$, the finger sticks to the surface of the object. b If $f_{int,\parallel}^{[1]} > \mu f_{int,\perp}^{[1]}$, the finger starts to slide on the surface

Fig. 5.6 The force distribution problem involves finding a set of internal forces, $f_{int}$, which are as close as possible to the user-defined forces, $f_d$, while balancing dynamic loads on the object, $w_{dyn}$, and considering the friction constraints

Figure 5.6 summarizes the relevant quantities for the distribution of the internal forces. In addition to the balancing of the forces of a static grasp configuration, the controller shall support the reconfiguration of the grasp, thereby making it applicable in advanced in-hand manipulation scenarios. For instance, during the grasp acquisition, contacts between the fingers and the object are successively established, repeatedly changing the configuration. And, in finger gaiting scenarios, objects are actively reconfigured, which involves the removal, relocation and subsequent reintroduction of contacts in the grasp configuration. The controller has to be able to gradually redistribute the internal forces between the fingers, as well as provide all the necessary interfaces, to enable these operations.


The determined contact forces for the object positioning and balancing have to be generated by the joints of the fingers. The actuator torques, $\tau_{cmd} \in \mathbb{R}^m$, which result from this mapping, represent the output of the controller. They are commanded to the underlying joint torque controller. For the robotic hand of the humanoid David, this is realized by the backstepping controller, which was mentioned in Chap. 2 [1].

5.2 Controller Design

The realization of an in-hand object controller, according to the problem statement in the previous section, offers several design choices. Principally, the type of controller has to be selected. The proposed framework is based on an impedance controller. In the first part of this section, the reasoning behind this decision is explained, as well as how the impedance behavior is realized. Subsequently, the solution to the force distribution problem is discussed. In the past, different approaches to this problem have been put forward. However, the availability of the proposed in-hand localization method allows for the development of a more general method. Finally, an overview of the complete control architecture is presented, the specifics of which will be explored further in the following sections.

5.2.1 Object Impedance

The basis for the proposed controller is an impedance-based design. It allows the realization of a robust control behavior, even when interacting with the environment. Moreover, this compliant approach for the positioning of the object relies less on the accuracy of the force measurements or perfect knowledge of the contact state, compared to pure force control. Instead, the error in the object pose is related to the desired contact forces or joint torques. The choice of an impedance controller also builds on the design principle of David, which is a mechanically compliant humanoid. Both the controllers for the positioning of the hand and the fingers are impedance-based.

The key concept behind impedance control is to relate a position error to a force. In the case of an in-hand object controller, the input to the algorithm is the desired object pose, $x_{des}$, while the output is the vector of the desired joint torques, $\tau_{cmd}$, which is commanded to the joint controller. As was explained in Chap. 3, the twist of the object can be related to the velocities of the contact points, using the grasp matrix $G$, which in turn can be mapped to the joint velocities, utilizing the hand Jacobian matrix $J$. Similarly, $J$ relates the joint torques to the contact forces, and $G$ the contact forces to the object wrench. Given this three-layered design of the grasp kinematics and dynamics, the impedance behavior may be realized on any of these levels. Figure 5.7 illustrates the structure of the three different variants.


Fig. 5.7 Generating the impedance behavior involves relating a pose error to a force quantity. For the in-hand object positioning, this can be realized on three different levels, entailing dissimilar considerations. a Object-level: The impedance behavior is realized by relating the desired object displacement to an object wrench, which is mapped to a set of contact forces, using the inverse of the grasp matrix, and subsequently to the joint torques. b Contact-level: The object displacement is first related to corresponding changes in the positions of the contact points. Subsequently, the contact forces from the impedance law are mapped to the joint torques. c Joint-level: Here, the compliant behavior is generated for a desired joint displacement, which is obtained from the change in the contact positions, using the inverse of the hand Jacobian matrix

Object-Level: Here, the impedance is formulated between the object pose and an object wrench, $w_{des}$. Subsequently, the desired wrench is mapped to the contact forces, which are themselves related to the joint torques:

$$w_{des} = I_x(W^{-1}(x_{des} - x_t)) \tag{5.6}$$

$$f_{des} = G^{+} w_{des} \tag{5.7}$$

$$\tau_{des} = J^T f_{des} \tag{5.8}$$

The function $I_x \in \mathbb{R}^6$ denotes the spatial impedance law for the object pose. While the object is impedance controlled in this approach, the individual fingers are purely force controlled. This means that, if a finger loses contact, it would continue to move in an unbounded way. Moreover, this approach places a high demand on the accuracy of the applied forces, which have to generate the desired wrench, while at the same time avoiding the slipping of contacts. Finally, the inversion of the grasp matrix may cause an unfavorable conditioning of the contact forces, because of the scaling between translational and rotational DoF.


Contact-Level: In this approach, the desired object displacement is first related to a motion of the contact points. The impedance behavior is realized between the desired contact positions and the contact forces, which are subsequently mapped to the joint torques:

$$\Delta c_{des} = G^T W^{-1}(x_{des} - x_t) \tag{5.9}$$

$$f_{des} = I_c(\Delta c_{des}) \tag{5.10}$$

$$\tau_{des} = J^T f_{des} \tag{5.11}$$

Here, $I_c \in \mathbb{R}^{3n}$ denotes the function, which relates the contact displacements to the desired contact forces. This approach enables the robust control of each finger in contact with the object. Moreover, even when a finger loses contact, bounding $\Delta c_{des}$ allows to prevent the uncontrolled drift of the respective joints. Compared to the object-level approach, controlling the contact positions also does not introduce the same scaling problems, since only translational DoF have to be considered.

Joint-Level: The third approach goes yet another step further, relating the displacement of the contact points to the positions of the joints, before realizing the joint impedance behavior:

$$\Delta c_{des} = G^T W^{-1}(x_{des} - x_t) \tag{5.12}$$

$$\Delta q_{des} = J^{+} \Delta c_{des} \tag{5.13}$$

$$\tau_{des} = I_q(\Delta q_{des}) \tag{5.14}$$

Here, $I_q$ denotes the joint impedance law. This approach relies on the inversion of the hand Jacobian matrix. If a finger is close to a singular configuration, this will result in very large, undesirable values. Additionally, the conditioning of the matrix inverse favors contact displacements that require small joint motions. This behavior would significantly impair the positioning of the fingers and thus the object.

Following these considerations, the contact-level design was chosen for the realization of the impedance behavior. In addition to enabling the compliant positioning of the object, this approach allows to actively maintain the initial grasp configuration. Thereby, the impedance controller helps to avoid the slipping of contacts and to ensure the stability of the fingers, even in the event of loss of contact.


5.2.2 Force Distribution

The second major component of the controller design is the force distribution for the balancing of the object. Here, the goal is to find a set of internal forces, which compensate the dynamical object loads and ensure that the fingers remain in contact with the object. This problem has been the subject of a number of publications in the last decades. In the context of object impedance control, [5] presented a comparison of three popular controllers, which mainly differ in their force distribution approaches. In [2], the concept of the virtual linkage was introduced. Here, an attracting force between each pair of contact points is applied, similar to a set of virtual springs between the fingers. In contrast, the dynamic intrinsically passive controller (IPC), which was presented in [3], introduced the idea of a virtual object, which connects to the contact points via a set of coupling springs. This formulation allows for an intuitive physical interpretation and parameterization of the applied internal forces. However, the design does not intrinsically guarantee the balancing of the object. Finally, [5] proposed a static formulation of an IPC, where the fingers connect to a virtual grasp center, which is computed from the fingertip positions. It represents a simplified, more robust variant of the previous approach. The structures of the different spring designs are illustrated in Fig. 5.8.

Fig. 5.8 A number of different approaches for the force distribution of an object impedance controller have been proposed in literature. Their behavior can be illustrated as a set of virtual springs, which generate the internal forces on the object. a In the virtual linkage model, attracting forces are generated between each pair of fingertips [2]. b For the dynamic IPC controller, the springs connect to a virtual object [3]. c The static IPC represents a simplification of the dynamic variant, in which the contact forces are directed towards a virtual grasp center [4]

The main drawback of each of these approaches is the neglect of explicit force constraints. None of these methods ensures that the applied contact forces are contained within their respective friction cones. In addition, the design of each controller entails specific trade-offs, which are undesirable. The virtual linkage model does not allow specifying the desired contact forces for each finger. The resulting internal forces from the dynamic IPC may generate an unintended object wrench. And the applicability of the static IPC is limited to convex objects. Furthermore, each of these methods assumes a static grasp configuration, i.e. that the positions of the contacts on the object and the finger links do not change.

Many of these shortcomings arise from the fact that previous in-hand controllers did not integrate any means of estimating the grasp state. Therefore, they did not have access to the real-time pose of the object or the positions and normal directions of the contact points. The introduction of the proposed in-hand localization method enabled the development of a more general force distribution approach, which is guaranteed to balance the object by compensating the dynamical loads on the object:

$$w_{dyn} = G f_{int} \tag{5.15}$$

At the same time, friction constraints, as well as upper and lower bounds on the contact forces, are explicitly considered, as expressed in Eqs. (5.2), (5.4) and (5.5). The consideration of the desired normal forces allows the grasp forces to be regulated in an intuitive way. The internal forces are chosen to be as close as possible to the desired forces, while still respecting all the other constraints:

$$\min_{f_{int}} \left\| f_{int} - f_d \right\| \tag{5.16}$$

As shown in [6], determining the force distribution involves solving a quadratic optimization problem. This will be further explored in Sect. 5.4.

5.2.3 Architecture Overview

The architecture of the complete control framework is depicted in Fig. 5.9. The inputs to the algorithm are two-fold. On the one hand, there are the values from the grasp state estimation, which include the object pose, contact positions and normals, and the corrected joint positions. On the other hand, there are the desired values, which are specified by the user, such as the desired object pose and grasp forces.

Principally, these inputs are processed by the two main components of the algorithm, which have been discussed so far. The object impedance controller generates a set of contact forces, which move the object towards the desired pose. At the same time, it maintains the initial grasp configuration, which is sampled when a contact is established. The second component deals with the distribution of the internal object forces. They are calculated to balance the dynamical object loads, which are derived from the estimated object pose and inertia parameters. The sum of the forces, which are produced by these two components, is subsequently mapped to the desired joint torques. Additionally, the joints have to compensate any dynamical loads on the fingers. The output of a nullspace controller constitutes the final torque component. Its purpose is to avoid joint limits and singularities by taking advantage of the nullspace in the mapping between the contact forces and the joint torques.


Fig. 5.9 Block diagram of the proposed control architecture

Finally, the sum of all torque components is commanded to the joint controller. Using the current joint torque measurements, it generates the appropriate motor commands. The following three sections will describe the specifics of the outlined architecture. Initially, the grasp configuration is considered to be static, i.e. no contacts are added or removed from the grasp. However, this restriction will be removed in Sect. 5.6, which extends the proposed framework to support the reconfiguration of the grasp.

5.3 Object Impedance Control

The previous section identified a contact-level impedance law as the preferable design for an in-hand object controller. Subsequently, the concrete realization of such an algorithm shall be explored here. Additionally, an extension to the controller is proposed, which actively maintains the initial grasp configuration, in order to reduce the effects of sliding and rolling of the fingers on the object.

5.3.1 Object Positioning

Starting from the current pose of the object, $x_t$, the goal of the impedance controller is to generate a set of contact forces, $f_x \in \mathbb{R}^{3n}$, which move the object towards the desired pose, $x_{des}$. As was explained in Sect. 5.2, instead of generating the impedance behavior on the object level, the desired positions of the contact points shall be controlled.


Fig. 5.10 The desired object positioning is realized by generating an impedance for the corresponding contact displacements, $\Delta c_x$

Based on the difference between the current and the desired pose of the object, a corresponding motion of the contact positions can be approximated, using the grasp matrix, $G$:

$$\Delta c_x = G^T W^{-1} \Delta x_t \tag{5.17}$$

where $\Delta c_x \in \mathbb{R}^{3n}$ denotes the displacement of the contacts and $\Delta x_t$ the remaining error between the desired and current object pose:

$$\Delta x_t = x_{des} - x_t \tag{5.18}$$

Assuming fixed positions of the contacts on the object, moving them by $\Delta c_x$ will result in the desired object displacement. The corresponding motion of the fingers is realized by formulating a quasi-static spring-damper system, which generates the impedance behavior:

$$f_x = K_x \Delta c_x + D_x \Delta\dot{c}_x \tag{5.19}$$

Here, $K_x \in \mathbb{R}$ and $D_x \in \mathbb{R}$ denote the scalar stiffness and damping coefficients, which are chosen to realize the desired compliance. The time derivative of the contact displacement, $\Delta\dot{c}_x$, may be calculated from the joint velocities or numerically approximated. The spring-damper system is illustrated in Fig. 5.10. The mapping of the contact forces to the joint torque commands will be covered in Sect. 5.5.

It should be noted that the positions of the contact points on the object are not fixed. Moving the object will cause them to roll or slide on the surface. This may even be exacerbated by the positioning controller. Disturbances, such as joint friction, cause the physically applied contact forces to be different from the commanded values of $f_x$. This may cause some fingers to stop, while others continue to move. Subsequently, the resulting displacement of the fingers would not generate the desired object motion. In turn, contact displacements would be continuously commanded, causing fingers to move boundlessly.
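To make the data flow of Eqs. (5.17)-(5.19) concrete, the following minimal Python sketch computes the contact displacement and the resulting impedance forces. It is not part of the original implementation; the function name, the numerical velocity approximation and the gain values are illustrative assumptions.

import numpy as np

def contact_impedance_forces(G, W, x_des, x_t, dc_x_prev, dt, K_x=200.0, D_x=5.0):
    """Contact-level impedance law, Eqs. (5.17)-(5.19) (illustrative gains)."""
    dx_t = x_des - x_t                        # Eq. (5.18): remaining pose error
    dc_x = G.T @ np.linalg.solve(W, dx_t)     # Eq. (5.17): contact displacement
    dc_x_dot = (dc_x - dc_x_prev) / dt        # simple numerical time derivative
    f_x = K_x * dc_x + D_x * dc_x_dot         # Eq. (5.19): desired contact forces
    return f_x, dc_x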

5.3.2 Maintaining the Grasp Configuration

In order to prevent the unconstrained sliding of the fingers, the object controller shall be extended to actively maintain the initial grasp configuration. Here, this configuration is defined as the positions of the contact points w.r.t. the object frame, $\{O\}$, at the beginning of the in-hand manipulation task, at $t = 0$, and shall be denoted ${}^o c_{init} \in \mathbb{R}^{3n}$:

$${}^o c_{init} = {}^o c_0 \tag{5.20}$$

Given the current pose of the object, the position of a contact in the initial configuration at time $t$ w.r.t. $\{I\}$ can be expressed as:

$$\begin{pmatrix} c_{init,t}^{[i]} \\ 1 \end{pmatrix} = T_{o,t} \begin{pmatrix} {}^o c_{init}^{[i]} \\ 1 \end{pmatrix} \tag{5.21}$$

The goal is to command a correction, which compensates the difference between the initial and current contact configuration, cinit,t − ct . Figure 5.11 illustrates the problem. However, the desired contact position cannot be directly commanded, since it would also affect the positioning of the object.

Fig. 5.11 An additive impedance force between the current contact location, $c^{[1]}$, and its position in the initial configuration, $c_{init}^{[1]}$, counteracts the sliding of the finger


Chapter 3 introduced the nullspace of the grasp matrix as a projection, which removes the effect of the contact motion on the object pose:

$$\dot{\tilde{c}}_{null} = (I - G^T G^{T+}) \dot{c}_{null} \tag{5.22}$$

where $\dot{\tilde{c}}_{null}$ denotes a set of contact velocities, which does not generate an object motion. This relation can be exploited for the maintenance of the initial grasp configuration. Utilizing Eq. (5.22), the error in the contact configuration is projected through the nullspace:

$$\Delta c_f = (I - G^T G^{T+})(c_{init,t} - c_t) \tag{5.23}$$

The resulting contact displacement, $\Delta c_f$, corrects the contact positions without affecting the pose of the object. It is combined with the change in contact positions from Eq. (5.17), which generates the desired object motion:

$$\Delta c = \Delta c_x + \Delta c_f \tag{5.24}$$

Consequently, the impedance law in Eq. (5.19) is modified to realize the compliant behavior for the combined contact displacement:

$$f_x = K_x \Delta c + D_x \Delta\dot{c} \tag{5.25}$$
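The projection of Eq. (5.23) reduces to a few lines of Python; the sketch below is purely illustrative (names and the use of a pseudoinverse-based projector are assumptions, not the original code):

import numpy as np

def grasp_maintenance_displacement(G, c_init_t, c_t):
    """Eq. (5.23): contact correction projected through the nullspace of G^T."""
    GT_pinv = np.linalg.pinv(G.T)                # pseudoinverse of G^T, shape (6, 3n)
    P_null = np.eye(G.shape[1]) - G.T @ GT_pinv  # nullspace projector of G^T
    return P_null @ (c_init_t - c_t)             # displacement that does not move the object

# Eq. (5.24) then combines both displacement terms, dc = dc_x + dc_f,
# and Eq. (5.25) applies the impedance law of Eq. (5.19) to the combined dc.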

5.4 Internal Forces

In order to maintain a stable grasp of the object, the fingers have to apply forces through the contacts. To avoid generating an unintended wrench on the object, the distribution of these internal forces has to be chosen to balance all additional loads. This includes any dynamical loads, such as the weight of the object, which have to be compensated by the contact forces.

The organization of this section is as follows. First, the mathematical formulation of the corresponding problem is described. Second, the solution to this problem, using quadratic optimization techniques, is discussed. Finally, the proposed method is modified to enable the generation of a desired object wrench, which further extends the capabilities of the controller.


5.4.1 Force Distribution

The force distribution problem can be framed as finding a set of contact forces, which compensate the loads on the object, while considering a number of additional constraints. Chapter 3 introduced the subspace formulation of the contact forces, which separates them into two components:

$$f_c = G^{+} w_c + (I - G^{+} G) f_{c,null} \tag{5.26}$$

The first component generates the desired wrench, $w_c$, while the second component removes the wrench-generating portion of the desired internal forces, $f_{c,null}$. Utilizing Eq. (5.26), the force distribution problem can be formulated as determining a vector of contact forces, $f^*$, such that the resulting internal forces, $f_{int}$, satisfy all relevant constraints:

$$f_{int} = G^{+} w_{dyn} + (I - G^{+} G) f^* \tag{5.27}$$

The desired object wrench can be chosen to be equivalent to the known loads on the object, thereby compensating them. The dynamical loads on the object, $w_{dyn}$, are obtained from its rigid body dynamics, as explained in Chap. 3:

$$w_{dyn} = M_o(x)\dot{\nu} + b_o(x, \nu) + w_g \tag{5.28}$$

The internal forces are subject to a number of constraints, as outlined in Sect. 5.2. First, the allowed range for the normal contact forces may be restricted by the user:

$$f_{min} \leq f_{int,\perp}^{[i]} \leq f_{max} \tag{5.29}$$

Second, the contact forces have to lie inside of the friction cone, in order to avoid the slipping of the fingers on the surface of the object:

$$f_{int,\parallel}^{[i]} \leq \mu f_{int,\perp}^{[i]} \tag{5.30}$$

Finally, the user may specify the desired contact force for each contact, $f_d^{[i]}$, giving them control over the total grasp force and the contribution of each finger. Therefore, the problem can be summarized as finding a vector $f^*$, such that:

$$\min_{f^*} \left\| f^* - f_d \right\| \tag{5.31}$$

while considering the constraints in Eqs. (5.29) and (5.30). The minimization in Eq. (5.31), subject to inequality constraints, is equivalent to a quadratic programming problem of the following form:

$$\min_x \; \frac{1}{2} x^T A x + x^T e \tag{5.32}$$

$$\text{s.t.} \quad b_l \leq B x \leq b_u \tag{5.33}$$

For the force distribution to be obtained using quadratic programming techniques, the problem has to be structured according to this formulation:

$$\min_{f^*} \; \frac{1}{2} f^{*T} A f^{*} + f^{*T} e \tag{5.34}$$

with:

$$A = I^{3n \times 3n} \tag{5.35}$$

$$e = -f_d \tag{5.36}$$

This expression is equivalent to Eq. (5.31). The constraints in Eqs. (5.29) and (5.30) are described w.r.t. the final vector of the internal forces, $f_{int}$. Therefore, they have to be reformulated in order to separate the elements that depend on $f^*$, the argument of the optimization.

For the force range constraint in Eq. (5.29), $f_{int,\perp}^{[i]}$ can be substituted by the product of the normal direction, $n^{[i]}$, and the internal force vector, $f_{int}^{[i]}$:

$$f_{min} \leq n^{[i]T} f_{int}^{[i]} \leq f_{max} \tag{5.37}$$

Expressing the force range constraints for all $n$ contacts yields:

$$f_{min} \mathbf{1}^{n \times 1} \leq N f_{int} \leq f_{max} \mathbf{1}^{n \times 1} \tag{5.38}$$

with $N \in \mathbb{R}^{n \times 3n}$ being a matrix of normal vectors, which have been arranged as follows:

$$N = \begin{pmatrix} n_{int}^{[1]T} & 0^{1 \times 3} & \cdots & 0^{1 \times 3} \\ 0^{1 \times 3} & n_{int}^{[2]T} & \cdots & 0^{1 \times 3} \\ \vdots & \vdots & \ddots & \vdots \\ 0^{1 \times 3} & 0^{1 \times 3} & \cdots & n_{int}^{[n]T} \end{pmatrix} \tag{5.39}$$

Substituting $f_{int}$ with its components, using Eq. (5.27), yields:

$$f_{min} \mathbf{1}^{n \times 1} \leq N (G^{+} w_{dyn} + (I - G^{+} G) f^*) \leq f_{max} \mathbf{1}^{n \times 1} \tag{5.40}$$

Subtracting the wrench-generating component allows to separate the part that depends on $f^*$ and obtain the desired formulation of the constraints according to Eq. (5.33):

$$f_{min} \mathbf{1}^{n \times 1} - N G^{+} w_{dyn} \leq N (I - G^{+} G) f^* \leq f_{max} \mathbf{1}^{n \times 1} - N G^{+} w_{dyn} \tag{5.41}$$

The corresponding parameters of the quadratic programming problem can be extracted as:

$$b_{l,range} = f_{min} \mathbf{1}^{n \times 1} - N G^{+} w_{dyn} \tag{5.42}$$

$$b_{u,range} = f_{max} \mathbf{1}^{n \times 1} - N G^{+} w_{dyn} \tag{5.43}$$

$$B_{range} = N (I - G^{+} G) \tag{5.44}$$

with $b_{l,range} \in \mathbb{R}^n$ and $b_{u,range} \in \mathbb{R}^n$ being the lower and upper bound of the force range constraint, and $B_{range} \in \mathbb{R}^{n \times 3n}$ being the corresponding mapping matrix.

Similarly, the optimization parameters for the friction constraint in Eq. (5.30) can be derived. However, the presented formulation of the quadratic optimization problem only considers linear inequality constraints. Yet, determining the tangential component of a force vector, $f_{int,\parallel}^{[i]}$, involves a quadratic calculation. Therefore, the friction cone shall be approximated by a pyramid, as illustrated in Fig. 5.12. In order to reduce the resulting approximation error, a higher-order polyhedron could be utilized as well. Using two arbitrary tangential direction vectors, $a^{[i]} \in \mathbb{R}^3$ and $b^{[i]} \in \mathbb{R}^3$, which are perpendicular and of unit norm, the corresponding force components are obtained:

$$f_{int,a}^{[i]} = a^{[i]T} f_{int}^{[i]} \tag{5.45}$$

$$f_{int,b}^{[i]} = b^{[i]T} f_{int}^{[i]} \tag{5.46}$$

Subsequently, the friction constraint can be expressed in terms of these components:

Fig. 5.12 The friction cone is approximated by a pyramid to allow for a linear formulation of the friction constraint. It is described by the contact normal, n, as well as the vectors a and b, which are two arbitrary, perpendicular vectors that are tangential to the surface at c. f a , f b and f ⊥ denote the corresponding components of a force vector f

$$f_{int,a}^{[i]} \leq \mu f_{int,\perp}^{[i]} \tag{5.47}$$

$$f_{int,a}^{[i]} \geq -\mu f_{int,\perp}^{[i]} \tag{5.48}$$

$$f_{int,b}^{[i]} \leq \mu f_{int,\perp}^{[i]} \tag{5.49}$$

$$f_{int,b}^{[i]} \geq -\mu f_{int,\perp}^{[i]} \tag{5.50}$$

Utilizing the tangential vectors, as well as the normal vector, the constraints can be reformulated as follows:

$$0 \leq \mu n^{[i]T} f_{int}^{[i]} - a^{[i]T} f_{int}^{[i]} \tag{5.51}$$

$$0 \leq \mu n^{[i]T} f_{int}^{[i]} + a^{[i]T} f_{int}^{[i]} \tag{5.52}$$

$$0 \leq \mu n^{[i]T} f_{int}^{[i]} - b^{[i]T} f_{int}^{[i]} \tag{5.53}$$

$$0 \leq \mu n^{[i]T} f_{int}^{[i]} + b^{[i]T} f_{int}^{[i]} \tag{5.54}$$

Combining the four constraints yields a simplified expression:

$$0^{4 \times 1} \leq M^{[i]} f_{int}^{[i]} \tag{5.55}$$

with matrix $M^{[i]} \in \mathbb{R}^{4 \times 3}$ being:

$$M^{[i]} = \begin{pmatrix} \mu n^{[i]T} - a^{[i]T} \\ \mu n^{[i]T} + a^{[i]T} \\ \mu n^{[i]T} - b^{[i]T} \\ \mu n^{[i]T} + b^{[i]T} \end{pmatrix} \tag{5.56}$$

Formulated for all $n$ contacts and substituting $f_{int}$ with Eq. (5.27) yields:

$$0^{4n \times 1} \leq M (G^{+} w_{dyn} + (I - G^{+} G) f^*) \tag{5.57}$$

with $M \in \mathbb{R}^{4n \times 3n}$ being a matrix that contains all $M^{[i]}$:

$$M = \begin{pmatrix} M^{[1]} & 0^{4 \times 3} & \cdots & 0^{4 \times 3} \\ 0^{4 \times 3} & M^{[2]} & \cdots & 0^{4 \times 3} \\ \vdots & \vdots & \ddots & \vdots \\ 0^{4 \times 3} & 0^{4 \times 3} & \cdots & M^{[n]} \end{pmatrix} \tag{5.58}$$

Subtracting the left-hand side force component yields the final formulation of the friction constraints:

$$-M G^{+} w_{dyn} \leq M (I - G^{+} G) f^* \tag{5.59}$$

which allows to extract the lower bound, $b_{l,friction} \in \mathbb{R}^{4n}$, and mapping matrix, $B_{friction} \in \mathbb{R}^{4n \times 3n}$, of the corresponding optimization problem:

$$b_{l,friction} = -M G^{+} w_{dyn} \tag{5.60}$$

$$B_{friction} = M (I - G^{+} G) \tag{5.61}$$

If required, for the upper bound of the constraints, $b_{u,friction} \in \mathbb{R}^{4n}$, any large number may be chosen. Finally, the parameters of the two types of constraints can be stacked in order to obtain the complete bound vectors, $b_l \in \mathbb{R}^{5n}$ and $b_u \in \mathbb{R}^{5n}$, and mapping matrix, $B \in \mathbb{R}^{5n \times 3n}$:

$$b_l = \begin{pmatrix} b_{l,range} \\ b_{l,friction} \end{pmatrix} \tag{5.62}$$

$$b_u = \begin{pmatrix} b_{u,range} \\ b_{u,friction} \end{pmatrix} \tag{5.63}$$

$$B = \begin{pmatrix} B_{range} \\ B_{friction} \end{pmatrix} \tag{5.64}$$
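To illustrate how the objective of Eqs. (5.34)-(5.36) and the stacked constraints of Eqs. (5.42)-(5.44) and (5.60)-(5.64) fit together, the following Python sketch assembles the QP parameters for a given grasp. It is a simplified illustration rather than the original implementation; all function and variable names, as well as the large constant used for the unbounded friction upper limits, are assumptions.

import numpy as np

def build_force_qp(G, w_dyn, normals, tangents_a, tangents_b, f_d, f_min, f_max, mu):
    """Assemble the QP of Eqs. (5.32)-(5.33) for the internal force distribution."""
    n = len(normals)                              # number of contacts
    G_pinv = np.linalg.pinv(G)                    # shape (3n, 6)
    P_null = np.eye(3 * n) - G_pinv @ G           # internal-force projector (I - G+ G)
    f_w = G_pinv @ w_dyn                          # wrench-generating part G+ w_dyn

    A = np.eye(3 * n)                             # Eq. (5.35)
    e = -np.asarray(f_d)                          # Eq. (5.36)

    N = np.zeros((n, 3 * n))                      # normal-selection matrix, Eq. (5.39)
    M = np.zeros((4 * n, 3 * n))                  # friction-pyramid matrix, Eq. (5.58)
    for i, (nv, av, bv) in enumerate(zip(normals, tangents_a, tangents_b)):
        N[i, 3 * i:3 * i + 3] = nv
        M[4 * i:4 * i + 4, 3 * i:3 * i + 3] = np.vstack(
            [mu * nv - av, mu * nv + av, mu * nv - bv, mu * nv + bv])  # Eq. (5.56)

    B = np.vstack([N @ P_null, M @ P_null])                                 # Eqs. (5.44), (5.61)
    b_l = np.concatenate([f_min * np.ones(n) - N @ f_w, -M @ f_w])          # Eqs. (5.42), (5.60)
    b_u = np.concatenate([f_max * np.ones(n) - N @ f_w, 1e9 * np.ones(4 * n)])  # Eq. (5.43)
    return A, e, B, b_l, b_u, f_w, P_null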

5.4.2 Quadratic Optimization

Quadratic programming problems arise in many practical applications. Therefore, a range of solution algorithms have been developed in the past [7]. Typically, most of these approaches can be categorized as either interior-point [8] or active-set methods [9]. A number of unrelated approaches, such as augmented Lagrangian or gradient methods, have been proposed as well, but are more limited in their applicability.

Originally developed for linear optimization problems, interior-point methods allow to solve a convex programming problem by transforming it into a linear optimization over a convex set. Inequality constraints are replaced by weighted barrier functions, which penalize the possible violation of a constraint. Subsequently, Newton's method is used to solve the resulting optimization problem, subject to equality constraints.

In contrast, active-set methods solve equality-constrained optimization problems, where the constraints are defined by an active set of the inequality constraints of the original problem. Iteratively, the active set is updated until an optimal solution to the inequality problem is found. Active-set methods benefit from a good starting point for the optimization. The availability of the solution to a related problem, such as from the previous time step, allows to significantly reduce the number of necessary iterations.


In order to solve the force distribution problem in the context of this work, a parametric active-set method was employed. It efficiently determines an optimal solution along a linear homotopy between the stated problem and a related problem, for which a solution is available. The details of this algorithm were first reported in [10]. Moreover, an open-source implementation of this method, under the name qpOASES [7], is available, which was utilized for the development of the proposed object controller.
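The original implementation relies on the parametric active-set solver qpOASES; purely for illustration, the following sketch solves the same problem with SciPy's generic SLSQP routine and warm-starts it with the previous solution, mimicking the benefit that active-set methods draw from a good initial guess. The function name and the fallback zero initialization are assumptions.

import numpy as np
from scipy.optimize import minimize

def solve_force_distribution(A, e, B, b_l, b_u, f_prev=None):
    """Solve the QP of Eqs. (5.32)-(5.33) with a generic off-the-shelf solver."""
    dim = A.shape[0]
    x0 = f_prev if f_prev is not None else np.zeros(dim)
    cost = lambda x: 0.5 * x @ A @ x + x @ e
    grad = lambda x: A @ x + e
    constraints = [
        {"type": "ineq", "fun": lambda x: B @ x - b_l},   # B x >= b_l
        {"type": "ineq", "fun": lambda x: b_u - B @ x},   # B x <= b_u
    ]
    res = minimize(cost, x0, jac=grad, constraints=constraints, method="SLSQP")
    return res.x

# The internal forces of Eq. (5.27) then follow as f_int = f_w + P_null @ res.x.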

5.4.3 Extensions

Desired Object Wrench

The primary purpose of the proposed controller, as it has been discussed so far, is the positioning of the object inside of the hand. However, the presented force distribution approach allows to extend the capabilities of the system. Eq. (5.27) expressed how the internal forces are chosen in order to compensate the dynamical loads. However, the same mechanism may be used to generate a desired object wrench. In many applications it is necessary to apply a force to the environment, which is enabled by this extension. Therefore, Eq. (5.27) is reformulated to include the desired external object wrench, $w_{ext}$:

$$f_{int} = G^{+} (w_{dyn} + w_{ext}) + (I - G^{+} G) f^* \tag{5.65}$$

One possible application is writing with a pen on a piece of paper. As illustrated in Fig. 5.13, it involves applying a force on the tip of the pen, perpendicular to the surface. Generalized, this scenario can be described as generating a desired force, $f_{ext} \in \mathbb{R}^3$, at a specified contact point on the object, $c_{ext} \in \mathbb{R}^3$. Using $c_{ext}$, a partial grasp matrix, $G_{ext} \in \mathbb{R}^{6 \times 3}$, which relates the desired contact force to a corresponding object wrench, can be obtained:

$$w_{ext} = G_{ext} f_{ext} \tag{5.66}$$

Thus, among other things, the proposed extension may be used for applications such as the pen writing task.

Fig. 5.13 The force distribution is adapted to generate the desired object force $f_{ext}$ on the tip of the pen at $c_{ext}$
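As a rough sketch of Eq. (5.66), the partial grasp matrix for a point contact can be assembled from the lever arm between the contact point and the object origin. The exact frame conventions follow the grasp matrix definition of Chap. 3, which is not repeated here, so the construction below is an assumption made for illustration only.

import numpy as np

def skew(v):
    """Skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def external_wrench(c_ext, p_obj, f_ext):
    """Map a desired force at c_ext to an object wrench, cf. Eq. (5.66).

    Assumes forces expressed in the world frame and the wrench taken about
    the object position p_obj."""
    r = c_ext - p_obj
    G_ext = np.vstack([np.eye(3), skew(r)])   # partial grasp matrix, shape (6, 3)
    return G_ext @ f_ext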

Palm Contact

In some applications, instead of applying a force to the environment, an additional force constraint inside of the hand may be required. In particular, pressing the object against the palm allows to increase the stability of the grasp. Moreover, in reconfiguration scenarios, including the palm in the grasp configuration may allow to remove fingers from the object, which would otherwise be required to balance the object.

Applying a force through a palm contact could be realized using the interface of the desired object wrench, which was just introduced. However, a preferable approach is to include the desired force in the optimization of $f^*$. Thereby, the generated force will be subject to the friction and range constraints, which are considered for the other contact forces. Incorporating a palm contact in the force distribution can simply be realized by including it in the grasp configuration. By being contained in the vector of contact points and the user-specified contact forces, $f_d$, the palm contact will be automatically considered in the optimization of the internal forces. While the palm itself is unable to apply any forces on the object, the remaining contact forces will be adjusted in order to generate the desired effect through the object. Figure 5.14 illustrates the difference in the force distribution before and after including a palm contact.

Fig. 5.14 While the palm cannot actively apply a force on the object, including it in the calculation of the internal forces will redistribute them, such that a desired palm force (blue) is generated by the other fingers through the object


5.5 Torque Mapping

The previous two sections focused on deriving the contact forces, which have to be applied to the object in order to position it and to maintain a balanced grasp. To generate these forces, they have to be related to the corresponding joint torques, which are commanded to the underlying torque controller. Following the formulation of the force mapping, this section additionally proposes a nullspace controller, which allows to avoid joint limits and singularities in the finger configuration. The use of the nullspace of the hand Jacobian matrix allows to incorporate this extension without affecting the desired contact forces.

5.5.1 Force Mapping

Section 5.3 formulated the contact force vector, $f_x$, which realizes the desired impedance behavior that moves the object to the desired pose, while maintaining the initial grasp configuration. Additionally, Sect. 5.4 derived a set of internal forces, $f_{int}$, which balance the dynamical loads on the object and apply the user-specified grasp forces. These two components are combined into the complete vector of the desired contact forces, $f_c$:

$$f_c = f_x + f_{int} \tag{5.67}$$

The contact forces can be related to the joint torques using the hand Jacobian matrix, $J$, as described in Chap. 3:

$$\tau_c = J^T f_c \tag{5.68}$$

In addition to generating the desired force on the object, the joint torques have to compensate any dynamical loads on the fingers. Using the dynamics equations for a kinematic chain, which were introduced in Chap. 3, the resulting torque vector can be expressed as:

$$\tau_{dyn} = M_f(q)\ddot{q} + b_f(q, \dot{q}) + \tau_g(q) \tag{5.69}$$

In practice, the dynamical loads on the joints are usually much smaller than the torques for the control of the object. This is a result of the low inertia of the fingers. In the case of the David hand, for example, the mass of each finger is less than 50 g. The torques from the two components are added to obtain the desired vector of the joint torques:

$$\tau_{cmd} = \tau_c + \tau_{dyn} \tag{5.70}$$


Finally, $\tau_{cmd}$ is commanded to the underlying joint torque controller. For the David hand, this is realized by the backstepping controller, which was mentioned in Chap. 2.

5.5.2 Nullspace Control

For fingers with more than 3 DoF, there exists a nullspace in the mapping between the contact forces and the joint torques. Because of this, motions of the joints in this subspace will not be constrained by the control of the contact points on the tips of these fingers. Consequently, the joints may drift over time, reaching their mechanical limits or singular configurations. In turn, this would impair the manipulability of the object. However, by taking advantage of this nullspace, the unintended drift can be avoided without affecting the desired contact forces. As outlined in Chap. 3, the mapping of the joint torques can be separated into two subspaces:

$$\tau_c = J^T f_c + (I - J^T J^{T+}) \tau_{c,null} \tag{5.71}$$

Projecting the added torque, $\tau_{c,null}$, through the nullspace of $J^T$ removes any effect on the desired contact forces. Extending Eq. (5.70) in the same way, the complete formulation of the commanded torques is obtained:

$$\tau_{cmd} = J^T f_c + (I - J^T J^{T+}) \tau_{null} + \tau_{dyn} \tag{5.72}$$

The torque vector $\tau_{null}$ can be freely chosen to realize the desired joint behavior in the nullspace. Here, it shall avoid the drift of the fingers into joint limits or singular configurations. The concrete design, which realizes such a behavior, depends on the kinematics of the robot. The fingers of the David hand, which were inspired by the kinematics of the human hand, each consist of four joints. For this manipulator, the third and fourth joint are most affected by the unconstrained drift, resulting in undesirable configurations, such as in Fig. 5.15a. In order to avoid this behavior, a nullspace torque is added, which couples the last two joints. The formulation of a spring-damper system between the two joint positions allows to maintain a preferable configuration, as illustrated in Fig. 5.15b. For a finger of index $e$, the desired behavior is realized by the following quasi-static impedance law:

$$\tau_{e,null} = K_{null} \Delta q_e + D_{null} \Delta\dot{q}_e \tag{5.73}$$

with $\tau_{e,null} \in \mathbb{R}^4$ being the desired nullspace torque for this finger, and $K_{null} \in \mathbb{R}$ and $D_{null} \in \mathbb{R}$ being the stiffness and damping coefficients. The difference vector $\Delta q_e \in \mathbb{R}^4$ is chosen to implement the desired coupling between the joints:


$$\Delta q_e = \frac{1}{2} \begin{pmatrix} 0 \\ 0 \\ q_e^{[4]} - q_e^{[3]} \\ q_e^{[3]} - q_e^{[4]} \end{pmatrix} \tag{5.74}$$

where $q_e^{[3]}$ and $q_e^{[4]}$ are the third and fourth joint of finger $e$. In effect, the nullspace controller regulates the two joints to maintain the same position.

Fig. 5.15 Because of the nullspace in the mapping from the contact forces to the joint torques, the joint positions may drift into mechanical limits or singularities over time. Controlling the nullspace motion allows to prevent this undesirable behavior. a The unconstrained nullspace motion moves the finger into an undesirable joint configuration. b The proposed nullspace controller couples the third and fourth joint of the finger, thereby maintaining a preferable configuration
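A compact Python illustration of Eqs. (5.72)-(5.74) is given below. The projector and the joint-coupling law are written out explicitly; the function names and the gain values are placeholders rather than the parameters used on David.

import numpy as np

def nullspace_torque(J, tau_null):
    """Eq. (5.72): project tau_null through the nullspace of J^T so that it
    does not alter the applied contact forces."""
    JT_pinv = np.linalg.pinv(J.T)                   # shape (3n, m)
    P = np.eye(J.shape[1]) - J.T @ JT_pinv          # nullspace projector of J^T
    return P @ tau_null

def finger_coupling_torque(q_e, q_e_dot, K_null=1.0, D_null=0.05):
    """Eqs. (5.73)-(5.74) for a 4-DoF finger: couple the third and fourth joint."""
    dq = 0.5 * np.array([0.0, 0.0, q_e[3] - q_e[2], q_e[2] - q_e[3]])
    dq_dot = 0.5 * np.array([0.0, 0.0, q_e_dot[3] - q_e_dot[2], q_e_dot[2] - q_e_dot[3]])
    return K_null * dq + D_null * dq_dot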

5.6 Grasp Reconfiguration

Up to this point, the proposed in-hand object controller only considered static grasp configurations, i.e. no contacts are added or removed during manipulation. While this restriction is acceptable for some tasks, it certainly limits the applicability of the method. First, this section will discuss modifications to the framework, which enable the reconfiguration of the grasp. One of the applications, which requires multiple redefinitions of the configuration, is the stabilization of the object during grasp acquisition. The specific considerations in the context of this problem will be addressed in the second part of this section.


5.6.1 Adding and Removing Contacts

Changes to the grasp configuration during a manipulation task may have many causes. During the grasp acquisition, one finger after the other first comes into contact with the object. In contrast, when moving an object inside of the hand, a finger may unintentionally lose contact with the object because of rolling or sliding effects. Moreover, in finger gaiting scenarios, contacts are repeatedly relocated on the surface of the object. As illustrated by these examples, changes to the grasp configuration may either be passively detected by the perception system or actively commanded, in order to reconfigure the object. In any case, the reconfiguration of the grasp has to be considered by the object controller in order to support these various applications.

Primarily, changing the contact configuration will affect the dimensions of the respective vectors and matrices, such as the contact positions, $c \in \mathbb{R}^{3n}$, contact forces, $f_c \in \mathbb{R}^{3n}$, grasp matrix, $G \in \mathbb{R}^{6 \times 3n}$, and hand Jacobian matrix, $J \in \mathbb{R}^{3n \times m}$, which each depend on the number of contacts, $n$. Additionally, when adding a contact, the initial position of the contact point, ${}^o c_{init}^{[i]}$, has to be sampled (see Sect. 5.3).

More significantly, however, changes to the grasp configuration introduce a discrete modification in the optimization problem of the internal forces. Consequently, the output would instantaneously jump to a new solution, which would be propagated to the commanded torque values of all joints. In order to avoid this undesirable behavior, a modification to the formulation of the force distribution problem is introduced, which realizes a continuous contact transition. When adding a new contact of index $i$, instead of instantaneously demanding the desired contact force, $f_d^{[i]}$, in Eq. (5.27), it is scaled with the value of a scalar activation function, $a_{a,t}^{[i]} \in \mathbb{R}$:

$$f_{d,t}^{[i]} = a_{a,t}^{[i]} f_d^{[i]} \tag{5.75}$$

The function $a_{a,t}^{[i]}$ implements a linear transition from 0 to 1 within a specified duration of $\Delta t_a$:

$$a_{a,t}^{[i]} = \begin{cases} 0 & t \leq t_a \\ \dfrac{t - t_a}{\Delta t_a} & t_a < t \leq t_a + \Delta t_a \\ 1 & t > t_a + \Delta t_a \end{cases} \tag{5.76}$$

with $t_a$ being the time at which the contact is added to the configuration. Figure 5.16a depicts the transition function. This formulation ensures that a new contact is gradually introduced in the optimization of the internal forces. Consequently, the contact forces of all $n$ fingers will be continuously adjusted to account for the modification.


Fig. 5.16 The inclusion of activation functions, $a_{a,t}$ and $a_{r,t}$, allows to gradually add and remove contacts, thereby avoiding jumps in the commanded joint torques. a The activation function for adding a contact to the configuration. b The activation function for removing a contact from the configuration

Similarly, when removing a contact from the grasp, sudden changes in the commanded forces shall be avoided. Here, the activation value $a_{r,t}^{[i]} \in \mathbb{R}$ realizes a gradual reduction of the desired contact force. The corresponding function is shown in Fig. 5.16b and can be expressed as follows:

$$a_{r,t}^{[i]} = \begin{cases} 1 & t \leq t_r \\ 1 - \dfrac{t - t_r}{\Delta t_r} & t_r < t \leq t_r + \Delta t_r \\ 0 & t > t_r + \Delta t_r \end{cases} \tag{5.77}$$

Here, $t_r$ is the time at which the contact is removed, and $\Delta t_r$ is the specified duration of the transition.
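The two activation functions of Eqs. (5.76) and (5.77) reduce to simple linear ramps. The following sketch (function names are illustrative, not from the original implementation) shows how they would scale a desired contact force:

def activation_add(t, t_a, dt_a):
    """Eq. (5.76): ramp from 0 to 1, starting at t_a and lasting dt_a."""
    if t <= t_a:
        return 0.0
    if t <= t_a + dt_a:
        return (t - t_a) / dt_a
    return 1.0

def activation_remove(t, t_r, dt_r):
    """Eq. (5.77): ramp from 1 to 0, starting at t_r and lasting dt_r."""
    return 1.0 - activation_add(t, t_r, dt_r)

# Per Eq. (5.75), the desired force of a newly added contact i is then scaled as
# f_d_t_i = activation_add(t, t_a, dt_a) * f_d_i.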

5.6.2 Grasp Acquisition

During the grasp acquisition, the fingers are moved towards the object, successively establishing contacts with it. In the experimental validation of Chap. 4, objects were usually grasped using a joint impedance controller, which moves the fingers compliantly towards a desired configuration or until the measured torques exceed a specified threshold. As illustrated in the experiments, these grasps typically caused unintended object displacements, which would subsequently need to be corrected.

However, instead of correcting an incorrect object pose after grasping, the proposed control framework allows to stabilize the object during the acquisition of the grasp, thereby preventing any significant object displacements. When closing the hand, the object controller is activated as soon as three contacts between the fingers and the object have been detected. In this moment, the corresponding fingers stop being controlled on a joint level. Instead, they are commanded by the object controller. However, in order to avoid any abrupt jumps in the motion of the joints, the commanded joint torques are continuously transitioned:

$$\tau_{cmd,t}^{[j]} = \begin{cases} \tau_q^{[j]} & t \leq t_o \\ \tau_q^{[j]} - \dfrac{t - t_o}{\Delta t_o} \left( \tau_q^{[j]} - \tau_o^{[j]} \right) & t_o < t \leq t_o + \Delta t_o \\ \tau_o^{[j]} & t > t_o + \Delta t_o \end{cases} \tag{5.78}$$

Here, $\tau_q^{[j]}$ denotes the commanded torque from the joint-level grasping controller and $\tau_o^{[j]}$ the desired torque from the proposed object-level impedance controller. The duration $\Delta t_o$ describes the specified transition time, while $t_o$ represents the moment in time at which the third contact with the object is detected, thereby allowing for the activation of the object controller. Figure 5.17 illustrates the torque transition.

Fig. 5.17 When switching controllers, the commanded joint torque, $\tau_{cmd,t}^{[j]}$, transitions from $\tau_q^{[j]}$, the output of the joint-level controller, to $\tau_o^{[j]}$, the output of the object-level controller, within a specified duration of $\Delta t_o$

Subsequently, the incorporation of the activation function, $a_{a,t}^{[i]}$, allows to extend the grasp configuration as soon as additional contacts are detected. Whenever a new contact is introduced, the object controller gradually assumes control of the corresponding joints as well. If the goal is to stabilize the object during the grasp, e.g. to avoid the tilt of a glass of water, the initial object pose is commanded to the object impedance controller:

$$x_{des} = x_0 \tag{5.79}$$

Besides the object controller, the realization of this capability also heavily relies on the proposed grasp state estimation method, which provides the current object pose and informs about changes in the grasp configuration in real-time.
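The hand-over of Eq. (5.78) amounts to a per-joint linear blend between the two controller outputs, as in the following illustrative sketch (not the original code); tau_q and tau_o may be scalars or NumPy arrays:

def blend_torque(t, t_o, dt_o, tau_q, tau_o):
    """Eq. (5.78): blend from the joint-level torque tau_q to the object-level
    torque tau_o, starting at t_o and lasting dt_o."""
    if t <= t_o:
        return tau_q
    if t <= t_o + dt_o:
        return tau_q - (t - t_o) / dt_o * (tau_q - tau_o)
    return tau_o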


5.7 Enabling In-Hand Manipulation

The stated purpose of the proposed object controller is to move the object to a desired pose. However, because of physical limitations, the space of reachable poses is restricted. Reconfiguring the object through finger gaiting represents a means of extending the workspace. It involves the consecutive repositioning of the fingers and the object, according to a finger gaiting sequence. The first part of this section outlines the steps of such an advanced in-hand manipulation task and specifies the role of the object controller during the execution. An extension to the control framework, which is necessary to support the active reconfiguration of the grasp, is the ability to reposition a finger on the object. The corresponding relocation of the contact point is covered in the second part of this section.

5.7.1 Finger Gaiting Interface

Kinematic constraints and mechanical limitations make it impossible to realize large displacements of a grasped object inside of the hand. For example, a human-inspired robotic manipulator, such as the David hand, is incapable of rotating an object by 360°, while maintaining a static grasp configuration. When turning the object, eventually, the physical limits of one of the fingers will be reached. However, the repositioning of the finger on the surface of the object allows to move it away from this restriction. Subsequently, the object can be turned further. Figure 5.18 illustrates the process.

The intended sequence of object displacements and finger relocations, which ultimately moves the object to the desired pose, is called the finger gaiting sequence. It is typically determined by a planning algorithm, which considers the hand kinematics, joint ranges, as well as contact constraints, in order to find an actionable series of operations. A number of such planners have been proposed in literature, such as in [11, 12] and [13]. The generated plan consists of a sequence of desired actions, which either move the object to a desired pose, $x_{des,s}$, or relocate a contact point on the surface of the object to another position, ${}^o c_{des,s}^{[i]}$. The subscript $s$ denotes the index of the step in the sequence. The execution of the generated plan is the task of the control framework. Therefore, its interface has to provide two capabilities:

1. Object Positioning: The displacement of the object is the primary purpose of the proposed controller. Commanding the desired pose to the impedance controller will produce the corresponding motion of the fingers. In order to avoid a jump in the output of the controller, it is advisable to gradually transition the desired input:

$$x_{des,t} = x_{des,s-1} - \frac{t - t_s}{\Delta t_s} (x_{des,s-1} - x_{des,s}) \tag{5.80}$$


Fig. 5.18 Achieving large object displacements inside of the hand requires the reconfiguration of the object in order to deal with the physical limitations of the workspace of the fingers. a A circular object shall be continuously rotated around its symmetry axis. b Eventually, kinematic constraints will limit the range of motion. c By repositioning the fingers, the movability of the object is restored. d The object can be rotated further, until the next reconfiguration is required


with $\Delta t_s$ being the specified transition time and $t_s$ being the start time of this motion.

2. Contact Point Relocation: The second task of the controller is to reposition the fingers on the object. The previous section introduced an extension to the framework, which allows to add and remove contacts from the grasp configuration. However, an additional capability is required: the ability to move a contact point from its current position, $c^{[i]}$, to a desired position, $c_{des,s}^{[i]}$. This involves both the generation of a suitable trajectory and the execution of the motion. The design of such a method is presented in the second part of this section.

An object control framework, which provides these capabilities, enables the execution of advanced in-hand manipulation tasks, according to a finger gaiting sequence. As before, the grasp state estimation represents the other critical element of such a system. The repeated repositioning and reconfiguration of the object inside of the hand is contingent on the continuous knowledge of the locations of the object and contact points.

5.7.2 Contact Point Relocation

The reconfiguration of a finger in a precision grasp consists of three steps. First, the corresponding contact has to be removed from the grasp. Second, the finger has to be repositioned, relocating the contact to the desired position on the object. And third, the contact has to be reintroduced to the grasp by applying a force on the object. The first and third step involve the redistribution of the contact forces, as described in Sect. 5.6. In the following, the second step will be addressed.

Trajectory Generation

The relocation of a contact point requires moving it from an initial position, $c_0^{[i]}$, to a desired position, ${}^o c_{des}^{[i]}$, which is described w.r.t. the object coordinate frame, $\{O\}$. Using the current estimate of the object pose, $x_t$, the global position of the target point, $c_{des}^{[i]}$, can be obtained:

$$\begin{pmatrix} c_{des}^{[i]} \\ 1 \end{pmatrix} = T_{o,t} \begin{pmatrix} {}^o c_{des}^{[i]} \\ 1 \end{pmatrix} \tag{5.81}$$

with $T_{o,t}$ being the homogeneous transformation representation of the object pose. When moving the finger to the desired point on the object, the corresponding contact position on the surface of the finger may change. However, assuming the same contact position at the initial and target location significantly simplifies the problem:

$${}^f c_0^{[i]} = {}^f c_{des}^{[i]} \tag{5.82}$$

Fig. 5.19 When repositioning a finger, the contact point is moved along a two-part trajectory, which lifts the finger off of the object, before placing it again. a Illustration of the trajectory (red) of the contact point on the finger. b The desired contact position, $c_{des,t}$, transitions from the start point, $c_0$, through the intermediate point, $c_l$, to the desired end point, $c_{des}$

The resulting difference in the positioning of the finger causes only a small error in the position of the contact point on the object. Based on this assumption, the problem reduces to moving a fixed contact point on the surface of the finger to a desired position. However, the contact cannot be moved there in a straight line, since it would cause the corresponding finger to collide with the object. Instead, the contact shall be moved along a path, which lifts the finger off of the object, before approaching it again at the target location. Different trajectories could be chosen to realize the collision-free relocation of the contact. The proposed path consists of two linear segments that meet at an intermediate point, which is placed away from the object. Figure 5.19 illustrates the trajectory of the contact point. The location of the intermediate point, $c_l^{[i]}$, is calculated based on the positions and normal directions of the start and end points:

$$c_l^{[i]} = \frac{c_0^{[i]} + c_{des}^{[i]}}{2} + d_l \, \frac{n_0^{[i]} + n_{des}^{[i]}}{\left\| n_0^{[i]} + n_{des}^{[i]} \right\|} \tag{5.83}$$

This equation describes a position, which lies between the two points, but is lifted in the mean direction of their normal vectors. The scalar $d_l \in \mathbb{R}$ specifies how far the finger should separate from the object during the relocation. Using the intermediate point, the trajectory for the desired contact position, $c_{des,t}^{[i]}$, can be defined:

$$c_{des,t}^{[i]} = \begin{cases} c_0^{[i]} & t \leq t_l \\ c_0^{[i]} - \dfrac{t - t_l}{0.5\,\Delta t_l} \left( c_0^{[i]} - c_l^{[i]} \right) & t_l < t \leq t_l + \tfrac{1}{2}\Delta t_l \\ c_l^{[i]} - \dfrac{t - t_l - 0.5\,\Delta t_l}{0.5\,\Delta t_l} \left( c_l^{[i]} - c_{des}^{[i]} \right) & t_l + \tfrac{1}{2}\Delta t_l < t \leq t_l + \Delta t_l \\ c_{des}^{[i]} & t > t_l + \Delta t_l \end{cases} \tag{5.84}$$

with $t_l$ being the start time of the motion and $\Delta t_l$ the specified duration of the relocation.
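Eqs. (5.83) and (5.84) can be summarized in the following Python sketch, which computes the lifted intermediate point and evaluates the two-segment trajectory; the function names are illustrative:

import numpy as np

def intermediate_point(c0, c_des, n0, n_des, d_l):
    """Eq. (5.83): midpoint of start and target, lifted along the mean normal."""
    n_mean = n0 + n_des
    return 0.5 * (c0 + c_des) + d_l * n_mean / np.linalg.norm(n_mean)

def relocation_trajectory(t, t_l, dt_l, c0, c_l, c_des):
    """Eq. (5.84): piecewise-linear contact trajectory through c_l."""
    if t <= t_l:
        return c0
    if t <= t_l + 0.5 * dt_l:
        return c0 - (t - t_l) / (0.5 * dt_l) * (c0 - c_l)
    if t <= t_l + dt_l:
        return c_l - (t - t_l - 0.5 * dt_l) / (0.5 * dt_l) * (c_l - c_des)
    return c_des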

Cartesian Finger Control

The execution of the desired contact point trajectory requires a controller, which generates the corresponding joint torques. The purpose of the controller is to move a contact point from its current position at c_t^[i] to the desired position c_des,t^[i]. In correspondence with the design of the object controller, a quasi-static impedance controller shall realize the positioning of the finger:

$$
f_{des}^{[i]} = K_c \, \Delta c_{des}^{[i]} + D_c \, \Delta \dot{c}_{des}^{[i]} \tag{5.85}
$$

with:

$$
\Delta c_{des}^{[i]} = c_{des,t}^{[i]} - c_{t}^{[i]} \tag{5.86}
$$

Subsequently, the desired contact force is mapped to the joint torques of the corresponding finger of index e, using the partial hand Jacobian matrix for contact i:

$$
\tau_{e,cmd} = J_{e}^{[i]T} f_{des}^{[i]} \tag{5.87}
$$

Finally, the resulting torques are commanded to the joint torque controller. Similar to the torque mapping of the object controller, the relation in Eq. (5.87) may contain a non-trivial nullspace in J_e^{[i]T}, which would allow the finger to drift into undesirable configurations. However, using the same nullspace controller that was introduced in Sect. 5.5, this can be avoided.
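As a compact illustration of Eqs. (5.85)–(5.87), a Python sketch of the contact-level impedance and the Jacobian-transpose torque mapping is given below. The treatment of the velocity error and the omission of the nullspace term are simplifications of this sketch, not the controller itself:

```python
import numpy as np

def finger_torque_command(c_des_t, c_dot_des_t, c_t, c_dot_t, J_e_i, K_c, D_c):
    """Quasi-static contact-level impedance (Eqs. 5.85-5.86) followed by the
    Jacobian-transpose mapping to joint torques (Eq. 5.87).

    J_e_i : (3 x n_q) partial hand Jacobian of finger e for contact i.
    K_c, D_c : (3 x 3) stiffness and damping matrices.
    """
    dc = c_des_t - c_t                # position error of the contact point
    dc_dot = c_dot_des_t - c_dot_t    # velocity error of the contact point
    f_des = K_c @ dc + D_c @ dc_dot   # desired contact force, Eq. (5.85)
    tau_cmd = J_e_i.T @ f_des         # joint torques of the finger, Eq. (5.87)
    return tau_cmd                    # nullspace term of Sect. 5.5 added separately
```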

5.8 Experimental Validation

In this chapter, a novel method for in-hand object control was proposed. The impedance-based controller enables the compliant positioning of a grasped object inside of the hand. Extensions to the framework further expanded its capabilities, making it also applicable to changing grasp configurations. As part of the experimental validation that is presented in this section, the performance of the system is evaluated. The experiments included a number of manipulation scenarios, which assessed different aspects of the control architecture.

As before, the DLR humanoid David was used as the robotic platform for the experiments. The object controller was computed at a rate of 500 Hz. The torque values that it commanded were executed by the pre-existing joint torque controller, which itself was running at a rate of 3 kHz. The in-hand localization method, which was presented in Chap. 4, was used to provide real-time information about the pose of the object and the contact configuration. It estimated the grasp state from joint position and torque measurements, as well as data from the visual object tracker.

The validation consisted of three parts. In the first series of experiments, the tracking accuracy of the controller was evaluated. It compared the performance of the proposed method with the static IPC controller of [5] when manipulating different objects. The second part of the evaluation demonstrated the applicability of the system during the grasp acquisition. The repeated reconfiguration of the grasp allowed objects to be stabilized as they were picked up, thereby reducing undesirable in-hand motions. Finally, the third experiment consisted of a finger gaiting scenario. It involved the full revolution of a tennis ball inside of the robotic hand, which required a number of reconfiguration operations.

5.8.1 Tracking Performance

In order to evaluate the tracking performance of the framework, the controller was commanded to move objects to different desired poses, while holding them in static fingertip grasps. The first part of these experiments consisted of displacements in individual DoF of the objects. Specifically, the robot was tasked with moving an object 20 mm along or 45° around one of its axes. These displacements were achieved by commanding trajectories, which gradually transition to the desired change in the pose within one second. Three different objects were used in the evaluation of the tracking performance, each involving a different number of fingers in the grasp: a pentagon shape, held by three fingers; a tennis ball, held by four fingers; and a brush, held by five fingers. The initial grasps of the three objects are shown in Fig. 5.20.

For these experiments, the proposed controller was compared to the static IPC (SIPC) that was proposed in [5]. As illustrated in Fig. 5.8c in Sect. 5.2, the static IPC primarily differs in its approach to the distribution of the internal forces, which are directed towards a virtual grasp center. Since this method does not consider friction constraints on the contact points, fingers may slide on the surface of the object. Moreover, the object impedance component of the SIPC does not actively maintain the initial grasp configuration in order to further avoid the slipping of contacts, as was proposed in this work in Sect. 5.3. The presented grasp state estimation was used with both controllers in order to provide the required object pose and contact locations.


Fig. 5.20 The objects and initial grasp configurations used in the evaluation of the tracking accuracy of the object controller. a A pentagon shape, grasped with three fingers. b A tennis ball, grasped with four fingers. c A brush, grasped with five fingers

Table 5.1 Mean and standard deviation of the terminal absolute position and orientation errors of the tracking evaluation (sample size of N = 20)

Motion           Metric      Pentagon (n = 3)          Ball (n = 4)              Brush (n = 5)
                             Ours        SIPC          Ours        SIPC          Ours        SIPC
20 mm in x       Pos [mm]    2.0 ± 0.8   3.6 ± 0.4     2.0 ± 0.3   3.0 ± 1.0     2.5 ± 0.4   4.0 ± 0.8
                 Rot [deg]   2.5 ± 0.4   6.1 ± 0.5     3.8 ± 1.2   3.1 ± 0.6     2.2 ± 0.2   4.3 ± 1.0
20 mm in y       Pos [mm]    2.2 ± 0.2   4.0 ± 0.7     3.0 ± 0.3   3.5 ± 0.4     3.1 ± 0.3   5.2 ± 0.2
                 Rot [deg]   3.8 ± 0.7   4.6 ± 1.7     3.5 ± 0.4   1.7 ± 0.8     4.5 ± 0.5   6.0 ± 0.5
20 mm in z       Pos [mm]    2.3 ± 0.3   2.0 ± 0.3     1.0 ± 0.1   2.1 ± 0.1     1.6 ± 0.1   3.8 ± 0.2
                 Rot [deg]   5.4 ± 0.4   12.0 ± 1.0    1.5 ± 0.2   4.3 ± 0.5     2.5 ± 0.2   6.1 ± 0.2
45° around φ     Pos [mm]    0.9 ± 0.1   2.0 ± 0.3     0.6 ± 0.1   1.8 ± 0.1     1.5 ± 0.9   2.0 ± 0.3
                 Rot [deg]   5.5 ± 0.6   6.2 ± 0.6     4.4 ± 0.9   14.0 ± 1.0    6.2 ± 0.6   12.0 ± 1.0
45° around θ     Pos [mm]    2.5 ± 1.0   2.3 ± 0.2     1.7 ± 0.9   3.6 ± 0.4     2.2 ± 0.5   1.8 ± 0.1
                 Rot [deg]   10.0 ± 2.0  12.0 ± 1.0    5.7 ± 2.1   15.0 ± 2.0    13.0 ± 1.0  14.0 ± 1.0
45° around ψ     Pos [mm]    0.8 ± 0.3   1.8 ± 0.4     0.9 ± 0.2   3.4 ± 0.4     1.3 ± 0.3   2.1 ± 0.3
                 Rot [deg]   4.7 ± 0.8   11.0 ± 4.0    5.1 ± 0.4   17.0 ± 2.0    5.3 ± 0.2   11.0 ± 3.0
Coupled motion   Pos [mm]    3.1 ± 0.1   –             2.6 ± 0.2   –             3.9 ± 1.0   –
                 Rot [deg]   8.1 ± 0.8   –             6.0 ± 1.1   –             3.4 ± 1.1   –


The second part of the tracking evaluation consisted of a coupled, periodic motion. Here, the objects were translated along and rotated around the z-axis at the same time. The trajectories followed a periodic function with a frequency of approximately 0.6 Hz and lasted for 10 seconds. Each of the experiments was repeated 20 times. The results of the evaluation are summarized in Table 5.1. It shows the mean and standard deviation of the absolute terminal errors in the position and orientation of the object. Additionally, Figs. 5.21, 5.22 and 5.23 illustrate exemplary trials for the translation along z, the rotation around ψ and the coupled motion, for each object.

The results demonstrate the ability of the proposed controller to follow a desired displacement, while stabilizing the remaining DoF. For the pentagon object, the positioning error rarely exceeded 3 mm in position or 6° in orientation, with the exception of the rotation around θ. The experiments with the other two objects, the tennis ball and the brush, produced comparable results, showing only a small decline in accuracy with the growing number of contacts. When compared with the SIPC, the results illustrate a consistently better performance of the proposed controller. It proved more accurate in 31 out of 36 individual displacement experiments, representing 86% of the trials.

5.8.2 Stabilizing the Grasp Acquisition

The previous experiment initiated from a state in which the object was already in a stable fingertip grasp. However, for the object to reach this configuration, it first has to be picked up by the robotic hand. The grasp acquisition experiments of Chap. 4 illustrated how an object may move unexpectedly during the execution. Before, there was no way to prevent this motion, since the fingers were individually commanded to move to a predefined position or to stop when the measured torques exceeded a specific threshold. However, depending on the application, this unplanned displacement may be undesirable. For example, when picking up a full glass of water, tilting it during the grasp acquisition would cause some of its content to be spilled. This second set of experiments demonstrated the utilization of the proposed object control framework during the grasp acquisition as a means to reduce the unintended object motion.

As outlined before, the object controller is applicable to grasps which involve three or more contacts. Therefore, it can only be activated as soon as the third contact has been detected. Initially, the fingers are closed using the pre-existing joint impedance controller. Meanwhile, the current grasp state is estimated by the in-hand localization method. Once the third contact point has been inferred, the commanded joint torques are transitioned from the joint-level controller to the object-level controller, as described in Sect. 5.6. The object controller is tasked with maintaining the pose of the object, thereby compensating any undesirable displacement.

The experiments involved two objects, a brush and a water bottle. Without any object control, the brush would rotate around the vertical axis ψ and move in the y direction when being picked up.


Fig. 5.21 Pentagon: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis


Fig. 5.22 Tennis ball: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis


Fig. 5.23 Brush: Change in the position and orientation of the object during exemplary displacements as part of the tracking evaluation. a Actual and desired object pose for a commanded displacement of 20 mm along the z-axis. b Actual and desired object pose for a commanded displacement of 45° around the z-axis. c Actual and desired object pose for a commanded coupled, periodic trajectory along and around the z-axis


Fig. 5.24 Brush: Illustration of the in-hand motion of the object during the grasp acquisition, both with and without the application of the object controller. a Initial pose of the object and hand before the grasp. b Terminal pose of the grasped object, if no object controller is used. c Terminal pose of the grasped object, when using the proposed object controller. d Change in position and orientation of the object during the grasp execution without any object control (black), as well as with the proposed controller being active (red). The dashed line marks the moment, at which the controller was activated, enabled by the detection of the third contact

Figure 5.24a, b show the brush before and after being grasped. However, using the object controller allowed the unwanted displacement to be reduced and the brush to be returned to its initial pose, as illustrated in Fig. 5.24c. The water bottle in the second scenario would tilt during the grasp, as well as significantly change its position. The initial and grasped pose of the bottle, without using the object controller, are shown in Fig. 5.25a, b. In this experiment, the object controller was commanded to return the bottle to its upright orientation as soon as it was activated (see Fig. 5.25c). However, kinematic constraints did not allow the translational displacement of the object to be fully compensated as well.


Fig. 5.25 Water bottle: Illustration of the in-hand motion of the object during the grasp acquisition, both with and without the application of the object controller. a Initial pose of the object and hand before the grasp. b Terminal pose of the grasped object, if no object controller is used. c Terminal pose of the grasped object, when using the proposed object controller. d Change in position and orientation of the object during the grasp execution without any object control (black), as well as with the proposed controller (red). The dashed line marks the moment, at which the controller was activated, enabled by the detection of the third contact

Instead, the position of the bottle at the moment of the controller activation was set as the desired value. Thereby, any further change in the position of the object was prevented.

Each grasp was repeated ten times, both with and without object control. The results of the experiments are presented in Table 5.2. It shows the terminal error at the end of the acquisition, as well as the maximum error that occurred during the execution. For both objects, the proposed controller is able to largely remove the terminal error, with the exception of the position of the water bottle, because of the kinematic limitations. At the same time, by activating the object controller as soon as possible, the maximum error is significantly reduced as well.


Fig. 5.26 Illustration of the sequence of steps for the extended rotation of the grasped tennis ball. a Initial pose of the object and the fingers. b Grasp configuration after rotating the object 30° around ψ. c Removal of the thumb (green) from the grasp configuration. d Placement of the thumb at the new location. e Fully reconfigured grasp that allows the object to be rotated further. f Object pose after the second rotation

The remaining error was as small as 44–66% of the original value, across both objects. Figures 5.24d and 5.25d illustrate the progression of the pose error during one of the trials of each object.


Fig. 5.27 Actual and desired position of the thumb contact during the first relocation

Table 5.2 Mean and standard deviation of the terminal and maximum errors in position and orientation of the grasp acquisition experiments (sample size of N = 10)

                                 Terminal error               Maximum error
                                 Pos [mm]      Rot [deg]      Pos [mm]      Rot [deg]
Brush
  Without object control         20.9 ± 2.1    32.7 ± 2.5     23.7 ± 4.4    33.0 ± 2.5
  With object control            3.3 ± 1.1     4.8 ± 1.7      15.0 ± 3.8    15.4 ± 3.2
Water bottle
  Without object control         21.7 ± 2.4    37.1 ± 2.8     22.9 ± 1.7    37.3 ± 2.6
  With object control            14.6 ± 2.6    7.6 ± 1.1      15.0 ± 2.6    16.4 ± 2.6

5.8.3 Finger Gaiting

The final experiment demonstrated the applicability of the object control framework in a finger gaiting task. It involved the 360° rotation of a tennis ball inside of the hand. Since the fingers are not capable of realizing such a displacement in a static grasp configuration, the grasp had to be reconfigured several times. Specifically, in each iteration, the ball was rotated 30° around ψ, before the fingers were consecutively relocated on the object. This process was repeated 12 times in a matter of 60 seconds in order to achieve the full revolution. Initially, the ball was grasped by all five fingers, as shown in Fig. 5.26a.


Changes to the desired object pose were commanded similar to the evaluation of the tracking performance, by gradually transitioning the input values. Figure 5.26b depicts the tennis ball after it has been rotated for the first time. Subsequently, the fingers were reconfigured, starting with the thumb. This involved first redistributing the internal forces, such that the finger could be removed from the grasp without losing the stability of the object. Next, the finger could be moved to a new location. As presented in Sect. 5.7, this requires specifying a desired point on the surface of the object. For this experiment, the new contact location could simply be obtained by rotating its position vector by −30° around the z-axis of the object, thereby countering the previous object motion. This strategy is only applicable because the tennis ball is rotationally symmetric. The fingers were moved to their new positions, using the trajectory and Cartesian finger controller that were proposed in Sect. 5.7. Figure 5.27 shows the desired and actual displacements of the thumb during the first relocation. Finally, the relocated contact could be reintroduced in the force distribution. Consecutively, all five fingers were repositioned, as depicted in Fig. 5.26c, d. The repeated redistribution of the internal forces is illustrated in Fig. 5.28. It shows the magnitude of the commanded contact force of each finger over the course of a full reconfiguration. The placement of the index finger completed the operation, resulting in the configuration that is shown in Fig. 5.26e. This created the conditions for the next iteration, starting with another rotation of the ball. The progress of the complete revolution is depicted in Fig. 5.29. It also illustrates how the other DoF remain stable as the object is continuously rotated. During the entire operation, their tracking error stayed below 5 mm in position and 3° in orientation. The terminal error around the rotation axis, ψ, was less than 2°.
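For illustration, the relocation target for one finger could be computed as in the following short Python sketch; the function name is hypothetical, and the −30° rotation simply counters the preceding 30° object rotation, which is valid here only because of the ball's rotational symmetry:

```python
import numpy as np

def next_contact_location(c_obj, delta_psi_deg=-30.0):
    """Rotate a contact position, expressed in the object frame, about the
    object z-axis, so that the finger is placed where it can support the
    next 30 degree rotation of the (rotationally symmetric) tennis ball."""
    a = np.deg2rad(delta_psi_deg)
    R_z = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    return R_z @ c_obj
```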

Fig. 5.28 Redistribution of the absolute contact forces of the fingers during the consecutive reconfiguration of the tennis ball


Fig. 5.29 Change in position and orientation of the tennis ball during the full revolution around ψ

5.9 Summary

This chapter introduced a novel controller for the compliant positioning of a grasped object using a torque-controlled robotic hand. Principally, realizing the desired control behavior involves the coordination of all fingers in contact with the object. At the same time, the distribution of the internal forces has to be determined, which balances the object and considers friction constraints on the contacts. Utilizing the grasp matrix, which was derived in Chap. 3, the desired object displacement is mapped to the corresponding motion of the contacts with the fingers. An impedance law relates the displacement of the contacts to contact forces, which are generated by the actuators of the fingers. An additional force component allows the initial grasp configuration to be actively maintained, which avoids the slipping of contacts. By exploiting the nullspace of the grasp matrix, these additional forces have no effect on the positioning of the object. The distribution of the internal forces is chosen to balance any dynamical or external loads on the object. The explicit consideration of friction constraints on the object, as well as user-specified grasp forces, yielded a convex optimization problem, which was solved using quadratic programming techniques. The sum of the internal and object positioning forces is mapped to the corresponding joint torques, using the hand Jacobian matrix. The nullspace in this mapping is exploited by an additional controller, which prevents the fingers from drifting into mechanical limits or singular configurations. Finally, the combined joint torques are commanded to the robot.

Following the design of a controller, which allows the positioning of an object in a static grasp configuration, the applicability of the framework was extended. The incorporation of activation functions in the distribution of the internal forces enabled the reconfiguration of the grasp, i.e. the addition and removal of contacts. This capability allows the controller to be applied during the acquisition of the grasp, e.g. to stabilize the object. Moreover, the proposed control framework is able to support finger gaiting operations. When actively reconfiguring the grasp, a Cartesian finger controller is employed to realize the relocation of the fingers on the object.
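To make the force-distribution step more concrete, the following Python sketch states a closely related optimization problem. It is not the formulation of Sect. 5.4: the framework considers linearized friction constraints and solves the resulting quadratic program, whereas this sketch keeps the exact second-order friction cones and uses cvxpy as a stand-in solver. The grasp matrix G, the contact normals, the sign convention of the external wrench and the minimum normal force are assumptions of the illustration:

```python
import numpy as np
import cvxpy as cp

def distribute_internal_forces(G, w_ext, normals, mu, f_min, n_contacts):
    """Find contact forces that balance the external wrench on the object
    while respecting friction cones and a minimum normal force per contact.
    G : (6 x 3*n_contacts) grasp matrix, w_ext : (6,) external wrench,
    normals : list of unit contact normals, mu : friction coefficient."""
    f = cp.Variable(3 * n_contacts)
    constraints = [G @ f == -w_ext]              # balance external/dynamic loads
    for i in range(n_contacts):
        fi = f[3 * i:3 * i + 3]
        n = normals[i]
        fn = n @ fi                              # normal force component
        P_t = np.eye(3) - np.outer(n, n)         # projector onto tangent plane
        constraints += [fn >= f_min,             # user-specified minimum grasp force
                        cp.norm(P_t @ fi, 2) <= mu * fn]  # friction cone
    prob = cp.Problem(cp.Minimize(cp.sum_squares(f)), constraints)
    prob.solve()
    return f.value
```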


Fig. 5.30 Applying the developed in-hand object controller to the stacking game makes it possible to overcome the kinematic limitations of the robotic arm. a Before, kinematic constraints prevented the robot from stacking the game piece. b The ability to reorient the object inside of the hand allows it to be aligned with the pins

The capabilities of the system were validated in a range of experiments. First, the tracking performance of the controller was evaluated. This involved the displacement of differently grasped objects in all DoF. The comparison with the static IPC that was proposed in [5] demonstrated the consistently better performance of the proposed framework. Next, the controlled stabilization of two objects during the grasp acquisition was tested. This task validated the ability of the method to repeatedly redistribute the forces of the grasp as the fingers consecutively came into contact with the object. The third experiment, which involved a full revolution of a tennis ball inside of the hand, required the repeated reconfiguration of the grasp. Collectively, these experiments illustrated the broad applicability of the proposed system.

Finally, as illustrated in Fig. 5.30, the developed control framework also makes it possible to overcome the kinematic limitations, which prevented the placement of some of the pieces of the stacking game that was discussed before. Using the object controller, the objects can be reoriented, thereby aligning them with the corresponding pins of the wooden board. Combined with the grasp state estimation of the previous chapter, this enables the execution of this challenging dexterous manipulation task.

References

1. Maxime Chalon and Brigitte d'Andréa-Novel. Backstepping experimentally applied to an antagonistically driven finger with flexible tendons. IFAC Proceedings Volumes, 47(3):217–223, 2014.
2. David Williams and Oussama Khatib. The virtual linkage: A model for internal forces in multi-grasp manipulation. In 1993 IEEE International Conference on Robotics and Automation (ICRA), pages 1025–1030. IEEE, 1993.
3. Stefano Stramigioli and Vincent Duindam. Variable spatial springs for robot control applications. In 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1906–1911. IEEE, 2001.
4. Thomas Wimböck, Christian Ott, and Gerhard Hirzinger. Passivity-based object-level impedance control for a multifingered hand. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 4621–4627. IEEE, 2006.
5. Thomas Wimböck, Christian Ott, Alin Albu-Schäffer, and Gerd Hirzinger. Comparison of object-level grasp controllers for dynamic dexterous manipulation. The International Journal of Robotics Research, 31(1):3–23, 2012.
6. Martin Buss, Hideki Hashimoto, and John B. Moore. Dextrous hand grasping force optimization. IEEE Transactions on Robotics and Automation, 12(3):406–418, 1996.
7. Hans Joachim Ferreau, Christian Kirches, Andreas Potschka, Hans Georg Bock, and Moritz Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6(4):327–363, 2014.
8. Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In 16th ACM Symposium on Theory of Computing, pages 302–311, 1984.
9. George Bernard Dantzig. Linear Programming and Extensions, volume 48. Princeton University Press, 1998.
10. Hans Joachim Ferreau, Hans Georg Bock, and Moritz Diehl. An online active set strategy to overcome the limitations of explicit MPC. International Journal of Robust and Nonlinear Control, 18(8):816–830, 2008.
11. Moëz Cherif and Kamal K. Gupta. Planning quasi-static fingertip manipulations for reconfiguring objects. IEEE Transactions on Robotics and Automation, 15(5):837–848, 1999.
12. Jeffrey C. Trinkle and Jerry J. Hunter. A framework for planning dexterous manipulation. In 1991 IEEE International Conference on Robotics and Automation (ICRA), pages 1245–1246. IEEE Computer Society, 1991.
13. Jean-Philippe Saut, Anis Sahbani, Sahar El-Khoury, and Véronique Perdereau. Dexterous manipulation planning using probabilistic roadmaps in continuous grasp subspaces. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2907–2912. IEEE, 2007.

Chapter 6

Conclusion

In this book, the main algorithmic components of a model-based dexterous manipulation framework were presented. Novel approaches for the grasp state estimation and in-hand object control were developed and validated in a range of real-world experiments. This chapter summarizes the content of this manuscript and, in particular, highlights the contributions of the proposed methods. Moreover, the capabilities and limitations of the developed system are discussed. The final section of this work provides an outlook on potential future work that has been enabled by the results of this book. This includes the formulation of open research questions in the context of dexterous manipulation, which emerged from insights of this work.

6.1 Summary and Discussion

Dexterous manipulation is a fundamental human capability, allowing us to interact with and shape the world around us. For robots to move off the factory floor and into everyday human environments, they will require the same ability to manipulate objects with precision. These robots will need to operate in surroundings that have primarily been designed for humans. Moreover, they will be expected to interact with a wide range of objects and tools, which have not been adapted to them. Inspired by the human hand, there has been great progress in the past decades on developing robotic manipulators that are capable of dexterous manipulation. However, in the same way that humans had to develop the corresponding cognitive abilities to fully take advantage of their hands, robotic systems require a number of equivalent algorithmic skills.

This book described the development of a framework, which enables the dexterous manipulation of objects with robotic hands.


Chapter 1 introduced a stacking game, which is designed for small children to develop and test their fine motor skills. Achieving the same task with a robot required the integration of a set of algorithmic components. First, to be able to place a grasped object with precision, the in-hand pose of the object had to be determined. In order to obtain a high quality estimate of the grasp state, a range of different sensing modalities were fused in a common probabilistic framework. Second, since the kinematics of a robotic arm limit the range of reachable poses of the hand, the ability to move a grasped object inside of the hand was required. An in-hand object controller realized the compliant repositioning of the grasped object, while maintaining a stable grasp. Finally, the development of both the perception and control methods relied on a common model of the kinematic and dynamic behavior of the hand-object system. Combined, these components constitute an algorithmic framework, which enables the execution of advanced dexterous manipulation tasks.

Providing the foundation for the development of the subsequent methods, Chap. 3 elaborated a mathematical model for the description of the grasp system. Principally, this involved the analysis of the kinematics and dynamics of the grasp, which makes it possible to relate the different quantities of the object, fingers and contacts between them. These derivations identified the grasp matrix and hand Jacobian matrix as the most important tools for the mapping between these different components of the grasp. The formulation of the mathematical subspaces of both matrices illustrated how the nullspaces of these mappings can be exploited. Moreover, the hard finger model was selected as the most applicable description of the contact behavior. As presented, the grasp model is fully valid for precision grasps. However, for intermediate and power grasps, limitations of the rigid body assumption restrict the applicability of the model. In particular, if there are contacts with multiple phalanges of a finger, the grasp becomes hyperstatic. Consequently, the distribution of the internal forces of the grasp can no longer be fully described or controlled.

Enabled by the grasp model, a novel grasp state estimation method was described in Chap. 4. The proposed perception algorithm realized the integration of proprioception, tactile sensing and vision into a common probabilistic framework. Its purpose is to determine a consistent estimate of the pose of the object, the location of the contact points, as well as errors in the joint position. An extended Kalman filter implementation allowed uncertainties in the measurements to be explicitly considered. Executing a grasp causes the object to move inside of the hand. Significant displacements, which were not predicted, may cause subsequent tasks to fail, if not accounted for. Utilizing a geometric description of the hand and the object, the proposed system is able to correct the estimation of the object pose solely from joint position measurements, by resolving collisions between the object and the fingers at their assumed poses. The quality of the estimation was improved by the consideration of explicit contact measurements. Tactile sensing makes it possible to detect whether a finger is in contact with the object. The developed method uses this information to align the estimated poses of the object and the fingers accordingly. Moreover, the inference of the contact configuration from joint torque measurements was described.
The consideration of finger measurements improves the knowledge of the grasp state. However, the object pose is not fully constrained by these sensing modalities.


Fig. 6.1 Besides the humanoid robot David, the developed in-hand localization method was integrated on two additional robotic systems at DLR. a The mobile robot EDAN and its under-actuated DLR HIT hand [1]. b The humanoid robot Rollin’ Justin, equipped with the DLR Hand II [2]

The incorporation of visual information further enhances the quality of the estimation. The fusion with different types of visual data was developed in this work. First, the integration of artificial features from a fiducial marker was presented. Next, the development of a novel image processing method realized the extraction of natural object features. Finally, the loose coupling with the 6 DoF pose from a visual object tracker, which considers contour and depth information, was described. Furthermore, the incorporation of visual data from the hand and a target object enabled the consistent estimation of the complete hand-object-target system.

The experiments with the DLR robot David evaluated the performance of the proposed in-hand localization method for various objects and grasps. They also illustrated the strengths and limitations of the different measurement modalities. Finger measurements allow the in-hand displacement of objects to be approximated if they are largely constrained by the fingers, such as in a power grasp. However, in less constraining configurations, e.g. in a precision grasp, the fusion of visual data is necessary in order to obtain a good estimate of the object pose. This result was further confirmed by two pick-and-place tasks, the latter of which consisted of the stacking game. Reliably placing the pieces of the game was only possible by combining finger measurements and visual information.

While the used model is only fully valid for precision grasps, the in-hand localization experiments demonstrated the applicability of the method for all grasp types, including power grasps.


Moreover, beyond the implementation on David, which was elaborated in this work, the proposed method was also successfully integrated on two additional robotic systems at DLR. First, the estimation framework was used with the EMG-controlled mobile robot EDAN, which includes a DLR HIT hand [1] (see Fig. 6.1a). Second, the system was implemented on the humanoid robot Rollin' Justin, which utilizes two DLR Hands II to manipulate objects [2] (see Fig. 6.1b).

The final component of the dexterous manipulation framework, the impedance-based in-hand object controller, was described in Chap. 5. In simple pick-and-place tasks, the grasped object remains in a static in-hand configuration until it is released. However, many manipulation scenarios are more demanding. They require the repositioning of the grasped object inside of the hand. The proposed system enables the compliant in-hand control of the object, including the coordinated motion of multiple fingers and the balancing of the forces that are applied by them. The positioning of the object is realized by an impedance controller on contact level, which guarantees the stability of the fingers, even if they lose contact. The initial contact configuration is actively maintained to prevent the slipping of fingers on the object. The internal forces on the object are distributed such that they balance any dynamical or external loads. Moreover, friction constraints on the contact forces are explicitly considered. An additional nullspace controller takes advantage of the under-constrained mapping from the contact forces to the joint torques in order to avoid joint limits and singular configurations. The quality of the controller was validated in real-world experiments, using the David hand. Compared with another compliant in-hand object controller, the static IPC of [3], the proposed method demonstrated a consistently better tracking performance.

While the ability to control the in-hand pose of an object expands the reachable workspace, it is still constrained by the kinematics of the fingers. Moving the object even further requires the reconfiguration of the grasp. To enable this type of advanced manipulation, the control framework was extended. This included the ability to gradually add and remove contacts, which avoids jumps in the force distribution. Additionally, a Cartesian finger controller was developed, which allows fingers to be relocated on the object. With these extensions, the proposed system is able to execute finger gaiting sequences. This was experimentally validated by the full revolution of a tennis ball inside of the hand. Rotating the object by 360° required 12 complete reconfigurations of the grasp, i.e. 60 finger relocations, which were executed in less than one minute.

The ability to reconfigure the grasp also enabled the utilization of the object controller during the acquisition of the grasp. In this scenario, the grasp forces have to be redistributed each time an additional finger comes in contact with the object. Controlling the object during the grasp acquisition allows the 6 DoF object pose to be stabilized as soon as the third contact has been established. This capability was demonstrated for the grasp of a brush and a bottle. The controlled stabilization significantly reduced the undesired displacement of both objects, compared to the open-loop grasp.

The integration of the developed grasp state estimation and in-hand control methods enabled the execution of a number of challenging dexterous manipulation tasks.
These included the stacking game, which was initially discussed in Chap. 1.


By fusing sensor measurements from the fingers and a head-mounted camera, the robot was able to track the in-hand pose of the grasped game pieces throughout the manipulation task. Kinematic limitations of the arm, which prevented the robot from placing statically grasped objects, were overcome by the utilization of the developed object controller. By reorienting a grasped game piece inside of the hand, the robot was able to align the object with the pins of the board.

W.r.t. the applicability of the proposed system, the developed methods are primarily limited in two ways. First, the quality of both the grasp state estimation and the in-hand controller is affected by the inaccuracies of the measurements. While the lack of tactile sensing capabilities and errors in the joint position measurements make it difficult to precisely detect and localize contacts between the object and the fingers, significant joint friction considerably distorts the contact forces that are applied to the object. Consequently, the overall accuracy and robustness of the system is impaired. The second set of restrictions arises from limitations and approximations of the utilized grasp model. In particular, the effects of rolling and sliding are not fully considered in the design of the framework. Moreover, limitations in the rigid body assumption restrict the applicability of the dynamics model to precision grasps. The following section discusses potential solutions that would allow these shortcomings to be addressed. Additionally, extensions to the system, which could expand its applicability, are elaborated.

6.2 Outlook

The flexibility of the developed framework allows its capabilities to be extended further. First, additional sensor information can be incorporated in the grasp state estimation and in-hand controller, thereby improving the overall quality and performance of the system. Second, some of the restrictions, which are the result of approximations or simplifying assumptions, can be overcome by developing and utilizing more elaborate models. Finally, the scope of the complete framework can be expanded, making it applicable to an even wider set of dexterous manipulation tasks. A number of selected examples for each of these potential research directions shall be discussed in this section, some of which are already in active development.

Sensor Measurements

Principally, the accuracy of the grasp state estimation is defined by the quality of the measurements. Inaccuracies of individual sensor inputs are compensated by combining the information from different modalities. Therefore, the incorporation of additional types of measurements allows the estimation to be improved further. For the experiments with the DLR humanoid David, contacts between the fingers and the object were inferred from torque measurements.


Fig. 6.2 Additional sensor measurements will allow the quality of the grasp state estimation and in-hand object control to be further improved. a Tactile sensors, which are integrated in the skin of the fingers, provide reliable contact information. b Visually tracking the fingers and palm of the hand increases the accuracy of the estimated contact locations

The integration of tactile sensors in the skin of the hand would provide a direct contact sensing capability, which would not be affected by joint friction. The development of such a tactile skin for David, which uses sensors from Kinfinity [4], is currently in progress, as shown in Fig. 6.2a. The utilization of these sensors as part of the grasp state estimation has already been tested as well. Additionally, this direct contact sensing information could be used in the in-hand object control to actively maintain the desired contact state.

Beyond the contact detection, the localization of the contact points directly affects the performance of the in-hand object localization and control. Its quality is primarily determined by the inaccuracies of the finger position measurements. Visually perceiving the positions of the fingers w.r.t. the object allows this aspect of the estimation to be greatly improved. Based on the visual object tracker, which was developed as part of the master's thesis of Manuel Stoiber [5], a visual finger tracking method is currently in development and illustrated in Fig. 6.2b. Once available, the output of this system will be fused in the EKF to inform the estimated errors of the finger positions. Moreover, the tracking of the hand of the robot offers an alternative to the AprilTags, which are mounted to the palm in order to localize the camera w.r.t. the manipulator.

Joint friction, which is unobserved by the joint torque measurements, impairs the performance of the in-hand controller, since the physically applied forces will differ from the desired ones.


Estimating this effect, based on a dynamic friction model of the hand, allows the disturbance to be compensated. The development of a friction estimation method for the tendon-driven David hand has recently been investigated and described in [6]. In the future, it will be integrated in the in-hand controller.

Model Restrictions

As presented, the in-hand localization and control methods rely on specific modeling assumptions, which limit the applicability or affect the quality of the developed system. However, many of these restrictions could potentially be overcome by extensions to the algorithms. For instance, the rolling and sliding of the fingers on the object is not explicitly considered in the grasp state estimation, thereby simplifying the problem. Extending the grasp model would allow the rolling to be predicted. Moreover, if a tactile sensor were able to detect the slipping of a finger, the corresponding contact could be ignored in the prediction. W.r.t. the object control, in the proposed method, the sliding of contacts is actively avoided to maintain a static grasp configuration. However, humans routinely utilize the controlled sliding of fingers to reconfigure the grasp of an object. Realizing a similar capability would further advance the dexterity of the robotic hand.

Similarly, some of the stated assumptions limit the applicability of the system. The proposed grasp state estimation relies on the availability of a geometric description of the object. A multi-modality approach, which combines image-based 3D reconstruction and tactile exploration, could potentially extend the system to also work with unknown objects. Finally, because of the rigid body assumption, the dynamics model of the grasp is only fully valid for precision grasps. Consequently, as described, the in-hand object controller cannot be applied to hyperstatic grasps, in which the object is in contact with multiple phalanges of a finger. The development of a grasp description, which overcomes this limitation while retaining the computational efficiency of the rigid body model, is still an open research problem.

Dexterous Manipulation Framework

Based on the results of this manuscript, the proposed dexterous manipulation framework can be developed further, incorporating additional algorithmic components, which would expand its applicability. As demonstrated, the in-hand object controller is able to support the repeated reconfiguration of the grasp, such as in finger gaiting scenarios. To fully take advantage of this capability, a finger gaiting planner is required. Its purpose would be to generate the reconfiguration sequence, which allows the object to be iteratively moved from its initial in-hand pose to the target location. Specifically, this involves determining in which order the fingers have to be relocated and where on the object they have to be placed.


In order to find an actionable sequence, the planner has to consider the kinematic limitations of the fingers and the force distribution to balance the object. Moreover, to account for unplanned deviations in the motion of the object or the fingers, the finger gaiting plan has to be continuously updated to maintain its validity. Once a reconfiguration sequence has been generated, the developed framework would be able to execute it.

The development of the proposed framework significantly advanced the dexterous manipulation abilities of robotic hands. The precision and versatility of the presented methods enable the execution of a wide range of manipulation tasks, which previously exceeded the applicability of robotic systems. Since some limitations remain, robotic hands have yet to reach the same capabilities as their human counterparts. However, the flexibility of the developed concept provides the foundation for further progress towards this goal. Ultimately, these dexterous manipulation skills will enable the introduction of highly capable robots in our everyday life, taking us one step closer towards a collaborative robotic future.

References

1. DLR. DLR—Institute of Robotics and Mechatronics—EDAN. https://www.dlr.de/rm/de/desktopdefault.aspx/tabid-11670/, 2020.
2. DLR. DLR—Institute of Robotics and Mechatronics—Rollin' Justin. https://www.dlr.de/rm/en/desktopdefault.aspx/tabid-11427/, 2020.
3. Thomas Wimböck, Christian Ott, Alin Albu-Schäffer, and Gerd Hirzinger. Comparison of object-level grasp controllers for dynamic dexterous manipulation. The International Journal of Robotics Research, 31(1):3–23, 2012.
4. Kinfinity. Kinfinity. https://www.kinfinity.eu/, 2020.
5. Manuel Stoiber. Real-time in-hand object tracking and sensor fusion for advanced robotic manipulation. Master's thesis, Technische Universität München, 2019.
6. Friedrich Lange, Martin Pfanne, Franz Steinmetz, Sebastian Wolf, and Freek Stulp. Friction estimation for tendon-driven robotic hands. IEEE Robotics and Automation Letters, 2020.