Informatics in Control, Automation and Robotics: 19th International Conference, ICINCO 2022 Lisbon, Portugal, July 14-16, 2022 Revised Selected Papers (Lecture Notes in Networks and Systems) 3031483022, 9783031483028

The book focuses on the latest endeavors relating to research and developments conducted in the fields of control, automation, and robotics.


English Pages 164 [158] Year 2023


Table of contents :
Preface
Organization
Contents
Improved Robust Neural Network for Sim2Real Gap in System Dynamics for End-to-End Autonomous Driving
1 Introduction
2 Simple Mathematical Model
2.1 System Model Without Controller
2.2 P-Controller for Vehicle
2.3 Offset to Steering Angle
3 Previous Results with PilotNetRLSTM
4 PilotNetΔ^LSTM_C and PilotNetΔ^F_C
4.1 Naming Convention
4.2 Discretization of Steering Angles
4.3 Neural Network Architecture
5 Experimental Setup
5.1 Dataset Creation Methods
5.2 Dataset Overview
5.3 Evaluation Methods
6 Experimental Results
6.1 Comparing Results for Steering Offset
6.2 Comparing Results for Normal Driving Test
6.3 Comparing Results for Accuracies on the Validation Dataset
7 Discussion and Conclusion
References
Optimal Robust Control with Applications for a Reconfigurable Robot
1 Introduction
2 Kinematics Development
3 Optimal Robust Control
3.1 Mixed Sensitivity H∞ Control
3.2 H∞ Control Design
4 Application of Mixed Sensitivity H∞ Control (Simulation and Results)
4.1 Application of H∞ Control (Simulation and Results)
4.2 Application of μ-Synthesis Control and DK Iterations (Simulation and Results)
4.3 Comparison of H∞ and μ-Controllers
5 Conclusions
References
Improving 2D Scanning Radar and 3D Lidar Calibration
1 Introduction
2 Radar Basics
3 Related Works
3.1 Target-Based Methods
3.2 Target-Less Methods
3.3 Speckle Filtering of Radar Signals
3.4 Improving Resolution of Radar Signals
3.5 Contribution
4 Methodology
4.1 Assumptions
4.2 Synchronization
4.3 Motion Correction
4.4 Preprocessing
4.5 Matching
4.6 Optimization
5 Experiments
5.1 Test Data and Environment
5.2 Improved Radar Filtering
5.3 Plane Optimization
5.4 Validation
6 Results
6.1 Improved Radar Filtering
6.2 Plane Optimization
6.3 Validation
7 Conclusion
References
Mobile Robots for Teleoperated Radiation Protection Tasks in the Super Proton Synchrotron
1 Introduction
1.1 The Significance of Radiation Protection Measures at CERN
1.2 Conducting Radiation Surveys in the Super Proton Synchrotron (SPS)
1.3 Mobile Robotics for Inspection: Advantages and Challenges
2 State of the Art on Teleoperated Robotics for Inspection
3 A Mobile Robot for Radiation Protection Operations
3.1 Hardware
3.2 Software Development and Integration
4 Experimental Assessment
4.1 Testing Methodology
4.2 Robotic Radiation Survey Results
4.3 Dual Robot Setup
5 Final Thoughts and Future Directions
References
A Review of Classical and Learning Based Approaches in Task and Motion Planning
1 Introduction
2 Background
2.1 Task Planning
2.2 Motion Planning
2.3 Task and Motion Planning Objectives
3 Task and Motion Planning Methods
3.1 Classical Methods
3.2 Learning Based Methods
3.3 Hybrid Methods
4 Benchmarks and Tools
5 Challenges
5.1 Observation Uncertainty
5.2 Action Uncertainty
5.3 Context-Aware Decision Making
5.4 Balance Between Optimum and Feasibility
6 Conclusion
References
Multi-objective Ranking to Optimize CNN's Encoding Features: Application to the Optimization of Tracer Dose for Scintigraphic Imagery
1 Introduction
2 A Reminder on Texture Encoding
3 The Rank-Order Aggregation Problem
3.1 Explicit or Implicit Resolution
3.2 Euclidean Distance (Spearman Distance)
3.3 Rank Absolute Deviation Distance
4 Implementation
5 Optimization of Tracer Dose for Scintigraphic Imagery
6 Conclusion
References
Linear Delta Kinematics Feedrate Planning for NURBS Toolpaths Implemented in a Real-Time Linux Control System
1 Introduction
2 NURBS Path Interpolation
3 Delta Parallel Kinematics
4 S-Curve Feedrate Planning
5 Delta Machine and Control System
6 Experimental Results
7 Conclusion
References
Automatic Fault Detection in Soldering Process During Semiconductor Encapsulation
1 Introduction
2 Related Work
3 Visual Inspection of Solder Balls
3.1 Image Segmentation
3.2 Image Classification
4 Experiments and Results
4.1 Experimental Setup
4.2 Image Acquisition
4.3 Solder Ball Regions Detection Evaluation
4.4 Solder Ball Classification Evaluation
4.5 Robustness Evaluation of Solder Ball Classification in Presence of Noise
5 Conclusion and Future Work
References
Author Index


Lecture Notes in Networks and Systems 836

Giuseppina Gini Henk Nijmeijer Wolfram Burgard Dimitar Filev   Editors

Informatics in Control, Automation and Robotics 19th International Conference, ICINCO 2022 Lisbon, Portugal, July 14–16, 2022 Revised Selected Papers

Lecture Notes in Networks and Systems

836

Series Editor Janusz Kacprzyk , Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas— UNICAMP, São Paulo, Brazil Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Türkiye Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA Institute of Automation, Chinese Academy of Sciences, Beijing, China Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus Imre J. Rudas, Óbuda University, Budapest, Hungary Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science. For proposals from Asia please contact Aninda Bose ([email protected]).

Giuseppina Gini · Henk Nijmeijer · Wolfram Burgard · Dimitar Filev Editors

Informatics in Control, Automation and Robotics 19th International Conference, ICINCO 2022 Lisbon, Portugal, July 14–16, 2022 Revised Selected Papers

Editors Giuseppina Gini Dipto di Elettronica e Informazione Politecnico di Milano Milan, Italy Wolfram Burgard Institute of Computer Science University of Freiburg Freiburg im Breisgau, Baden-Württemberg, Germany

Henk Nijmeijer Department of Mechanical Engineering Eindhoven University of Technology Eindhoven, Noord-Brabant, The Netherlands Dimitar Filev Research and Advanced Engineering Ford Motor Company Dearborn, MI, USA

ISSN 2367-3370 ISSN 2367-3389 (electronic) Lecture Notes in Networks and Systems ISBN 978-3-031-48302-8 ISBN 978-3-031-48303-5 (eBook) https://doi.org/10.1007/978-3-031-48303-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.

Preface

The present book includes extended and revised versions of a set of selected papers from the 19th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2022), held in Lisbon, Portugal, from July 14 to 16. ICINCO 2022 received 108 paper submissions from 36 countries, out of which 7% were included in this book. The papers were selected by the event chairs, and their selection was based on a number of criteria that included the classifications and comments provided by the program committee members, the session chairs’ assessment and also the program chairs’ global view of all papers included in the technical program. The authors of selected papers were then invited to submit revised and extended versions of their papers, having at least 30% innovative material. The purpose of the International Conference on Informatics in Control, Automation and Robotics (ICINCO) is to bring together researchers, engineers and practitioners interested in the application of informatics to control, automation and robotics. Four simultaneous tracks have been held, covering intelligent control systems and optimization, robotics and automation, signal processing and sensors and modeling, and industrial informatics. Informatics applications are pervasive in most areas of control, automation and robotics as is clearly illustrated in this conference. The eight papers selected to be included in this book contribute to the understanding of relevant trends of current research on informatics in control, automation and robotics, including: data-based control and AI, mobile robots and intelligent autonomous systems, computer-based manufacturing technologies, engineering applications on robotics, automation, intelligent control systems and optimization, image processing, systems modeling and simulation. In particular, these papers present methods and results for the following problems: robot task and motion planning, mobile robot navigation, planning for parallel machines, mobile robots for automatic inspection in hazardous environments, autonomous driving via neural networks, control for reconfigurable robots, visual inspection in semiconductor production and optimizing feature extraction for images in radiology. We would like to thank all the authors for their contributions and also the reviewers who have helped ensuring the quality of this publication. July 2022

Giuseppina Gini Henk Nijmeijer Dimitar Filev

Organization

Conference Chair
Dimitar Filev, Research & Advanced Engineering, Ford Motor Company, USA
Wolfram Burgard (honorary), University of Freiburg, Germany

Program Co-chairs
Giuseppina Gini, Politecnico di Milano, Italy
Henk Nijmeijer, Eindhoven University of Technology, The Netherlands

Program Committee El-Houssaine Aghezzaf Eugenio Aguirre Rudy Agustriyanto Carlos Aldana Manuel Aleixandre Joaquin Alvarez Mihail Antchev Rui Araujo Ramiro Barbosa Eric Baumgartner Juri Belikov Karsten Berns Mauro Birattari Jean-Louis Boimond Magnus Boman Tobias Bruckmann Marvin Bugeja Kenneth Camilleri

Ghent University, Faculty of Engineering and Architecture, Belgium University of Granada, Spain University of Surabaya, Indonesia University of Guadalajara, Mexico Tokyo Institute of Technology, Japan Center Scientific Research Higher Education Ensenada Cicese, Mexico Technical University-Sofia, Bulgaria University of Coimbra, Portugal ISEP/IPP-School of Engineering, Polytechnic Institute of Porto, Portugal Milwaukee School of Engineering, USA Tallinn University of Technology, Estonia University of Kaiserslautern-Landau, Germany Université Libre de Bruxelles, Belgium ISTIA-LARIS, France The Royal Institute of Technology, Sweden University of Duisburg-Essen, Germany University of Malta, Malta University of Malta, Malta


Giuseppe Carbone Enrique Carrera Alessandro Casavola Marco Castellani Paul Christodoulides Feng Chu Carlos Coello Sesh Commuri António Dourado Marc Ebner Mohammed Fadali Paolo Fiorini Thierry Fraichard Eduardo Oliveira Freire Georg Frey Toyomi Fujita Mauro Gaggero Andrej Gams Péter Gáspár Giorgio Gnecco Arthur Gómez Noelia Hernández Parra Wladyslaw Homenda Chiu-Fan Hsieh Liu Hsu Daniela Iacoviello Gianluca Ippoliti Sarangapani Jagannathan Isabel Jesus Fabrício Junqueira Tohru Kawabe Moharram Korayem

Marek Kraft Dragana Krstic Masao Kubo

University of Calabria, Italy Army Polytechnic School Ecuador, Ecuador University of Calabria, Italy University of Birmingham, UK Cyprus University of Technology, Cyprus University of Evry Val d’Essonne, France CINVESTAV-IPN, Mexico University of Nevada, Reno, USA University of Coimbra, Portugal Ernst-Moritz-Arndt-Universität Greifswald, Germany UNR, USA University of Verona, Italy Inria, France Federal University of Sergipe, Brazil Automation and Energy Systems, Saarland University, Germany Tohoku Institute of Technology, Japan National Research Council of Italy, Italy Jožef Stefan Institute, Slovenia SZTAKI, Hungary IMT-School for Advanced Studies-Lucca, Italy Universidade do Vale do Rio dos Sinos, Brazil Universidad de Alcalá, Spain Warsaw University of Technology, Poland National Formosa University, Taiwan, Republic of China COPPE-UFRJ, Brazil Sapienza University of Rome, Italy Università Politecnica delle Marche, Italy Missouri University of Science and Technology, USA ISEP/IPP-School of Engineering, Polytechnic Institute of Porto, Portugal University of São Paulo (USP), Brazil University of Tsukuba, Japan Robotic Research Lab, Mechanical Engineering Department, Iran University of Science and Technology, Iran, Islamic Republic of Poznan University of Technology, Poland University of Nis, Faculty of Electronic Engineering, Serbia National Defense Academy of Japan, Japan


Kolja Kühnlenz Miroslav Kulich Sébastien Lahaye Kauko Leiviskä Tsai-Yen Li Antonio Lopes Eric Lucet Anthony Maciejewski Ester Martinez-Martin Vicente Matellán-Olivera Takafumi Matsumaru Luca Mazzola Seán McLoone Nadhir Messai Maciej Michalek Paulo Miyagi Rezia Molfino Rafael Morales Vladimir Mostyn George Moustris Riccardo Muradore Elena Nechita Juan J. Nieto Fernando Osorio Stamatios Papadakis Evangelos Papadopoulos Ju H. Park Igor Paromtchik Dariusz Pazderski Qingjin Peng Tadej Petric Raul Marin Prades Radu-Emil Precup Kanty Rabenorosoa José Ragot


Coburg University of Applied Sciences and Arts, Germany Czech Technical University in Prague, Czech Republic Istia-LARIS, France University of Oulu, Finland National Chengchi University, Taiwan, Republic of China University of Porto, Portugal CEA-French Alternative Energies and Atomic Energy Commission, France Colorado State University, USA University of Alicante, Spain SCAYLE, Spain Waseda University, Japan HSLU-Lucerne University of Applied Sciences, Switzerland Queen’s University Belfast, Ireland University of Reims Champagne-Ardenne, France Poznan University of Technology, Poland University of Sao Paulo, Brazil PMARlab, University of Genova, Italy University of Castilla La Mancha, Spain VSB-Technical University of Ostrava, Czech Republic National Technical University of Athens, Greece University of Verona, Italy Vasile Alecsandri University of Bacau, Romania University of Santiago de Compostela, Spain USP-Universidade de Sao Paulo, Brazil Department of Preschool Education, Faculty of Education, University of Crete, Greece NTUA, Greece Yeungnam University, Korea, Republic of Robotic Technologies, France Poznan University of Technology, Poland University of Manitoba, Canada Jožef Stefan Institute, Slovenia Jaume I University, Spain Politehnica University of Timisoara, Romania Femto-ST Institute, France Centre de Recherche en Automatique de Nancy, France


Oscar Reinoso Paolo Rocco Juha Röning Christophe Sabourin Antonio Sala Addisson Salazar Jurek Sasiadek Dieter Schramm Karol Seweryn Antonino Sferlazza Jinhua She Vasile Sima Azura Che Soh Adrian-Mihail Stoica József Tar Tomasz Tarczewski Daniel Thalmann Germano Torres Andrés Úbeda Damir Vrancic Qiangda Yang Jie Zhang Cezary Zielinski

Miguel Hernandez University, Spain Politecnico di Milano, Italy University of Oulu, Finland Univ. Paris Est Creteil, LISSI, France Universitat Politecnica de Valencia, Spain Universitat Politècnica de València, Spain Carleton University, Canada University of Duisburg-Essen, Germany Space Research Centre (CBK PAN), Poland University of Palermo, Italy Tokyo University of Technology, Japan Technical Sciences Academy of Romania, Romania Universiti Putra Malaysia, Malaysia Polytechnic University Bucharest, Romania Óbuda University, Hungary Nicolaus Copernicus University, Poland Ecole Polytechnique Federale de Lausanne, Switzerland PS Solutions, Brazil University of Alicante, Spain Jožef Stefan Institute, Slovenia Northeastern University, China Newcastle University, UK Warsaw University of Technology, Poland

Invited Speakers
Shie Mannor, Technion-Israel Institute of Technology, Israel
Soon-Jo Chung, California Institute of Technology, USA
Panagiotis Tsiotras, Georgia Institute of Technology, USA
Emilia Fridman, Tel Aviv University, Israel

Additional Reviewers
Etienne Belin, Université d'Angers, France
Elyson Carvalho, Federal University of Sergipe, Brazil
José Carvalho, Federal University of Sergipe, Brazil
Jing Li, School of Management, Northwestern Polytechnical University; IBISC, Univ. Évry, University of Paris-Saclay, France
Lucas Molina, Federal University of Sergipe, Brazil
Jugurta Montalvão, Federal University of Sergipe, Brazil
Francisco Rossomando, Instituto de Automatica-Universidad Nacional de San Juan, Argentina
Philipp Maximilian Sieberg, University of Duisburg-Essen, Germany

Contents

Improved Robust Neural Network for Sim2Real Gap in System Dynamics for End-to-End Autonomous Driving (p. 1)
Stephan Pareigis and Fynn Luca Maaß

Optimal Robust Control with Applications for a Reconfigurable Robot (p. 22)
R. Al Saidi and S. Alirezaee

Improving 2D Scanning Radar and 3D Lidar Calibration (p. 44)
Jan M. Rotter, Levin Stanke, and Bernardo Wagner

Mobile Robots for Teleoperated Radiation Protection Tasks in the Super Proton Synchrotron (p. 65)
David Forkel, Enric Cervera, Raúl Marín, Eloise Matheson, and Mario Di Castro

A Review of Classical and Learning Based Approaches in Task and Motion Planning (p. 83)
Kai Zhang, Eric Lucet, Julien Alexandre Dit Sandretto, Selma Kchir, and David Filliat

Multi-objective Ranking to Optimize CNN's Encoding Features: Application to the Optimization of Tracer Dose for Scintigraphic Imagery (p. 100)
V. Vigneron, H. Maaref, and J.-P. Conge

Linear Delta Kinematics Feedrate Planning for NURBS Toolpaths Implemented in a Real-Time Linux Control System (p. 114)
Gabriel Karasek and Krystian Erwinski

Automatic Fault Detection in Soldering Process During Semiconductor Encapsulation (p. 130)
Paulo V. L. Pereira, Conceição N. Silva, Neandra P. Ferreira, Sharlene S. Meireles, Mario Otani, Vandermi J. da Silva, Carlos A. O. de Freitas, and Felipe G. Oliveira

Author Index (p. 147)

Improved Robust Neural Network for Sim2Real Gap in System Dynamics for End-to-End Autonomous Driving

Stephan Pareigis¹ and Fynn Luca Maaß²

¹ HAW Hamburg, 20099 Hamburg, Germany, [email protected]
² TU Graz, 8010 Graz, Austria, [email protected]

Abstract. An improved convolutional neural network architecture for end-to-end autonomous driving is presented, which is robust against discrepancies in system dynamics. The proposed neural network implements a lateral controller that improves performance when the absolute zero position of the steering is unknown. Experiments in the autonomous driving simulator CARLA show a significantly lower cross-track error of the newly proposed architecture compared to previous architectures such as PilotNet and PilotNetΔ. If the zero position of the steering changes during ongoing operation, for example due to a blow to the steering ratio, the controller implemented by the proposed neural network steers back to the center of the road after a short deviation. The neural network uses a sequence of 10 images from a front-facing camera to classify one out of 21 different steering angles. The classified steering angles are the rate of change of the steering position, making the lateral control independent of a potential offset in the zero position.

Keywords: Sim-to-real gap · End-to-end learning · Autonomous driving · Artificial neural network · CARLA simulator · Robust control · PilotNet

1 Introduction

The motivation for the following research question is based on [14], which aims to close the simulation-to-reality gap in end-to-end autonomous driving by improving lateral control in scenarios where the zero position of the steering is undefined or changes suddenly. Methods which are robust against imprecise calibration are advantageous in autonomous driving and robotics in general, because mechanical adjustments may be subject to change during operation. In general, end-to-end solutions are able to learn complex image features and high-level behavior [2, 10, 17], while having a relatively small number of trainable parameters. End-to-end architectures often require only minimal preprocessing of the input data and combine many steps of a typical autonomous driving pipeline, making them very efficient and usable in embedded environments [16] as well.


[14] considers an autonomous model vehicle driving in the center of the road using a control method similar to a P-controller (pure pursuit or a Stanley controller may be used, see e.g. [7, 13]). Due to incorrect adjustment of the zero position of the steering angle, the vehicle drives with an offset to the center of the lane, which is undesirable. The offset of the vehicle to the center of the lane shall subsequently be called cross-track error (CTE). As the adjustment of the steering position is sensitive and error-prone in the model vehicles used, a method was developed to keep the vehicle in the center of the lane regardless of the zero position of the steering servo.

[14] proposes a neural network architecture based on NVIDIA's PilotNet [1], called PilotNetΔ. This is a regression network with a single output neuron whose value is used directly as the change in steering angle. An LSTM layer is used to cope with multiple input images. Subsequently, the neural network architecture proposed in [14] shall be named PilotNetΔ^LSTM_R, to account for the regression architecture R and the LSTM layer. It is shown that PilotNetΔ^LSTM_R steers the vehicle inside the lane regardless of the zero position of the steering servo. However, the absolute cross-track error which results from using PilotNetΔ^LSTM_R is significantly larger than the CTE of NVIDIA's PilotNet in certain scenarios. How they compare depends on the defined zero position of the steering. For instance, when the true zero position deviates 7.5° from the expectation, PilotNetΔ^LSTM_R prevails over NVIDIA's PilotNet. When the true zero position is indeed the expected zero position, NVIDIA's PilotNet compares better. This poses a problem for PilotNetΔ^LSTM_R when driving under normal conditions.

To address this disadvantage of PilotNetΔ^LSTM_R towards NVIDIA's PilotNet while maintaining the focus on the initial problem of undefined zero positions in the steering, a new neural network architecture is proposed in this paper. The main and significant difference is the use of a classification network instead of a regression network. Instead of using a single output neuron as in PilotNetΔ^LSTM_R, multiple output neurons are now used and the steering angle is discretized. This architecture will be referred to as PilotNetΔ_C, where C stands for classification. PilotNetΔ_C also uses class weights during the training process to respect the data imbalance, whereas PilotNetΔ^LSTM_R uses a cost-sensitive loss function [11] to handle data imbalance. Different methods to cope with multiple input images are also implemented: a neural network with an LSTM layer and one with a frame stack are implemented and compared. The use of a classification network with discretized steering angles largely simplifies training of the neural network, which was a big issue in [14]. Another advantage of a classification network is that a confusion matrix can be used to show the quality of the training process.

The paper is organized as follows. Section 2 presents a simple mathematical model to show that the observed behavior of the vehicle with a misaligned steering angle is reasonable. This chapter may be skipped and is not necessary for the understanding of the proposed method. Section 3 recapitulates the previously defined neural network architecture and describes its properties and problems. Section 4 introduces the proposed new neural network architectures using a new naming convention.
Section 5 describes the collection methods for the training data and the applied evaluation methods.


Section 6 shows the results and gives a comparison between the different proposed methods. Section 7 summarizes the results and concludes with some remarks and open questions.

2 Simple Mathematical Model

The following simplified mathematical model shows that it is reasonable that an offset in the steering angle implies a constant cross-track error, as observed in the real experiments. This chapter is separate from the explanation of the proposed model and may be skipped. It helps to understand the general problem setting.

2.1 System Model Without Controller

The following mathematical model shall be assumed: Let x ∈ R denote the offset of the vehicle to the middle of the lane. So if x = 0, the vehicle is perfectly in the middle, x = 1 means it is to the right, x = −1 means it is to the left. See Fig. 1.

Fig. 1. Simple model of a vehicle inside a lane. x denotes the offset to the center of the lane. α denotes the angle of the vehicle to the lane.

Let α ∈ [−π/2, π/2] be the angle of the vehicle with respect to the road. Note that α does not stand for the steering angle. In this simple model, there is no steering angle; the vehicle always drives straight according to its orientation α. When α ≠ 0, the vehicle is not aligned with the road. When the vehicle drives with a constant velocity (which is not modeled in this simple model), the distance x of the vehicle to the middle of the lane will change. Let us assume the offset of the vehicle to the middle of the lane changes according to ẋ = α. So when α > 0, for example, the vehicle drives to the right and the distance to the middle of the lane increases linearly. Since there is no steering involved, the angle of the vehicle with respect to the lane does not change, so

\dot{\alpha} = 0 \quad (1)


The system dynamics can be written in the following matrix form:

\begin{pmatrix} \dot{x} \\ \dot{\alpha} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ \alpha \end{pmatrix}

2.2 P-Controller for Vehicle

A simple P-controller shall be used to control the model:

\begin{pmatrix} \dot{x} \\ \dot{\alpha} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ \alpha \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} x \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} x \\ \alpha \end{pmatrix} \quad (2)

This simple controller has an effect on the angle of the vehicle to the lane, i.e. α. The controller simply puts α̇ = −x − α. This means the higher the offset x to the center and the higher the angle α, the more the controller changes the angle of the vehicle. This may be done using a steering angle; however, the steering angle is not explicitly modeled in this simple model. Only the angle of the vehicle with respect to the road is modeled. When looking at Eq. 2, it can be calculated that the eigenvalues are

\lambda_1 = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i, \qquad \lambda_2 = -\frac{1}{2} - \frac{\sqrt{3}}{2}\, i

(an online eigenvalue calculator can be used to verify this). Since both lie in the left half of the complex plane, the system is stable.

2.3 Offset to Steering Angle

Now an offset to the steering angle is considered. Since the steering angle is not explicitly modeled in this simple model, the effect that a steering offset has is modeled. A misaligned or mis-calibrated steering angle will produce a drift in the angle of the vehicle. This means that although the steering is set to zero, the angle of the vehicle with respect to the road will change constantly. Equation 1 will then change to

\dot{\alpha} = \varepsilon \quad (3)

for some small ε which represents the mis-calibrated zero position. When the P-controller from Eq. 2 is used to control the vehicle position, the following equation is obtained:

\begin{pmatrix} \dot{x} \\ \dot{\alpha} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} x \\ \alpha \end{pmatrix} + \begin{pmatrix} 0 \\ \varepsilon \end{pmatrix} \quad (4)

An online ODE solver can be used to calculate a solution of this system, see Eqs. 5 and 6. The solution to this equation will keep the vehicle driving straight, but with an


offset to the middle of the lane. The denominator e^{t/2} causes the expressions to go to zero; in Eq. 5 a constant remains.

x = \frac{2C \sin\left(\frac{\sqrt{3}t}{2}\right) + 2C_1 \cos\left(\frac{\sqrt{3}t}{2}\right)}{e^{t/2}} + \varepsilon \quad (5)

\alpha = \frac{-\sqrt{3}\left(C_1 \sin\left(\frac{\sqrt{3}t}{2}\right) - C \cos\left(\frac{\sqrt{3}t}{2}\right)\right) - C \sin\left(\frac{\sqrt{3}t}{2}\right) - C_1 \cos\left(\frac{\sqrt{3}t}{2}\right)}{e^{t/2}} \quad (6)
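A few lines of Python make this behavior concrete. The following is a minimal sketch (not from the paper; the value of ε is arbitrary and only for illustration): it verifies the eigenvalues of the closed-loop matrix from Eq. 2 and integrates Eq. 4, showing that the cross-track error settles at the offset ε instead of at zero.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Closed-loop system matrix of the P-controlled model (right-hand side of Eq. 2).
A = np.array([[0.0, 1.0],
              [-1.0, -1.0]])
eps = 0.1  # hypothetical drift caused by the steering offset (illustrative value)

print(np.linalg.eigvals(A))  # -0.5 +/- 0.866j, both in the left half-plane

def dynamics(t, state):
    # Eq. 4: d/dt (x, alpha) = A (x, alpha) + (0, eps)
    return A @ state + np.array([0.0, eps])

sol = solve_ivp(dynamics, (0.0, 20.0), y0=[0.0, 0.0])
print(sol.y[0, -1], sol.y[1, -1])  # x -> eps, alpha -> 0, as Eqs. 5 and 6 predict
```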

Clearly, the P-controller constantly tries to steer the vehicle back to the center. The steering offset causes the steering angle to be zero while the vehicle is still lateral to the center of the lane. This simple model shall show that the observed behavior is plausible.

3 Previous Results with PilotNetΔ^LSTM_R

The architecture presented in [14] will now be renamed to PilotNetΔ^LSTM_R to account for a regression neural network being used and for the LSTM layer that processes multiple input images.

Fig. 2. PilotNetΔ^LSTM_R takes three consecutive images as input and learns the corresponding relative steering angle Δα_t = (α_t − α_{t−1}).

NVIDIA's PilotNet is trained to map a camera input image of the vehicle to a steering angle. The main idea of PilotNetΔ was to use a difference in steering angles as output. This made the architecture independent of the absolute zero position of the steering angle. It required that three consecutive images be taken as input to the neural network. PilotNetΔ takes the last three frames at a time t, image_t, image_{t−1}, image_{t−2}, labeled with the amount the angle α_t has to change compared to the previous angle α_{t−1}, referred to as the relative angle, creating the mapping shown in Fig. 2.

Compared to NVIDIA's PilotNet, PilotNetΔ^LSTM_R implements a convolutional LSTM layer to process the sequence of three images. Each sequence has a shape of 3 × 66 × 200 × 3 pixels (sequence × height × width × channels), encoded in the YUV color space. Each image is spatially 0.5 m apart from the next one. These are the image sizes when the simulator CARLA is used in the experiments described in Sect. 5.

Figure 3 compares the CTE (cross-track error) of PilotNet and PilotNetΔ^LSTM_R with and without steering offset. While PilotNetΔ^LSTM_R shows a good insensitivity with respect to a mis-calibrated steering, the absolute values of its CTE are unsatisfactory compared to PilotNet. Table 1 compares the performances of the two architectures in numbers. PilotNetΔ^LSTM_R performs worse than PilotNet if no steering offset is included.


Fig. 3. CTE of PilotNetΔ^LSTM_R (left) and NVIDIA's PilotNet (right) along the test track. The CTE of NVIDIA's PilotNet without steering offset is below 0.25 (right image, blue line). It is therefore much better than the performance of PilotNetΔ^LSTM_R (left image). The objective of this paper is to reduce the CTE of PilotNetΔ^LSTM_R while remaining insensitive to steering offsets. (Color figure online)

The objective of the new architecture presented in this paper is an improved performance of PilotNetΔ, i.e., a reduced absolute CTE, while maintaining the insensitivity to steering offsets.

Table 1. Cross-track error of PilotNet and PilotNetΔ^LSTM_R while driving on a mostly straight road during the evaluation process of the prior publication [14].

Architecture         Mean CTE   Max CTE   Mean CTE with Offset   Max CTE with Offset
NVIDIA's PilotNet    0.11 m     0.29 m    0.98 m                 1.46 m
PilotNetΔ^LSTM_R     0.57 m     1.19 m    0.55 m                 1.18 m

4 PilotNetΔ^LSTM_C and PilotNetΔ^F_C

This chapter describes the newly proposed neural network architectures. A new naming convention is introduced in Sect. 4.1. Section 4.2 describes the angle discretization in detail and Sect. 4.3 describes the neural network architectures.

4.1 Naming Convention

Two new architectures, PilotNetΔ^LSTM_C and PilotNetΔ^F_C, are introduced. These are classification networks with multiple output neurons which map to a discretized steering


angle. Experiments show a better lateral control for these network architectures than for the previously proposed PilotNetΔ^LSTM_R in [14]. The better lateral control is characterized by a lower CTE (cross-track error) when driving under normal conditions as well as when driving with a steering offset, as discussed in Sect. 6.2. Along with the new architectures, a naming scheme is introduced to differentiate between the architectures, following the principles shown in Fig. 4.

Fig. 4. Naming scheme of PilotNet Architectures. PilotNet is the original architecture from NVIDIA [1]. PilotNetΔ is the architecture from our previous work [14]. LSTM or F (frame stack) denotes the method used to handle multiple input images. The output shape may be R (regression network with a single neuron), C (classification network using discretized steering angles), CC (classification network with certain training data).

– Base Neural Network Architecture: All nets are based on NVIDIA's PilotNet and therefore assume this base name [2, 3]. This includes all hidden dense and hidden convolutional layers.
– Delta (Δ): The network predicts a relative steering angle Δα which is added to the current steering angle. This makes the steering control independent of absolute steering angles and in particular independent of a steering zero position.
– LSTM / F: The input layer of PilotNet may take a frame stack containing a sequence of images, or it may consist of an LSTM sequence that is passed to a convolutional LSTM layer at the input layer of the network.
– R / C / CC: The prediction of the neural network is done in a regression (R) fashion with a single output neuron or via classification (C) with 21 one-hot encoded output neurons. The CC indicates that the neural network is doing classification trained on the dataset where the angle discretization was already applied to the Stanley controller at data creation time; Sect. 5.1 describes the details. This means C and CC architectures are the same, with the only difference being the dataset they are trained on.

4.2 Discretization of Steering Angles

The proposed family of classification networks PilotNetΔ_C provides a discrete relative angle as output. The relative steering angles are discretized into 21 values between −5° and 5°. This results in 10 discrete relative angles greater than zero, 10 discrete relative angles less than zero, and zero. See Table 2 for the non-negative classes. The classes with negative steering angles are defined symmetrically.


The letter δ will be used to describe the discretization of steering angles to angle classes as depicted in Table 2. αt − αt−1 −→ δ(αt − αt−1 )

Fig. 5. Distribution of the discretization of the relative steering angles. Relative angles close to zero are discretized finer, relative angles close to −5◦ and 5◦ are discretized coarser.

The discretization is finer around zero to allow small steering corrections. Figure 5 visually depicts the distribution of the discretization. In the process of data creation, the vehicle is controlled with a given controller (see Sect. 5.1 for details). Relative angles α_t − α_{t−1} are observed and mapped to one of the 21 discrete angle classes δ(α_t − α_{t−1}) according to the mapping shown in Table 2. Depicted are the non-negative classes; relative angles recorded during driving with a given controller are mapped according to the listed ranges, and negative angles are mapped accordingly.

Table 2. Mapping from relative steering angle α_t − α_{t−1} during data acquisition (right column) to class (left column). The table shows only the non-negative classes; the mapping is done symmetrically for negative values. The letter δ will be used to denote this mapping.

Class (°)   Relative angle range (°)
0           [−0.01, 0.01]
0.05        [0.01, 0.1]
0.1         [0.1, 0.2]
0.3         [0.2, 0.4]
0.5         [0.4, 0.6]
0.7         [0.6, 0.8]
1.0         [0.8, 1.2]
1.5         [1.2, 1.8]
2.5         [1.8, 3.2]
3.5         [3.2, 3.8]
5           [3.8, ∞]
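The mapping δ can be written down in a few lines. The sketch below is not the authors' code; it merely restates Table 2 (the handling of values exactly on an interval boundary is an assumption).

```python
import bisect

# Non-negative class values (degrees) and the upper bounds of their ranges from Table 2;
# negative relative angles are mapped symmetrically. The last class is open-ended.
CLASS_VALUES = [0.0, 0.05, 0.1, 0.3, 0.5, 0.7, 1.0, 1.5, 2.5, 3.5, 5.0]
UPPER_BOUNDS = [0.01, 0.1, 0.2, 0.4, 0.6, 0.8, 1.2, 1.8, 3.2, 3.8]

def delta(relative_angle_deg: float) -> float:
    """Discretize a relative steering angle alpha_t - alpha_{t-1} to one of the 21 classes."""
    sign = -1.0 if relative_angle_deg < 0 else 1.0
    idx = bisect.bisect_right(UPPER_BOUNDS, abs(relative_angle_deg))
    return sign * CLASS_VALUES[idx]

print(delta(0.15), delta(-2.0), delta(4.5))  # -> 0.1, -2.5, 5.0
```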

4.3 Neural Network Architecture

Two neural network architectures are proposed, PilotNetΔ^LSTM_C and PilotNetΔ^F_C, as defined in Tables 3 and 4. The input-output behavior of the PilotNetΔ_C family is given by Fig. 6. Ten images are used as input. Both artificial neural networks are classification networks with 21 output neurons.


Table 3. PilotNetΔ^LSTM_C architecture including a ConvLSTM layer, 315,511 trainable parameters.

Layer Type         Stride   Activation   Output Shape        Params
Input Sequence     -        -            10 × 66 × 200 × 3   -
Standardization    -        -            10 × 66 × 200 × 3   -
ConvLSTM2D 5 × 5   2 × 2    ELU          24@31 × 98          64,896
Dropout 0.2        -        -            24@31 × 98          -
Conv2D 5 × 5       2 × 2    ELU          36@14 × 47          21,636
Conv2D 5 × 5       2 × 2    ELU          48@5 × 22           43,248
Conv2D 3 × 3       1 × 1    ELU          64@3 × 20           27,712
Dropout 0.2        -        -            64@3 × 20           -
Conv2D 3 × 3       1 × 1    ELU          64@1 × 18           36,928
Flatten            -        -            1152                -
Dropout 0.2        -        -            1152                -
Dense              -        ELU          100                 115,300
Dense              -        ELU          50                  5,050
Dropout 0.2        -        -            50                  -
Dense              -        ELU          10                  510
Output             -        SoftMax      21                  231
Total                                                        315,511
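For readers who want to reproduce the ConvLSTM front end of Table 3, the following is a small sketch (not the authors' published code): a sequence of ten 66 × 200 × 3 frames is reduced to a single 31 × 98 × 24 feature map, after which the remaining layers match the frame-stack model of Table 4.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Front end of the ConvLSTM variant (Table 3), as a sketch.
seq_input = tf.keras.Input(shape=(10, 66, 200, 3))
x = layers.ConvLSTM2D(24, kernel_size=5, strides=2, activation="elu")(seq_input)
print(x.shape)  # (None, 31, 98, 24); this layer has 64,896 parameters, as in Table 3
```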

Table 4. PilotNetΔ^F_C architecture including a frame stack with 10 images (10 images × 3 channels each) at the input layer, 268,639 trainable parameters.

Layer Type         Stride   Activation   Output Shape        Params
Input Sequence     -        -            66 × 200 × 30       -
Standardization    -        -            66 × 200 × 30       -
Conv2D 5 × 5       2 × 2    ELU          24@31 × 98          18,024
Dropout 0.2        -        -            24@31 × 98          -
Conv2D 5 × 5       2 × 2    ELU          36@14 × 47          21,636
Conv2D 5 × 5       2 × 2    ELU          48@5 × 22           43,248
Conv2D 3 × 3       1 × 1    ELU          64@3 × 20           27,712
Dropout 0.2        -        -            64@3 × 20           -
Conv2D 3 × 3       1 × 1    ELU          64@1 × 18           36,928
Flatten            -        -            1,152               -
Dropout 0.2        -        -            1,152               -
Dense              -        ELU          100                 115,300
Dense              -        ELU          50                  5,050
Dropout 0.2        -        -            50                  -
Dense              -        ELU          10                  510
Output             -        SoftMax      21                  231
Total                                                        268,639


Fig. 6. The PilotNetΔ_C family of neural networks takes ten consecutive images as input and learns the corresponding discrete relative steering angle δ(α_t − α_{t−1}).

The difference is in the input layers. PilotNetΔ^F_C receives a frame stack with 10 images. PilotNetΔ^LSTM_C receives the ten images as a separate input dimension and is equipped with an LSTM layer to process the sequence of images. PilotNetΔ^LSTM_C is slightly larger than PilotNetΔ^F_C due to the large size of the ConvLSTM layer.
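The frame-stack variant can be assembled directly from the layer list in Table 4. The following is a minimal Keras sketch consistent with that table; it is not the authors' published code, and the standardization step (here a simple rescaling) and the training configuration are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of PilotNetΔ^F_C: ten RGB frames stacked along the channel axis -> 66 x 200 x 30.
def build_pilotnet_delta_f_c(num_classes: int = 21) -> tf.keras.Model:
    return models.Sequential([
        layers.Input(shape=(66, 200, 30)),
        layers.Rescaling(1.0 / 255.0),            # stand-in for the "Standardization" layer
        layers.Conv2D(24, 5, strides=2, activation="elu"),
        layers.Dropout(0.2),
        layers.Conv2D(36, 5, strides=2, activation="elu"),
        layers.Conv2D(48, 5, strides=2, activation="elu"),
        layers.Conv2D(64, 3, activation="elu"),
        layers.Dropout(0.2),
        layers.Conv2D(64, 3, activation="elu"),
        layers.Flatten(),
        layers.Dropout(0.2),
        layers.Dense(100, activation="elu"),
        layers.Dense(50, activation="elu"),
        layers.Dropout(0.2),
        layers.Dense(10, activation="elu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_pilotnet_delta_f_c()
model.summary()  # 268,639 trainable parameters, matching Table 4
```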

5 Experimental Setup

The experimental setup is based on the CARLA simulator [4]. The training data is obtained from two different maps created with RoadRunner [12] and intentionally kept simple with respect to their environment details. The custom maps are described in Sect. 5.1. The resulting datasets and training details are described in Sect. 5.2. Custom maps for evaluation are provided as well and are used to evaluate the driving behavior with regard to the steering offset as well as the driving behavior under normal conditions. The methodology for the evaluation process, including the metrics used for the classification accuracies of the models, is described in Sect. 5.3.

5.1 Dataset Creation Methods

Data used for training is generated on two different maps using two principles:
1. The vehicle is placed on map 1 and map 2 (Fig. 7) and drives one lap on each track clockwise.
2. The vehicle is placed at 7 different locations on map 1 (Fig. 7) with an initial cross-track error and heading error. The vehicle is then steered back to align with the target trajectory again. The initial cross-track error ranges up to ±1.3 m and the initial heading error up to ±25°. This results in 210 short sequences of about 12 s of driving each.
Following these two principles, a Stanley controller [9] is used for lateral control. See [7] for a description of the Stanley controller and a comparison to other controllers. The Stanley controller calculates the steering angle using the cross-track error and a heading error with respect to the closest trajectory segment. Using a Stanley controller instead of the TrafficManager [5] provided by CARLA for lateral control leads to improved smoothness of the steering angles, which is beneficial for training the proposed PilotNetΔ_C and PilotNetΔ_CC architectures. A front-facing camera attached to the vehicle samples images


Fig. 7. Both Maps were used to create training data. The data was obtained by a front-facing camera attached to the vehicle and labeled with current steering angles. The vehicle itself was steered by a Stanley controller for lateral control while maintaining a constant speed of 5 m/s. Images were sampled at 10 Hz resulting in 0.5 m/Frame. Map 1 features curves with varying curvature and length. Additionally, 210 short sequences were generated on this map with different initial cross-track errors and heading errors. Map 2 features curves with varying curvature and length, with the focus on curves with especially high curvature.

at 10 Hz with a constant vehicle speed of 5 m/s. The steering angles calculated by the Stanley controller are used as labels for the sampled images. Two methods, C and CC, are implemented for data collection, both using the Stanley controller. The naming convention corresponds to the indices in PilotNetΔ_C and PilotNetΔ_CC.
1. Method C. The Stanley controller steers the vehicle and collects data. The continuous steering angles are collected and are later discretized into classes at training time. The following sequence shows the process of data collection:
image I_t → Stanley → α_t → vehicle
The Stanley controller receives an image, produces an angle, and the angle is given to the vehicle. A training sample then consists of ((I_t, …, I_{t−9}), δ(α_t − α_{t−1})). Because the discretization of the steering angles into classes is done after the data collection process, the training data produced by this method does not correctly reflect the angles applied during data collection. In this case, consecutive images in the resulting dataset are not truly dependent on the steering angles used as labels.

12

S. Pareigis and F. L. Maaß

2. Method CC. The Stanley controller is used to generate steering angles. These steering angles are discretized according to Sect. 4.2 before being passed to the vehicle:
image I_t → Stanley → α_t → δ(α_t − α_{t−1}) + α_{t−1} → vehicle
The training data is the same as in Method C. The difference is that the vehicle has received discretized relative steering angles. In this case, consecutive images in the dataset truly correlate with the steering angles used as labels.
Following these methods, two large datasets are created: one for the PilotNetΔ_C architectures, including data from map 1 and map 2 with the steering angles discretized at training time, and one for the PilotNetΔ_CC architectures, including the data from map 1 and map 2 discretized at data creation time.
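The distinction between the two collection loops can be sketched as follows. This is an illustrative sketch only: `camera`, `stanley`, and `vehicle` are hypothetical stand-ins for the CARLA-side objects, and `delta` is the discretization function sketched after Table 2.

```python
def collect_step_c(camera, stanley, vehicle, prev_angle, delta):
    image = camera.read()                      # recorded as network input
    angle = stanley.steering(vehicle.state())  # Stanley controller: CTE + heading error
    vehicle.apply_steering(angle)              # Method C: the continuous angle is actuated
    return image, delta(angle - prev_angle), angle   # label is discretized only for training

def collect_step_cc(camera, stanley, vehicle, prev_angle, delta):
    image = camera.read()
    angle = stanley.steering(vehicle.state())
    applied = prev_angle + delta(angle - prev_angle)  # Method CC: discretize before actuation
    vehicle.apply_steering(applied)
    return image, delta(angle - prev_angle), applied
```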

5.2 Dataset Overview

Figure 8 shows the data distribution across the different angle classes for the architectures PilotNetΔ^F_C and PilotNetΔ^LSTM_C. This dataset consists of 15,160 samples (25 min of driving time), with each sample containing 10 consecutive images. The steering angles are discretized during training time into classes with smaller discretization intervals at relative steering angles around 0°, as described in Method C. As illustrated, classes with a lower relative steering angle are more present than classes with a higher relative steering angle, leading to an unfavorable data distribution.

Figure 9 illustrates the data distribution across the different angle classes for the architectures PilotNetΔ^F_CC and PilotNetΔ^LSTM_CC. This dataset contains 15,170 samples, again with 10 consecutive images per sample. The angle classes are discretized during data creation time as described in Method CC. The discretization intervals as in Table 2 are the same across both datasets. The datasets in Fig. 8 and Fig. 9 are very similar, with the dataset in Fig. 9 containing more samples around 0°. How both datasets influence the steering behavior is described in Sect. 6.3.

Because of the class imbalance in both datasets, class weights are used in a weighted cross-entropy loss. The class weights are calculated as in Eq. 7:

w_i = \frac{N}{C \cdot n_i} \quad (7)

This is the equation for calculating the class weight w_i for class i, i ∈ {1, …, 21}: C denotes the number of different classes (here C = 21), n_i denotes the number of samples belonging to class i, and N is the total number of samples. The calculated weights are used in the weighted cross-entropy loss illustrated in Eq. 8:

L = -\sum_{i=1}^{C} w_i \, y_i \log(p_i) \quad (8)
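The two equations translate directly into code. The following sketch is not the authors' implementation; the per-class counts are made up purely to illustrate the computation of Eq. 7 and the evaluation of Eq. 8 for a single prediction.

```python
import numpy as np

def class_weights(counts):
    """Eq. 7: w_i = N / (C * n_i) for per-class sample counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(y_onehot, p, weights, eps=1e-12):
    """Eq. 8 for one prediction: L = -sum_i w_i * y_i * log(p_i)."""
    return -np.sum(weights * y_onehot * np.log(p + eps))

counts = np.random.randint(50, 3000, size=21)  # hypothetical class counts for the 21 classes
w = class_weights(counts)
```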


Fig. 8. The diagram shows the distribution of relative steering angles used for training PilotNetΔ^F_C and PilotNetΔ^LSTM_C. The steering angles are discretized from continuous values into classes during training time. Classes around 0° are discretized with smaller margins. 15,160 image sequences total.

Fig. 9. The diagram shows the distribution of relative steering angles used for training PilotNetΔ^F_CC and PilotNetΔ^LSTM_CC. The steering angles are discretized from continuous values into classes during data creation time; the Stanley controller itself discretizes the steering angles before they are applied to the vehicle. Classes around 0° are discretized with smaller margins. 15,170 image sequences total.

Equation 8 gives the weighted cross-entropy loss for a given prediction: C denotes the number of classes, w_i the weight for class i, y_i a binary flag (0 or 1) indicating whether i is the correct class for the given prediction, and p_i the predicted probability for class i. Further, both datasets are split into 80% training and 20% validation data.

5.3 Evaluation Methods

Model Evaluation of Steering Offset Performance. With the given research question, the focus is specifically on the steering offset problem.


Fig. 10. Map 3 to evaluate performances of the trained models from Sect. 4.3. The track resembles a real road with a variety of different curves.

An experiment is set up to demonstrate how the different architectures respond to the steering offset. To accomplish this, the evaluated model steers a vehicle that is placed in the center of a straight road. The vehicle drives along the road at 5 m/s. After 10 m, a steering offset of −7.5° (turning left) is constantly applied. The resulting response, characterized by the cross-track error, is recorded during this experiment and discussed in Sect. 6.1.
Model Evaluation of Normal Performance. To also reflect the performance of the trained models under normal conditions, another experiment is set up. It consists of a normal driving test on a track representing a real-world road. Map 3, shown in Fig. 10, illustrates the track, which was not part of the training data. All proposed models steer the vehicle one lap on the track. While driving, the cross-track error is again calculated for further analysis to retrieve the mean and maximum cross-track error. The vehicle again drives with a constant speed of 5 m/s.
Evaluation of Classification Accuracies. Since the proposed architectures PilotNetΔ_C and PilotNetΔ_CC classify images, accuracies for the validation data set are provided in Sect. 6.3 along with three measures:
– Top-1-Accuracy: accuracy where the true class matches the most probable class predicted by the model.
– Top-3-Accuracy: accuracy where the true class matches any one of the 3 most probable classes predicted by the model.


– Top-Neighbor-Accuracy: accuracy where the most probable class predicted by the model matches the true class or one of its direct neighbor classes to the left or right.
The rationale behind the Top-Neighbor-Accuracy is that the predicted classes have a spatial relationship to each other. With normal classification, the distance (e.g. the number of neurons separating two classes at the classification layer) does not matter. For PilotNetΔ_C/CC, a false prediction of a neuron that is next to the neuron encoding the true class is not necessarily bad, because neurons at the classification layer encode different steering angles and neurons closer to each other encode similar values, whereas a prediction of a neuron very far away from the neuron encoding the true class is considered bad. This effect was described in [8]. Because a false prediction can still lead to very usable results, the Top-Neighbor-Accuracy is included in the evaluation process and is further discussed in Sect. 6.3.
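The metric is easy to compute once the classes are indexed in angle order. The following is a small sketch (assuming such an ordered indexing; it is not the authors' evaluation code):

```python
import numpy as np

def top_neighbor_accuracy(true_classes, predicted_probs):
    """Fraction of samples whose most probable predicted class index is within
    one position of the true class index (classes ordered by steering angle)."""
    predicted = np.argmax(predicted_probs, axis=1)
    return float(np.mean(np.abs(predicted - np.asarray(true_classes)) <= 1))

probs = np.random.rand(3, 21)                      # dummy predictions for 3 samples
print(top_neighbor_accuracy([10, 0, 20], probs))
```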

6 Experimental Results

The following presents the results according to the evaluation methods introduced in Sect. 5.3. Section 6.1 presents the experimental results for the driving test with regard to the steering offset. Section 6.2 presents the results for the driving test on the road from Fig. 10 under normal conditions. Section 6.3 presents the results on the validation dataset regarding classification accuracies.

6.1 Comparing Results for Steering Offset

The results for this test are obtained with the trained models steering the vehicle on a straight road. After 10 m of driving, a constant steering offset of −7.5° (steering left) is applied. Figure 11 illustrates the cross-track error over time for PilotNetΔ^F_C and an implementation of NVIDIA's PilotNet, each one driving with and without the offset. The implementation of NVIDIA's PilotNet used for comparison was trained under the same conditions as PilotNetΔ^F_C. Furthermore, Fig. 11 illustrates the mean cross-track error of the old PilotNetΔ^LSTM_R as a reference for how this architecture performed in a similar test in the previous publication [14]. The old PilotNetΔ^LSTM_R was nearly unaffected by the steering offset, but its lateral control was equally suboptimal in both cases, driving with and without an offset. NVIDIA's PilotNet and PilotNetΔ^F_C both drive perfectly in the middle of the road without an offset. In particular, the lateral control of the new PilotNetΔ^F_C is clearly improved compared to the old PilotNetΔ^LSTM_R.


Fig. 11. Cross-track error for PilotNetΔ^F_C compared to NVIDIA's PilotNet, both driving on a straight road, once without a steering offset and once with a steering offset constantly applied after 10 m of driving.

Looking at NVIDIA's PilotNet driving with an offset illustrates the problem that is mathematically described in Sect. 2 and is the fundamental subject of this research. The lateral control of NVIDIA's PilotNet is not sufficient and causes the vehicle to drive next to the center of the road. The cross-track error ranges from 0.6 m to 1.2 m with unstable lateral control, additionally caused by the uncertainty of the trained model when driving in such a state. In comparison, PilotNetΔ^F_C is initially pushed to the left after the offset is applied, with a maximum cross-track error of 0.6 m. Shortly after, PilotNetΔ^F_C controls the vehicle back very close to the center of the road, with small oscillations to the left and right afterwards. Compared to the old PilotNetΔ^LSTM_R, this again shows significantly better results.
Figure 12 compares PilotNetΔ^F_C to the similar implementations suitable for solving the steering offset problem that were also introduced in Sect. 4. It again illustrates the cross-track error over time, for the architectures PilotNetΔ^F_C, PilotNetΔ^F_CC, PilotNetΔ^LSTM_C and PilotNetΔ^LSTM_CC, all driving with a steering offset of −7.5° (steering left). All depicted architectures show behavior very similar to the already described PilotNetΔ^F_C, with good robustness against the steering offset. PilotNetΔ^LSTM_C is a bit of an exception, with the initial cross-track error being somewhat higher, with a maximum of 0.79 m after the offset is applied. Altogether, all architectures perform better than the old PilotNetΔ^LSTM_R. Table 5 provides a full overview of the tested architectures and their cross-track errors with regard to the steering offset. The three most promising results are highlighted in bold, including PilotNetΔ^F_C, PilotNetΔ^F_CC and PilotNetΔ^LSTM_CC, with only marginal differences.


Fig. 12. Cross-track error for PilotNetΔ^F_C, PilotNetΔ^F_CC, PilotNetΔ^LSTM_C and PilotNetΔ^LSTM_CC driving on a straight road with a steering offset constantly applied after 10 m of driving.

Table 5. Mean cross-track error and maximum cross-track error of the various PilotNetΔ architectures, concluding the results from Fig. 11 and Fig. 12. Driving on a straight road with a steering offset constantly applied after 10 m of driving.

Architecture            Mean CTE with Offset   Max CTE with Offset
NVIDIA's PilotNet       0.81 m                 1.24 m
old PilotNetΔ^LSTM_R    0.55 m                 1.18 m
PilotNetΔ^F_C           0.11 m                 0.60 m
PilotNetΔ^F_CC          0.09 m                 0.68 m
PilotNetΔ^LSTM_C        0.11 m                 0.79 m
PilotNetΔ^LSTM_CC       0.08 m                 0.67 m

6.2 Comparing Results for Normal Driving Test

The previous section demonstrated the robustness against a steering offset for the architectures proposed in this research. This section focuses on a test on the map from Fig. 10, which resembles a real-world road, to evaluate the performance of the different architectures on a road with more dynamic curves rather than just a straight road. For this test, all four proposed PilotNetΔ_C/CC architectures drive across map 3 for one lap, with the respective cross-track error being measured and reported in Table 6. No steering offset is applied. The implementation of NVIDIA's PilotNet is also compared in this test, knowing that the baseline performance is very good, as described by NVIDIA itself [2, 3] and in prior research [14]. Table 6 also includes the cross-track errors for both Stanley controllers that were used as experts in the training process. Stanley Controller_C refers to the Stanley controller used to create the dataset for NVIDIA's PilotNet, PilotNetΔ^F_C and PilotNetΔ^LSTM_C, whereas Stanley Controller_CC refers to the Stanley controller used to create the dataset for PilotNetΔ^F_CC and PilotNetΔ^LSTM_CC. As a reminder, Stanley Controller_CC was only allowed to change its steering angle according to the relative angle classes defined in Sect. 4. No data was acquired from map 3 for the training process. Looking at Table 6, both Stanley Controller_C and Stanley Controller_CC show identical results. The mean CTE is 0.11 m and the maximum CTE is 0.36 m in both cases. The slight CTE is primarily


Table 6. Mean cross-track error and maximum cross-track error for all PilotNetΔ_C and PilotNetΔ_CC architectures driving under normal conditions on map 3 from Fig. 10, representing a real-world road. Results of NVIDIA's PilotNet are included, as well as those of Stanley Controller_C, used as expert for PilotNetΔ_C, and Stanley Controller_CC, used as expert for PilotNetΔ_CC, respectively. PilotNetΔ^LSTM_C was not able to finish the whole lap.

Architecture           | Mean CTE | Max CTE
-----------------------|----------|--------
Stanley Controller_C   | 0.11 m   | 0.36 m
Stanley Controller_CC  | 0.11 m   | 0.36 m
NVIDIA's PilotNet      | 0.12 m   | 0.39 m
PilotNetΔ^F_C          | 0.16 m   | 0.85 m
PilotNetΔ^F_CC         | 0.13 m   | 0.49 m
PilotNetΔ^LSTM_C       | -        | -
PilotNetΔ^LSTM_CC      | 0.12 m   | 0.52 m

The slight CTE arises primarily because the used implementation of the Stanley controller disregards effects such as the wheel damping rate described in [9]. However, as already demonstrated, the Stanley controller is good enough to be used as an expert for the proposed architectures. As indicated, no difference is measurable between Stanley Controller_C and Stanley Controller_CC, demonstrating that it is possible to rely on the relative steering angle classes defined in Sect. 4 instead of continuous relative steering angles and achieve the same results for this particular track. The results for NVIDIA's PilotNet are very close to the expert results represented by Stanley Controller_C, demonstrating that NVIDIA's PilotNet is capable of imitating the expert well. Of the four PilotNetΔ_C/CC models, all were able to complete the track without going off-road, except for PilotNetΔ^LSTM_C, whose test was aborted after the vehicle left the road while driving through a steep curve. The remaining three architectures show good results in this test. PilotNetΔ^LSTM_CC and PilotNetΔ^F_CC (highlighted in bold) demonstrate the best results, with a mean CTE of only 0.13 m (PilotNetΔ^F_CC) and 0.12 m (PilotNetΔ^LSTM_CC), very close to the baseline performance of their respective Stanley controller and NVIDIA's PilotNet. The maximum CTE of PilotNetΔ^F_CC is 0.49 m and slightly higher than the baseline of 0.36 m. Likewise, PilotNetΔ^LSTM_CC's maximum CTE is also increased at 0.52 m, 0.16 m higher than the baseline.

6.3 Comparing Results for Accuracies on the Validation Dataset

This section presents the classification accuracies for the validation dataset, which is part of the whole dataset presented in Sect. 5.2 and not used for training. Table 7 lists the Top-1-Accuracy, Top-3-Accuracy and Top-Neighbor-Accuracy introduced in Sect. 5.3. The Top-1-Accuracies in Table 7 range from 0.421 (PilotNetΔ^LSTM_CC) up to 0.637 (PilotNetΔ^LSTM_C) across the architectures, which are fairly low accuracies, considering that nearly every second steering maneuver is technically wrong. As described in Sect. 5.3, the Top-Neighbor-Accuracy is introduced to also take steering predictions into account that are only slightly wrong. Comparing the Top-Neighbor-Accuracies is therefore a better reflection of the usefulness of the trained models.


Table 7. Top-Accuracies for the tested PilotNetΔ_C and PilotNetΔ_CC architectures on the validation dataset.

Architecture         | Top-1-Accuracy | Top-3-Accuracy | Top-Neighbor-Accuracy
---------------------|----------------|----------------|----------------------
PilotNetΔ^F_C        | 0.596          | 0.953          | 0.900
PilotNetΔ^F_CC       | 0.457          | 0.886          | 0.809
PilotNetΔ^LSTM_C     | 0.637          | 0.960          | 0.893
PilotNetΔ^LSTM_CC    | 0.421          | 0.877          | 0.801

As illustrated, the Top-Neighbor-Accuracies range from 0.801 (PilotNetΔ^LSTM_CC) to 0.900 (PilotNetΔ^F_C), showing the high uncertainty between closely related steering angles when compared to the Top-1-Accuracies. A confusion matrix for PilotNetΔ^F_CC is provided in Fig. 13, clearly illustrating that similar steering angles are often confused with each other. PilotNetΔ^F_CC especially struggles to differentiate between steering angles of ±0.05° and the neighboring classes 0° and ±0.1°. These steering angles additionally have a very small margin, which makes them hard to tell apart by nature. Interestingly, all provided accuracies for PilotNetΔ_CC are lower than those of their PilotNetΔ_C counterparts. This lower accuracy is not necessarily reflected in a poorer driving performance, as described in Sect. 6.2. Moreover, PilotNetΔ^LSTM_C, which achieves the highest accuracies of all models with the exception of the Top-Neighbor-Accuracy, did not manage to complete the track from Fig. 10.

Fig. 13. Confusion matrix for PilotNetΔ^F_CC on the validation dataset.
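For reference, the three accuracy measures can be computed from predicted class scores as in the following sketch. This is a minimal illustration only; the class layout and the assumption that neighboring class indices correspond to neighboring steering-angle bins mirror the description in Sect. 5.3 but do not reproduce the original evaluation code.

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    topk = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

def top_neighbor_accuracy(scores, labels):
    """Counts a prediction as correct if it hits the true class or a directly
    neighboring class index (i.e. an adjacent steering-angle bin)."""
    pred = np.argmax(scores, axis=1)
    return float(np.mean(np.abs(pred - labels) <= 1))

# hypothetical usage: 4 samples, 5 steering-angle classes
scores = np.array([[0.1, 0.6, 0.2, 0.05, 0.05],
                   [0.3, 0.3, 0.2, 0.1, 0.1],
                   [0.0, 0.1, 0.1, 0.7, 0.1],
                   [0.2, 0.2, 0.2, 0.2, 0.2]])
labels = np.array([1, 0, 2, 4])
top1 = top_k_accuracy(scores, labels, 1)
top3 = top_k_accuracy(scores, labels, 3)
top_nb = top_neighbor_accuracy(scores, labels)
```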


7 Discussion and Conclusion

It can be shown that the newly proposed classification architectures are independent of the zero position of the steering angle at training time and, more importantly, at inference time. The new architectures provide a better alternative to the previous PilotNetΔ^LSTM_R due to significantly improved lateral control. Two means of processing image sequences are demonstrated, implemented either as a convolutional LSTM layer or as a frame stack. Both implementations lead to very similar results. Since a classification neural network is used, steering angles must be discretized. A distinction is made between discretizing steering angles at training time and discretizing steering angles at data creation time, with the latter having the benefit of truly coherent data. The architectures PilotNetΔ^F_CC and PilotNetΔ^LSTM_CC, using steering angles discretized at data creation time, achieved slightly better results driving under normal conditions as measured by the mean and maximum CTE. PilotNetΔ^LSTM_C was not able to finish the driving test under normal conditions; the vehicle left the roadway during the experiment. An obvious reason for this behavior is the lack of training data for high steering angles for this particular case, which could be improved with further fine-tuning. Discretizing steering angles with smaller discretization intervals at relative steering angles around 0° is proven to work. However, optimizing the discretization intervals may be a subject of future research. Furthermore, the provided classification accuracies illustrate that steering angles which map to classes lying closely together are often confused with each other. It is demonstrated that this effect is tolerable in practice, as long as the confused classes are not too distant from each other. This circumstance should be respected and could be further addressed by incorporating class distances into the cross-entropy loss calculation during training, as proposed by [15]. The feature complexity of the simulation environment is intentionally reduced to a minimum for the purpose of this research. NVIDIA themselves [1–3] described that it is possible for PilotNet to learn very complex image features and drive successfully on unpaved roads and under different weather and traffic conditions. Other research [6, 8] demonstrated benefits in using multiple frames, including complex image features processed by LSTM layers similar to the proposed PilotNetΔ^LSTM architectures. Using CARLA offers great possibilities for collecting sufficient data in an automated process using the described Stanley controller [9] or other control methods such as [13] that can be embedded into the simulation. Experiments demonstrating the capabilities of the proposed PilotNetΔ_C/CC in the real world are pending.

References

1. Bojarski, M., et al.: The NVIDIA PilotNet experiments (2020)
2. Bojarski, M., et al.: End to end learning for self-driving cars. arXiv e-prints arXiv:1604.07316 (2016)
3. Bojarski, M., et al.: Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv e-prints arXiv:1704.07911 (2017)
4. CARLA: CARLA 0.9.13 (2021). https://carla.org/2021/11/16/release-0.9.13/. Accessed 25 Jan 2023


5. CARLA: trafficmanager (2021). https://carla.readthedocs.io/en/latest/adv_traffic_manager/. Accessed 03 Oct 2021
6. Chi, L., Mu, Y.: Deep steering: learning end-to-end driving model from spatial and temporal visual cues. arXiv e-prints arXiv:1708.03798 (2017). https://doi.org/10.48550/arXiv.1708.03798
7. Dominguez, S., Ali, A., Garcia, G., Martinet, P.: Comparison of lateral controllers for autonomous vehicle: experimental results. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 1418–1423 (2016). https://doi.org/10.1109/ITSC.2016.7795743
8. Eraqi, H.M., Moustafa, M.N., Honer, J.: End-to-end deep learning for steering autonomous vehicles considering temporal dependencies. arXiv e-prints arXiv:1710.03804 (2017)
9. Hoffmann, G.M., Tomlin, C.J., Montemerlo, M., Thrun, S.: Autonomous automobile trajectory tracking for off-road driving: controller design, experimental validation and racing. In: 2007 American Control Conference, pp. 2296–2301 (2007). https://doi.org/10.1109/ACC.2007.4282788
10. Hoveidar-Sefid, M., Jenkin, M.: Autonomous trail following using a pre-trained deep neural network. In: Proceedings of the 15th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 103–110. INSTICC, SciTePress (2018). https://doi.org/10.5220/0006832301030110
11. Ling, C., Sheng, V.: Cost-sensitive learning and the class imbalance problem. Encycl. Mach. Learn. 2011, 231–235 (2010)
12. Mathworks: RoadRunner. de.mathworks.com/products/roadrunner.html (2023). Accessed 25 Jan 2023
13. Nikolov, I.: Verfahren zur Fahrbahnverfolgung eines autonomen Fahrzeugs mittels Pure Pursuit und Follow-the-carrot. B.S. Thesis, University of Applied Sciences Hamburg (2009)
14. Pareigis, S., Maaß, F.L.: Robust neural network for sim-to-real gap in end-to-end autonomous driving. In: Proceedings of the 19th International Conference on Informatics in Control, Automation and Robotics - ICINCO, pp. 113–119. INSTICC, SciTePress (2022). https://doi.org/10.5220/0011140800003271
15. Polat, G., Ergenc, I., Tarik Kani, H., Ozen Alahdab, Y., Atug, O., Temizel, A.: Class distance weighted cross-entropy loss for ulcerative colitis severity estimation. arXiv e-prints arXiv:2202.05167 (2022)
16. Tiedemann, T., et al.: Miniature autonomy as one important testing means in the development of machine learning methods for autonomous driving: how ML-based autonomous driving could be realized on a 1:87 scale. In: International Conference on Informatics in Control, Automation and Robotics 2019, pp. 483–488. ICINCO 2019 (2019). http://hdl.handle.net/20.500.12738/10506
17. Wang, Y., Liu, D., Jeon, H., Chu, Z., Matson, E.: End-to-end learning approach for autonomous driving: a convolutional neural network model. In: Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, pp. 833–839. INSTICC, SciTePress (2019). https://doi.org/10.5220/0007575908330839

Optimal Robust Control with Applications for a Reconfigurable Robot

R. Al Saidi(B) and S. Alirezaee

University of Windsor, 401 Sunset Ave, Windsor, ON N9B 3P4, Canada
{alsaid,s.alirezaee}@uwindsor.ca

Abstract. This research presents the development of optimal robust controllers for a reconfigurable robot that links different kinematic structures and can be used for a variety of tasks. The Denavit-Hartenberg (D-H) parameters were changed to accommodate any required robotic structure to meet a specific task. The joint twist angle was modified such that the presented model is reconfigurable to any desired robotic system. Optimal robust H∞/μ controllers were developed for a reconfigurable robot in the presence of parametric and dynamic uncertainties. The resulting controllers achieved the required stabilization and tracking performance for several desired trajectories.

Keywords: Reconfigurable robot · Changeable D-H parameters · Optimal robust control · H∞/μ controllers · D-K iteration

1 Introduction

Robots with predefined kinematic structures are successfully applied to accomplish tasks between robotic systems and their environment. For more sophisticated and future applications, it is necessary to extend the capabilities of robots and employ them in more complex applications, which generally require accurate and more changeable structural properties during the interaction with their environment. To satisfy such varying environments, a robot with a changeable kinematic structure (reconfigurable robot) is necessary to cope with these requirements and tasks. In the literature, modular robotic systems are presented as a solution for robot reconfigurability [1]. In [2], modular robots were presented to perform different tasks in space, such as capturing a target, constructing a large structure, and autonomously maintaining in-orbit systems. A reconfigurable integrated multi-robot exploration system is introduced in [3] to demonstrate the feasibility of reconfigurable and modular systems for lunar polar crater exploration missions. The Reconfigurable Manufacturing System (RMS) was introduced to address new production challenges [4]. RMS is designed for rapid adjustments of production capacity and functionality in response to new circumstances, by rearrangement or change of its components and machines [5] and [6]. A self-reconfigurable (SR) robot is introduced in [7] such that a cellular robot can adapt its shape and functions to changing environments and demands. The basic component of the SR robot is a cellular module capable of computation. These modules can rearrange


their mutual mechanical connection to change the robot's outward features. In these missions, one fundamental task for the robot would be the tracking of changing paths, the grasping, and the positioning of a target in Cartesian space. The main drawbacks of the modular robotic systems proposed in the literature are the high initial investment in modules that remain idle during many activities, and the significant lead time for the replacement of components before performing a specific task. Therefore, it is desirable and cost-effective to employ a single versatile reconfigurable robot capable of performing tasks such as inspection, contact operations, assembly (insertion or removal of parts), and carrying objects (pick and place). In [13], the problem of changeable kinematic structures, which results in time-varying parameters, is addressed by developing robust gain-scheduling (LPV) controllers. The dynamic parameters such as inertia, torque, and gravity depend on the robot configuration, are modeled as time-varying, and can be measured online. LPV controllers were developed that adapt to varying dynamics and operating conditions. This research focuses on the development of optimal robust controllers, using the model presented in [13], to achieve high tracking performance in the presence of parametric and dynamic uncertainties.

2 Kinematics Development

Development of the general n-DOF Global Kinematic Model (n-GKM) is necessary to support any open kinematic robotic arm. The n-GKM model is generated by the variable D-H parameters as proposed in [8]. The D-H parameters presented in Table 1 are modeled as variable parameters to accommodate all possible open kinematic structures of a robotic arm.

Table 1. D-H parameters of the n-GKM model.

i | d_i               | θ_i               | a_i | α_i
--|-------------------|-------------------|-----|----------------
1 | R1 dDH1 + T1 d1   | R1 θ1 + T1 θDH1   | a1  | 0°, ±180°, ±90°
2 | R2 dDH2 + T2 d2   | R2 θ2 + T2 θDH2   | a2  | 0°, ±180°, ±90°
… | …                 | …                 | …   | …
n | Rn dDHn + Tn dn   | Rn θn + Tn θDHn   | an  | 0°, ±180°, ±90°

The subscript DH indicates that the corresponding d_i or θ_i parameter is constant.

The twist angle variable α_i is limited to five different values (0°, ±90°, ±180°) to maintain perpendicularity between the joints' coordinate frames. Consequently, each joint has six distinct positive directions of rotation and/or translation. The reconfigurable joint is a hybrid joint that can be configured to provide either a revolute or a prismatic type of motion, according to the required task. For the n-GKM model, a given joint's vector z_{i-1} can be placed in the positive or negative directions of the x, y, and z axes in the Cartesian coordinate frame. This is expressed in (1) and (2):


Rotational joints:    R_i = 1 and T_i = 0    (1)
Translational joints: R_i = 0 and T_i = 1    (2)

The variables R_i and T_i are used to control the selection of the joint type (rotational and/or translational). The orthogonality between the joints' coordinate frames is achieved by assigning appropriate values to the twist angles α_i. Their trigonometric functions are defined as the joint's reconfigurable parameters (K_Si and K_Ci) and expressed in (3) and (4):

K_Si = sin(α_i)    (3)
K_Ci = cos(α_i)    (4)

The kinematics of a reconfigurable robot is calculated by multiplication of all the homogeneous matrices from the base frame to the flange frame. The resulting homogeneous transformation matrix for the n-GKM model is given in (5):

A_i^{i-1} =
\begin{bmatrix}
\cos(R_i\theta_i + T_i\theta_{DHi}) & -K_{Ci}\sin(R_i\theta_i + T_i\theta_{DHi}) & K_{Si}\sin(R_i\theta_i + T_i\theta_{DHi}) & a_i\cos(R_i\theta_i + T_i\theta_{DHi}) \\
\sin(R_i\theta_i + T_i\theta_{DHi}) & K_{Ci}\cos(R_i\theta_i + T_i\theta_{DHi}) & -K_{Si}\cos(R_i\theta_i + T_i\theta_{DHi}) & a_i\sin(R_i\theta_i + T_i\theta_{DHi}) \\
0 & K_{Si} & K_{Ci} & R_i d_{DHi} + T_i d_i \\
0 & 0 & 0 & 1
\end{bmatrix},  i = 1, 2, ..., n    (5)
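The following sketch shows how the reconfigurable transformation in (5) could be assembled numerically; the function name and parameter ordering are chosen for illustration and are not part of the original model.

```python
import numpy as np

def link_transform(theta, d, a, alpha, R, T, theta_DH=0.0, d_DH=0.0):
    """Homogeneous transform of one reconfigurable link following Eq. (5).
    R=1, T=0 selects a rotational joint; R=0, T=1 a translational joint."""
    th = R * theta + T * theta_DH          # effective joint angle
    dd = R * d_DH + T * d                  # effective joint offset
    ks, kc = np.sin(alpha), np.cos(alpha)  # reconfigurable parameters K_Si, K_Ci
    return np.array([
        [np.cos(th), -kc * np.sin(th),  ks * np.sin(th), a * np.cos(th)],
        [np.sin(th),  kc * np.cos(th), -ks * np.cos(th), a * np.sin(th)],
        [0.0,         ks,               kc,              dd],
        [0.0,         0.0,              0.0,             1.0],
    ])

# hypothetical 2-DOF example: one revolute joint followed by one prismatic joint
A1 = link_transform(theta=np.pi / 4, d=0.0, a=0.3, alpha=np.pi / 2, R=1, T=0)
A2 = link_transform(theta=0.0, d=0.2, a=0.0, alpha=0.0, R=0, T=1)
flange_pose = A1 @ A2   # base-to-flange kinematics by chaining the link transforms
```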

3 Optimal Robust Control

Linear dynamical systems are generally described either in state-space form or as a transfer function [9] and [10]. The state-space description of a general linear time-invariant (LTI) system is commonly represented by:

ẋ(t) = A x(t) + B u(t),  x(t₀) = x₀
y(t) = C x(t) + D u(t)    (6)

The system matrices A, B, C, D represent the linear nominal model of the system. The transfer function description, related to the state-space form, is given as:

y(s) = G(s) u(s),  G(s) = C(sI − A)⁻¹ B + D    (7)

Control Design Objectives
The standard feedback system is shown in Fig. 1. The closed-loop response is:

y = (I + GK)⁻¹ GK r + (I + GK)⁻¹ G_d d    (8)


Fig. 1. General closed loop configuration.

where the following closed-loop transfer functions are defined:

L = GK                                loop transfer function
S = (I + GK)⁻¹ = (I + L)⁻¹            sensitivity function
T = (I + GK)⁻¹ GK = (I + L)⁻¹ L       complementary sensitivity function

with S + T = I. The control error is:

e = y − r = −S r + S G_d d    (9)

The corresponding system input signal is:

u = KS r − KS G_d d    (10)

Disturbance rejection and command tracking are obtained with S ≈ 0, or equivalently T ≈ I, in the low-frequency range. The weighted sensitivity function S is an indicator of closed-loop performance. The design performance specifications in terms of S include:

1. Minimum bandwidth frequency w_B, defined as the frequency where |S(jw)| crosses 0.707 (≈ −3 dB).
2. Maximum peak magnitude of S, ‖S(jw)‖ ≤ M.

These specifications can be captured by an upper bound 1/|w_P| on the magnitude of S, where w_P is a weight selected by the designer. The performance requirements become:

|S(jw)| < 1/|w_P(jw)|, ∀w  ⇔  |w_P S| < 1, ∀w  ⇔  ‖w_P S‖∞ < 1    (11)
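As an illustration of condition (11), the sketch below evaluates |w_P S| on a frequency grid for a simple single-input single-output example; the plant and controller are arbitrary placeholders (not the robot model used later), and the weight only follows the first-order form used later in Sect. 4.

```python
import numpy as np
from scipy import signal

# placeholder SISO plant G(s) = 1/(s + 1) and controller K(s) = 50 (s + 1)/s
G = signal.TransferFunction([1.0], [1.0, 1.0])
K = signal.TransferFunction([50.0, 50.0], [1.0, 0.0])

# performance weight w_P(s) = (s/M + wB) / (s + wB*A)
M, wB, A = 1.5, 10.0, 1e-4
wP = signal.TransferFunction([1.0 / M, wB], [1.0, wB * A])

w = np.logspace(-3, 4, 2000)
_, Gjw = signal.freqresp(G, w)
_, Kjw = signal.freqresp(K, w)
_, wPjw = signal.freqresp(wP, w)

S = 1.0 / (1.0 + Gjw * Kjw)            # sensitivity S = (1 + GK)^-1
peak = np.max(np.abs(wPjw * S))        # approximates ||w_P S||_inf on the grid
print("||w_P S||_inf approx:", round(peak, 3), "-> condition (11) met:", peak < 1.0)
```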

3.1 Mixed Sensitivity H∞ Control

The mixed sensitivity S/KS control problem for a reconfigurable robot is to find a stabilizing controller K that minimizes:

\min_{K \ \text{stabilizing}} \left\| \begin{bmatrix} (I + GK)^{-1} \\ K (I + GK)^{-1} \end{bmatrix} \right\|_\infty    (12)


Fig. 2. S/KS mixed sensitivity optimization in regulation form (left); S/KS mixed sensitivity minimization in tracking form (right).

where the sensitivity function S = (I + GK)⁻¹ is shaped along with the closed-loop transfer function KS. This cost function can be interpreted as the design objectives of nominal performance (without uncertainties), good tracking or disturbance rejection, and robust stabilization with additive uncertainties. KS is the transfer function between d and the control signal u and is regarded as a mechanism for limiting the gain and bandwidth of the controller. To implement a unified solution procedure, the above cost function is recast into the standard H∞ configuration shown in Fig. 3. The solution can be obtained by using the LFT (Linear Fractional Transformation) technique, in which the signals are classified into sets of external inputs, outputs, inputs to the controller, and outputs from the controller. The external inputs (reference and disturbance) are denoted by w, z denotes the output signals to be minimized, which include both performance and robustness measures, v is the vector of measurements available to the controller K, and u is the vector of control signals. P(s) is called the generalized plant or interconnected system. The objective is to find a stabilizing controller K that minimizes the output z over all w with energy less than or equal to 1. This is equivalent to minimizing the H∞-norm of the transfer function from w to z. The mixed sensitivity problem shown in Fig. 2 (left) is formulated to reject the disturbance when w = d. The output error signal is defined as z = [z₁, z₂]ᵀ, where z₁ = w_p y and z₂ = −w_u u. It follows that z₁ = w_p S w and z₂ = w_u KS w, and the elements of the generalized plant P(s) are:

Fig. 3. General H∞ control configuration.


\begin{bmatrix} z_1 \\ z_2 \\ v \end{bmatrix} =
\begin{bmatrix} w_p & w_p G \\ 0 & -w_u \\ -I & -G \end{bmatrix}
\begin{bmatrix} w \\ u \end{bmatrix}

Then:

z = F_l(P, K) w = [P_{11} + P_{12} K (I − P_{22} K)^{-1} P_{21}] w

where F_l(P, K) is the lower linear fractional transformation of P and K:

F_l(P, K) = \begin{bmatrix} w_p S \\ w_u KS \end{bmatrix}    (13)
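Numerically, the lower LFT above is a simple matrix operation once P is partitioned conformably with (w, u) and (z, v). The sketch below evaluates F_l(P, K) at a single frequency point for illustration; the partition sizes and numeric values are hypothetical.

```python
import numpy as np

def lower_lft(P, K, n_meas, n_ctrl):
    """Lower LFT F_l(P, K) = P11 + P12 K (I - P22 K)^-1 P21, where the last
    n_meas outputs of P feed the controller and the last n_ctrl inputs of P
    are driven by it."""
    P11 = P[:-n_meas, :-n_ctrl]
    P12 = P[:-n_meas, -n_ctrl:]
    P21 = P[-n_meas:, :-n_ctrl]
    P22 = P[-n_meas:, -n_ctrl:]
    I = np.eye(n_meas)
    return P11 + P12 @ K @ np.linalg.solve(I - P22 @ K, P21)

# hypothetical example: 2 performance outputs, 1 measurement, 1 exogenous input, 1 control input
P = np.array([[0.5, 1.0],
              [0.0, -0.1],
              [-1.0, -1.0]])
K = np.array([[2.0]])
N = lower_lft(P, K, n_meas=1, n_ctrl=1)   # closed-loop map from w to [z1, z2]
```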

Another formulation of the S/KS mixed sensitivity optimization is the standard tracking control form shown in Fig. 2 (right). This is a tracking problem in which the input signal is the reference command r and the error signals are z₁ = −w_p e = w_p (r − y) and z₂ = w_u u. The result of the tracking problem is to minimize z₁ = w_p S r and z₂ = w_u KS r.

3.2 H∞ Control Design

The design objective of H∞ feedback control is to stabilize the closed loop in the presence of uncertain parameters. The robust stability of the resulting closed loop can be analyzed based on the structured singular value μ approach [11].

Uncertain Systems
The difference between models and the actual physical system is called model uncertainty. Uncertainty is produced by imperfect knowledge of some components of the system, or by changes in their behavior due to changes in operating conditions. There are two major classes of uncertainty [12]:

• Dynamical uncertainty, which consists of dynamical components neglected in the linear model as well as variations in the dynamical behavior during operation.
• Parameter uncertainty, which is usually expressed in terms of accuracy coming from calibration and its imperfect nature.

In Fig. 4, the errors between a model and the system are represented by the uncertainty Δ(s). The uncertainty structure Δ is defined by the block diagonal matrices with a specified fixed structure:

Δ = { diag[δ₁ I_{r1}, …, δ_S I_{rS}, Δ₁, …, Δ_F] : δ_i ∈ ℂ, Δ_j ∈ ℂ^{m_j × m_j}, ‖Δ‖ < 1 }    (14)

Any matrix in Δ is block diagonal with two types of blocks: repeated scalar (complex or real) blocks and full blocks. S and F are the number of repeated scalar blocks and full blocks, respectively.

System Interconnections
The standard control structure of the H∞/μ controller design is shown in Fig. 5 (left). The generalized plant P(s) contains the nominal system model as well as the system weighting functions representing the design specifications, disturbance spectra, uncertainty, and input weightings. The uncertainty Δ(s) represents the uncertainty model


Fig. 4. Uncertain system representation.

Fig. 5. Standard H∞/μ control structure (left); a control structure for closed-loop system analysis (right).

which is assumed to be normalized. Finally, the controller K(s) has to be designed so that the influence of w on z is small. Here w includes external signals that excite the system, such as disturbance, noise, and reference signals. The output z contains signals which are to be kept small, such as error and control signals. The lower and upper linear fractional transformations (LFT) are denoted by F_l and F_u, respectively. The closed-loop transfer function matrix N between error signals and external inputs, shown in Fig. 5 (right), is related to P and K by a lower LFT:

N = F_l(P, K) = P_{11} + P_{12} K (I − P_{22} K)^{-1} P_{21}    (15)

The closed loop N can be partitioned as follows:

N = F_l(P, K) = \begin{bmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{bmatrix}    (16)

The uncertain closed-loop transfer function from w to z is then given by the upper LFT:

F = F_u(N, Δ) = N_{22} + N_{21} Δ (I − N_{11} Δ)^{-1} N_{12}    (17)

To analyze the robust stability and performance of F, the perturbed closed-loop system is rearranged with structured uncertainties as shown in Fig. 6 (right).


Fig. 6. A control configuration with extended uncertainty structure.

μ-Norm
The smallest size of a perturbation Δ ∈ Δ that can destabilize N is defined by the structured singular value μ:

μ_Δ(N_{11})^{-1} = \min_{Δ} { \bar{σ}(Δ) : \det(I − N_{11} Δ) = 0 for structured Δ, \bar{σ}(Δ) ≤ 1 }    (18)

An uncertain closed-loop system is shown in Fig. 6, where the input w describes the disturbance and reference signals and z is the output describing the error signals. Robust stability, nominal performance, and robust performance are analyzed as follows:

1. Robust stability (RS) is achieved if μ_Δ(N_{11}(jw)) ≤ 1, ∀w.
2. Nominal performance (NP) is achieved if ‖N_{22}(jw)‖ ≤ 1, ∀w.
3. Robust performance (RP) is achieved if μ_{Δ_e}(N(jw)) ≤ 1, ∀w, where
   Δ_e = \begin{bmatrix} Δ & 0 \\ 0 & Δ_p \end{bmatrix}, Δ ∈ Δ, ‖Δ_p‖ < 1.

H∞-Optimal Control
Assuming there is no uncertainty present, the H∞-optimal control problem is defined as:

K = \min_K ‖F_l(P, K)‖_∞    (19)

over all stabilizing K. The controller K that minimizes the H∞ norm satisfies:

‖F_l(P, K)‖_∞ < γ    (20)

for a feasible bound γ on the H∞ norm.

μ-Optimal Control
The control design problem including the uncertainty model is solved as follows:

K = \min_K ‖F_l(P, K)‖_μ    (21)


The solution to this problem is iterative and is referred to as D-K iteration, shown in Fig. 7. It starts with the μ upper bound problem by minimizing the scaled singular value of the closed-loop matrix F_l(P, K):

μ_Δ(F_l(P, K)) ≤ \inf_{D(w) ∈ D} \bar{σ}(D(w) F_l(P, K) D(w)^{-1})    (22)

where D is the set of block diagonal matrices whose structure is compatible with that of Δ described in Eq. (14), and D(w) is a frequency-dependent scaling. The μ upper bound problem is to minimize:

\min_K ‖D(w) F_l(P, K) D(w)^{-1}‖_∞    (23)

over all controllers K that stabilize P and over all functions D(w) ∈ D. This problem is solved as follows (a small numerical illustration of the scaled upper bound follows the list):

• Solve an H∞ optimization problem over all stabilizing K,

  \min_K ‖D(w) F_l(P, K) D(w)^{-1}‖_∞,    (24)

  and let the minimizing controller be denoted by K̂. This minimizes the H∞ norm of the scaled maximum singular value of the closed-loop system matrix.
• Minimize the μ upper bound σ̄[D(w) F_l(P, K̂) D(w)^{-1}] over D(w) point-wise across frequency. This minimization produces a new scaling function D̂(w).
• Replace D(w) with D̂(w) and return to the first step.
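As a minimal numerical illustration of the scaled upper bound in (22), the following sketch evaluates inf_D σ̄(D M D⁻¹) at a single frequency for a 2×2 matrix and two scalar uncertainty blocks. The matrix entries are arbitrary and the grid search over the scaling is purely illustrative, not a toolbox routine.

```python
import numpy as np

def mu_upper_bound_2blocks(M, d_grid=None):
    """Upper bound on mu for a 2x2 matrix M with two 1x1 uncertainty blocks:
    infimum over D = diag(d, 1) of the largest singular value of D M D^-1 (Eq. 22)."""
    if d_grid is None:
        d_grid = np.logspace(-3, 3, 2000)
    best = np.inf
    for d in d_grid:
        D = np.diag([d, 1.0])
        Dinv = np.diag([1.0 / d, 1.0])
        best = min(best, np.linalg.norm(D @ M @ Dinv, 2))
    return best

# hypothetical closed-loop matrix N11 evaluated at one frequency
M = np.array([[0.4 + 0.2j, 1.5], [0.05, 0.3 - 0.1j]])
print("sigma_max(M)          :", np.linalg.norm(M, 2))
print("scaled upper bound (22):", mu_upper_bound_2blocks(M))
```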

Fig. 7. μ-Synthesis control procedure.


4 Application of Mixed Sensitivity H∞ Control (Simulation and Results)

A Bosch Scara robot with RRT kinematic structure is modeled for analysis as given in [13]. The state equations of the first link are derived as follows:

ẋ₁ = x₂
ẋ₂ = (1/J_L1)(K_s x₃ + D_s N₁ x₄ − F_v1 x₂ + τ_DL)
ẋ₃ = N₁ x₄ − x₂
ẋ₄ = (1/J_m1)(K_m1 x₅ − N₁ K_s x₃ − F_v1 x₄)
ẋ₅ = (1/L_m1)(−R_m1 x₅ − K_m1 x₄ + x₆ + K_p12 u − K_p12 x₅)
ẋ₆ = −k_i12 x₅ + k_i12 u
y = x₁    (25)
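For simulation, (25) can be written directly as an LTI state-space model ẋ = Ax + Bu, y = Cx. The sketch below assembles the matrices from the physical parameters; the numeric values are placeholders, since the actual link parameters of the robot are not listed in this paper, and the disturbance torque τ_DL is omitted so that u is the only input.

```python
import numpy as np
from scipy import signal

# placeholder physical parameters of the first link (illustrative values only)
JL1, Jm1, Lm1 = 0.5, 0.01, 0.002
Ks, Ds, N1 = 100.0, 1.0, 0.1
Fv1, Km1, Rm1 = 0.2, 0.05, 1.5
Kp12, ki12 = 5.0, 50.0

# state vector x = [x1 ... x6], input u, output y = x1, following Eq. (25)
A = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, -Fv1 / JL1, Ks / JL1, Ds * N1 / JL1, 0.0, 0.0],
    [0.0, -1.0, 0.0, N1, 0.0, 0.0],
    [0.0, 0.0, -N1 * Ks / Jm1, -Fv1 / Jm1, Km1 / Jm1, 0.0],
    [0.0, 0.0, 0.0, -Km1 / Lm1, -(Rm1 + Kp12) / Lm1, 1.0 / Lm1],
    [0.0, 0.0, 0.0, 0.0, -ki12, 0.0],
])
B = np.array([[0.0], [0.0], [0.0], [0.0], [Kp12 / Lm1], [ki12]])
C = np.array([[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
D = np.array([[0.0]])

link1 = signal.StateSpace(A, B, C, D)   # nominal model G(s) of the first link
```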

For the nominal model derived in Eq. (25), the input weight is selected to be about 1 or less to bound the magnitude of the input signal, and therefore a simple weight w_u = 1 is selected. The performance weight is selected in the form:

w_{p1} = (s/M + w_B) / (s + w_B A),   M = 1.5, w_B = 10, A = 10⁻⁴    (26)
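The asymptotes and the 0 dB crossover of the bound 1/|w_p1| implied by (26) can be verified with a few lines; this quick check is independent of the plant and controller and uses only the weight parameters above.

```python
import numpy as np

M, wB, A = 1.5, 10.0, 1e-4

def wp1_mag(w):
    """|w_p1(jw)| for the weight of Eq. (26)."""
    return np.abs(1j * w / M + wB) / np.abs(1j * w + wB * A)

w = np.logspace(-4, 3, 5000)
bound = 1.0 / wp1_mag(w)                     # upper bound demanded on |S(jw)|
print("low-frequency bound   :", bound[0])   # approaches A = 1e-4
print("high-frequency bound  :", bound[-1])  # approaches M = 1.5
print("0 dB crossover (rad/s):", w[np.argmin(np.abs(bound - 1.0))])
```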

The value w_B = 10 has been selected to achieve approximately the desired crossover frequency w_c = 10 rad/s. The H∞ problem is solved with the μ toolbox in Matlab. The simulation results are as follows: an optimal H∞ norm of 0.9856, so the weighted sensitivity requirements are almost satisfied, with ‖S‖∞ = 1.15, ‖T‖∞ = 1.0 and w_c = 9.96 rad/s. This design shows that the tracking is very good, as shown by curve y1 in Fig. 8 (top left), but, as curve y1 in Fig. 8 (bottom left) shows, the disturbance response is very sluggish. In case disturbance rejection is the main concern, a performance weight can be selected to specify higher gains at low frequencies:

w_{p2} = (s/M^{1/2} + w_B)² / (s + w_B A^{1/2})²,   M = 1.5, w_B = 10, A = 10⁻⁴    (27)

The inverse of this weight is shown in Fig. 8 (right); the dashed line crosses 1 in magnitude at about the same frequency as weight w_{p1}, but specifies tighter control at lower frequencies. With the weight w_{p2}, the design achieves an H∞-optimal norm of 1.32, yielding ‖S‖∞ = 1.35, ‖T‖∞ = 1.3 and w_c = 15.30 rad/s. In conclusion, the first design is best for reference tracking, whereas design 2 is best for disturbance rejection.

4.1 Application of H∞ Control (Simulation and Results)

The parameter variations of the kinematic structure are important properties of the reconfigurable dynamic system. As a result, the inertia variations and the nonlinear


Fig. 8. Closed-loop step responses for the two alternative designs (1 and 2) for the disturbance rejection problem (left). Inverse of the performance weight (dashed line) and the resulting sensitivity function (solid line) for the two H∞ designs (1 and 2) for disturbance rejection (right).

behavior of the viscous friction can be modeled as parametric uncertainties bounded by upper and lower limit values. The nominal values of these parameters are measured for the different joint links of the robot. The dynamic parameters, namely the Coriolis torque, cross-coupling inertia, and centrifugal torques, are regarded as a disturbance torque τ_LD1. Figure 9 shows that the uncertain parts of the viscous friction (damping part) and link inertia are pulled out of the nominal system. The inertia variation of the first link is considered for different joint positions (θ₂ = 0, θ₂ = π) of link 2. These variations are modeled as multiplicative uncertainty. The state equations of the perturbed first-link system are derived as follows:

Fig. 9. Derived block diagram of the first link of the Bosch Scara robot.


ẋ₁ = x₂
ẋ₂ = q₁ + (1/J_L1)(K_s x₃ + D_s N₁ x₄ − F_v1 x₂ − q₂ + τ_DL1)
ẋ₃ = N₁ x₄ − x₂
ẋ₄ = (1/J_m1)(K_m1 x₅ − N₁ K_s x₃ − F_v1 x₄)
ẋ₅ = (1/L_m1)(−R_m1 x₅ − K_m1 x₄ + x₆ + K_p12 u − K_p12 x₅)
ẋ₆ = −k_i12 x₅ + k_i12 u    (28)

The outputs of the perturbed system are:

y = x₁
p₁ = q₁ + (1/J_L1)(K_s x₃ + D_s N₁ x₄ − F_v1 x₂ − q₂ + τ_DL)
p₂ = x₂    (29)

Then, the state-space representation is cast as follows:

\begin{bmatrix} \dot{x} \\ y_Δ \\ y \end{bmatrix} =
\begin{bmatrix} A & B_1 & B_2 \\ C_1 & D_{11} & D_{12} \\ C_2 & D_{21} & D_{22} \end{bmatrix}
\begin{bmatrix} x \\ u_Δ \\ u \end{bmatrix}    (30)

The system G(s) includes all nominal parameter values of the model with diagonal uncertainty matrix Δ = diag(δJ , δV ) as shown in Fig. 10.

Fig. 10. LFT representation of the perturbed Bosch Scara model.

The matrix Δ is unknown, is called the uncertainty matrix, and has a fixed diagonal structure. The frequency response of the perturbed open-loop system is computed for different values of the perturbed parameters with −1 ≤ δ_J, δ_V ≤ 1, as shown in Fig. 11 (left).

Nominal Stability and Performance
The controller design should make the closed-loop system internally stable, and the required performance should be achieved for the nominal system G(s) by minimizing the following objective criterion:

\left\| \begin{bmatrix} W_p (I + GK)^{-1} \\ W_u K (I + GK)^{-1} \end{bmatrix} \right\|_\infty < 1    (31)


where S = (I + GK)⁻¹ is the sensitivity function of the nominal system, and W_p, W_u are weighting functions chosen to represent the frequency characteristics of the disturbance d and the input control level. The simulation results shown in Fig. 11 (right) indicate that the norm inequality has been satisfied and that the closed-loop system has reduced the effect of the disturbance τ_LD1, achieving the required performance.

Fig. 11. Perturbed (family of) open-loop systems (left). Singular value of the closed loop with the H∞ controller (right).

Selection of Weighting Functions
The weighting functions W_p and W_u are used to reflect the relative significance of the performance requirements over the frequency ranges. The selected performance weighting function is a scalar function chosen as follows:

W_p(s) = 0.95 (s² + 1.8 s + 10) / (s² + 8 s + 0.01)    (32)

which ensures disturbance rejection and a good transient response (settling time less than 10 and overshoot less than 20% for the nominal system). The control weighting function W_u is chosen as the scalar W_u = 10⁻².

H∞-Controller Design
An H∞ controller is designed using the system connection shown in Fig. 12.


Fig. 12. H∞ control structure.

The generalized plant P(s) includes the performance and uncertainty weightings. The inputs to the plant are u_Δ, d, r, u and the outputs are y_Δ, e_p, e_u, y. The controller minimizes the norm of F_l(P, K) over all stabilizing controllers, where the transfer function matrix F_l(P, K) maps the external inputs to the error signals:

\begin{bmatrix} e_p \\ e_u \end{bmatrix} = F_l(P, K) \begin{bmatrix} d \\ r \end{bmatrix}    (33)

To achieve the desired performance of disturbance rejection (or reference tracking), it is necessary to satisfy the inequality ‖W_p (I + GK)⁻¹‖∞ < 1. Since W_p is a scalar function, the singular values of the sensitivity function (I + GK)⁻¹ must lie below those of 1/W_p over the frequency range. This means that ‖W_p (I + GK)⁻¹‖∞ < 1 if and only if σ̄[(I + GK)⁻¹(jw)] < |1/W_p(jw)| for all frequencies. Figure 13 (left) shows that the sensitivity function is below the performance weighting function for all frequencies.

Analysis of the Closed Loop System with the H∞-Controller
The robust stability has been analyzed based on the perturbed closed-loop transfer function matrix F_l(P, K). Since the uncertainty considered is structured, verification of robust stability and robust performance requires the frequency response in terms of μ values. To achieve robust stability it is necessary that the transfer function matrix (I − N_{11} Δ) in Equation (17) is not singular. This implies that μ(N_{11}) must be less than one over the frequency range; the closed-loop system with H∞ control achieves robust stability, as shown in Fig. 13 (right). The maximum value of μ is 0.90675, which implies that structured perturbations with norm less than 1/0.90675 are allowable, i.e. stability is maintained for ‖Δ‖∞ < 1/0.90675. The nominal performance of the closed-loop system is analyzed using the frequency response of N_{22} from Eq. (17). Nominal performance is achieved if and only if μ(N_{22}) < 1 over the whole frequency range. The robust performance of the closed-loop system with H∞ control is also tested using μ-analysis. The block uncertainty structure includes a 2×2 diagonal parametric uncertainty block and a 1×2 performance block as follows:

\tilde{Δ} = \left\{ \begin{bmatrix} Δ & 0 \\ 0 & Δ_p \end{bmatrix} : Δ ∈ \mathbb{R}^{2×2}, Δ_p ∈ \mathbb{C}^{1×2} \right\}    (34)


Fig. 13. Sensitivity and performance weighting function with H∞ controller (left). Robust stability analysis with H∞ control (right).

The robust performance of the closed-loop system is achieved if and only if μ_{Δ̃}(N(jw)) is less than one for each frequency. The frequency responses showing the nominal and robust performance are plotted in Fig. 14 (left). They indicate that the system achieves nominal performance with μ(N_{22}) < 1 but fails to satisfy the robust performance criterion. From the calculations, the nominal performance has a maximum of 0.94998, while the μ curve for the robust performance (the blue dotted line is the μ value and the red dotted line is the maximum singular value) has a maximum of 1.7584. For robust performance, the size of the perturbation matrix Δ must therefore be limited to ‖Δ‖∞ ≤ 1/1.7584 to ensure that the perturbed performance function satisfies μ_{Δ̃}(N) < 1, ∀(w, Δ). The frequency responses of the perturbed closed-loop systems are shown in Fig. 14 (right). The step and disturbance responses are shown in Fig. 15 (left) and Fig. 15 (right), respectively. In both cases, the overshoot does not exceed 20%, which demonstrates satisfactory performance in the presence of parametric perturbations.

4.2 Application of μ-Synthesis Control and DK Iterations (Simulation and Results)

The uncertainty blocks Δ given in Eq. (34) are diagonal and correspond to the inertia and viscous damping uncertainties of the robot link. The block Δ_p is an uncertainty block introduced to represent the performance requirements in the control structure of the μ approach [14]. The following optimization problem is formed to minimize the upper bound of the μ values, which in turn reduces the maximum value of μ:

\min_K \min_{D_l(s), D_r(s)} ‖D_l(s) F_l(P, K) D_r^{-1}(s)‖_∞    (35)

where

D_l(s) = \begin{bmatrix} d_1(s) & 0 & 0 \\ 0 & d_2(s) & 0 \\ 0 & 0 & d_3(s) I_2 \end{bmatrix}    (36)


Fig. 14. Nominal and robust performance with H∞ control (left). Perturbed (family of) closed-loop systems with different uncertainty values (right).

Fig. 15. Transient response to reference input with H∞ control (left). Transient response to disturbance input with H∞ control (right).

and

D_r(s) = \begin{bmatrix} d_1(s) & 0 & 0 \\ 0 & d_2(s) & 0 \\ 0 & 0 & d_3(s) I_2 \end{bmatrix}    (37)

where d_1(s), d_2(s) and d_3(s) are scaling transfer functions. The goal of μ-synthesis is to find a minimum value of the cost function and to construct a stabilizing controller K such that for each frequency w ∈ [0, ∞] the structured singular value satisfies the condition:

μ_{Δ̃}[F_l(P, K)(jw)] < 1    (38)

Satisfying the above condition ensures robust performance of the resulting closed loop.


Robust Stability and Performance of μ-Control
μ-control has been employed to achieve robust stability and performance against structured uncertainties. From the iteration summary, it is seen that the peak μ value decreases to 0.97, which means that robust performance has been achieved.

Iteration Summary
----------------------------------------------------------------
Iteration #           9      10     11     12     13
Controller Order      17     15     15     13     13
Total D-Scale Order   12     10     10     8      8
Gamma Achieved        1.020  1.013  1.016  1.013  1.01
Peak mu-Value         1.021  1.012  1.007  1.000  0.97
MU iteration number: 14

The nonlinear properties of the inertia and viscous parameters are modeled as parametric uncertainties, in addition to the full block representing the performance channels. Figure 16 (left) shows the sensitivity function of the closed-loop system with the 13th-order controller. The sensitivity function is below the inverse of the performance weighting function, which implies that the nominal performance is achieved. The robust stability of the closed-loop system is analyzed by the magnitude of the upper and lower bounds of μ, as shown in Fig. 16 (right). The robust stability of the closed-loop system is achieved since the maximum value of μ is equal to 0.49734, i.e. the system stability is preserved for ‖Δ‖∞ < 1/0.49734. The frequency responses of the nominal and robust performance criteria are obtained as shown in Fig. 17. The maximum value of μ in the robust performance analysis is 0.97512. This means that the closed-loop system with the μ-controller achieves robust performance since:

\left\| \begin{bmatrix} W_p (I + GK)^{-1} \\ W_u K (I + GK)^{-1} \end{bmatrix} \right\|_\infty < 1    (39)

Fig. 16. Sensitivity and weighting functions of Mu-Control (left). Robust stability of Mu-control (right).


Fig. 17. Nominal and robust performance of Mu-Control.

Fig. 18. Sensitivity functions of perturbed systems with Mu-Control (left). Performance of perturbed systems with Mu-Control (right).

for every diagonal Δ with ‖Δ‖∞ < 1. The frequency responses of the sensitivity functions of the perturbed closed-loop systems, shown in Fig. 18 (left), demonstrate the robust properties of the system with the μ-controller. These responses remain below the frequency response of the inverse of the performance weighting function. The magnitude responses of the weighted mixed sensitivity function from Equation (39) are shown in Fig. 18 (right). Robust performance has been satisfied for all perturbed systems because the magnitudes over the frequency range are below 1. The frequency responses of the perturbed closed-loop systems are shown in Fig. 19 (left), where the perturbed closed-loop systems maintain their magnitude over a wider frequency bandwidth. This suggests faster responses of the designed closed-loop system. Figures 20 (left) and 20 (right) show the transient responses of the closed-loop system to reference and disturbance


Fig. 19. Frequency responses of perturbed closed loop systems with Mu-Control (left). Transient responses of perturbed closed loop with Mu-Control (right).

inputs, respectively. Comparing with the responses in Figs. 15 (left) and 15 (right), we see that the μ-controller ensures a smaller overshoot (10%) while maintaining a similar settling time. Figure 19 (right) shows the transient responses (to a reference input) of a family of perturbed closed-loop systems with the μ-controller. In all cases, the overshoot does not exceed 20%, which demonstrates satisfactory performance in the presence of parametric perturbations.

Fig. 20. Transient response to step reference input with Mu-Control (left). Transient response to disturbance input with Mu-Control (right).

4.3 Comparison of H∞ and μ-Controllers

A comparison between the H∞ and μ controllers is given based on criteria such as robust stability, nominal performance, and robust performance. The comparison of the designed systems with the H∞ and μ controllers begins with the frequency responses of these controllers. μ-control is characterized by larger gains in the frequency range above 10 rad/s compared


with the H∞ control, as shown in Fig. 21 (left). The phase responses are close to each other up to 3 rad/s; above that frequency, the μ controller introduces a larger phase delay. The closed-loop systems with the H∞ and μ controllers are characterized by a larger bandwidth, which leads to faster dynamics in response to reference inputs, as shown in Fig. 21 (right). The comparison of robust stability is shown in Fig. 22 (left). The frequency response of the μ curve shows a lower amplitude than that of the H∞ control. Therefore, the system with the μ controller allows a larger norm of perturbations while maintaining robust stability. Figure 22 (right) shows that the nominal performance amplitude with the μ controller is lower than the H∞ curve, resulting in better performance. The μ values over frequency for the two controllers are plotted in Fig. 23 (left). The curves confirm that the system with the H∞ controller does not achieve the robust performance criterion, whereas the μ curve is less than one, which indicates that robust performance has been achieved against the specified structured uncertainties. In summary, the two controllers ensure robust stability of the closed-loop system against the parametric perturbations included in the 2×2 diagonal uncertainty matrix. However, the closed-loop performance of the two designs reacts differently to these diagonal uncertainties. In the following, the worst-case perturbation for performance is determined in order to compare the two systems when the norm of the perturbations increases. The results in Fig. 23 (right) show that the μ controller ensures robust performance for large perturbations, whereas the performance of the closed-loop system deteriorates rapidly with increasing perturbation magnitude in the case of the H∞ controller. From the simulations above, using the μ-controller in the case of structured uncertainties will always produce more satisfactory performance and a less conservative controller.

Fig. 21. Frequency responses of H∞ and μ-controllers (left). Frequency responses of closed loop systems (right).


Fig. 22. Comparison of robust stability of H∞ and μ-controllers (left). Comparison of nominal performance of H∞ and μ-controllers (right).

Fig. 23. Comparison of robust performance of H∞ and μ-controllers (left). Performance degradation of H∞ and μ-controllers (right).

5 Conclusions

In our research, we developed optimal robust controllers for a reconfigurable manipulator with features such as variable twist angles, variable link lengths, and hybrid (translational/rotational) joints. The kinematic design parameters, i.e., the D-H parameters, are variable and can generate any required configuration to facilitate a specific application. With a reconfigurable robot, the dynamic parameters (inertia, Coriolis, centrifugal, and gravity) are uncertain due to their dependency on the robot configuration. The parameter variations were modeled as bounded parametric uncertainties, and an H∞ optimal control was designed to achieve the performance specifications in the presence of these uncertainties. The resulting closed-loop system was analyzed using the structured singular value μ approach. This research is intended to serve as a foundation for future studies in reconfigurable control systems.


References

1. Park, Y.M., et al.: The first human trial of transoral robotic surgery using a single-port robotic system in the treatment of laryngo-pharyngeal cancer. Ann. Surg. Oncol. 26(13), 4472–4480 (2019). https://doi.org/10.1245/s10434-019-07802-0
2. Giordano, A.M., Ott, C., Albu-Schaffer, A.: Coordinated control of spacecraft's attitude and end-effector for space robots. IEEE Rob. Autom. Lett. 4(2), 2108–2115 (2019). https://doi.org/10.1109/LRA.2019.2899433
3. Roehr, T.M., Cordes, F., Kirchner, F.: Reconfigurable integrated multirobot exploration system (RIMRES): heterogeneous modular reconfigurable robots for space exploration. J. Field Rob. 31(1), 3–34 (2014)
4. Koren, Y., Wang, W., Gu, X.: Value creation through design for scalability of reconfigurable manufacturing systems. Int. J. Prod. Res. 55(5), 1227–1242 (2017). https://doi.org/10.1080/00207543.2016.1145821
5. Andersen, A.-L., Brunoe, T.D., Nielsen, K., R, C.: Towards a generic design method for reconfigurable manufacturing systems: analysis and synthesis of current design methods and evaluation of supportive tools. J. Manuf. Syst. 42, 179 (2017). https://doi.org/10.1016/j.jmsy.2016.11.006
6. Yim, M., et al.: Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Rob. Autom. Mag. 14(1), 43–52 (2007). https://doi.org/10.1109/MRA.2007.339623
7. Murata, S., Kurokawa, H.: Self-reconfigurable robots. IEEE Rob. Autom. Mag. 14(1), 71–78 (2007). https://doi.org/10.1109/MRA.2007.339607
8. Djuric, A.M., Al Saidi, R., ElMaraghy, W.: Global kinematic model generation of n-DOF reconfigurable machinery structure. In: IEEE International Conference on Automation Science and Engineering (2010)
9. Zhou, K., Doyle, J.C., Glover, K.: Robust and Optimal Control. Prentice Hall, New Jersey (1996)
10. Wu, M., He, Y., She, J.H.: Stability Analysis and Robust Control of Time-Delay Systems. Springer, Cham (2014). https://doi.org/10.1007/978-3-642-03037-6
11. Qaisar, T., Mahmood, A.: Structured uncertainty modeling, analysis and control of a customized robotic arm. In: IEEE International Conference on Automatica (ICA-ACCA), pp. 1–4 (2016). https://doi.org/10.1109/ICA-ACCA.2016.7778508
12. Zhu, X., An, T., Wang, G.: Decentralized position force zero-sum approximate optimal control for reconfigurable robots with modeled dynamic. Trans. Inst. Meas. Control 45(3), 466–475 (2022). https://doi.org/10.1177/01423312221109726
13. Al Saidi, R., Alirezaee, S.: Robust gain scheduling LPV control for a reconfigurable robot. In: 19th International Conference on Informatics in Control, Automation and Robotics, ICINCO 2022 (2022)
14. Mirzaei, M., Niemann, H.H., Poulsen, N.K.: DK-iteration robust control design of a wind turbine. In: IEEE International Conference on Control Applications (CCA), pp. 1493–1498 (2011). https://doi.org/10.1109/CCA.2011.6044429

Improving 2D Scanning Radar and 3D Lidar Calibration

Jan M. Rotter(B), Levin Stanke, and Bernardo Wagner

Real Time Systems Group, Leibniz University of Hanover, Appelstraße 9a, 30167 Hanover, Germany
{rotter,wagner}@rts.uni-hannover.de
https://rts.uni-hannover.de

Abstract. Sensor fusion in mobile robots requires proper extrinsic and intrinsic sensor calibration. Robots in the search and rescue robotics domain are often equipped with multiple range sensor modalities such as radar and lidar due to the harsh environmental conditions. This article presents a method to easily calibrate a 2D scanning radar and a 3D lidar without the use of special calibration targets. To this end, it focuses on the improvement of the feature extraction from the environment by applying filtering algorithms to remove noise and improve the signal-to-noise ratio. Additionally, a second optimization stage is introduced to propagate measurement uncertainties of the lidar to the calibration result. The results are compared to the previous version of the algorithm as well as to the ground truth parameters. Furthermore, statistical tests are performed to confirm the validity of the calibration results.

Keywords: Mobile robotics · 2D scanning radar · 3D lidar · Target-less calibration · Search and rescue robotics

1 Introduction

Mobile robots are used today in different scenarios. One civil use case is search and rescue (SAR) [8, 9, 12], where robots are deployed in missions to find and help victims in disaster situations. Arriving at a disaster site, first responders have to act quickly while protecting their own lives and minimizing personal risks. This is where SAR robots provide an invaluable means to get an overview of a situation or to search for survivors in places that are not reachable by human or animal helpers. In general, disaster sites are considered harsh and challenging environments which can be covered in dust or smoke, with objects lying or hanging around and blocking the path or decreasing the sensing abilities of an autonomous robot. To detect small objects with high precision, sensors such as cameras and laser scanners are used. The downside of these sensor modalities is that their precision relies on light being able to pass through the scene. If the view is blocked by dust or smoke, the sensing abilities of these modalities decrease. To ensure a basic level of operation, sensors such as thermal cameras or radar sensors are used in addition. These sensors are robust against visual disturbances, with the disadvantage of not being as precise or easy to interpret as their high-resolution counterparts.


Fig. 1. Schematic overview of the proposed in-field calibration method for a UAV-carried sensor setup consisting of a 2D scanning radar and a 3D lidar [25].

The sensor readings of multiple modalities are often fused to maximize the overall information value. For this, sensor readings have to be spatially assigned to each other. Key to this task is a good intrinsic and extrinsic calibration that provides parameters to correct internal offsets or distortions as well as the external displacement of the different sensors. Especially in SAR scenarios, where robots may be assembled at the disaster site to reduce the packaging space, an accurate but also easy-to-perform in-field calibration method is needed. In general, there exist multiple ways to calibrate a robot's sensor setup. One such method, an efficient target-less calibration for a 2D scanning radar and a 3D lidar, is presented in [25]. The method is designed especially for the in-field calibration of a UAV-carried sensor setup in SAR applications. It therefore requires only primitive geometric shapes like planes and lines that can be found in any structured or semi-structured environment. Figure 1 shows a schematic overview of the method using the side of a vehicle and the wall of a building as calibration features. This article extends [25] by using sophisticated filtering methods for the radar data to extract the environmental features at maximum precision. Aside from the extrinsic parameters, the intrinsic mirror offset parameter special to a 2D scanning radar is also modeled and optimized. An improved optimization model accounts for inaccuracies in the lidar and radar detections. Finally, the calibration results are tested with various statistical methods to ensure the validity of the calculated parameters.

2 Radar Basics

Radio Detection And Ranging (RADAR) as a sensor modality has been around for some decades and is used today in many applications like airspace surveillance, advanced driver assistance systems, geographic mapping, or astronomy to name only a few. It actively radiates electromagnetic energy which is intercepted by reflecting objects in the environment. These reflecting objects are usually called targets. The energy is reradiated into the environment depending on the reflection properties and the radiating surface area of the target. A portion of the reflected energy is received by the radar


which can be used to determine the target location as well as other information about the target [33]. Depending on the application, different designs and methods of the sensor are used. Pulse radar systems send short bursts of almost rectangular high-power electromagnetic signals and use the time-of-flight principle to detect any target inside the antenna beam. This type of radar can have high maximum range capabilities and is therefore used in long-range applications. The disadvantage is the relatively low range resolution of a few meters, which is inversely proportional to the length of the transmitted pulse. If detection ranges are shorter but a higher range resolution is needed, frequency-modulated continuous wave (FMCW) radar systems are used. They can also provide higher measurement rates and accuracies. In FMCW radar systems a continuous carrier frequency f₀ is linearly increased over a specified bandwidth Δf, also called a chirp. Chirps are repeated continuously and have a length of T_c. A target reflecting the chirp signal introduces a delay τ_i between the sent and the received signal, which depends on the range r_i and the propagation velocity c:

τ_i = 2 r_i / c    (1)

Both signals are multiplied in the mixing unit, yielding only the frequency difference, called the beat frequency f_b, as a result of the delay. The beat frequency is calculated as

f_b = (Δf / T_c) τ_i = (Δf / T_c) (2 r_i / c)    (2)

The range resolution, which is the minimal distance between two distinct targets that can still be measured, is determined by

ΔR = c / (2 Δf)    (3)
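As a small numerical illustration of (1)-(3), the following lines convert a measured beat frequency back into a target range and compute the range resolution; the chirp parameters are placeholder values, not those of the radar used in this work.

```python
# placeholder FMCW chirp parameters (not the sensor used in this article)
c = 3.0e8          # propagation velocity in m/s
delta_f = 1.0e9    # sweep bandwidth in Hz
T_c = 1.0e-3       # chirp duration in s

def range_from_beat(f_b):
    """Invert Eq. (2): r_i = f_b * T_c * c / (2 * delta_f)."""
    return f_b * T_c * c / (2.0 * delta_f)

range_resolution = c / (2.0 * delta_f)          # Eq. (3): 0.15 m for 1 GHz bandwidth
target_range = range_from_beat(f_b=200.0e3)     # a 200 kHz beat corresponds to 30 m
print(range_resolution, target_range)
```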

The amplitude of a frequency in the combined frequency spectrum can additionally be associated with the reflectance power of the measured object [1]. A visualization of the transmitted signal and the reflected signal from one ideal target can be seen in Fig. 2.

Fig. 2. Chirp of an FMCW radar returned by an ideal target (adapted from [1]).


3 Related Works

In the following, a short overview of different calibration methods for the extrinsic calibration of a radar sensor system is given. Target-based methods as well as target-less algorithms are separately examined. In addition, different methods to filter the raw radar signals for a more accurate signal are described.

3.1 Target-Based Methods

Sugimoto et al. [34] determine the extrinsic calibration parameters between a monocular camera and a 2D millimeter-wave multiple-input-multiple-output (MIMO) radar. They use a single corner reflector as a calibration target which is moved perpendicular to the horizontal radar plane. At the crossing point of the radar plane, they record local maxima in the intensity of the radar measurements, which they associate with a corresponding image from the camera. In order to determine the extrinsic parameters between the camera and the radar sensor, they additionally identify the reflector in the camera image and use the measurements in a least-squares estimation. Wang et al. [36] use a similar approach and sensor setup consisting of a 2D millimeter wave MIMO radar and a monocular camera. As a calibration target, they use a metal sheet instead, which is easier to detect in the camera image at farther distances. However, their radar sensor returns only clustered data and omits the intensity values of the detected targets. This obscures the precise real position of the radar detection and degrades the quality of the results. In the camera image, the centroid of the metal sheet is used as the corresponding measurement since the radar detection position is unknown, which results only in rough calibration results. In contrast to the aforementioned approaches, El Natour et al. [7] are the first to assume a non-zero elevation of the radar measurements. They take into account that radar detections are not always located in the center of the measurement cone. Instead, they model the measured radar detection as a sphere around the sensor. The detection in the camera image is modeled as a line passing through the camera center in the direction of the detection. The intersection of this line and the sphere results in the 3D location of the measurement. In their experiments they move the sensor setup around multiple radar corner targets and the trajectory is also used in the optimization process. A special calibration target for camera, lidar, and radar is introduced by Peršić et al. [19, 21]. It is based on a radar corner reflector to realize a point-like radar reflection. A thin triangle-shaped styrofoam plane that is not detectable by most radar sensors is placed in front of the reflector for better lidar detection. To increase visibility in the camera and to add more detection constraints, a chessboard pattern is printed on the styrofoam surface. To estimate the elevation of the radar detection, they use the radar cross-section (RCS), which decreases when the target is moved away from the center of the measurement cone. The estimated angle is then used to refine the Z-axis parameter. An alternative calibration target design is proposed by Domhof et al. [5, 6]. They use a quadratic styrofoam plane with four circular holes in the corners, which are detected by the camera and the lidar. A corner reflector in the center is used as the radar target. In contrast to El Natour et al. and Peršić et al., they assume zero elevations for the radar detections.
In their experiments, they show that the Root Mean Square Error (RMSE) is approximately 2 cm for the lidar-to-radar calibration and 2.5 cm for the camera-to-radar calibration.

3.2 Target-Less Methods

The first target-less calibration method for a camera-radar setup is introduced by Schöller et al. [31]. They use a data-driven method by training two neural networks in a boosting-inspired fashion. In their work, only the rotational parameters are considered. As a loss function for their training, they use the Euclidean distance between the estimated and the true quaternion. Heng [11] uses a previously built map to calibrate a sensor setup consisting of multiple lidars and automotive MIMO radars. In the first step, he uses the map-building process to calibrate the lidars to each other. After that, the point-to-plane distances of the radar detections to the mapped points are optimized to estimate the extrinsic parameters of the radar sensors. Wise et al. [37] are the first to use velocities for the calibration. They extract the velocity vectors separately for each sensor and estimate the extrinsic calibration parameters by optimizing over all measurements. Peršić et al. [20] also present a target-less calibration method. In every sensor modality, they extract and track features in the environment to obtain the sensor's trajectory. An association algorithm is used to observe differences in the individual trajectories and, if deviations occur, a graph-based calibration towards one anchoring sensor is performed. The authors state that the calibration and decalibration detection method is limited to the rotational part of the extrinsic parameters only.

3.3 Speckle Filtering of Radar Signals

Removing noise from radar signals and radar images has been a field of research since the early days of radar. There exist two basic methods to remove speckle noise from radar data, namely multi-look methods and spatial filtering. Multi-look speckle noise removal uses a series of radar images of the same scene. By adding all the images of the series, speckle noise can be reduced because of its multiplicative nature. The first to discover this connection were Porcello et al. [22]. They showed that summing up uncorrelated Synthetic Aperture Radar images of the same terrain patch leads to a significant reduction of the speckle noise. The same technique is used by Rouveure et al. for an FMCW radar in a robotic application. They use the robot's odometry as well as an Inertial Measurement Unit (IMU) to estimate the short-term movement of the robot. After motion correction of the radar images, they integrate over n measurements depending on the robot's velocity [27, 28]. Spatial filtering methods can be divided into adaptive and non-adaptive methods. Non-adaptive filters are applied to the full-scale image regardless of the local statistical properties of an image region. Simple non-adaptive filters are, for example, mean or median filters. They replace the anchor pixel of an image patch with the patch's mean or median. This results in a smoothed version of the original image where details and edges are lost [35]. In contrast to that, adaptive filters make use of the statistical properties inside the image patch. The first adaptive anti-speckle filter was introduced by Lee et al. [14]. The filter assumes Gaussian-distributed noise and that the mean and the variance of one pixel can be estimated from the mean and the variance of the local neighborhood. By comparing the local mean and variance to the overall mean and variance, regions containing edges and details can be detected and excluded from the smoothing operation. Later improvements to the filter led to better noise reduction around edges [15, 16]. Another adaptive filter is the Frost filter


which is based on a least-squares minimization. Similar to the Lee filter, multiplicative and Gaussian-distributed noise is assumed. Depending on the local mean and variance, a weighted sum of pixels is used to generate the new pixel value [10].

3.4 Improving Resolution of Radar Signals

The range resolution of an FMCW radar only depends on its bandwidth. Due to regulatory limitations and technical constraints, the bandwidth of a chirp cannot be increased arbitrarily. This imposes limitations on the maximum possible range resolution of an FMCW radar. To further increase the range resolution without changing the hardware or violating national regulations, there exist algorithms to estimate the frequency spectrum from a limited number of sensor readings. Examples are MUSIC (Multiple Signal Classification) [30] and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) [18], which benefit from previously known parameters of the signal. Rouveure et al. [26] successfully used the ESPRIT algorithm on a rotating FMCW radar to improve the range resolution. However, both algorithms are computationally complex and the number of targets has to be determined beforehand. Other improvement methods model the complete radar image as a convolution of the original image with the radar antenna and the filter characteristics of the FFT. These methods employ deconvolution techniques to remove the effects introduced by the radar hardware. A distinction is made between non-iterative methods like the regularized inverse-filtering algorithm [23] or the Wiener filtering algorithm [32] and iterative methods like the Richardson-Lucy deconvolution algorithm [17, 24] or the Carrington algorithm [3]. All of these methods have in common that the point-spread function, which describes the specific distortion of a point target by the imaging system, has to be known. In [28], Rouveure et al. show the application of the Richardson-Lucy deconvolution to data from a scanning FMCW radar on a mobile robot.

3.5 Contribution

The novel calibration method for a 2D scanning radar and a 3D lidar was first presented in [25] and is based on geometric primitives extracted from (semi-)structured environments. The method is specially designed for use in the SAR domain to enable fast on-site calibration. Unlike in the previously mentioned works, a scanning FMCW radar is used instead of fixed n-channel MIMO sensors. With MIMO sensors, only a certain number of targets can be detected depending on the number of transmitter and receiver channels. In contrast, a scanning radar returns all target measurements for all rotation angles. Additionally, raw range profile data is used instead of pre-extracted target points as in most of the state-of-the-art publications. This enables the use of a specialized signal processing pipeline tailored to feature extraction. Feature extraction is applied to both sensor modalities separately to find geometric primitives in the environment. To ensure a good calibration quality, matching feature candidates are filtered further to collect features with a greater variety in azimuth and elevation. The calibration is modeled as a graph optimization problem. Using a scanning radar introduces an intrinsic parameter, which is the distance offset between the sensor including electrical


signal delays and the rotating mirror. The described calibration model also includes this intrinsic offset. In this article, various anti-speckle filter methods as well as signal deconvolution to remove imaging distortions are added to the original method. Different filtering techniques and parameters in the target detection stage of the radar processing and their influence on the calibration results are evaluated. In further real-world experiments, the absolute calibration error of the method is analyzed for different environments. To verify the results of the calibration, statistical tests are performed on the different experiments.

4 Methodology

The proposed calibration method follows a multi-step pipeline concept. First, raw sensor data is synchronized to search for matches only in corresponding time frames. In addition to radar and lidar data, the measurements of an IMU are also included to correct for any motion during the measurements in a second step. In the third stage, different filters for both radar and lidar are applied to extract only the relevant points. Matching the extracted features of the different modalities to each other is the fourth step in the processing pipeline. Finally, the corresponding features are filtered so that only a well-constraining set is retained for the optimization that estimates the calibration parameters.

4.1 Assumptions

One precondition that has to be fulfilled by the sensor system is that all sensors have to be synchronized in time. Only then can simultaneous sensor readings be associated using their corresponding timestamps. In contrast to the initially proposed method [25], the velocities of the sensor movements are not constrained anymore. Instead, an IMU is used to correct the data for any motion distortions. This way, movements that are too fast to provide accurate data for optimization can also be detected. Next, a set of initial calibration parameters has to be known to transform the points into a common coordinate frame for the matching step. A rule-of-thumb estimation of these parameters is sufficient. The developed method mainly applies to 2D FMCW radar sensors with a rotating mirror. To use it with other types of radar sensors, like MIMO sensors, changes to the optimization model may be necessary. For example, the distance offset parameter may have to be constrained to zero. Another parameter that has to be known in advance is the radar's range resolution. It can be calculated using Eq. 3 if the sensor's bandwidth is known. Finally, with the defined use case in mind, the proposed method should be computable online during the data capturing phase. This applies to all filtering and matching algorithms before the optimization stage. Although there may be techniques resulting in higher-quality output, only filters that meet this online criterion can be used.
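As a quick numeric illustration of this precondition, the snippet below evaluates the standard FMCW relation ΔR = c/(2B), which Eq. 3 is assumed to express; the bandwidth value is an arbitrary example, not the configuration of the sensor used here.

```python
# Range resolution of an FMCW radar from its sweep bandwidth.
# Assumes Eq. 3 is the standard relation delta_R = c / (2 * B); the bandwidth
# below is an arbitrary example value, not the sensor configuration of the paper.
c = 299_792_458.0            # speed of light [m/s]
bandwidth = 600e6            # chirp bandwidth B [Hz] (example)
range_resolution = c / (2 * bandwidth)
print(f"range resolution: {range_resolution:.3f} m")   # ~0.25 m
```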

4.2 Synchronization

The data streams received from the sensors are not synchronized at the time of arrival due to different connection protocols and message handling in the operating system.


Additionally, the sensors operate at different scanning rates. For the extraction of corresponding features, only radar and lidar sensor readings from the same time frame can be used. Furthermore, all the IMU measurements between two consecutive radar or lidar readings are needed to perform motion correction on the sensor data. To accomplish data synchronization, the time stamps of the sensor messages are used. For each radar scan, the closest lidar scan with respect to time is selected as the corresponding sensor reading. This results in a maximum time difference of half of the scan time of the faster-updating sensor. We then collect all IMU data between two consecutive radar and lidar scans separately, since motion correction is not equal for both sensor streams due to different update rates.

4.3 Motion Correction

Both sensors measure the environment by rotating either the sensor elements or the measurement ray around an axis which will be called the sensor axis. If the sensor is moved by a translation perpendicular to the sensor axis, the sensor readings at different rotation angles get distorted in range. A rotation around the sensor axis results in a distortion of the measurement angles, increasing or decreasing the angular resolution of the particular scan. If the sensor is moved along the sensor axis or rotated around an axis perpendicular to the sensor axis, the resulting points do not lie in the main sensing plane. Using an IMU provides short-term motion data that can be used to estimate the motion during the measurement as well as the new point locations of the measurements. Therefore, all IMU readings between two consecutive scans are collected for the radar and lidar separately. By integrating the IMU data, the transforms between the beginning of the scan and the respective IMU time stamp are estimated. Since time stamps are assigned to every single measurement (radar chirp, laser pulse) of a scan, each distorted measurement can be transformed into one common coordinate frame at the beginning of each scan. For the 2D scanning radar, we finally omit the Z-component of the transformed data points to project the points back into the main sensing plane. Figure 3 shows example radar data of an uncorrected and a motion-corrected scan.

4.4 Preprocessing

After motion correction, the sensor data is first processed for each sensor individually. Raw sensor data has to be prepared by various filtering methods to reliably extract geometric features that can later be matched and used in the optimization process.

Radar Filtering. Raw radar data is not easy to process. The difference between radar and lidar sensors is that for each range measurement, i.e. chirp, the radar does not return discrete target points but measures the combined return signal of all beat frequencies (see Sect. 2). Thus, target points have to be extracted from the raw signal first. A Discrete Fourier Transform (DFT) retrieves the range information from the signal by decomposing it into its frequency components, including amplitudes and phases. The difficult part is to differentiate between the wanted target reflections and different kinds of noise like multi-path reflections, clutter, or speckle noise.


Fig. 3. Motion correction for the 2D radar scanner. (a) Uncorrected radar scan and (b) motion-corrected radar scan.

Additionally, the beam shape of the radar signal is much wider than a laser ray's, which leads to a distortion of points in the imaging process. A similar distortion is additionally introduced by the DFT. In the radar preprocessing stage, filters are applied to account for all of these effects so that exact reflection locations can be extracted without the disturbance of erroneous target detections. At first, the speckle noise in the signal has to be removed. Speckle noise is generated by the measurement itself. The radar signal is distributed over a certain area of a target depending on the antenna beam angle and the distance. This area of the target contains several scattering elements that reflect the signal differently, none of which yields a reflection much stronger than the others. The received signal can therefore be seen as the incoherent sum of all the backscattered waves. Constructive and destructive interferences lead to noise in the range cells. Because of this, speckle noise cannot be reduced by increasing the energy of the output signal [2]. For this article, two different filtering approaches to remove the speckle noise are applied and evaluated. The first approach exploits the Gaussian-distributed and multiplicative nature of the speckle noise. A measured range bin ri is therefore modeled as

ri = r̂i ni   (4)

where r̂i is the purely reflected signal and ni is the noise with mean 1 and variance σn². This means that summing the range bin over time cancels out the noise component. As the sensor setup is moving, motion correction between consecutive scans has to be applied in order to add preceding scans to the current one in a windowing manner. This results in a reduction of speckle noise while also increasing the received energy of valid targets. The only problem with this easy-to-compute approach is the necessity of precise motion estimation from the first to the last summed scan. Using only an IMU, motion drift becomes a problem quite fast due to the low scanning rates of the radar.


Therefore, it is only possible to sum up a small number of scans, which reduces the quality of the noise reduction. For this reason, the second approach, a Lee filter [14], is applied to a single scan only. As the noise in a single range cell over time is Gaussian-distributed, this also applies to the noise in a local neighborhood of unoccupied space. This means that a range cell can be replaced by the mean r̂i of its local neighborhood if none of the neighborhood cells contains a valid target. The Lee filter compares the estimated a priori variance Var(r̂i) of a region around the examined cell to the global variance σn² of the noise. The underlying assumption is that speckle noise is distributed equally over the complete sensing area. It follows that the distribution of speckle in an unoccupied local neighborhood has to be similar to the global speckle distribution. The similarity k between the local and global variance is therefore used as a weight to the mean filter, which results in the selective smoothing of unoccupied regions.

ṙi = r̂i + k(ri − r̂i)   (5)

k = Var(r̂i) / (Var(r̂i) + r̂i σn²)   (6)
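A minimal sketch of this single-scan despeckling step, implementing Eqs. (5)-(6) as reconstructed above with a box-filter neighborhood; the kernel size and the global noise-variance estimate are illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(scan, kernel_size=7, sigma_n2=None):
    """Lee speckle filter following Eqs. (5)-(6): each range cell is pulled towards
    its local mean, weighted by the ratio of local signal variance to total variance.
    `scan` is a 2D array (azimuth x range); sigma_n2 is the global noise variance
    and is crudely estimated from the whole scan if not given (illustrative choice)."""
    scan = scan.astype(float)
    local_mean = uniform_filter(scan, kernel_size)                 # r_hat_i
    local_sq_mean = uniform_filter(scan ** 2, kernel_size)
    local_var = np.maximum(local_sq_mean - local_mean ** 2, 0.0)   # Var(r_hat_i)
    if sigma_n2 is None:
        sigma_n2 = np.var(scan) / (np.mean(scan) + 1e-12)          # crude global estimate
    k = local_var / (local_var + local_mean * sigma_n2 + 1e-12)    # Eq. (6)
    return local_mean + k * (scan - local_mean)                    # Eq. (5)
```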

After removing the speckle noise from the signal, the distortion factors can be removed. As explained previously, the radar signal is distorted by two main factors: the beam shape of the antenna and the filtering characteristic of the DFT. Every returned signal can be modeled as a convolution of the distortion elements in the radar system with the ideally returned signal. A point target in an otherwise non-distorting environment therefore reveals the combined convolution kernel. This kernel is also called the point-spread function (PSF). As explained in Sect. 3.4, there exist multiple methods to reverse this convolution. Because of its popularity, its simple and efficient implementation, and the fact that Rouveure et al. [28] successfully used it for a scanning FMCW radar, the Richardson-Lucy deconvolution [17, 24] is applied. The main requirement is a known PSF, which has to be extracted either from point target measurements or from simulated or theoretical data. For a 2D scanning radar, the PSF components in range and in azimuth are required. Both components were measured using a trihedral corner reflector as a point target (see Fig. 4). To restrict the kernel width and thus reduce filter complexity, only the range down to −3 dB was used as the azimuth PSF, with an angular resolution matching that of the scan. This portion of the characteristic could also be approximated by a Gaussian function. A comparison with the simulated antenna characteristic provided by the manufacturer showed no differences. Since, in theory, range distortion is only introduced by the sinc convolution of the DFT, the PSF is chosen accordingly. The measurements support this approximation, although a slightly higher weight of the range cells in front of the target and slightly lower weights behind the target can be seen. The reason for this might be internal electronic signal path delays or distortions which cannot be modeled without knowledge of the components and signal processing algorithms. Therefore, in the experiments, the measured data is used for the Richardson-Lucy deconvolution. After refining the radar signals, the rest of the processing pipeline of [25] is used. The target points are extracted from the processed signal by a hand-tuned cell-averaging constant false-alarm rate (CA-CFAR) filter. The extracted target points are then analyzed for geometric features, namely lines, that are later used in the optimization process. Lines are extracted using a RANSAC approach until a specified portion of the target points is assigned to features.
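For reference, the Richardson-Lucy update itself is compact; the following is a generic sketch taking the measured PSF as input, not the authors' implementation, and the default iteration count is only a placeholder (the value actually used is discussed in Sect. 6.1):

```python
import numpy as np
from scipy.ndimage import convolve

def richardson_lucy(observed, psf, num_iter=40):
    """Generic Richardson-Lucy deconvolution [17, 24]: iteratively re-estimate the
    undistorted scan so that its convolution with the PSF matches the observation.
    `psf` is the measured range/azimuth point-spread function; num_iter is a placeholder."""
    psf = psf / psf.sum()
    psf_mirror = psf[::-1, ::-1]
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    for _ in range(num_iter):
        blurred = convolve(estimate, psf, mode="nearest")
        ratio = observed / np.maximum(blurred, 1e-12)
        estimate *= convolve(ratio, psf_mirror, mode="nearest")
    return estimate
```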


Fig. 4. Theoretical PSF (orange) and real-world data (blue) for azimuth (a) and range (b).

Lidar Filtering. In contrast to the radar data, the data from the lidar is much easier to process. The measurements are taken in a pulsed manner using the time-of-flight principle. This results in single distance measurements returning only one or a few reflections, rather than a complete range profile. Therefore, the filter pipeline does not need to be as complex to extract the target signals. In the first step, sparse outlier points are removed from the measurements. Due to varying point densities, tiny objects, or measurement errors, single sparse points far away from point clusters are created. These outliers can be classified as such by their mean distance to their k-nearest neighbors according to [29]. In a homogeneous region, the mean distance of a point to its k-nearest neighbors is similar to that of other points. Only outlier points exceed this mean distance by a multiple of the standard deviation of the nearest-neighbor distances. The resulting point cloud is then transformed into a range image. This simplifies finding large homogeneous regions in the laser scan, which are typical candidates for planes in structured environments. In a plane, neighboring points do not differ much in their distance to the sensor origin. This makes the gradient magnitude for each pixel in the range image a measure of the homogeneity at that spot. Filtering the points using a fixed threshold reduces the point cloud to homogeneous regions only. To refine the results, a distance transform is applied and points in the border areas of the detected regions are removed. The remaining points in the range image are then transformed back into a point cloud. Lastly, ground segmentation is performed to remove a possible ground plane. In the reduced point cloud, planes with an angle of about 90° with respect to the radar's measurement plane are extracted using a RANSAC model as well.
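The statistical outlier-removal step described above can be sketched as follows; the neighbor count and threshold factor are illustrative values, not the ones used in the paper:

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_sparse_outliers(points, k=8, std_ratio=2.0):
    """Statistical outlier removal in the spirit of [29]: discard points whose mean
    distance to their k nearest neighbors exceeds the global mean of these distances
    by std_ratio standard deviations. `points` is an (N, 3) array; k and std_ratio
    are illustrative parameters."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)      # the first neighbor is the point itself
    mean_dists = dists[:, 1:].mean(axis=1)
    threshold = mean_dists.mean() + std_ratio * mean_dists.std()
    return points[mean_dists < threshold]
```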

4.5 Matching

To match the extracted features to each other, a measure of similarity has to be implemented. This measure must consist of one number expressing the geometric vicinity of a line and a plane feature. Since all extracted lines lie in the radar’s measurement plane, the distance to a plane should also only be calculated there. To achieve this, the intersection points of the respective normals with the line, represented by a helper plane Pri , and the plane Plj are calculated. The intersection point on the lidar plane feature is then projected into the radar’s measurement plane along the feature plane. The distance between these two points scaled by inverse distance is used as the similarity measure, which can also be seen in Fig. 5. A feature correspondence is formed by a plane-line pair


where the similarity measure is minimal. An important step after forming the matches is ensuring a proper spatial distribution around the sensor setup. Therefore, the matches are placed into range, azimuth, and elevation bins. Only if all bins are filled with a minimum number of matches is the final optimization performed on an equally distributed sample subset from all bins.
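The bin-based selection of a well-constraining match subset can be sketched as follows; the bin layout (three azimuth, three elevation, two range bins) and the per-bin sample count follow Sect. 5, while the data layout is an assumption made for illustration:

```python
import numpy as np

def well_distributed_subset(matches, n_per_bin=16, n_bins=3 * 3 * 2, rng=None):
    """Keep an equally distributed sample of matches over azimuth/elevation/range bins.
    Each match is assumed to already carry its bin index under the key 'bin';
    this is an illustrative sketch, not the authors' data structures."""
    rng = np.random.default_rng() if rng is None else rng
    bins = {}
    for m in matches:
        bins.setdefault(m["bin"], []).append(m)
    # run the optimization only once every bin holds the minimum number of matches
    if len(bins) < n_bins or any(len(v) < n_per_bin for v in bins.values()):
        return None
    subset = []
    for v in bins.values():
        chosen = rng.choice(len(v), size=n_per_bin, replace=False)
        subset.extend(v[i] for i in chosen)
    return subset
```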

Fig. 5. Matching of detected lidar planes Pli to a radar line represented by a helper plane Pr [25].

4.6 Optimization

As already mentioned, most scanning radars are built with a rotating mirror above the radar antenna. This causes a constant distance offset in the measurements which has to be corrected before the data is used in any application that depends on accurate range measurements. This offset also includes the length of the signal path between the antenna and the A/D-converter, with its reduced signal propagation speed compared to air, and is therefore not measurable by hand. To extract this offset in the optimization, an additional parameter is added to the six extrinsic parameters. The calibration is then formulated as a graph-optimization problem using the g2o framework [13]. The factor graph of the calibration consists of only one optimizable vertex v0 containing the extrinsic and the offset parameters. A non-optimizable vertex vk is added for every plane-line match, with connecting edges for every point pr of the radar line feature. The edges' measurement error vector is defined as

e(pr, v0, vk) = (vk.n ∗ p̂l − vk.d) ∗ vk.n   (7)

p̂l = v0.rTl⁻¹ ∗ pr + v0.o ∗ pr/‖pr‖   (8)

with vk.n being the normal and vk.d being the distance of the lidar plane feature, v0.rTl the extrinsic transform between radar and lidar, and v0.o the intrinsic offset parameter. This article introduces a prior optimization using all point inliers of a plane to account for inaccuracies in the RANSAC's lidar plane parameter estimation. The non-optimizable vertices vk are replaced by optimizable vertices vK also holding the plane's parameters. For every point associated with the plane by the RANSAC algorithm, a unary edge is connected to vK defining the measurement error vector as


e(pl, vK) = (vK.n ∗ pl − vK.d) ∗ vK.n   (9)

with vK.n and vK.d being the plane parameters. This two-stage approach helps to down-weight outlier points. Also, the uncertainty of the lidar point measurements is propagated into the calibration optimization, leading to more accurate estimates of the overall parameter uncertainty.
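A numpy sketch of the two error terms (not the actual g2o edge implementations): the plane prior of Eq. (9) and the point-to-plane error of Eqs. (7)-(8). Here the intrinsic offset is applied along the measurement ray before the radar point is mapped into the lidar frame, which is one plausible reading of Eq. (8) rather than a confirmed detail:

```python
import numpy as np

def plane_point_error(p_l, n, d):
    """Eq. (9): residual of a lidar inlier point p_l w.r.t. its plane (unit normal n, distance d)."""
    return (np.dot(n, p_l) - d) * n

def radar_point_error(p_r, n, d, T, offset):
    """Eqs. (7)-(8) as a sketch: correct the radar point by the intrinsic range offset
    along its measurement ray, map it into the lidar frame with the inverse of the
    extrinsic transform T (4x4 homogeneous matrix, corresponding to v0.rTl), and
    evaluate the point-to-plane error against the matched lidar plane (n, d)."""
    ray = p_r / np.linalg.norm(p_r)                        # unit direction of the measurement
    p_corrected = p_r + offset * ray                       # intrinsic range offset, cf. Eq. (8)
    p_hat = (np.linalg.inv(T) @ np.append(p_corrected, 1.0))[:3]
    return (np.dot(n, p_hat) - d) * n                      # point-to-plane residual, Eq. (7)
```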

5 Experiments

To evaluate the proposed method and the effect of the advanced filtering methods, experiments were conducted comparing the original method [25] and the improved version to each other. For the experiments, the radar and laser sensors were mounted on a handheld platform at known positions, so that ground truth calibration parameters could be extracted from the CAD model. Additionally, the ground truth for the intrinsic offset parameter was measured using a trihedral reflector in an optical tracking system. The difference between the sensor-to-reflector distance estimated from the tracking data and the distance measured by the radar was used as the ground truth.

5.1 Test Data and Environment

The data from the radar scanner is received in a raw form containing the A/D-converter outputs for every measurement. To convert the data to range profiles, a DFT is applied to the time-domain data, extracting the beat frequencies and phases. Every measurement is provided with a corresponding time stamp as well as the respective scan angle. The range profiles are used as input to the calibration method, which accesses only the signal amplitude. The laser scan data is processed as a point cloud, relying on the data transformation of the manufacturer. Each point is provided with a time stamp along with a ring index indicating the transmitter-receiver pair that measured this point. Test data for the various experiments is gathered in different environment settings that fit the targeted use case scenario. Consequently, indoor and outdoor datasets are created. In the indoor scenario, a lobby-like environment and a lab environment are tested. As outdoor scenarios, a single wall as well as two perpendicular walls are chosen. To gather properly distributed test data, the preprocessing and matching stages are run during the data recording. A visual indicator for detected matches inside each bin is shown to guide the user in the acquisition process. The minimum number of samples per bin is set to nb = 100 for the acquisition. Based on experience from previous experiments, this should be about four to five times the number of samples actually needed for a proper calibration. This way, samples will not be seen twice in the same calibration process. Matches were grouped into three azimuth, three elevation, and two range bins.
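The conversion from A/D samples to a range profile is standard FMCW processing; the following is a generic sketch in which the window choice and parameter names are assumptions, not the manufacturer's pipeline:

```python
import numpy as np

def range_profile(adc_samples, sample_rate, sweep_slope):
    """Convert one chirp's time-domain A/D samples into a range profile via a windowed DFT.
    sweep_slope = bandwidth / chirp duration [Hz/s]; standard FMCW processing, used here
    only to illustrate how the calibration input (amplitudes per range bin) arises."""
    c = 299_792_458.0
    n = len(adc_samples)
    window = np.hanning(n)                              # reduce spectral leakage
    spectrum = np.fft.rfft(adc_samples * window)
    beat_freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    ranges = beat_freqs * c / (2.0 * sweep_slope)       # beat frequency -> target range
    return ranges, np.abs(spectrum)                     # the method accesses only the amplitudes
```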

5.2 Improved Radar Filtering

The improved filtering of the radar data can only be assessed qualitatively. Therefore, single frames from the data sets are compared in different stages of the preprocessing


pipeline. The goal of preprocessing is to maximize the number of correctly detected target points while keeping the number of false detections to a minimum. Since all the stages of preprocessing depend on their respective predecessor, parameters changed in the first stages always have an impact on the outputs and parameters of the following processing units. To find a fitting parameter set for the sensor configuration used, a systematic approach is carried out. The parameters being assessed are the number of integrated samples for the multi-look despeckling, the Lee filter kernel size, and the number of iterations for the Richardson-Lucy deconvolution. The CA-CFAR output of the preprocessed radar measurements is compared to the output produced by the raw data while using the same detection parameters.

5.3 Plane Optimization

Calibration accuracy depends not only on the extracted features but also on the optimization model. The newly inserted plane optimization stage extends the previous model [25] significantly. To evaluate the quality of the optimization results, experiments with and without the plane optimization stage are performed. Multiple calibration runs (n = 50) are performed on each of the different environment setups using the optimized preprocessing parameters from the previous experiments. The number of samples per bin is set to nb = 16, which ensures that all bins should be filled using the datasets described above. The calibration results are then tested for their accuracy by calculating the RMSE with respect to the ground truth parameters.

5.4 Validation

To validate the calibration results, further test statistics are applied. First of all, a Kruskal-Wallis test [4] is performed to check whether the calibration results are independent of the environment in which the data was recorded. The Kruskal-Wallis test therefore evaluates if the mean parameter results within one environment dataset can be considered equal to the mean parameter results over all environment setups. The significance level is set to α = 0.05. As the Kruskal-Wallis test only tests the validity of the actual calibration results, an additional test of the validity of the estimated uncertainty in the optimized parameters is performed. The estimated uncertainty of the optimization process is given as the covariance matrix over all parameters. Ideally, this uncertainty should neither underestimate nor overestimate the actual process uncertainty by too much. Therefore, the quotient of estimated standard deviation σ̂i and actual standard deviation σi

v = σ̂i / σi   (10)

evaluates to less than one if the estimated standard deviations underestimate the real process variance. The test is performed on each calibration run in a dataset and the mean is calculated for comparison. If the previous Kruskal-Wallis test also confirms that the calibration output is independent of the input data, the mean variance quotient over all datasets is given.
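A small scipy sketch of these two validation statistics for a single calibration parameter; the data layout is assumed, and the quotient follows Eq. (10) as reconstructed, i.e. estimated over actual standard deviation:

```python
import numpy as np
from scipy.stats import kruskal

def validate_parameter(runs_per_environment, estimated_sigmas, alpha=0.05):
    """Kruskal-Wallis test across the per-environment calibration results of one parameter,
    plus the mean standard deviation quotient v of Eq. (10) per environment.
    runs_per_environment: list of 1-D arrays of estimates (one array per environment);
    estimated_sigmas: matching lists of optimizer-estimated standard deviations."""
    _, p_value = kruskal(*runs_per_environment)
    environment_independent = p_value >= alpha
    quotients = [np.mean(sig) / np.std(runs)     # v = sigma_hat / sigma, Eq. (10)
                 for runs, sig in zip(runs_per_environment, estimated_sigmas)]
    return environment_independent, p_value, quotients
```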


6 Results

The results of the previously defined experiments are presented in the following subsections. The experiments were run on a dedicated server. However, the online criterion for the filtering and matching pipeline was tested on current laptop hardware, which was also used in the data recording process.

6.1 Improved Radar Filtering

The original radar scan used for the preprocessing experiments is given in Fig. 6. The scan is displayed as a polar image with the different azimuth scans as rows. Thus, the radar's origin is located at the left side of the image. To better compare the filter results, a section with a strong radar target produced by a wall is taken from the scan. The speckle noise as well as the low signal-to-noise ratio can easily be seen.

Fig. 6. Original radar measurement in full scale (a) and a detailed view (b) for comparison of the filters.

First, the multi-look despeckling is analyzed. To evaluate the filtering effect on the speckle, a still scene is used and shown in Fig. 7. Three integration windows are chosen, with window sizes of three, five, and ten measurements. The images show that the speckle is clearly reduced with increasing window size, while the effect on the target signal intensity is small. This output suggests that using a filter window of around ten measurements would reduce the noise enough to apply the deconvolution. However, in situations with movement, the filter output is heavily blurred by insufficient motion correction. At an update rate of 4 Hz, a window of ten measurements spans 2.5 s, over which the drift of a typical IMU already becomes a problem. To achieve similar despeckling results using only one frame, the results of the Lee filter are shown in Fig. 8. The effect of different filter kernel sizes of 3 × 3, 7 × 7, and 11 × 11 on speckle and target signals is analyzed. Compared to the multi-look despeckling, the Lee filter achieves similar results. However, the 11 × 11 kernel also shows some target signal degradation, which led to the decision to use the 7 × 7 Lee filter for the final signal processing.

Fig. 7. Results of the multi-look speckle filtering with (a) r = 3, (b) r = 5 and (c) r = 10 added frames in a still scene.

Fig. 8. Results of the Lee speckle filtering with (a) 3 × 3, (b) 7 × 7 and (c) 11 × 11 kernel size.

Next in the preprocessing pipeline, the radar measurements have to be deconvolved using the Richardson-Lucy algorithm. As explained in Sect. 4.4, the point-spread function was extracted by measuring a point target over multiple frames. The results of different numbers of iterations in the deconvolution are shown in Fig. 9. It can be seen from the images that the signal-to-noise ratio improves as the number of iterations increases. A further increase of the iteration number also has no visible degrading effect on the target signal. However, at 80 iterations the remaining speckle noise begins to be enhanced to a level at which a target detector could mistake it for an actual target signal. Also, 80 iterations were the maximum the computer could handle in time for an online application. Although more iterations might be possible using different hardware, further experiments were conducted using this parameter. Finally, the output of the CA-CFAR target detector is shown in Fig. 10. The CA-CFAR was hand-tuned to a fitting parameter set using the filtered sensor data. The number of train cells was set to 12 and the number of guard cells around the cell-under-test was set to 2. A false-alarm rate of 0.08 was used. The figure shows the target detections using these parameters both for the original and the filtered measurement. It is obvious that the detector is able to extract more target points at the same false-alarm rate due to the increased signal-to-noise ratio. This improves the later detection of line segments in the radar data.
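A minimal 1D cell-averaging CFAR sketch with the parameters stated above; the threshold-scaling formula is the textbook CA-CFAR relation and the per-side interpretation of the guard cells is an assumption, not a detail taken from the paper:

```python
import numpy as np

def ca_cfar(profile, num_train=12, num_guard=2, pfa=0.08):
    """1D cell-averaging CFAR on a range profile. num_train training cells are split
    around the cell under test (CUT), with num_guard guard cells on each side; the
    threshold factor alpha follows the textbook relation for a given false-alarm rate."""
    alpha = num_train * (pfa ** (-1.0 / num_train) - 1.0)
    half = num_train // 2 + num_guard
    detections = []
    for i in range(half, len(profile) - half):
        lead = profile[i - half : i - num_guard]           # training cells before the CUT
        lag = profile[i + num_guard + 1 : i + half + 1]    # training cells after the CUT
        noise_level = (lead.sum() + lag.sum()) / num_train
        if profile[i] > alpha * noise_level:
            detections.append(i)
    return detections
```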


Fig. 9. Results of the Richardson-Lucy filtering at (a) i = 10, (b) i = 40 and (c) i = 80 iterations.

Fig. 10. Results of the CA-CFAR target detection at the same filter parameters without (a) and with (b) the extended preprocessing.

6.2 Plane Optimization

Table 1 shows the resulting RMSE values per parameter with respect to the ground truth calibration. Column POpt. indicates if the plane parameters are optimized using the inlier points from the lidar. It can be seen that the plane optimization in general results in a lower error with respect to ground truth. Only two parameters stand out from this result. Yaw rotation and internal offset show a slightly increased error compared with their unoptimized counterparts. There are two possible reasons for this. First, the better fit of five parameters in the optimization problem could lead to a shift in the remaining parameters. Second, a measurement error of the ground truth parameters would lead to the same result. Since the ground truth values of both yaw and offset are harder to estimate for the sensor setup used in the experiments, this is not an unlikely explanation.


Table 1. Per-parameter RMSE values for the different datasets w.r.t. ground truth. POpt. indicates if the plane parameters were optimized using the lidar points.

Environment      | POpt. | x [m]  | y [m]  | z [m]  | yaw [°] | pitch [°] | roll [°] | offset [m]
Outdoor 1 Wall   | x     | 0.0136 | 0.0283 | 0.075  | 6.4389  | 1.4101    | 3.0502   | 0.044
Outdoor 2 Walls  | x     | 0.0149 | 0.0086 | 0.0234 | 6.2407  | 1.1874    | 1.6161   | 0.0939
Indoor Lobby     | x     | 0.0126 | 0.0116 | 0.0689 | 6.37    | 2.941     | 2.0495   | 0.0712
Indoor Lab       | x     | 0.0507 | 0.0105 | 0.1668 | 6.4181  | 4.7232    | 1.8189   | 0.0235
Outdoor 1 Wall   |       | 0.0186 | 0.0393 | 0.0923 | 4.3953  | 2.163     | 3.329    | 0.0374
Outdoor 2 Walls  |       | 0.0179 | 0.0157 | 0.0602 | 4.657   | 2.8495    | 2.5926   | 0.0834
Indoor Lobby     |       | 0.0178 | 0.007  | 0.073  | 6.4311  | 2.1556    | 1.1576   | 0.0636
Indoor Lab       |       | 0.0697 | 0.0283 | 0.2619 | 2.4467  | 7.8074    | 1.9764   | 0.0298

6.3 Validation

On the combined dataset of all calibration runs, with and without plane optimization and in all environments, the Kruskal-Wallis test shows significant differences (p ≈ 0) in the per-experiment means for all calibration parameters. This means that either the plane optimization or the different environments have a significant effect on the calibration results. Further tests using only subsets of the datasets (outdoor-optimized, outdoor-not optimized, indoor-optimized, indoor-not optimized, all outdoor, all indoor) show similar results. Consequently, the calibration results can be viewed as partly dependent on the environment in which the calibration was performed. This result is not unexpected, as the calibration is influenced by outlier matches where lines are assigned to the wrong planes. Reasons for mismatches can almost always be found in the environment, as radar and lidar perceive their surroundings differently. For example, nets or thin construction-site fences are ideal targets for the radar, whereas the lidar is unlikely to measure their fine structures, which could then lead to mismatches. However, due to the good RMSE results, which do not differ much between the different environments, the calibration can be considered correct. Finally, the estimated standard deviations from the optimizer were compared to the actual process standard deviations. Only the experiments with plane optimization were considered, due to their overall better performance. Table 2 shows the standard deviation quotients as defined in Sect. 5.4. The overall standard deviation quotient is not given since the Kruskal-Wallis test did not confirm the independence from the environment. The results suggest a heavy underestimation of the real process noise by the optimizer. The reason for this is most likely an insufficient noise model of the overall filtering and matching process. Currently, only the known sensor parameters like the field of view, distance resolution, or measurement noise provided by the manufacturer are used for the error model. However, sources of error in the data processing have not been evaluated and modeled so far. Consequently, the current results suggest that the estimated standard deviation cannot be trusted as a measure of the calibration quality.

Table 2. Standard deviation estimation quotient per parameter for all environments.

Environment      | vx     | vy    | vz     | vyaw  | vpitch | vroll | voffset
Outdoor 1 Wall   | 0.0307 | 0.105 | 0.0344 | 0.214 | 0.0678 | 0.206 | 0.102
Outdoor 2 Walls  | 0.133  | 0.416 | 0.0955 | 0.142 | 0.0748 | 0.332 | 0.186
Indoor Lab       | 0.0282 | 0.425 | 0.0275 | 0.133 | 0.0350 | 0.332 | 0.157
Indoor Lobby     | 0.0404 | 0.186 | 0.0213 | 0.110 | 0.0527 | 0.269 | 0.159

7 Conclusion

In this article, the target-less calibration method for a 2D radar scanner and a 3D lidar is extended by a radar filter pipeline for more reliable target detection. Furthermore, the optimization model is extended by a second optimization stage that takes the lidar inlier points of the detected planes as priors and allows for a simultaneous optimization of the plane parameters. In systematic experiments, the extensions are compared to the former version, showing an overall improvement in the calibration results. Although statistical tests for the independence of the calibration method from the environment were not successful, a reliable performance could be shown by the relatively low RMSE values. The results suggest further improving the method by better outlier detection as well as a more comprehensive error model. In the future, matches should be checked for their plausibility, for example by comparing them to other matches and their respective matching values. This plausibility value could be used as a weight in the random selection of matches, making it less likely for outliers to be considered in the optimization. In the same way, an information value for a match could be calculated, which could serve as an uncertainty estimate for a match in the optimization process.

Acknowledgements. This work has partly been funded by the German Federal Ministry of Education and Research (BMBF) under the project number 13N15550 (UAV-Rescue).

References

1. Adams, M.D. (ed.): Robotic Navigation and Mapping with Radar. Artech House, Boston, London (2012). ISBN 978-1-60807-482-2
2. Argenti, F., Lapini, A., Bianchi, T., Alparone, L.: A tutorial on speckle reduction in synthetic aperture radar images. IEEE Geosci. Remote Sens. Mag. 1(3), 6–35 (2013)
3. Carrington, W.A., Lynch, R.M., Moore, E.D.W., Isenberg, G., Fogarty, K.E., Fay, F.S.: Superresolution three-dimensional images of fluorescence in cells with minimal light exposure. Science 268(5216), 1483–1487 (1995)
4. Devore, J.L., Berk, K.N., Carlton, M.A.: The analysis of variance. In: Devore, J.L., Berk, K.N., Carlton, M.A. (eds.) Modern Mathematical Statistics with Applications. STS, pp. 639–702. Springer, Cham (2021)
5. Domhof, J., Kooij, J.F.P., Gavrila, D.M.: A multi-sensor extrinsic calibration tool for lidar, camera and radar. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 1–7 (2019). ISBN 9781538660263


6. Domhof, J., Kooij, J.F., Gavrila, D.M.: A joint extrinsic calibration tool for radar, camera and lidar. IEEE Trans. Intell. Veh. 6(3), 571–582 (2021)
7. El Natour, G., Ait-Aider, O., Rouveure, R., Berry, F., Faure, P.: Toward 3D reconstruction of outdoor scenes using an MMW radar and a monocular vision sensor. Sens. (Switz.) 15(10), 25937–25967 (2015)
8. Fan, H., Bennetts, V.H., Schaffernicht, E., Lilienthal, A.J.: Towards gas discrimination and mapping in emergency response scenarios using a mobile robot with an electronic nose. Sens. (Switz.) 19(3) (2019)
9. Fritsche, P., Zeise, B., Hemme, P., Wagner, B.: Fusion of radar, LiDAR and thermal information for hazard detection in low visibility environments. In: SSRR 2017 - 15th IEEE International Symposium on Safety, Security and Rescue Robotics, pp. 96–101 (2017). ISBN 9781538639221
10. Frost, V.S., Stiles, J.A., Shanmugan, K.S., Holtzman, J.C.: A model for radar images and its application to adaptive digital filtering of multiplicative noise. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-4(2), 157–166 (1982)
11. Heng, L.: Automatic targetless extrinsic calibration of multiple 3D LiDARs and radars. In: IEEE International Conference on Intelligent Robots and Systems (2020). ISSN 21530866
12. Kim, J.H., Starr, J.W., Lattimer, B.Y.: Firefighting robot stereo infrared vision and radar sensor fusion for imaging through smoke. Fire Technol. 51(4), 823–845 (2015). ISBN 1069401404
13. Kummerle, R., Grisetti, G., Strasdat, H., Konolige, K., Burgard, W.: G2o: a general framework for graph optimization. In: 2011 IEEE International Conference on Robotics and Automation, pp. 3607–3613. IEEE (2011)
14. Lee, J.S.: Speckle analysis and smoothing of synthetic aperture radar images. Comput. Graph. Image Process. 17(1), 24–32 (1981)
15. Lee, J.S.: Digital image smoothing and the sigma filter. Comput. Vision Graph. Image Process. 24(2), 255–269 (1983)
16. Lee, J.S., Wen, J.H., Ainsworth, T., Chen, K.S., Chen, A.: Improved sigma filter for speckle filtering of SAR imagery. IEEE Trans. Geosci. Remote Sens. 47(1), 202–213 (2009)
17. Lucy, L.B.: An iterative technique for the rectification of observed distributions. Astron. J. 79, 745 (1974)
18. Paulraj, A., Roy, R., Kailath, T.: Estimation of signal parameters via rotational invariance techniques - ESPRIT. In: Nineteenth Asilomar Conference on Circuits, Systems and Computers, 1985, pp. 83–89. IEEE, Pacific Grove (1985)
19. Persic, J., Markovic, I., Petrovic, I.: Extrinsic 6DoF calibration of 3D LiDAR and radar. In: 2017 European Conference on Mobile Robots, ECMR 2017 (2017). ISBN 9781538610961
20. Peršić, J., Petrović, L., Marković, I., Petrović, I.: Online multi-sensor calibration based on moving object tracking. Adv. Robot. 35(3–4), 130–140 (2021)
21. Peršić, J., Marković, I., Petrović, I.: Extrinsic 6DoF calibration of a radar-LiDAR-camera system enhanced by radar cross section estimates evaluation. Robot. Auton. Syst. 114, 217–230 (2019)
22. Porcello, L.J., Massey, N.G., Innes, R.B., Marks, J.M.: Speckle reduction in synthetic-aperture radars. J. Opt. Soc. Am. 66(11), 1305 (1976)
23. Preza, C., Miller, M.I., Thomas, L.J., McNally, J.G.: Regularized linear method for reconstruction of three-dimensional microscopic objects from optical sections. J. Opt. Soc. Am. A 9(2), 219 (1992)
24. Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62(1), 55 (1972)
25. Rotter, J., Wagner, B.: Calibration of a 2D scanning radar and a 3D Lidar. In: Proceedings of the 19th International Conference on Informatics in Control, Automation and Robotics, pp. 377–384. SCITEPRESS - Science and Technology Publications (2022)


26. Rouveure, R., Faure, P., Jaud, M., Monod, M.O., Moiroux-Arvis, L.: Distance and angular resolutions improvement for a ground-based radar imager. In: 2014 International Radar Conference, pp. 1–6. IEEE, Lille (2014)
27. Rouveure, R., Monod, M.O., Faure, P.: High resolution mapping of the environment with a ground-based radar imager (2009)
28. Rouveure, R., Faure, P., Monod, M.O.: Description and experimental results of a panoramic K-band radar dedicated to perception in mobile robotics applications. J. Field Robot. 35(5), 678–704 (2018)
29. Rusu, R.B., Marton, Z.C., Blodow, N., Dolha, M., Beetz, M.: Towards 3D point cloud based object maps for household environments. Robot. Auton. Syst. 56(11), 927–941 (2008)
30. Schmidt, R.: Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 34(3), 276–280 (1986)
31. Scholler, C., et al.: Targetless rotational auto-calibration of radar and camera for intelligent transportation systems. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pp. 3934–3941. IEEE (2019)
32. Shaw, P.J., Rawlins, D.J.: The point-spread function of a confocal microscope: its measurement and use in deconvolution of 3-D data. J. Microsc. 163(2), 151–165 (1991)
33. Skolnik, M.I. (ed.): Radar Handbook, 3rd edn. McGraw-Hill, New York (2008). OCLC: 185095728
34. Sugimoto, S., Tateda, H., Takahashi, H., Okutomi, M.: Obstacle detection using millimeter-wave radar and its visualization on image sequence. In: Proceedings - International Conference on Pattern Recognition, vol. 3, pp. 342–345 (2004). ISBN 0769521282
35. Szeliski, R.: Image processing. In: Szeliski, R. (ed.) Computer Vision: Algorithms and Applications. Texts in Computer Science, pp. 107–190. Springer, Cham (2022)
36. Wang, T., Zheng, N., Xin, J., Ma, Z.: Integrating millimeter wave radar with a monocular vision sensor for on-road obstacle detection applications. Sensors 11(9), 8992–9008 (2011)
37. Wise, E., Persic, J., Grebe, C., Petrovic, I., Kelly, J.: A continuous-time approach for 3D radar-to-camera extrinsic calibration. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13164–13170. IEEE (2021)

Mobile Robots for Teleoperated Radiation Protection Tasks in the Super Proton Synchrotron

David Forkel1,2(B), Enric Cervera2, Raúl Marín2, Eloise Matheson1, and Mario Di Castro1

1 BE-CEM-MRO - European Organization for Nuclear Research (CERN), Espl. des Particules 1, 1211 Meyrin, Switzerland
{david.guenter.forkel,eloise.matheson,mario.di.castro}@cern.ch
2 Jaume I University, Avinguda de Vicent Sos Baynat, s/n, 12006 Castelló de la Plana, Castelló, Spain
{enric.cervera.mateu,raul.marin.prades}@cern.ch
http://www.cern.ch/, http://www.uji.es/

Abstract. This publication presents a robotic solution for teleoperated radiation surveys in the Super Proton Synchrotron (SPS) accelerator at CERN. It begins with an introduction to radiation protection and the current method of conducting surveys in person. The potential of using robotics for these missions is then discussed. The design and selection of components for the robot base are described, as well as the software implementation. The test procedure, including requirements for correct execution, operational steps, and data treatment, is outlined. Results show the advantages of the teleoperated robotic solution, such as improved measurement conditions and reduced radiation dose for staff. Future plans include the automation of the task through the gradual increase of autonomy of the robotic system and the deployment of a dual robot system.

Keywords: Automatic inspection · Hazardous environment · Mobile robot · Robotic survey · Telerobotics

1 Introduction

This publication represents an extended version of the paper "Telerobotic Radiation Protection Tasks in the Super Proton Synchrotron Using Mobile Robots" published in the proceedings of the ICINCO conference 2022 and contains an additional description of the Human Robot Interface (HRI) and the robot control module [7]. Moreover, the results are enriched and the concept of a dual robot system is outlined. In the introduction, the topic of radiation protection and the execution of the radiation survey are discussed. Furthermore, advantages and challenges of mobile robotics for inspection are addressed.

1.1 The Significance of Radiation Protection Measures at CERN

CERN operates the world's largest accelerator complex, providing high energy particle beams to a global community of physicists studying the constituents of matter.


Researchers use sophisticated particle detectors and analysis software to investigate the products of high-energy particle collisions. The complex, located near Geneva, straddles the French-Swiss border and includes the LINAC4, Proton Synchrotron (PS), Super Proton Synchrotron (SPS), Large Hadron Collider (LHC), and four LHC experiments (ATLAS, CMS, LHC-b and ALICE). The injectors successively accelerate particles to higher energy, ultimately resulting in proton beams colliding at the collision points with a center of mass energy of 14 TeV [3]. The operation of accelerators is associated with the loss of beam particles, whether through intentional actions such as collimation, dumping, or collisions, or through accidental means such as degraded beam transmission. These lost particles interact with other particles or matter, resulting in the creation of radioactive isotopes through various nuclear processes. This can lead to the radioactive contamination of accelerator and detector components, tunnel structures, liquids, and gases. Gamma and beta radiation emitted by the decay of these isotopes, known as residual radiation, are the main source of ionizing radiation exposure for workers during repairs and maintenance of accelerators and detectors. The primary goal of radiation protection at CERN is to minimize individual exposure to ionizing radiation, while also reducing the radiological impact on the surrounding environment [8]. The principles of radiation protection legislation, as outlined in Recommendation 60 of the International Commission on Radiological Protection, are the following [9]:

– Justification of the practice: Any practice involving the exposure of persons to ionizing radiation must be justified.
– Optimization of protection: Procedures that result in radiation exposure of individuals must be continuously optimized to reduce the radiation doses received. The ALARA principle, which states that personal and collective doses must always be kept as low as reasonably achievable, applies.
– Dose limits: The legal limits for personal radiation doses must be adhered to.

CERN's radiation safety code integrates these recommendations [2].

1.2 Conducting Radiation Surveys in the Super Proton Synchrotron (SPS)

Radiation surveys of CERN accelerators are an ongoing practice and integral to CERN's approach to keeping radiation doses as low as reasonably achievable (ALARA). These surveys serve two main purposes:

– Measuring the radiation dose rate along the accelerator to assess radiological risks and plan repair and maintenance work.
– Identifying locations of beam losses and optimizing transmission during beam operation.

Radiation surveys in the SPS accelerator have been conducted by CERN personnel over a considerable time [10]. A radiation protection technician in an electrical vehicle drives along the 7 km circumference of the accelerator while continuously measuring the dose rate with a radiation detector at about 70 cm distance from the machine components and at the height of the beam axis. This general survey is then refined by a more detailed survey of the six Long Straight Sections (LSS) of the tunnel, where


radiation protection technicians measure the dose rates at 40 cm distance and on contact of the accelerator components. The general survey results are utilized as an indicator of areas with increased radiation levels and to inform the operational team about the development of beam loss points. Meanwhile, the data from the detailed survey is used for job and dose planning. An example of typical results can be seen in Fig. 1 [4].

Fig. 1. Radiation survey results 2008/09 realised in sextant 1 of the SPS.

1.3 Mobile Robotics for Inspection: Advantages and Challenges

CERN has a long history of utilizing robots for inspections, particularly for handling highly radioactive components like ISOLDE targets [10]. The Train-Inspection-Monorail (TIM) robot shown in Fig. 2, which combines an electrical train with a pre-existing monorail system from the Large Electron-Positron (LEP) Collider, has been developed for the LHC. TIM is regularly used for visual inspections, functional tests of the 3,600 beam-loss monitors, and radiation surveys in the LHC accelerator tunnel. This not only allows for the reduction of accelerator downtime, but also limits the need for personnel to perform these tasks. TIM demonstrates the benefits of using robotic solutions for inspections, particularly in radioactive environments. It is a versatile and effective tool that can be used for various types of inspections [1]. The use of robotic solutions, such as TIM, helps optimize inspection and maintenance tasks in the accelerator complex by increasing efficiency and reducing the need for workers to perform these tasks in areas with high levels of radiation.

2 State of the Art on Teleoperated Robotics for Inspection

Teleoperated robots for exploration and inspection also include the recent developments of lunar rovers, such as the YUTU-2 mission. In February 2022, the mission discovered


Fig. 2. Train Inspection Monorail (TIM).

several small intact spheres of translucent glass, which were inspected as they could contain information about the moon's history, including the composition of the lunar mantle and impacts [13]. Inspecting and interacting with complex environments such as accelerators, underwater facilities, and nuclear plants requires a thorough understanding and specialized knowledge [12]. In some cases, such as at CERN, the necessary knowledge for inspections may not be possessed by a single person, making the use of telerobotic systems necessary [17]. The situation becomes even more challenging when communication channels are limited, a challenge that can be addressed by providing more intelligence to the robot, thus allowing the operator to interact in a more supervised way and reducing the need for communication bandwidth. This is the case with underwater robots, which rely on Visible Light Communication, Radio Frequency modems for short-range communication, and sonar for long-range communication, requiring an appropriate level of autonomy accordingly [16]. In hazardous environments, it is necessary and beneficial to conduct a pre-inspection before deciding on the next steps, such as performing maintenance operations. The most recent research experiments in this field involve the use of multiple robots that can collaborate to recover and transport large objects [12]. Another crucial application of teleoperated systems is in radioactive environments. Robots are increasingly being used in nuclear plants to simplify inspection procedures


and reduce the radiation exposure of personnel. One example is the LAROB underwater robot, which can remotely inspect reactor vessels in nuclear power plants using laser guidance. LAROB helps to carry out mandatory inspections more efficiently while reducing the operator’s workload. The system has the potential to significantly shorten the critical path of reactor vessel inspection [11].

3 A Mobile Robot for Radiation Protection Operations
This section covers the hardware and software implementation, including the development of an omnidirectional platform, the selection of sensors, the mechanical design, as well as the functionality of the Human Robot Interface, the CERN Robotic Framework, the general architecture of the robot, and the control module.
3.1 Hardware
Holonomic Robot Base. CERN has developed a new robot for the SPS radiation survey. The robot features an omnidirectional base, which is made up of four parallel mecanum wheels [15]. The omnidirectional movement is achieved by the passive rollers attached to each wheel. The combination of the rollers and wheels allows for sideways and diagonal movements, as well as rotations around the center of the base. The rubber rollers are protected by the wheel frame. However, the movement behavior may vary depending on the ground conditions, particularly on smooth or slippery surfaces, which can cause slippage of the wheels and rollers, resulting in misalignment in any direction [14]. The design of the robot's frame structure and the arrangement of the mecanum wheels in the longitudinal direction at the bottom corners of the frame offer several benefits. Firstly, this simplifies the design, provides sufficient and equal space for motor mounting, and allows the motor sets to be connected centrally within the frame. Additionally, the structure is compact and allows all required sensors and other equipment to be housed within the frame, with only the wheels extending from it, which improves maneuverability when traversing narrow passages or confined spaces. It also provides redundancy: the robot's tasks can be completed even in the event of a motor failure by controlling the remaining wheels, and adjustment and correction algorithms in the kinematic model can compensate for such errors. However, this locomotion arrangement also has some drawbacks that need to be considered. The slippage of the mecanum wheels causes a positioning error that can affect the odometry. Thus, it is important to use additional sensors such as cameras, LiDARs or IMUs to support the localization of the robot, in addition to the motor encoder values. Another disadvantage is the lower energy efficiency of the mecanum wheels compared to conventional wheels, which results in increased battery consumption. This issue can be addressed by proper battery planning and by reducing the maximum speed to comply with safety regulations within the accelerator complex.
Selection of Sensors and Components. The following devices were chosen to equip the omnidirectional base [15]:

– An inertial measurement unit (IMU) to improve localization accuracy
– A WiFi antenna for local communication and testing
– A radiation sensor to measure the radiation dose rate
– A 4G module for external access from different networks
– 3 high definition cameras for teleoperation
– A Kinova® Jaco 2 robot arm for moving the radiation sensor to optimal measurement positions and for detailed visual inspections with its attached gripper camera.

The base also includes a small form factor PC to handle all processes. The cameras and robot arm are connected to the PC's network interface through an Ethernet hub.
Design of Mechanical Parts. The mechanical design features a robust and rigid structure made of aluminum profiles, which also ensures the protection of the internal components. It has the following characteristics:
– 4 lead acid batteries located on the sides with a capacity of 15 Ah, providing about 4 h of operation depending on the base's velocity and robot arm usage
– A magnetic connector for easy charging
– A support for the radiation sensor attached to the robotic arm's end effector
– Four potential locations for cameras or LiDARs behind the wheels, positioned for optimal field of view
The Measurement and Inspection Robot for Accelerators (MIRA) has been engineered with dimensions of 526 mm × 360 mm × 190 mm to navigate through the cut-out gaps of the security doors separating the various accelerator sectors. The weight of the robot is approximately 45 kg, including the robot arm and all necessary components. The reach of the robot arm including the radiation sensor is roughly 1100 mm [15]. The robot arm has been designed to fold down below the height of the base for improved maneuverability in low-profile environments, such as the secure gate cut-out. The gripper camera serves as a visual guide for the operator. In addition to radiation surveys, the combination of the robot arm and camera allows for a wide range of teleoperation tasks, including visual inspection, leak repair, drilling, component replacement, welding, visual and variable checking, and more, by simply replacing the end effector tool. Figure 3 shows the fully equipped robot.
3.2 Software Development and Integration

Human Robot Interface (HRI). The HRI allows for the homogeneous control of a diverse set of robots, providing features such as a robot configuration editor and the ability to customize control commands in real-time. The software development approach ensures modularity and safety by comprising various modules of the interface, including multi-modality, safety strategies, operator training, and the communications architecture. The interface and the CERN Robotic Framework it belongs to are designed to be adaptable to future missions. It has a high level of usability, learnability, and safety for both non-experts and qualified robotic operators, and the interface has been used successfully in hundreds of real interventions in radioactive and hazardous


Fig. 3. Cross-section of the omnidirectional robot base with Kinova® Jaco 2 robot arm [7].

Fig. 4. Controlling the 9-DOF robot arm on the TIM with the Human Robot Interface (HRI).

environments. The interface is programmed in C# and communicates with the CERN Robotic Framework over TCP. It supports keyboard, spacemouse or gamepad input. For visual orientation, the HRI provides up to 6 parallel camera streams through MJPEG over TCP [12]. For the omnidirectional robot bases, translational and rotational velocity commands are sent. Concerning the velocity control of robot arms, either task space or joint space


can be selected. In both cases, the desired velocity is controlled using a slider and the corresponding command is sent by pressing the mapped keys. The operator receives feedback on the velocity registered by the motor encoders of the robot bases and on the position of the robot arms, as well as on the torque applied to the joints. Additionally, depending on the mission, recorded sensor data, such as the gamma radiation dose rate, temperature, and oxygen level, are displayed and plotted accordingly.
CERN Robotic Framework (CRF). The CERN Robotic Framework is an innovative, modular architecture for robotic inspections and telemanipulation in harsh and semi-structured environments [5]. It encompasses all aspects of robotic interventions at CERN, including specification, operator training, robot and material selection best suited for radiological hazards, and the execution and recovery scenarios of interventions. It can be considered a comprehensive, in-house software solution that is essential for the operation of current interventions and the development of new robotic projects at CERN. Figure 5 illustrates the scope of this framework.

Fig. 5. Modules of the CERN Robotic Framework (CRF) [7].

General Architecture. The general structure of the robotic control system is depicted in Fig. 6.


Fig. 6. Overall architecture of the teleoperated robot for the robotic radiation survey.

The Human Robot Interface (HRI), also called the CERN Robotic GUI, sits at the top of the robotic control system's architecture and allows the operator to control all necessary components of the robot and receive updates on the current status. The operator can control the robot base and arm independently through keyboard or controller inputs, and view the four video streams from the cameras attached to the robot for visual orientation. Additionally, the current radiation dose measured by the radiation sensor attached to the robot arm and the current velocity of the base are displayed. The client connects to a virtual private network via a local network connection, and the server in the robot is also connected to this network, allowing for remote launch of the Robot Arm and Robot Base Communication Points on the server PC. The communication points establish a connection between the Graphical User Interface and the robot using the Transmission Control Protocol (TCP), and start the control loops for the robot base and arm, enabling teleoperation of the mecanum wheels of the robot base and the joints of the Kinova® Jaco 2 through the HRI on the client side. A logging sequence can also be launched, which accesses the built-in odometry of the robot base and the radiation sensor data and stores it locally on the robot's PC.
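To make the communication-point pattern above concrete, the following is a minimal sketch, assuming a JSON-lines message format, an illustrative port number, and a hypothetical apply_to_robot() hook; it is not the CERN Robotic Framework implementation, only an illustration of a TCP endpoint that receives velocity commands and returns feedback.

```python
import json
import socket

HOST, PORT = "0.0.0.0", 5000  # illustrative values, not the real CRF ports

def apply_to_robot(cmd):
    # Placeholder: a real implementation would forward cmd to the base/arm
    # control loop and return encoder-derived velocities and sensor data.
    return {"vx_meas": cmd["vx"], "vy_meas": cmd["vy"], "wz_meas": cmd["wz"],
            "dose_rate_uSv_h": 0.0}

def run_communication_point():
    """Minimal mock of a 'communication point': accept one client (the HRI),
    read velocity commands as JSON lines, and reply with feedback."""
    with socket.create_server((HOST, PORT)) as server:
        conn, _ = server.accept()
        with conn, conn.makefile("rw") as stream:
            for line in stream:
                cmd = json.loads(line)          # e.g. {"vx": 0.2, "vy": 0.0, "wz": 0.1}
                feedback = apply_to_robot(cmd)
                stream.write(json.dumps(feedback) + "\n")
                stream.flush()

if __name__ == "__main__":
    run_communication_point()
```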


Robot Control Module. This module, shown in Fig. 7, describes the detailed flow of the control process in the general architecture described above. The HRI sends the task velocities of the robot base, represented by ξ̇_EB, to the Robot Base Communication Point using TCP. The latter in turn sends the feedback velocities of the platform back to the GUI, which were previously computed via Forward Kinematics from the motor velocities q̇ registered by the encoders. The error e(Ts) between the velocity commands of the HRI and the feedback velocities of the robot serves as input to the Controller, which regulates the output u_ξ̇EB(Ts) through a PID controller. These adjusted velocities are then finally converted into the velocities u_q(Ts) of the 4 motors by applying Inverse Kinematics. Thus, the control loop is closed in the task velocity ξ̇_EB. Regarding the control of the Kinova® robot arm, the velocity commands are sent to the Robot Arm Communication Point in task space ξ̇_EA or joint space q̇, depending on the control mode selected. In contrast to the Robot Base, not only the feedback velocities ξ̇_EA, q̇ of the robot are sent back to the HRI, but also the registered position q of the robot and the torque τ of the individual joints. If task velocity mode is selected, the task velocities ξ̇_EA sent by the HRI are converted to joint velocities p via Inverse Kinematics. If joint velocity mode is selected, the received velocity commands p are directly forwarded without transformation. The error e_p(Ts) between the task velocities and the feedback velocities of the Kinova® Jaco 2 serves as input to the Controller. The output u_q(Ts) generated by the PID controller is finally passed to the robot arm. Thus, the control loop is closed in joint velocities. By using Forward Kinematics, the joint positions q registered by the robot are converted into task positions ξ_EA and passed on to the HRI via the Robot Arm Communication Point. The joint positions q, joint torques τ and joint velocities q̇ can be transferred directly to the Communication Point.

Fig. 7. Control module for robot arms and robot bases.
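The control flow of Fig. 7 can be illustrated for the robot base with a short sketch combining the two ingredients named above: a PID controller acting on the velocity error and the standard mecanum-wheel inverse/forward kinematics. The wheel radius, frame dimensions, gains, and sample values below are illustrative assumptions, not MIRA's actual parameters.

```python
import numpy as np

# Illustrative geometry (not MIRA's actual dimensions)
R = 0.05   # wheel radius [m]
LX = 0.20  # half distance between front and rear axles [m]
LY = 0.15  # half distance between left and right wheels [m]

def inverse_kinematics(vx, vy, wz):
    """Standard mecanum-wheel inverse kinematics: base twist -> 4 wheel speeds [rad/s]."""
    k = LX + LY
    return np.array([vx - vy - k * wz,
                     vx + vy + k * wz,
                     vx + vy - k * wz,
                     vx - vy + k * wz]) / R

def forward_kinematics(wheel_speeds):
    """Pseudo-inverse of the mecanum model: 4 wheel speeds -> measured base twist."""
    w1, w2, w3, w4 = wheel_speeds
    k = LX + LY
    vx = R * (w1 + w2 + w3 + w4) / 4.0
    vy = R * (-w1 + w2 + w3 - w4) / 4.0
    wz = R * (-w1 + w2 - w3 + w4) / (4.0 * k)
    return np.array([vx, vy, wz])

class PID:
    def __init__(self, kp, ki, kd, ts):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = np.zeros(3)
        self.prev_error = np.zeros(3)

    def step(self, error):
        self.integral += error * self.ts
        derivative = (error - self.prev_error) / self.ts
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One control cycle: HRI command -> regulated twist -> wheel velocity commands
pid = PID(kp=1.2, ki=0.5, kd=0.0, ts=0.02)              # illustrative gains
cmd_twist = np.array([0.3, 0.0, 0.1])                    # commanded (vx, vy, wz) from the HRI
meas_twist = forward_kinematics([5.0, 6.2, 5.8, 6.0])    # feedback from the encoders
u_twist = pid.step(cmd_twist - meas_twist)               # regulated task velocity
wheel_cmd = inverse_kinematics(*u_twist)                 # motor velocity commands
```

In the same spirit, the arm-side loop would swap the mecanum model for the manipulator's forward and inverse kinematics.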


4 Experimental Assessment
The experimental assessment includes the testing methodology, the results obtained, and the deployment of the dual robot system.
4.1 Testing Methodology
Baseline Requirements. To ensure a successful radiation survey, several key factors must be taken into account during the execution of the operation. These include the implementation of safety measures, such as avoiding human interaction, as well as the protection of equipment and machinery in the tunnel from any potential damage caused by the robot. Given the experimental nature of the test phase, it was decided to conduct the operation outside of regular working hours, in coordination with the CERN Control Centre (CCC). The CCC provided clearance for the use of a robot for the survey inside the SPS tunnel. Additionally, a team of two operators was deployed to oversee the execution of the mission and provide a comprehensive understanding of the situation. The robot's speed during the survey was limited to 1.5 m/s to ensure the safety of the operation by reducing the risk of collisions with structural elements or equipment in the SPS tunnel. The goal of the robotic radiation survey, similar to the manual survey described in Sect. 1.2, is to measure the radiation dose rate along the SPS machine. Therefore, it is essential to adhere to the requirements outlined in the inspection process. This includes maintaining a distance of 70 cm from the beam axis when taking measurements. Additionally, the survey must be completed within a 2-h time frame in order to minimize disruption to SPS maintenance activities. A key difference between the teleoperated survey and the existing procedure is the need for the robot to navigate through 19 security doors, which are typically opened manually by personnel. These doors must be passed through a cut-out rectangle measuring 30 cm × 40 cm.
Procedural Steps. The mission begins by activating the robotic system. The charging process is interrupted, and the communication points of the robot base and arm are activated. Using the HRI, the robot is moved out of the charging station, and the robot arm is positioned vertically, aligning the radiation sensor with the beam axis. The system is now ready for operation. To start the survey, the closest security gate is approached for precise localization at the beginning of data recording. The security gates are passed by folding the robot arm back, so that the arm is below the height of the robot base, as shown in Fig. 8. The camera on the end effector is used for guidance through the cut-out in the gate. Once the gate is passed, the robot arm is brought back into the operation pose illustrated in Fig. 9. The logging sequence for measuring the radiation dose is then launched and data is saved locally on the PC storage medium. The data set includes the measured radiation dose and the odometry data from the motor encoders integrated in the mecanum wheels. This allows the measured radiation values to be mapped to their positions in the SPS tunnel during data analysis. The measurements are taken from one safety gate to the next, resulting in a total of 19 data sets. The path between the start and end point is completed in one continuous


Fig. 8. Measurement and Inspection Robot for Accelerators (MIRA) passing one of the 19 secure doors [7].

run at a constant speed of 1.5 m/s, while maintaining a maximum distance of 70 cm to the beam axis, using line markings as a reference during the operation via the gripper camera. Special attention must be paid to the connection status between the robot and the 4G repeaters in the SPS tunnel system. By monitoring the ping between the client and server PCs, any connection problems can be identified before a complete loss of control occurs. In rare cases, there may be a temporary loss of control, in which case the robot's speed is automatically set to 0. However, the robot base has no brakes, so the wheels coast before coming to a complete stop. The operator must also take into account the floor conditions, as the floor is often inclined, requiring counter-steering to maintain a constant distance from the magnets. Especially near the 6 access points, the operator must be aware of any cables, maintenance tools, or other objects in the way, and navigate around them. Once the 7 km circumference of the SPS tunnel has been covered and all 19 secure gates have been passed, the robot arm can be brought back into the parking pose, and the base is driven into the charging station. Subsequently, the locally saved measurements are transferred to the client PC, and the charging process is launched. The total operation time measured in recent surveys is between 1 h 40 min and 1 h 55 min, which is within the 2-h limit.
Data Analysis. The post-processing of the data collected from the survey has two main objectives. The first is to link the position of each measurement to its corresponding value and to correct for any positioning errors. The second is to make adjustments to enhance the visualization of the results. The odometry data from the robot base is used to calculate the distance traveled, also known as "distance cumulée" (DCUM), which estimates the circumference of the SPS. As illustrated in Fig. 10, the SPS is divided into 6 sections, where each section is 60° in size and the unit of arc minutes is used for


Fig. 9. MIRA taking radiation measurements [7].

precise positioning. The position of the secure gates is indicated by the yellow dashed elements along the circumference of the SPS. The 19 individual measurements are then consolidated into 6 sets of data, one for each section. The unit of measurement for radiation is microsievert per hour. Along with these adjustments, the position error of the measurements is also corrected. This is necessary because the mecanum wheels tend to slip during acceleration and deceleration, leading to a higher recorded distance than the actual value. However, as the exact location of the security doors is known, the position error can be corrected by uniformly applying the absolute percentage error to the measured odometry of one segment run.
4.2 Robotic Radiation Survey Results
Figure 11 displays the outcomes of the December 2021, January 2022 and November 2022 robotic radiation surveys in sextant 1 of the SPS. The three yellow dashed lines represent the location of the section doors, which are also shown in Fig. 10. As depicted in Fig. 11, the graphs are similar to the 2008/2009 surveys presented in Sect. 1.2. The increased measurement frequency of the radiation sensor at 50 Hz results in a lower overall noise level. A total of approximately 500,000 data points were recorded in each of the teleoperated surveys, which corresponds to one measurement point every 14 mm. The robotic survey registered in November 2022 shows the highest radiation doses. This survey is the first one of the yearly maintenance period


Fig. 10. Data partitioning map of the SPS [7].

Fig. 11. Robotic radiation survey results December 2021/January 2022/November 2022 realized in sextant 1 of the SPS.


(YETS) in 2022/2023, 30 h after the proton beam in the SPS was stopped. Regarding the December 2021 and January 2022 surveys, more than one month and two months, respectively, had passed since the SPS machine was stopped. Thus, the residual radiation was lower during these measurements compared to that of November 2022. The total radiation dose absorbed by the robot during these runs amounts to 9.3 μSv for the December 2021 survey, 6.2 μSv for the January 2022 survey, and 21.4 μSv for the November 2022 survey. Assuming that a manually performed survey takes similarly long and will be replaced by the robotic survey in the future, this total radiation dose received during the survey can be regarded as a saved dose per radiation protection worker. The relative positioning error during the surveys was measured to be 1.9% for the December 2021 survey, 3.1% for the January 2022 survey, and 2.1% for the November 2022 survey. The increased error for the January 2022 survey is due to temporary communication problems that caused interruptions in the operation and resulted in slow-downs and re-accelerations, which increase the slippage of the mecanum wheels. These communication problems could mostly be observed in the arc of the SPS between two access points.
4.3 Dual Robot Setup
At the beginning of the prototype phase in 2021, the first robot and its charging station were placed in BA3, so that a robotic radiation survey could be performed at any time. According to the current protocol, the robotic system is operated during technical stops and the yearly YETS. Functional tests outside of these time intervals are therefore only possible to a very limited extent, since the robot must not leave the charging station during SPS operation. In addition, communication with the robot's PC and the charging station is only possible via the Technical Network of CERN, which allows the connection to the VPN but does not offer internet access. In order to enable software and hardware improvements, it is therefore essential to develop a functional mock-up. In this case, an identical robot was constructed that can be equipped with new sensors such as LiDARs and stereo cameras. This enables continuous project development in hardware and software, which can then be transferred to the active robot in BA3 during the maintenance intervals of the SPS. During the YETS 2022/2023, the mock-up was tested together with the active robot in the tunnel of the SPS, as shown in Fig. 12. In this test, the measured dose rate values were compared and the teleoperation behavior was examined for deviations. After the successful tests, the mock-up itself was placed as an active robot in BA6 with a second charging station. Thus, the simultaneous execution of the robotic radiation survey with two robots is now feasible. Each robot covers 3 sextants and switches charging stations at the end of the operation. With a second operator, the execution time of the survey is thereby halved. Additionally, the use of two robots simultaneously would be beneficial for future detailed surveys of specific areas in the SPS, which could then be executed in parallel to the ring survey.

5 Final Thoughts and Future Directions
In summary, a complete robotic solution for conducting radiation surveys in a teleoperated manner has been developed. The key advantage of this system is the ability to


Fig. 12. Two identical Measurement and Inspection Robots for Accelerators in the arc of the SPS.

conduct the survey without the presence of personnel in the tunnel of the SPS, thereby reducing the radiation dose exposure for staff. Additionally, the system allows for more precise measurements due to optimized conditions. However, there is still potential for further improvement, such as implementing automatic control for measuring distance to the beam axis, which could result in even more accurate data. The greatest potential of the project lies in gradually increasing the autonomy of the robot for this mission. Currently, the operator’s attention is essential for the execution of the task, but the goal for future development is to increase the level of assistance so that the operator’s workload is gradually reduced until only occasional intervention is required [6]. The first step of the gradual increase of autonomy is assisted operation, where repetitive tasks are performed automatically. In this case, the robot arm will be able to perform actions such as achieving the parking position, operation pose, or even folding the robot arm to pass through the security doors autonomously. Furthermore, safety measures such as collision warnings or extended strategies in case of communication failures will be implemented. Level 2 describes an autopilot that can independently perform certain tasks under optimal conditions. Applied to this project, this means that the robot will be able to navigate autonomously to the start and end points of the measurements, but more complex processes such as crossing security gates and navigating through environments with obstacles will not be included yet. Therefore, the operator will still be required to monitor the situation and be able to intervene at all times. Level 3 describes a completely autonomous execution of all steps of the operation. The operator is only informed in exceptional cases and critical situations and is given a buffer time to react appropriately to the situation. This stage of development would include all tasks of the survey, from mission preparation, measurement acquisition, security gate crossing to the successful


completion of the survey, and deactivation of the robot. Additionally, security strategies will be developed to take effect in case of execution errors; they will either correct the problem themselves or give the operator time to intervene.

References
1. Castro, M.D., Tambutti, M.L.B., Ferre, M., Losito, R., Lunghi, G., Masi, A.: i-TIM: a robotic system for safety, measurements, inspection and maintenance in harsh environments. In: 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics, SSRR 2018, Philadelphia, PA, USA, 6–8 August 2018, pp. 1–6. IEEE (2018). https://doi.org/10.1109/SSRR.2018.8468661
2. CERN: Code de sécurité (safety code). CERN, November 2006
3. CERN: Accelerators, February 2022. https://home.cern/science/accelerators. Accessed 28 Feb 2022
4. Forkel-Wirth, D., Silari, M. (eds.): Radiation protection group annual report 2009 (2010). https://cds.cern.ch/record/2221663/files/AnnRep-RP-2009.pdf. Accessed 28 Feb 2022
5. Di Castro, M., Ferre, M., Masi, A.: CERNTAURO: a modular architecture for robotic inspection and telemanipulation in harsh and semi-structured environments. IEEE Access 6, 37506–37522 (2018). https://doi.org/10.1109/ACCESS.2018.2849572
6. Petit, F.: The next step in autonomous driving, July 2020. https://www.blickfeld.com/blog/the-next-step-in-autonomous-driving. Accessed 28 Feb 2022
7. Forkel, D., Cervera, E., Marín, R., Matheson, E., Castro, M.D.: Telerobotic radiation protection tasks in the super proton synchrotron using mobile robots. In: Proceedings of the 19th International Conference on Informatics in Control, Automation and Robotics, Volume 1: ICINCO, pp. 451–458. INSTICC, SciTePress (2022). https://doi.org/10.5220/0011276600003271
8. Forkel-Wirth, D., et al.: Radiation protection at CERN. CERN, March 2013. https://doi.org/10.5170/CERN-2013-001.415
9. ICRP: 1990 recommendations of the International Commission on Radiological Protection. ICRP Publication 60 (users edition). ICRP (1991)
10. Kershaw, K., et al.: Remote inspection, measurement and handling for maintenance and operation at CERN. Int. J. Adv. Robot. Syst. 10, 1 (2013). https://doi.org/10.5772/56849
11. Kim, J.H., Lee, J.C., Choi, Y.R.: LAROB: laser-guided underwater mobile robot for reactor vessel inspection. IEEE/ASME Trans. Mechatron. 19, 1–10 (2014). https://doi.org/10.1109/TMECH.2013.2276889
12. Lunghi, G., Marin, R., Di Castro, M., Masi, A., Sanz, P.J.: Multimodal human-robot interface for accessible remote robotic interventions in hazardous environments. IEEE Access 7, 127290–127319 (2019). https://doi.org/10.1109/ACCESS.2019.2939493
13. Starr, M.: Lunar rover discovers mysterious glass spheres on the far side of the moon, February 2022. https://www.sciencealert.com/the-moon-has-glass-balls. Accessed 28 Feb 2022
14. Park, J., Kim, S., Kim, J., Kim, S.: Driving control of mobile robot with mecanum wheel using fuzzy inference system. In: ICCAS 2010, pp. 2519–2523, October 2010. https://doi.org/10.1109/ICCAS.2010.5670241
15. Prados Sesmero, C., Buonocore, L.R., Di Castro, M.: Omnidirectional robotic platform for surveillance of particle accelerator environments with limited space areas. Appl. Sci. 11(14) (2021). https://doi.org/10.3390/app11146631


16. Rubino, E.M., et al.: Progressive image compression and transmission with region of interest in underwater robotics. In: OCEANS 2017 - Aberdeen, pp. 1–9, June 2017. https://doi.org/10.1109/OCEANSE.2017.8084999
17. Veiga Almagro, C., et al.: Cooperative and multimodal capabilities enhancement in the CERNTAURO human-robot interface for hazardous and underwater scenarios. Appl. Sci. 10(17) (2020). https://doi.org/10.3390/app10176144

A Review of Classical and Learning Based Approaches in Task and Motion Planning
Kai Zhang1,2(B), Eric Lucet1, Julien Alexandre Dit Sandretto2, Selma Kchir1, and David Filliat2,3
1 Université Paris-Saclay, CEA, List, 91120 Palaiseau, France
[email protected]
2 U2IS, ENSTA Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France
3 FLOWERS, INRIA, ENSTA Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France

Abstract. Robots are widely used for many tedious and simple tasks. But, with the advance of technology, they are expected to work in more complex environments and participate in more challenging tasks. Correspondingly, more intelligent and robust algorithms are required. As a domain that has been explored for decades, task and motion planning (TAMP) methods have been applied in various applications and have achieved important results, while still being developed, particularly through the integration of more machine learning approaches. This paper summarizes the development of TAMP, presenting its background, popular methods, application environments, and limitations. In particular, it compares different simulation environments and points out their advantages and disadvantages. In addition, the existing methods are categorized by their contributions and applications, intending to draw a clear picture for beginners.
Keywords: Task and motion planning · Simulation environment · Review

1 Introduction
Robotics has been explored for decades to assist, complement or replace humans. Since robots never feel tired and make fewer mistakes in repetitive work, they have the potential to replace humans in certain jobs. In industrial production, robots have been widely used for repetitive and well-defined tasks such as assembling cars and moving goods on assembly lines, clearly increasing productivity. To bring the benefit of robots to daily life, many works explore the adaptation of robots from industrial environments to daily living environments. However, unlike the industrial environment, which is mostly structured and static, the daily living environment is full of change, which poses new challenges for robot control algorithms. In addition, robots take on more complex tasks, such as household chores or delivery tasks. Therefore, more efficient algorithms need to be proposed to enable robots to understand the environment and behave accordingly. To complete a complicated task, also called a long-horizon task, a popular solution is to decompose it into simple subtasks and solve them one by one. This type of method can be summarized in two steps: task planning and motion planning.


From the viewpoint of task planning, complicated tasks can be divided into several fundamental abstract tasks, like navigation, grasping, pushing, pulling, etc. This task-decomposing process is independent of the robot type and is based on the definition of abstract short-horizon tasks. For example, the "open door" task can be considered as a sequence of abstract actions, like grasping the door handle, then opening the door. Therefore, the difficulties of task planning lie in the definition of abstract actions and the decomposing strategy. Given a subtask, motion planning converts it into executable control parameters for the robot to accomplish the task. For example, in the subtask of grasping the door handle, taking the handle position and the robot position, an inverse kinematic method calculates the configuration of each joint of the arm so that the gripper can reach the handle successfully. Since the robot is not alone in the environment, calculating a feasible moving trajectory is challenging. In summary, task planning and motion planning share some similarities but can also be distinguished by their differences. They play similar roles in long-horizon task solving, in the sense that both of them convert the task from a higher and harder level to a lower and easier level. As for the differences, task planning operates in a discrete space and is often independent of the robot, while motion planning operates in either a discrete or a continuous space and varies among robots. For example, the planning of a navigation trajectory can be continuous, but the planning of start and stop positions is discrete.
This paper is an extension of a previous conference paper [59], to which we added several recent advancements in task and motion planning (TAMP) methods. In addition, a detailed explanation of the comparison criteria for simulation environments is provided to help with choosing the most suitable one. We also enriched the challenge section, especially on situational mapping, since it is crucial for future TAMP methods to take more context into account. First, task planning and motion planning are introduced separately, along with an explanation of some popular tasks, in Sect. 2. Then, the TAMP methods are introduced and compared in three different categories in Sect. 3. Section 4 describes some public simulation environments and tools used for TAMP. Before the summary in Sect. 6, the current challenges are detailed based on current TAMP advances in Sect. 5.

2 Background
2.1 Task Planning

Task planning aims to divide a long-horizon task into several short-horizon subtasks, which makes the original difficult task easier to solve. For example, in a complex washing task [19], given the initial and final states, task planning generates a sequence of intermediate states and corresponding abstract actions (move, wash, store, ...). By executing the abstract actions, the robot can transit from the initial state to the final state. A more formal definition can be expressed as follows: given as input the initial state S0 and the final state Sg, task planning produces intermediate states Si, i = 1, ..., g − 1, where each Si can be reached from Si−1 through a transition action Ai, i = 1, ..., g.
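As a minimal, concrete reading of this definition, the sketch below performs a breadth-first forward search over STRIPS-like operators (preconditions, add and delete lists represented as sets of facts); the move-only domain and its predicate names are invented for illustration and anticipate the Move operator shown next.

```python
from collections import deque

# Hypothetical STRIPS-style operators: name, preconditions, add list, delete list
OPERATORS = [
    {"name": f"Move({a},{b})",
     "pre": {f"atPos({a})"},
     "add": {f"atPos({b})"},
     "del": {f"atPos({a})"}}
    for a in ("P1", "P2", "P3") for b in ("P1", "P2", "P3") if a != b
]

def plan(initial, goal):
    """Breadth-first forward search: returns a list of actions A_1..A_g
    turning state S_0 into a state satisfying S_g, or None."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:                       # every goal fact holds
            return actions
        for op in OPERATORS:
            if op["pre"] <= state:              # preconditions satisfied
                nxt = frozenset((state - op["del"]) | op["add"])
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, actions + [op["name"]]))
    return None

print(plan({"atPos(P1)"}, {"atPos(P3)"}))  # e.g. ['Move(P1,P3)']
```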


In most cases, the transition actions are abstract and independent of the robot. A popular description format is the STRIPS-style form, which contains at least the name, precondition and effect of the action. For example, the action of moving from place P1 to place P2 can be described as:

Move(P1,P2)
  Pre: (atPos(P1)) (not(atPos(P2)))
  Eff: (atPos(P2)) (not(atPos(P1)))

In general, an action Ai can be either discrete or continuous. A discrete action space is made of a finite number of actions, such as turning left and turning right. In contrast, a continuous action space is infinite and each action is set by a parameter: for example, turning left 30° is different from turning left 30.1°. Compared to a discrete action space, it takes more time to find an action sequence in a continuous action space. To solve the task planning problem, several planning methods have been proposed, such as behavior tree methods [19], heuristic search methods [10], operator planning methods, etc. A more detailed overview and discussion can be found in the introductory book [31]. Thanks to their simplicity and efficiency, they are widely used in decision-making games like chess, the Tower of Hanoi, etc. Instead of a hand-crafted approach, reinforcement learning (RL) methods learn strategies to map observations of the current situation to the next subgoals by maximizing rewards. Popular RL methods include DQN [38] for discrete actions and DDPG [35] or Soft Actor-Critic (SAC) [16] for continuous actions. By default, these methods learn the solution of a single task, hence they cannot be used as general task planners. But in planning problems, the RL approach can be applied to learn goal-conditioned policies, which take in the observation and goal information and output appropriate next subgoals as actions. For example, to solve a 2D navigation task, [6] uses goal-conditioned RL to generate a sequence of intermediate waypoints that the robot can follow to reach the goal. Such policies can also be trained using imitation learning, and their generalization to large problems is considered in [51], which proposes Action Schema Networks to solve probabilistic and classical planning problems by mimicking the policy generated by traditional planners.
2.2 Motion Planning
Motion planning can be thought of as a bridge between low-level control parameters and high-level tasks. Given a goal position corresponding to a high-level task, the motion planning algorithm generates a trajectory or a number of control parameters, such as the robot's angular and linear speed, to guide it to its destination in the navigation subtask. For motion planning, a number of algorithms have been developed, such as inverse-kinematic approaches for manipulation tasks or shortest path planning methods for navigation tasks. Ghallab's planning book [15] provides a more thorough introduction and overview of the existing techniques. In addition to the classical approaches, learning techniques are also employed. For example, [61] proposes a reinforcement learning technique for visually guided navigation in enclosed spaces. The motion parameters are generated by a network to govern


the movement of the robots based on the visual observations. More approaches based on reinforcement learning can be found in [48]. The cost function to be optimized is another dimension of motion planning systems. Different from most approaches that have a single objective, such as the distance to obstacles, motion planners can also develop plans based on the specific context of the local environment. For instance, a context-aware costmap [36] can be created by combining numerous semantic layers, each of which defines a different sort of constraint, resulting from mobile or static obstacles and from areas with specific danger or expected behavior. A practical and secure trajectory can then be produced by planning over the context-aware costmap. For instance, a proactive obstacle avoidance technique is presented in [42], where the robot tends to steer clear of the area behind people. Other examples include situations where the robot must drive on the correct side of the road or, when faced with stairs, choose the wheelchair-accessible ramp.
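A reduced sketch of the layered-costmap idea follows; the grid size, the three semantic layers, and the cell-wise maximum combination are illustrative choices and not the construction used in [36].

```python
import numpy as np

H, W = 50, 50  # illustrative grid size

static_obstacles = np.zeros((H, W)); static_obstacles[20:30, 25] = 1.0      # a wall
person_backspace = np.zeros((H, W)); person_backspace[10:14, 10:14] = 0.8   # keep clear of the area behind a person
wrong_road_side  = np.zeros((H, W)); wrong_road_side[:, :W // 2] = 0.3      # prefer the right-hand side

# Combine semantic layers into one context-aware costmap (here: cell-wise maximum)
costmap = np.maximum.reduce([static_obstacles, person_backspace, wrong_road_side])

# A planner would then search for a path minimising accumulated cost over this grid,
# e.g. with A* or any of the classical methods mentioned above.
```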

2.3 Task and Motion Planning Objectives

Fig. 1. Demonstration of tasks (Figure is from [59]): (a) Rearrangement task. The robot needs to move the green box from its start pose to the goal region indicated by the green circle [25]. (b) Navigation among movable obstacles. The robot needs to remove the green obstacles before pushing the red boxes to the kitchen region [24]. (c) Pick-Place-Move task. The robot should pick the blue cube and place it in the bottom left container [9].

Although there are many complex global objectives for TAMP in the human environment, the majority of them can be viewed as a mixture of simpler prototypical tasks. We think that if TAMP approaches could successfully handle these fundamental problems at scale, they could easily be applied to address more challenging tasks. We present three of these fundamental tasks below:
– Rearrangement (Re). Figure 1(a) illustrates how the robot must maneuver a number of obstacles in order to avoid colliding with them while reaching for the desired target object. Collaboration between multiple robots can be necessary for the rearrangement task, which commonly happens when a robot's arm is physically unable to access certain areas of the environment [4]. In rearrangement tasks, we typically concentrate on planning arm actions.


– Navigation among movable obstacles (NAMO). Compared to pure navigation tasks, the robot may interact with the surroundings while navigating to the desired point. For instance, it can actively remove movable obstacles in order to make a blocked trajectory feasible. Figure 1(b) provides an illustration where the robot should clear the obstructions in the hallway before entering the kitchen. Compared to rearrangement, plans typically intertwine arm and mobile base actions.
– Pick-Place-Move (PPM) task. The robot's basic actions are to pick up an object, move it, and put it in a container, as seen in Fig. 1(c). PPM can also be used for assembly and/or disassembly jobs, which should take the order of the manipulated objects into account. Here again, plans typically intertwine arm and mobile base actions.
In all three fundamental tasks, task planning usually entails choosing which object to grasp or manipulate and where to put it, while motion planning generates arm and mobile base motions to execute these decisions.

3 Task and Motion Planning Methods
After introducing task planning and motion planning separately, we now focus on methods that perform both simultaneously. This is challenging because the two modules must be designed carefully to guarantee the effectiveness of the system. For example, the task planner should generate feasible plans for the motion planner and adjust its next output according to the results of the motion planner. The connection between these two planners contributes to the intelligent behaviours of the robot. We group the main TAMP methods into three categories, namely classical methods, which were historically developed first using various algorithmic approaches, learning methods, which try to tackle the task using purely data-based algorithms, and hybrid methods, which combine the previous two techniques in various ways.
3.1 Classical Methods
The classical methods are mainly divided into two categories, sampling-based and optimization-based methods. Some examples can be found in Fig. 2. The widely-used sampling-based methods usually include symbolic operators, which specify the prerequisites of applying an action and its effect. A comprehensive review of sampling methods and optimization methods to solve TAMP problems can be found in [8]. Given a long-horizon task with a description of the initial and final states, sampling-based methods randomly sample states from the continuous state space in order to find valuable intermediate states, like key frames of a video. Afterward, searching approaches are used to identify a series of workable transition operators between the sampled intermediate states, and the sampler can produce new samples if an existing one does not result in a feasible plan. The frequently adopted searching methods include heuristic search [52], forward search [19], or backward search (a more complete overview of searching methods can be found in [15]). From the resulting sequence of intermediate states and operations between them, classical motion planning methods, such as A* [17],


Fig. 2. TAMP applications of using classical methods. The left figure is from [52] where robot navigates based on sampling methods. The right figure is from [50] where robot builds the highest tower based on optimization methods.

RRT Connect [29] for the robot base or inverse kinematics for the robot arm [13, 47], are applied to control the robot to follow the plan. However, sampling-based methods do not perform well on some particular problems. First, they cannot generally identify infeasible problem instances and will therefore not terminate in such cases. Second, the sampling procedure can only be used on the space that has already been investigated; therefore, they are unable to provide solutions for problems that require identifying values from unknown space [11]. For example, in a partial observation case, the robot can only plan a path to a waypoint within the range of observation. Finally, sampling techniques often fail when the task description is not specific enough. For example, in a pouring task, if we ask the robot to pour as much milk as possible into the cup without indicating the amount to reach, the sampling planner will fail since it cannot find an ultimate state. Optimization-based approaches present characteristics that address these shortcomings of the sampling methods. Instead of terminating on a specific state, the objective is represented as a cost function along the temporal axis. Then, an optimization strategy is utilized to minimize the cost with respect to constraints and finally outputs a workable solution. Benefiting from the integrated time axis in the objective function, optimization methods are appropriate for solving problems in a continuous action space. For example, in a manipulation problem where a robot should assemble cylinders and plates on a table to build the highest possible stable tower [50], the action sequences are generated by a straightforward symbolic planning approach, but the optimal final and intermediate placements of each component are discovered through optimization. In addition, there are also some TAMP methods based on specific hand-crafted strategies. For example, [37] presents an active path cleaning algorithm for the NAMO task,


which integrates obstacle classification, collision detection, local environment reconstruction and interaction with obstacles. To handle the situation where the obstacle is unknown, an affordance-based method [52] is developed to help the robot identify, through interaction, whether the obstacle is movable.
3.2 Learning Based Methods
In learning based methods, the robot acquires skills from numerous trials, including success and failure cases, instead of relying on a manually designed model associated with sampling or optimization. As one of the most popular frameworks, RL learns to produce the plans sequentially through a policy mapping the current observation of the environment to an action [4]. To solve a TAMP problem, a reward function is defined, usually giving a positive reward when the robot finishes the task and a negative reward when it fails. Then, the learning mechanism behind RL finds the policy that maximizes the sum of rewards. One important difficulty of RL for long TAMP problems is exploring the environment efficiently, as taking random actions requires a prohibitive number of trials until a solution can be found [34]. Therefore, hierarchical RL (HRL) has been proposed to solve this sparse reward problem by decomposing complicated tasks into simpler subtasks, so that the robot can get positive rewards easily and accomplish the difficult tasks step by step [1]. Some examples can be found in Fig. 3.

Fig. 3. TAMP applications of using learning-based methods. The left figure is from [18] where the humanoid robot passes a virtual gate. Two policies including subgoal generation and balance control are optimized separately. The right figure is from [34] where robot navigates to blue goal point. The subgoal generation and motion control modules are optimized jointly.
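The sparse-reward setting discussed above can be made concrete with a tiny goal-conditioned, tabular Q-learning loop on a one-dimensional corridor. Real TAMP systems rely on deep networks (DQN, SAC, ...) over far richer observations; the corridor size, hyper-parameters, and reward below are invented purely for illustration.

```python
import random
from collections import defaultdict

N = 10                      # corridor cells 0..N-1
ACTIONS = (-1, +1)          # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.2
Q = defaultdict(float)      # key: (state, goal, action)

def step(state, action):
    return min(max(state + action, 0), N - 1)

for episode in range(2000):
    goal = random.randrange(N)
    state = random.randrange(N)
    for _ in range(2 * N):
        # epsilon-greedy action selection conditioned on (state, goal)
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, goal, a)])
        nxt = step(state, action)
        reward = 1.0 if nxt == goal else 0.0          # sparse reward: only at the goal
        best_next = max(Q[(nxt, goal, a)] for a in ACTIONS)
        Q[(state, goal, action)] += ALPHA * (reward + GAMMA * best_next
                                             - Q[(state, goal, action)])
        state = nxt
        if reward > 0:
            break
```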

An intuitive idea of HRL is to design and train two networks, one for high level task generation and another one for primitive motion control. In [18], learning sensorimotor primitives is accomplished by training a low-level network with access to proprioceptive sensors on simple tasks. Then, this pre-trained module is fixed and connected to a


high-level network, with access to all sensors, which drives behavior by modulating the inputs to the low-level network. Similarly, in [30], a high-level module learns the strategy of generating subgoals and a low-level module learns the actions to complete each subgoal. Due to the limitations of previous approaches in generalization capabilities, a task-independent approach is designed by reformulating the task description [40]: absolute observations are replaced with relative observations, such as position and distance, to reduce the dependence on the task environment. Additionally, they use an off-policy RL technique, which requires less interactive training data, to simplify training in a real-world setting. Training high-level strategies and low-level actions separately loses the opportunity for joint optimization, possibly failing to converge to the optimal strategy. Therefore, in [32], the authors describe a joint training strategy that learns the policy at three levels for a navigation task. The top level generates subtasks from the current state and the goal state, while the middle level decomposes the subtasks into directly feasible goals and the lowest level generates action control parameters to reach the goal. By maximizing the success rate, the three levels are optimized jointly. However, in a NAMO task, given a final location, the higher-level subgoal creation network must generate goals not only for the robot base but also for the arm to interact with the environment. Therefore, a HRL method is proposed in [34] to generate heterogeneous subgoals for the robot to interact with obstacles during navigation. To find appropriate interactions with various obstacles, a neural interaction engine [58] that predicts the effects of actions is integrated into a policy-generating RL network. With this network, the most promising actions are picked by comparing the effects of different actions on the success rate of completing the task in the future. Although these learning methods achieve satisfactory results in simulated environments, transferring from simulated environments to real applications is difficult because the trained models usually cannot be used directly in real scenarios. In most cases, they need to be retrained in the application environment. For example, in the solution proposed in [34], the trained models map sensor data to actions. When migrating this model to the real environment, the mapping between sensor data and actions needs to be re-established due to the large differences between the virtual and real environments. In addition, real sensor data is noisy. Even at the same observation location and in the same observation environment, the sensor data observed twice may differ, and this variation may lead to erroneous actions of the model. It is also important to note that training data in real environments is expensive and the effectiveness of learning-based methods is affected by the required size of the training dataset. Therefore, there are very few real-world applications that rely only on learning methods.
3.3 Hybrid Methods

Although both classical and learning-based methods can solve some TAMP tasks, they both have limitations. For example, the symbolic operators used in sampling-based methods are usually designed manually, which requires expertise and is time-consuming. In addition, the sampling approach is inefficient because of environmental constraints, where only a small fraction of a huge space may meet the requirements, but random sampling requires verification of thousands of sample points to find a suitable


location. Learning methods avoid manual operations, but they provide less freedom to add additional constraints such as zero collision tolerance. The obtained models have poor migration capabilities because the mappings from observations to behaviors are closely tied to the experimental environment. Moreover, the transfer of learning methods from simulated to real environments proves to be difficult due to the high cost of constructing training datasets and the unrealistic representation of the environment, which can be caused by sensor noise, varying illumination conditions, noisy action execution, etc. Therefore, some researchers adopt a hybrid strategy, using a classical TAMP solver structure that includes some learning components. This includes learning to generate symbolic operators from a dataset [28, 41, 46], learning to guide the operator search [22, 24], and learning to generate feasible subgoals [55]. Some instances can be found in Fig. 4.

Fig. 4. TAMP applications of using hybrid methods. Figure (a) is from [46] where the symbolic operators are learned from a dataset before applying TAMP methods. (b) is from [24] where the RL is leveraged to order the priority of the manipulated objects. (c) [55] integrates the RL method for subgoal generation with classical motion control algorithms to solve a NAMO task.

Learn to Generate Symbolic Operators. Learning symbolic operators from a dataset provides the basic elements, i.e., primitive actions, for task planning. With the learned operators, a conventional tool such as PDDL [14] or its extensions [12, 57] is applied to search for feasible plans consisting of primitive actions. Then, a motion planning algorithm is used to directly convert the primitive actions into executable control parameters. A supervised learning strategy is introduced in [41] to learn symbolic operators from a training dataset. Each training instance contains the current state, an action, and the effect after applying the action. The action model is found by maximizing the likelihood of the action effect while taking into account a complexity penalty. To reduce the requirement of expensive training datasets, a learning-from-experience approach [28] first applies the action to the agent and records the resulting states. Then, it converts the continuous states into decision trees and finally into symbolic operators. Similarly, in [46], the original dataset is created from demonstrations and the abstract symbolic operators are generated by statistical clustering methods and used for further task planning.
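In very reduced form, operator learning can be sketched as follows: from (state, action, next state) triples, the add and delete effects are taken as the facts that consistently appear or disappear, and the preconditions as the facts present in every state where the action was applied. This is only a toy approximation of the cited approaches, with invented predicates.

```python
# Hypothetical demonstration data: (state_before, action_name, state_after)
TRANSITIONS = [
    ({"handEmpty", "onTable(cup)"}, "pick(cup)", {"holding(cup)"}),
    ({"handEmpty", "onTable(cup)", "onTable(box)"}, "pick(cup)",
     {"holding(cup)", "onTable(box)"}),
]

def learn_operator(action, transitions):
    """Induce a STRIPS-like operator for one action from observed transitions."""
    samples = [(s, s2) for (s, a, s2) in transitions if a == action]
    pre = set.intersection(*(s for s, _ in samples))              # facts always true before
    add = set.intersection(*((s2 - s) for s, s2 in samples))      # facts always gained
    delete = set.intersection(*((s - s2) for s, s2 in samples))   # facts always lost
    return {"name": action, "pre": pre, "add": add, "del": delete}

print(learn_operator("pick(cup)", TRANSITIONS))
# {'name': 'pick(cup)', 'pre': {'handEmpty', 'onTable(cup)'},
#  'add': {'holding(cup)'}, 'del': {'handEmpty', 'onTable(cup)'}}
```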


Learn to Search. In a problem containing a large number of actions and states, classical search methods are less efficient because the search space is too large. Reinforcement learning methods provide a way to learn the search strategy from experience, thus avoiding traversing the entire space to find a solution [2]. In [24], a graph is used as a representation of the search space due to its scalability. Nodes are abstract actions, while edges are transition priorities. A Q-value function is learned from the training dataset using a neural network to rank the abstract actions and calculate the priorities to guide an efficient search. In addition to searching in the discrete space, a generative model provides multiple feasible candidates in the continuous action space to avoid being blocked by an infeasible solution [23]. Similarly, a model is applied to the dataset to learn the probability of success [53, 54]. Then, in the same domain but a new scenario, it predicts the success rate of each action. By selecting the actions with higher success rates, the search space is greatly reduced. The success rate is also used to plan subtasks: in [60], a model is trained to generate a success rate map with respect to navigation subgoals. A threshold is then set to constrain the search space of subgoals in the environment, thus speeding up the task planning process.
Learn to Decompose. In addition to operator-based methods, some direct subtask generation methods based on reinforcement learning have been proposed. These methods propose directly feasible subtasks, for which classical motion planning methods are employed in order to control the robot. In a NAMO task, [55] uses the SAC algorithm [16] to generate subtasks for the robot's arm and base. The algorithm uses a distance-reduction reward and a success reward to train the network to generate subtasks, in conjunction with traditional motion planning methods, such as RRT Connect [29] and inverse kinematic methods, to verify the feasibility of the subtasks.
In summary, hybrid approaches typically apply a learning-based approach to task planning, or part of the task planning process, and then employ classical motion planning algorithms to generate control parameters. The use of classical motion planning in these strategies offers better transferability to the real world than pure learning algorithms, while the learning-based part provides more efficient solutions than classical task planning methods. Finally, Table 1 provides an overview of the application of the presented classical, learning based and hybrid methods on the fundamental tasks proposed in Sect. 2.3.
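In its simplest form, the "learn to search" idea amounts to replacing a planner's fixed expansion order with a learned scoring function. The sketch below is a best-first search whose priority comes from a stand-in score() function; in the cited works this score would be a trained Q-value or success-rate model, here it is only a hand-written placeholder.

```python
import heapq
import itertools

def score(state, action):
    """Stand-in for a learned Q-value / success-rate model (higher = more promising)."""
    return 0.0

def guided_search(initial, goal_test, successors):
    """Best-first search expanding the most promising (state, action) pairs first."""
    counter = itertools.count()               # tie-breaker so heapq never compares states
    frontier = [(0.0, next(counter), initial, [])]
    visited = set()
    while frontier:
        _, _, state, plan = heapq.heappop(frontier)
        if goal_test(state):
            return plan
        if state in visited:
            continue
        visited.add(state)
        for action, nxt in successors(state):
            heapq.heappush(frontier,
                           (-score(state, action), next(counter), nxt, plan + [action]))
    return None

# Example usage on a trivial counter domain: reach 3 from 0 using '+1' actions
print(guided_search(0, lambda s: s == 3, lambda s: [("+1", s + 1)]))
```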

4 Benchmarks and Tools
The validation of algorithm performance is an important but difficult step when developing robotics algorithms. Testing the approaches directly in a real environment is the most straightforward option, but it is extremely time-consuming, expensive, unstable, and potentially unsafe. Therefore, it is common to first validate in a virtual environment and then migrate to real conditions if expectations are met. To facilitate experimental validation, a number of interactive simulation environments have been proposed for a wide variety of tasks. In this section, we compare several simulation environments that are designed for navigation and manipulation tasks: iGibson2 [33], AI2THOR [27], TDW [7], Sapien [56], Habitat2 [49] and VirtualHome [43]. Generic simulators such as Gazebo [26] and Bullet [3] are not considered, since they are not designed for task and motion planning but for more general usage. Instead of focusing on the low-level view of which type of rendering they use, we focus on their usability for TAMP tasks and their extensibility to real environments. We describe below our selected comparison criteria; the results of our evaluation of each environment are presented in Table 2.

Table 2. Comparison among different interactive simulation environments. Table is from [59].

Criterion             | iGibson2             | AI2THOR   | TDW  | Sapien | Habitat2 | VirtualHome
Provided environment  | 15 homes (108 rooms) | 120 rooms | –    | ✓      | ✓        | build from 8 rooms
Interactive objects   | 1217                 | 609       | 200  | 2346   | ✓        | 308
ROS support           | ✓                    | ×         | ×    | ✓      | ✓        | ×
Uncertainty support   | ✓                    | ×         | ×    | ×      | ✓        | ×
Supported tasks: Re   | +                    | +++++     | ++++ | ++++   | ++++     | ++++
Supported tasks: NAMO | +++++                | ++        | +++  | +++    | ++       | ++++
Supported tasks: PPM  | +++++                | +++++     | ++++ | ++++   | +++++    | +++++
Speed                 | ++ (GPU)             | ++        | ++   | +++    | ++++     | ++
Sensors               | RGBD, Lidar          | RGBD      | RGBD | RGBD   | RGBD     | RGBD

– Provided Environment and Interactive Objects. A simulation environment with embedded scenes and interactive objects is easier to use, since constructing the scenes is difficult for a beginner. A large number of interactive objects gives users more freedom to adapt the environment to the tasks.
– ROS Support. Robot Operating System (ROS) [44] is a generic and widely-used framework for building robot applications. It offers a standard software platform to developers across industries that carries them from research and prototyping all the way through to deployment and production. With ROS support, an algorithm can easily be transferred from the simulation environment to real applications. Hence, an environment supporting ROS has better transferability.
– Uncertainty Support. In the real environment, the sensor data always contains noise and uncertainty. Introducing uncertainty in the simulation environment provides more realistic results for the algorithms.
– Supported Tasks. Here we evaluate the usability of each environment for the three fundamental tasks described in Sect. 2.3. The score is given based on the provided environment, types of interaction, control interfaces, etc. A higher score means that it is easier to apply the environment to the task.



– Speed. Rendering speed is important in simulation experiments. All the environments use only the CPU for rendering, except iGibson2, which can also use the GPU.
– Sensors. We list which types of sensors the simulation environment supports (RGBD: color and depth camera).

5 Challenges
Although TAMP methods have been explored for decades, they still lack robustness and generalization capabilities and face limitations in practical applications. In this section, we present several potential areas for improvement.

5.1 Observation Uncertainty

Uncertainty in observations is usually caused by sensor noise, which is unavoidable in practical applications. While this is a very general concern, two solutions target this problem specifically: (a) modeling the noise and reducing it through multiple observations [20], and (b) using learning methods to map the noisy data directly to actions [4].

An operator-based TAMP approach is proposed in [20] to address observation uncertainty in the PPM task. This uncertainty arises in the localization of the robot and of the target object. The authors reduce the noise by modeling the localization error with a Gaussian model after multiple observations of the object from the robot. A noteworthy point is that they model the relative difference between the two objects rather than focusing on a single object, because the latter may have systematic errors. In [4], raw sensor data is fed directly to a neural network, aiming to map raw observations into action sequences through reward optimization. This approach is simple because it does not require a complex modeling process, but a large number of training scenarios, 30,000 in their experiments, is required for the neural network to withstand the effects of random errors on the observations.

In summary, although these approaches are effective in reducing the effect of observation errors on actions, the corresponding experimental settings remain quite simple and the robot has a large free space to operate in. This raises the question of whether such methods are practical in the limited and complex environments where robots need to perform household tasks. Making multiple observations from multiple angles in a small environment is not easily achievable, and the learning-based methods are too inefficient. There are therefore still many challenges in the integration of uncertainty-reduction actions in TAMP problems.
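A minimal numerical sketch of idea (a), with made-up numbers and without reproducing the actual model of [20], shows why repeated observations help: averaging N independent observations of a relative pose corrupted by zero-mean Gaussian noise divides the standard deviation of the estimate by the square root of N.

```python
import numpy as np

rng = np.random.default_rng(0)

true_offset = np.array([0.40, -0.15])   # assumed true relative position (m), illustrative
sigma = 0.03                            # assumed per-observation noise std (m)

for n in (1, 5, 25):
    obs = true_offset + rng.normal(0.0, sigma, size=(n, 2))
    fused = obs.mean(axis=0)            # maximum-likelihood estimate under Gaussian noise
    print(n, np.round(fused, 3), "expected std:", round(sigma / np.sqrt(n), 4))
```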

5.2 Action Uncertainty

Using a symbolic operator, an action may produce different effects given the same preconditions. For example, the pick action may be performed by grabbing an object from the top or from its side. This ambiguity may lead to failure when trying to place the object steadily. In a PPM task described in [46], the robot needs to pick an object and place it on a shelf. Due to the limited space under the ceiling of the shelf, the robot needs to choose an appropriate pick action. The authors collected a dataset from which several picking actions were obtained, such as picking from the side and picking from the top. By backtracking from the target state, a solution is found so that the robot can choose actions based on the target state. Such a backtracking method can certainly resolve the uncertainty of the action, but it requires sufficient observation of the environment, which is usually not satisfied in practical applications. Another option is to replan based on observations. For example, the robot initially grasps an object from the top and observes that the current grasp configuration is not appropriate for placing. It can then replan and change the grasping configuration by temporarily putting the object on a table before placing it on the shelf. Nevertheless, replanning is usually a costly operation, and integrating efficient replanning strategies in TAMP is an interesting research direction.

5.3 Context-Aware Decision Making
Real tasks are often more complex than the tasks artificially designed for research, and the robot needs to derive solutions by considering the relevant situational information for each task. For example, imagine a block-building task where various shapes of blocks are provided and the goal is to assemble a car model. Solving the task is not possible without considering the shape of each block and of the car. Contextual analysis and mapping should therefore be adapted to as many tasks as possible and can benefit a variety of domains, including safe navigation, action verification, and understanding ambiguous tasks.

To enable safe navigation, situational mapping allows robots to create their own danger zones based on the type of obstacles. For example, for stationary obstacles, such as walls and tables, the danger zones are relatively small, while for mobile obstacles, such as humans and vehicles, the danger zones are larger. Specifically, the shape of the danger zone can be related to the direction and speed of movement of the mobile obstacle. As an example, in [45], real-time human behavior is analyzed to generate danger zones for safe navigation in crowded scenes.

Due to sensor noise and action uncertainty, a robot might apply an action to an object but fail to change its state. For example, it can fail to grasp a bottle even after executing the grasp action. If there is no action verification step, all the following actions are in vain. Utilizing semantic information as validation, like detecting whether there is still a bottle at the grasping point, is important to help the robot correct such mistakes in time.

Sometimes, the description of a task allocated to a robot is ambiguous. For example, in a PPM task, instead of asking the robot to pick an object at location A and put it at location B, the task could be described as picking an object in the living room and putting it in the kitchen. Without the precise location instruction, the robot should recognize and understand the environment, for instance judging whether the current room is the kitchen. Much research on semantic environment modelling and navigation can be exploited, such as the work proposed in [5], where the robot navigates with consideration of the semantic information of the environment and chooses a suitable route to the destination. In summary, TAMP has a long way to go to provide the capability to solve diverse kinds of tasks in a unified approach, and a complete and coherent environment and context modelling is a key element in this direction.

5.4 Balance Between Optimum and Feasibility

This last challenge concerns the definition of the final TAMP objective, which can be either to find an optimal solution or simply a feasible one. For example, in a NAMO task, the robot may either bypass an obstacle by moving a longer distance or clear the obstacle and pass through, which may correspond (depending on the particular situation) to a feasible solution and to the optimal solution, respectively. The choice between these two solutions is of a higher level than the TAMP task itself, and could be made by considering the semantic background, such as whether the task is urgent or whether one wants to secure its completion by avoiding more uncertain actions (such as manipulation). It is therefore interesting to develop TAMP approaches that can provide sets of solutions with different characteristics, such as different optimality/risk trade-offs.

6 Conclusion
This paper, as an extension of [59], reviews recent advances in TAMP, including tasks, useful simulation environments, approaches and existing challenges. Three fundamental tasks, namely rearrangement, navigation among movable obstacles and the pick-place-move task, are detailed. To facilitate experiments, this paper also presents several simulation environments so that readers may readily select the appropriate experimental environment based on their needs. In addition, the reviewed TAMP approaches are categorized into three groups based on whether they employ machine learning methodologies, and their experimental tasks are listed, which helps readers choose a baseline according to their objectives and prior knowledge. Finally, we highlight the current difficulties with the goal of identifying potential investigation directions, in particular the need for algorithms that are more resilient to perception and action uncertainty and that can leverage environment semantics.

Acknowledgements. This work was carried out in the scope of the OTPaaS project. This project has received funding from the French government as part of the "Cloud Acceleration Strategy" call for manifestation of interest.

References
1. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discret. Event Dyn. Syst. 13(1), 41–77 (2003)
2. Chitnis, R., et al.: Guided search for task and motion plans using learned heuristics. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 447–454. IEEE (2016)
3. Coumans, E., Bai, Y.: Pybullet, a python module for physics simulation for games, robotics and machine learning (2016–2021). http://pybullet.org
4. Driess, D., Ha, J.S., Toussaint, M.: Deep visual reasoning: learning to predict action sequences for task and motion planning from an initial scene image. In: Robotics: Science and Systems 2020 (RSS 2020). RSS Foundation (2020)



5. Drouilly, R., Rives, P., Morisset, B.: Semantic representation for navigation in large-scale environments. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1106–1111. IEEE (2015)
6. Eysenbach, B., Salakhutdinov, R.R., Levine, S.: Search on the replay buffer: bridging planning and reinforcement learning. Adv. Neural Inf. Process. Syst. 32 (2019)
7. Gan, C., et al.: Threedworld: a platform for interactive multi-modal physical simulation. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1) (2021)
8. Garrett, C.R., et al.: Integrated task and motion planning. Annu. Rev. Control Robot. Auton. Syst. 4, 265–293 (2021)
9. Garrett, C.R., Lozano-Pérez, T., Kaelbling, L.P.: FFRob: an efficient heuristic for task and motion planning. In: Akin, H.L., Amato, N.M., Isler, V., van der Stappen, A.F. (eds.) Algorithmic Foundations of Robotics XI. STAR, vol. 107, pp. 179–195. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16595-0_11
10. Garrett, C.R., Lozano-Pérez, T., Kaelbling, L.P.: FFRob: leveraging symbolic planning for efficient task and motion planning. Int. J. Robot. Res. 37(1), 104–136 (2018)
11. Garrett, C.R., Lozano-Pérez, T., Kaelbling, L.P.: Sampling-based methods for factored task and motion planning. Int. J. Robot. Res. 37(13–14), 1796–1825 (2018)
12. Garrett, C.R., Lozano-Pérez, T., Kaelbling, L.P.: PDDLStream: integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In: Proceedings of the International Conference on Automated Planning and Scheduling, vol. 30, pp. 440–448 (2020)
13. Garrett, C.R., Paxton, C., Lozano-Pérez, T., Kaelbling, L.P., Fox, D.: Online replanning in belief space for partially observable task and motion problems. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 5678–5684. IEEE (2020)
14. Ghallab, M., et al.: PDDL–The Planning Domain Definition Language (1998). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.212
15. Ghallab, M., Nau, D., Traverso, P.: Automated Planning and Acting. Cambridge University Press, Cambridge (2016)
16. Haarnoja, T., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)
17. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)
18. Heess, N., Wayne, G., Tassa, Y., Lillicrap, T., Riedmiller, M., Silver, D.: Learning and transfer of modulated locomotor controllers. arXiv preprint arXiv:1610.05182 (2016)
19. Kaelbling, L., Lozano-Pérez, T.: Hierarchical task and motion planning in the now. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA (2010)
20. Kaelbling, L.P., Lozano-Pérez, T.: Integrated task and motion planning in belief space. Int. J. Robot. Res. 32(9–10), 1194–1227 (2013)
21. Kim, B., Kaelbling, L.P., Lozano-Pérez, T.: Adversarial actor-critic method for task and motion planning problems using planning experience. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8017–8024 (2019)
22. Kim, B., Shimanuki, L.: Learning value functions with relational state representations for guiding task-and-motion planning. In: Conference on Robot Learning, pp. 955–968. PMLR (2020)
23. Kim, B., Shimanuki, L., Kaelbling, L.P., Lozano-Pérez, T.: Representation, learning, and planning algorithms for geometric task and motion planning. Int. J. Robot. Res. 41(2), 210–231 (2022)
24. Kim, B., Wang, Z., Kaelbling, L.P., Lozano-Pérez, T.: Learning to guide task and motion planning using score-space representation. Int. J. Robot. Res. 38(7), 793–812 (2019)



25. King, J.E., Cognetti, M., Srinivasa, S.S.: Rearrangement planning using object-centric and robot-centric action spaces. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 3940–3947. IEEE (2016)
26. Koenig, N., Howard, A.: Design and use paradigms for gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), vol. 3, pp. 2149–2154. IEEE (2004)
27. Kolve, E., et al.: AI2-THOR: an interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474 (2017)
28. Konidaris, G., Kaelbling, L.P., Lozano-Pérez, T.: From skills to symbols: learning symbolic representations for abstract high-level planning. J. Artif. Intell. Res. 61, 215–289 (2018)
29. Kuffner, J.J., LaValle, S.M.: RRT-connect: an efficient approach to single-query path planning. In: Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), vol. 2, pp. 995–1001. IEEE (2000)
30. Kulkarni, T.D., Narasimhan, K., Saeedi, A., Tenenbaum, J.: Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. Adv. Neural Inf. Process. Syst. 29 (2016)
31. LaValle, S.M.: Planning Algorithms. Cambridge University Press, Cambridge (2006)
32. Levy, A., Konidaris, G., Platt, R., Saenko, K.: Learning multi-level hierarchies with hindsight. In: International Conference on Learning Representations (2018)
33. Li, C., et al.: iGibson 2.0: object-centric simulation for robot learning of everyday household tasks. In: 5th Annual Conference on Robot Learning (2021)
34. Li, C., Xia, F., Martín-Martín, R., Savarese, S.: HRL4IN: hierarchical reinforcement learning for interactive navigation with mobile manipulators. In: Conference on Robot Learning, pp. 603–616. PMLR (2020)
35. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. In: ICLR (Poster) (2016)
36. Lu, D.V., Hershberger, D., Smart, W.D.: Layered costmaps for context-sensitive navigation. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 709–715. IEEE (2014)
37. Meng, Z., Sun, H., Teo, K.B., Ang, M.H.: Active path clearing navigation through environment reconfiguration in presence of movable obstacles. In: 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), pp. 156–163. IEEE (2018)
38. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
39. Nachum, O., Gu, S., Lee, H., Levine, S.: Near-optimal representation learning for hierarchical reinforcement learning. In: International Conference on Learning Representations (2018)
40. Nachum, O., Gu, S.S., Lee, H., Levine, S.: Data-efficient hierarchical reinforcement learning. Adv. Neural Inf. Process. Syst. 31 (2018)
41. Pasula, H.M., Zettlemoyer, L.S., Kaelbling, L.P.: Learning symbolic models of stochastic domains. J. Artif. Intell. Res. 29, 309–352 (2007)
42. Patel, U., Kumar, N.K.S., Sathyamoorthy, A.J., Manocha, D.: DWA-RL: dynamically feasible deep reinforcement learning policy for robot navigation among mobile obstacles. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 6057–6063. IEEE (2021)
43. Puig, X., et al.: VirtualHome: simulating household activities via programs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8494–8502 (2018)
44. Quigley, M., et al.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software, vol. 3, p. 5. Kobe, Japan (2009)



45. Samsani, S.S., Muhammad, M.S.: Socially compliant robot navigation in crowded environment by human behavior resemblance using deep reinforcement learning. IEEE Robot. Autom. Lett. 6(3), 5223–5230 (2021)
46. Silver, T., Chitnis, R., Tenenbaum, J., Kaelbling, L.P., Lozano-Pérez, T.: Learning symbolic operators for task and motion planning. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3182–3189. IEEE (2021)
47. Srivastava, S., Fang, E., Riano, L., Chitnis, R., Russell, S., Abbeel, P.: Combined task and motion planning through an extensible planner-independent interface layer. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 639–646. IEEE (2014)
48. Sun, H., Zhang, W., Yu, R., Zhang, Y.: Motion planning for mobile robots–focusing on deep reinforcement learning: a systematic review. IEEE Access 9, 69061–69081 (2021)
49. Szot, A., et al.: Habitat 2.0: training home assistants to rearrange their habitat. In: Advances in Neural Information Processing Systems (NeurIPS) (2021)
50. Toussaint, M.: Logic-geometric programming: an optimization-based approach to combined task and motion planning. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
51. Toyer, S., Thiébaux, S., Trevizan, F., Xie, L.: ASNets: deep learning for generalised planning. J. Artif. Intell. Res. 68, 1–68 (2020)
52. Wang, M., Luo, R., Önol, A.Ö., Padır, T.: Affordance-based mobile robot navigation among movable obstacles. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2734–2740. IEEE (2020)
53. Wang, Z., Garrett, C.R., Kaelbling, L.P., Lozano-Pérez, T.: Active model learning and diverse action sampling for task and motion planning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4107–4114. IEEE (2018)
54. Wang, Z., Garrett, C.R., Kaelbling, L.P., Lozano-Pérez, T.: Learning compositional models of robot skills for task and motion planning. Int. J. Robot. Res. 40(6–7), 866–894 (2021)
55. Xia, F., Li, C., Martín-Martín, R., Litany, O., Toshev, A., Savarese, S.: RelMoGen: integrating motion generation in reinforcement learning for mobile manipulation. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4583–4590. IEEE (2021)
56. Xiang, F., et al.: SAPIEN: a simulated part-based interactive environment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11097–11107 (2020)
57. Younes, H.L., Littman, M.L.: PPDDL1.0: an extension to PDDL for expressing planning domains with probabilistic effects. Technical report, CMU-CS-04-162 2, 99 (2004)
58. Zeng, K.H., Weihs, L., Farhadi, A., Mottaghi, R.: Pushing it out of the way: interactive visual navigation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9868–9877 (2021)
59. Zhang, K., Lucet, E., Sandretto, J.A.D., Kchir, S., Filliat, D.: Task and motion planning methods: applications and limitations. In: 19th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2022), pp. 476–483. SCITEPRESS-Science and Technology Publications (2022)
60. Zhang, X., Zhu, Y., Ding, Y., Zhu, Y., Stone, P., Zhang, S.: Visually grounded task and motion planning for mobile manipulation. arXiv preprint arXiv:2202.10667 (2022)
61. Zhu, Y., et al.: Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3357–3364. IEEE (2017)

Multi-objective Ranking to Optimize CNN's Encoding Features: Application to the Optimization of Tracer Dose for Scintigraphic Imagery

V. Vigneron, H. Maaref, and J.-P. Conge
Univ. Evry, Université Paris-Saclay, IBISC EA 4526, Evry, France
{vincent.vigneron,hichem.maaref}@univ-evry.fr

Abstract. The pooling layer is at the core of every convolutional neural network (CNN), contributing to data invariance, variation, and perturbation. It describes which part of the input image a neuron in the output layer can see. CNNs with max pooling can handle simple transformations like flips or rotations without too much trouble. The problem comes with more complicated modifications. The rank-order importance is used here as an alternative to max pooling. The rank texture descriptor is non-parametric, independent of the geometric layout or size of image regions, and can better tolerate rotations. These description functions produce images that emphasize low/high frequencies, contours, etc. We propose a multi-objective ranking algorithm derived from Vargas et al. [10] to optimize a CNN's encoding features. It is applied for the first time to estimate the tracer dose in radiology with scintigraphic imagery.

Keywords: Deep Learning · Pooling function · Rank aggregation · Segmentation · Contour extraction

This research was supported by the program Cátedras Franco-Brasileiras no Estado de São Paulo, an initiative of the French consulate and the state of São Paulo (Brazil). We thank Sofia Vargas Ibarra, in charge of the computer simulations, Olivier Montsarrat and Sadish Anebajagane, for providing the radiological data, and GENOPOLE, for its financial support.

1 Introduction
CNN architectures are augmented by multi-resolution (pyramidal) structures, which come from the idea that the network needs to see different levels of resolution to produce good results. A CNN stacks four processing layers: convolution, pooling, ReLU, and fully-connected [7]. The pooling layer receives several input feature maps between two convolutional layers. Pooling (i) reduces the number of parameters in the model (subsampling) and the computations in the network while preserving their important characteristics, (ii) improves the efficiency of the network, and (iii) avoids over-training. The question is how to pool the characteristics of the input image optimally. This central question in the design of deep learning (DL) architectures is partly answered by Lazebnik, who demonstrated the importance of the spatial structure of the pooling areas [13].


Indeed, the local spatial variations of image pixels' intensities (textures) characterize an "organized area phenomenon" that cannot be captured in the usual pooling layers [21]. This was first explicitly shown by Haralick in the 70's [9]. Pooling layers summarize the presence of features in tiles of the feature map generated by the convolutional layer. The two most common pooling methods are average pooling and max pooling. All classical computer vision tasks consider summarizing local image content a fundamental step. The pooling layer makes the network less sensitive to the position of the features: the fact that an object is a little higher or lower, or even that it has a slightly different orientation, should not cause a radical change in the classification of the image. The max-pooling operator, for instance, down-samples the input representation (image, hidden layer output matrix, etc.), reducing its dimensionality.

Even though pooling layers do not have parameters, they affect the back-propagation (derivative) calculation. Back-propagating through the max-pooling layer selects the maximum neuron from the previous layer (on which the max-pooling was performed) and continues back-propagation only through it. The max function is locally linear for the activation that attains the maximum, so its derivative is 1 there, and 0 for the other neurons, which did not win. This is conceptually very similar to the differentiation of the rectified linear activation function, or ReLU for short: ReLU(x) = max(0, x). Suppose a layer $H_{\ell}$ comes on top of a layer $H_{\ell-1}$. Then the activation of the $i$th neuron of layer $\ell$ is

$H_{\ell i} = f\big(\sum_{j} w_{ij}\, H_{(\ell-1)j}\big),$   (1)

where $f$ is the activation function and $W = \{w_{ij}\}$ are the weights. The derivation of Eq. (1) by the chain rule gives the gradient flows as follows:

$\mathrm{grad}\big(H_{(\ell-1)j}\big) = \sum_{i} \mathrm{grad}\big(H_{\ell i}\big)\, f'\, w_{ij}.$   (2)

In summary, in the case of max-pooling, $f(x) = x$ (identity) for the max neuron and $f(x) = 0$ for all other neurons, so $f' = 1$ for the max neuron of the previous layer and $f' = 0$ for all other neurons. Back-propagating through the max-pooling layer selects the max neuron of the previous layer (on which the max-pooling was done) and continues back-propagation only through that neuron [19]. The problem is that the maximum chosen by the max-pooling in the pixel patch is not the actual maximum. Grattarola et al. [8] identify that pooling functions generally "do not preserve well the spatial information due to the reduction of the spatial information". The pooling layer reduces the number of parameters to learn and the computation performed in the network [5, 24].

This paper proposes another pooling operator with similar properties based on ranks, with additional advantages: (a) it is independent of the geometric arrangement or sizes of the tiles, (b) it is translation-invariant and can better tolerate rotations, (c) it summarizes features instead of the precisely positioned features generated by the convolution layers. This makes the model more robust to variations in the position of the features in the input image. It is based on Savage's definition of rank order [18] but also on the texture analysis promoted by Haralick. It is parameter-free, and its implementation relies on a multi-objective optimization function.
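The gradient routing expressed by Eqs. (1)-(2) for max-pooling can be made explicit with a small NumPy sketch (a didactic illustration, not the implementation of any particular framework): the forward pass records the argmax of each 2 × 2 window, and the backward pass sends the incoming gradient only to that position (f' = 1 there, 0 elsewhere).

```python
import numpy as np

def maxpool2x2_forward(x):
    """x: (H, W) array with even H, W. Returns pooled map and per-window argmax."""
    H, W = x.shape
    windows = x.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3).reshape(H // 2, W // 2, 4)
    idx = windows.argmax(axis=-1)          # which of the 4 pixels won in each window
    out = windows.max(axis=-1)
    return out, idx

def maxpool2x2_backward(grad_out, idx, shape):
    """Route grad_out back to the winning pixel of each window; zeros elsewhere."""
    H, W = shape
    grad_windows = np.zeros((H // 2, W // 2, 4))
    i, j = np.indices(idx.shape)
    grad_windows[i, j, idx] = grad_out
    return grad_windows.reshape(H // 2, W // 2, 2, 2).transpose(0, 2, 1, 3).reshape(H, W)

x = np.arange(16, dtype=float).reshape(4, 4)
y, idx = maxpool2x2_forward(x)
gx = maxpool2x2_backward(np.ones_like(y), idx, x.shape)
print(y)    # maxima of each 2x2 tile
print(gx)   # 1 at each tile's argmax, 0 elsewhere
```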



Notations. Small Latin letters $a, b, \ldots$ represent integers throughout this paper. Small bold letters $a, b$ stand for vectors, and capital letters $A, B$ for matrices or tensors, depending on the context. The dot product between two vectors is denoted $\langle \cdot, \cdot \rangle$, and $\|a\| = \sqrt{\langle a, a \rangle}$ is the $\ell_2$ norm of a vector. $X_1, \ldots, X_n$ are non-ordered variates and $x_1, \ldots, x_n$ non-ordered observations. "Ordered statistics" means either $X_{(1)} \le \ldots \le X_{(n)}$ (ordered variates) or $x_{(1)} \le \ldots \le x_{(n)}$ (ordered observations). The extreme order statistics are $x_{(1)} = \min\{x_1, x_2, \ldots, x_n\}$ and $x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$. The sample range is $x_{(n)} - x_{(1)}$. The $x_{(i)}$ are necessarily dependent because of the inequality relations among them.

Definition 1 (Savage [18]). The rank order corresponding to the $n$ distinct numbers $x_1, \ldots, x_n$ is the vector $t = (r_1, \ldots, r_n)^T$ where $r_i$ is the number of $x_j$'s $\le x_i$ with $j \ne i$.
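Definition 1 translates directly into a few lines of Python (a small illustrative sketch):

```python
def rank_order(xs):
    """Savage rank order: r_i = number of x_j <= x_i with j != i (Definition 1)."""
    return [sum(1 for j, xj in enumerate(xs) if xj <= xi and j != i)
            for i, xi in enumerate(xs)]

print(rank_order([0.7, 0.1, 0.5]))   # [2, 0, 1]: 0.7 is the largest, 0.1 the smallest
```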

2 A Reminder on Texture Encoding
Most of the image descriptors that encode local structures, e.g., local binary patterns (LBP) and its variants [15, 17], depend on the reading order of the neighbors, as they compute the feature value as the weighted sum of a mathematical function of the neighboring pixels w.r.t. their order in the neighborhood. The new pixel value $L_{P,R}$ at the center of the image is an integer in the range 0 to 255 (for an 8-bit encoding) given by:

$L_{P,R} = \sum_{p=0}^{P-1} 2^{p} \cdot t(g_p - g_c), \quad \text{with } t(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise,} \end{cases}$   (3)

where $P$ is the number of pixels in the neighborhood considering the distance $R$ between the central pixel $g_c$ and the neighboring pixels $\{g_p \mid p = 0, \ldots, P-1\}$. In Fig. 1, LBP generates an 8-bit string for a 3 × 3 neighborhood by computing the Heaviside function $t(\cdot)$ of the difference between each neighboring pixel and the central pixel, i.e. $(g_p - g_c)$. Invariance w.r.t. any monotone transformation of the grey scale is obtained by considering in (3) the signs of the differences $t(g_i - g_c)$, $i = 0, \ldots, P-1$. But the independence of $g_c$ and $\{|g_0 - g_c|, \ldots, |g_{P-1} - g_c|\}$ is not warranted in practice. Moreover, under certain circumstances, LBP misses the local structure as it does not consider the central pixel. Methods based on higher-order statistics (e.g., co-occurrence matrices) provide a complete statistical characterization but are extremely time-consuming.

Fig. 1. Example of 3 × 3 image neighborhood (P = 8 and R = 1) (from Vargas et al. in [10]).
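For illustration, a compact sketch of Eq. (3) for the 3 × 3 neighborhood of Fig. 1 (P = 8, R = 1); the clockwise reading order starting at the top-left corner and the pixel values are arbitrary choices of this example:

```python
import numpy as np

def lbp_3x3(patch):
    """LBP code of the center pixel of a 3x3 patch, Eq. (3) with P = 8, R = 1."""
    gc = patch[1, 1]
    # Neighbors read clockwise from the top-left corner (illustrative ordering).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r, c] >= gc else 0 for r, c in coords]   # t(g_p - g_c)
    return sum(b << p for p, b in enumerate(bits))              # integer in [0, 255]

patch = np.array([[ 90, 120,  60],
                  [ 80, 100, 150],
                  [200,  95,  40]])
print(lbp_3x3(patch))
```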

We propose a new texture-encoding model insensitive to the neighborhood size and the reading order of neighboring pixels. It is based on rank-order statistics that strongly depend on the spatial relationships among the grey levels of pixels and could be utilized for contour detection, region-based and boundary-based segmentation, or image classification [11, 14, 25]. Since CNNs have been widely used for image analysis, we further explore how such a model is suitable for capturing image texture and improving CNN performance. The following section is a reminder of rank-order modelization and presents the optimization program with the Spearman distance and the rank absolute deviation. Section 4 shows how the linear program is implemented in the CNN model for generating feature maps. It is experimented with region-based image segmentation in experiment 1. It is applied in Sect. 5 for optimizing the tracer dose in scintigraphic imagery.

3 The Rank-Order Aggregation Problem

3.1 Explicit or Implicit Resolution
Solving a rank-aggregation problem means finding a vector of values $t^*$ attributed by a virtual judge to a set of alternatives (candidates, individuals, etc.) $A = \{a_1, a_2, \ldots, a_n\}$ by minimizing the disagreements of opinions in a set of voters (judges, criteria, etc.) $V = \{v_1, v_2, \ldots, v_m\}$ [1], i.e.

$t^{*} = \arg\min_{t} \sum_{k=1}^{m} d\big(t, t^{(k)}\big), \quad \text{s.t. } t \ge 0,$   (4)

where $d(t, t^{(k)})$ is a metric measuring the proximity between $t$ and $t^{(k)}$, chosen a priori, and $t^{(k)}$ is the $k$th column of the table $T$. Depending on the properties of $d(\cdot)$, we will deal with a nonlinear optimization program with an explicit or implicit solution. The data is collected in an $(n \times m)$ table $T = \{t_{ij}\}$ crossing the sets $A$ and $V$ (Fig. 2a). The $t_{ij}$ can be marks ($t_{ij} \in \mathbb{N}$), value scales ($t_{ij} \in \mathbb{R}$), ranks, or binary numbers (such as a yes/no opinion). $T$ represents the ranking of the $n$ candidates assigned by the $m$ voters as a total order, i.e. $t_{ik} \ne t_{i'k}$, $\forall i \ne i'$ [3]. Equation (4) defines a nonlinear optimization program whose solution is $t^*$ [23]. The distance $d(t^{(k)}, t^{(k')})$ between the rankings of voters $k$ and $k'$ can be chosen as the Euclidean distance $\sum_{i=1}^{n} (t_{ik} - t_{ik'})^2$, the disagreement distance (Condorcet) $\sum_{i=1}^{n} \mathrm{sgn}|t_{ik} - t_{ik'}|$, or the rank absolute-deviation distance $\sum_{i}\sum_{j} |y_{ij}^{(k)} - y_{ij}^{(k')}|$ if $t^{(k)}$ is replaced by its permutation matrix $Y^{(k)}$ (Fig. 2b).

3.2 Euclidean Distance (Spearman Distance)
Minimizing the Euclidean distance defined by $\sum_{k=1}^{m}\sum_{i=1}^{n} (t^{*}_{i} - t_{ik})^2$, where $t_i$ denotes the rank of the $i$th candidate, consists in looking for the optimal consensus $t^*$ of the $m$ voters who attributed the votes $t^{(1)}, t^{(2)}, \ldots, t^{(m)}$ to the candidates $a_1, a_2, \ldots, a_n$. By definition, $t \in S_n$, the symmetric group of the $n!$ permutations [4]. Hence $t^* \in S_n$. The permutation $t$ can be represented for a voter $k$ by a permutation matrix $Y^{(k)} = \{y_{ij}^{(k)}\}_{i,j=1}^{n}$, $y_{ij}^{(k)} \in \{0, 1\}$, and $y_{ij}^{(k)} = 1$ if the candidate $i$ is positioned in place $j$ for



Fig. 2. (a) The data is collected in an $(n \times m)$ table $T$. (b) $Y^{(k)}$ is the permutation matrix for which $y_{ij}^{(k)} = 1$ if the rank of the alternative $a_i$ is $j$, and $y_{ij}^{(k)} = 0$ otherwise [6]; $y_{ii}^{(k)} = 0$, and $y_{ij}^{(k)} = 0$ if $i$ and $j$ are ex-aequos.