Artificial Intelligence for Robotics and Autonomous Systems Applications 3031287142, 9783031287145

This book addresses many applications of artificial intelligence in robotics, in particular AI that operates on visual and motion input.


English Pages 487 [488] Year 2023


Table of contents:
Preface
Contents
Efficient Machine Learning of Mobile Robotic Systems Based on Convolutional Neural Networks
1 Introduction
2 Problem Analysis and Formulation
3 Efficient Deep Learning for Robotics—Related Work
3.1 Efficient Deep Learning Models for Object Detection in Robotic Applications
3.2 Efficient Deep Learning Models for Semantic Segmentation in Robotic Applications
4 The Proposed Models of Efficient CNNs for Semantic Segmentation Implemented on Jetson Nano
5 Obstacle Avoidance Algorithm for Mobile Robots Based on Semantic Segmentation
6 Experimental Results
7 Discussion of Results
8 Conclusion
Appendix A
References
UAV Path Planning Based on Deep Reinforcement Learning
1 Introduction
1.1 Research Background and Significance
1.2 Research Status
1.3 The Main Research Content and Chapter Arrangement of this Chapter
2 Deep Learning and Reinforcement Learning
2.1 Comparison of Supervised Learning, Unsupervised Learning and Reinforcement Learning
2.2 Deep Learning Methods
2.3 Reinforcement Learning Methods
2.4 DQN Algorithm
3 Design of Improved DQN Algorithm Combined with Artificial Potential Field
3.1 Network Structure Design
3.2 State Space Design
3.3 Action Space Design
3.4 Reward Function Design
4 Simulation Experiment and Result Analysis
4.1 Reinforcement Learning Path Planning Training and Testing
4.2 Training and Results
4.3 Comparative Analysis of Improved DQN and Traditional DQN Algorithms
5 Conclusions
References
Drone Shadow Cloud: A New Concept to Protect Individuals from Danger Sun Exposure in GCC Countries
1 Introduction
2 Related Work
2.1 Overview
2.2 Proposal
3 Proposed Method
3.1 Design of Mechanical Structure
3.2 Shade Fabric
3.3 Selecting an Umbrella Flight Controller
3.4 Flying Umbrella Power Calculation
4 Experiment Phase
5 Results and Discussion
6 Conclusion
References
Accurate Estimation of 3D-Repetitive-Trajectories using Kalman Filter, Machine Learning and Curve-Fitting Method for High-speed Target Interception
1 Introduction
2 Related Work
3 Vision Based Target Position Estimation
3.1 Computation of Centre of Instantaneous Curvature of Target Trajectory
3.2 EKF Formulation
3.3 Future State Prediction
4 Mathematical Formulation for Curve Fitting Method
4.1 Classification of Curves
4.2 Least-Squares Curve Fitting in 2D
4.3 Least-Squares Curve Fitting of Any Shape in 3D
5 Interception Strategy
6 Results
6.1 Simulation Experiments
6.2 Hardware Experiments
7 Discussions
8 Conclusions
References
Robotics and Artificial Intelligence in the Nuclear Industry: From Teleoperation to Cyber Physical Systems
1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem Statement
1.4 Recent Technological Advances–Industry 4.0
1.5 Chapter Outlines and Contributions
2 Nuclear Decommissioning Processes
2.1 Characterisation
2.2 Decontamination
2.3 Dismantling and Demolition
2.4 Waste Management
3 Current Practice in Nuclear Decommissioning Research
3.1 Assisted Teleoperation and Manipulation in Nuclear Decommissioning
3.2 Robot-Assisted Glovebox Teleoperation
3.3 Post-processing of Nuclear Waste
3.4 Modular and Cooperative Robotic Platforms
3.5 Unmanned Radiation-Monitoring Systems
4 Towards an Autonomous Nuclear Decommissioning Process
4.1 Different Levels of Autonomy
4.2 The Cyber Physical System Architecture
4.3 Enabling Technologies
5 A Cyber Physical Nuclear Robotic System
5.1 Software Architectures
5.2 Autonomous Multi-robot Systems
5.3 Control System Design
5.4 Motion Planning Algorithms
5.5 Vision and Perception
5.6 Digital Twins in Nuclear Environment
6 Conclusions
References
Deep Learning and Robotics, Surgical Robot Applications
1 Introduction
2 Related Work
3 Machine Learning and Surgical Robot
4 Robotics and Deep Learning
5 Surgical Robots and Deep Learning
6 Current Innovation in Surgical Robotics
7 Limitation of Surgical Robot
8 Future Direction of Surgical Robot
9 Discussion
10 Conclusions
References
Deep Reinforcement Learning for Autonomous Mobile Robot Navigation
1 Introduction
2 Antecedents
2.1 Control Theory, Linear Control, and Mechatronics
2.2 Non-linear Control
2.3 Classical Robotics
2.4 Probabilistic Robotics
2.5 Introduction of Back-Propagation for Feed-Forward Neural Networks
2.6 Deep Reinforcement Learning
3 Background: Autonomous Mobile Robot Navigation and Machine Learning
3.1 Requirements
3.2 Review of RL in AMR
3.3 Introduction of Convolutional Neural Networks
3.4 Advanced AMR: Introduction to DRL (Deep Reinforcement Learning) Approach
3.5 Application Requirements
4 Deep Reinforcement Learning Methods
4.1 Continuous Control
4.2 A Simple Implementation of Q-learning
4.3 Dueling Double DQN
4.4 Actor-Critic Learning
4.5 Learning Autonomous Mobile Robotics with Proximal Policy Optimization
4.6 Multi-Agent Deep Reinforcement Learning
4.7 Fusion Method
4.8 Hybrid Method
4.9 Hierarchical Framework
5 Design Methodology
5.1 Benchmarking
6 Teaching
7 Discussion
8 Conclusions
References
Event Vision for Autonomous Off-Road Navigation
1 Introduction
2 Related Work
2.1 Off-Road Navigation
2.2 Neuromorphic Vision
2.3 Key Features of Event Cameras
2.4 Data Registration
2.5 Feature Detection
2.6 Algorithmic Compatibility
3 Event-Based Vision Navigation
3.1 Vision and Ranging Sensor Fusion
3.2 Stereo Event Vision
3.3 Monocular Depth Estimation from Events
4 Proposed End-to-End Navigation Model
4.1 Event Preprocessing
4.2 Depth Estimation Branch
4.3 Steering Prediction Branch
4.4 Desert Driving Dataset
5 System Implementation
5.1 Deep Learning Acceleration
5.2 Memristive Neuromorphic Computing
6 Results and Discussion
7 Conclusions
References
Multi-armed Bandit Approach for Task Scheduling of a Fixed-Base Robot in the Warehouse
1 Introduction
2 Related Work
3 Problem Formulation
3.1 Problem Framework and Description
3.2 Motion Planning
4 Methodology
4.1 Multi-armed Bandit Formulation
4.2 Task Scheduling Based on Time Synchronization
5 Results and Analysis
5.1 Simulation Results
5.2 Discussion
6 Conclusion
References
Machine Learning and Deep Learning Approaches for Robotics Applications
1 Introduction
2 Autonomous Versus Automatic Robots
3 Robotics Applications
3.1 Computer Vision
3.2 Computer Vision
3.3 Learning Through Imitation
3.4 Self-supervised Learning
3.5 Assistive and Medical Technologies
3.6 Multi-agent Learning
4 Extreme Learning Machines Methods for Robotics
5 Machine Learning for Soft Robotics
6 ML-Based Robotics Applications
6.1 Robotics Recommendation Systems Using ML
6.2 Nano-Health Applications Based on Machine Learning
6.3 Localizations Based on ML Applications
6.4 Control of Dynamic Traffic Robots
7 Robotics Applications Challenges
8 Conclusions
References
A Review on Deep Learning on UAV Monitoring Systems for Agricultural Applications
1 Introduction
2 Proposed Methodology
2.1 An Overview of Deep Learning Strategies used in Agriculture
3 Findings on Applications of Deep Learning Models in Plant Monitoring
3.1 Pest Infiltration
3.2 Plant Growth
3.3 Fruit Conditions
3.4 Weed Invasion
3.5 Crop Disease Monitoring
4 Findings on Applications of Deep Learning Models in Animal Monitoring
4.1 Animal Population
5 Discussion and Comparison of Deep Learning Strategies in Agricultural Applications
6 Conclusions
References
Navigation and Trajectory Planning Techniques for Unmanned Aerial Vehicles Swarm
1 Introduction
2 UAV Technical Background
2.1 UAV Architecture
2.2 UAV Swarm Current State
2.3 UAV Swarm Advantages
2.4 UAV Swarm Applications
3 Swarm Communication and Control System Architectures
3.1 Centralized Communication Architecture
3.2 Decentralized Communication Architecture
4 Navigation and Path Planning for UAV Swarm
4.1 UAVs Network Communication and Path Planning Architecture
4.2 Trajectory Planning for UAVs Navigation Classifications
4.3 Route Planning Challenges
5 Classical Techniques for UAV Swarm Navigation and Path Planning
5.1 Roadmap Approach (RA)
5.2 Cell Decomposition (CD)
5.3 Artificial Potential Field (APF)
6 Reactive Approaches for UAV Swarm Navigation and Path Planning
6.1 Genetic Algorithm (GA)
6.2 Neural Network (NN)
6.3 Firefly Algorithm (FA)
6.4 Ant Colony Optimization (ACO)
6.5 Cuckoo Search (CS)
6.6 Particle Swarm Optimization (PSO)
6.7 Bacterial Foraging Optimization (BFO)
6.8 Artificial Bee Colony (ABC)
6.9 Adaptive Artificial Fish Swarm Algorithm (AFSA)
7 Conclusions
References
Intelligent Control System for Hybrid Electric Vehicle with Autonomous Charging
1 Introduction
2 Preliminaries
2.1 Hybrid Vehicle and Pure Electric Vehicle
2.2 Hybrid Vehicle Architecture
3 The Architecture of Electric Vehicles
3.1 Battery Technologies
3.2 Super-Capacitors
3.3 The Electric Motor
4 Electric Vehicles Charging
4.1 Types of Classic Chargers
4.2 Autonomous Charger
5 The Mathematical Model for the Autonomous Charging System
5.1 Inductive Power Transfer Model
5.2 Photovoltaic Generator Model
6 Simulation Results and Discussion
6.1 Fuzzy Logic Algorithms
6.2 Power Delivered by the Charging System
6.3 Power Distribution and SOC Evolution
7 Conclusion
References
Advanced Sensor Systems for Robotics and Autonomous Vehicles
1 Introduction
1.1 Automatic Driving Application
1.2 Railway Monitoring Application
2 Related Works
3 Types of Sensors for Various Applications
3.1 Efficient Road Monitoring
3.2 Efficient Railway Monitoring
4 Conclusion
References
Four Wheeled Humanoid Second-Order Cascade Control of Holonomic Trajectories
1 Introduction
2 Related Work
3 Robot Motion Model
4 Observer Models
5 Omnidirectional Cascade Controller
6 Results Analysis and Discussion
7 Conclusions
References

Studies in Computational Intelligence 1093

Ahmad Taher Azar Anis Koubaa   Editors

Artificial Intelligence for Robotics and Autonomous Systems Applications

Studies in Computational Intelligence Volume 1093

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, selforganizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

Ahmad Taher Azar · Anis Koubaa Editors

Artificial Intelligence for Robotics and Autonomous Systems Applications

Editors Ahmad Taher Azar College of Computer and Information Sciences Prince Sultan University Riyadh, Saudi Arabia

Anis Koubaa College of Computer and Information Sciences Prince Sultan University Riyadh, Saudi Arabia

Automated Systems and Soft Computing Lab (ASSCL) Prince Sultan University Riyadh, Saudi Arabia Faculty of Computers and Artificial Intelligence Benha University Benha, Egypt

ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-031-28714-5 ISBN 978-3-031-28715-2 (eBook) https://doi.org/10.1007/978-3-031-28715-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Robotics, autonomous control systems, and artificial intelligence technology are all hallmarks of the Fourth Industrial Revolution. To advance the era of autonomous vehicles, artificial intelligence is being integrated with robots and autonomous driving systems. The remarkable advances in artificial intelligence technology have sparked the development of new services and online solutions to a variety of social problems. However, there is still more to be done to integrate artificial intelligence with physical space. Robotics, a technical and scientific discipline that deals with physical interactions with the real world, is mechanical in nature. It draws on several domain-specific skills, including sensing and perception, the computation of kinematic and dynamic actions, and control theory. Robotics and artificial intelligence work together to transform society by connecting cyberspace and physical space.

Objectives of the Book This book's objective is to compile original papers and reviews that demonstrate numerous uses of robotics and artificial intelligence (AI). It seeks to showcase cutting-edge robotics and AI applications as well as developments in machine learning and computational intelligence technologies in a variety of scenarios. It also encourages the development and critical evaluation of data analysis methodologies that employ such approaches in a cogent and comprehensive way. The book should serve as a useful point of reference on applied AI in robotics for both novice and expert readers, covering artificial intelligence, mathematical modelling, robotics, control systems, and reinforcement learning.


Organization of the Book This well-structured book consists of 15 full chapters.

Book Features
• The book chapters deal with the recent research problems in the areas of artificial intelligence, mathematical modelling, robotics, control systems, and reinforcement learning.
• The book chapters present advanced techniques of AI applications in robotics and drones.
• The book chapters contain a good literature survey with a long list of references.
• The book chapters are well-written with a good exposition of the research problem, methodology, block diagrams, and mathematical techniques.
• The book chapters are lucidly illustrated with numerical examples and simulations.
• The book chapters discuss details of applications and future research areas.

Audience The book is primarily meant for researchers from academia and industry, who are working in the research areas such as robotics engineering, control engineering, mechatronic engineering, biomedical engineering, medical informatics, computer science, and data analytics. The book can also be used at the graduate or advanced undergraduate level and many others.

Acknowledgements As the editors, we hope that the chapters in this well-structured book will stimulate further research in artificial intelligence, mathematical modelling, robotics, control systems, and reinforcement learning, and utilize them in real-world applications. We hope sincerely that this book, covering so many different topics, will be very useful for all readers.


We would like to thank all the reviewers for their diligence in reviewing the chapters. Special thanks go to Springer, especially the book Editorial team.

Riyadh, Saudi Arabia/Benha, Egypt
Prof. Ahmad Taher Azar
[email protected]
[email protected]
[email protected]

Riyadh, Saudi Arabia
Prof. Anis Koubaa
[email protected]

Contents

Efficient Machine Learning of Mobile Robotic Systems Based on Convolutional Neural Networks . . . . 1
Milica Petrović, Zoran Miljković, and Aleksandar Jokić

UAV Path Planning Based on Deep Reinforcement Learning . . . . 27
Rui Dong, Xin Pan, Taojun Wang, and Gang Chen

Drone Shadow Cloud: A New Concept to Protect Individuals from Danger Sun Exposure in GCC Countries . . . . 67
Mohamed Zied Chaari, Essa Saad Al-Kuwari, Christopher Loreno, and Otman Aghzout

Accurate Estimation of 3D-Repetitive-Trajectories using Kalman Filter, Machine Learning and Curve-Fitting Method for High-speed Target Interception . . . . 93
Aakriti Agrawal, Aashay Bhise, Rohitkumar Arasanipalai, Lima Agnel Tony, Shuvrangshu Jana, and Debasish Ghose

Robotics and Artificial Intelligence in the Nuclear Industry: From Teleoperation to Cyber Physical Systems . . . . 123
Declan Shanahan, Ziwei Wang, and Allahyar Montazeri

Deep Learning and Robotics, Surgical Robot Applications . . . . 167
Muhammad Shahid Iqbal, Rashid Abbasi, Waqas Ahmad, and Fouzia Sher Akbar

Deep Reinforcement Learning for Autonomous Mobile Robot Navigation . . . . 195
Armando de Jesús Plasencia-Salgueiro

Event Vision for Autonomous Off-Road Navigation . . . . 239
Hamad AlRemeithi, Fakhreddine Zayer, Jorge Dias, and Majid Khonji

Multi-armed Bandit Approach for Task Scheduling of a Fixed-Base Robot in the Warehouse . . . . 271
Ajay Kumar Sandula, Pradipta Biswas, Arushi Khokhar, and Debasish Ghose

Machine Learning and Deep Learning Approaches for Robotics Applications . . . . 303
Lina E. Alatabani, Elmustafa Sayed Ali, and Rashid A. Saeed

A Review on Deep Learning on UAV Monitoring Systems for Agricultural Applications . . . . 335
Tinao Petso and Rodrigo S. Jamisola Jr

Navigation and Trajectory Planning Techniques for Unmanned Aerial Vehicles Swarm . . . . 369
Nada Mohammed Elfatih, Elmustafa Sayed Ali, and Rashid A. Saeed

Intelligent Control System for Hybrid Electric Vehicle with Autonomous Charging . . . . 405
Mohamed Naoui, Aymen Flah, Lassaad Sbita, Mouna Ben Hamed, and Ahmad Taher Azar

Advanced Sensor Systems for Robotics and Autonomous Vehicles . . . . 439
Manoj Tolani, Abiodun Afis Ajasa, Arun Balodi, Ambar Bajpai, Yazeed AlZaharani, and Sunny

Four Wheeled Humanoid Second-Order Cascade Control of Holonomic Trajectories . . . . 461
A. A. Torres-Martínez, E. A. Martínez-García, R. Lavrenov, and E. Magid

Efficient Machine Learning of Mobile Robotic Systems Based on Convolutional Neural Networks
Milica Petrović, Zoran Miljković, and Aleksandar Jokić

Abstract During the last decade, Convolutional Neural Networks (CNNs) have been recognized as one of the most promising machine learning methods that are being utilized for deep learning of autonomous robotic systems. Faced with everlasting uncertainties while working in unstructured and dynamical real-world environments, robotic systems need to be able to recognize different environmental scenarios and make adequate decisions based on machine learning of the current environment’s state representation. One of the main challenges in the development of machine learning models based on CNNs is in the selection of appropriate model structure and parameters that can achieve adequate accuracy of environment representation. In order to address this challenge, the book chapter provides a comprehensive analysis of the accuracy and efficiency of CNN models for autonomous robotic applications. Particularly, different CNN models (i.e., structures and parameters) are trained, validated, and tested on real-world image data gathered by a mobile robot’s stereo vision system. The best performing CNN models based on two criteria—the number of frames per second and mean intersection over union are implemented on the real-world wheeled mobile robot RAICO (Robot with Artificial Intelligence based COgnition), which is developed in the Laboratory for robotics and artificial intelligence (ROBOTICS&AI) and tested for obstacle avoidance tasks. The achieved experimental results show that the proposed machine learning strategy based on CNNs provides high accuracy of mobile robot’s current environment state estimation. Keywords Efficient deep learning · Convolutional neural networks · Mobile robot control · Robotic vision · NVidia Jetson Nano

M. Petrović (B) · Z. Miljković · A. Jokić
Faculty of Mechanical Engineering, University of Belgrade, Belgrade, Serbia
e-mail: [email protected]
Z. Miljković, e-mail: [email protected]
A. Jokić, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_1


1 Introduction The worldwide interest in Artificial Intelligence (AI) techniques has become evident after the paper [1] reached a substantially better result on image classification task by utilizing Artificial Neural Networks (ANNs). Afterward, numerous ANN models that achieve even better results on image classification and various other tasks have been developed in [2]. Consequently, Deep Learning (DL) emerged as a new popular AI subfield. DL represents the process of training and using ANNs that utilize much deeper architectures, i.e., models with a large number of sequential layers. Another important innovation provided by [1, 3] was that the deep ANNs provide better results with convolutional layers instead of fully connected layers. Therefore, deep ANNs with convolution as a primary layer are entitled as Convolutional Neural Networks (CNNs). The CNN models such as ResNet [4], VGG [5], and Xception [6] have become the industry and research go-to options, and many researchers tried and succeeded in improving models’ accuracy or modifying the models for other tasks and purposes. An introductory explanation of the CNN layers (e.g., Pooling, ReLU, convolution, etc.) is beyond the scope of this chapter, and interested readers are referred to the following literature [2]. Background of the research performed in this chapter includes the high interconnection of robotics, computer vision, and AI fields that has led numerous researchers in the robotics community to get interested in DL. Many robotics tasks that have a highly non-linear nature can be effectively approximated by utilizing DL techniques. Nowadays, the utilization of DL in robotics spans from Jacobian matrix approximation [7] to decision-making systems [8]. However, in the robotics context, AI is mainly used when a high dimensional sensory input is utilized in control [9], simultaneous localization and mapping [10], indoor positioning [11], or trajectory learning [12, 13]. One of the main challenges for utilizing state-of-the-art DL models in robotics is related to processing time requirements. Keeping in mind that all robotic algorithms need to be implemented in the real-world setting, where processing power is limited by robot hardware, DL models need to be able to fulfill the time-sensitive requirements of embedded robotic processes. In the beginning, the accuracy of the utilized CNN models was the only relevant metric researchers considered, and therefore the trend was to utilize larger models. Larger models not only require more time, energy, and resources to train but are also impossible to implement in real-world time-sensitive applications. Moreover, one major wake-up call was the realization that the energy utilized for training one of the largest DL models for natural language processing [14] was around 1,287 MWh [15], whereas a detailed analysis of the power usage and pollution generated from training DL models was shown in [16]. Having that in mind, the researchers started exploring the models that do not utilize a large amount of power for training, as well as the models that are usable in real time. As it can be concluded from the previous elaboration, motivation for the research in this chapter is in the possible utilization of highly accurate DL models within robotic domain. Particularly, the DL models that provide effective tool for the mobile robot


perception system to further enhance the understanding of the robot's surroundings and utilize that information for further decision-making will be considered. The objective of the research presented in this chapter is to identify how large (in terms of the number of parameters and layers) the CNN model needs to be to achieve high accuracy for the semantic segmentation task, while having practical value in terms of the capability of its implementation on the computationally restricted Nvidia Jetson Nano board. The main contributions of this chapter include the analysis of the efficiency of the developed DL models implemented on the Nvidia Jetson Nano board, within the mobile robot RAICO (Robot with Artificial Intelligence based COgnition), for real-world evaluation. Particularly, efficient CNN models with different levels of computational complexity are trained on a well-known dataset and tested on the mobile robot RAICO. Different from other approaches (see e.g., [17–19]) that usually utilize AI boards with a higher level of computational resources or even high-end GPUs (that are much harder to integrate within robotic systems), the authors analyze models that have far lower computational complexity and that are implementable on Jetson Nano. After the model selection process and a thorough analysis of the achieved results, the best CNN model is implemented within the obstacle avoidance algorithm of the mobile robot RAICO for experimental evaluation within a real robotic algorithm. The chapter outline is as follows. The formulation and initial analysis of the problem at hand are given in Sect. 2. The related work regarding efficient DL models in the robotic domain is presented in Sect. 3. Section 4 includes the implementation details of different efficient DL models. The methodology for the mobile robot obstacle avoidance algorithm based on DL is considered in Sect. 5. Section 6 is devoted to the analysis of experimental results, followed by a discussion of the achieved results in Sect. 7. Section 8 contains concluding remarks with future research directions.

2 Problem Analysis and Formulation The efficiency of CNN models is measured in FLoating Point Operations (FLOPs). FLOPs represent the number of computational operations that need to be performed for a model to produce an output for a given input. If the CNN models are compared and tested on the same hardware platform, inference time (time required for the CNN to produce an output) can also be utilized for their comparison. In terms of robotic applications, inference time (e.g., 0.02 s) can be a more informative metric since its utilization gives a human-understandable estimate of the model speed. When a model is implemented on a robotic system, inversion of inference time (also known as Frames Per Second–FPS) is also provided as a standard metric. The models with FPS above 30 are usually considered as real-time models. On the other end of the spectrum, CNN efficiency can also be analyzed in terms of training time. Since the focus of this chapter is on deep learning for robotic applications, this type of analysis will not be discussed further (the interested reader is referred to literature sources [15]) and model efficiency will be focused solely on inference time.
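To make the FPS numbers used throughout the chapter concrete, the following minimal PyTorch sketch shows one common way of estimating inference time and FPS for an arbitrary model. The tiny stand-in network, the input resolution, and the number of timing iterations are illustrative assumptions, not the chapter's actual benchmarking code.

```python
import time
import torch
import torch.nn as nn

def measure_fps(model, input_size=(1, 3, 256, 512), warmup=10, iters=100, device="cuda"):
    """Estimate average inference time (seconds per frame) and FPS for a model."""
    device = torch.device(device if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    x = torch.randn(*input_size, device=device)

    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes are excluded from timing
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()       # wait for queued GPU work before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start

    inference_time = elapsed / iters       # seconds per forward pass
    return inference_time, 1.0 / inference_time

if __name__ == "__main__":
    # Stand-in model; in practice this would be the segmentation CNN under test.
    toy_model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 8, 1))
    t, fps = measure_fps(toy_model, device="cpu")
    print(f"inference time: {t * 1000:.2f} ms, FPS: {fps:.1f}")
```

Under this convention, a model would be considered real-time when the measured FPS stays above 30 at the target input resolution.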


Efficient general purpose (backbone) CNNs that can be adapted to multiple computer vision tasks have started emerging as a popular research topic. The development of novel efficient CNN models will be shown through an analysis of three popular models. One of the first efficient CNN models was proposed in [20] and entitled SqueezeNet. The authors proposed to minimize the number of parameters in the model by using mainly convolutional layers with 1 × 1 and 3 × 3 filters. Moreover, high accuracy was achieved by down-sampling feature maps within later layers in the network. The network is defined with so-called fire modules that contain two 1 × 1 convolution and one 3 × 3 convolution layer combined with ReLU activations. The resulting network has 50 × fewer parameters than AlexNet, while achieving the same accuracy. In [21], the authors developed a procedure to implement convolution more efficiently and named their model MobileNet. The number of parameters utilized for standard 3 × 3 convolution layers (and also the number of FLOPs) can be greatly reduced by using depthwise and pointwise convolution layers instead. For the example given in Fig. 1, the following equations demonstrate the difference between the number of parameters for standard convolution (1) and MobileNet convolution (2):

P_c = F_{wc} F_{hc} N_c M_c = 3 · 3 · 5 · 7 = 315,   (1)

P_m = P_d + P_p = F_{wd} F_{hd} N_d M_d + F_{wp} F_{hp} N_p M_p = 3 · 3 · 1 · 5 + 1 · 1 · 5 · 7 = 80,   (2)

where P is the number of parameters, F_w and F_h are the width and height of the filter, N is the number of channels (depth) of the input feature maps, and M is the number of filters used; all the parameters have an additional index that shows which layer they represent: c—standard convolution, d—depthwise convolution, p—pointwise convolution, m—MobileNet. The difference between the standard and MobileNet convolutional layers can be seen graphically in Fig. 1. Moreover, Eqs. (3) and (4) represent the difference between the number of FLOPs (without bias, with padding 0 and stride 1) utilized for these two convolution layers,

F_c = F_{wc} F_{hc} N_c D_w D_h M_c = 3 · 3 · 5 · 10 · 7 · 7 = 22050,   (3)

F_m = F_d + F_p = F_{wd} F_{hd} N_d M_d D_w D_h + F_{wp} F_{hp} N_p M_p D_w D_h = 3 · 3 · 1 · 5 · 10 · 7 + 1 · 1 · 5 · 7 · 10 · 7 = 3150 + 2450 = 5600,   (4)

where F represents the number of FLOPs, and D_w and D_h are the width and height of the output feature map, with the same notation as in (1) and (2). As it can be seen, both the memory footprint (according to the number of parameters) and the inference time according to the FLOPs are four times lower for the MobileNet convolution in the

Fig. 1 Difference between standard and depthwise separable convolution process (input feature map processed by a standard convolution layer with 7 (3×3) filters versus a depthwise convolution layer with 5 (3×3) filters followed by a pointwise convolution layer with 7 (1×1) filters)

considered example. For larger layers, the difference can be even more significant (up to 8 or 9 times [21]). Another efficient general-purpose CNN model is ShuffleNet [22]. In the same manner as MobileNet, ShuffleNet utilizes depthwise and pointwise convolution layers. Differently, it utilizes a group convolution to further reduce the number of FLOPs. Additionally, the model also performs the channel shuffle between groups to increase the overall information provided for feature maps. ShuffleNet achieves better accuracy than MobileNet while having the same number of FLOPs. Both MobileNet and ShuffleNet have novel versions of their models to further improve their performance ([23, 24]).
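The savings expressed by Eqs. (1)–(4) can be checked with a few lines of arithmetic. The helper below is a sketch written for this chapter's worked example (a 5-channel 10 × 7 input, 3 × 3 filters, 7 output channels); the function names are illustrative and do not come from any library.

```python
def conv_params_flops(fw, fh, n_in, n_out, dw, dh):
    """Parameter and FLOP count (multiplications, no bias) of a standard convolution."""
    params = fw * fh * n_in * n_out
    flops = params * dw * dh
    return params, flops

def separable_params_flops(fw, fh, n_in, n_out, dw, dh):
    """Same counts for a depthwise (fw x fh) plus pointwise (1 x 1) convolution pair."""
    depthwise_params = fw * fh * 1 * n_in     # one fw x fh filter per input channel
    pointwise_params = 1 * 1 * n_in * n_out   # 1 x 1 filters that recombine the channels
    params = depthwise_params + pointwise_params
    flops = depthwise_params * dw * dh + pointwise_params * dw * dh
    return params, flops

# Worked example from Fig. 1: 5-channel 10 x 7 input, 3 x 3 filters, 7 output maps.
print(conv_params_flops(3, 3, 5, 7, 10, 7))       # -> (315, 22050), matching Eqs. (1) and (3)
print(separable_params_flops(3, 3, 5, 7, 10, 7))  # -> (80, 5600), matching Eqs. (2) and (4)
```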

3 Efficient Deep Learning for Robotics—Related Work In the best-case scenario, the developed DL models applied in robotic applications should be able to achieve real-time specifications by utilizing a small embedded device. One of the most popular family of AI-embedded devices is NVidia Jetson, and since many DL models are tested on these devices (including our models), their specifications are given in Table 1.


Table 1 Embedded NVidia Jetson devices

Jetson device      Nano                  TX2                   Xavier NX
Processing power   0.472 TFLOPS          1.33 TFLOPS           21 TOPS
GPU                Maxwell (128 cores)   Pascal (256 cores)    Volta (384 cores)
RAM                4 GB                  8 GB                  8 GB
Power              10 W                  15 W                  15 W

Different CNN models have been developed for various computer vision applications. Therefore, the following related work is divided into two sections based on computer vision tasks that are utilized in robotic applications.

3.1 Efficient Deep Learning Models for Object Detection in Robotic Applications The first frequently utilized computer vision application in robotics is object detection. Object detection represents the process of finding a specific object in the image, defining its location with the bounding box, and classifying the object into one of the predefined classes with the prediction confidence score. The most common efficient detection networks are analyzed next. The faster R-CNN [25] represents one of the first detection CNN models that brought the inference time so low that it encouraged the further development of real-time detection models. Nowadays, detection models can be used in real-time with significant accuracy, e.g., YOLO [26] and its variants (e.g., [27, 28]), and SSD [29]. Object detection-based visual control of industrial robot was presented in [30]. The authors utilized faster R-CNN model in conjunction with an RGBD camera to detect objects and decide if the object was reachable for a manipulator. The human–robot collaboration based on hand gesture detection was considered in [31]. The authors improved SSD network by changing VGG for Resnet backbone and adding an extra feature combination layer. The considered modifications improved the detection of hand signs even when the human was far away from the robot. In [32], the authors analyzed dynamic Simultaneous Localization And Mapping (SLAM) based on SSD network. Dynamic objects were detected to enhance the accuracy of the standard visual ORB-SLAM2 method by excluding parts of the image that were likely to move. The proposed system significantly improved SLAM performance in both indoor and outdoor environments. Human detection and tracking performed with SSD network were investigated in [33]. The authors proposed a mobile robotic system that can find, recognize, track, and follow (using visual control) a certain human in order to achieve human–robot interaction. The authors of [34] developed a YOLOv3based bolt position detection algorithm to infer the orientation of the pallets the


industrial robot needs to fill up. The YOLOv3 model was improved by using a k-means algorithm, a better detector, and a novel localization fitness function. A human intention detection algorithm was developed in [35]. The authors utilized YOLOv3 for object detection and an LSTM ANN for human action recognition. The CNNs were integrated into one human intention detection algorithm, and the robot's decision-making system utilizes that information for its decisions.

3.2 Efficient Deep Learning Models for Semantic Segmentation in Robotic Applications The second common computer vision task that is utilized within robotic applications is semantic segmentation (e.g., [36]). Semantic segmentation represents the process of assigning (labeling) every pixel in the image with an object class. The accuracy of DL models for semantic segmentation can be represented either in pixel accuracy or mean Intersection over Union (mIoU) [37]. Few modern efficient CNN models for semantic segmentation are analyzed next, following by the ones that are integrated into robotic systems. The first analyzed CNN model was ENet [17]. The authors improved the efficiency of ResNet model by adding a faster reduction of feature map resolution with either max-pooling or convolution with stride 2. Moreover, the batch normalization layer was added after each convolution. The results showed that ENet achieved mIoU accuracy close to state-of-the-art while having a much lower inference time (e.g., 21 FPS on Jetson TX1). Another efficient CNN model entitled as ERFNet was proposed in [18]. ERFNet further increased the efficiency of a residual block by splitting 2D convolution into two 1D convolution layers. Each n × n convolution layer was split into 1 × n, followed by ReLU activation and another n × 1 convolution. ERFNet achieved higher accuracy than ENet, at the expense of some inference time (ERFNet– 11 FPS on TX1). The authors of [38] proposed attention-based CNN for semantic segmentation. Fast attention blocks represent the core improvement of the proposed paper. The ResNet was utilized as a backbone network. The utilized network was evaluated on the Cityscape dataset, where it achieved 75.0 mIoU, while being implemented on Jetson Nano and achieving 31 FPS. The authors of [39] proposed a CNN model that integrates U-net [40] with ResNet’s skip connection. The convolution layers were optimized with CP decomposition. Moreover, the authors proposed an iterative algorithm for fine-tuning the ratio of compression and achieved accuracy. At the end, the compressed network achieved astonishingly low inference time (25 FPS–Jetson Nano), with decent mIoU accuracy. The authors in [19] proposed to utilize CNNs for semantic segmentation of crops and weeds. The efficiency of the proposed network was mainly achieved by splitting the residual 5 × 5 convolution layer into the following combination of convolution layers 1 × 1—5 × 1—1 × 5—1 × 1. The proposed system was tested on an unmanned agriculture robot with both 1080Ti NVidia GPU (20 FPS) and Jetson TX2 (5FPS). The novel semantic

segmentation CNN model (Mininet) was proposed in [41]. The main building block included two subblocks; the first one has depthwise and pointwise convolution where the depthwise convolution was factorized into two layers with filters n × 1 and 1 × n, and the second subblock includes Atrous convolution with a factor greater than 1. Both subblocks included ReLU activation and batch normalization. At the end of the block, both subblocks were summed, and another 1 × 1 convolution was performed. The proposed model achieves high accuracy with around 30 FPS on a high-end GPU. In regard to robotic applications, the network was evaluated on efficient keyframe selection for the ORB2 SLAM method. The authors in [42] proposed a CNN model for RGBD semantic segmentation. The system was based on a ResNet18 backbone with a decoder that utilized ERFNet modules. The mobile robot had a Kinect2 RGBD camera, and the proposed model was implemented on Jetson Xavier. The resulting system was able to perform high-accuracy person detection with free space representation based on floor discretization. Visual SLAM based on a depth map generated by a CNN model was considered in [43]. The authors utilized a version of ResNet50 that had improved inference time, so that it could be implemented on Jetson TX2 in a near real-time manner (16 FPS). An overview of all analyzed efficient CNN models utilized for different robotic tasks is given in Table 2.

Table 2 Overview of the CNN models utilized for robotic tasks

CNN model          Vision task   Robotic task
R-CNN [30]         Detection     Visual control
ResNet-SSD [31]    Detection     Hand gesture detection used for control
SSD [32]           Detection     Visual SLAM
SSD [33]           Detection     Human detection and tracking
YOLOv3 [34]        Detection     Bolt position detection algorithm
YOLOv3 [35]        Detection     Human intention detection
Mininet [41]       Sem. seg.     Visual SLAM
ERFNet [42]        Sem. seg.     Person detection/free space representation
ResNet50 [43]      Sem. seg.     Visual SLAM

4 The Proposed Models of Efficient CNNs for Semantic Segmentation Implemented on Jetson Nano As it can be seen from Sect. 3, numerous CNN models have been proposed for small embedded devices that can be integrated into robotic systems. In Sect. 4, the authors will describe the CNN models that will be trained and deployed to the NVidia Jetson


Nano single-board computer. Models will be trained on the Cityscapes dataset [44] with images that have 512 × 256 resolution. As a baseline model, we have utilized the CNN network proposed by the official NVidia Jetson instructional guide for inference (real-time CNN vision library) [45]. The network is based on a fully convolutional network with a ResNet18 backbone (entitled ResNet18_2222). Due to the limited processing power, the decoder of the network is omitted, and the output feature map is of lower resolution. The baseline model is created from several consecutive ResNet Basic Blocks (BB) and Basic Reduction Blocks (BRB), see Fig. 2. When the padding and stride are symmetrical, only one number is shown. The number of feature maps in each block is determined by the level in which the block is, see Fig. 3.

Fig. 2 Blocks of layers utilized for ResNet18 and ResNet18_1D architectures
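For readers who prefer code to block diagrams, the following PyTorch sketch shows what a Basic Block (BB) and a Basic Reduction Block (BRB) of the kind shown in Fig. 2 typically look like. It follows the standard ResNet18 basic-block layout (3 × 3 convolutions, batch normalization, ReLU, and an identity or strided 1 × 1 shortcut) and is an assumption about the exact implementation rather than the authors' code.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-style basic block (BB); stride=2 turns it into a reduction block (BRB)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Shortcut: identity when shapes match, otherwise a strided 1x1 projection.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

if __name__ == "__main__":
    bb = BasicBlock(64, 64)                 # basic block, keeps resolution
    brb = BasicBlock(64, 128, stride=2)     # reduction block, halves resolution
    y = brb(bb(torch.randn(1, 64, 64, 128)))
    print(y.shape)                          # torch.Size([1, 128, 32, 64])
```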


Fig. 3 ResNet18_2222 and ResNet18_1D_2300 architectures

The complete architecture of the baseline and selected 1D model with three levels is presented in Fig. 3. As it can be seen, the architecture is divided into four levels. In the baseline model, each level includes two blocks, either two BB or BB + BRB (defined in Fig. 2). Size of the feature maps is given between each level and each layer. For all architectures, the number of features per level is defined as follows:


level 1—64 features, level 2—128 features, level 3—256 features, and level 4—512 features, regardless of the number of blocks in each level. The first modification we propose to the baseline model is to change the number of blocks in each level. The intuition for this modification is twofold, (i) a common method of increasing the efficiency of CNN models is rapid reduction of feature maps resolution, and (ii) the prediction mask resolution can be increased by not reducing input resolution (since we do not use decoder). Classes that occupy small spaces (e.g., poles in the Cityscapes dataset) cannot be accurately predicted if the whole image with 256 × 512 resolution is represented by a prediction mask of 8 × 16; therefore, a higher resolution of the prediction mask can increase both accuracy and mIoU measure. The second set of CNN models that are trained and tested include the decomposition of 3 × 3 layer into 1 × 3 and 3 × 1 convolution layers. Two types of blocks—1D Block (DB) and 1D Reduction Block (DRB), created from this type of layer can be seen in Fig. 2. CNN models with 1D blocks are entitled ResNet_1D (or RN_1D). One of the 1D ResNet models is shown in Fig. 3. This model includes only the first three levels with a larger number of blocks per level compared to the baseline model. Since there is one less level, the output resolution is larger with the output mask of 16 × 32. Lastly, the depth-wise separable and pointwise convolution is added into 1D layers to create a new set of blocks (Fig. 4). Additional important parameters for separation block are a number of feature maps at the input and the output of the level.

Fig. 4 Separable convolutional blocks
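The 1D and separable variants of these blocks can be sketched in the same way. The snippet below shows the two ideas that Figs. 2 and 4 combine: factorizing a 3 × 3 convolution into a 1 × 3 followed by a 3 × 1 convolution, and replacing a standard convolution with a depthwise plus pointwise pair. The module names and the exact placement of batch normalization are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def conv_1d_factorized(in_ch, out_ch):
    """3x3 convolution factorized into 1x3 followed by 3x1 (the '1D' layers)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=(1, 3), padding=(0, 1), bias=False),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1), padding=(1, 0), bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def conv_depthwise_separable(in_ch, out_ch):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),  # depthwise
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),                          # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

if __name__ == "__main__":
    x = torch.randn(1, 64, 64, 128)
    print(conv_1d_factorized(64, 64)(x).shape)         # torch.Size([1, 64, 64, 128])
    print(conv_depthwise_separable(64, 128)(x).shape)  # torch.Size([1, 128, 64, 128])
```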


Fig. 5 ResNet_sep_4400 architectures

Another eight architectures named separable ResNet models (RN_sep) are created using separable blocks. The example of the separable convolutional model with only two levels is shown in Fig. 5.
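The model variants evaluated later (e.g., RN_2222, RN_2600, RN_8000) differ only in how many blocks are placed in each of the four levels. A hedged sketch of how such a backbone could be assembled from a blocks-per-level list is given below; the level widths of 64/128/256/512 follow the text, while the builder function itself and the assumption that the first block of every level after the first performs the reduction are illustrative, not the authors' implementation.

```python
import torch.nn as nn

LEVEL_CHANNELS = [64, 128, 256, 512]  # feature maps per level, as defined in the text

def build_backbone(blocks_per_level, block, in_ch=64):
    """Stack residual blocks level by level, e.g. blocks_per_level=[2, 6, 0, 0] for RN_2600."""
    layers, ch = [], in_ch
    for level, (n_blocks, out_ch) in enumerate(zip(blocks_per_level, LEVEL_CHANNELS)):
        for i in range(n_blocks):
            # Assumed: the first block of every level after the first halves the resolution.
            stride = 2 if (i == 0 and level > 0) else 1
            layers.append(block(ch, out_ch, stride=stride))
            ch = out_ch
    return nn.Sequential(*layers)

# Example (BasicBlock is the residual block sketched earlier in this section):
# backbone_2600 = build_backbone([2, 6, 0, 0], block=BasicBlock)
```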

5 Obstacle Avoidance Algorithm for Mobile Robots Based on Semantic Segmentation After mobile robots receive high-level tasks that need to be performed (see, e.g. [46–48]), a path planning step is required to ensure the safe and efficient execution of tasks. Along the way, new obstacles can be detected in the defined plan; therefore, local planning needs to occur to avoid collisions. In this work, the efficient deep learning system is utilized to generate the semantic map of the environment. Afterward, the semantic map is utilized to detect obstacles


in the mobile robot's path. The considered mobile robot system moves in a horizontal plane, and therefore the height of the camera remains the same for the whole environment. Moreover, since the pose and intrinsic camera parameters are known, it is possible to geometrically link the position of each group of pixels produced by the semantic map to the position in the world frame. By exploiting the class of each group of pixels, the mobile robot can determine how to avoid obstacles and reach the desired pose. A mathematical and algorithmic explanation of the proposed system is discussed next. Mobile robot pose is defined by its position and orientation, included in the state vector (5):

x = (z, x, θ)^T,   (5)

where x and z are mobile robot coordinates, and θ is the current heading angle. The camera utilized by the mobile robot RAICO is tilted downwards for the inclination angle α. Camera angles of view in terms of image height and width are denoted as γ h and γ w , respectively. As mentioned in Sect. 4, the output semantic mask is smaller than the input image; therefore, the dimensions of the output mask are defined with its width (W ) and height (H) defined in pixels. The geometric relationships between the output mask and the area in front of the mobile robot, in the vertical plane, can be seen in Fig. 6. If the output semantic mask pixel belongs to the class “floor”, we can conclude that there is no obstacle in that area (e.g., between z1 and z2 in Fig. 6) of the environment,

Fig. 6 Camera geometric information defined in the vertical plane


Fig. 7 Camera geometric information defined in the horizontal plane

and mobile can move to that part of the environment. The same view in the horizontal plane is shown in Fig. 7. In order to calculate the geometric relationships, the first task is to determine the increment of the angle between the edges of the output pixels in terms of both camera width (β w ) and height (β h ) by using (6) and (7): βw = γw /W ,

(6)

βh = γh /H .

(7)

Afterward, starting angles for width and height need to be determined by using (8) and (9): ϕh = (90 − α) − 0.5γh ,

(8)

ϕw = (90 − 0.5γw ).

(9)

Therefore, it is possible to calculate the edges of the area that is covered by each pixel of the semantic map, defined with their z and x coordinates (10) and (11): z i = z c + yc tan(ϕh + (i − 1)βh ), i = 1, ..., H + 1, xi j = xc − z j /tan(ϕw + (i − 1)βw ), i = 1, ..., W + 1, j = 1, ..., H + 1.

(10) (11)
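A direct way to read Eqs. (6)–(11) is as a small piece of NumPy code that converts the H × W semantic mask into ground-plane coordinates in front of the robot. The sketch below is an illustration under assumed camera values (tilt, field of view, mounting height); the variable names follow the equations, but the function itself is hypothetical.

```python
import numpy as np

def mask_cell_edges(W, H, gamma_w, gamma_h, alpha, x_c=0.0, z_c=0.0, y_c=0.3):
    """Ground-plane edge coordinates covered by the semantic-mask cells, Eqs. (6)-(11).

    Angles are in degrees; y_c is the camera height above the floor, (x_c, z_c) its
    position in the robot frame, and alpha the downward tilt of the camera.
    """
    beta_w = gamma_w / W                       # Eq. (6)
    beta_h = gamma_h / H                       # Eq. (7)
    phi_h = (90.0 - alpha) - 0.5 * gamma_h     # Eq. (8)
    phi_w = 90.0 - 0.5 * gamma_w               # Eq. (9)

    i_h = np.arange(1, H + 2)                  # i = 1, ..., H + 1
    i_w = np.arange(1, W + 2)                  # i = 1, ..., W + 1

    # Eq. (10): depth of each horizontal cell edge.
    z = z_c + y_c * np.tan(np.radians(phi_h + (i_h - 1) * beta_h))
    # Eq. (11): lateral position of each vertical cell edge, for every depth z_j.
    x = x_c - z[None, :] / np.tan(np.radians(phi_w + (i_w - 1) * beta_w))[:, None]
    return z, x   # z has shape (H+1,), x has shape (W+1, H+1)

if __name__ == "__main__":
    z_edges, x_edges = mask_cell_edges(W=64, H=32, gamma_w=62.0, gamma_h=49.0, alpha=30.0)
    print(z_edges.shape, x_edges.shape)        # (33,) (65, 33)
```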


The example of the generated map by the mobile robot is shown in Fig. 8, where the green area is accessible while obstacles occupy the red areas. The width that a mobile robot occupies while moving is defined by its width B (see Fig. 8). Since the obstacles that are too far away from the robot do not influence the movement, we defined threshold distance D. The area defined with B and D is utilized to determine if the obstacle avoidance procedure needs to be initiated. Therefore, if any of the pixels that correspond to this area include obstacles, the obstacle avoidance procedure is initiated. The whole algorithm utilized for both goal-achieving behavior and obstacle avoidance is represented in Fig. 9.
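Given the cell edges computed above and the per-cell class labels, the obstacle trigger described here reduces to a simple overlap test between non-floor cells and the B × D corridor in front of the robot. The following sketch is one possible formulation; the floor label value, the row ordering of the mask, and the corridor dimensions are assumptions.

```python
def obstacle_in_corridor(classes, z_edges, x_edges, robot_width_b, threshold_dist_d, floor_id=0):
    """Return True if any non-floor cell overlaps the corridor of width B up to distance D.

    classes: (H, W) array of predicted class ids, rows assumed ordered from near to far;
    z_edges: (H+1,) depths of the horizontal cell edges;
    x_edges: (W+1, H+1) lateral positions of the vertical cell edges.
    """
    H, W = classes.shape
    for i in range(H):
        for j in range(W):
            if classes[i, j] == floor_id:
                continue  # free floor never triggers avoidance
            z_near, z_far = sorted((z_edges[i], z_edges[i + 1]))
            x_left = min(x_edges[j, i], x_edges[j, i + 1])
            x_right = max(x_edges[j + 1, i], x_edges[j + 1, i + 1])
            within_depth = z_near <= threshold_dist_d and z_far >= 0.0
            within_width = x_left <= robot_width_b / 2 and x_right >= -robot_width_b / 2
            if within_depth and within_width:
                return True
    return False
```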

Fig. 8 Representation of the free and occupied areas in the environment

Fig. 9 State-action transitions algorithm within obstacle avoidance algorithm


Fig. 10 Examples of the trajectory mobile robot will take with and without obstacles

It is assumed that the mobile robot is localized (i.e., the initial pose of the mobile robot is known), and the desired pose is specified. Therefore, the initial plan is to rotate the mobile until it is directed to the desired position and perform translation until it is achieved. If an obstacle is detected within the planned path (according to the robot width), the mobile robot performs an additional obstacle avoidance strategy before computing new control parameters to achieve the desired pose. There are five states (S1 –S5 ) in which a mobile robot can be and two actions it can take. The actions that mobile robot performs are translation or rotation. At the start of the movement procedure, the mobile robot is in state S1 , which indicates that rotation to the desired pose needs to be performed. After the rotation is finished, obstacle detection is performed. If the obstacle is not detected (O = 0), the mobile robot transitions to state S2 ; otherwise, it transitions to state S3 . Within state S2 , the mobile robot performs translational movement until the desired position is reached or until the dynamic obstacle is detected in the robot’s path. In state S3 , the mobile robot calculates a temporary goal (new goal) and rotates until there is no obstacle in its direction. Afterward, it transitions to state S4 , where the mobile robot performs translation until the temporary goal is achieved or a dynamical obstacle is detected. If the translation is completed, the mobile robot starts rotating to the desired position (S1 ). On the other hand, if the obstacle is detected in the S4 , the robot transitions to S3 and generates a new temporary goal. The obstacle avoidance process is performed until the mobile robot achieves state S5 , indicating the desired pose’s achievement. An example of the mobile robot’s movement procedure with and without obstacles is shown in Fig. 10.
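The state transitions S1–S5 described above can be written compactly as a small finite-state machine. The sketch below mirrors the text (rotate to the goal, translate, detour to a temporary goal when an obstacle appears), but the helper methods it calls, such as obstacle_detected() or set_temporary_goal(), are placeholders for the robot's actual perception and control routines, not an interface defined in the chapter.

```python
def obstacle_avoidance_fsm(robot):
    """Finite-state machine for goal reaching with obstacle avoidance (states S1-S5)."""
    state = "S1"  # start by rotating towards the desired pose
    while state != "S5":
        if state == "S1":                        # rotate towards the desired position
            robot.rotate_towards_goal()
            state = "S3" if robot.obstacle_detected() else "S2"
        elif state == "S2":                      # translate towards the desired position
            robot.translate_towards_goal()
            if robot.goal_reached():
                state = "S5"                     # desired pose achieved
            elif robot.obstacle_detected():      # dynamic obstacle appeared in the path
                state = "S3"
        elif state == "S3":                      # pick a temporary goal, rotate until the path is free
            robot.set_temporary_goal()
            robot.rotate_until_path_is_free()
            state = "S4"
        elif state == "S4":                      # translate towards the temporary goal
            robot.translate_towards_temporary_goal()
            if robot.obstacle_detected():
                state = "S3"                     # replan around the newly detected obstacle
            elif robot.temporary_goal_reached():
                state = "S1"                     # turn back towards the original goal
    return state
```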

6 Experimental Results The experimental results are divided into two sections. The first includes the results of training of deep learning models, while the second one involves utilizing the best model within an obstacle avoidance algorithm. All CNN models are trained and tested on the same setup to ensure a fair comparison. Models have been trained on the Cityscapes dataset [44] with input images of 512 × 256 resolution. Low-resolution images are selected since the used NVidia


Jetson Nano has the lowest level of computation power out of all NVidia Jetson devices (see Table 1). Models are trained on a deep learning workstation with three NVidia Quadro RTX 6000 GPUs and two Xeon Silver 4208 CPUs using the PyTorch v1.6.0 framework. All analyzed models are compared based on two metrics, the mIoU and FPS achieved on Jetson Nano. At the same time, global accuracy and model size are also shown to compare the utilized models better. All networks are converted to TensorRT (using ONNX format) with FP16/INT8 precision to increase the models’ inference time. Table 3 includes all the variations of all three CNN models, whose detailed explanation is provided in Sect. 4. The experiment is proposed not to change the number of used blocks for each network but only to change their position within four levels. Since the networks need to be tested on a real-world mobile robot instead of FLOPs, we compare the efficiency of the networks in FPS. Also, the network size in MB is also provided. The CNN with the best mIoU value is the model RN_8000. The model with the lowest memory footprint is RN_1D_8000, with its size being only 1.6 MB. The model with the fastest inference time represented in FPS is RN_sep_1115. However, since the primary motivation for these experiments was to determine the best network in regards to the ratio of FPS and mIoU, the network selected for utilization in the obstacle avoidance process is RN_2600, since it achieves both a high level of accuracy and number of FPS. The primary motivation for training the CNNs on the Cityscapes dataset is its popularity and complexity. Afterward, the selected network is trained again on the Sun indoor dataset [49] to be used in mobile robot applications. By utilizing the algorithm proposed in Sect. 5, the obstacle avoidance ability of the mobile robot RAICO is experimentally evaluated (Fig. 11). Mobile robot is positioned on the floor within the ROBOTICS&AI laboratory. Mobile robot is set to initial pose x = (0, 0, 0), while the desired pose is set to be xd = (600,100,-0.78). The change in pose of the mobile robot is calculated according to the dead-reckoning odometry by utilizing wheel encoders [50]. A spherical obstacle is set to a position (300, 50) with a diameter of roughly 70 mm. The proposed algorithm is started, and the mobile robot achieves the trajectory shown in Fig. 12. Mobile robot representation is shown with different colors for different States (see Sect. 5), S1 is red, S2 is blue, S3 is dark yellow, and S4 is dark purple. Moreover, the obstacle is indicated with a blue-filled circle. Desired and final positions are shown with black and green dots, respectively. Moreover, the selected images mobile robot acquired and semantic segmentation masks generated by the CNN overlayed over the image can be seen in Fig. 13. As it can be seen, segmentation of floor (red), wall (teal), chairs (blue), tables (black), and objects (yellow) is performed well, with precise edges between mentioned classes. By utilizing accurate semantic maps, mobile robot was able to dodge the obstacle, and successfully achieve the desired pose. Now, we show Fig. 14 with four examples of the influence of the semantic maps on free and occupied areas in the environment generated during the experimental evaluation. The green area is free, and it corresponds to the floor class, while the redoccupied area corresponds to all other classes. In the first image, the mobile robot


Table 3 Experimental results for all CNN models

Num  Model title   Blocks per level  Output resolution  Model size [MB]  FPS   mIoU [%]  Accuracy [%]
1    RN_2222       [2 2 2 2]         8 × 16             46.0             44.7  28.674    77.816
2    RN_1133       [1 1 3 3]         8 × 16             67.6             46.0  27.672    77.255
3    RN_1115       [1 1 1 5]         8 × 16             95.3             46.8  27.060    76.541
4    RN_2330       [2 3 3 0]         16 × 32            17.3             43.4  36.091    81.543
5    RN_1160       [1 1 6 0]         16 × 32            28.5             44.7  34.508    81.089
6    RN_4400       [4 4 0 0]         32 × 64            5.7              41.9  40.350    83.484
7    RN_2600       [2 6 0 0]         32 × 64            7.5              42.6  40.668    83.651
8    RN_8000       [8 0 0 0]         64 × 128           2.4              38.7  40.804    84.589
9    RN_1D_2222    [2 2 2 2]         8 × 16             33.7             41.4  28.313    77.236
10   RN_1D_1133    [1 1 3 3]         8 × 16             48.2             41.7  27.304    77.115
11   RN_1D_1115    [1 1 1 5]         8 × 16             66.6             42.2  26.854    76.420
12   RN_1D_2330    [2 3 3 0]         16 × 32            12.3             40.9  34.819    80.928
13   RN_1D_1160    [1 1 6 0]         16 × 32            19.8             41.3  33.937    80.693
14   RN_1D_4400    [4 4 0 0]         32 × 64            4.0              40.2  39.065    83.304
15   RN_1D_2600    [2 6 0 0]         32 × 64            5.2              40.8  39.212    83.061
16   RN_1D_8000    [8 0 0 0]         64 × 128           1.6              37.7  37.947    84.058
17   RN_sep_2222   [2 2 2 2]         8 × 16             25.6             46.4  28.901    78.000
18   RN_sep_1133   [1 1 3 3]         8 × 16             41.3             48.0  28.563    77.383
19   RN_sep_1115   [1 1 1 5]         8 × 16             61.3             48.7  27.909    77.250
20   RN_sep_2330   [2 3 3 0]         16 × 32            10.7             42.9  36.452    81.834
21   RN_sep_1160   [1 1 6 0]         16 × 32            18.8             44.7  35.172    81.018
22   RN_sep_4400   [4 4 0 0]         32 × 64            3.8              39.1  39.990    83.894
23   RN_sep_2600   [2 6 0 0]         32 × 64            5.0              40.1  39.529    83.125
24   RN_sep_8000   [8 0 0 0]         64 × 128           1.8              34.5  38.010    84.062

detects the obstacle in its path and then rotates to the left until it can avoid the obstacle. The second image corresponds to the moment the obstacle almost leaves the robot's field of view due to the robot's translational movement. The third image represents the moment at the end of the obstacle avoidance state, and the last image is generated near the final pose. By analyzing the images taken in the final pose, it can be seen that the mobile robot accurately differentiates between the free space (floor) in the environment and, in this case, the wall class that represents the occupied area. This indicates that the CNN models can be further utilized in conjunction with the proposed free space detection algorithm to define the areas in which the mobile robot can safely perform the desired tasks.
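The free-space determination described above can be illustrated with a minimal sketch. The floor class index, the free-ratio threshold and the region-of-interest geometry are illustrative assumptions; the chapter's actual computation uses the camera pose and parameters described in Sect. 5.

```python
import numpy as np

FLOOR_CLASS = 0               # assumed index of the "floor" class in the mask
FREE_RATIO_THRESHOLD = 0.95   # assumed fraction of floor pixels needed to call the ROI free

def free_occupied_map(seg_mask: np.ndarray) -> np.ndarray:
    """Boolean map: True = free (floor), False = occupied (any other class)."""
    return seg_mask == FLOOR_CLASS

def roi_is_free(seg_mask: np.ndarray, roi_rows: slice, roi_cols: slice) -> bool:
    """Check whether the region of interest in front of the robot is traversable."""
    roi = free_occupied_map(seg_mask)[roi_rows, roi_cols]
    return roi.mean() >= FREE_RATIO_THRESHOLD

# Example with a synthetic 8 x 16 mask
mask = np.zeros((8, 16), dtype=np.int64)   # all floor
mask[2:4, 6:10] = 3                        # an "object" blob in front of the robot
print(roi_is_free(mask, slice(0, 8), slice(4, 12)))  # False -> switch to an avoidance state
```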


Fig. 11 Mobile robot RAICO with the obstacle in the environment

Fig. 12 Real trajectory the mobile robot achieved (plot "Mobile robot trajectories"; axes: X axis [mm] vs. Z axis [mm])

7 Discussion of Results

The experimental results are divided into two parts: one concerns finding the optimal CNN model in terms of both accuracy and inference speed, and the other concerns the experimental verification with the mobile robot. Within the first part of the experimental evaluation, three types of CNN models with a different number of layers in each level are analyzed. The experimental results show that the best network in terms of accuracy (RN_8000) is the one with all layers concentrated in the first


Fig. 13 Mobile robot perception during movement


Fig. 14 Mobile robot obstacle detection during movement

level. This type of CNN has the highest output resolution, which is the main reason why it provides the overall best results. Moreover, the general trend is that networks with fewer levels have higher accuracy (the best CNN model has 40.8% mIoU and 84.6% accuracy). Regarding the model size, it is shown that networks with many layers concentrated in the fourth level occupy more disk space (the largest model occupies approximately 95 MB of disk space, compared to the smallest model, which occupies only 1.6 MB).


The main reason for this is that the higher levels include a larger number of filters. However, the largest CNN models also have the shortest inference time and, therefore, the highest number of FPS, reaching up to 48.7 FPS on average. Moreover, both proposed improvements of the network (depthwise separable and 1D convolution) yield a marginal reduction in inference time at the expense of a slight decrease in accuracy. By exploiting the ratio between accuracy and inference time, the RN_2600 CNN model is selected for the experiment with the mobile robot; this network achieves the second-best accuracy and is about 4 FPS faster than the network with the best accuracy. Moreover, since modern SSDs and micro-SD cards are readily available with capacities much larger than the reported model sizes, it can also be concluded that the disk space the models occupy is not a substantial restriction. On the other hand, the main achievement of this chapter is shown through the experiment in which the mobile robot RAICO performed obstacle avoidance, as a case study for the utilization of an accurate and fast CNN model. The model is employed within the obstacle detection and avoidance algorithm. The output of the network is processed to generate semantic segmentation masks. Afterward, the geometric relationship between the camera position and its parameters is utilized to determine the free area in the environment. If an obstacle is detected close to the mobile robot path, the algorithm transitions the mobile robot from the goal-achieving states to the obstacle avoidance states. The mobile robot avoids the obstacle and transitions back to the goal-achieving states, all while checking for new obstacles in the path. The experimental evaluation reinforces the validity of the proposed algorithm, which can, in conjunction with the CNN model, successfully avoid the obstacle and reach the desired position with a satisfactory error. Moreover, since the proposed algorithm has shown accurate free space detection, it can be further utilized within other mobile robotic tasks.
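The inference speeds discussed above were obtained after converting the trained PyTorch models to TensorRT through the ONNX format. A minimal sketch of such an export is shown below; the placeholder model, input resolution and file names are assumptions for illustration, not the authors' exact pipeline.

```python
import torch

# Assumed placeholder: any trained segmentation network in eval mode
model = torch.nn.Sequential(torch.nn.Conv2d(3, 19, 3, padding=1)).eval()

# Dummy input matching the training resolution (assumed here to be 256 x 512 RGB)
dummy = torch.randn(1, 3, 256, 512)

# Export to ONNX so the model can be consumed by TensorRT
torch.onnx.export(model, dummy, "segnet.onnx", opset_version=11,
                  input_names=["image"], output_names=["logits"])

# On the Jetson Nano the ONNX file can then be built into an FP16 engine, e.g.:
#   trtexec --onnx=segnet.onnx --fp16 --saveEngine=segnet.engine
```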

8 Conclusion

In this work, we propose an efficient deep learning model employed within an obstacle avoidance algorithm. The CNN model runs in real time on the Jetson Nano development board. The utilized CNN model is inspired by the ResNet model and integrates depthwise separable convolution and 1D convolution. We proposed and trained 24 variants of CNN models for semantic segmentation. The best model is selected according to the ratio of the mIoU measure and the number of FPS it achieves on Jetson Nano. The selected model is RN_2600 with two levels of layers; it achieves 42.6 FPS with 40.6% mIoU. Afterward, the selected CNN model is employed in the novel obstacle avoidance algorithm. Within obstacle avoidance, the mobile robot has four states: two are reserved for goal achieving and two for obstacle avoidance. According to the semantic mask and the known camera pose, the area in front of the mobile robot is divided into free and occupied sections. According to those areas, the mobile robot transitions between goal-seeking and obstacle avoidance


states during the movement procedure. The experimental evaluation shows that the mobile robot managed to avoid the obstacle successfully and reach the desired position with an error of -15 mm in the Z direction and 23 mm in the X direction, computed from the wheel encoder data. Further research directions include the adaptation of the proposed CNN models and their implementation on an industrial-grade mobile robot with additional computational resources. The proposed method should be a subsystem of the entire mobile robot decision-making framework.

Acknowledgements This work has been financially supported by the Ministry of Education, Science and Technological Development of the Serbian Government, through the project "Integrated research in macro, micro, and nano mechanical engineering – Deep learning of intelligent manufacturing systems in production engineering", under the contract number 451-03-47/202301/200105, and by the Science Fund of the Republic of Serbia, Grant No. 6523109, AI-MISSION4.0, 2020-2022.

Appendix A Abbreviation List

RAICO: Robot with Artificial Intelligence based COgnition
AI: Artificial Intelligence
ML: Machine Learning
ANN: Artificial Neural Networks
DL: Deep Learning
CNN: Convolutional Neural Network
FLOPs: FLoating Point Operations
FPS: Frames Per Second
VGG: Visual Geometry Group
R-CNN: Region–Convolutional Neural Network
SSD: Single Shot Detector
YOLO: You Only Look Once
SLAM: Simultaneous Localization And Mapping
LSTM: Long-Short Term Memory
RGBD: Red Green Blue Depth
BN: Batch Normalization
ReLU: Rectified Linear Unit
BB: Basic Block
BRB: Basic Reduction Block
DB: 1D Block
DRB: 1D Reduction Block
SB: Separation Block
SRB: Separation Reduction Block
RN: ResNet
mIoU: Mean Intersection over Union
ONNX: Open Neural Network eXchange

References 1. Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012) ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105). 2. Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187, 27–48. 3. LeCun, Y., & Bengio, Y. (1995) Convolutional networks for images, speech, and time series. Handbook brain theory neural networks (Vol. 3361, no. 10). 4. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778). 5. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (pp. 1–14). 6. Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1251–1258). 7. Nguyen, H., & Cheah, C.C. (2022). Analytic deep neural network-based robot control. IEEE/ASME Transactions Mechatronics (pp. 1–9). 8. Joki´c, A., Petrovi´c, M., & Miljkovi´c, Z. (2022). Mobile robot decision-making system based on deep machine learning. In 9th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN 2022) (pp. 653–656). 9. Miljkovi´c, Z., Miti´c, M., Lazarevi´c, M., & Babi´c, B. (2013). Neural network reinforcement learning for visual control of robot manipulators. Expert Systems with Applications, 40(5), 1721–1736. 10. Miljkovi´c, Z., Vukovi´c, N., Miti´c, M., & Babi´c, B. (2013). New hybrid vision-based control approach for automated guided vehicles. International Journal of Advanced Manufacturing Technology, 66(1–4), 231–249. 11. Petrovi´c, M., Ci˛ez˙ kowski, M., Romaniuk, S., Wolniakowski, A., & Miljkovi´c, Z. (2021). A novel hybrid NN-ABPE-based calibration method for improving accuracy of lateration positioning system. Sensors, 21(24), 8204. 12. Miti´c, M., Vukovi´c, N., Petrovi´c, M., & Miljkovi´c, Z. (2018). Chaotic metaheuristic algorithms for learning and reproduction of robot motion trajectories. Neural Computing and Applications, 30(4), 1065–1083. 13. Miti´c, M., & Miljkovi´c, Z. (2015). Bio-inspired approach to learning robot motion trajectories and visual control commands. Expert Systems with Applications, 42(5), 2624–2637. 14. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems (Vol. 33, pp. 1877–1901). 15. Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon emissions and large neural network training (pp. 1–22). arXiv:2104.10350. 16. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv:1906.02243.


17. Paszke, A., Chaurasia, A., Kim, S., & Culurciello, E. (2016). ENet: a deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147. 18. Romera, E., Alvarez, J. M., Bergasa, L. M., & Arroyo, R. (2017). Erfnet: Efficient residual factorized convnet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems, 19(1), 263–272. 19. Milioto, A., Lottes, P., & Stachniss, C. (2018). Real-time semantic segmentation of crop and weed for precision agriculture robots leveraging background knowledge in CNNs. In 2018 IEEE International Conference on Robotics and Automation (pp. 2229–2235). 20. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and

A Markov decision process is defined by the tuple < S, A, T, R >, where S is the system state space set, A is the system action space set, T : S × A → S is the state transition probability function and R is the reward function of the system. Therefore, the immediate reward obtained at time t, when the system performs an action a_t ∈ A in the state s_t ∈ S and transfers to the state s_{t+1} ∈ S, is r_t = R(s_t, a_t, s_{t+1}). The interaction process can be seen in Fig. 5 [39]. It can be seen from the figure that when the agent performs a task, it first obtains the current state information by perceiving the environment, then selects the action to be performed and interacts with the environment to generate a new state; at the same time, the environment gives a reinforcement signal, that is, the reward, for evaluating the quality of the action, and so on. The reinforcement learning algorithm executes the policy in the environment to obtain new state data and uses these data to modify its own behavior policy under the guidance of the reward function; after many iterations, the agent learns the optimal behavior policy required to complete the task. Solving the Markov decision problem means finding the distribution of behaviors in each state so that the accumulated reward is maximized. For the model-free case with an unknown environment, value-function-based methods are generally adopted: only the value function is estimated during the solution, and the optimal policy is obtained while the value function is solved iteratively. The DQN method used in this chapter is a value-function-based method.
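The perceive-act-reward cycle described above can be summarized with a minimal interaction-loop sketch. The environment interface (reset/step) and the policy callable are generic assumptions in the style of common reinforcement learning toolkits, not code from the chapter.

```python
# Minimal agent-environment interaction loop (generic sketch)
def run_episode(env, policy, max_steps=200):
    state = env.reset()                               # perceive the initial environment state
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                        # select an action from the current state
        next_state, reward, done = env.step(action)   # interact, observe reward and new state
        total_reward += reward                        # accumulate the reinforcement signal
        state = next_state
        if done:
            break
    return total_reward
```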


2.4 DQN Algorithm

This section first analyzes the principle of the DQN algorithm and then proposes some improvements to it according to the characteristics of the UAV path planning task. It mainly studies the boundaries and goals of the interaction between the agent and the environment, so as to establish a deep reinforcement learning model that meets the requirements of the task. Reinforcement learning realizes the learning of the optimal strategy by maximizing the cumulative reward. Formula (2) is a general cumulative reward model, which represents the future cumulative reward of the agent executing the strategy from time t:

R_t = \sum_{k=t}^{n} \gamma^{k-t} r_k   (2)

where γ ∈ [0, 1] is the discount rate, which is used to adjust the effect of rewards in future states on the reward at the current moment [39]. In the specific algorithm design, the reinforcement learning model establishes value functions based on the expectation of the accumulated reward to evaluate the policy. The value functions include the action value function Q^π(s, a) and the state value function V^π(s). The action value function Q^π(s, a) of formula (3) reflects the expected reward of executing action a in state s and then following policy π; the state value function V^π(s) of formula (4) reflects the expected reward when policy π is executed from state s:

Q^{\pi}(s, a) = E_{\pi}\{R_t \mid s_t = s, a_t = a\}   (3)

V^{\pi}(s) = E_{\pi}\{R_t \mid s_t = s\}   (4)
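Formula (2) can be computed directly from a sequence of rewards; the short sketch below is an illustrative implementation, not code from the chapter.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward R_t of formula (2) for t = 0, given rewards r_0 .. r_n."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(discounted_return([1.0, 0.0, 0.0, 10.0], gamma=0.9))  # 1.0 + 0.9**3 * 10 = 8.29
```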

Obtaining the maximum cumulative return in reinforcement learning is equivalent to maximizing the value functions Q^π(s, a) and V^π(s). The optimal action value function Q^*(s, a) and the optimal state value function V^*(s) are defined as formulas (5) and (6):

Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a)   (5)

V^{*}(s) = \max_{\pi} V^{\pi}(s)   (6)

Substituting Eq. (3) into Eq. (5) gives (7), which can be expressed in the iterative form shown in (8):

Q^{*}(s, a) = \max_{\pi} E_{\pi}\{R_t \mid s_t = s, a_t = a\}   (7)

= E\left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]   (8)

In the formula, s' and a' are the successor state and action of s and a, respectively, and formula (8) is called the Bellman optimality equation for Q^*(s, a). It can be seen that the value function includes both the immediate reward and the value function of the next moment, which means that when reinforcement learning evaluates the strategy at a certain moment, it considers both the value at the current moment and the long-term value of future moments, that is, the possible cumulative reward. This avoids the limitation of focusing only on the size of the immediate reward while ignoring long-term value, which would not lead to the optimal strategy. The Bellman equation is solved iteratively for the MDP to obtain Q^*(s, a), and then the optimal action policy function π^*(s) shown in formula (9) is obtained [39]:

\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)   (9)

In order to overcome the limitation of traditional reinforcement learning in continuous or large-scale discrete state-action spaces, deep learning is introduced and a neural network is used to fit the value function. For example, the optimal action value function Q^*(s, a) is estimated by a neural network trained with the loss of formula (10):

L_i(\theta_i) = E\left[ (y_i - Q(s, a; \theta_i))^2 \right]   (10)

where θ denotes the parameters of the neural network. The fitting target y_i is shown in formula (11):

y_i = R(s, a, s') + \gamma \max_{a'} Q(s', a'; \theta_{i-1})   (11)
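Formulas (10) and (11) translate into a vectorized target computation over a sampled mini-batch. The sketch below uses NumPy and assumes the target Q-network is provided as a callable that returns one Q-value per action for a given state.

```python
import numpy as np

def dqn_targets(rewards, next_states, dones, target_q_fn, gamma=0.99):
    """Fitting targets y_i of formula (11) for a mini-batch of transitions."""
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_q = np.array([target_q_fn(s) for s in next_states])   # shape (batch, n_actions)
    max_next_q = next_q.max(axis=1)
    # No bootstrap term for terminal transitions
    return rewards + gamma * max_next_q * (1.0 - dones)

def dqn_loss(pred_q_taken, targets):
    """Mean squared error of formula (10)."""
    return float(np.mean((np.asarray(targets) - np.asarray(pred_q_taken)) ** 2))
```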

In the process of fitting the value function, the neural network is very sensitive to the sample data, while the sequential data samples produced by executing the reinforcement learning policy are strongly correlated. This seriously affects the fitting accuracy of the neural network and makes the iterative optimization of the policy fall into local minima or even fail to converge. To overcome these problems, deep reinforcement learning algorithms generally use experience replay to weaken the coupling between the data collection process and the policy optimization process and to remove the correlation between samples. Specifically, during the reinforcement learning process, the data obtained from the interaction between the agent and the environment are temporarily stored in a database, as shown in Fig. 6, and then batches are drawn through random sampling to guide the neural network update. In addition, in order to obtain the optimal policy, it is desirable to obtain as high a reward as possible when selecting actions, while also retaining a certain exploration ability to ensure the state space search


Fig. 6 Experience playback database

ability. Obviously, the traditional greedy method cannot satisfy both requirements, so soft policies are generally used for action selection. The ε-greedy action selection method is as follows: when executing an action, select the high-value action π^*(s) with probability (1 − ε), and randomly select an action from the action space with probability ε. The mathematical expression is given in formula (12):

\pi_{\varepsilon}(s) = \begin{cases} \pi^{*}(s), & \text{with probability } 1 - \varepsilon \\ \text{random } a \in A, & \text{with probability } \varepsilon \end{cases}   (12)
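A direct implementation of the ε-greedy rule of formula (12) is shown below as an illustrative sketch.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Select an action index: greedy with probability 1 - epsilon, random otherwise (formula 12)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                        # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])      # exploit
```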

The DQN algorithm finally obtained is shown in Table 2; a sketch of this training loop is also given below. Here, M is the maximum number of training steps, the subscript j denotes the index of a state transition sample within the mini-batch of size N_batch, s_i is the environmental state of the agent, a_i is an executable action from the action space, and D is the experience replay pool.
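Since the pseudocode of Table 2 is not reproduced here, the following sketch outlines a standard DQN training loop with experience replay and a periodically synchronized target network. The environment, network objects and the optimization callback are illustrative assumptions, not the authors' exact code; the epsilon_greedy helper from the previous sketch is reused.

```python
import random
from collections import deque

def train_dqn(env, q_net, target_net, optimize_step, n_episodes=1000,
              batch_size=64, gamma=0.99, eps=1.0, eps_min=0.05, eps_decay=0.99,
              target_update=300, max_steps=200):
    """Generic DQN loop: store transitions in the replay pool D, sample mini-batches,
    fit Q toward r + gamma * max_a' targetQ(s', a'), and periodically sync the target net."""
    memory = deque(maxlen=1_000_000)                       # experience replay pool D
    step_count = 0
    for episode in range(n_episodes):
        state = env.reset()
        for _ in range(max_steps):
            # q_net(state) is assumed to return one Q-value per discrete action
            action = epsilon_greedy(q_net(state), eps)
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            if len(memory) >= batch_size:
                batch = random.sample(memory, batch_size)       # de-correlated samples
                optimize_step(batch, q_net, target_net, gamma)  # gradient step on formula (10)
            if step_count % target_update == 0:
                target_net.set_weights(q_net.get_weights())     # Keras-style weight copy
            state = next_state
            step_count += 1
            if done:
                break
        eps = max(eps_min, eps * eps_decay)                     # anneal exploration
```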

3 Design of Improved DQN Algorithm Combined with Artificial Potential Field

In the DQN algorithm, the state transition probability is not considered; only the state space, action space and reward function of the agent are described, and these elements should be designed according to the specific task [40]. For the path planning reinforcement learning task, it is first necessary to design a state space based on the sensor parameters, an action space based on the UAV motion characteristics, and a reward function based on the path planning characteristics, in order to build the UAV path planning reinforcement learning system. In order to improve the robustness of the UAV's path planning in an unknown environment and to improve the learning efficiency, a reward function suitable for path planning tasks is designed based on the obstacle position information and the target position information of the environment in which the UAV is located; it fully considers the influence of position and orientation. In addition, this chapter supplements and improves the reward function with respect to the way in which the repulsion range


Table 2 DQN algorithm pseudo code

affects path planning in the artificial potential field method: through an analysis of the motion and collisions of the UAV, a supplementary direction-penalty obstacle avoidance term is established, which evaluates the UAV's movement more effectively and guides the UAV to reach the target position quickly while avoiding obstacles.

3.1 Network Structure Design

Since the DQN method generally overestimates the Q value of the action value function, there is an over-optimization problem: the estimated value function is larger than the real value function, and the error increases with the number of actions, as shown in Fig. 7. Therefore, two networks are generally used, an online Q network

UAV Path Planning Based on Deep Reinforcement Learning

47

and a Target Q network, which implement action selection and action evaluation with different value functions. The two network structures are exactly the same. In order to ensure the convergence and learning ability of the training process, the parameters of the Target Q network are updated more slowly than those of the online Q network; in this section the Target Q network is updated every 300 steps by default, which can be adjusted according to the actual training needs. Because the learning time of the agent, i.e., the cost of network training, increases with the model complexity, this chapter designs a network structure with low model complexity that can meet the task requirements and builds it with the Keras layered structure; in order to avoid over-fitting, random deactivation (Dropout) is added after the fully connected layers. Considering that the input of the deep network is the feature state of the regional perception centered on the drone, and adapting to the network input size and scale, the final network model consists of three fully connected layers and one hidden layer. According to the actual requirements of the control task, the feature state of the robot is used as the input of the network, the network outputs the Q values of 7 actions, and the action with the largest Q value is selected for execution. The network model is shown in Fig. 8.
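A minimal Keras sketch consistent with this description is given below. The layer widths and the dropout rate are illustrative assumptions; the chapter only specifies fully connected layers with Dropout, a 26-dimensional state input (Sect. 3.2) and 7 output Q-values.

```python
from tensorflow import keras
from tensorflow.keras import layers

STATE_SIZE = 26   # 24 lidar sectors + distance and angle terms (Sect. 3.2)
N_ACTIONS = 7     # discrete action space (Sect. 3.3)

def build_q_network():
    """Small fully connected Q-network with Dropout to limit over-fitting."""
    model = keras.Sequential([
        keras.Input(shape=(STATE_SIZE,)),
        layers.Dense(64, activation="relu"),            # assumed width
        layers.Dense(64, activation="relu"),            # assumed width
        layers.Dropout(0.2),                            # assumed rate
        layers.Dense(N_ACTIONS, activation="linear"),   # one Q-value per action
    ])
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.00025), loss="mse")
    return model

q_net = build_q_network()
target_net = build_q_network()
target_net.set_weights(q_net.get_weights())   # initialize the target network from the online network
```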

Fig. 7 Graph


Fig. 8 Schematic diagram of network structure

3.2 State Space Design

The distance from the UAV to surrounding obstacles is the most intuitive indicator of the environmental state of the UAV. In this chapter, the current environmental state is detected based on the distance information from the UAV to the surrounding obstacles, and a lidar is used as the sensor to detect the relative position and distance between the UAV and the obstacles. Therefore, the detection distance information diagram shown in Fig. 9 is designed. The feature state information is mainly composed of three parts: sensor feedback information, target position information and motion state information, forming a

Fig. 9 Schematic diagram of distance perception


standard finite Markov process, so that deep reinforcement learning can be used for this task. The state space is represented by a one-dimensional array formed from the lidar distance values. When the lidar scans the circular area centered on the UAV, there are at least 360 depth channels; so many depth values are not needed for the task in this chapter, and too large a state space would increase the computational cost and weaken the learning ability of the model. Therefore, as shown in Fig. 9, the sampling interval of the lidar is set to 15°, which yields a down-sampled lidar data array of length 24, with the first interval representing the forward direction of the UAV; the distance and included angle relative to the target are appended. The state space format is shown in formula (13), where state is the state space, l_i is the distance value of the i-th lidar interval, d is the distance between the UAV and the target zone, and a is the angle between the UAV's forward direction and the obstacle.

state = [l_1, l_2, \ldots, l_{22}, l_{23}, l_{24}, d, a]   (13)

During the simulation, the lidar information is obtained at a fixed frequency, and the lidar data are extracted through a ROS Topic message. The state space data include the distance information between the UAV and the target point and the azimuth information between the UAV and the obstacle. The above information is processed into a vector of dimension 26, which serves as the state information of the UAV.
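The construction of the 26-dimensional state of formula (13) can be sketched as follows; the raw 360-beam scan and the helper arguments are assumptions about the interface, not code from the chapter.

```python
import numpy as np

def build_state(scan_360, dist_to_target, angle_to_obstacle):
    """Down-sample a 360-beam lidar scan to 24 sectors (15 degrees each) and append d and a."""
    scan = np.asarray(scan_360, dtype=np.float32).reshape(24, 15)
    sectors = scan.min(axis=1)          # assumed: nearest return per 15-degree sector
    return np.concatenate([sectors, [dist_to_target, angle_to_obstacle]])

state = build_state(np.full(360, 5.0), dist_to_target=3.2, angle_to_obstacle=0.4)
print(state.shape)   # (26,)
```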

3.3 Action Space Design

The action space should be designed so that the drone can explore the environment as much as possible in pursuit of rewards. The UAV controls its heading through autopilot commands: by defining fixed yaw-rate changes plus ascent and descent speeds, the movement of the UAV can cover essentially the entire exploration space through speed and angular velocity control. The action space, that is, the value range of the UAV actions, is shown in Table 3. The DQN action space is discrete, and the UAV's actions are divided into seven options: fast left turn, left turn, straight ahead, right turn, fast right turn, ascent and descent, with angular velocities of −1.5, −0.75, 0, 0.75 and 1.5 rad/s and ascent and descent speeds of 0.5 m/s and −0.5 m/s. The speed command is sent to the drone at a fixed frequency, so the actual path of the drone is a sequence of continuous arcs and polylines. A mapping from these action indices to velocity commands is sketched after the table.

Table 3 Action space

Action | Angular velocity (rad/s) or velocity (m/s)
0 | −1.5 (rad/s)
1 | −0.75 (rad/s)
2 | 0 (rad/s)
3 | 0.75 (rad/s)
4 | 1.5 (rad/s)
5 | 0.5 (m/s)
6 | −0.5 (m/s)
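The discrete actions of Table 3 map directly onto velocity commands. The sketch below is illustrative; the left/right labels follow the ordering given in the text, and the sign convention, the constant forward speed and the command dictionary format are assumptions rather than the chapter's actual ROS interface.

```python
# Action index -> (yaw rate [rad/s], vertical speed [m/s]), following Table 3
ACTIONS = {
    0: (-1.5, 0.0),    # fast turn in one direction (sign convention assumed)
    1: (-0.75, 0.0),   # slow turn
    2: (0.0, 0.0),     # straight ahead
    3: (0.75, 0.0),    # slow turn in the other direction
    4: (1.5, 0.0),     # fast turn
    5: (0.0, 0.5),     # ascend
    6: (0.0, -0.5),    # descend
}

def to_velocity_command(action_index, forward_speed=0.5):
    """Translate a discrete action into a simple velocity command dictionary."""
    yaw_rate, vz = ACTIONS[action_index]
    return {"vx": forward_speed, "vz": vz, "yaw_rate": yaw_rate}

print(to_velocity_command(3))   # {'vx': 0.5, 'vz': 0.0, 'yaw_rate': 0.75}
```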

3.4 Reward Function Design

The flight mission of a UAV is generally flown along a planned route composed of multiple connected waypoints, so the flight mission can be decomposed into multiple path planning tasks between consecutive waypoints. The UAV starts from the starting point and passes through the designated waypoints in turn; when the UAV encounters an obstacle in flight and there is a danger of collision, it needs to avoid the obstacle. Reinforcement learning is used for path planning, and since the motion of the UAV is selected from the action space, the resulting path is guaranteed to be flyable. At the same time, the artificial potential field method is introduced into the reinforcement learning algorithm. As shown in Fig. 10, an attractive potential is assigned to the target waypoint and a repulsive potential is assigned to the obstacles; the multi-rotor flies under the combined action of the attraction of the target waypoint and the repulsion of the obstacles, following the desired path to the target waypoint while avoiding obstacles on the way. The artificial potential field method is embodied in the reward function of reinforcement learning. The design of the improved DQN algorithm is introduced below. As shown in Fig. 11, the movement of the drone in its surrounding environment is modeled as motion in an abstract electric field: the target point is assumed to be negatively charged, while the drone and the obstacles are assumed to be positively charged; the target

Fig. 10 Schematic diagram of artificial potential field path planning


point and the drone have opposite charges, so an attractive force acts between them, while the obstacle and the drone have the same charge, so a repulsive force acts between them. The movement of the drone is guided by the resultant force in space. Here r_1 is the distance from the drone to the target point, r_2 is the distance from the drone to the obstacle, Q_G is the amount of charge assigned to the target point, Q_O is the amount of charge assigned to the obstacle, Q_U is the amount of positive charge assigned to the UAV, k_a, k_b and k_c are proportional coefficients, ϕ is the angle between the direction of the attractive force and the direction of movement of the UAV, and U_B is the resultant force received by the UAV. In practice, in order to prevent the UAV from choosing a back-and-forth motion to avoid collisions instead of moving toward the target point, the attraction of the target point on the UAV should be greater than the repulsive effect of the obstacle, so Q_G is set larger than Q_O to ensure that the UAV can avoid obstacles and still reach the target point. When the drone approaches the target point the attractive force increases, and when the drone approaches an obstacle the repulsive force increases. The reward function of the DQN deep reinforcement learning algorithm is expressed as the following three parts.

Attraction (gravitational) reward function (14):

R_{\vec{UG}'} = \vec{UG}' \cdot k_a = \frac{Q_U Q_G}{r_1^2}\,\frac{\vec{UG}}{|\vec{UG}|}\,k_a   (14)

In the formula, R_{\vec{UG}'} is the reward caused by the attractive force, \vec{UG}' is the attractive force acting on the drone, \vec{UG} is the vector from the drone to the target point, |\vec{UG}| is the distance from the drone to the target point, and \vec{UG}/|\vec{UG}| is the unit vector in the direction from the drone to the target point.

Fig. 11 Schematic diagram of the UAV reward function


Repulsion reward function (15):

R_{\vec{UO}'} = \vec{UO}' \cdot k_b = \frac{Q_U Q_O}{r_2^2}\,\frac{\vec{UO}}{|\vec{UO}|}\,k_b   (15)

In the formula, R_{\vec{UO}'} is the reward caused by the repulsive force, \vec{UO}' is the repulsive force received by the UAV, \vec{UO} is the vector from the UAV to the obstacle, |\vec{UO}| is the distance from the UAV to the obstacle, and \vec{UO}/|\vec{UO}| is the unit vector in the direction from the UAV to the obstacle.

Direction reward function (16):

R_{\varphi} = \arccos\frac{(\vec{UO}' + \vec{UG}') \cdot \vec{UC}}{|\vec{UO}' + \vec{UG}'|\,|\vec{UC}|}\; k_c   (16)

where \vec{UC} is the force actually received by the UAV, and the arccosine term is the angle between the actual motion direction and the expected motion direction. The overall reward function is shown in (17):

R = R_{\vec{UG}'} + R_{\vec{UO}'} + R_{\varphi}   (17)
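Formulas (14) to (17) can be combined into a single reward computation. The sketch below is one possible reading of these formulas, using the force magnitudes (with opposite signs for attraction and repulsion) and an angular penalty; the coefficient and charge values are illustrative assumptions.

```python
import numpy as np

KA, KB, KC = 1.0, 1.0, 0.1      # assumed proportional coefficients
QU, QG, QO = 1.0, 2.0, 1.0      # assumed charges (Q_G > Q_O, as required in the text)

def apf_reward(uav_pos, target_pos, obstacle_pos, velocity):
    """Illustrative reading of formulas (14)-(17): attraction + repulsion + direction terms."""
    ug = target_pos - uav_pos
    uo = obstacle_pos - uav_pos
    r1, r2 = np.linalg.norm(ug), np.linalg.norm(uo)
    f_att = KA * QU * QG / r1**2 * (ug / r1)          # attractive force toward the target
    f_rep = -KB * QU * QO / r2**2 * (uo / r2)         # repulsive force away from the obstacle
    resultant = f_att + f_rep                         # expected motion direction
    cosang = np.dot(resultant, velocity) / (np.linalg.norm(resultant) * np.linalg.norm(velocity) + 1e-9)
    r_dir = -KC * np.arccos(np.clip(cosang, -1.0, 1.0))  # penalize deviation from the expected direction
    return np.linalg.norm(f_att) - np.linalg.norm(f_rep) + r_dir

print(apf_reward(np.array([0., 0., 1.]), np.array([3., 0., 1.]),
                 np.array([1.5, 1.0, 1.]), np.array([1., 0., 0.])))
```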

This chapter analyzed the advantages and disadvantages of the three major types of machine learning, namely supervised learning, unsupervised learning and reinforcement learning, discussed the training methods and limitations of deep learning and reinforcement learning, introduced their basic theory, and presented the DQN algorithm in deep reinforcement learning. The focus was on improving the DQN algorithm by combining it with the artificial potential field method from classical path planning, so that it performs better in the UAV path planning decision task. Constraining the agent to discrete actions makes the algorithm easier to converge at the cost of reduced manoeuvrability. With the fusion of the potential field and deep reinforcement learning, obstacles generate a repulsive force and the target point generates an attractive force, which are combined in the reward function to guide the drone to reach the target point without collision, so that the algorithm converges quickly. A lightweight network structure, consisting of three fully connected layers and one hidden layer, is adopted to enhance real-time performance. Finally, the feasibility of the algorithm is verified.

4 Simulation Experiment and Result Analysis

This part of the chapter covers training and testing. The improved DQN algorithm is applied to the UAV path planning task, and common UAV mission scenarios are selected to design multiple sets of experiments for algorithm feasibility testing, training and


verification. Based on the training model of the improved DQN algorithm, a feasibility test of indoor path planning in the Gazebo environment and training for dynamic obstacle avoidance were carried out. After training is completed, the system no longer relies on the reinforcement learning strategy and only outputs action values through the deep neural network.

4.1 Reinforcement Learning Path Planning Training and Testing

With the UAV model, the reinforcement learning path planning algorithm and the simulation environment in place, the system state, action and reward function of the reinforcement learning task are designed according to the UAV path planning problem. The process of this experimental study is shown in Fig. 12. First, the UAV and the simulation environment are initialized and the UAV is loaded into the simulation environment. While the UAV flies in the simulation environment, the state space S and action space A of the previous moment and the corresponding reward R are obtained; in addition, the reward R' at the current moment is stored in the data container, and the data stored in the container are updated in real time as the drone moves. When the sample size is sufficient, the training process is started and the decision network in DQN is used to fit the Q value, selecting the action in the action space with the highest expected value as the UAV's command. When the UAV approaches or collides with an obstacle, the generated reward R is small; when the UAV approaches or reaches the target point, the generated reward R is large. As training progresses, the behavior of the drone tends toward avoiding obstacles and reaching the target point. When the reward value meets the requirement or the set number of training steps is reached, the weights and parameters of the optimal deep neural network are saved and then verified in the test environment to assess the reliability of the model. After training, the weights and parameter files of the network model are available. In the algorithm testing and application stage, there is no need to train the model or the reinforcement learning strategy; it is only necessary to send the state information to the deep neural network module to output the action value. This chapter mainly conducts three experiments. The first part verifies the feasibility of the algorithm in the indoor path planning environment of Gazebo. The second part is agent training: the UAV path planning task is carried out in the training environment, the environment is observed, the observations are sent to the neural network, and the network model is trained according to the DQN algorithm. The third part is the agent test, which loads the trained model into the test environment, executes the path task according to the network model obtained during training, and finally counts the completion of the task. In order to train and test the UAV path planning algorithm proposed above, this chapter has


Fig. 12 Training and testing flow chart

constructed a variety of path planning training and testing scenarios. All environments and training are designed and developed with TensorFlow, ROS and Pixhawk using the Python language under Ubuntu 20.04. In order to determine the path planning ability of the network model obtained after training, testing is required. That is, in the same simulation environment as the training environment, the path is determined entirely by the deep reinforcement learning network model, and 100 test rounds are designed. The drone initialization marks the beginning of a test round, and the drone reaching the target point marks the end of the round. A total of 100 rounds of testing are carried out, with the target point placed within the range that the drone can reach in 50 training steps, i.e., by defining a limited number of points within a certain range, the drone target points appear randomly at the defined locations. At the beginning of each round, the drone is initialized and returns to the set position.


In order to evaluate the path planning performance of the algorithm, this chapter designs three evaluation indicators as quantitative criteria for judging the effect of the algorithm, defined as follows:
• Loss function: the error between the target value produced by the target network and the predicted value output by the training network. Training uses gradient descent to reduce this value, and the change in loss indicates whether the network is approaching convergence. The change of the loss function during testing is recorded to judge the rate of convergence and the size of the error;
• Maximum Q value: during training, each time a batch of observations is drawn from the experience pool, the current state is input into the training network to obtain the Q value, the corresponding next state is input into the target network, and the maximum Q value is selected. The change of the maximum Q value during testing is recorded to judge the learning effect of the reinforcement learning strategy;
• Success rate: a path planning run is regarded as successful if the UAV finally reaches the target. If the target is still not reached after more than 50 actions, or if during execution the drone moves beyond the specified range of motion, for example by hitting an obstacle or leaving the specified motion area, the run is regarded as unsuccessful. The number of successes in one hundred rounds is counted to compute the success rate.

4.2 Training and Results

The path planning capability of the UAV 3D path planning algorithm is verified in an indoor closed space without obstacles. The UAV collects obstacle information through the lidar, obtaining the distance of nearby obstacles relative to the agent as well as the angle and distance relative to the target point. Each training run follows the method proposed in this chapter, with the artificial potential field rewards and penalties applied throughout the training process. The network model parameter settings are shown in Table 4. The initial value of the greedy coefficient epsilon is set to 1.0 and gradually decreases (with the decay rate of Table 4) until it reaches 0.05, after which it no longer decays. The deep neural network architecture consists of one input layer, two fully connected layers, one hidden layer followed by another fully connected layer, and finally the output layer. This chapter therefore designs a network structure with low model complexity that can meet the task requirements and builds it with the Keras layered structure; to avoid over-fitting, random deactivation (Dropout) is added after the fully connected layers. Considering that the input of the deep network is the feature state of the regional perception centered on the UAV, and adapting to the input size and scale of the network, the final network model consists of three fully connected layers and one hidden layer.


Table 4 Network model training parameters

Hyperparameter | Numerical value | Description
episode_step | 1000 | Time steps
target_update | 200 | Target network update rate
discount_factor | 0.99 | Indicates how much value will be lost in the future based on the time step
learning_rate | 0.00025 | Learning speed. If the value is too large, the learning effect is not good; if the value is too small, the learning time is very long
epsilon | 1.0 | Probability of choosing a random action
epsilon_decay | 0.99 | Epsilon reduction rate. When one step is over, ε decreases
epsilon_min | 0.05 | Epsilon minimum
batch_size | 64 | The size of a set of training samples
train_start | 64 | If the playback memory size is greater than 64, start training
memory | 1,000,000 | The size of the playback memory
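The hyperparameters of Table 4 translate into a small configuration and an epsilon annealing schedule; the dictionary below mirrors the table, and the decay loop is an illustrative sketch.

```python
HPARAMS = {
    "episode_step": 1000,
    "target_update": 200,
    "discount_factor": 0.99,
    "learning_rate": 0.00025,
    "epsilon": 1.0,
    "epsilon_decay": 0.99,
    "epsilon_min": 0.05,
    "batch_size": 64,
    "train_start": 64,
    "memory": 1_000_000,
}

def anneal_epsilon(eps, cfg=HPARAMS):
    """Multiplicative epsilon decay clipped at the minimum value."""
    return max(cfg["epsilon_min"], eps * cfg["epsilon_decay"])

eps = HPARAMS["epsilon"]
for _ in range(300):
    eps = anneal_epsilon(eps)
print(round(eps, 3))   # approaches 0.05 after enough episodes
```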

According to the actual requirements of the control task, the feature state of the UAV is used as the input of the network, the network outputs the Q values of 7 actions, and the action with the largest Q value is selected for execution. Training is therefore carried out according to the above method, and a UAV reinforcement learning deep network model with path planning ability is obtained. Figure 13 shows the change of the model's epsilon value, and Fig. 14 shows the change of the model's maximum Q value with the number of training steps. It can be seen that as the number of training steps increases, the maximum Q value gradually increases and the model error tends to stabilize. After obtaining the reinforcement learning path planning model, there is no need to train the model or the reinforcement learning strategy; it is only necessary to send the state information to the deep neural network module to output the action value. The motion command is selected from the action space, so it conforms to the kinematic model, which enables the application to three-dimensional path planning (Fig. 15). In this path planning task, the UAV starts from the starting point, moves toward the target point, and deflects itself toward the target point; when approaching an obstacle, it performs an obstacle avoidance action. It does not collide with the boundary or obstacles and always maintains a safe distance, indicating that the path planning strategy has been learned and proving that the reinforcement learning strategy designed in this chapter can realize the path planning of the UAV (Table 5). After training is completed, the parameters and weights of the 1000-step training model are loaded, and only the state information needs to be sent to the deep neural network module to output the action value. The action value is sent to Pixhawk through ROS to control the drone's movement. Two sets of tests are conducted, in which the starting point of the drone is (0, 0, 2) and the target points are (2, 0, 0) and (−2, −2, 2); each group performs 100 tests, for a total of 200, and counts the success rate of the


Fig. 13 Changes in epsilon values during training

Fig. 14 Maximum Q value change during training

drone reaching the target point and the number of collisions. If it collides or stops, the next test will be started. The test results are shown in the table below. As shown in Fig. 16, the results show that the UAV has a certain path planning ability in the indoor environment.

4.3 Comparative Analysis of Improved DQN and Traditional DQN Algorithms

On the basis of the previous section, in order to better verify the performance of the improved DQN algorithm for UAV path planning, the improved DQN path planning


Fig. 15 Loss changes during training

Table 5 Gazebo dynamic obstacle avoidance test results

 | Target point location | Number of times to reach the target point | Number of collisions | Success rate
Test group 1 | (2, 0, 0) | 76 | 24 | 0.76
Test group 2 | (−2, −2, 2) | 64 | 36 | 0.64

experiment was carried out in the indoor simulation environment. The classic DQN algorithm is set to reward reaching the target point, penalize collisions, and give a reward related to the distance from the target point. Under the same environment, the traditional DQN algorithm is used as the comparison experiment, with 100 test rounds designed. The drone initialization marks the start of a test round, and the drone reaching the target point marks the end of the round. A total of 100 rounds of tests are carried out, and the target point is placed within the range that the drone can reach in 50 steps, namely, by defining a limited number of points within a certain range, the drone target points appear randomly at the defined locations. At the beginning of each round, the drone is initialized and returns to the set position, and the loss change and average path length are selected as the comparison criteria. The reward function of the classic DQN algorithm is shown in formula (18):

R = R_a + R_d \cos\theta + R_c + R_s   (18)

In the formula: Ra is the target reward; Rd is the direction reward; Rc is the collision penalty; Rs is the step penalty [41]. A run in which the UAV reaches the target point is recorded as a successful path planning run, and the success rate of UAV path planning based on the improved DQN algorithm and the classic DQN algorithm is calculated. The results are shown in Table 6.

Fig. 16 3D indoor path planning test environment: (a) environment initialization; (b) target point generation; (c) the drone moves towards the target point; (d) the drone reaches the target point

Compared with the classic DQN algorithm, the improved DQN algorithm has a shorter average path length and a higher success rate, while the classic DQN algorithm still fails to reach the target point in many of the 100 test rounds. The comparison of the success rate curves shows that the improved DQN algorithm performs better than the classic DQN algorithm in the obstacle avoidance experiment. Figure 17 shows the loss curves obtained by the DQN algorithm and the improved DQN algorithm in the indoor 3D path planning environment of the UAV. The red curve in the figure represents the loss trend of the improved DQN algorithm, while the blue curve represents the loss trend of the classic DQN algorithm. The figure reflects the variation of the error obtained by the UAV in each step after being trained by the two reinforcement learning algorithms. It can be seen that the error of the improved DQN algorithm is smaller and its convergence speed is faster. Figures 18 and 19 show the path planning trajectories of the UAV in the test environment. The improved DQN algorithm reaches the target point using the shortest path achievable in the discrete action space, while the test results of the


Table 6 Comparison results of improved DQN and classic DQN

 | Target location | Number of times to reach the target point | Collision frequency | Average path length | Exceeds the maximum number of rounds | Success rate
Improved DQN | (2, 0, −1) | 89 | 11 | 8 | 0 | 0.89
Classic DQN | (2, 0, −1) | 52 | 10 | 13 | 38 | 0.52

Fig. 17 Comparison of loss

classic DQN algorithm show that the UAV cannot successfully complete the path planning requirements, manifested as back-and-forth movement in an open area of the space to avoid collisions without moving toward the target point. It can be seen that, owing to the introduction of the artificial potential field idea, the improved DQN algorithm avoids the behavior of the classic DQN algorithm of repeatedly moving in place to avoid collisions, and can successfully complete the path planning task. Comparing the test results of the DQN algorithm and the improved algorithm shows that the introduction of the artificial potential field method can effectively guide the UAV to move toward the target point and reach it to obtain rewards, whereas the strategy learned by the traditional algorithm is to avoid collisions rather than to actively move toward the target point. This shows that in the UAV path planning task, the improved DQN algorithm is more efficient and faster than the classic DQN algorithm. At the same time, the magnitude of the change of the


Fig. 18 Improved DQN path

Fig. 19 Classic DQN path

loss curve of the DQN algorithm is greater than that of the improved DQN algorithm, which shows that DQN is not as good as the improved DQN algorithm in terms of algorithm stability.

5 Conclusions

The UAV has become a working platform for various environments and purposes. This chapter takes a rotor UAV as the experimental platform to study deep reinforcement learning path planning for UAVs. ROS is used for communication, decision instructions are sent to Pixhawk to control the UAV to achieve path planning, and a path planning method based on an improved DQN algorithm is proposed. This algorithm combines the advantages of the artificial potential field


method. After testing in the simulation environment, the algorithm supports the decision-making of deep reinforcement learning and greatly reduces the time required for training. In order to achieve more realistic rendering effects and build larger scenes, Gazebo and AirSim are selected as the training and testing environments for the deep reinforcement learning algorithm. Experimental results show that the algorithm can achieve collision-free path planning from the starting point to the target point, and it converges more easily than the classic DQN method. The conclusions of this chapter are as follows. According to the task requirements of this chapter, a path planning method based on the improved DQN algorithm is designed. The algorithm combines the advantages of the artificial potential field method, which can effectively guide the UAV to reach the target point while avoiding obstacles during training, and significantly improves the efficiency of the UAV deep reinforcement learning path planning algorithm. Combined with the UAV model, the reinforcement learning path planning algorithm and the simulation environment, training is conducted according to the system state, action and reward function of the UAV path planning reinforcement learning task, and simulation flight tests are then conducted on the test platform for the UAV's indoor path planning ability and outdoor dynamic obstacle avoidance ability, respectively. The test results show that the algorithm can achieve indoor and outdoor path planning and dynamic obstacle avoidance. The path search results of the improved DQN algorithm and the classic DQN algorithm are compared and analyzed; by comparing the number of UAV collisions and the numerical changes of the reward function, the efficiency and stability of the algorithm are analyzed, and the path planning trajectories of the UAV in the test environment are given, which finally verifies the stability and feasibility of the UAV deep reinforcement learning path planning method designed in this chapter.

References 1. Khatib, O. (1995). Real-time obstacle avoidance for manipulators and mobile robots. International Journal of Robotics Research, 5(1), 500–505. https://doi.org/10.1177/027836498600 500106. 2. Ge, S. S., & Cui, Y. J. (2002). ‘Dynamic motion planning for mobile robots using potential field method. Autonomous robots’, 13(3), 207–222. https://doi.org/10.1023/A:1020564024509. 3. Mabrouk, M. H., & McInnes, C. R. (2008). Solving the potential field local minimum problem using internal agent states. Robotics and Autonomous Systems, 56(12), 1050–1060. https://doi. org/10.1016/j.robot.2008.09.006. 4. Jurkiewicz, P., Biernacka, E., Dom˙zał, J., & Wójcik, R. (2021). Empirical time complexity of generic Dijkstra algorithm. In 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM) (pp. 594–598). IEEE. (May, 2021). 5. Knuth, D. E. (1977). A generalization of Dijkstra’s algorithm. Information Processing Letters, 6(1), 1–5.


6. Pods˛edkowski, L., Nowakowski, J., Idzikowski, M., & Vizvary, I. (2001). ‘A new solution for path planning in partially known or unknown environment for nonholonomic mobile robots. Robotics and Autonomous Systems, 34(2–3), 145–152. https://doi.org/10.1016/S09218890(00)00118-4. 7. Zhang, Y., Li, L. L., Lin, H. C., Ma, Z., & Zhao, J. (2017, September). ‘Development of path planning approach based on improved A-star algorithm in AGV system. In International Conference on Internet of Things as a Service (pp. 276–279). Springer, Cham. https://doi.org/ 10.1007/978-3-030-00410-1_32. (Sept, 2017). 8. Sedighi, S., Nguyen, D. V., & Kuhnert, K. D. (2019). Guided hybrid A-star path planning algorithm for valet parking applications. In 2019 5th International Conference on Control, Automation and Robotics (ICCAR) (pp. 570–575). IEEE. https://doi.org/10.1109/ICCAR.2019. 8813752. (Apr, 2019). 9. LaValle, S. M. (1998). Rapidly-exploring random trees: A new tool for path planning (pp. 293– 308). 10. Karaman, S., & Frazzoli, E. (2012). Sampling-based algorithms for optimal motion planning with deterministic μ-calculus specifications. In 2012 American Control Conference (ACC) (pp. 735–742). IEEE. https://doi.org/10.1109/ACC.2012.6315419. (June, 2012). 11. Kavraki, L. E., Svestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580. https://doi.org/10.1109/70.508439. 12. Webb, D. J., & Van Den Berg, J. (2013). Kinodynamic RRT*: Asymptotically optimal motion planning for robots with linear dynamics. In 2013 IEEE International Conference on Robotics and Automation (pp. 5054–5061). IEEE. https://doi.org/10.1109/ICRA.2013.6631299. (May, 2013). 13. Bry, A., & Roy, N. (2011). Rapidly-exploring random belief trees for motion planning under uncertainty. In 2011 IEEE International Conference on Robotics and Automation (pp. 723– 730). IEEE. https://doi.org/10.1109/ICRA.2011.5980508. (May, 2011). 14. Nasir, J., Islam, F., Malik, U., Ayaz, Y., Hasan, O., Khan, M., & Muhammad, M. S. (2013). RRT*-SMART: A rapid convergence implementation of RRT. International Journal of Advanced Robotic Systems, 10(7), 299. https://doi.org/10.1109/ICRA.2011.5980508. 15. Gammell, J. D., Srinivasa, S. S., & Barfoot, T. D. (2014). Informed RRT*: Optimal samplingbased path planning focused via direct sampling of an admissible ellipsoidal heuristic. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 2997–3004). IEEE. https://doi.org/10.1109/IROS.2014.6942976. (Sept, 2014). 16. Ye, H., Zhou, X., Wang, Z., Xu, C., Chu, J., & Gao, F. (2020). Tgk-planner: An efficient topology guided kinodynamic planner for autonomous quadrotors. IEEE Robotics and Automation Letters, 6(2), 494–501. arXiv:2008.03468. 17. Koohestani, B. (2020). A crossover operator for improving the efficiency of permutation-based genetic algorithms. Expert Systems with Applications, 151, 113381. https://doi.org/10.1016/j. eswa.2020.113381. 18. Lamini, C., Benhlima, S., & Elbekri, A. (2018). ‘Genetic algorithm based approach for autonomous mobile robot path planning. Procedia Computer Science’, 127, 180–189. https:// doi.org/10.1016/J.PROCS.2018.01.113. 19. Li, Q., Wang, L., Chen, B., & Zhou, Z. (2011). An improved artificial potential field method for solving local minimum problem. In 2011 2nd International Conference on Intelligent Control and Information Processing (Vol. 1, pp. 420–424). IEEE. 
https://doi.org/10.1109/ICICIP.2011. 6008278. (July, 2011). 20. Liang, J. H., & Lee, C. H. (2015). Efficient collision-free path-planning of multiple mobile robots system using efficient artificial bee colony algorithm. Advances in Engineering Software, 79, 47–56. https://doi.org/10.1016/j.advengsoft.2014.09.006. 21. Akka, K., & Khaber, F. (2018). Mobile robot path planning using an improved ant colony optimization. International Journal of Advanced Robotic Systems, 15(3), 1729881418774673. https://doi.org/10.1177/1729881418774673.


22. Su, Q., Yu, W., & Liu, J. (2021). Mobile robot path planning based on improved ant colony algorithm. In 2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS) (pp. 220–224). IEEE. https://doi.org/10.1109/ACCTCS52002.2021.00050. (Jan, 2021). 23. Cheng, J., Wang, L., & Xiong, Y. (2018). Modified cuckoo search algorithm and the prediction of flashover voltage of insulators. Neural Computing and Applications, 30(2), 355–370. https:// doi.org/10.1007/s00521-017-3179-1. 24. Khaksar, W., Hong, T. S., Khaksar, M., & Motlagh, O. R. E. (2013). A genetic-based optimized fuzzy-tabu controller for mobile robot randomized navigation in unknown environment. International Journal of Innovative Computing, Information and Control, 9(5), 2185–2202. 25. Xiang, L., Li, X., Liu, H., & Li, P. (2021). Parameter fuzzy self-adaptive dynamic window approach for local path planning of wheeled robot. IEEE Open Journal of Intelligent Transportation Systems, 3, 1–6. https://doi.org/10.1109/OJITS.2021.3137931. 26. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., & Hassabis, D. (2015). Humanlevel control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi. org/10.1038/nature14236. 27. Jaradat, M. A. K., Al-Rousan, M., & Quadan, L. (2011). Reinforcement based mobile robot navigation in dynamic environment. Robotics and Computer-Integrated Manufacturing, 27(1), 135–149. https://doi.org/10.1016/j.rcim.2010.06.019. 28. Shi, Z., Tu, J., Zhang, Q., Zhang, X., & Wei, J. (2013). The improved Q-learning algorithm based on pheromone mechanism for swarm robot system. In Proceedings of the 32nd Chinese Control Conference (pp. 6033–6038). IEEE. (July, 2013). 29. Zhu, Y., Mottaghi, R., Kolve, E., Lim, J. J., Gupta, A., Fei-Fei, L., & Farhadi, A. (2017). Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3357–3364). IEEE. https://doi.org/10.1109/ICRA.2017.7989381. (May, 2017). 30. Sadeghi, F., & Levine, S. (2016). Cad2rl: Real single-image flight without a single real image. https://doi.org/10.48550/arXiv.1611.04201. arXiv:1611.04201. 31. Tai, L., & Liu, M. (2016). Towards cognitive exploration through deep reinforcement learning for mobile robots. https://doi.org/10.48550/arXiv.1610.01733. arXiv:1610.01733. 32. Jisna, V. A., & Jayaraj, P. B. (2022). An end-to-end deep learning pipeline for assigning secondary structure in proteins. Journal of Computational Biophysics and Chemistry, 21(03), 335–348. https://doi.org/10.1142/S2737416522500120. 33. He, L., Aouf, N., & Song, B. (2021). Explainable deep reinforcement learning for UAV autonomous path planning. Aerospace Science and Technology, 118, 107052. https://doi.org/ 10.1016/j.ast.2021.107052. 34. Jeong, I., Jang, Y., Park, J., & Cho, Y. K. (2021). Motion planning of mobile robots for autonomous navigation on uneven ground surfaces. Journal of Computing in Civil Engineering, 35(3), 04021001. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000963. 35. Chen, C., Seff, A., Kornhauser, A., & Xiao, J. (2015). DeepDriving: Learning affordance for direct perception in autonomous driving. In 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV.2015.312. 36. Wu, K., Wang, H., Esfahani, M. A., & Yuan, S. (2020). 
Achieving real-time path planning in unknown environments through deep neural networks. IEEE Transactions on Intelligent Transportation Systems. https://doi.org/10.1109/tits.2020.3031962. 37. Maw, A. A., Tyan, M., Nguyen, T. A., & Lee, J. W. (2021). iADA*-RL: Anytime graph-based path planning with deep reinforcement learning for an autonomous UAV. Applied Sciences, 11(9), 3948. https://doi.org/10.3390/APP11093948. 38. Gao, J., Ye, W., Guo, J., & Li, Z. (2020). ‘Deep reinforcement learning for indoor mobile robot path planning. Sensors’, 20(19), 5493. https://doi.org/10.3390/s20195493. 39. Yongqi, L., Dan, X., & Gui, C. (2020). Rapid trajectory planning method of UAV based on improved A* algo-rithm. Flight Dynamics, 38(02), 40–46. https://doi.org/10.13645/j.cnki.f.d. 20191116.001.


40. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). ‘Playing atari with deep reinforcement learning. https://doi.org/10.48550/arXiv. 1312.5602. arXiv:1312.5602. 41. Ruan, X., Ren, D., Zhu, X., & Huang, J. (2019). ‘Mobile robot navigation based on deep reinforcement learning’. In 2019 Chinese control and decision conference (CCDC) (pp. 6174– 6178). IEEE. https://doi.org/10.1109/CCDC.2019.8832393. (June, 2019 ).

Drone Shadow Cloud: A New Concept to Protect Individuals from Danger Sun Exposure in GCC Countries Mohamed Zied Chaari, Essa Saad Al-Kuwari, Christopher Loreno, and Otman Aghzout

Abstract The peak temperature in the Gulf region is around 47 °C in the summer. The hot season lasts for six months in this region, starting at the end of April and ending in October. The average temperature in the same period exceeds 44 °C in the USA and Australia. High temperatures worldwide affect the body's capability to function outdoors. “Heat stress” refers to excessive amounts of heat that the body cannot handle without suffering physiological degeneration. Heat stress due to high ambient temperature seriously threatens workers worldwide. It increases the risk of limited physical ability, discomfort, injuries, and heat-related illnesses. According to Industrial Safety & Hygiene News, a worker's body must maintain a core temperature of about 36 °C to function normally. Many companies have their workers wear UV-absorbing clothing to protect them from the sun's rays. This chapter presents a new concept to protect construction workers from dangerous sun exposure in hot climates. A flying umbrella drone with a UV-blocker fabric canopy provides a stable shaded area, minimizing heat stress and protecting workers from UV rays when they work outdoors. According to the sun's position, the flying umbrella moves dynamically through an open space, providing shade for workers. Keywords Safety · Heat stress · Smart flying umbrella · Drone · Outdoor worker

1 Introduction Global temperatures are rising, and how to reverse the trend is the subject of discussion. A special report published by the Intergovernmental Panel on Climate Change (IPCC) in October 2018 addressed the impacts of global warming of 1.5 °C, as shown in Fig. 1.
M. Z. Chaari (B) · E. S. Al-Kuwari · C. Loreno Qatar Scientific Club, Fab Lab Department, Doha, Qatar e-mail: [email protected] O. Aghzout University of Abdelmalek Essaadi, Tetouan, Morocco e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_3


Fig. 1 Global surface temperature [38]

Increasing temperatures have an uneven impact on subregions. Heat stress is predicted to have the greatest impacts in Western Africa and Southern Asia, with 4.8 and 5.3% productivity losses in 2030, equivalent to 43 and 9 million jobs, respectively, as shown in Fig. 2. A weakening ozone shield has caused an increase in UV light transmission from the sun to the planet, resulting in skin diseases [37]. The report concluded that limiting global warming to 1.5 °C would require rapid, far-reaching, and unprecedented social changes [17]. The work in [27] shows that the GCC countries have been growing and expanding rapidly, with urbanization occurring at an accelerated pace. Approximately 80% of the GCC population lives in urban areas, making it one of the most urbanized regions in the world. Urbanization contributes to global warming through increased energy consumption and, consequently, higher carbon emissions. From 1979 to 2018, the death rate directly attributable to heat exposure (as the underlying cause of death) generally ranged between 0.5 and 2 deaths per million people, with spikes in specific years, as shown in Fig. 3. Death certificates show that more than 11,000 Americans have died from heat-related causes since 1979. This trend increases the heat exposure of construction workers outdoors [4, 22, 42]. High temperatures can cause various illnesses, such as heat stroke, permanent damage, or even death. In [28], the authors review the scientific reports on the health status of workers exposed to high temperatures in the workplace. Heat exposure has been associated with heart-related diseases, deaths, accidents, and psychological effects on construction workers. The review suggests that many workers are vulnerable to heat exposure, which affects workers worldwide. In [39], the authors describe a model of heat stress impacts on cardiac mortality in Nepali migrant workers in Qatar and use it to characterise the effect of hot temperatures on workers' bodies. The authors demonstrated that the increased cardiovascular mortality during hot periods is most likely due to severe heat stress among these construction workers.


Fig. 2 Percentage of working hours lost to heat stress by subregion since 1995 and projected for 2030 [23]

Fig. 3 Heat-related deaths in the United States


In [32], the authors present a worldwide analysis showing that heat exposure has killed hundreds of U.S. workers. At least 384 workers in the United States have died from environmental heat exposure in the past decade. Across 37 states, the count includes people working essential jobs, such as farm workers in California, construction workers in Texas, and tree trimmers in North Carolina and Virginia. This shows that heat stress is not limited to GCC countries and is prevalent throughout the United States. In [26], the authors and reporters from CJI and NPR examined worker heat deaths recorded by OSHA between 2010 and 2020. They compared the high temperature of each incident day with historical averages for the previous 40 years. Most of the deaths happened on days that were unusually hot for that date. Approximately two-thirds of the incidents occurred on days when the temperature reached at least 50°. In [12], the authors present a thorough longitudinal study describing the heat exposure and renal health of Costa Rican rice workers over three months of production. In this study, 72 workers with various jobs in a rice company provided pre-shift urine and blood samples at baseline and three months later. NIOSH guidelines and the WBGT index were used to calculate metabolic and ambient heat loads. The study recommended that efforts be made to provide adequate water, rest, and shade to heat-exposed workers in compliance with national regulations. In [31], the authors explain that, during the stages of the milk manufacturing cycle, Italian dairy production exposes workers to uncomfortable temperatures and potentially to heat shock. That study aimed to assess the risks of heat stress for dairy workers who process buffalo milk in southern Europe. The United States has a high rate of heat-related deaths, although they are generally considered preventable. In the United States, heat-related deaths averaged 702 per year between 2004 and 2018, as shown in Fig. 4. As part of the CDC's effort to study heat-related deaths by age group, gender, race/ethnicity, and urbanization level and to evaluate comorbid conditions associated with heat-related deaths, the CDC analyzed mortality data from the National Vital Statistics System (NVSS). The highest heat-related mortality rates were observed among males 65 years and older, American Indians/Alaska Natives who lived in nonmetropolitan counties, and those in large central metro counties [6]. To counteract this risk, legal texts as well as technological solutions exist. Nations in the region agree to stop all work if the wet-bulb globe temperature (WBGT) exceeds 32.1 °C in a particular workplace, regardless of the time. The factors considered by the WBGT index are air temperature, humidity, sunlight, and wind strength. Construction workers should not stay outside in the heat for long periods because they will face serious health problems [5, 19]. In [8], the authors explain that the epidemic of chronic kidney disease in Central America is largely attributed to heat stress and dehydration from strenuous work in hot environments. The study describes efforts to reduce heat stress and increase efficiency among sugarcane workers in El Salvador, and it pushed the Salvadoran government to provide mobile canopies for workers, as shown in Fig. 5. Umbrellas are widely used for shade in Middle Eastern countries.
This research aims to develop a flying umbrella that provides shade and safe working conditions for outdoor workers.


Fig. 4 U.S. heat-related deaths from 2004 to 2018
Fig. 5 Portable canopies provide shade to the cane field workers during a break

2 Related Work 2.1 Overview Scientists and researchers have devoted much attention to this issue. In [18], the authors present a technique based on shade structures installed in primary schools to help reduce children’s exposure to ultraviolet radiation (UVR) during their formative


Fig. 6 An illustration of solar geoengineering techniques [14, 24, 33]

years and to provide areas where they can conduct outdoor activities safely. In [25], scientists such as Keith seek ways to mimic the volcanic cooling effect artificially. They explain in a Nature journal article that the planet can be cooled rapidly by injecting aerosol particles into the stratosphere to reflect away some inbound sunlight, as shown in Fig. 6. In [21, 29], the authors report that the process involves injecting sulfur aerosols into the stratosphere between 9 and 50 kilometers above the Earth's surface. Solar geoengineering involves reflecting sunlight back into space to limit global warming and climate change. After the aerosols combine with water particles, sunlight is reflected more than usual for one to three years. One scientific team is developing a global sunshade that uses balloons or jets to shield the most vulnerable countries of the global south from the adverse effects of global warming [34]. In [7], the authors focus on modifying fabric to act as a primary protective layer for the skin against harmful radiation. Today, many people and outdoor workers use umbrellas to protect themselves from the sun's heat and UV rays, as shown in Fig. 7. In the modern world, umbrellas are necessary, and in sweltering conditions they are beneficial. When we are working or doing something outdoors under changing weather conditions, umbrellas are a handy tool, as shown in Fig. 7a. However, under such circumstances they have noticeable shortcomings. A hand is always occupied when holding an umbrella, limiting some hand functions and requiring further care and attention; holding an umbrella therefore has clear disadvantages. For this reason, many companies and associations supply umbrella hats that generate a canopy, as shown in Fig. 7b. Several high-tech solutions help workers cope with and adapt to this climate, particularly outdoors. Providing shade to users through robotics technology, however, requires a great deal of work.


Fig. 7 Ordinary umbrellas: (a) Southern Area Municipality distributes umbrella hats to workers (Bahrain); (b) people use umbrellas to block UV rays in Dubai

Researchers at Dongseo University have developed a solution for finding the perfect spot to enjoy shade during a summer picnic [3]. A new type of portable architecture solves the mundane but frustrating problem of adjusting shades throughout the day. Researchers at this university have demonstrated that an adaptive canopy can change shape as the sun moves during the day, providing stable shade and shadowing regardless of the solar position, time of day, or location. Researchers at the fabrication laboratory of QSC have developed an air conditioner robot prototype that follows humans in outdoor applications and provides them with cool air, for example during a summer picnic [11]. Several robotic systems are sensitive to solar radiation, including some integrated with solar panels, and can even use them for shading. The researchers conclude that “the resulting architectural system can autonomously reconfigure, and stable operation is driven by adaptive design patterns, rather than solely robotic assembly methods.” Cyber-physical macro materials and aerial robotics are utilized to construct the canopy [44]. The drone is built from a lightweight carbon-fiber material with integrated electronics to sense, process, and communicate data [1, 13]. University of Maryland researchers developed a system called RoCo, which provides cooling while conserving the user's comfort [15, 16]. The engineering faculty of this university has created multiple versions of air conditioner robots with different options. In today's fast-moving world, robots that assist humans are increasingly


relevant. In many fields and industries, robots help humans [35, 40]. Researchers have confirmed the possibility of creating an intelligent flying umbrella by combining drone and canopy technology. Drones significantly impact the production capabilities of individuals such as artists, designers, and scientists, and they can detect and track their users continuously. Over the past few years, UAV and mini-UAV technology have grown significantly [10]. Drones are present in our daily lives and help us in various fields, most recently in the COVID-19 pandemic (spraying, surveillance, homeland security, etc.). Today, drones can perform a wide range of activities, from delivering packages to transporting patients [9]. In [20], a team of engineering experts at Asahi Power Service invented a drone called “Umbrella Sidekick,” which can be regarded as an imaginative flying umbrella. While the techniques and solutions described above are useful, outdoor workers need better and more efficient methods, especially in light of global warming. The following subsection presents a new proposal for flying umbrellas.

2.2 Proposal This work aims to develop a flying umbrella with stable shading and a motion-tracking system. As its name suggests, this umbrella performs the same function as an ordinary one, flying above the workers' heads and serving the same purpose as an umbrella hat. UV-protective flying umbrellas reduce the temperature in outdoor construction areas and prevent heat illness at work. Flying umbrella drones are designed mainly to: • Provide consistent shadowing and shading regardless of the angle of the sun's rays. • Keep a safe distance of approximately ten meters above the construction workers' field. • Prevent heat illness at work. The sun heats the earth during the day, and clear skies allow more heat to reach the earth's surface, which increases temperatures. However, when the sky is cloudy, cloud droplets reflect some of the sun's rays back into space. Less of the sun's energy reaches the earth's surface, which causes the earth to heat up more slowly, leading to cooler temperatures [2, 43]. The prototype idea comes from the shadow cast by clouds, since clouds can block the sun's rays and provide shade, as shown in Fig. 8a. The study aims to develop an intelligent flying umbrella that would improve efficiency and offer a comfortable outdoor environment for workers. We chose a flying height of ten meters above the workers' area for several reasons. On the one hand, at this height the workers do not have to bear the noise generated by the propellers [30, 36, 41]. On the other hand, the canopy reflects significant amounts of solar radiation. The shade position (x) is a function of the solar position and the umbrella location, as shown in Fig. 8b. The umbrella is on standby in a dedicated parking area, awaiting the order to fly. If it receives an order, it will follow the workers automatically. The umbrella will


Fig. 8 A flying umbrella that casts a shadow: (a) cloud shade; (b) flying umbrella shade

Fig. 9 Proposed operation of the flying umbrella

return to its original position if it loses the target workers. A failed signal will cause the umbrella to issue an alarm and return smoothly to the parking area with the help of the ultrasonic sensor implemented in the umbrella, as shown in Fig. 9. The sunshade protects workers from solar radiation and heat stress, and it needs to be adjusted daily according to the solar position. The umbrella concept consists of several components: an aluminum frame, an electronics board, sensors, and radio-frequency communication. The chapter is structured as follows: Section 1 is the introduction. Section 2 presents the related work. Section 3 describes the proposed method and the fabrication of the flying umbrella. Section 4 covers the experiment phase, Sect. 5 presents the results and discussion, and Sect. 6 concludes with a discussion of the prototype and plans for future work.
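The standby/follow/return behaviour described above can be summarised as a simple state machine. The sketch below is purely illustrative and is not the authors' implementation; the state names, the fly-order interface, and the signal-loss check are assumptions.

```python
from enum import Enum, auto

class UmbrellaState(Enum):
    PARKED = auto()     # waiting in the parking area
    FOLLOWING = auto()  # shading and tracking the workers
    RETURNING = auto()  # alarm raised, flying back to the parking area

def next_state(state, fly_order, target_visible, rf_link_ok):
    """One decision step of the standby/follow/return logic."""
    if state == UmbrellaState.PARKED:
        return UmbrellaState.FOLLOWING if fly_order else UmbrellaState.PARKED
    if state == UmbrellaState.FOLLOWING:
        # Lost RF link or lost the target workers: raise alarm and go home.
        if not rf_link_ok or not target_visible:
            return UmbrellaState.RETURNING
        return UmbrellaState.FOLLOWING
    # RETURNING: stay in this state until landing is handled elsewhere.
    return UmbrellaState.RETURNING

# Example: umbrella is following and then loses the RF link
print(next_state(UmbrellaState.FOLLOWING, fly_order=False,
                 target_visible=True, rf_link_ok=False))
```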


3 Proposed Method The flying umbrella includes a flight and operating system to provide shade to workers outdoors. The product is born from the combination of a drone equipped with the necessary equipment and a canopy umbrella. Control commands for the umbrella drone: this module sends specific control commands to the drone via the radio-frequency link to control the umbrella (i.e., pitch, roll, yaw, and throttle). RF remote controls are used to track and follow workers in an open environment. The VU0002 digital ultrasonic sensor employs the ultrasonic time-of-flight principle to measure the distance between the umbrella and an obstacle. Various communication protocols, including LIN bus and 2/3-wire IO, are used to output a digital distance signal and self-test data, making it suitable for an array of intelligent umbrella parking systems. The logical design in Fig. 10 shows the interaction between the components of the flying umbrella and how data flows from the input layer to the actions taken by the umbrella. Through the onboard camera, the pilot perceives the environment around the umbrella. Via the RF link, the onboard computer receives commands from the pilot and executes them by sending movement control commands to the umbrella flight controller, which is the brain of the umbrella location system. The transmitter sends orders that are received by the receiver and passed to the flight controller, which instructs the actuators to move. As part of the prototype, an ultrasonic proximity sensor detects obstacles in the umbrella's path, so the umbrella can track the worker efficiently while avoiding obstacles. The drone tracking module is responsible for monitoring the spatial relationship between real-world elements

Fig. 10 Diagram of a flying umbrella drone that uses an ultrasonic sensor


Fig. 11 Three cases for flying umbrella cloud geometry effect on workers from space. (S: Shadow shown on the image; E: shadow covered by cloud, and F: bright surface covered by cloud)

and their virtual representations. After an initial calibration process, tracking is possible using an IMU-based approach. The pilot must be able to receive a live video feed from the umbrella and have real-time access to flight data. The umbrella should be able to switch between modes without malfunctioning and adjust its position automatically without human assistance. For example, the latency must not exceed five seconds when delivering pilot instructions to the drone or when transmitting telemetry information from the drone to the pilot. At ten meters of height, the canopy creates a shade geometry around the workers' area, as shown in Fig. 11. The shade position changes according to the position of the flying umbrella. Consider three possibilities: S, the shadow provided by the canopy; E, the shadow covered by the umbrella; and F, the bright surface covered by the canopy. The projected and actual positions of the umbrella will most likely differ significantly when the observing and solar zenith angles are relatively large.
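The dependence of the shade position on the sun's position can be made concrete with a simple geometric sketch. Assuming flat ground and a canopy hovering at height h, the centre of the shadow is displaced horizontally from the point directly below the canopy by h·tan(zenith) in the direction away from the sun. The snippet below is an illustrative calculation only; the solar-angle values and the placement function are assumptions, not part of the authors' system.

```python
import math

def shadow_offset(height_m, solar_zenith_deg, solar_azimuth_deg):
    """Horizontal (east, north) offset of the canopy's shadow centre
    relative to the point directly below the canopy (flat-ground model)."""
    d = height_m * math.tan(math.radians(solar_zenith_deg))
    az = math.radians(solar_azimuth_deg)   # azimuth measured from north, clockwise
    # The shadow falls on the side opposite to the sun.
    return (-d * math.sin(az), -d * math.cos(az))

def umbrella_position_for(worker_xy, height_m, zenith_deg, azimuth_deg):
    """Place the canopy so that its shadow centre lands on the worker."""
    dx, dy = shadow_offset(height_m, zenith_deg, azimuth_deg)
    return (worker_xy[0] - dx, worker_xy[1] - dy)

# Example: sun 30 degrees from zenith in the south-east, canopy at 10 m
print(umbrella_position_for((0.0, 0.0), 10.0, 30.0, 135.0))
```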

3.1 Design of Mechanical Structure For the umbrella structure frame, we use aluminum instead of carbon fiber in our prototype, since we do not have the facility to produce it from carbon fiber. Plastic propellers are the most common, but carbon fiber propellers are of better quality; we chose G32X 11CF carbon fiber propellers for our prototype. A motor's technical specification is imperative: more efficient motors save battery life and give the owner more flying time, which is what every pilot wants. We selected the U13II KV130 motor, which has high efficiency and can produce 24 kg of thrust. Multirotor drones rely on electronic speed


Fig. 12 Diagram of a flying umbrella drone

controllers to provide high-frequency, high-power, high-resolution AC power to the brushless electric motors in a very compact package. The flight controller regulates motor speeds via the ESC to provide steering. We selected the FLAME 180A 12S V2.0 ESC. The flight controller handles autopilot control, worker following, failsafe, and many other autonomous functions with inputs from the receiver, GPS module, battery monitor, and IMU. It is the heart of the umbrella drone, as shown in Fig. 12. The propeller positions span a maximum dimension of about 1100 mm, as shown in Fig. 13. Steps to fabricate the prototype:
• Making the aluminum umbrella frame.
• Fixing the FLAME 180A 12S V2.0 ESCs and the U13II KV130 motors.
• Installing the G32X 11CF propellers.
• Installing the flight controller board.
• Installing the ultrasonic sensor.
• Installing the camera and power distribution system.
• Fixing the batteries.

The umbrella body was designed and fabricated in our mechanical workshop. Assembly and programming took place in the FabLab (QSC), as shown in Fig. 14. The umbrella comprises six propellers and electronics that balance the canopy; it can


Fig. 13 Schematic top and right views of the umbrella drone dimensions: (a) top view; (b) right view

move in two directions at the same speed. It uses six U13II KV130 motors manufactured by T-MOTOR. Based on the data in Table 1, each motor draws up to 5659 W and produces a maximum thrust of 24 kg. An ultrasonic sensor with a 2.5 m measurement range is used to prevent crashes. The specifications of the hardware and mechanical parts of the flying umbrella are described in Table 2. An exploded 3D view of the flying umbrella concept is shown in Fig. 15a, and the fully assembled parts of the umbrella drone are depicted in Fig. 15b. The umbrella remains on standby in a safe area and awaits an order to take off; it will follow workers after stabilizing at an altitude of ten meters to provide shade. The pilot can move the position of the umbrella to give the maximum area of shade to workers in the construction field. If the pilot loses communication with the umbrella, a ringing alarm will go on, and the GPS will land the umbrella automatically in the parking area. The flowchart of the system algorithm is shown in Fig. 16.
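The ultrasonic sensor's distance output follows the standard time-of-flight relation: the echo travel time multiplied by the speed of sound, halved because the pulse travels to the obstacle and back. The snippet below is a generic illustration of that relation and of a simple proximity check; it does not use the VU0002's actual digital interface, which is not detailed in the chapter.

```python
SPEED_OF_SOUND_M_S = 343.0   # at roughly 20 degrees C; higher in Gulf summer heat
MAX_RANGE_M = 2.5            # measurement range used on the prototype

def echo_time_to_distance(echo_time_s):
    """Convert an ultrasonic round-trip echo time to a one-way distance in meters."""
    return SPEED_OF_SOUND_M_S * echo_time_s / 2.0

def obstacle_too_close(echo_time_s, safety_margin_m=0.5):
    """True if the measured obstacle is inside the (assumed) safety margin."""
    d = echo_time_to_distance(echo_time_s)
    return d <= min(MAX_RANGE_M, safety_margin_m)

# Example: a 10 ms echo corresponds to about 1.7 m
print(echo_time_to_distance(0.010))
```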


Fig. 14 Manufacturing the umbrella prototype: (a) umbrella frame ready; (b) fixing the FLAME 180A 12S V2.0 ESCs and the U13II KV130 motors; (c) installation of the G32X 11CF propellers; (d) verifying the weight of the umbrella body

Table 1 The list of components
Item | Manufacture | Part number
UV-blocker fabric | Blocks up to 90% of UV rays | N/A
Ultrasonic sensor | AUDIOWELL | VU0002
Motor | T-motor | U13II KV130 (power: 5659 W)
Propellers | T-motor | G32X 11CF Prop
ESC | T-motor | FLAME 180A 12S V2.0
Converter DC-DC | MEAN WELL | RSD-30G-5

Table 2 Mechanical specifications of the flying umbrella
List | Technical specification
Umbrella frame size (mm) | 2500 × 2500
Umbrella aluminum frame weight (kg) | 33
Distance between propellers and the UV-blocker fabric (mm) | 540
Distance between propellers and the ground (mm) | 305


Fig. 15 Flying umbrella ready to fly: (a) an umbrella in 3D exploded view; (b) all parts of the umbrella mounted

3.2 Shade Fabric We used a fabric canopy screen that can block 95% of UV rays while allowing water and air to pass through. It is a polyethylene fabric of 110 grams per square meter with galvanized buttonholes and strong seams. The fabric is very breathable, allowing air to pass through and making the shaded space more comfortable. It blocks 95% of UV rays, creates cool shadows, and still lets some light through, while raindrops pass through the mesh so there is no water pooling.
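For reference, the fabric's contribution to the take-off weight can be estimated directly from the areal density above and the canopy size given in Table 2; the short calculation below is illustrative only and ignores seams, buttonholes, and mounting hardware.

```python
# Canopy fabric mass estimate from the stated specifications
areal_density_kg_m2 = 0.110      # 110 g/m^2 polyethylene fabric
canopy_side_m = 2.5              # 2500 mm x 2500 mm frame (Table 2)

canopy_area_m2 = canopy_side_m ** 2            # 6.25 m^2 of shade
fabric_mass_kg = areal_density_kg_m2 * canopy_area_m2
print(f"Shaded area: {canopy_area_m2} m^2, fabric mass: {fabric_mass_kg:.2f} kg")
# -> about 0.69 kg, negligible next to the 33 kg aluminum frame
```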

3.3 Selecting an Umbrella Flight Controller The process of making an umbrella drone can be rewarding, but the choice of a flight controller can be challenging. This prototype can only be operated by specific drone controllers available on the market today. Once you know exactly what the umbrella drone will look like, you can narrow down the list of potential autopilot boards, making the decision easier. In our analysis, we analyzed seven of the top drone controllers on the market based on factors such as the following (a small scoring sketch follows the list below):
• Affordability
• Open-source firmware
• FPV racing friendliness
• Autonomous functionality
• Linux- or microcontroller-based environment
• Typical frame size
• Popularity
• CPU
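Such a multi-criteria comparison can be expressed as a simple weighted score. The sketch below is purely illustrative: the weights and the per-board scores are made-up placeholders, not the authors' evaluation, and only the criterion names come from the list above.

```python
# Hypothetical weighted scoring of autopilot boards against the listed criteria.
weights = {
    "affordability": 0.15, "open_source": 0.20, "fpv_friendly": 0.05,
    "autonomy": 0.25, "environment": 0.10, "frame_size": 0.10,
    "popularity": 0.05, "cpu": 0.10,
}

boards = {
    # Scores 0-10 per criterion; the values here are placeholders only.
    "Pixhawk":   {"affordability": 7, "open_source": 10, "fpv_friendly": 5,
                  "autonomy": 9, "environment": 8, "frame_size": 8,
                  "popularity": 9, "cpu": 8},
    "Naza-M V2": {"affordability": 6, "open_source": 0, "fpv_friendly": 4,
                  "autonomy": 8, "environment": 5, "frame_size": 9,
                  "popularity": 7, "cpu": 6},
}

def score(board):
    return sum(weights[c] * boards[board][c] for c in weights)

best = max(boards, key=score)
print({b: round(score(b), 2) for b in boards}, "-> best:", best)
```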


Fig. 16 The overall system flowchart

In this subsection, we present the main flight controller boards and select the best one for our prototype.
• APM Flight Controller: The Pixhawk took the APM as its predecessor and developed into a much more powerful flight controller. The Pixhawk uses a 32-bit processor, while the APM has an 8-bit processor. The APM was a massive leap for open-source drone controllers, so DIY drone builders used it widely. The Pixhawk is compatible with ArduPilot and PX4, two major open-source drone projects, and is also entirely open-source.


• Pixhawk: After the original Pixhawk, the Pixhawk open-source hardware project produced many flight control boards. Open-source projects like ArduPilot therefore have a good chance of adding new functionality and support to the Cube, and for this reason the Cube and the Pixhawk are very similar.
• Navio2: The Navio2 uses a Raspberry Pi to control the flight; it is simply a shield that attaches to a Raspberry Pi 3. Debian OS images that come preinstalled with ArduPilot are available for free from Emlid, which makes the Navio2. A simple flash of an SD card will do the trick.
• BeagleBone Blue: The first Linux implementation of ArduPilot was ported to the BeagleBone Black in 2014. Creating the BeagleBone Blue was a direct result of the success of the Linux port of ArduPilot for the BeagleBone Black.
• Naza Flight Controller: Naza-M V2 kits can be found on Amazon for about $200 and come with essential components like GPS. The flight control software is closed-source, which means the community does not have access to its code. Naza flight controllers are not appropriate for people who want to build a drone they can tinker with.
• Naze32: The Naze32 boards are lightweight and affordable, costing about $30-40. Many manufacturers offer Naze32 boards; choose one that is an F3 or F4 flight controller.

3.4 Flying Umbrella Power Calculation A significant problem with electricity-powered robots is their battery life, and this is also an issue for the flying umbrella. Lithium Polymer (LiPo) batteries are used in this prototype because of their light weight and high capacity. It is also possible to use Nickel Metal Hydride (NiMH) batteries, which are cheaper but heavier than LiPo, causing problems and reducing the umbrella's efficiency. Because of battery weight, there is a trade-off between the umbrella's total weight and its flight time. Each U13II KV130 brushless DC motor can produce 24.3 kg of thrust at a battery voltage of 48 VDC, as shown in Fig. 17. So with six U13II KV130 brushless motors, the flying umbrella can produce roughly 144 kg of thrust. Two sources power the flying umbrella: • The six DC brushless motors, at 5659 W each, draw approximately 34 kW at full thrust. They are powered by six batteries of 44.4 VDC/8000 mAh (each made of two 22.2 VDC/8000 mAh batteries in series). • A 12 VDC/5.5 Ah battery powers all sensors (ultrasonic, flight controller, GPS module, camera, etc.), with a total consumption of about 35 W. The flight time of the umbrella depends on the specifications of the batteries, as shown in Table 3. The umbrella cannot fly for more than 22 min with the CX48100 battery. The aluminum umbrella frame weighs about 33 kg, and the CX48100 batteries weigh 42 kg.


Fig. 17 U13II KV130 brushless DC motor

Table 3 Batteries specifications
Batteries (brand) | Current (Ah) | Voltage (V) | Expected flight time (min) | Number of batteries | Total battery weight (kg)
HRB | 8 | 22.2 | 2 | 12 (each two batteries arranged in series) | 11
CX48100 | 100 | 48 | 22 | 1 | 42
CXC4825 | 25 | 48 | 11 | 2 (two batteries arranged in series) | 16

The six-propeller umbrella drone weighs a total of 73 kg with a battery and has a thrust capability (TTWR) of about 120 kg. The battery was fixed in the center to ensure an appropriate weight distribution, and the flying umbrella can carry the total weight easily. Flight time calculation:

ACD = TFW × (P / V)
T = (C × BDM) / ACD = 22 min

where ACD is the average current draw, TFW the total flight weight, C the battery capacity, BDM the battery discharge margin, P the power-to-weight ratio, and V the battery voltage.
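As a sanity check, the flight-time relation above can be evaluated numerically. In the sketch below, the battery capacity, voltage, and total flight weight come from the chapter (CX48100: 100 Ah, 48 V, 73 kg take-off weight, roughly 22 min endurance); the power-to-weight ratio of 145 W/kg and the 80% discharge margin are assumed values chosen only so that the example reproduces the reported endurance, and are not stated by the authors.

```python
def flight_time_minutes(capacity_ah, discharge_margin, total_weight_kg,
                        power_to_weight_w_per_kg, voltage_v):
    """T = (C * BDM) / ACD, with ACD = TFW * (P / V); result in minutes."""
    acd_amps = total_weight_kg * power_to_weight_w_per_kg / voltage_v
    return 60.0 * capacity_ah * discharge_margin / acd_amps

# CX48100 battery, 73 kg take-off weight, assumed 145 W/kg hover power draw
t = flight_time_minutes(capacity_ah=100, discharge_margin=0.8,
                        total_weight_kg=73, power_to_weight_w_per_kg=145,
                        voltage_v=48)
print(f"Estimated flight time: {t:.1f} min")   # about 22 min
```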


4 Experiment Phase We chose to use radio-frequency technology to control the umbrella via a ground operator to improve worker safety. A ground control station and GPS data manage the umbrella remotely. The large size of the umbrella (2500 mm × 2500 mm) necessitates the use of a remote controller and a flight system, which ensures its usability and safety in urban areas. The umbrella flight control system maintains balance by taking all parameters into account. With the addition of a high-quality ultrasonic sensor (VU0002), the umbrella can take off and land easily and avoid obstacles. To ensure that the umbrella functions appropriately, the ultrasonic sensor must work together with an obstacle avoidance algorithm. A flight controller is one of the most comprehensive features of a flying umbrella: it supports everything from return-to-home to carefree flight and altitude hold. The altitude-hold and return-to-home features are particularly helpful for maintaining stable shade and following the sun's position, and they may also help a pilot who is disoriented about the flight direction. Continuously flying around the yard without experiencing extreme ups and downs is easy. The Naza-M V2 used in this work is highly efficient, as shown in Fig. 18a. The flying umbrella is equipped with an accelerometer, gyroscope, magnetometer, barometric pressure sensor, and GPS receiver for greater safety in urban environments, as shown in Fig. 18b. As a safety precaution, a rope is used to pull back the umbrella drone in case communication is lost, see Fig. 19. During the flight test, one of the objectives was to determine how the control system would perform with the large umbrella size synchronised with the electronic system. Pilots were able to fly the umbrella via remote control during flight tests. The first step in determining parameters after moving the umbrella is to select the longitudinal axes. In this mode, the FCS controls the longitudinal axes while the pilot continues to direct the umbrella. The pilot steered the umbrella in and out of the airport

Fig. 18 Fly umbrella RF components: (a) connection diagram; (b) a GPS receiver, a barometric pressure sensor, accelerometers, gyroscopes, and magnetometers make up the fly umbrella's control system


Fig. 19 The umbrella pulled with a rope

and increased the gain until the umbrella was stable, to minimize steady-state errors. The height is adjusted in ascending and descending steps, from left to right, to ensure stability and maintain a moderate rise and fall rate. The shade can be controlled remotely and moved based on the worker's location. The umbrella took off successfully, flew steadily, and landed in the desired area, as shown in Fig. 20. The flying umbrella provides shade while remaining stable in the air, as illustrated in Fig. 20b. The workers moved in different directions (left, right, and back) to evaluate the umbrella's mechanical response. We measured the trajectory of the flying umbrella and the tracking error distance during the study, keeping the umbrella at a ten-meter altitude. During the testing phase, we observed that the PID controller of the umbrella was affected by the wind speed, so one of the limitations of the umbrella is that it cannot fly in strong winds. Because the high-efficiency propellers rapidly displace large quantities of air, the flying umbrella can be loud. While the propellers spin, pressure spikes are generated, resulting in a distinctive buzzing noise. The flying umbrella produced in-air levels of 80 dB, with fundamental frequencies at 120 Hz, and noise levels were around 95 dB when the umbrella flew at altitudes of 5 and 10 m. Ambient noise levels in the construction area are already very high, so the drone noise is not a significant effect compared to heat stress.


Fig. 20 Flight test scenes: (a) flying umbrella successful take-off; (b) flying umbrella successful flying; (c) landed

5 Results and Discussion The following is a summary of the analysis of the experimental results:
• For safety reasons, we implemented ultrasonic sensors in the umbrella to avoid obstacles and a GPS module to guide the umbrella to the homing area in case the RF link with the pilot is lost. The GPS module enables the umbrella to know its location relative to the home area.
• The umbrella was controlled successfully by remote after adjusting the flight parameters such as speed and altitude.
• The system provides a stable industrial shade that can protect workers in construction areas from high temperatures and UV rays.
• In addition to providing shade, this umbrella is equipped with a water tank in the middle that produces a mist of water to keep workers feeling cool.
• On a sunny day, the flying umbrella shields more than 75% of ultraviolet light.
• Thanks to its high-performance design and six brushless motors, this umbrella can carry more than 120 kg.
• The umbrella provides shade over a surface area of 6.25 m² (2.5 m × 2.5 m).
• The GPS sensor allows the flying umbrella to return to the parking station if it loses contact with the umbrella pilot.
• The UV-blocker fabric offers excellent protection from the sun.
A minimum flight height may be required for follow-me modes in this application. Umbrella drones should fly at a higher altitude, with no obstacles in front of or behind them. The umbrella drone produces high levels of noise.


Fig. 21 The air temperature difference between shade and full sun on 05 June 2022 (morning)

Table 4 The difference in temperature between objects in shade and full sunlight
Object | Time | Full sun | Shade provided by flying umbrella
Workers hat | 10.00 AM | 42.3 °C | 41.2 °C
Air temperature | 10.15 AM | 44 °C | 42.3 °C
Soil temperature | 10.20 AM | 46.5 °C | 44 °C

• A larger-scale version of the umbrella will be usable in many places after further R&D.
Comparing temperatures at two locations in the morning, the first under the umbrella shade and the second under full sun, there was an average difference of 2.5 °C during 22 min, as shown in Fig. 21. We can observe that the shade reduces air and soil temperatures and blocks the sun's rays, as shown in Table 4. On the same date in the afternoon, comparing two locations, one under the umbrella shade and one under direct sunlight, there was an average difference of 2.7 °C during 22 min, as shown in Fig. 22. According to Table 5, the shade reduces the air and soil temperatures by blocking the sun's rays. Testing at the airport reveals some parameters we can consider to increase the efficiency of the umbrella. Another power source, such as a solar system, should be considered to keep the umbrella flying for a longer time. Based on the GPS signal, the tracking capability is very good. The umbrella produces high levels of noise, which are acceptable in the workers' area.
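The per-object temperature reductions reported in Table 4 can be reproduced with a one-line subtraction; the snippet below uses the morning spot values from Table 4 only (the 2.5 °C and 2.7 °C averages quoted in the text come from the full time series in Figs. 21 and 22, which are not tabulated here).

```python
# Morning spot measurements from Table 4 (degrees Celsius)
full_sun = {"workers hat": 42.3, "air": 44.0, "soil": 46.5}
shade    = {"workers hat": 41.2, "air": 42.3, "soil": 44.0}

reduction = {k: round(full_sun[k] - shade[k], 1) for k in full_sun}
print(reduction)   # {'workers hat': 1.1, 'air': 1.7, 'soil': 2.5}
print(round(sum(reduction.values()) / len(reduction), 2), "C mean spot reduction")
```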


Fig. 22 The air temperature difference between shade and full sun on 05 June 2022 (afternoon)

Table 5 The difference in temperature between objects in shade and full sunlight (afternoon)
Object | Time | Full sun | Shade provided by flying umbrella
Workers hat | 4.00 PM | 41.2 °C | 40.4 °C
Air temperature | 4.15 PM | 40.4 °C | 39.8 °C
Soil temperature | 4.20 PM | 39.8 °C | 39.7 °C

6 Conclusion The purpose of this research is to develop a flying umbrella. We demonstrate how the flying umbrella prototype successfully provides shade to individuals working outdoors in hot climates. As a prototype, this umbrella design can perform all the functions of an ordinary umbrella, but with human assistance, and it protects workers from the sun's rays and prevents heat illness at work. In this work, the reader can find a possible draft design and a list of all the devices required to perform the desired task. The drone changes the philosophy of flying objects toward that of a manufactured cloud. We compared the temperature difference between objects in the shade and in full sunlight at two separate times on the same day, morning and afternoon. The temperature under the canopy is lower than in full sunlight, with average reductions of 2.5 °C in the morning and 2.7 °C in the afternoon. In particular, more work needs to be done to design umbrellas for different climates and critical conditions. Protecting an environment subject to rain, snow, and, most importantly, scorching heat is essential.


In future work, we propose the possibility of installing a wireless charging station in the parking area for the flying umbrella drone. Acknowledgements No funding to declare.

References 1. Agarwal, G. (2022). Brief history of drones. In: Civilian Drones, Visual Privacy and EU Human Rights Law, Routledge (pp. 6–26). https://doi.org/10.4324/9781003254225-2 2. Ahmad, L., Kanth, R. H., Parvaze, S., & Mahdi, S. S. (2017). Measurement of cloud cover. Experimental Agrometeorology: A Practical Manual (pp. 51–54). Springer International Publishing. https://doi.org/10.1007/978-3-319-69185-5_8 3. Ahmadhon, K., Al-Absi, M. A., Lee, H. J., & Park, S. (2019). Smart flying umbrella drone on internet of things: AVUS. In 2019 21st International Conference on Advanced Communication Technology (ICACT), IEEE. https://doi.org/10.23919/icact.2019.8702024 4. Al-Bouwarthan, M., Quinn, M. M., Kriebel, D., & Wegman, D. H. (2019). Assessment of heat stress exposure among construction workers in the hot desert climate of saudi arabia. Annals of Work Exposures and Health, 63(5), 505–520. https://doi.org/10.1093/annweh/wxz033. 5. Al-Hatimy, F., Farooq, A., Abiad, M. A., Yerramsetti, S., Al-Nesf, M. A., Manickam, C., et al. (2022). A retrospective study of non-communicable diseases amongst blue-collar migrant workers in qatar. International Journal of Environmental Research and Public Health, 19(4), 2266. https://doi.org/10.3390/ijerph19042266. 6. Ambarish Vaidyanathan PSSS Josephine Malilay. (2020). Heat-related deaths - united states, 2004–2018. Centers for Disease Control and Prevention, 69(24), 729–734. 7. Bashari, A., Shakeri, M., & Shirvan, A. R. (2019). UV-protective textiles. In The Impact and Prospects of Green Chemistry for Textile Technology (pp. 327–365). Elsevier. https://doi.org/ 10.1016/b978-0-08-102491-1.00012-5 8. Bodin, T., García-Trabanino, R., Weiss, I., Jarquín, E., Glaser, J., Jakobsson, K., et al. (2016). Intervention to reduce heat stress and improve efficiency among sugarcane workers in el salvador: Phase 1. Occupational and Environmental Medicine, 73(6), 409–416. https://doi.org/ 10.1136/oemed-2016-103555. 9. Chaari, M. Z., & Al-Maadeed, S. (2021). The game of drones/weapons makers war on drones. In Unmanned Aerial Systems (pp. 465–493). Elsevier. https://doi.org/10.1016/b978-0-12820276-0.00025-x 10. Chaari, M. Z., & Aljaberi, A. (2021). A prototype of a robot capable of tracking anyone with a high body temperature in crowded areas. International Journal of Online and Biomedical Engineering (iJOE), 17(11), 103–123. https://doi.org/10.3991/ijoe.v17i11.25463. 11. Chaari, M. Z., Abdelfatah, M., Loreno, C., & Al-Rahimi, R. (2021). Development of air conditioner robot prototype that follows humans in outdoor applications. Electronics, 10(14), 1700. https://doi.org/10.3390/electronics10141700. 12. Crowe, J., Rojas-Garbanzo, M., Rojas-Valverde, D., Gutierrez-Vargas, R., Ugalde-Ramírez, J., & van Wendel de Joode, B. (2020). Heat exposure and kidney health of costa rican rice workers. ISEE Conference Abstracts, 2020(1). https://doi.org/10.1289/isee.2020.virtual.o-os549 13. DeFrangesco, R., & DeFrangesco, S. (2022). The history of drones. In The big book of drones (pp. 15–28). CRC Press. https://doi.org/10.1201/9781003201533-2 14. Desch, S. J., Smith, N., Groppi, C., Vargas, P., Jackson, R., Kalyaan, A., et al. (2017). Arctic ice management. Earths Future, 5(1), 107–127. https://doi.org/10.1002/2016ef000410.


15. Dhumane, R., Ling, J., Aute, V., & Radermacher, R. (2017). Portable personal conditioning systems: Transient modeling and system analysis. Applied Energy, 208, 390–401. https://doi. org/10.1016/j.apenergy.2017.10.023. 16. Dhumane, R., Mallow, A., Qiao, Y., Gluesenkamp, K. R., Graham, S., Ling, J., & Radermacher, R. (2018). Enhancing the thermosiphon-driven discharge of a latent heat thermal storage system used in a personal cooling device. International Journal of Refrigeration, 88, 599–613. https:// doi.org/10.1016/j.ijrefrig.2018.02.005. 17. Geffroy, E., Masia, M., Laera, A., Lavidas, G., Shayegh, S., & Jolivet, R. B. (2018). Mcaa statement on ipcc report “global warming of 1.5 c”. https://doi.org/10.5281/ZENODO.1689921 18. Gies, P., & Mackay, C. (2004). Measurements of the solar UVR protection provided by shade structures in new zealand primary schools. Photochemistry and Photobiology. https://doi.org/ 10.1562/2004-04-13-ra-138. 19. Hameed, S. (2021). India’s labour agreements with the gulf cooperation council countries: An assessment. International Studies, 58(4), 442–465. https://doi.org/10.1177/ 00208817211055344. 20. Hayes, M. J., Levine, T. P., & Wilson, R. H. (2016). Identification of nanopillars on the cuticle of the aquatic larvae of the drone fly (diptera: Syrphidae). Journal of Insect Science, 16(1), 36. https://doi.org/10.1093/jisesa/iew019. 21. Haywood, J., Jones, A., Johnson, B., & Smith, W. M. (2022). Assessing the consequences of including aerosol absorption in potential stratospheric aerosol injection climate intervention strategies. https://doi.org/10.5194/acp-2021-1032. 22. How, V., Singh, S., Dang, T., Lee, L. F., & Guo, H. R. (2022). The effects of heat exposure on tropical farm workers in malaysia: Six-month physiological health monitoring. International Journal of Environmental Health Research, 1–17. https://doi.org/10.1080/09603123. 2022.2033706 23. (ILO) ILO (2014) Informal Economy and Decent Work: A Policy Resource Guide Supporting Transitions to Formality. INTL LABOUR OFFICE 24. Irvine, P., Emanuel, K., He, J., Horowitz, L. W., Vecchi, G., & Keith, D. (2019). Halving warming with idealized solar geoengineering moderates key climate hazards. Nature Climate Change, 9(4), 295–299. https://doi.org/10.1038/s41558-019-0398-8. 25. Irvine, P., Burns, E., Caldeira, K., Keutsch, F., Tingley, D., & Keith, D. (2021). Expert judgements judgements on solar geoengineering research priorities and challenges. https://doi.org/ 10.31223/x5bg8c 26. JULIA SHIPLEY DNRBSMCCWT BRIAN EDWARDS (2021) Hot days: Heat’s mounting death toll on workers in the u.s. 27. Khan, H. T. A., Hussein, S., & Deane, J. (2017). Nexus between demographic change and elderly care need in the gulf cooperation council (GCC) countries: Some policy implications. Ageing International, 42(4), 466–487. https://doi.org/10.1007/s12126-017-9303-9. 28. Lee, J., Lee, Y. H., Choi, W. J., Ham, S., Kang, S. K., Yoon, J. H., et al. (2021). Heat exposure and workers’ health: a systematic review. Reviews on Environmental Health, 37(1), 45–59. https://doi.org/10.1515/reveh-2020-0158. 29. Lee, W. R., MacMartin, D. G., Visioni, D., & Kravitz, B. (2021). High-latitude stratospheric aerosol geoengineering can be more effective if injection is limited to spring. Geophysical Research Letters, 48(9). https://doi.org/10.1029/2021gl092696. 30. Lee, Y. K., Yeom, T. Y., & Lee, S. (2022). A study on noise analysis of counter-rotating propellers for a manned drone. The KSFM Journal of Fluid Machinery, 25(2), 38–44. 
https:// doi.org/10.5293/kfma.2022.25.2.038. 31. Marucci, A., Monarca, D., Cecchini, M., Colantoni, A., Giacinto, S. D., & Cappuccini, A. (2014). The heat stress for workers employed in a dairy farm. Journal of Agricultural Engineering, 44(4), 170. https://doi.org/10.4081/jae.2013.218. 32. Mehta, B. (2021) Heat exposure has killed hundreds of u.s. workers - it’s time to do something about it. Industrial Saftey & Hygiene News, 3(52). 33. Meyer, R. (2018). A radical new scheme to prevent catastrophic sea-level rise. The Atlantic


34. Ming, T., de_Richter, R., Liu, W., & Caillol, S. (2014). Fighting global warming by climate engineering: Is the earth radiation management and the solar radiation management any option for fighting climate change? Renewable and Sustainable Energy Reviews,31, 792–834. https:// doi.org/10.1016/j.rser.2013.12.032 35. Niedzielski, T., Jurecka, M., Mizi´nski, B., Pawul, W., & Motyl, T. (2021). First successful rescue of a lost person using the human detection system: A case study from beskid niski (SE poland). Remote Sensing, 13(23), 4903. https://doi.org/10.3390/rs13234903. 36. Oliver Jokisch, D. F. (2019). Drone sounds and environmental signals - a first review. 30th ESSV ConferenceAt: TU Dresden 37. Pan, Q., Sumner, D. A., Mitchell, D. C., & Schenker, M. (2021). Compensation incentives and heat exposure affect farm worker effort. PLOS ONE, 16(11), e0259,459. https://doi.org/10. 1371/journal.pone.0259459 38. Peace, A. H., Carslaw, K. S., Lee, L. A., Regayre, L. A., Booth, B. B. B., Johnson, J. S., & Bernie, D. (2020). Effect of aerosol radiative forcing uncertainty on projected exceedance year of a 1.5 c global temperature rise. Environmental Research Letters, 15(9), 0940a6. https://doi. org/10.1088/1748-9326/aba20c 39. Pradhan, B., Kjellstrom, T., Atar, D., Sharma, P., Kayastha, B., Bhandari, G., & Pradhan, P. K. (2019). Heat stress impacts on cardiac mortality in nepali migrant workers in qatar. Cardiology, 143(1–2), 37–48. https://doi.org/10.1159/000500853. 40. Sankar, S., & Tsai, C. Y. (2019). ROS-based human detection and tracking from a wireless controlled mobile robot using kinect. Applied System Innovation, 2(1), 5. https://doi.org/10. 3390/asi2010005. 41. Thalheimer, E. (2021). Community acceptance of drone noise. INTER-NOISE and NOISECON Congress and Conference Proceedings, 263(6), 913–924. https://doi.org/10.3397/in2021-1694. 42. Uejio, C. K., Morano, L. H., Jung, J., Kintziger, K., Jagger, M., Chalmers, J., & Holmes, T. (2018). Occupational heat exposure among municipal workers. International Archives of Occupational and Environmental Health, 91(6), 705–715. https://doi.org/10.1007/s00420018-1318-3. 43. Woelders, T., Wams, E. J., Gordijn, M. C. M., Beersma, D. G. M., & Hut, R. A. (2018). Integration of color and intensity increases time signal stability for the human circadian system when sunlight is obscured by clouds. Scientific Reports, 8(1). https://doi.org/10.1038/s41598018-33606-5 44. Wood, D., Yablonina, M., Aflalo, M., Chen, J., Tahanzadeh, B., & Menges, A. (2018). Cyber physical macro material as a UAV [re]configurable architectural system. Robotic Fabrication in Architecture, Art and Design 2018 (pp. 320–335). Springer International Publishing. https:// doi.org/10.1007/978-3-319-92294-2_25

Accurate Estimation of 3D-Repetitive-Trajectories using Kalman Filter, Machine Learning and Curve-Fitting Method for High-speed Target Interception Aakriti Agrawal, Aashay Bhise, Rohitkumar Arasanipalai, Lima Agnel Tony, Shuvrangshu Jana, and Debasish Ghose Abstract Accurate estimation of trajectory is essential for the capture of any high-speed target. This chapter estimates and formulates an interception strategy for the trajectory of a target moving in a repetitive loop using a combination of estimation and learning techniques. An extended Kalman filter estimates the current location of the target using the visual information in the first loop of the trajectory to collect data points. Then, a combination of Recurrent Neural Network (RNN) with least-square curve-fitting is used to accurately estimate the future positions for the subsequent loops. We formulate an interception strategy for the interception of a high-speed target moving in a three-dimensional curve using noisy visual information from a camera. The proposed framework is validated in the ROS-Gazebo environment for interception of a target moving in a repetitive figure-of-eight trajectory. Astroid, Deltoid, Limacon, Squircle, and Lemniscates of Bernoulli are some of the high-order curves used for algorithm validation. Keywords Extended Kalman filter · RNN · Least-square curve-fitting · 3D-repetitive-trajectory · High-speed interception · Estimation

Nomenclature
f: focal length of the camera
P: error covariance matrix
r: radius of instantaneous centre of curvature of target trajectory
r_des: desired yaw rate for interceptor
x_target^k: inertial x-coordinate of the target at the kth sampling time
y_target^k: inertial y-coordinate of the target at the kth sampling time
x_target,vision^k: target x-position observed using the camera at the kth sampling time
y_target,vision^k: target y-position observed using the camera at the kth sampling time
x_k: target pixel x coordinate at the kth sampling time
y_k: target pixel y coordinate at the kth sampling time
x_ck: x coordinate of the instantaneous centre of curvature of the target trajectory at the kth sampling time
y_ck: y coordinate of the instantaneous centre of curvature of the target trajectory at the kth sampling time
X̂_{k-1}: state variable at the instant of availability of sensor measurement
X̂_{k+1}: state variable after measurement update
V_target: speed of target
V_des: desired velocity of interceptor
Z_k: depth of target
RNN: Recurrent Neural Network
UAV: Unmanned Aerial Vehicle
EKF: Extended Kalman Filter
IMU: Inertial Measurement Unit

A. Agrawal · A. Bhise · R. Arasanipalai · L. A. Tony · S. Jana · D. Ghose (B) Department of Aerospace Engineering, Indian Institute of Science, Guidance Control and Decision Systems Laboratory (GCDSL), Bangalore 560012, India e-mail: [email protected] A. Agrawal e-mail: [email protected] R. Arasanipalai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_4

1 Introduction In automated robotics systems, a class of problems that has engaged the attention of researchers is that of motion tracking and guidance using visual information, by which an autonomously guided robot can track and capture a target moving in an approximately known or predicted trajectory. Interception of a target in an outdoor environment is challenging, and it is important for the defence of military as well as critical civilian infrastructure. Interception of intruder targets using UAVs has the advantages of low cost and quick deployability; however, the performance of the UAV is limited by its payload capability. In this case, the detection of the target is performed using visual information. Interception of a target with UAVs using visual information is reported in the literature, such as [8, 17, 33], and it is difficult due to limitations on the sensing, payload, and computational capability of UAVs. Interception strategies are generally based on the estimation of the target's future trajectory [30, 31], controllers based on visual servoing [5, 12], or vision-based guidance laws [22, 34]. Controllers based on visual servoing are mostly applicable to slow-moving targets. Guidance strategies for the interception of a high-speed target are difficult, as the capturability region of the guidance law is smaller for the interceptor than for a low-speed target. Interception using visual information is further difficult as the


visual output can be noisy in the outdoor environment, and the range of view is small compared to other sensors such as radar. An interception strategy based on prediction of the target trajectory over a short interval is not effective in the case of a high-speed target. Therefore, accurate estimation of the trajectory of the target is important for efficient interception of a high-speed target. Once the looping trajectory of the target is obtained, the interceptor can be placed in a favourable position to increase the probability of interception. In this chapter, we consider a problem where an aerial target is moving in a repetitive loop at high speed and an aerial robot, or drone, has to observe the target's motion via its imaging system and predict the target trajectory in order to guide itself for effective capture of the target. The strategy for interception of a high-speed target moving in a repetitive loop is formulated after estimation and prediction of the target trajectory using the Extended Kalman Filter, Recurrent Neural Network (RNN), and least-square curve-fitting techniques. An Extended Kalman filter (EKF) is used to track a manoeuvring target moving in an approximately repetitive loop by using the first loop of the trajectory to collect data points, and then a combination of machine learning with least-square curve-fitting is used to accurately estimate future positions for the subsequent loops. The EKF estimates the current location of the target from its visual information and then predicts its future position by using the observation sequence. We utilise noisy visual information of the target from the three-dimensional trajectory to carry out the trajectory estimation. Several high-order curves, expressed as univariate polynomials, are considered as test cases. Some of these are the Circle/Ellipse, Astroid, Deltoid, Limacon, Nephroid, Quadrifolium, Squircle, and Lemniscates of Bernoulli and Gerono, among others. The proposed algorithm is demonstrated in the ROS-Gazebo environment and is implemented in field tests. The problem statement is motivated by Challenge-1 of MBZIRC-2020, where the objective is to catch an intruder target moving in an unknown repetitive figure-of-eight trajectory (shown in Fig. 1). The ball attached to the target drone moves in an approximate figure-of-eight trajectory in 3D, and the parameters of the trajectory are unknown to the interceptors. Interceptors need to detect, estimate, and formulate strategies for grabbing or intercepting the target ball. In this chapter, the main focus is to estimate the target trajectory using visual information. The proposed method first estimates the position of the target using Kalman filter techniques, and then the geometry of the looping trajectory is estimated using learning and curve-fitting techniques. The main contributions of the chapter are the following: 1. Estimation of target position using visual information in an EKF framework. 2. Estimation of the trajectory of a target moving in a standard geometric curve in a closed loop using Recurrent Neural Network and least-square curve-fitting techniques. 3. Development of a strategy for interception of a high-speed target moving in a standard geometric curve in a repetitive loop. The rest of the chapter is organised as follows: Relevant literature is presented in Sect. 2. Estimation and prediction of target location using visual information are presented in Sect. 3. Detailed curve fitting methods using learning in 2D and 3D are

96

A. Agrawal et al.

Fig. 1 Problem statement

described in Sect. 4. The strategy for interception of a high-speed target is formulated in Sect. 5. Simulation results are presented in Sect. 6. The summary of the chapter is discussed in Sect. 7.

2 Related Work

Estimation of the target position during an interception scenario is traditionally obtained using radar information. Estimation of the target position using an Extended Kalman Filter from noisy GPS measurements is reported in the literature [3, 19, 23]. Several interesting works have been reported on the interception of a missile having a higher speed than the interceptor [35, 37]. However, interception using visual information is a relatively new topic. Estimation of the target position from visual information in an outdoor environment is highly uncertain due to the presence of high noise in the target pixel information. Estimation and prediction of moving objects using Kalman filtering techniques are reported in [1, 24, 27]; however, prediction accuracy with these techniques reduces with the time horizon. Target trajectory estimation based on learning techniques has been reported using k-means [26], Markov models [7], and Long Short-Term Memory (LSTM) [29] techniques. In [29], an LSTM learning technique is used for the estimation of the trajectory of highway vehicles from temporal data. Here, the data is geographically classified into different clusters, and a separate LSTM model is then applied to each cluster for robust trajectory estimation from a large volume of vehicle-position information. After partitioning the trajectories into approximate line segments, a novel trajectory clustering technique is proposed to estimate the pattern of target motion [32]. In [28], Inertial Measurement Unit (IMU) and visual information are combined through an unsupervised deep neural network, the Visual-Inertial-Odometry Learner (VIOLearner), for trajectory estimation. In [25], a Convolutional Neural Network is used to estimate the vehicle trajectory using optical flow.


Trajectory estimation using curve-fitting methods is reported in [2, 10, 15, 16, 18]. In [2], bearing-only measurements are used to fit splines for highly manoeuvring targets. In [16], the target trajectory is estimated as a smooth parametric curve using a sliding time window approach in which the curve parameters are updated iteratively. In [18], a Spline Fitting Filtering (SFF) algorithm is used to fit a cubic spline to estimate the trajectory of a manoeuvring target. In [15], the trajectory is estimated using data-driven regression analysis, known as "fitting for smoothing (F4S)", under the assumption that the trajectory is a function of time. Interception of a target using an estimate of the target's future position is reported in [9, 13, 36]. In [13], the target trajectory is estimated using a simple linear extrapolation method, and the uncertainty in the target position is modelled with a random variable. In [36], the interception strategy is formulated by predicting the target trajectory from historical data and selecting an optimal path using third-order Bezier curves; the proposed formulation is validated in simulation only and not with real visual information. Capturing an aerial target using a robotic manipulator, after estimating the target's pose and motion using an adaptive extended Kalman filter and photogrammetry, is reported in [9]. Other research groups have approached a similar problem statement (as shown in Fig. 1) [4, 6, 38], where the trajectory is estimated using filtering techniques under the assumption that the target follows a figure-of-eight trajectory; however, a general approach for estimating a trajectory that follows an unknown geometric curve has not been reported.

3 Vision Based Target Position Estimation

In this section, the global position of the target is estimated from visual information. It is assumed that the target trajectory lies in a 2D plane, and thus measurements of the target in the global X-Y plane are considered. The target's motion is assumed to be smooth; that is, the change in curvature of the trajectory remains bounded and smooth over time. Let $x_{target}$ and $y_{target}$ be the coordinates of the target position, $V_{target}$ the target speed, and $\psi$ the flight path angle with respect to the horizontal. The target motion, neglecting wind, can be expressed as

$$\dot{x}_{target} = V_{target} \cos\psi \qquad (1)$$

$$\dot{y}_{target} = V_{target} \sin\psi \qquad (2)$$

$$\dot{\psi} = \omega \qquad (3)$$

Fig. 2 Target trajectory

The target trajectory, the instantaneous circle, and the important variables at the kth sampling time are shown in Fig. 2. Let the position of the target at the kth sampling time be $X^k$, given as

$$X^k = \begin{bmatrix} x_{target}^k & y_{target}^k \end{bmatrix}^T \qquad (4)$$

where $x_{target}^k$ and $y_{target}^k$ are the inertial coordinates of the target in the global X-Y plane at the kth sampling time, and $\{x_c^k, y_c^k\}$ are the coordinates of the centre of the instantaneous curvature of the target trajectory at that instant. From Fig. 2, the variables $\theta^k$ and $\psi^k$ are related as follows,

$$\theta^k = \psi^k - \frac{\pi}{2} \qquad (5)$$

based on which Eqs. (1) and (2) can be simplified to,

$$\dot{x}_{target} = -V_{target} \sin\theta \qquad (6)$$

$$\dot{y}_{target} = V_{target} \cos\theta \qquad (7)$$

Therefore, the motion of the target can be represented in discrete time by the following equations,

$$x_{target}^{k} = x_{target}^{k-1} - V_{target}\,\Delta t\,\frac{(y_{target}^{k} - y_c^{k})}{\sqrt{(y_{target}^{k} - y_c^{k})^2 + (x_{target}^{k} - x_c^{k})^2}} \qquad (8)$$

$$y_{target}^{k} = y_{target}^{k-1} + V_{target}\,\Delta t\,\frac{(x_{target}^{k} - x_c^{k})}{\sqrt{(y_{target}^{k} - y_c^{k})^2 + (x_{target}^{k} - x_c^{k})^2}} \qquad (9)$$

where Vtarget is the speed of the target and Δt is the sampling time interval.
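To make the discrete-time motion model concrete, the short sketch below propagates a target position one step according to Eqs. (8)-(9); it is only an illustration, the speed, sampling interval and centre of curvature are made-up values, and the previous position is used to evaluate the tangent direction (an explicit approximation of the implicit update).

```python
import numpy as np

def propagate_target(x_prev, y_prev, x_c, y_c, v_target, dt):
    """One step of the discrete target motion model, Eqs. (8)-(9).

    The displacement is tangential to the instantaneous circle centred at
    (x_c, y_c); the previous position approximates the tangent direction.
    """
    dx, dy = x_prev - x_c, y_prev - y_c
    dist = np.hypot(dx, dy)                 # distance to centre of curvature
    x_new = x_prev - v_target * dt * dy / dist
    y_new = y_prev + v_target * dt * dx / dist
    return x_new, y_new

# Example: target circling a centre at the origin with 10 m radius at 5 m/s
x, y = 10.0, 0.0
for _ in range(5):
    x, y = propagate_target(x, y, 0.0, 0.0, v_target=5.0, dt=0.1)
```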


The target position is estimated using the target pixel information from the image plane of a monocular camera. The target position is obtained by considering the perspective projection of the target in the image plane. If the coordinates of the estimated target position are $(x_{target,vision}^k, y_{target,vision}^k)$, then

$$x_{target,vision}^k = \frac{Z_k\, x_k}{f} \qquad (10)$$

$$y_{target,vision}^k = \frac{Z_k\, y_k}{f} \qquad (11)$$

where $x_k$ and $y_k$ are the coordinates of the target pixel in the image plane, $f$ is the focal length of the camera, and $Z_k$ is the target depth. The target measurement $Y^k$ can be presented as,

$$Y^k = \begin{bmatrix} x_{target,vision}^k \\ y_{target,vision}^k \end{bmatrix} = \begin{bmatrix} x_{target}^k \\ y_{target}^k \end{bmatrix} + \eta_k \qquad (12)$$

where $\eta \sim \mathcal{N}(0, R)$ is the measurement noise, assumed to be normally distributed.

3.1 Computation of Centre of Instantaneous Curvature of Target Trajectory

The computation of the coordinates of the centre of instantaneous curvature of the target trajectory, and the prediction of the target trajectory from the observation sequences, are derived using an approach similar to the derivation of the discrete-time guidance law reported in [20]. The coordinates of the centre of instantaneous curvature are calculated from the previous m observations, and we consider that the trajectory maintains a constant curvature over the observed sequence of target positions. Using Fig. 3, the target motion is expressed as,

Fig. 3 Centre of curvature

$$x_{target}^{k+1} = x_{target}^{k} - r\delta \sin\theta_k \qquad (13)$$

$$y_{target}^{k+1} = y_{target}^{k} + r\delta \cos\theta_k \qquad (14)$$

$$\theta_k = \theta_{k-1} + \delta \qquad (15)$$

where r is the radius of the instantaneous circle, and $\delta$ is the change in the target's flight path angle $\theta$ between time steps. Let the sequence of the last m observations of target positions gathered at sample index k be,

$$\{x_{target}^{k-i}, y_{target}^{k-i}\}, \quad i = 0, 1, 2, \ldots, m-1 \qquad (16)$$

We define the difference in the x-position and y-position of the target for the jth element of the sequence at the kth sample index as,

$$\Delta x_{target}(k, j) = x_{target}^{k-j} - x_{target}^{k-j-1} = -r\delta \sin\theta_{k-j-1} \qquad (17)$$

$$\Delta y_{target}(k, j) = y_{target}^{k-j} - y_{target}^{k-j-1} = r\delta \cos\theta_{k-j-1} \qquad (18)$$

Equation (17) can be written as,

$$\Delta x_{target}(k, j) = -r\delta \sin(\theta_{k-j-2} + \delta) \qquad (19)$$

Equivalently,

$$\Delta x_{target}(k, j) = -r\delta \sin\theta_{k-j-2}\cos\delta - r\delta \cos\theta_{k-j-2}\sin\delta \qquad (20)$$

Therefore,

$$\Delta x_{target}(k, j) = \Delta x_{target}(k, j-1)\cos\delta - \Delta y_{target}(k, j-1)\sin\delta \qquad (21)$$

Similarly, Eq. (18) can be written as,

$$\Delta y_{target}(k, j) = r\delta \cos(\theta_{k-j-2} + \delta) \qquad (22)$$

Therefore,

$$\Delta y_{target}(k, j) = \Delta x_{target}(k, j-1)\sin\delta + \Delta y_{target}(k, j-1)\cos\delta \qquad (23)$$

Since the parameter $\delta$ describes the evolution of the target's states, the elements of the evolution matrix contain $(\cos\delta, \sin\delta)$. The difference-in-observations equations are written in matrix form as Eq. (24), for $j = 0, 1, \ldots, m-1$.

$$\begin{bmatrix} \vdots \\ \Delta x_{target}(k, j) \\ \Delta y_{target}(k, j) \\ \vdots \end{bmatrix} = \begin{bmatrix} \vdots & \vdots \\ \Delta x_{target}(k, j-1) & -\Delta y_{target}(k, j-1) \\ \Delta y_{target}(k, j-1) & \Delta x_{target}(k, j-1) \\ \vdots & \vdots \end{bmatrix} \begin{bmatrix} \cos\delta \\ \sin\delta \end{bmatrix} \qquad (24)$$

The least-squares solution of the observation sequence provides an estimate of the evolution matrix at every sampling step, and we obtain the estimated value of $\delta$ as $\hat{\delta}$. Let $(x_c(k), y_c(k))$ be the coordinates of the instantaneous centre of curvature of the target trajectory; then, from Fig. 3, we can write,

$$x_c(k) + r\cos\theta_k = x_{target}^k \qquad (25)$$

$$y_c(k) + r\sin\theta_k = y_{target}^k \qquad (26)$$

Therefore, using (17) and (18), $(x_c(k), y_c(k))$ is calculated as follows:

$$x_c(k) = x_{target}^k - \frac{\Delta y_{target}(k, 1)}{\hat{\delta}} \qquad (27)$$

$$y_c(k) = y_{target}^k + \frac{\Delta x_{target}(k, 1)}{\hat{\delta}} \qquad (28)$$

Steps for calculating the centre of curvature of the target trajectory are mentioned in detail in Algorithm 1.

Algorithm 1: Computing the instantaneous centre of curvature of the target trajectory

Input: Sequence of m target measurements $\{x_{target}^{k-j}, y_{target}^{k-j}\}$, where $j = 1, 2, \ldots, m-1$
1. Populate $\Delta x_{target}(k, j)$ and $\Delta y_{target}(k, j)$ in $b(k)$
2. Populate $\Delta x_{target}(k, j-1)$ and $\Delta y_{target}(k, j-1)$ in $A(k)$
3. Solve for the evolution matrix: $(\cos\delta, \sin\delta) \leftarrow b(k)A(k)^T (A(k)A(k)^T)^{-1}$
4. Compute the centre coordinates:
   $x_c(k) \leftarrow x_{target}^k - \Delta y_{target}(k, 1)/\hat{\delta}$
   $y_c(k) \leftarrow y_{target}^k + \Delta x_{target}(k, 1)/\hat{\delta}$

Output: $x_c(k)$ and $y_c(k)$
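A minimal NumPy sketch of Algorithm 1 is given below; the function name, the use of a plain least-squares solve, and the conversion of $(\cos\delta, \sin\delta)$ to $\hat{\delta}$ via arctan2 are illustrative choices, not the authors' implementation.

```python
import numpy as np

def centre_of_curvature(xs, ys):
    """Estimate delta_hat and the instantaneous centre of curvature from the
    last m target observations (sketch of Algorithm 1).

    xs, ys: arrays of the m most recent x/y measurements, oldest first (m >= 4).
    """
    dx = np.diff(xs)                      # successive x differences
    dy = np.diff(ys)                      # successive y differences
    # Each newer difference is the older one rotated by delta (Eqs. (21), (23))
    A = np.column_stack([
        np.concatenate([dx[:-1], dy[:-1]]),
        np.concatenate([-dy[:-1], dx[:-1]]),
    ])
    b = np.concatenate([dx[1:], dy[1:]])
    cos_d, sin_d = np.linalg.lstsq(A, b, rcond=None)[0]
    delta_hat = np.arctan2(sin_d, cos_d)
    # Centre of curvature following Eqs. (27)-(28)
    x_c = xs[-1] - dy[-2] / delta_hat
    y_c = ys[-1] + dx[-2] / delta_hat
    return delta_hat, x_c, y_c
```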


3.2 EKF Formulation

Once the instantaneous centre of curvature of the target trajectory at the current instant is estimated, the target position is estimated using a continuous-discrete extended Kalman filter (EKF) framework. The continuous target motion model is represented as,

$$\dot{X} = F(X, U, \xi) \qquad (29)$$

where

$$F(X, U) = \begin{bmatrix} -V_{target}\,(y_{target} - y_c)/\sqrt{(y_{target} - y_c)^2 + (x_{target} - x_c)^2} \\ V_{target}\,(x_{target} - x_c)/\sqrt{(y_{target} - y_c)^2 + (x_{target} - x_c)^2} \end{bmatrix} \qquad (30)$$

The discrete measurement model is

$$Y_k = H_k(X_k, \eta_k) = \begin{bmatrix} x_{target}^k \\ y_{target}^k \end{bmatrix} + \eta_k \qquad (31)$$

where $\xi$ is the process noise and $\eta_k$ is the measurement noise. It is assumed that the process and measurement noises are zero-mean Gaussian white noise, that is, $\xi \sim \mathcal{N}(0, Q)$ and $\eta_k \sim \mathcal{N}(0, R)$. The prediction step is the first stage of the EKF algorithm, where the previous state and input values are propagated through the non-linear process model (32) in discrete time to arrive at the state estimate.

$$\dot{\hat{X}} = F(\hat{X}, U, 0) \qquad (32)$$

The error covariance matrix is propagated as follows:

$$\dot{P} = AP + PA^T + Q \qquad (33)$$

where the matrix $A = \frac{\partial F}{\partial X}\big|_{\hat{X}}$. The matrix A can be derived as,

$$A = \frac{V_{target}\,\Gamma}{\big((y_{target} - y_c)^2 + (x_{target} - x_c)^2\big)^{3/2}} \qquad (34)$$

where

$$\Gamma = \begin{bmatrix} (y_{target} - y_c)(x_{target} - x_c) & -(x_{target} - x_c)^2 \\ (y_{target} - y_c)^2 & -(x_{target} - x_c)(y_{target} - y_c) \end{bmatrix} \qquad (35)$$

The state and covariance measurement-update equations are given by,

$$\hat{X}_k^+ = \hat{X}_k^- + L_k (Y_k - C_k \hat{X}_k^-) \qquad (36)$$

$$P_k^+ = (I - L_k C_k) P_k^- \qquad (37)$$

where

$$L_k = P_k C_k^T (R + C_k P_k C_k^T)^{-1} \qquad (38)$$

$$C_k = \frac{\partial H}{\partial X}\Big|_{\hat{X}} \qquad (39)$$

The EKF position estimation framework provides a filtered position of the target, which is then used for predicting the target's trajectory. The workflow of trajectory prediction is divided into two phases, namely, the observation phase and the prediction phase. During the observation phase, a predefined sequence of observations of the estimated target position is gathered; during the prediction phase, the trajectory is predicted over the near future.
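The continuous-discrete EKF above can be sketched in a few lines; the snippet below is only a schematic of Eqs. (29)-(39), using a simple forward-Euler propagation between measurements, and the noise covariances, gains and function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def f_cont(X, centre, v_target):
    """Continuous motion model F(X, U) of Eq. (30)."""
    dx, dy = X[0] - centre[0], X[1] - centre[1]
    d = np.hypot(dx, dy)
    return np.array([-v_target * dy / d, v_target * dx / d])

def jacobian_A(X, centre, v_target):
    """Jacobian of F with respect to X, Eqs. (34)-(35)."""
    dx, dy = X[0] - centre[0], X[1] - centre[1]
    d3 = (dx**2 + dy**2) ** 1.5
    return v_target / d3 * np.array([[dy * dx, -dx**2],
                                     [dy**2,   -dx * dy]])

def ekf_step(X, P, y_meas, centre, v_target, dt, Q, R):
    # Prediction: forward-Euler integration of the state and covariance
    A = jacobian_A(X, centre, v_target)
    X_pred = X + dt * f_cont(X, centre, v_target)
    P_pred = P + dt * (A @ P + P @ A.T + Q)
    # Update: the position is measured directly, so C = I (Eq. (31))
    C = np.eye(2)
    L = P_pred @ C.T @ np.linalg.inv(R + C @ P_pred @ C.T)   # Eq. (38)
    X_new = X_pred + L @ (y_meas - C @ X_pred)                # Eq. (36)
    P_new = (np.eye(2) - L @ C) @ P_pred                      # Eq. (37)
    return X_new, P_new
```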

3.3 Future State Prediction

Prediction of the target position over a short duration is important for ease of tracking of the target. For prediction of the target position up to n steps, we can write, for $j = 0, 1, 2, \ldots, n-1$,

$$\hat{x}_{target}^{k+j+1} = \hat{x}_{target}^{k+j} + \Delta\hat{x}_{target}(k, j+1) \qquad (40)$$

$$\hat{y}_{target}^{k+j+1} = \hat{y}_{target}^{k+j} + \Delta\hat{y}_{target}(k, j+1) \qquad (41)$$

where $\Delta\hat{x}_{target}(k, j+1)$ and $\Delta\hat{y}_{target}(k, j+1)$ are expressed as,

$$\Delta\hat{x}_{target}(k, j+1) = \hat{x}_{target}^{k+j+1} - \hat{x}_{target}^{k+j} = \Delta\hat{x}_{target}(k, j)\cos\hat{\delta} - \Delta\hat{y}_{target}(k, j)\sin\hat{\delta} \qquad (42)$$

$$\Delta\hat{y}_{target}(k, j+1) = \hat{y}_{target}^{k+j+1} - \hat{y}_{target}^{k+j} = \Delta\hat{x}_{target}(k, j)\sin\hat{\delta} + \Delta\hat{y}_{target}(k, j)\cos\hat{\delta} \qquad (43)$$

The steps for the trajectory prediction are described in Algorithm 2.
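As a quick illustration of the n-step rollout in Eqs. (40)-(43), the sketch below repeatedly rotates the last observed displacement by the estimated $\hat{\delta}$ and accumulates it onto the current position estimate; the variable names are ours.

```python
import numpy as np

def predict_n_steps(x_hat, y_hat, dx, dy, delta_hat, n):
    """Propagate the estimated position n steps ahead (Eqs. (40)-(43))."""
    cos_d, sin_d = np.cos(delta_hat), np.sin(delta_hat)
    preds = []
    for _ in range(n):
        dx, dy = (dx * cos_d - dy * sin_d,      # Eq. (42)
                  dx * sin_d + dy * cos_d)      # Eq. (43)
        x_hat, y_hat = x_hat + dx, y_hat + dy   # Eqs. (40)-(41)
        preds.append((x_hat, y_hat))
    return preds
```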

4 Mathematical Formulation for Curve Fitting Method

In this section, the mathematical formulation for the estimation of the looping trajectory is derived. The curve-fitting technique is applied in the next loop based on the initial observations of the target position in the first loop. It is to be noted that conventional curve-fitting techniques using the regression method will fail to esti-


Algorithm 2: Trajectory prediction

Input: Sequence of m measurements $\{x_{target}^{k-i}, y_{target}^{k-i}\}$, where $i = 0, 1, 2, \ldots, m-1$
• Populate $\Delta x_{target}(k, j)$ and $\Delta y_{target}(k, j)$ in $b(k)$
• Populate $\Delta x_{target}(k, j-1)$ and $\Delta y_{target}(k, j-1)$ in $A(k)$
• Estimate $\hat{\delta}$ after solving for the evolution matrix: $\hat{\delta} \leftarrow b(k)A(k)^T (A(k)A(k)^T)^{-1}$
• Predict the sequential change in the target location:
  $\Delta\hat{x}_{target}(k, j+1) \leftarrow \Delta\hat{x}_{target}(k, j)\cos\hat{\delta} - \Delta\hat{y}_{target}(k, j)\sin\hat{\delta}$
  $\Delta\hat{y}_{target}(k, j+1) \leftarrow \Delta\hat{x}_{target}(k, j)\sin\hat{\delta} + \Delta\hat{y}_{target}(k, j)\cos\hat{\delta}$
• Predict the target position after propagating the sequential changes:
  $\hat{x}_{target}^{k+j+1} \leftarrow \hat{x}_{target}^{k+j} + \Delta\hat{x}_{target}(k, j+1)$
  $\hat{y}_{target}^{k+j+1} \leftarrow \hat{y}_{target}^{k+j} + \Delta\hat{y}_{target}(k, j+1)$

Table 1 Equations of all high-order curves taken into consideration

Circle/Ellipse: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$
Astroid: $x^{2/3} + y^{2/3} = a^{2/3}$
Deltoid: $(x^2 + y^2)^2 + 18a^2(x^2 + y^2) - 27a^4 = 8a(x^3 - 3xy^2)$
Limacon: $(x^2 + y^2 - ax)^2 = b^2(x^2 + y^2)$
Nephroid: $(x^2 + y^2 - 4a^2)^3 = 108a^4 y^2$
Quadrifolium: $(x^2 + y^2)^3 = 4a^2 x^2 y^2$
Squircle: $(x - a)^4 + (y - b)^4 = r^4$
Lemniscate of Bernoulli: $(x^2 + y^2)^2 = 2a^2(x^2 - y^2)$
Lemniscate of Gerono: $x^4 = a^2(x^2 - y^2)$

mate the complex curves with multiple loops; therefore, the curve-fitting technique is formulated using learning techniques. The following assumptions are made about the target motion:

• The target drone is moving continuously in a looping trajectory along a standard geometric curve.
• The target trajectory is a closed-loop curve.

We have considered all high-order closed curves to the best of our knowledge, and the method fits the data to the appropriate curve equation without prior knowledge about the shape of the figure. The closed curves taken into consideration are listed in Table 1. We have considered curves with one parameter (for example, Astroid, Nephroid, etc.) and with two parameters (for example, Limacon, Squircle, etc.). Since the circle is a special case of the ellipse, we include both in a single category. This method is also applicable to any closed, mathematically derivable curve.


The curves mentioned above have been well studied, and their characteristics are well known. They are usually the zero set of some multivariate polynomial. We can write their equations as

$$f(x, y) = 0 \qquad (44)$$

For example, the Lemniscate of Bernoulli is

$$(x^2 + y^2)^2 - 2a^2(x^2 - y^2) = 0 \qquad (45)$$

The Lemniscate of Bernoulli has a single parameter a, which needs to be estimated. On the other hand, the equation of an ellipse has two parameters, a and b, that need to be estimated. Therefore, we can write a general function for the curves as

$$f(x, y, a, b) = 0 \qquad (46)$$

where b may or may not be used, based on the category of shape the points are being fitted to. Univariate polynomials of the form

$$f(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_k x^k \qquad (47)$$

can be solved using matrices if there are enough points to solve for the k unknown coefficients. On the other hand, multivariate equations require different methods to solve for their coefficients. One method for curve fitting uses an iterative least-squares approach along with specifying related constraints.

4.1 Classification of Curves

The above-mentioned categories of curves all have different equations. Classifying the curve into one of the above categories is therefore required before curve fitting. We train a neural network to classify the curves into the various categories based on the (x, y) points collected from the target drone. The architecture of the network used is shown in Fig. 4. The input I is a vector of m points arranged as $[x_0, x_1, \ldots, x_m, y_0, y_1, \ldots, y_m]$. The output O is a vector of length 9, denoting the probabilities of the given set of points belonging to the various categories. Therefore, the network can be represented as a function f trained to map

$$f : [x_0, x_1, \ldots, x_m, y_0, y_1, \ldots, y_m] \rightarrow O \qquad (48)$$

The training parameters are listed in Table 2. This network can classify 2D curves into the above-mentioned categories. In the case of 3D, we can use this same network to classify the curve once it has been rotated into a 2D plane (like the X -Y plane).


Fig. 4 Network architecture

Table 2 Training parameters for the classification network
Optimizer: Adam
Learning rate: $10^{-4}$
No. of training epochs: 9
Final training accuracy: 98%
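A compact PyTorch sketch of such a classifier is shown below. The hidden-layer sizes and activation functions are placeholders, since the exact architecture in Fig. 4 is not reproduced here; only the optimiser (Adam) and the learning rate follow Table 2.

```python
import torch
import torch.nn as nn

class CurveClassifier(nn.Module):
    """Maps a flattened point sequence [x0..xm, y0..ym] to 9 class scores."""
    def __init__(self, m_points, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * m_points, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 9),          # nine curve categories (Table 1)
        )

    def forward(self, points):
        return self.net(points)

model = CurveClassifier(m_points=100)                        # m is an assumption
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)    # lr from Table 2
criterion = nn.CrossEntropyLoss()                            # cross-entropy over the categories
```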

4.2 Least-Squares Curve Fitting in 2D

Considering any of the above-mentioned curves in two dimensions, the base equation has to be modified to account for both offset and orientation in 2D. Therefore, let the orientation be some $\theta$, and the offset be $(x_0, y_0)$. On applying a counter-clockwise rotation by $\theta$ to a set of points, the rotation is defined by this matrix equation:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (49)$$

Substituting (x, y) from Eq. (49) into Eq. (46), we get the following function:

$$f(y'\cos\theta + x'\sin\theta,\; x'\cos\theta - y'\sin\theta,\; a, b) = 0 \qquad (50)$$


Letting

$$g(x', y', \theta, a, b) = f(y'\cos\theta + x'\sin\theta,\; x'\cos\theta - y'\sin\theta,\; a, b) \qquad (51)$$

and rewriting it by replacing $x'$ and $y'$ with x and y respectively, we have

$$g(x, y, \theta, a, b) = 0 \qquad (52)$$

To account for the offset from the origin, we can replace all x and y with $x'$ and $y'$, respectively, where

$$x' = x - x_0 \qquad (53)$$

$$y' = y - y_0 \qquad (54)$$

and $(x_0, y_0)$ is the offset of the centre of the figure from the origin. Therefore, we have

$$g(x, y, \theta, a, b, x_0, y_0) = 0 \qquad (55)$$

as the final equation of the figure we are trying to fit. Applying the least-squares method to the above equation for curve-fitting of m empirical points $(x_i, y_i)$,

$$E^2 = \sum_{i=0}^{m} \big(g(x_i, y_i, \theta, a, b, x_0, y_0) - 0\big)^2 \qquad (56)$$

Our aim is to find $x_0, y_0, a, b$ and $\theta$ such that $E^2$ is minimised. This can only be done by setting

$$\frac{dE^2}{d\beta} = 0, \quad \text{where } \beta \in \{x_0, y_0, a, b, \theta\} \qquad (57)$$

If g had been a linear equation, simple matrix algebra would have yielded the optimum parameters. But since Eq. (55) is a complex nth-order (where n is 2, 4 or 6) nonlinear equation with trigonometric variables, we need to use iterative methods in order to estimate the parameters $a, b, \theta, x_0$, and $y_0$. Therefore, this work uses the Levenberg-Marquardt [14, 21] least-squares algorithm to solve the non-linear Eq. (57).
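As an illustration, SciPy's least_squares routine (whose method='lm' option implements Levenberg-Marquardt) can fit, say, a rotated and offset Lemniscate of Bernoulli. The residual definition mirrors Eq. (56), but the helper names, the direction of the de-rotation, and the initial guess are our own assumptions rather than the authors' code.

```python
import numpy as np
from scipy.optimize import least_squares

def lemniscate_f(x, y, a, b=None):
    """Implicit Lemniscate of Bernoulli, Eq. (45); b is unused for this shape."""
    return (x**2 + y**2) ** 2 - 2 * a**2 * (x**2 - y**2)

def residuals(params, xs, ys, f):
    a, b, theta, x0, y0 = params
    # Shift to the figure's centre, then undo the orientation (spirit of Eqs. (49)-(55))
    xd, yd = xs - x0, ys - y0
    xr = xd * np.cos(theta) + yd * np.sin(theta)
    yr = -xd * np.sin(theta) + yd * np.cos(theta)
    return f(xr, yr, a, b)          # one residual g(xi, yi, ...) per data point

# xs, ys: observed target positions after EKF filtering (one full loop), e.g.:
# p0 = [1.0, 1.0, 0.0, 0.0, 0.0]                     # initial guess for a, b, theta, x0, y0
# result = least_squares(residuals, p0, args=(xs, ys, lemniscate_f), method='lm')
# a_hat, b_hat, theta_hat, cx, cy = result.x
```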

4.3 Least-Squares Curve Fitting of Any Shape in 3D

If the orientation of any shape is in 3D, the above algorithm will need some modifications. We first compute the equation of the plane in which the shape lies and then transform the set of points to a form where the method in Sect. 4.2 can be applied.


In order to find the normal to the plane of the shape, we carry out a singular value decomposition (SVD) of the given points. Let the set of points (x, y, z) be represented as a matrix $A \in \mathbb{R}^{n\times 3}$. From each point, subtract the centroid and calculate the SVD of A,

$$A = U\Sigma V^T \qquad (58)$$

where the columns of $U = (u_1, u_2, \ldots, u_n)$ (left singular vectors) span the column space of A, the columns of $V = (v_1, v_2, v_3)$ (right singular vectors) span the row space of A, and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ contains the singular values linked to each left/right singular vector. Now, since the points are supposed to lie in a 2D plane, $\sigma_3 = 0$ and $v_3 = (n_1, n_2, n_3)$ gives the normal vector to the plane. Therefore, the equation of the plane is,

$$n_1 x + n_2 y + n_3 z = C \qquad (59)$$

where C is a constant. The next step is to transform the points to the X-Y plane. For that, we first find the intersection of the above plane with the X-Y plane by substituting z = 0 in Eq. (59). We get the equation of the line as,

$$n_1 x + n_2 y = C \qquad (60)$$

Then we rotate the points about the z-axis such that the above line is parallel to the x-axis. The rotation angles $\alpha = 0$, $\beta = 0$ and $\gamma = \arctan\!\big(\tfrac{n_1}{-n_2}\big)$ need to be substituted in the matrix R given in Eq. (61). The new points will be $A_z = AR$.

$$R = \begin{bmatrix} \cos\beta\cos\gamma & \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma \\ \cos\beta\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma \\ -\sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta \end{bmatrix} \qquad (61)$$

Algorithm 3: Least Mean Squares Algorithm

• Initialise parameters for the shape detection network
• Store data points in the variable Shape
• If Shape is in 3D, then transform the shape to the X-Y plane using the method in Sect. 4.3: Shape ← Shape_transformed
• Get the shape prediction, shape_pred, of Shape from the shape detection network
• Apply curve-fitting algorithms on Shape using the equation of shape_pred
• Generate the target drone trajectory using the estimated shape parameters

We then rotate the points about the X-axis by the angle $\cos^{-1}\!\big(|n_3|/\lVert(n_1, n_2, n_3)\rVert\big)$ to make the points lie in the X-Y plane. That is, we substitute the angles $\alpha = \arccos\!\big(\tfrac{|n_3|}{\lVert(n_1, n_2, n_3)\rVert}\big)$, $\beta = 0$ and $\gamma = 0$ in the rotation matrix given in (61). Finally,


the set of points in the X-Y plane will be $A_{final} = A_z R$. We can then use the neural network described in Sect. 4.1 to classify the curve into one of the nine categories. Then, we can compute the parameters $(x_0, y_0, a, b, \theta)$ of the classified curve using the method given in Sect. 4.2. The combined algorithm for shape detection and parameter estimation is shown in Algorithm 3.
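The plane-normal computation and the rotation into the X-Y plane can be sketched with NumPy as follows. This is only a schematic of Sect. 4.3: it reproduces the centroid-subtraction, SVD, and the two z- and x-axis rotations, but the sign conventions may need adjusting depending on the orientation of the estimated normal.

```python
import numpy as np

def flatten_to_xy(points):
    """Rotate 3D points lying (approximately) on a plane into the X-Y plane
    (sketch of Sect. 4.3).  points: array of shape (n, 3)."""
    A = points - points.mean(axis=0)          # subtract the centroid
    _, _, Vt = np.linalg.svd(A)
    n1, n2, n3 = Vt[2]                        # v3: normal to the best-fit plane

    def rot_z(g):
        return np.array([[np.cos(g), -np.sin(g), 0],
                         [np.sin(g),  np.cos(g), 0],
                         [0, 0, 1]])

    def rot_x(a):
        return np.array([[1, 0, 0],
                         [0, np.cos(a), -np.sin(a)],
                         [0, np.sin(a),  np.cos(a)]])

    gamma = np.arctan2(n1, -n2)               # align the plane/X-Y intersection with x
    alpha = np.arccos(abs(n3) / np.linalg.norm([n1, n2, n3]))
    # Apply the two rotations as in A_z = A R and A_final = A_z R
    return A @ rot_z(gamma) @ rot_x(alpha)
```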

5 Interception Strategy

In this section, the interception strategy is formulated considering that the target is moving in a repetitive figure-of-eight trajectory; however, the proposed framework could be extended to other geometric curves. Once the target trajectory is estimated, a favourable location for the interceptor is selected from which interception can lead to an almost head-on collision. Consider a target moving in the direction of the arrow (marked in green), as shown in Fig. 5. Once the trajectory of the target has been estimated through the EKF, machine learning and curve-fitting techniques, and the direction of target motion is known, it is found that I1 and I2 are the favourable locations for generating an almost head-on interception scenario. The part of the curve between the red and yellow lines is a straight line, so the target can be detected earlier, and the interceptor will have a higher response time for target engagement. Once the target is detected, the interceptor applies a pursuit-based guidance strategy to generate the desired velocity and desired yaw rate. Details of the guidance strategy are given in [11, 34]. For completeness, the important steps of the guidance command are listed in Algorithm 4. The desired velocity ($V_{des}$) for the interceptor is generated to drive the interceptor along the line joining the interceptor and the projection of the target in the image plane. The desired yaw rate ($r_{des}$) is generated to keep the target in the field of view of the camera by using a PD controller based on the error ($e_\psi$) between the desired yaw and the actual yaw. In the case of an interception scenario with multiple interceptors, the estimated favourable standoff locations are allocated to multiple drones through a task-allocation architecture that considers the current states of the target drones and interceptors.

Fig. 5 Favorable standoff location for interception


Fig. 6 Complete architecture for high-speed interception

The important steps of the overall framework for interception of a high-speed target moving in a looping trajectory are shown in the block diagram (Fig. 6) and Algorithm 4.

6 Results

6.1 Simulation Experiments

In the Gazebo environment, a complete simulation environment is created containing the target and the interceptors, where the red ball attached to the target is considered for interception (shown in Figs. 7 and 8). A vision module is developed for the detection and tracking of the red ball, and the visual information captured in the simulation environment exhibits uncertainty similar to that of the outdoor environment. A ROS-based pipeline is written in C++ for the simulation of the filtering, estimation, and interception algorithms. We have tested the estimation framework considering target motion on different geometric curves. Initially, the desired geometric curve is represented by waypoints, and these waypoints are fed to the target autopilot. The target follows the waypoints at a constant speed, and the trace of the ball attached to the target can also be approximately considered to follow the shape of the desired geometric curve. The path followed by the ball is slightly perturbed by the wind and by the oscillation of the mechanism attaching the ball to the target drone. The interceptor drone detects the ball with the attached camera, and the ball position is derived using


Algorithm 4: Interception of a high-speed target

Input: Target pixel coordinates in the image plane, focal length f, target depth $Z_k$, interceptor's yaw ($\psi$), the magnitude of the desired velocity (V), rotation from the camera frame to the inertial frame ($R_{c2i}$)
1. Obtain the target position from the target pixel coordinates in the camera frame:
   $x_{target,vision}^k \leftarrow Z_k x_k / f$,  $y_{target,vision}^k \leftarrow Z_k y_k / f$
2. Calculate the instantaneous centre of curvature of the target trajectory:
   $x_c(k) \leftarrow x_{target}^k - \Delta y_{target}(k,1)/\hat{\delta}$,  $y_c(k) \leftarrow y_{target}^k + \Delta x_{target}(k,1)/\hat{\delta}$
3. Estimate the target position using the EKF framework.
   EKF prediction steps: $\dot{\hat{X}} \leftarrow F(\hat{X}, U, 0)$,  $\dot{P} \leftarrow AP + PA^T + Q$
   EKF update steps: $\hat{X}_k^+ \leftarrow \hat{X}_k^- + L_k(Y_k - C_k\hat{X}_k^-)$,  $P_k^+ = (I - L_k C_k)P_k^-$
4. Estimate the target trajectory using Algorithm 3.
5. Obtain a suitable standoff point for the interceptors for ease of interception.
6. Once the target is detected, apply the guidance strategy to generate the desired velocity command and yaw rate command:
   $V_{des} \leftarrow R_{c2i} V_c$, with $V_c = (V_{cx}, V_{cy}, V_{cz})$,
   $V_{cx} \leftarrow V\, x_k/\sqrt{x_k^2 + y_k^2 + f^2}$,  $V_{cy} \leftarrow V\, y_k/\sqrt{x_k^2 + y_k^2 + f^2}$,  $V_{cz} \leftarrow V\, f/\sqrt{x_k^2 + y_k^2 + f^2}$
   $r_{des} \leftarrow k_{p\psi}\, e_\psi + k_{d\psi}\, \frac{de_\psi}{dt}$

Output: Desired velocity ($V_{des}$) and yaw rate ($r_{des}$).
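A small sketch of the guidance command computation in step 6 of Algorithm 4 is given below; the camera-to-inertial rotation matrix and the PD gains are illustrative placeholders, not values used by the authors.

```python
import numpy as np

def guidance_commands(xk, yk, f, V, R_c2i, e_psi, e_psi_rate, kp=1.0, kd=0.1):
    """Desired velocity and yaw rate from the target's pixel coordinates
    (sketch of Algorithm 4, step 6). kp/kd are illustrative PD gains."""
    norm = np.sqrt(xk**2 + yk**2 + f**2)
    V_c = V * np.array([xk, yk, f]) / norm   # velocity along the line of sight (camera frame)
    V_des = R_c2i @ V_c                      # rotate into the inertial frame
    r_des = kp * e_psi + kd * e_psi_rate     # PD yaw-rate command
    return V_des, r_des
```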

Fig. 7 Gazebo environment with IRIS drone: Target

the perspective projection. The position of the ball is fed to the EKF module as the measurement and used for filtered position estimation of the ball. After one loop of data, the neural network predicts the category of the shape, after


Fig. 8 Gazebo environment: Target and interceptor

Fig. 9 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Circle

which the curve-fitting algorithm estimates the shape parameters using the appropriate shape equation. The raw position data and the estimated shapes of different curves are shown in Figs. 9–17. The green dots show the raw position of the ball obtained using the information from the camera, and the blue curve represents the predicted shape of the curve. As can be seen in Figs. 9–17, the overall estimation framework is able to reasonably approximate the desired geometry of the curve. The proposed estimation framework is also tested for the estimation of 3D geometric curves; estimation of the Lemniscate of Gerono in 3D is shown in Fig. 18. The proposed high-speed target interception strategy is tested by creating an interception scenario in the Gazebo environment where two interceptor drones try to intercept the target ball moving in the figure-of-eight (Lemniscate of Gerono) curve (similar to Fig. 1). Different snapshots of the experiments are shown in Figs. 19–24. Figure 20 shows a snapshot during the tracking and estimation of the ball position and the corresponding trajectory using the visual information. Figures 22–24 show snapshots during the engagement once the target ball is detected by Drone 2 while waiting at the standoff location.

Fig. 10 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Astroid

Fig. 11 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Deltoid

Fig. 12 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Limacon


Fig. 13 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Nephroid

Fig. 14 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Quadrifolium

Fig. 15 Figure showing the data points and predicted shape for Squircle


Fig. 16 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Lemniscate of Bernoulli

Fig. 17 Figure showing the collected data points from the drone trajectory and predicted shape through curve-fitting for Lemniscate of Gerono

Fig. 18 Curve prediction in 3D using estimated figure parameters on simulated data


Fig. 19 Snapshots of target drone and interceptors in Gazebo environment

Fig. 20 Drone estimates the target trajectory while tracking the target drone

Fig. 21 After trajectory estimation, Drone 1 waits in standoff location till detection of target ball



Fig. 22 Drone 2 detects the target drone and starts applying the guidance strategy

Fig. 23 Snapshots during the head-on engagement

6.2 Hardware Experiments

The experimental setup (shown in Fig. 25) consists of two drones. One of the drones, the target drone, has a red ball attached and flies in a figure-of-eight trajectory. The second drone is fitted with a Sec3CAM 130 monocular camera for the detection of the ball. The second drone follows the target drone and estimates the ball position and velocity using computer vision techniques. The raw position data obtained using the information from the image plane is shown in Fig. 26. The raw position data is fed to the EKF algorithm and subsequently through the RNN and least-squares curve-fitting techniques. Figure 27 shows the estimated figure-of-eight and the raw ball


Fig. 24 Snapshots during the terminal phase of interception

Fig. 25 Interceptor drone and target drone. Using visual information, the interceptor drone estimates the trajectory of the ball

position observed using visual information with the proposed framework.


Fig. 26 Red trace represents the raw position data as obtained by tracking the target

Fig. 27 Curve prediction in 3D using estimated figure parameters on experimental data

7 Discussions

Interception of a low-speed target is easier than interception of a higher-speed target, as the region of capturability for a given guidance algorithm is larger when intercepting a low-speed target. In the case of interception with small UAVs, the target information is obtained from a camera, and a camera's detection range is small compared to other sensors like radar. Interception of a high-speed target using a conventional guidance strategy is difficult, even if the high-speed target is moving in a repetitive loop. Therefore, the looping trajectory of the target needs to be estimated so that the interceptor can be placed at a favourable position for ease of interception of the high-speed target. The target position estimation using the visual sensor is too noisy in an outdoor environment to estimate the target trajectory directly, as


observed from field experiments. Therefore, we have estimated the target position using the extended Kalman filter framework. To obtain the target position, the interceptor needs to keep tracking the target, so we have proposed to predict the target trajectory over a shorter horizon using least-squares methods based on the sequence of observed target positions. Once the initial observations of the target position are made, learning techniques and curve-fitting methods are applied to identify the curve. Once the parameters of the curve are estimated, the interceptors are placed so as to create a head-on collision situation. We have successfully validated the estimation framework for various geometric curves in the Gazebo and outdoor environments. The geometric curve should be a standard closed-loop curve. While formulating the motion model for target position estimation, it is assumed that the target's motion is smooth, i.e., the change in curvature of the target's trajectory remains bounded and smooth over time. This assumption is the basis of our formulation of the target motion model. The interception strategy is verified only in simulation. The maximum speed of the target should be within a limit such that tracking by the interceptor is possible in the first loop. The standoff location for the interceptors is selected such that the interceptor has a higher reaction time to initiate the engagement. The proposed framework provides a better interception strategy for a high-speed target than directly chasing the target after detection, owing to the higher response time and better alignment along the target's path.

8 Conclusions

In this chapter, we present a framework designed to estimate and predict the position of a moving target that follows a repetitive path of some standard shape. The proposed trajectory estimation algorithm is used to formulate the interception strategy for a target having a higher speed than the interceptor. The target position is estimated in the EKF framework using visual information, and the estimated positions are then used to estimate the shape of the repetitive loop of the target. Estimation of different curves, such as the Lemniscate of Bernoulli, Deltoid, and Limacon, is performed using realistic visual sensors set up in the Gazebo environment. The proposed high-speed interception strategy is validated by simulating an interception scenario with a high-speed target moving in a figure-of-eight trajectory in the ROS-Gazebo framework. Future work includes the integration of the proposed estimation and prediction algorithm in the interception framework and validation of the complete architecture in the outdoor environment. The proposed technique can also be used to help the motion planning of autonomous cars and to develop driver-assistance systems at traffic junctions.

Acknowledgements We would like to acknowledge the Robert Bosch Center for Cyber Physical Systems, Indian Institute of Science, Bangalore, and Khalifa University, Abu Dhabi, for partial financial support.


References 1. Abbas, M. T., Jibran, M. A., Afaq, M., & Song, W. C. (2020). An adaptive approach to vehicle trajectory prediction using multimodel kalman filter. Transactions on Emerging Telecommunications Technologies, 31(5), e3734. 2. Anderson-Sprecher, R., & Lenth, R. V. (1996). Spline estimation of paths using bearings-only tracking data. Journal of the American Statistical Association, 91(433), 276–283. 3. Banerjee, P., & Corbetta, M. (2020). In-time uav flight-trajectory estimation and tracking using bayesian filters. In 2020 IEEE Aerospace Conference (pp. 1–9). IEEE 4. Barisic, A., Petric, F., & Bogdan, S. (2022). Brain over brawn: using a stereo camera to detect, track, and intercept a faster uav by reconstructing the intruder’s trajectory. Field Robotics, 2, 34–54. 5. Beul, M., Bultmann, S., Rochow, A., Rosu, R. A., Schleich, D., Splietker, M., & Behnke, S. (2020). Visually guided balloon popping with an autonomous mav at mbzirc 2020. In 2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR) ( pp. 34–41). IEEE 6. Cascarano, S., Milazzo, M., Vannin, A., Andrea, S., & Stefano, R. (2022). Design and development of drones to autonomously interact with objects in unstructured outdoor scenarios. Field Robotics, 2, 34–54. 7. Chen, M., Liu, Y., & Yu, X. (2015). Predicting next locations with object clustering and trajectory clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 344–356). Springer 8. Cheung, Y., Huang, Y. T., & Lien, J. J. J. (2015). Visual guided adaptive robotic interceptions with occluded target motion estimations. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 6067–6072). IEEE 9. Dong, G., & Zhu, Z. H. (2016). Autonomous robotic capture of non-cooperative target by adaptive extended kalman filter based visual servo. Acta Astronautica, 122, 209–218. 10. Hadzagic, M., & Michalska, H. (2011). A bayesian inference approach for batch trajectory estimation. In 14th International Conference on Information Fusion (pp. 1–8). IEEE 11. Jana, S., Tony, L. A., Varun, V., Bhise, A. A., & Ghose, D. (2022). Interception of an aerial manoeuvring target using monocular vision. Robotica, 1–20 12. Kim, S., Seo, H., Choi, S., & Kim, H. J. (2016). Vision-guided aerial manipulation using a multirotor with a robotic arm. IEEE/ASME Transactions On Mechatronics, 21(4), 1912–1923. 13. Kumar, A., Ojha, A., & Padhy, P. K. (2017). Anticipated trajectory based proportional navigation guidance scheme for intercepting high maneuvering targets. International Journal of Control, Automation and Systems, 15(3), 1351–1361. 14. Levenberg, K. (1944). A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2), 164–168. 15. Li, T., Prieto, J., & Corchado, J. M. (2016). Fitting for smoothing: a methodology for continuous-time target track estimation. In 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN) (pp. 1–8). IEEE 16. Li, T., Chen, H., Sun, S., & Corchado, J. M. (2018). Joint smoothing and tracking based on continuous-time target trajectory function fitting. IEEE transactions on Automation Science and Engineering, 16(3), 1476–1483. 17. Lin, L., Yang, Y., Cheng, H., & Chen, X. (2019). Autonomous vision-based aerial grasping for rotorcraft unmanned aerial vehicles. Sensors, 19(15), 3410. 18. Liu, Y., Suo, J., Karimi, H. R., & Liu, X. (2014). A filtering algorithm for maneuvering target tracking based on smoothing spline fitting. 
In Abstract and Applied Analysis (Vol. 2014). Hindawi 19. Luo, C., McClean, S. I., Parr, G., Teacy, L., & De Nardi, R. (2013). UAV position estimation and collision avoidance using the extended kalman filter. IEEE Transactions on Vehicular Technology, 62(6), 2749–2762.


20. Ma, H., Wang, M., Fu, M., & Yang, C. (2012). A new discrete-time guidance law base on trajectory learning and prediction. In AIAA Guidance, Navigation, and Control Conference (p. 4471) 21. Marquardt, D. W. (1963). An algorithm for least-squares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics, 11(2), 431–441. 22. Mehta, S. S., Ton, C., Kan, Z., & Curtis, J. W. (2015). Vision-based navigation and guidance of a sensorless missile. Journal of the Franklin Institute, 352(12), 5569–5598. 23. Pang, B., Ng, E. M., & Low, K. H. (2020). UAV trajectory estimation and deviation analysis for contingency management in urban environments. In AIAA Aviation 2020 Forum (p. 2919) 24. Prevost, C. G., Desbiens, A., & Gagnon, E. (2007). Extended kalman filter for state estimation and trajectory prediction of a moving object detected by an unmanned aerial vehicle. In 2007 American Control Conference (pp. 1805–1810). IEEE 25. Qu, L., & Dailey, M. N. (2021). Vehicle trajectory estimation based on fusion of visual motion features and deep learning. Sensors, 21(23), 7969. 26. Roh, G. P., & Hwang, S. W. (2010). Nncluster: an efficient clustering algorithm for road network trajectories. In International Conference on Database Systems for Advanced Applications (pp. 47–61). Springer 27. Schulz, J., Hubmann, C., Löchner, J.,& Burschka, D. (2018). Multiple model unscented kalman filtering in dynamic bayesian networks for intention estimation and trajectory prediction. In 2018 21st International Conference on Intelligent Transportation Systems (ITSC) (pp. 1467– 1474). IEEE 28. Shamwell, E. J., Leung, S., & Nothwang, W. D. (2018). Vision-aided absolute trajectory estimation using an unsupervised deep network with online error correction. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2524–2531). IEEE 29. Shrivastava, A., Verma, J. P. V., Jain, S., & Garg, S. (2021). A deep learning based approach for trajectory estimation using geographically clustered data. SN Applied Sciences, 3(6), 1–17. 30. Strydom, R., Thurrowgood, S., Denuelle, A., & Srinivasan, M. V. (2015). UAV guidance: a stereo-based technique for interception of stationary or moving targets. In Conference Towards Autonomous Robotic Systems (pp. 258–269). Springer 31. Su, K., & Shen, S. (2016). Catching a flying ball with a vision-based quadrotor. In International Symposium on Experimental Robotics (pp. 550–562). Springer 32. Sung, C., Feldman, D., & Rus, D. (2012). Trajectory clustering for motion prediction. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1547–1552). IEEE 33. Thomas, J., Loianno, G., Sreenath, K., & Kumar, V. (2014). Toward image based visual servoing for aerial grasping and perching. In 2014 IEEE International Conference on Robotics and Automation (ICRA) (pp. 2113–2118). IEEE 34. Tony, L. A., Jana, S., Bhise, A. A., Gadde, M. S., Krishnapuram, R., Ghose, D., et al. (2022). Autonomous cooperative multi-vehicle system for interception of aerial and stationary targets in unknown environments. Field Robotics, 2, 107–146. 35. Yan, L., Jg, Zhao, Hr, Shen, & Li, Y. (2014). Biased retro-proportional navigation law for interception of high-speed targets with angular constraint. Defence Technology, 10(1), 60–65. 36. Zhang, X., Wang, Y., & Fang, Y. (2016). Vision-based moving target interception with a mobile robot based on motion prediction and online planning. 
In 2016 IEEE International Conference on Real-time Computing and Robotics (RCAR) (pp. 17–21). IEEE 37. Zhang, Y., Wu, H., Liu, J., & Sun, Y. (2018). A blended control strategy for intercepting highspeed target in high altitude. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 232(12), 2263–2285. 38. Zhao, M., Shi, F., Anzai, T., Takuzumi, N., Toshiya, M., Kita, I., et al. (2022). Team JSK at MBZIRC 2020: interception of fast flying target using multilinked aerial robot. Field Robotics, 2, 34–54.

Robotics and Artificial Intelligence in the Nuclear Industry: From Teleoperation to Cyber Physical Systems

Declan Shanahan, Ziwei Wang, and Allahyar Montazeri

Abstract This book chapter looks to address how upcoming technology can be used to improve the efficiency of decommissioning processes within the nuclear industry. Challenges associated with decommissioning are introduced with a brief overview of the previous efforts and current practices of nuclear decommissioning. A high-level cyber-physical architecture for nuclear decommissioning applications is then proposed by drawing upon recent technological advances in the realm of Industry 4.0, such as the internet of things, sensor networks, and the increased use of data analytics and cloud computing approaches. In the final section, based on demands and proposals from industry, possible applications within the nuclear industry are identified and discussed.

Keywords Cyber-physical systems · Sensor networks · Robotics · Industry 4.0 · Nuclear · Decommissioning · Artificial intelligence

1 Introduction

1.1 Background

Around the world, many nuclear power plants are reaching the end of their active life and are in urgent need of decontamination and decommissioning (D&D). In the UK alone, seven advanced gas-cooled reactors are due to enter decommissioning by 2028 [1]. This is in addition to the 11 Magnox reactors, along with research sites, weapon production facilities, and fuel fabrication and reprocessing facilities already at various stages of decommissioning. Current estimates are that the decommissioning programme will not be completed until 2135, at a cost of £237bn [2]. The D&D process requires that the systems, structures, and components (SSC) of a nuclear facility be characterised and then handled accordingly. This may include anything from


removing materials that are unhazardous, to dealing with highly radioactive materials such as nuclear fuel. The main objectives of the D&D process are to protect workers, the public and environment, while also minimising waste and associated costs [3]. Despite this, at present many D&D tasks are still reliant on manual labour, putting workers at risk of radiation exposure. The need to limit this exposure in the nuclear industry, as defined by as low as reasonably practical (ALARP) principles, is a challenging problem [4].

1.2 Motivation

Certain facilities within a nuclear plant are not accessible by humans due to the levels of radioactivity present. Where access is possible, such as in alpha-contaminated areas, heavy-duty personal protective equipment is required, including air-fed suits and leather overalls. This makes work cumbersome and strenuous, while still not removing all risk to the worker. In addition, the high cost of material disposal, including the associated equipment for segregation of materials, some of which may be highly radioactive, has necessitated more efficient processes. Consequently, decommissioning tasks such as cutting and dismantling components present challenges to the nuclear industry and require novel solutions [5]. In the nuclear industry, owing to the uniqueness, diversity, and severity of its challenges, significant barriers prevent robotic and autonomous system (RAS) deployment. These challenges include, for example: (i) highly unstructured, uncertain and cluttered environments; (ii) high-risk environments, with radioactive, thermal and chemical hazards; (iii) exploration, mapping and modelling of unknown or partially known extreme environments; (iv) the need for powerful, precise, multi-axis manipulators with complex multimodal sensing capabilities; (v) the critically damaging effects of radiation on electronic systems; and (vi) the need for variable robot supervision, from tele-immersion to autonomous human–robot collaboration [6]. The Nuclear Decommissioning Authority (NDA) in the UK has set out four grand challenges for nuclear decommissioning, as shown in Table 1. As can be inferred from Table 1, digitisation of the nuclear industry, in combination with autonomous systems, advanced robotics, and wearable technologies, plays a significant role in addressing these challenges. Sellafield Ltd., in charge of managing the world's largest inventory of untreated nuclear waste, has also identified key areas for technical innovation, based upon the grand challenges presented by the NDA [7]. A major difficulty in decommissioning current nuclear power plants is the lack of available data regarding storage and previous operation. Future builds would benefit from a comprehensive accounting of all lifecycle activities and incidents, which would, in turn, make planning for dismantling and decontamination a much easier process.


Table 1 Grand challenges identified by the NDA [8]

• Reducing waste and reshaping the waste hierarchy: Finding new ways to drive the waste hierarchy, increasing recycling and reuse to reduce volumes sent for disposal
• Intelligent infrastructure: Using autonomous technology to manage assets and buildings proactively and efficiently
• Moving humans away from harm: Reducing the need for people to enter hazardous environments using autonomous systems, robotics, and wearable technology
• Digital delivery—enabling data driven decisions: Adopting digital approaches for capturing and using data, to improve planning, training and aid decision making

1.3 Problem Statement

Robots are a natural solution to the challenges faced in D&D processes. Nevertheless, uptake of robot systems in the nuclear industry has been slow, with handling applications limited to teleoperated manipulators controlled through thick lead glass windows, greatly reducing situational awareness [5]. Furthermore, while these systems are robust and rugged, they lack feedback sensors or inverse kinematic controllers, making them slow to operate and reliant on the experience and skill of the operator. To address these challenges, innovative technology is needed to improve efficiency and reduce the time taken to decommission plants. The need for robotic capability in the nuclear industry has long been recognised, particularly in the aftermath of accidents, where radioactive sources can become dispersed and the danger to human life is much greater. Particular examples of these accidents include Three Mile Island in the USA, Chernobyl in Ukraine, and more recently Fukushima in Japan. While the risks posed by nuclear power are great, the need for cleaner energy and net-zero carbon emissions is critical, and so there is a necessity for systems that can deal with nuclear accidents, as well as provide support in the handling of common decommissioning tasks [9]. The current study expands on earlier research on the use of robotics and autonomous systems for nuclear decommissioning applications done at Lancaster University [10, 11]. Although hydraulically actuated robotic manipulators are crucial for decommissioning operations, the inherent nonlinearities in the hydraulic joints make modelling and control extremely difficult. For instance, in [12] it is suggested to use a genetic algorithm technique to estimate the unknown parameters of a hydraulically actuated, seven-degree-of-freedom manipulator. In [13], the estimation outcomes are enhanced by utilising a multi-objective cost function to replace the output-error system identification cost function. Another issue arising in using hyper-redundant manipulators in nuclear decommissioning is the need for developing a numerically efficient inverse kinematic approach that is robust against potential singularities [14]. An explanation of the earliest studies on the approaches


to capturing the nonlinearities of the hydraulic manipulator is provided in [15, 16]. These findings are intended to be applied to state-dependent regulation of the robot joints under Wiener-type nonlinearity [17, 18]. It is evident that a significant amount of time must be allotted for pre-planning in decommissioning applications in order to gather and process the relevant data without the need for human intervention. Utilising an autonomous unmanned aerial vehicle in conjunction with the manipulator can bring important benefits to decommissioning by increasing speed, accuracy, and success [19]. A quadcopter would enable quicker 3D mapping and access to locations that could otherwise call for extensive preparation and labour. For UAV attitude control in the presence of uncertainties and direct wind disturbance on the system model, a unique multi-channel chattering-free robust nonlinear control system is developed in [20, 21]. Applying the Extended Kalman Filter (EKF) for state estimation [22] and using event-triggered particle filters to enhance UAV energy management throughout the state estimation process [23] are recent innovations that have brought the controller closer to practical application. The results obtained are encouraging; however, they have not yet been tested on a real quadcopter platform. Additionally, expanding the number of robots to develop a heterogeneous collection of multi-agent systems would enable faster and more efficient execution of hitherto impossible missions. As a result, the goal of this book chapter is to discuss the technological foundations for the creation of such a system.

1.4 Recent Technological Advances–Industry 4.0

Industry 4.0 originated in Germany and has since expanded in scope to include the digitalisation of manufacturing processes, becoming known as the fourth industrial revolution. The Industry 4.0 paradigm can be characterised by key technologies that enable the shift towards greater digitalisation and the creation of cyber-physical systems (CPSs). This includes the merging of technology that was once isolated, providing opportunities for new applications. As the key area of Industry 4.0 is manufacturing, many applications are focused on increasing productivity within factories, most commonly accomplished by improving the efficiency of machinery. Industrial automation in manufacturing encompasses a wide range of technologies that can be used to improve processes. With the advent of Industry 4.0, there is now a trend towards a holistic approach to increasing automation by considering all aspects of a process and how they are interlinked. Autonomous industrial processes can be used to replace humans in work that is physically hard, monotonous, or performed in extreme environments, as well as to perform tasks beyond human capabilities such as handling heavy loads or working to fine tolerances. At the same time, they offer the opportunity for data collection, analytics, and quality checks, while providing improved efficiency and reduced operation costs [24]. Automation of industrial processes and manufacturing systems requires the use of an array of technologies and methods. Some of the technologies used at present


include distributed control systems (DCS), supervisory control and data acquisition (SCADA), and programmable logic controllers (PLC). These can be combined with systems such as robot manipulators, CNC machining centres and/or bespoke machinery. Industry 4.0 continues past developments and looks to enhance the level of autonomy in an industrial process. This can include new concepts and technologies such as:

• Sensor networks
• The industrial internet of things (IIoT)
• Cloud computing
• Big data
• Fault-tolerant control systems
• Simulation and digital twins
• Cloud robotics
• Cognitive computing
• Blockchain
• Artificial intelligence (AI)

These technologies can also be applied to processes outside of those targeted by traditional automation and provide advanced abilities for decision making. The Industry 4.0 concept involves integration at all possible levels (horizontal, vertical, and end-to-end), allowing data to be shared across the entirety of a process. According to [25], Industry 4.0 can be specified according to three paradigms: the smart product, the smart machine, and the augmented operator. At present, the nuclear industry somewhat lags behind other industries, such as manufacturing, in adopting these new technologies. While Industry 4.0 concepts show great potential, there are several challenges associated with practical implementation, especially in the nuclear sector. One challenge is the lack of technical skills in the workforce, compounded by the fact that much of the nuclear workforce first employed by the industry is now reaching retirement age and taking with them tacit knowledge that could be used for the future development of the nuclear industry, in particular with regard to decommissioning [26]. Cyber security is of paramount importance in the nuclear industry, with cyber-attacks becoming a more common occurrence, both from lone hackers and state actors. Introducing new technologies and greater interconnectivity within an ecosystem will typically also introduce new vulnerabilities that may be exploited, commonly known as zero-days. In the case of a CPS, this could lead to data breaches of sensitive information from IT systems or disrupt operational technology (OT) systems, leading to loss of control and/or destruction of hardware or infrastructure, possibly also causing harm to or loss of human life [27]. Further challenges with implementing Industry 4.0 concepts in nuclear include the need for interoperability between systems. As many systems in nuclear are bespoke due to unique design requirements, managing the transfer of data between linked systems is often not a design consideration; therefore, new communication systems will need to be established to manage this transition.


1.5 Chapter Outlines and Contributions

The aim of this chapter is to provide an overview of the current challenges faced during D&D operations at nuclear power plants, and how these can be addressed by greater digital integration and advanced technology such as that seen as part of the Industry 4.0 initiative. A new conceptual design for the decommissioning process using the cyber-physical framework is proposed, and the challenges and opportunities for the integrated technologies are discussed by drawing upon the literature. The start of the chapter gives background on the D&D process and some of the challenges faced. A review of the current state of the art in relevant research is then provided for common D&D tasks. In the final section, a framework for autonomous systems is developed, building upon current advances in robot and AI system research.

2 Nuclear Decommissioning Processes This section will provide the background on the D&D process and some of the challenges faced at each stage of the process. There are several main processes that must be completed during D&D, categorised here as characterisation, decontamination, dismantling and demolition, and waste management.

2.1 Characterisation The initial stage of any decommissioning operation is to develop an inventory of all materials in a facility which can then be used to plan subsequent decommissioning processes. The first step of characterisation is performing a historical assessment of a facility. This can consist of reviewing plant designs along with operational and maintenance reports, which can then allow expected materials to be categorised as contaminated, unrestricted, or suspected contaminated, and help focus further characterisation efforts. The initial inventory of materials can subsequently be further developed by inspection, and then detailed characterisation of radiological and hazardous materials performed. Separate characterisation techniques may be required depending on the types of contamination present, particularly as some parts of a nuclear facility are not easily characterised due to high contamination and lack of access. Examples of this are components of the reactor which become radioactive over time due to neutron activation. A solution to this is to model expected radioactivity and then take samples at suitable locations. Comparing the model predictions with the sample readings can then allow for accurate characterisation of the complete reactor [3]. Such techniques often entail some level of uncertainty which must be considered when characterising waste. Factors such as radionuclide migration or spatial


variations in radionuclide concentrations can make sampling to the required confidence a difficult task. Therefore, new approaches are required across a spectrum of characterisation scenarios, such as in-situ characterisation of structures, systems and components (SSC) in inaccessible areas, non-destructive monitoring of packaged waste in disposal facilities, and accurate characterisation of waste on the boundary between intermediate and low level [28]. Research focused on new solutions for characterisation is already underway, such as the CLEANDEM (Cyber physicaL Equipment for unmAnned Nuclear DEcommissioning Measurements) project, which aims to implement an unmanned ground vehicle (UGV) with an array of radiological sensing probes that can perform initial assessments as well as ongoing monitoring during D&D operations [29].

2.2 Decontamination

During decommissioning operations, decontamination is a process that may be performed with the primary aim of reducing radioactivity for subsequent dismantling operations. This can reduce risk to workers during the dismantling procedure, along with providing cost savings where waste can be reclassified at a lower level. These benefits, however, must be balanced against the increased cost, additional secondary waste, and radiation doses to workers associated with decontamination processes. In determining whether decontamination will be beneficial, many variables must be considered, including:

• Types of structures
• Accessibility of structure surfaces
• Material radioactivity levels and type
• Composition of materials
• Type of contaminant
• Destination of waste components

Decontamination can be performed in situ or parts can be relocated to a specialist location. There is a variety of decontamination options that can be used including chemical, electrochemical, and mechanical techniques. Each can have advantages for specific applications [30].

2.3 Dismantling and Demolition

Dismantling (or segmentation) is often required for the components and systems within a nuclear facility; a key example is the reactor. There is no standardised method for dismantling; each facility will require a tailored approach depending on the design and conditions. As with decontamination, variables such as component types and ease of access need to be considered to determine the optimal approach. There are several techniques available for dismantling; these include:

• Plasma arc cutting
• Electric discharge machining (EDM)
• Metal disintegration machining (MDM)
• Abrasive water jet cutting
• Laser cutting
• Mechanical cutting

Each technique can have advantages for a given application, depending on requirements for cutting depth, speed, and type of waste generated. Off-the-shelf equipment may be adapted for the nuclear environment, along with the application of tele-operation and/or robotics to reduce the risk of radiation to operators. Once the systems and components have been removed, and the building is free from radioactivity and contamination, demolition can be carried out. This can be done using conventional techniques such as ball and breakers, collapsing, or explosives where appropriate [3].

2.4 Waste Management

Careful consideration must be given to the handling and disposal of waste materials produced during decommissioning operations. The waste management hierarchy should be adhered to and is of particular importance during decommissioning due to higher waste disposal costs. In the UK, waste is classified into one of three categories depending on radioactivity and heat production: high, intermediate, and low-level waste. Low level waste (LLW) can be further broken down to include very low-level waste (VLLW), which covers waste that can be disposed of through traditional waste routes. In determining the available routes for waste, it is necessary to review technical requirements and processing capabilities, including aspects such as radioactivity levels, storage capacities, treatment facilities, and material handling options. Raw waste must undergo treatment and packaging to a form suitable for disposal or long-term storage; these activities can be broadly categorised as post-processing operations. Treatment and packaging of raw waste may use several processes, some of which have already been introduced. These can include retrieval of waste, sorting and segregation, size reduction, decontamination, treatment, conditioning/immobilisation and/or packaging [8]. Post-processing provides several benefits including reduced waste classification, smaller final volumes, safer waste packages, and the opportunity to generate waste inventories. As with other stages of the D&D process, humans are not able to work in the vicinity of waste materials, thereby necessitating the use of tele-operated or autonomous systems [31]. The PREDIS (PRE-DISposal management of radioactive waste) project has already started work focusing on developing treatment and conditioning methods for different decommissioning wastes by producing new solutions or advancing immature solutions through increased technology readiness levels [32].


3 Current Practice in Nuclear Decommissioning Research

A range of robots is required in the nuclear industry, each designed for specific applications. These can be water, land or air based, and often have to deal with challenging environments that include high radiation and restricted access. Robots used in nuclear have additional requirements over other industries, particularly in relation to the ability to cope with radioactive environments. This results in the need for high equipment reliability with low maintenance requirements. While conventional off-the-shelf equipment may be suitable for some applications in nuclear, it will invariably need to be adapted to make it more suitable for deployment. This can involve techniques such as relocating sensitive electronics, adding shielding, upgrading components to radiation-tolerant counterparts and/or adding greater redundancy through alternative recovery methods [33].

3.1 Assisted Teleoperation and Manipulation in Nuclear Decommissioning

Approaches looking to improve the control of teleoperated robots include virtual and augmented reality, AI assistance, and haptic feedback, such as the work being carried out by researchers at the QMUL Centre for Advanced Robotics [34]. Virtual reality (VR) would allow the operator to have a greater awareness of the surroundings, as opposed to the current approach of using a set of screens to provide different perspectives of the workspace. AI assistance can be used in conjunction with VR to automate the more routine tasks and allow faster operation. This could also make it possible for a single operator to control multiple manipulators and only take full control when a new problem is encountered. Haptic feedback could further improve the operability of teleoperation systems by providing feedback that allows safer grasping of an object [34]. Significant research has been carried out into the use of hydraulic manipulators for decommissioning tasks such as pipe cutting, as described in [35]. The system under development uses COTS 3D vision to allow an operator to select the intended work piece without having to manually position the cutting tool or manipulators. This was shown to be faster than using tele-operational control alone. A common task required of robots is grasping objects. While this has been performed for some time in manufacturing, nuclear industry environments are often unstructured, making such tasks considerably more difficult. To address this, researchers at the University of Birmingham have developed autonomous grasp planning algorithms with vision-based tracking that do not require a priori knowledge of objects or learning from training data. This is achieved using a local contact-moment-based grasp metric with characteristic representations of surface regions [36].


3.2 Robot-Assisted Glovebox Teleoperation

Gloveboxes are used throughout the nuclear industry to provide a contained environment for handling hazardous objects and materials. They can, however, be difficult to use and still present a risk of exposure to radioactive materials for the operator. In addition, the gloves reduce tactile feedback and limit mobility of the arms, making simple tasks tiring and challenging; it would therefore be beneficial to incorporate a robotic system. There are, however, challenges for implementation, such as integrating robots within a glovebox, which can be dark and cluttered, and protecting systems from the harmful effects of radiation, which can cause rapid deterioration of electrical components. In [37], new technologies in robotics and AI are explored with regard to how gloveboxes can be improved. It is suggested that it is preferable to design a robot to operate within the gloves to simplify maintenance and protect it from contaminants. It is also noted that, while greater autonomy would improve productivity, glovebox robotics mainly utilise teleoperation due to the risk of breaking containment when using an autonomous system. Teleoperation methods were further developed by a team at the University of Manchester which allowed control of a manipulator using only the posture and gestures of bare hands. This was implemented in virtual reality with Leap Motion and was successful at executing a simple pick-and-place task. The system could be further improved with the addition of haptic feedback, possibly achieved virtually using visual or audio cues [38].

3.3 Post-processing of Nuclear Waste

Currently under development by Sellafield Ltd. and the National Nuclear Laboratory is the box encapsulation plant (BEP), which is intended to be used for post-processing of decommissioning waste. Post-processing offers benefits including lower waste classifications, reduced waste volume and safer waste packaging, while also allowing creation of an inventory. Due to the radioactivity of the waste materials, it is necessary to have a remotely operated system. Current designs are tele-operated; however, they are extremely difficult to operate. To address this, research [31] looks at how greater autonomy can be applied to the manipulators used in the BEP. The key requirements of the autonomous system can be defined as:

• Visual object detection, recognition, and localisation of waste canisters.
• Estimation of 6DOF manipulator pose and position.
• A decision-making agent utilising vision data and acting with the manipulator control system.
• Autonomous manipulation and disruption.


3.4 Modular and Cooperative Robotic Platforms

A consideration in deploying robotics in radioactive environments is the total integrated dose (TID), which is the amount of radiation a robot can be exposed to before failure. In highly radioactive areas, such as close to reactors, this limit can be reached in a matter of minutes. A further challenge in deploying robotics in nuclear facilities is the unstructured nature of the operating environment. This means that robotic systems will often spend a large proportion of their time planning and mapping. These factors combined mean that the time available to complete necessary tasks can be limited. Greater autonomy is also a key driver for improved technology, as current teleoperation-based methods can result in slow and inefficient operation. With the high risk of failure, it is useful to have redundancy in robotic systems; modular and multi-robot systems have been identified as a possible solution to this challenge [39].

3.5 Unmanned Radiation-Monitoring Systems

Characterisation of nuclear environments has long been a challenge within the industry, particularly in the aftermath of incidents such as Fukushima, TMI and Chernobyl. Monitoring radiation levels in and around a nuclear power plant is an important task during operation, decommissioning and in response to incidents that may result in the emission of radiation. Typically, plants will have numerous monitoring points; however, these do not provide data over a large area or may become faulty, requiring an alternative solution. The authors in [40] propose an unmanned radiation monitoring system combining radiation sensors and an unmanned aerial vehicle that can be rapidly deployed to where there may be a source of radiation. Some of the key features of this system include easy decontamination, a hibernating mode, and custom software, all with a high standard of reliability. Further detail of previous robots can be found in [41]. A similar project for mapping radioactive environments is detailed in [42]. This work focused on implementing gamma radiation avoidance capabilities that would allow a robot to navigate a nuclear environment while avoiding high radiation doses that may cause system failure. In response to the flooding of primary containment vessels at the Fukushima nuclear plant, a submersible ROV named AVEXIS was developed through collaboration between the Universities of Manchester and Lancaster along with institutions in Japan. Characterisation is achieved by combining visual inspection, acoustic mapping, and radiation assessment using a CeBr3 inorganic scintillator. The system has been validated as being able to detect fuel debris in experimental testing at the Naraha test facility and the National Maritime Research Institute in Japan, as well as in water tanks at Lancaster University, UK. MallARD, an autonomous surface vehicle, has also been developed by the University of Manchester for an International Atomic Energy Agency (IAEA) robotics


challenge on the inspection of spent fuel storage ponds. This project focused on the development of robust localisation and positioning techniques using Kalman filters and model predictive control. Experimental testing showed the MallARD system was able to follow planned paths with a reduction in error of two orders of magnitude [43].
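To give a flavour of the kind of localisation filtering mentioned above, the following is a minimal one-dimensional constant-velocity Kalman filter sketch in Python. It is not the MallARD implementation; the noise levels and measurement sequence are invented purely for illustration.

import numpy as np

# 1D constant-velocity Kalman filter: state is (position, velocity),
# only position is measured. All values below are illustrative.
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition
H = np.array([[1.0, 0.0]])                 # measurement model
Q = 1e-3 * np.eye(2)                       # process noise covariance
R = np.array([[0.05]])                     # measurement noise covariance

x = np.zeros((2, 1))                       # initial state estimate
P = np.eye(2)                              # initial covariance

for z in [0.11, 0.22, 0.28, 0.41, 0.52]:   # noisy position measurements
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x            # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print("estimated position and velocity:", x.ravel())

The same predict/update structure carries over to the multi-dimensional filters used for vehicle localisation; only the state, transition and measurement models grow.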

4 Towards an Autonomous Nuclear Decommissioning Process

The review in the previous section reveals that a large body of work remains to be done to achieve all aspects of autonomy expected in the nuclear sector. Moreover, the overview in the last two sections highlights the challenges the nuclear sector is currently facing in decommissioning power plants and clarifies the urgent need to increase the level of autonomy in the whole process. Inspired by autonomous industrial processes and drawing upon what is currently practised in various industrial sectors after the fourth industrial revolution, in this section we aim to study the underlying architecture and recent technological advances that motivate the design of an autonomous decommissioning process for the nuclear sector.

4.1 Different Levels of Autonomy

In autonomous systems, autonomy is typically defined by levels dependent on the degree of human input. The categorisations vary depending on the application, but all have the same structure, ranging from non-autonomous (lowest) up to completely autonomous (highest). Likewise, within an automated system there is a hierarchy of processes, from actuator and sensor control at the lowest level up to high-level processes that can consist of sophisticated decision making. The processes of an automated system can also be grouped by function. In [44], these classes are proposed as information acquisition, information analysis, decision and action selection, and action implementation. When designing a system, a key question is what should be automated, particularly with regard to safety and reliability requirements. Advancements in the field of robotics and software mean that systems can operate with a higher degree of automation than ever before. A proposed classification structure of autonomy levels in industrial systems is shown in Fig. 1. Due to the similarities between process plants and decommissioning operations, these levels are relevant to autonomous operations within the nuclear industry for decommissioning tasks. At present many nuclear decommissioning technologies have only reached Level 2. A key issue in using new technologies within safety-critical applications such as the nuclear industry is ensuring that they are fit for use and safe; the process of determining this is known as verification and validation (V&V) [46]. V&V methods


Fig. 1 Taxonomy of autonomy levels for industrial systems [45]

will vary depending on how a system is designed. Early stages of development may rely more on simulation due to the unpredictable nature of systems in the initial phases. Considerations for design include robust design methods, facility design and possible adaptations to accommodate the system. In addition, considerations regarding the maintenance and decommissioning of a new system must be evaluated, such as methods of recovery in the event of power loss or other critical fault [46]. For a tele-operated system this may include physical testing, user analysis and simulated testing. Semi-autonomous systems require greater analysis and may require stability proofs along with hardware-in-the-loop testing. Such systems are at risk from events such as loss of communication and may need to be continuously recalibrated to ensure they are operating within safe limits. Research at the University of the West of England has investigated how simulation-based internal models can be used in the decision-making process of robots to identify safety hazards at runtime. This approach would allow a robot to adapt to a dynamic environment and predict dangerous actions, allowing a system to be deployed across more diverse applications [47].


4.2 The Cyber Physical System Architecture

A cyber physical system (CPS) can be defined as "integrations of computation with physical processes" [48]. They generally consist of embedded computers joined by networks and may involve feedback loops for controlling physical processes. CPSs are unique in their intricate connection between the cyber and physical world; while this can present many problems for implementation, exploiting this fact can lead to advanced systems with greater capabilities than ever seen before. The potential applications for CPSs are vast and spread across a spectrum of sectors such as healthcare, transport, manufacturing, and robotics. According to the National Institute of Standards and Technology (NIST) [49], CPSs are defined as systems that "integrate computation, communication, sensing, and actuation with physical systems to fulfil time-sensitive functions with varying degrees of interaction with the environment, including human interaction". There are some key elements of a CPS that distinguish them from conventional systems. These are:

• Amalgamation of cyber and physical domains, along with interconnectedness.
• The scope for systems of systems (SoS).
• Expected emergent behaviour.
• Requirements for methods to ensure interoperability.
• The possibility for functionality extended beyond initial design scope.
• Cross-domain linked applications.
• A greater emphasis on trustworthiness.
• Modifiable system architecture.
• Broad range of computational models and communication protocols.
• Time-sensitive nature, with latency and real time operation a key design issue.
• Characterisation by interaction with their operating environment.

The current CPS concept has developed from the increasing number of embedded systems, associated sensors, and greater connectivity inherent in today's systems. This allows data to be collected directly from a system and processed to give insight into its operation, made possible by advances in data processing with concepts such as big data, machine learning and AI. This can result in a system that exhibits a level of intelligence [50]. As CPSs are a relatively recent concept, methods and structures for their design are still being developed. An early architecture, proposed in [51], is a 5-level structure for developing CPSs for Industry 4.0 applications. An overview of this is given in Fig. 2. Along with the 5C architecture, other architectures have been proposed, including the Reference Architectural Model Industry 4.0 (RAMI4.0) and the Industrial Internet Reference Architecture (IIRA) [52]. A key challenge in the implementation of CPSs is the lack of interoperability of individual systems. In terms of software implementation, this interoperability refers to the ability to exchange data between systems, more specifically known as semantic interoperability [53]. In the context of CPSs,


Fig. 2 The 5C architecture for a cyber physical system [51]

cyber physical production systems have been proposed as an adaptation for the manufacturing industry. These are based on the same framework as the conventional CPS, mainly the 5Cs [54]. An example of a CPS developed for manufacturing automation is proposed in [55]. The authors account for the hybrid system approach, integrating the continuous-time physical process with the discrete-time control. In addition, the paper notes that service-orientated architecture and multi-agent system approaches are promising for the development of such a CPS. The concept system integrates the robot movements with object recognition and fetching, all while taking into consideration safety during human–robot interaction. The interconnected nature of CPSs makes them inherently susceptible to cyber-attacks. These attacks can come from actors such as hacking organisations, governments, users (whether intentional or not), or hacktivists. Hackers can exploit vulnerabilities, such as cross-site scripting or misconfigured security, in order to break into a system. These risks can be mitigated by maintaining security defined by the CIA triad—confidentiality, integrity, and availability. Many cyber-attacks are confined to the cyber domain and focus on sensitive data or disrupting computer systems. In contrast, CPS attacks can have a direct impact in the physical world, causing damage to physical systems and potentially endangering life. Preventing future attacks, which could have greater impact due to the greater number of connected systems, is of utmost importance [56].


4.3 Enabling Technologies

The cyber physical architecture discussed in the previous section can be realised on the pillars of various recently proposed technologies, referred to here as enabling technologies. In the following, these technologies are reviewed in more depth to set the scene for the development of a cyber-physical system for the nuclear industry.

Industrial Internet of Things. The Industrial Internet of Things (IIoT) is a subset of the Internet of Things (IoT) concept whereby devices are all interconnected, facilitated by the internet. IoT has seen increasing development in many sectors such as transportation and healthcare. Increasing connectivity has been enabled by the greater use of mobile devices, each of which can provide feedback data. IIoT covers machine-to-machine (M2M) communication and similar industrial automation interaction technology. While there are some similarities with consumer-based IoT, IIoT differs in that connectivity must be structured, applications are critical, and data volume is high [53]. While IoT tends to use wireless communications, industrial applications often rely on wired connectivity due to its greater reliability. As IIoT is characterised by the substantial amounts of data transferred, the methods for data exchange are a key consideration for IIoT systems. Industrial communication networks have developed considerably over the years, utilising developments in fields such as IT. Initial industrial communication was developed using fieldbus system networks, which helped to improve communication between low-level devices. As the internet continued to grow, automation networks changed to incorporate more ethernet-based technologies, despite issues with real-time capabilities. More recently, wireless networks have become more common as they allow easier reconfiguration of systems and do not require extensive cabling; they do still have issues with real-time capabilities and concerns regarding reliability. In particular, the new wave of communication technology that has aided the development of the IoT and IIoT is more focused on consumer requirements and so is not currently suitable for many of the demanding requirements of industry [57]. Using IIoT can help improve the productivity and efficiency of processes. Real-time data acquisition and data analytics can be used in conjunction with IIoT to predict equipment maintenance requirements and allow fast response to failures. In healthcare settings, IoT can be used to improve patient safety by better monitoring patient conditions. There are many examples of IoT applications; some have been slowly developed while others are quickly being implemented as enabling technologies become available. Examples include:

• Smart Cities—Smart highways can be used to improve traffic flow and reduce accidents, while monitoring of available parking spaces can be used to assist drivers.
• Smart Agriculture—Monitoring weather along with soil moisture and quality can ensure planting and harvesting is done at the correct time.


• Smart Homes—Using IoT devices in the home can improve the efficiency of utility use while also allowing better safety through the detection of break-ins.

Internet of Things systems generally comprise a wireless sensor network (WSN). WSNs have some key objectives such as sensing properties of their environment, sampling signals to allow digital processing and, in some cases, extracting useful information from the collected data. Typically, WSNs will involve low-cost, low-power communication methods such as Wi-Fi, Bluetooth, or near-field communication (NFC); some drawbacks of these methods include interference and loss of data. This can be particularly problematic in a nuclear environment where good reliability is required [58]. The convergence of IoT and robotics has resulted in the concept of an internet of robotic things (IoRT), which can be seen in control approaches such as cloud and ubiquitous robotics [59]. The term initially referred to the fusion of sensor data and the manipulation of objects, resulting in a cyber-physical approach to robotics that shares characteristics with the CPS concept. Elementary capabilities of an IoRT system include perception, motion, and manipulation, along with higher-level processing including decisional autonomy, cognitive ability, and interactive capacity.

Cloud Computing. Cloud computing allows quick access to IT resources, providing flexibility to match requirements. Resources are available on demand, over the internet, and are used across a range of industries. There are several benefits of cloud computing, such as reduced upfront cost for infrastructure, lower running costs associated with IT infrastructure, and the ability to quickly adapt capacity to requirements, reducing the need for planning. There are several service models for cloud computing, which each offer benefits in terms of flexibility and management. Some examples are shown in Fig. 3.

Fig. 3 Cloud computing service models:
• Infrastructure as a Service (IaaS): access to networking and data storage with high flexibility and management
• Platform as a Service (PaaS): access to cloud computing without associated infrastructure maintenance
• Software as a Service (SaaS): software available on demand through cloud infrastructure with low to no maintenance


Edge computing is becoming a common alternative to cloud computing. It can be defined as computing resources located part way between the data sources and a cloud computing centre. This has the advantage of improving system performance in terms of overall delay and reducing the requirements for bandwidth. In addition, fog computing can provide greater security than the cloud, reducing the possibility of data being intercepted during transmission and allowing greater control over storage [60]. Edge-based systems will also help drive the development of CPSs by allowing data processing in real time, greatly improving the responsiveness of such systems. The architecture for an edge-based network consists of an edge layer situated between a device layer and a cloud layer, which can be further broken down into near, mid and far levels. The edge provides opportunities for the integration of 5G, allowing applications such as real-time fault monitoring. This, however, requires consideration of quality of service based on availability, throughput and delay. 5G networks also offer the ability to be sliced, such as by using network function virtualisation (NFV) technology, a form of logical separation which allows sharing of network resources. As the edge relies on distributed computing power, data offloading and balancing must be considered to avoid overloading resources and to improve availability and efficiency [61].

Cloud Robotics. A cloud robot can be defined as a robotic system that uses data from a network to assist operating tasks [62]. This is in contrast to a conventional robotic system, whereby all processing and related tasks are carried out in a standalone system. As there is latency in passing data, cloud robots are often designed with some on-board abilities as required for real-time control. Using the cloud allows access to several possibilities including large libraries of data, parallel computing for statistical analysis, collective learning between robots, and including multiple humans in the loop. Some of the challenges involved with cloud computing include privacy and security requirements, along with varying network latency. Using the cloud with robotics offers the opportunity for a SaaS approach to robotic development, where packages such as those used with ROS could be made available without the lengthy setup currently associated with using them. Research in [63] investigated the use of cloud computing for global path planning with large grid maps, as these tend to be computationally intensive. This was performed by using a vertex-centric RA* planning algorithm, implemented with the Apache Giraph distributed graph processing framework. The research ultimately found that current cloud computing techniques are unsuitable for the real-time requirements of path planning due to the latency in network connections.

Big Data. The use of IIoT and WSNs generates a large amount of data, commonly referred to as big data. Big data analytics is then required to make sense of the large quantity of data and provide useful insights, which can be used for tasks such as structural health monitoring and diagnostics. It can be combined with cloud computing and AI techniques to manage data and perform analysis. Big data is characterised by four main characteristics, known as the 4Vs [64], shown in Table 2.

Table 2 The 4Vs of big data [64]
• Volume: data can be in the order of 1000+ TB
• Variety: data can be structured or unstructured, with varying formats
• Velocity: data is generated rapidly
• Value: data must be processed to extract value

Within a manufacturing environment, data can be produced from numerous sources. This can include resource data from IoT equipment along with material and product parameters and environmental readings. In addition, management data arising from information systems such as ERP and PDM can be included, as well as CAD data. Data must be cleaned and organised before being stored, at which point it can then be analysed. The processed data can be integrated into a digital twin of a system to provide real-time feedback to operators and develop predictive capabilities for the system [65]. A framework for managing big data during nuclear decommissioning was developed in [66]. This involved combining imaging robotics and machine learning to help assess the condition of EOL nuclear facilities. Data was collected using LiDAR, which was subsequently processed using AI techniques and stored in a distributed file system using Hadoop. Big data has the potential for multiple applications in robotics, such as object recognition and pose estimation. Using datasets such as the Willow Garage household object dataset, models can be trained for parameters such as grasp stability, robust grasping, and scene comprehension [62]. A major challenge remains in producing data that is suitable for cross-platform applications by standardising formats and ensuring good interoperability between systems. In addition, research is required into sparse representation of data to improve transmission efficiency, and into approaches that are robust to dirty data.

Digital Twins and Simulation. In 2012, NASA released a paper [67] defining a digital twin as "an integrated multi-physics, multiscale, probabilistic simulation of an as-built vehicle or system that uses the best available physical models, sensor updates, fleet history, etc., to mirror the life of its corresponding flying twin." While this definition was created with aerospace engineering in mind, the concept can be extended to a wide range of systems. A more recent definition gives a digital twin (DT) in terms of physical objects (POs) as "a comprehensive software representation of an individual PO. It includes the properties, conditions, and behaviour(s) of the real-life object through models and data" [68]. A DT is a set of realistic models that can simulate an object's behaviour in the deployed environment. The DT represents and reflects its physical twin and remains its virtual counterpart across the object's entire lifecycle. This means that a digital twin can reflect the real-world condition of a system, components, or system of systems, along with providing access to relevant historical data. Digital twins have applications across a range of industries and sectors; some key examples are in smart cities and healthcare. IoT sensors that gather data from cities


can be used to give insight into the use of utilities and how it may be possible to save energy. Within manufacturing, the real-time status of machine performance can be obtained, helping to predict maintenance issues and increase productivity. Within medicine and healthcare, digital twins can be used to provide real-time diagnostics of the human body and may even allow simulation of the effects of certain drugs or surgical procedures [69]. There are some key enablers in related technology that have allowed the development of more comprehensive digital twins. While digital modelling has been around for decades, advances now allow real-world scenarios to be replicated and testing to be carried out without any risk to the physical system. Digital modelling, made possible by advances in processor power, can likewise be used to make predictions about the condition of a system. As shown in Fig. 4, digital twins can have different applications depending on the computing layer in which they operate. In addition, it is possible to have multiple digital twins running simultaneously, each providing a different application. This could include deployment within the system of interest itself, allowing rapid response to conditions that fall outside of nominal operating conditions. Simultaneously, a digital twin could be applied to work with historical data, possibly utilising data from concurrent systems and allowing predictions that can influence maintenance strategies and test possible scenarios. A key consideration in the development of a digital twin is the modelling method used; these can be categorised into data-driven or physics-based modelling, although hybrid approaches combining the two techniques also exist.

Multi-agent Systems. The concept of the multi-agent system (MAS) developed from the field of distributed artificial intelligence (DAI), first seen in the 1970s. DAI can be defined as "the study, construction, and application of multiagent systems, that is, systems in which several interacting, intelligent agents pursue some set of goals or perform some set of tasks" [71]. An agent can be either cyber-based, such as a software program (sometimes referred to as a bot), or a physical robot that can interact directly with its environment. Research in DAI has developed as a necessity from the ever-increasing distributed computing in modern systems. A major aspect of DAI is that agents are intelligent,

Fig. 4 Digital twin strategies at different levels [70]


and therefore have some degree of flexibility while being able to optimise their operations in relation to a given performance indicator. Typically, an agent will have a limited set of possible actions, known as its effectoric capability. This capability may vary depending on the current state of the environment. The task environment for an agent can be characterised according to a set of properties. According to [72], these can be defined as:

• Observability—whether the complete state of the environment is available.
• Number of agents—how many agents are operating in the environment; this also requires consideration of how an agent will act, whether it is competitive, cooperative, or can be viewed as a simple entity.
• Causality—the environment may be deterministic, allowing future states to be predicted with certainty, or alternatively stochastic, in which case there will be a level of uncertainty in actions.
• Continuity—whether previous decisions affect future decisions.
• Changeability—an environment can be static, or dynamic and requiring continual updates.
• State—can either be discrete or continuous.

Agents can have software architectures not dissimilar from those of robots. Reflex agents respond directly to stimuli, while model-based agents use an internal state to maintain a belief of the environment. Goal-based agents may involve some aspects of search and planning in order to achieve a specific goal, utility agents have some notion of optimisation, and learning agents can improve their performance based on a given performance indicator. MASs can assist with many tasks required of robots in a previously unknown environment, such as planning, scheduling, and autonomous operation. The performance of a multi-agent system can be measured in terms of rationality, i.e., choosing the correct action. The four main approaches to the implementation of a MAS are shown in Table 3.

Table 3 MAS approaches
• Acting humanly: an approach based on the Turing test, whereby an agent can respond to written questions with a result that is indistinguishable from one given by a human.
• Thinking humanly: this approach relies on programming the working of the human mind; it has developed into the field of cognitive science.
• Thinking rationally: at its centre, this approach uses logic and reasoning. The main challenges are formalising knowledge and managing the computational burden when analysing a problem with a large number of aspects.
• Acting rationally: an approach that uses multiple methods to achieve rationality and optimise performance for a given task.
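To make the distinction between reflex and model-based agents concrete, the following is a minimal, purely illustrative Python sketch. The radiation-monitoring scenario, threshold, and smoothing factor are invented for this example and do not correspond to any of the systems cited above.

def reflex_agent(percept):
    """Acts directly on the current percept (a single dose-rate reading)."""
    return "retreat" if percept > 5.0 else "advance"

class ModelBasedAgent:
    """Keeps an internal state (a running estimate of the dose rate) and
    acts on that belief rather than on one possibly noisy reading."""
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.estimate = 0.0            # internal model of the environment

    def act(self, percept):
        # Update the belief with an exponential moving average, then decide.
        self.estimate = (1 - self.alpha) * self.estimate + self.alpha * percept
        return "retreat" if self.estimate > 5.0 else "advance"

readings = [1.0, 2.0, 9.0, 2.5, 3.0]   # one spurious spike at the third step
agent = ModelBasedAgent()
for r in readings:
    print(r, reflex_agent(r), agent.act(r))

On this reading sequence the reflex agent overreacts to the single spurious spike, while the model-based agent's internal estimate remains below the threshold and its behaviour stays consistent.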


Deep Learning Techniques. Machine learning has developed as a culmination of research in associated areas such as statistics and computer science. The main goal of a machine learning approach is to generate predictions via a model inference algorithm, using a machine learning model that has been trained for a specific purpose. Different approaches for training can be implemented depending on the characteristics of the data set being used, such as supervised, unsupervised, or reinforcement learning. Training can be performed by splitting a dataset into a training set, along with a test set which can be used to optimise the model by varying parameters to minimise a loss function [73]. Deep learning algorithms, sometimes also referred to as artificial neural networks or multilayer perceptrons, are a subset of machine learning algorithms that mimic pathways through the brain and are designed to recognise patterns. So far, deep learning algorithms have had a great impact on semantic image processing and the design of vision systems for robotic applications. The main advantage of deep learning techniques is their potential, once trained, to extract features from images and classify them with minimal human intervention. The neural networks in deep learning applications are composed of nodes, or neurons, which can be used to solve a variety of problems, most commonly separating data into groups. Hidden layers add to the depth of the network using activation functions, which convert an input signal to an output that may be passed to the next layer. These functions can be set up to create different networks, such as feedforward, recurrent or convolutional, depending on the desired results. Deep learning can be used to improve the performance of machine learning algorithms by building representations from simpler representations, therefore allowing more abstract features to be identified. Neural networks use weightings to control the outputs. These can be determined by training the network, either online or offline. This is done by showing the network many examples of different classes, known as supervised learning, and minimising the cost function. With multiple weightings, it is infeasible to test all possibilities; in this case, it is possible to find the optimum weights of the cost function through gradient descent [74].

Human–Robot Collaboration. In the context of nuclear scenarios, human–robot collaboration refers to a teleoperation scheme where the robot assists the human in a virtual interaction to perform the task. Robot motion would benefit from a human-in-the-loop cooperative framework, where limits on robot autonomy can be effectively compensated by introducing human intelligence. It has been shown in [75] that learning from and adapting to humans can overcome the difficulties of decision making in fully manual and fully autonomous tasks. Therefore, the mixture of human and robot intelligence presents opportunities for effective teamwork and task-directed assistance in the nuclear decommissioning context. On the other hand, the robot's autonomy is often limited by its perception capabilities when performing complex tasks. Projecting sensorimotor capabilities onto the robot side would help reduce the high demand on environment perception while maintaining operational safety. Current nuclear robots are typically commanded according to a master–slave scheme (Levels 1–2 in Fig. 1), where the human operator at a console remotely controls


the robot during operation using visual and haptic feedback through the workstation. However, local sensors on the tools in the remote environment may provide better-quality or complementary sensory information. Additionally, the operator's performance may decrease with fatigue, and in general robotic control should be adapted to the operator's specific sensorimotor abilities. Therefore, empowering robots with some autonomy would effectively regulate flexible interaction behaviour. In this regard, how to understand and predict human motion intent is the fundamental problem on the robot side. Three types of human intent are typically considered in physical human–robot interaction, namely motion intent, action intent and task-level intent [75], which cover human intent over the short to long term and even the full task horizon. A more intuitive approach is to develop a physical or semi-physical connection between robot and human. Such methods can be observed in the human–human interaction literature. In terms of the performance of physically interacting subjects in a tracking task, subjects can improve their performance by interacting with (even a worse) partner [76]. Similar performances obtained with a robotic partner demonstrated that this property results from the sensory exchange between the two agents via the haptic channel [77]. This haptic communication is currently exploited to develop sensory augmentation in a human–robot system [78]. When both human and robot perform partner intent estimation and interaction control design, it is essential to investigate the system performance under this process of bilateral adaptation. Game theory is a suitable mathematical tool to address this problem [79]. For example, for a novice or a fatigued operator, the robot controller tends to compensate for human motion to mitigate any possible adverse effects due to incorrect human manipulation. On the other hand, the human may still wish to operate along his/her own target trajectory. With the estimated intent and gain from the robot side, the human can counteract the robot's effects by further increasing his/her control input (gain). Thus, there is haptic communication between human and robot regarding their respective behaviour. The strategies of the two controllers can converge to a Nash equilibrium as defined in non-cooperative game theory.
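As a purely illustrative toy example of this game-theoretic view (not drawn from the cited works), consider a human and a robot sharing a single degree of freedom, each tracking its own target while penalising its own effort; iterating best responses converges to the Nash equilibrium. The plant, cost weights and targets below are invented for the sketch.

# Shared (static, simplified) plant: x = u_h + u_r
# Human cost:  (x - x_h_target)^2 + w_h * u_h^2
# Robot cost:  (x - x_r_target)^2 + w_r * u_r^2
x_h_target, x_r_target = 1.0, 0.6     # human and robot desired positions (illustrative)
w_h, w_r = 0.5, 0.2                   # effort weights (illustrative)

def best_response_human(u_r):
    # Minimiser of the human cost for a fixed robot input
    return (x_h_target - u_r) / (1.0 + w_h)

def best_response_robot(u_h):
    # Minimiser of the robot cost for a fixed human input
    return (x_r_target - u_h) / (1.0 + w_r)

u_h, u_r = 0.0, 0.0
for _ in range(50):                    # alternate best responses
    u_h = best_response_human(u_r)
    u_r = best_response_robot(u_h)

print(f"Nash inputs: u_h={u_h:.3f}, u_r={u_r:.3f}, position x={u_h + u_r:.3f}")

With the chosen weights the best-response map is a contraction, so the inputs settle at the equilibrium where neither agent can improve its own cost unilaterally, mirroring the bilateral adaptation described above.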

5 A Cyber Physical Nuclear Robotic System

The overall architecture of the cyber physical system proposed for nuclear decommissioning applications is illustrated in Fig. 5. This architecture consists of two layers, namely the 'Cyber Environment' and the 'Extreme Physical Environment'.

Fig. 5 A schematic block diagram of the proposed solution. The components studied in this project and their relevance to the work packages are highlighted in light blue colour

Each robot, based on the local information from its own sensors, information from other robots and the cloud, and a mission defined by the 'Decision Making and Action Invoking Unit' or the 'Assisted Teleoperator', can obtain a proper perception of the environment and plan a path that avoids obstacles and leads to the mission position. As can be seen from Fig. 5, the proposed system involves various integrated heterogeneous subsystems with a great level of interoperability and a high level of automation in both the vertical and horizontal dimensions of the system. In this sense, the proposed architecture can be viewed as a complex system of systems with a

hierarchy of various software, hardware, algorithms, and robotic platforms aiming to enhance the autonomous operation of the overall system and execute complex decommissioning tasks in an uncertain and unstructured environment. The proposed system relies on a heterogeneous multi-robot system by which the complex decommissioning tasks are distributed and executed in the most efficient way, reducing the execution time and hence the exposure of the robots to highly radioactive environments. The conceptual cyber physical system illustrated in Fig. 5 follows level 5 of autonomy in the taxonomy of autonomy levels depicted in Fig. 1, and is based on the 5C architecture discussed around Fig. 2 of this chapter. At the very bottom level, the system involves a mobile sensor network using a set of aerial and ground-based mobile manipulators. The mobile robots are equipped with a range of sensors to collect multi-sensory data from the physical assets and the surrounding environment. The collected data is processed in real time on the edge, through the on-board computers available on the robotic platforms, and used to characterise the nuclear environment. This is usually carried out by estimating the spatial radiation field and other environmental variables such as temperature and humidity, and identifying different objects and their positions within the generated


map of the nuclear site. For autonomous operation of individual robots and their interaction with other robots, various algorithms such as motion planning, trajectory tracking control, simultaneous localisation and mapping, object detection, and pose estimation techniques should be designed while respecting the environmental constraints imposed on each robot. In the following, the most important components of the proposed cyber physical system are explained in more depth.
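Before detailing those components, the following purely illustrative sketch shows one simple way the spatial radiation field mentioned above could be estimated from point measurements, using inverse-distance weighting. The sample locations and dose rates are invented, and this is not the specific method proposed in the chapter.

import numpy as np

# Inverse-distance-weighted interpolation of point dose-rate measurements
# onto a coarse grid. All numbers below are invented for illustration.
samples = np.array([[1.0, 1.0], [4.0, 2.0], [2.5, 4.0]])   # (x, y) positions [m]
doses = np.array([12.0, 3.0, 7.5])                          # dose rate at each sample

def idw(point, samples, values, power=2.0, eps=1e-9):
    d = np.linalg.norm(samples - point, axis=1)
    if np.any(d < eps):                      # query coincides with a sample
        return values[np.argmin(d)]
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

# Estimate the field on a 6 x 6 grid with 1 m resolution
grid = np.array([[idw(np.array([x, y]), samples, doses)
                  for x in range(6)] for y in range(6)])
print(np.round(grid, 1))

In practice the field estimate would be fused with the map produced by the robots and refined as new measurements arrive, but the interpolation idea is the same.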

5.1 Software Architectures

The development of more advanced robotics has necessitated better structures for designing robots based on how subsystems interact or communicate. Developments have seen the formation of paradigms which provide approaches for solving robotic design problems. Initial robots, such as Shakey [80], were based on the sense-plan-act (hierarchical) paradigm. As robotics developed, this paradigm became inadequate: planning would take too long, and operating in a dynamic environment meant that plans could be outdated by the time they were executed. To address the challenges in control architecture, new paradigms were created, such as reactive planning. A reactive system is one of the simplest forms of the control architecture depicted in Fig. 6. While not effective for general-purpose applications, this paradigm is particularly useful in applications where fast execution is required. This structure also exhibits similarities with some biological systems. Another alternative approach that has been widely adopted is the subsumption architecture in Fig. 7, which is built from layers of interacting behaviours. This utilised an arbitration method that would allow higher-level behaviours to override lower-level ones [81]. While this method proved popular, it was unable to deal with longer-term planning.

Fig. 6 Example of SPA architecture [81]

Fig. 7 Example of subsumption architecture [81]
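The following is a minimal, purely illustrative sketch of this kind of priority-based arbitration; the behaviours, sensor fields and thresholds are invented for the example and are not taken from the architectures cited above.

# Subsumption-style arbitration: behaviours are ordered by priority and a
# higher-priority behaviour, when triggered, suppresses the layers below it.

def avoid_obstacle(percepts):
    if percepts["range"] < 0.5:          # obstacle closer than 0.5 m
        return ("turn", 1.0)
    return None                          # not triggered; defer to lower layers

def follow_path(percepts):
    return ("forward", 0.3)              # default lower-priority behaviour

BEHAVIOURS = [avoid_obstacle, follow_path]   # highest priority first

def arbitrate(percepts):
    for behaviour in BEHAVIOURS:
        command = behaviour(percepts)
        if command is not None:          # first triggered layer wins
            return command
    return ("stop", 0.0)

print(arbitrate({"range": 2.0}))   # -> ('forward', 0.3)
print(arbitrate({"range": 0.2}))   # -> ('turn', 1.0)

Because each layer only reacts to the current percepts, the scheme is fast but, as noted above, it has no mechanism for longer-term planning.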


Fig. 8 Real-time control system architecture [83]

Hybrid Architecture. Hybrid architectures combine both reactive and deliberative control architectures to give a combined behaviour; they are sometimes also referred to as layered. Many robots have been designed according to the hybrid model, including MITRE, ATLANTIS, and LAAS [82]. A good example of a structured approach to control implementation is the Real-Time Control System (RCS) reference model architecture depicted in Fig. 8 [83]. The architecture provides a way to organise system complexity based on a hierarchical structure and takes account of the entire range of processes that affect the system. Figure 8 shows a high-level overview of the system architecture. The same structure can also be applied at various levels of the system hierarchy, whereby tasks are decomposed and passed to subordinate levels. Commonly used is a three-tiered approach to hybrid control architecture. This consists of behavioural control at the lowest level, which is involved in interfacing with components. Above this is an executive level which manages current tasks, and at the highest level is the planning tier, which is used for achieving long-term goals.

Middleware/Communication. As the proposed robotic CPS consists of many components interacting with each other, it requires software that can facilitate internal communication between different software and hardware modules. A middleware allows this by providing an abstraction layer to communicate between devices with different protocols. Typically, this is achieved by a client–server or publish-subscribe approach such as that used by the popular framework Robot Operating System (ROS). This open-source software has the key features of being modular and reusable and is integrated with other software libraries such as OpenCV for real-time computer vision, and open-source Python libraries such as Keras and OpenAI for machine learning, allowing for rapid development of advanced robotic systems.
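As an illustration of the publish-subscribe pattern, the following is a minimal ROS 1 (rospy) sketch; the node and topic names, message content and publication rate are invented for the example and do not refer to any system described in this chapter.

#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def on_status(msg):
    # Callback invoked for every message received on the topic.
    rospy.loginfo("received: %s", msg.data)

def main():
    rospy.init_node("cps_demo_node")
    pub = rospy.Publisher("/robot/status", String, queue_size=10)
    rospy.Subscriber("/robot/status", String, on_status)
    rate = rospy.Rate(1)                       # publish at 1 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data="heartbeat"))  # illustrative payload
        rate.sleep()

if __name__ == "__main__":
    main()

A single topic and a single node are used here for brevity; in practice, publishers and subscribers would typically live in separate nodes, with the middleware handling discovery and transport between them.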


5.2 Autonomous Multi-robot Systems

The early generations of the multi-robot system proposed in Fig. 5 were tele-operated, ensuring that workers need not enter hazardous environments. Such systems are complex to operate and require intensive human supervision. The next generation will be autonomous multi-robot systems with the hierarchical architecture depicted in Fig. 8. A robot can be defined as an autonomous system if it exists in the physical world, senses its environment, acts on what it senses, and has a goal or objective. Such robots consist of a mixture of hardware and software components which are designed to work together. Robot designs and suitable components will vary depending on the intended application along with constraints regarding size, power, and budget [84]. Autonomous robots require electronics for different purposes; often a microcontroller is used for motors while a separate and more powerful on-board computer is used for processing sensor data. Since electronic components are vulnerable to radiation, the current practice is to protect them using radiation-hardened electronics. This may increase the TID limit of the nuclear robot; however, from the viewpoint of completing a specific mission, the resulting operating time may still not be sufficient. Also, deploying numerous single robots will not necessarily improve the progress of the intended mission. Power efficiency is a key consideration when designing embedded systems for robotic applications. This may constrain the practical implementation of advanced processing algorithms such as deep learning techniques, which may alternatively be run using cloud-based computing. A multi-robot system also requires hardware for external communication. The simplest method of communication is a wired connection. This is straightforward if the robot is static; however, for mobile robots a tether can become tangled or caught. Alternatively, wireless communication methods are often used, for example Wi-Fi, allowing for greater system mobility and reconfiguration along with quicker setup times due to the reduction in cabling required. Within a radioactive environment, wireless communication, and in particular wireless sensor networks, can be challenging to implement. Some of the challenges include a lack of accessible power sources, radiation protection of communication system components, and reinforced walls in nuclear plants that result in significant signal attenuation, along with the need to ensure all communications are secure and reliable. A new wireless communication design architecture using nodes with a base station has recently been tested at the Sellafield site, UK, and was shown to be operationally effective within the reinforced concrete structures found in nuclear plants, while the low-complexity sensor nodes used allow for greater radiation tolerance [85]. Low-power but shorter-range communication methods such as Bluetooth and ZigBee can also be considered for the deployment of wireless sensor networks in nuclear environments. Using multi-robot or modular cheap robotic units with simple functionality can be a possible solution to the radiation exposure or TID problem. Such a system is inherently redundant and the collective behaviour of the multi-robot system is significant. In


In order to better understand the behaviour of a multi-robot system, it is useful to have a kinematic model. A key element in modelling a robot is the generalised state of the system. This provides a complete representation of the multi-robot system by combining the states of the individual robots. It is obtained by defining local and global reference frames that specify the location of each robot and the relative distance between the multi-robot system and the features extracted from the environment. In addition to the state, the forward and inverse kinematics of the robots are required for a complete understanding of robot behaviour. Building on the kinematics, a dynamic model can also be developed to give the relationship between control inputs and states.
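As a minimal illustration of the generalised state described above, the sketch below stacks the poses of several robots into one state array and propagates each robot with its own velocity command. The differential-drive kinematics and the numerical values are assumptions for the example only.

```python
import numpy as np

def diff_drive_step(state, v, omega, dt):
    """Propagate one differential-drive robot (x, y, heading theta)
    forward in time using its linear and angular velocity commands."""
    x, y, theta = state
    x += v * np.cos(theta) * dt
    y += v * np.sin(theta) * dt
    theta += omega * dt
    return np.array([x, y, theta])

def multi_robot_step(states, commands, dt):
    """Generalised state of the team: the individual (x, y, theta) states
    stacked row-wise; each robot is propagated with its own command."""
    return np.array([diff_drive_step(s, v, w, dt)
                     for s, (v, w) in zip(states, commands)])

# Example: two robots expressed in a common global frame.
states = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.5, np.pi / 2]])
commands = [(0.2, 0.0), (0.1, 0.1)]   # (v [m/s], omega [rad/s]) per robot
states = multi_robot_step(states, commands, dt=0.1)
print(states)
```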

5.3 Control System Design

A robot acts on its environment in two main ways: locomotion and manipulation. This is achieved by various components depending on the task that is required. Effectors are components or devices that allow a robot to carry out tasks such as moving and grasping, and consist of or are driven by actuators. Actuators typically produce linear or rotary motion and may be powered via different media, including electric, hydraulic, or pneumatic drives. While an actuator typically has one degree of freedom, a robot can be designed with mechanisms comprising joints and links giving multiple degrees of freedom. Dexterous operation of robotic manipulators in the restricted environments found on nuclear sites requires more degrees of freedom than the task itself demands. In this case the manipulator is kinematically redundant, and the extra degrees of freedom are used to satisfy various environmental and robotic constraints such as obstacle avoidance, joint limit avoidance, and singularity avoidance. Figure 9 illustrates the detailed block diagram of the control system designed for the manipulation and grasping of a single dual-arm robot. Similarly, the control system designed for autonomous operation of a single UAV in an unstructured environment is depicted in Fig. 10.

Improving control systems leads to better performance, in turn allowing faster operation. In the development of a new system, better control can also allow the use of smaller and lighter components for a given application. As many physical systems are inherently non-linear, there has been increasing demand for control schemes that can improve control of such systems. A common example of a system requiring non-linear control is the hydraulic manipulator. Greater availability of computing power has allowed more complex control schemes to be implemented in real time. Nevertheless, proportional-derivative and proportional-integral-derivative control are still commonly used controllers for industrial robotics [86]. Developing control systems using non-linear control methods has many benefits over classical methods [87]. An overview of these benefits is given below.


Fig. 9 The schematic of the control system designed for a dual arm manipulator

Fig. 10 The schematic of the control system designed for autonomous operation of a single UAV

• Improvement in performance over linear control methods.
• Control of systems that cannot easily be linearised, such as those with hard nonlinearities.
• Reliable control in the presence of model uncertainties and external disturbances, classed as robust controllers and adaptive controllers.
• The possibility of simpler control design due to methods being more consistent with the true system physics.
• Guarantee of stability.


• Cost saving, through the use of cheaper components which are not required to have a linear operating region.

Control Algorithms. Using the inverse dynamics of a system, a linear response can be obtained by inverse dynamics control, sometimes also called computed torque control. Often when designing a control scheme, knowledge of the system parameters is not perfect. One technique to address this is adaptive control, which uses system identification techniques in combination with a model-based control law so that the model parameters can be updated in response to the error. As with most control laws, the challenge of a rigorous proof of stability is of central importance [88].
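As a concrete sketch of the computed-torque idea, the code below implements the standard law tau = M(q)(qdd_des + Kd*e_dot + Kp*e) + C(q, q_dot)*q_dot + g(q). The model object supplying M, C and g is a placeholder for an identified or CAD-derived manipulator model, not a component of the systems described in this chapter.

```python
import numpy as np

def computed_torque(q, qd, q_des, qd_des, qdd_des, model, Kp, Kd):
    """Computed-torque (inverse dynamics) control law.

    model must supply the manipulator dynamics terms:
      model.M(q)     -> inertia matrix, shape (n, n)
      model.C(q, qd) -> Coriolis/centrifugal matrix, shape (n, n)
      model.g(q)     -> gravity torque vector, shape (n,)
    These are placeholders for a model obtained elsewhere.
    """
    e = q_des - q        # joint position error
    ed = qd_des - qd     # joint velocity error
    # Desired acceleration plus PD correction, passed through the inertia
    # matrix, with Coriolis and gravity terms cancelled by the model.
    v = qdd_des + Kd @ ed + Kp @ e
    tau = model.M(q) @ v + model.C(q, qd) @ qd + model.g(q)
    return tau
```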


Unlike traditional continuous-time systems, the distributed communication in cyber-physical nuclear robotic systems can result in time delays, data loss, and even instability. Since the active nodes in the network are typically not synchronised with each other, the number of delay sources tends to be stochastic. Given the potential for network congestion and packet loss in closed-loop robotic systems, the delay is typically assumed to follow a known probability distribution [89]. As a model simplification, the time-delay sequence can be hypothesised to be independently and identically distributed, and Markov chains can be used to model network-induced delays [90, 91]. Maintaining system stability in the presence of time delays is therefore critical yet challenging for systematic stability analysis and control synthesis. A classic control strategy is built on passivity theory, which characterises the input–output dissipation property of the system: the energy stored in a passive system does not exceed the energy imported from the environment. Passivity-based time-delay stabilisation methods have been intensively studied and have yielded rich results, such as the scattering approach, wave variables, damping injection, and the time-domain passivity approach [92]. In contrast to passivity-based methods, predictive control strategies avoid the passivity analysis by compensating for the uncertainty of communication delays in the system, and provide particular advantages in handling constraints and uncertain delays [93].

Another alternative is machine learning control, a subset of optimal control that allows performance to improve as the controlled system repeats tasks, not necessarily involving the use of a parametric model. This allows compensation of difficult-to-model phenomena such as friction and other nonlinearities. This type of control can also be used to validate dynamic models of a system by studying how the torque error varies over the system's operating envelope. Research detailed in [94] shows how a remote centre of motion (RCM) strategy can be used to avoid collisions while working through small openings. This is particularly relevant to the nuclear industry, where ports are used to access highly radioactive environments.

The physical and computational components of cyber-physical systems (CPSs) are intertwined through communication networks. When control design for CPSs is considered, they can also be regarded as a type of nonlinear networked control system (NCS), in which the plant is controlled through the communication networks [95–98]. For NCSs, one main concern is security, since the communication network may suffer from different types of malicious attacks such as deception attacks, replay attacks, and denial-of-service (DoS) attacks [99–101]. Unlike deception attacks, which attempt to alter data from the sensors and controllers, DoS attacks attempt to jam the communication networks of the control system and so degrade the control performance [99]. Here, the last type, DoS attacks, is the one considered in the networked control design. To improve communication efficiency and save communication resources, event-triggered mechanisms are a promising addition to NCSs [102–106]. Under an event-triggered mechanism, instead of sending all the sampled data, only the samples considered necessary are sent from the plant to the controller. By adopting an event-triggered mechanism, the control performance of an NCS can be maintained as desired with lower consumption of communication bandwidth. Against these merits, the event-triggered mechanism makes the stability analysis more complex, and feasible stability conditions are more difficult to obtain.

Fault-tolerant control (FTC) schemes include techniques for reducing downtime and identifying developing faults and safety risks by predicting and recognising faults early and mitigating their effects before they develop into more serious problems. Within electrohydraulic systems there are many faults that may develop and that can impact tracking performance and system safety, including leakages, changes in supply pressure, and sensor faults [107]. FTC can be broadly categorised into passive and active approaches. Passive FTC can often be implemented as an extension of robust control, in which a fixed controller is chosen that satisfies the requirements for all expected operating conditions. While this provides redundancy and improves system reliability, it often comes at the cost of performance, since normal operation and faulty conditions are considered together. Conversely, active FTC employs fault detection and diagnosis (FDD). Once a fault is detected, the active FTC scheme first generates an admissible control action and then gradually improves performance once system safety is guaranteed [108]. An example of FTC with FDD for robot manipulators with unknown loss of torque at a joint is detailed in [109]; experimental tests showed that the scheme was able to reduce tracking errors related to both bias and additive time-varying faults.

Visual servoing employs data obtained from a vision sensor to provide feedback and control the motion of a robot. Classical approaches include image-based (IBVS) and pose-based (PBVS) visual servo control, from which a global control scheme, such as a velocity controller, can be created. PBVS tends to be computationally expensive, while IBVS creates problems for control because it omits the pose estimation step used in PBVS and so requires image features to be related to the camera pose through a nonlinear function [110]. For reliable visual servoing it is important to have good object tracking, in order to be able to specify the desired end position. Tracking can be achieved through different methods, including fiducial markers, 2-D contour tracking, and pose estimation.
The two basic configurations of the end effector and camera are eye-in-hand, whereby the camera is attached to the end effector, and eye-to-hand, where the camera is fixed externally in the world. Visual servoing can be improved through good path planning, which accounts for


constraints and uncertainties within a system. Path planning approaches for visual servoing can be categorised into 4 groups: image space, optimisation-based, potential field-based, and global [111].
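For readers unfamiliar with IBVS, the following minimal sketch implements the classical velocity law v = -lambda * pinv(L) * e using the standard point-feature interaction matrix. It assumes the feature depths are known, which in practice must be estimated, and the feature coordinates shown are illustrative.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Standard interaction (image Jacobian) matrix for a single point
    feature at normalised image coordinates (x, y) with depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x**2), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y**2, -x * y, -x],
    ])

def ibvs_velocity(features, desired, depths, gain=0.5):
    """Classical IBVS law: camera velocity v = -gain * pinv(L) * e,
    where e stacks the errors of all point features."""
    e = (features - desired).reshape(-1)
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    return -gain * np.linalg.pinv(L) @ e

# Example with four tracked point features (normalised coordinates).
current = np.array([[0.1, 0.1], [-0.1, 0.1], [-0.1, -0.1], [0.1, -0.1]])
desired = np.array([[0.12, 0.1], [-0.1, 0.12], [-0.1, -0.1], [0.1, -0.1]])
depths = [1.0, 1.0, 1.0, 1.0]   # assumed feature depths in metres
v_camera = ibvs_velocity(current, desired, depths)  # (vx, vy, vz, wx, wy, wz)
print(v_camera)
```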

5.4 Motion Planning Algorithms

Motion planning is the process of determining the sequence of actions and motions required to achieve a goal, such as moving from point A to point B; in the case of a manipulator this will be moving or rearranging objects in an environment. At present, teleoperation is still widely used in the nuclear industry, which can be slow and tedious for operators due to poor situational awareness and difficulty in using controllers. Improved motion planning capability has the potential to speed up tasks such as packing waste into safe storage containers [112]. The goal of a planner is to find a solution to a planning problem while satisfying constraints such as the kinematic and dynamic constraints of the links, along with constraints that arise from the environment, such as obstacles. This must be done in the presence of uncertainties arising from modelling, actuation, and sensing. Motion planning has traditionally been split into macro planning for large-scale movements and fine planning for high precision. Typically, as the level of constraint increases, such as during fine motor movements, feedback must be obtained at a higher rate and actions become more computationally expensive [111].

One of the most basic forms of motion planning uses artificial potential fields, which incrementally guide the robot through free space until a solution is found; this can be treated as an optimisation problem, using gradient descent to find a path towards the goal. Other methods include graph-based path planning, which evaluates different path trees using a graph representation to get from the start to the goal state; examples are A*, Dijkstra, Breadth-First Search (BFS), and Depth-First Search (DFS). Alternatively, sampling-based path planning can be used, which randomly adds points to a tree until a solution is found; examples include RRT and PRM [113].

Graphs are often used in motion planners, consisting of nodes, which typically represent states, and edges, which represent the ability to move between two nodes. Graphs can be directed or undirected, depending on whether edges are bidirectional, and weightings can be given to edges as a cost associated with traversing them. A tree is a graph with one root node, several leaf nodes, no cycles, and at most one parent per node. Graphs can also be represented as matrices. Once a graph is built, it can then be searched. A* is a popular best-first search algorithm which finds the minimum-cost path through a graph; it is an efficient search technique which has been applied to general manipulators and those with revolute joints. In order to apply a search such as A*, the configuration space must be discretised, which is most easily done with a grid. Using a grid requires consideration of the appropriate costs as well as of how many directions a path planner can travel in. Multi-resolution grids can be used, which repeatedly subdivide cells that are in contact with an obstacle and therefore reduce the computational complexity that is one of the main drawbacks of grid methods [114].
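A compact sketch of A* search on a 4-connected occupancy grid is given below. The grid, start and goal cells are illustrative, and the Manhattan-distance heuristic is chosen to match the 4-connected motion model.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid.
    grid[r][c] == 1 marks an obstacle cell; start/goal are (row, col)."""
    rows, cols = len(grid), len(grid[0])
    heuristic = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan
    open_set = [(heuristic(start, goal), 0, start, None)]
    came_from, g_score = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                      # already expanded with a better cost
        came_from[node] = parent
        if node == goal:                  # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_score.get((nr, nc), float("inf")):
                    g_score[(nr, nc)] = ng
                    heapq.heappush(open_set,
                                   (ng + heuristic((nr, nc), goal),
                                    ng, (nr, nc), node))
    return None  # no path found

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```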


Sampling methods sacrifice resolution-optimality but can find satisficing solutions quickly. These methods are split into two main groups: RRT for single-query and PRM for multiple-query planning. RRT is a sampling-based data structure that can quickly search high-dimensional spaces subject to both algebraic and differential constraints. It does this by biasing exploration in the state space to "pull" the tree towards unexplored areas [115]. PRM uses randomly generated free configurations of the robot, which are then connected and stored as a graph. This learning phase is followed by the query phase, in which a graph search is used to connect the start and goal configurations to nodes within the roadmap; segments of the path are then concatenated to find a full path for the robot. Difficulties found when querying can then be used to improve the roadmap [116].

Robotic path planning is an active research area; recent developments have utilised machine learning and deep neural networks to solve complex multi-objective planning problems [117]. The increasing number of planning algorithms is creating a need for greater capability to benchmark and compare algorithms on the basis of performance indicators such as computational efficiency, success rate, and optimality of the generated paths. Recently, the PathBench framework has been created, which allows comparison of traditional sampling- and graph-based algorithms with newer ML-based algorithms such as value iteration networks and long short-term memory (LSTM) networks [118]. This is achieved by using a simulator to test each algorithm, a generator and trainer component for the ML models, and an analyser component to generate statistical data on each trial. Research on the PathBench platform has shown that, at present, ML algorithms have longer path planning times than classical planning approaches [117].

Another key component of planning manipulation tasks is the ability to properly grasp an object. Research by Levine et al. [119] used a deep convolutional neural network to predict the chance of a successful grasp, and a continuous servoing mechanism to update the motor commands. The CNN was trained with data from over 80,000 grasp attempts. These were obtained with the same robot model; however, the individual robots are not identical, and so the differences provided a diverse dataset for the neural network to learn from. The proposed grasping method can find non-obvious grasping strategies and could be extended to a wider range of grasping strategies as the dataset increases.
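Returning to the sampling-based planners described above, the following is a minimal 2-D RRT sketch. The obstacle model, step size and bounds are illustrative assumptions rather than parameters used in the cited work.

```python
import random, math

def rrt(start, goal, is_free, bounds, step=0.5, iters=2000, goal_tol=0.5):
    """Minimal 2-D RRT. is_free(p) reports whether a point is collision-free;
    bounds = (xmin, xmax, ymin, ymax). Returns a path as a list of points."""
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (random.uniform(bounds[0], bounds[1]),
                  random.uniform(bounds[2], bounds[3]))
        # Nearest existing node, then extend one step towards the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:          # goal reached: backtrack
            path, k = [goal], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

# Example: free space is everything outside a disc-shaped obstacle.
free = lambda p: math.dist(p, (5.0, 5.0)) > 2.0
print(rrt((0.0, 0.0), (9.0, 9.0), free, bounds=(0, 10, 0, 10)))
```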

5.5 Vision and Perception

Sensors are a key component of robotics, needed for measuring physical properties such as position and velocity. They can be classified as either proprioceptive or exteroceptive, depending on whether they take measurements of the robot itself or of the environment. They can also be categorised according to energy output as either active or passive.


Commonly used sensors include LiDAR (Light Detection and Ranging), SONAR (Sound Navigation and Ranging), RADAR (Radio Detection and Ranging), RGB cameras, and RGB-D cameras. Processing sensor data can be a difficult task in robotics, as measurements are often noisy, can be intermittent, and must sometimes be taken indirectly. Many sensors provide a large amount of data which requires processing to extract useful components, for example for obstacle detection and object recognition.

One of the most commonly used forms of sensing is visual data from a camera. Light has many properties that can be measured, such as intensity and wavelength, and can interact with surfaces by different means, such as absorption and reflection. Using an image, the size, shape, and/or position of an object can be determined. Digital cameras use a light sensor to convert a projection of the 3D world into a 2D image, a process known as perspective projection. As images are collected through a lens, it is important to account for how the image is formed as it passes through the lens, for example using the thin lens equation to relate the distances between the object and the image [110]. Recently, deep neural networks have been used for a range of computer vision tasks such as object recognition and classification, depth map inference, and pose and motion estimation. This can be extended to visual servoing using direct visual servoing (DVS), which does not require feature extraction or tracking; using a deep neural network, the convergence domain can be increased to create a CNN-based visual servo, as in [120].

Object Recognition and Detection. Object recognition has many applications, such as position measurement, inspection, sorting, counting, and detection. The requirements of an object recognition task vary depending on the application and may include evaluation time, accuracy, recognition reliability, and invariance. Invariance can be with respect to illumination, scale, rotation, background clutter, partial occlusion, and viewpoint change [121]. In unstructured environments such as those encountered in nuclear decommissioning, it is likely that all these aspects will in some way affect the object recognition algorithm. A neural network can be used for image classification, outputting a probability for a given object; images can be converted to a standard array of pixels which are then used as input to the neural network [74]. Deformable part models (DPM) for discriminative training of classifiers have been shown to be efficient and accurate on difficult datasets [122]; this can also be implemented in C++ with OpenCV, which combines the DPM with a cascade algorithm to speed up detection. Research in [123] looked to exploit visual context to aid object recognition, for example by identifying the room an object is in and then using that information to help narrow down the possible objects, allowing recognition with less local stimulus.

Advances in deep learning have opened up new avenues of research in object detection. Some commonly used detectors include R-CNN, YOLO and SSD, which generally operate by localising objects in terms of bounding boxes [124]. Research in [125] looks to address the issues within the nuclear industry of detecting and categorising waste objects using an RGB-D camera. Common objects in decommissioning include PPE, tools, and pipes. These need to be detected, categorised, sorted, and segregated according to their radioactivity level.
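As a sketch of how a pretrained bounding-box detector of the kind listed above can be run, the code below loads a COCO-pretrained Faster R-CNN from torchvision (version 0.13 or later assumed) and filters confident detections. Detecting nuclear-specific waste categories would require fine-tuning on domain data, and the image file name is a placeholder.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector pretrained on COCO; waste-specific categories would
# require fine-tuning on domain data, which is not shown here.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("scene.jpg").convert("RGB")   # placeholder file name
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]

# Keep only confident detections; each is a bounding box, label id and score.
for box, label, score in zip(prediction["boxes"],
                             prediction["labels"],
                             prediction["scores"]):
    if score > 0.7:
        print(label.item(), score.item(), box.tolist())
```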


At present, DCNN methods are a good solution for object detection and recognition; however, they rely on large amounts of training data, which may be unavailable for new applications. Research at the University of Birmingham [125] looked to address this by using weakly-supervised deep learning for detection and recognition of common nuclear waste objects. Using minimally annotated data for initial training, the network was able to handle sparse examples and could be implemented in a real-time recognition pipeline for detecting and categorising unknown waste objects. Researchers at the University of Birmingham have also been able to extend 3D geometric reconstruction to allow semantic mapping and provide an understanding of features within scene contexts; this was achieved using a Pixel-Voxel network to process RGB images and point cloud data [126].

Pose Estimation. 6D object detection is the combination of object detection with 6D pose estimation. Within manufacturing there is demand for algorithms that can perform 6D object detection for tasks such as grasping and quality control. Task situations in manufacturing have the benefit of known CAD models, good cameras, and controlled environments in terms of lighting. In the nuclear industry this is often not the case, and algorithms are required that can cope with the lack of these factors along with other difficulties such as occlusions, lack of texture, unknown instances and colours, and difficult surface properties. A common approach to training 6D object detection algorithms is model-based training, which uses CAD models to generate augmented images; producing training images for geometric manipulations is straightforward, while generating images with variations in surface properties or projections can be more difficult. It is nevertheless more cost-effective and time-efficient to use model-based training when possible rather than collecting real images, which can be particularly problematic under varying environmental conditions [127].

5.6 Digital Twins in Nuclear Environment

The creation of digital twins has been identified by the UK's National Nuclear Laboratory (NNL) as an area that could be adapted for use in a nuclear environment. Digital twins could be utilised throughout the lifespan of a nuclear facility. However, while the technology may be available, implementing digital twins could be more difficult due to the stringent requirements for safety and security demanded by regulatory bodies. Despite this, the nuclear industry is in a good position to make better use of digital twin technology, having already built a strong system of documentation based on destructive and non-destructive testing and analysis of components and infrastructure [128]. Some progress in the development of digital twin solutions for the nuclear sector has been made by the Consortium for Advanced Simulation of Light Water Reactors (CASL) [129]. In [130], digital twins are identified as a possible technology to help the UK develop the next generation of nuclear plants, offering benefits including increased efficiency and improved safety analysis.


This would build on the Integrated Nuclear Digital Environment (INDE) proposed in [128]. Digital twins offer the opportunity to visualise and simulate work tasks in a virtual environment using up-to-date data, in order to plan work and improve efficiency and safety. Task simulation does not necessarily require a digital twin, and a digital twin need not be used for task simulation. The authors in [131] used Choreonoid to simulate tasks to be performed by remotely controlled robots; to achieve this, they developed and used several plug-ins for the software to emulate behaviour such as underwater and aerial operation, camera-view modification and disturbance, a gamma camera, and communication failure effects. In other research, a digital twin in a virtual environment is used to analyse scenarios involving a remote teleoperation system; a benefit of using the digital twin is the opportunity to test configuration changes, including in the development of a convolutional neural network [132]. In [133] the authors developed a digital environment within Gazebo to allow the simulation of ionising radiation, in order to study the effects of interactions with radioactive sources and how radiation detectors can be better developed. While this allowed some research into the optimisation of robotic activities in radioactive environments, the heavy computational burden of modelling complex radiation sources meant that some simplifications had to be made, such as point sources and the assumption of constant activity. Simulation is also often used in the development of control systems, to aid system design and operator training. A real-time simulator was developed in [134] that was verified using open-loop control experiments and subsequently applied to investigate the performance of trajectory tracking and pipe-cutting tasks.
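To illustrate the kind of simplified point-source radiation model mentioned above, the sketch below sums inverse-square contributions from a set of sources to score the exposure accumulated along a candidate robot path. The source positions and strengths are invented for the example and do not represent any real facility.

```python
import numpy as np

def dose_rate(position, sources):
    """Very simplified point-source model: intensity falls off with the
    inverse square of distance and the contributions of all sources are
    summed. Source strengths and positions are illustrative only."""
    total = 0.0
    for src_pos, strength in sources:
        r = np.linalg.norm(np.asarray(position) - np.asarray(src_pos))
        total += strength / (4.0 * np.pi * max(r, 1e-3) ** 2)
    return total

# Evaluate the field along a candidate robot path to compare route exposure.
sources = [((2.0, 3.0, 0.5), 5.0), ((8.0, 1.0, 0.5), 1.5)]
path = [(x, 0.0, 0.2) for x in np.linspace(0.0, 10.0, 11)]
exposure = sum(dose_rate(p, sources) for p in path)
print(f"accumulated relative exposure along path: {exposure:.3f}")
```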

6 Conclusions

This chapter has presented the background on decontamination and decommissioning tasks, along with a review of the current methods used in industry to perform D&D operations. A key goal for the industry is to develop more autonomous systems which can reduce the need for workers to enter dangerous radioactive environments or spend excessive time operating equipment. Such advances have applications and benefits in related industries and other hazardous environments. Building upon more advanced autonomous systems, the concept of the cyber-physical system was introduced and some of the progress made in utilising such systems in manufacturing as part of the Industry 4.0 concept was detailed. Finally, an overview of the enabling technologies, along with a concept framework for a nuclear decommissioning CPS, was developed, with attention to how developments in Industry 4.0 can be transferred for application in nuclear decommissioning activities.

Future work will focus on developing a virtual environment to allow the sharing of data between robots. This can be combined with improved methods to facilitate human-in-the-loop control, where data processing can be embedded within the system to provide insights and predictions to an operator, allowing for more efficient completion of tasks. In addition, fault-tolerant control schemes will be researched to allow for more robust systems that are able to handle the demands of a nuclear environment. Ongoing research will require collaboration with both research groups and industry to ensure results are feasible in relation to regulatory requirements.

References 1. NAO. (2022). The decommissioning of the AGR nuclear power stations. https://www.nao. org.uk/report/the-decommissioning-of-the-agr-nuclear-power-stations/. 2. Nuclear Decommissioning Authority. (2022). Nuclear Decommissioning Authority Annual Report and Account 2021/22. http://www.nda.gov.uk/documents/upload/Annual-Report-andAccounts-2010-2011.pdf. 3. NEA. (2014). R&D and Innovation Needs for Decommissioning Nuclear Facilities. https://www.oecd-nea.org/jcms/pl_14898/r-d-and-innovation-needs-for-decommission ing-nuclear-facilities. 4. Industry Radiological Protection Co-ordination Group. (2012). The application of ALARP to radiological risk, (IRPCG) Group. 5. Marturi, N., et al. (2017). Towards advanced robotic manipulations for nuclear decommissioning. In Robots operating in hazardous environments. https://doi.org/10.5772/intechopen. 69739. 6. Watson, S., Lennox, B., & Jones, J. (2020). Robots and autonomous systems for nuclear environments. 7. Sellafield Ltd. (2021). Future research and development requirements 2021 (pp. 1–32). 8. NDA. (2019). Integrated waste management radioactive waste strategy. https://www.gov.uk/ government/consultations/nda-radioactive-waste-management-strategy. 9. Bogue, R. (2015). Robots in the nuclear industry: a review of technologies and applications. 10. Montazeri, A., & Ekotuyo, J. (2016). Development of dynamic model of a 7DOF hydraulically actuated tele-operated robot for decommissioning applications. In Proceedings of American Control Conference (Vol. 2016-July, pp. 1209–1214). https://doi.org/10.1109/ACC.2016.752 5082. (Jul 2016). 11. Montazeri, A., West, C., Monk, S. D., & Taylor, C. J. (2017). Dynamic modelling and parameter estimation of a hydraulic robot manipulator using a multi-objective genetic algorithm. International Journal of Control, 90(4), 661–683. https://doi.org/10.1080/00207179.2016. 1230231. 12. West, C., Montazeri, A., Monk, S. D., & Taylor, C. J. (2016). A genetic algorithm approach for parameter optimization of a 7DOF robotic manipulator. IFAC-PapersOnLine, 49(12), 1261–1266. https://doi.org/10.1016/j.ifacol.2016.07.688. 13. West, C., Montazeri, A., Monk, S. D., Duda, D. & Taylor, C. J. (2017). A new approach to improve the parameter estimation accuracy in robotic manipulators using a multi-objective output error identification technique. In RO-MAN 2017-26th IEEE International Symposium on Robot and Human Interactive Communication, Dec. 2017 (Vol. 2017-Jan, pp. 1406–1411). https://doi.org/10.1109/ROMAN.2017.8172488. 14. Burrell, T., Montazeri, A., Monk, S., & Taylor, C. J. J. (2016). Feedback control—based inverse kinematics solvers for a nuclear decommissioning robot. IFAC-PapersOnLine, 49(21), 177–184. https://doi.org/10.1016/j.ifacol.2016.10.541. 15. Oveisi, A., Anderson, A., Nestorovi´c, T., Montazeri, A. (2018). Optimal input excitation design for nonparametric uncertainty quantification of multi-input multi-output systems (Vol. 51, no. 15, pp. 114–119). https://doi.org/10.1016/j.ifacol.2018.09.100.


16. Oveisi, A., Nestorovi´c, T., & Montazeri, A. (2018). Frequency domain subspace identification of multivariable dynamical systems for robust control design, vol. 51, no. 15, pp. 990–995. https://doi.org/10.1016/j.ifacol.2018.09.065. 17. West, C., Monk, S. D., Montazeri, A., & Taylor, C. J. (2018) A vision-based positioning system with inverse dead-zone control for dual-hydraulic manipulators. In 2018 UKACC 12th International Conference on Control, CONTROL 2018 (pp. 379–384). https://doi.org/ 10.1109/CONTROL.2018.8516734. (Oct, 2018). 18. West, C., Wilson, E. D., Clairon, Q., Monk, S., Montazeri, A., & Taylor, C. J. (2018). State-dependent parameter model identification for inverse dead-zone control of a hydraulic manipulator∗ . IFAC-PapersOnLine, 51(15), 126–131. https://doi.org/10.1016/j.ifacol.2018. 09.102. 19. Burrell, T., West, C., Monk, S. D., Montezeri, A., & Taylor, C. J. (2018). Towards a cooperative robotic system for autonomous pipe cutting in nuclear decommissioning. In 2018 UKACC 12th International Conference on Control, CONTROL 2018 (pp. 283–288). https://doi.org/ 10.1109/CONTROL.2018.8516841. (Oct 2018). 20. Nemati, H., & Montazeri, A. (2018). Analysis and design of a multi-channel time-varying sliding mode controller and its application in unmanned aerial vehicles. IFAC-PapersOnLine, 51(22), 244–249. https://doi.org/10.1016/j.ifacol.2018.11.549. 21. Nemati, H., & Montazeri, A. (2018). Design and development of a novel controller for robust attitude stabilisation of an unmanned air vehicle for nuclear environments. In 2018 UKACC 12th International Conference on Control (CONTROL) (pp. 373–378). https://doi.org/10. 1109/CONTROL.2018.8516729. 22. Nemati, H., Montazeri, A. (2019). Output feedback sliding mode control of quadcopter using IMU navigation. In Proceedings-2019 IEEE International Conference on Mechatronics, ICM 2019 (pp. 634–639). https://doi.org/10.1109/ICMECH.2019.8722899. (May 2019). 23. Nokhodberiz, N. S., Nemati, H., & Montazeri, A. (2019). Event-triggered based state estimation for autonomous operation of an aerial robotic vehicle. IFAC-PapersOnLine, 52(13), 2348–2353. https://doi.org/10.1016/j.ifacol.2019.11.557. 24. Lamb, F. (2013). Industrial automation hands-on. 25. Weyer, S., Schmitt, M., Ohmer, M., & Gorecky, D. (2015). Towards industry 4.0Standardization as the crucial challenge for highly modular, multi-vendor production systems. IFAC-PapersOnLine, 28(3), 579–584. https://doi.org/10.1016/j.ifacol.2015.06.143. 26. IAEA. (2004). The nuclear power industry’s ageing workforce : transfer of knowledge to the next generation (p. 101). (no. June). 27. Department for Business Energy and Industrial Strategy UK. 2022 Civil Nuclear Cyber Security Strategy. https://assets.publishing.service.gov.uk/government/uploads/system/ uploads/attachment_data/file/1075002/civil-nuclear-cyber-security-strategy-2022.pdf. (no. May, 2022). 28. Emptage, M., Loudon, D., Mcleod, R., Milburn, H., & Row, N. (2016). Characterisation: Challenges and opportunities–A UK perspective (pp. 1–10). 29. Euratom (2022) Cyber physicaL Equipment for unmAnned Nuclear DEcommissioning Measurements. Horizon 2020. Retrieved September 08, 2022, from https://cordis.europa. eu/project/id/945335. 30. OECD/NEA. (1999). Decontamination techniques used in decommissioning activities. In Nuclear Energy Agency (p. 51). 31. Aitken, J. M., et al. (2018). Autonomous nuclear waste management. IEEE Intelligent Systems, 33(6), 47–55. https://doi.org/10.1109/MIS.2018.111144814. 32. Euratom (2020) PREDIS. Horizon 2020. 
https://doi.org/10.3030/945098. 33. Smith, R., Cucco, E., & Fairbairn, C. (2020). Robotic development for the nuclear environment: Challenges and strategy. Robotics, 9(4), 1–16. https://doi.org/10.3390/robotics9 040094. 34. Vitanov, I., et al. (2021). A suite of robotic solutions for nuclear waste decommissioning. Robotics, 10(4), 1–20. https://doi.org/10.3390/robotics10040112.


35. Monk, S. D., Grievson, A., Bandala, M., West, C., Montazeri, A., & Taylor, C. J. (2021). Implementation and evaluation of a semi-autonomous hydraulic dual manipulator for cutting pipework in radiologically active environments. Robotics, 10(2). https://doi.org/10.3390/rob otics10020062. 36. Adjigble, M., Marturi, N., Ortenzi, V., Rajasekaran, V., Corke, P., & Stolkin, R. (2018). Model-free and learning-free grasping by Local Contact Moment matching. In IEEE International Conference on Intelligent Robots and Systems (pp. 2933–2940). https://doi.org/10. 1109/IROS.2018.8594226. 37. Tokatli, O., et al. (2021). Robot-assisted glovebox teleoperation for nuclear industry. Robotics, 10(3). https://doi.org/10.3390/robotics10030085. 38. Jang, I., Carrasco, J., Weightman, A., & Lennox, B. (2019). Intuitive bare-hand teleoperation of a robotic manipulator using virtual reality and leap motion. In TAROS 2019 (pp. 283–294). London: Springer. 39. Sayed, M. E., Roberts, J. O., & Donaldson, K. (2022). Modular robots for enabling operations in unstructured extreme environments. Advanced Intelligent Systems. https://doi.org/10.1002/ aisy.202000227. 40. Cerba, Š, Lüley, J., Vrban, B., Osuský, F., & Neˇcas, V. (2020). Unmanned radiation-monitoring system. IEEE Transactions on Nuclear Science, 67(4), 636–643. https://doi.org/10.1109/TNS. 2020.2970782. 41. Tsitsimpelis, I., Taylor, C. J., Lennox, B., & Joyce, M. J. (2019). A review of ground-based robotic systems for the characterization of nuclear environments. Progress in Nuclear Energy, 111, 109–124. https://doi.org/10.1016/j.pnucene.2018.10.023. (no. Oct, 2018). 42. Groves, K., Hernandez, E., West, A., Wright, T., & Lennox, B. (2021). Robotic exploration of an unknown nuclear environment using radiation informed autonomous navigation. Robotics, 10(2), 1–15. https://doi.org/10.3390/robotics10020078. 43. Groves, K., West, A., Gornicki, K., Watson, S., Carrasco, J., & Lennox, B. (2019). MallARD: An autonomous aquatic surface vehicle for inspection and monitoring of wet nuclear storage facilities. Robotics, 8(2). https://doi.org/10.3390/ROBOTICS8020047. 44. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and CyberneticsPart A: Systems and Humans, 30(3), 286–297. https://doi.org/10.1109/3468.844354. 45. Gamer, T., Hoernicke, M., Kloepper, B., Bauer, R., & Isaksson, A. J. (2020). The autonomous industrial plant–future of process engineering, operations and maintenance. Journal of Process Control, 88, 101–110. https://doi.org/10.1016/j.jprocont.2020.01.012. 46. Luckcuck, M., Fisher, M., Dennis, L., Frost, S., White, A., & Styles, D. (2021). Principles for the development and assurance of autonomous systems for safe use in hazardous environments. https://doi.org/10.5281/zenodo.5012322. 47. Blum, C., Winfield, A. F. T., & Hafner, V. V. (2018). Simulation-based internal models for safer robots. Frontiers in Robotics and AI, 4. https://doi.org/10.3389/frobt.2017.00074. (no. Jan, 2018). 48. Lee, E. A. (2008). Cyber physical systems: Design challenges. In Proceedings-11th IEEE Symposium Object/Component/Service-Oriented Real-Time Distributed Computing ISORC 2008, (pp. 363–369). https://doi.org/10.1109/ISORC.2008.25. 49. NIST. (2017). Framework for Cyber-Physical Systems: Volume 1, Overview NIST Special Publication 1500–201 Framework for Cyber-Physical Systems: Volume 1, Overview. https:// nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1500-201.pdf. 
50. Wang, L., Törngren, M., & Onori, M. (2015). Current status and advancement of cyber-physical systems in manufacturing. Journal of Manufacturing Systems, 37, 517–527. https://doi.org/10.1016/j.jmsy.2015.04.008.
51. Lee, J., Bagheri, B., & Kao, H. A. (2015). A cyber-physical systems architecture for Industry 4.0-based manufacturing systems. Manufacturing Letters, 3, 18–23. https://doi.org/10.1016/j.mfglet.2014.12.001.
52. Pivoto, D. G. S., de Almeida, L. F. F., da Rosa Righi, R., Rodrigues, J. J. P. C., Lugli, A. B., & Alberti, A. M. (2021). Cyber-physical systems architectures for industrial internet of things applications in Industry 4.0: A literature review. Journal of Manufacturing Systems, 58, 176–192. https://doi.org/10.1016/j.jmsy.2020.11.017.
53. Sisinni, E., Saifullah, A., Han, S., Jennehag, U., & Gidlund, M. (2018). Industrial internet of things: Challenges, opportunities, and directions. IEEE Transactions on Industrial Informatics, 14(11), 4724–4734. https://doi.org/10.1109/TII.2018.2852491.
54. Aceto, G., Persico, V., & Pescapé, A. (2019). A survey on information and communication technologies for Industry 4.0: State-of-the-art, taxonomies, perspectives, and challenges. IEEE Communications Surveys and Tutorials, 21(4), 3467–3501.
55. Luo, R. C., & Kuo, C. W. (2016). Intelligent seven-DoF robot with dynamic obstacle avoidance and 3-D object recognition for industrial cyber-physical systems in manufacturing automation. Proceedings of the IEEE, 104(5), 1102–1113. https://doi.org/10.1109/JPROC.2015.2508598.
56. Yaacoub, J. P. A., Salman, O., Noura, H. N., Kaaniche, N., Chehab, A., & Malli, M. (2020). Cyber-physical systems security: Limitations, issues and future trends. Microprocessors and Microsystems, 77. https://doi.org/10.1016/j.micpro.2020.103201.
57. Wollschlaeger, M., Sauter, T., & Jasperneite, J. (2017). The future of industrial communication. IEEE Industrial Electronics Magazine, pp. 17–27.
58. Krishnamurthi, R., Kumar, A., Gopinathan, D., Nayyar, A., & Qureshi, B. (2020). An overview of IoT sensor data processing, fusion, and analysis techniques. Sensors, 20(21), 1–23. https://doi.org/10.3390/s20216076.
59. Simoens, P., Dragone, M., & Saffiotti, A. (2018). The internet of robotic things: A review of concept, added value and applications. International Journal of Advanced Robotic Systems, 15(1), 1–11. https://doi.org/10.1177/1729881418759424.
60. Mukherjee, M., Shu, L., & Wang, D. (2018). Survey of fog computing: Fundamental, network applications, and research challenges. IEEE Communications Surveys and Tutorials, 20(3), 1826–1857. https://doi.org/10.1109/COMST.2018.2814571.
61. Qiu, T., Chi, J., Zhou, X., Ning, Z., Atiquzzaman, M., & Wu, D. O. (2020). Edge computing in industrial internet of things: Architecture, advances and challenges. IEEE Communications Surveys and Tutorials, 22(4), 2462–2488. https://doi.org/10.1109/COMST.2020.3009103.
62. Kehoe, B., Patil, S., Abbeel, P., & Goldberg, K. (2015). A survey of research on cloud robotics and automation. IEEE Transactions on Automation Science and Engineering, 12(2), 398–409. https://doi.org/10.1109/TASE.2014.2376492.
63. Chaari, I., Koubaa, A., Qureshi, B., Youssef, H., Severino, R., & Tovar, E. (2018). On the robot path planning using cloud computing for large grid maps. In 18th IEEE International Conference on Autonomous Robot Systems and Competitions, ICARSC 2018 (pp. 225–230). https://doi.org/10.1109/ICARSC.2018.8374187.
64. Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209. https://doi.org/10.1007/s11036-013-0489-0.
65. Tao, F., Qi, Q., Wang, L., & Nee, A. Y. C. (2019). Digital twins and cyber-physical systems toward smart manufacturing and Industry 4.0: Correlation and comparison. Engineering, 5(4), 653–661. https://doi.org/10.1016/j.eng.2019.01.014.
66. Upadhyay, H., Lagos, L., Joshi, S., & Abrahao, A. (2018). Big data framework with machine learning for D&D applications.
67. Glaessgen, E. H., & Stargel, D. S. (2012). The digital twin paradigm for future NASA and U.S. Air Force vehicles. In 53rd Structures, Structural Dynamics, and Materials Conference: Special Session on the Digital Twin (pp. 1–14).
68. Minerva, R., Lee, G. M., & Crespi, N. (2020). Digital twin in the IoT context: A survey on technical features, scenarios, and architectural models. Proceedings of the IEEE, 108(10), 1785–1824. https://doi.org/10.1109/JPROC.2020.2998530.
69. Fuller, A., Fan, Z., Day, C., & Barlow, C. (2020). Digital twin: Enabling technologies, challenges and open research. IEEE Access, 8, 108952–108971. https://doi.org/10.1109/ACCESS.2020.2998358.
70. Mathworks (2021). Digital twins for predictive maintenance. https://explore.mathworks.com/digital-twins-for-predictive-maintenance.


71. Weiss, G. (1999). Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, (Vol. 3, no. 2). http://books.google.com/books?hl=nl&lr=&id=JYcznFCN3xcC& pgis=1. 72. Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach. Prentice Hall. 73. Alpaydın, E. (2010). Introduction to machine learning second edition. MIT Press. https://doi. org/10.1007/978-1-62703-748-8_7. 74. Goodfellow, I., Bengio, Y., & Courville, A. (2012) Deep learning. 75. Li, Y., et al. (2022) A review on interaction control for contact robots through intent detection. Progress in Biomedical Engineering, 4(3). https://doi.org/10.1088/2516-1091/ac8193. 76. Ganesh, G., Takagi, A., Osu, R., Yoshioka, T., Kawato, M., & Burdet, E. (2014). Two is better than one: Physical interactions improve motor performance in humans. Science and Reports, 4(1), 3824. https://doi.org/10.1038/srep03824. 77. Takagi, A., Ganesh, G., Yoshioka, T., Kawato, M., & Burdet, E. (2017). Physically interacting individuals estimate the partner’s goal to enhance their movements. Nature Human Behaviour, 1(3), 54. https://doi.org/10.1038/s41562-017-0054. 78. Li, Y., Eden, J., Carboni, G., & Burdet, E. (2020). Improving tracking through human-robot sensory augmentation. IEEE Robotics and Automation Letters, 5(3), 4399–4406. https://doi. org/10.1109/LRA.2020.2998715. 79. Ba¸sar, T., & Olsder, G. J. (1998). Dynamic noncooperative game theory (2nd ed.). Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611971132. 80. Nilsson, N. (1969). A mobile Automaton. An application of artificial intelligence techniques. 81. Brooks, R. A. (1986). A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1), 14–23. https://doi.org/10.1109/JRA.1986.1087032. 82. Siciliano, B., & Khatib, O. (2012). Handbook of robotics. https://link.springer.com/book/. https://doi.org/10.1007/978-3-319-32552-1. 83. Albus, J., et al. (2002). 4D/RCS version 2.0: A reference model architecture for unmanned vehicle systems. NIST Interagency/Internal Report (NISTIR), National Institute of Standards and Technology, Gaithersburg, MD. https://doi.org/10.6028/NIST.IR.6910. 84. Mataric, M. J. (2008). The robotics primer. MIT Press. https://doi.org/10.5860/choice.453222. 85. Di Buono, A., Cockbain, N., Green, P., & Lennox, B. (2021). Wireless communications in nuclear decommissioning environments. In UK-RAS Conference: Robots Working For and Among us Proceedings (Vol. 1, pp. 71–73). https://doi.org/10.31256/ukras17.23. 86. Spong, M. W. (2022). An historical perspective on the control of robotic manipulators. Annual Review of Control, Robotics, and Autonomous Systems, 5(1). https://doi.org/10.1146/annurevcontrol-042920-094829. 87. Slotine, J.-J. E., & Li, W. (2011). Applied nonlinear control. Prentice Hall. 88. Craig, J. J., Hsu, P., & Sastry, S. S. (1987). Adaptive control of mechanical manipulators. The International Journal of Robotics Research, 6(2), 16–28. https://doi.org/10.1177/027836498 700600202. 89. Shousong, H., & Qixin, Z. (2003). Stochastic optimal control and analysis of stability of networked control systems with long delay. Automatica, 39(11), 1877–1884. https://doi.org/ 10.1016/S0005-1098(03)00196-1. 90. Huang, D., & Nguang, S. K. (2008). State feedback control of uncertain networked control systems with random time delays. IEEE Transactions on Automatic Control, 53(3), 829–834. https://doi.org/10.1109/TAC.2008.919571. 91. Shi, Y., & Yu, B. (2009). 
Output feedback stabilization of networked control systems with random delays modeled by Markov chains. IEEE Transactions on Automatic Control, 54(7), 1668–1674. https://doi.org/10.1109/TAC.2009.2020638. 92. Hokayem, P. F., & Spong, M. W. (2006). Bilateral teleoperation: An historical survey. Automatica, 42(12), 2035–2057. https://doi.org/10.1016/j.automatica.2006.06.027. 93. Bemporad, A. (1998). Predictive control of teleoperated constrained systems with unbounded communication delays. In Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171), 1998 (Vol. 2, pp. 2133–2138). https://doi.org/10.1109/CDC.1998. 758651.


94. Guo, K., Su, H., & Yang, C. (2022) A small opening workspace control strategy for redundant manipulator based on RCM method. IEEE Transactions on Control Systems Technology, 1–9. https://doi.org/10.1109/TCST.2022.3145645. 95. Walsh, G. C., Ye, H., & Bushnell, L. G. (2002). Stability analysis of networked control systems. IEEE Transactions on Control Systems Technology, 10(3), 438–446. https://doi.org/10.1109/ 87.998034. 96. Tipsuwan, Y., & Chow, M.-Y. (2003). Control methodologies in networked control systems. Control Engineering Practice, 11, 1099–1111. https://doi.org/10.1016/S0967-0661(03)000 36-4. 97. Yue, D., Han, Q.-L., & Lam, J. (2005). Network-based robust H∞ control of systems with uncertainty. Automatica, 41(6), 999–1007. https://doi.org/10.1016/j.automatica.2004.12.011. 98. Zhang, X.-M., Han, Q.-L., & Zhang, B.-L. (2017). An overview and deep investigation on sampled-data-based event-triggered control and filtering for networked systems. IEEE Transactions on Industrial Informatics, 13(1), 4–16. https://doi.org/10.1109/TII.2016.260 7150. 99. Pasqualetti, F., Member, S., Dör, F., Member, S., & Bullo, F. (2013). Attack detection and identification in cyber-physical systems. Attack Detection and Identification in Cyber-Physical Systems, 58(11), 2715–2729. 100. Dolk, V. S., Tesi, P., De Persis, C., & Heemels, W. P. M. H. (2017). Event-triggered control systems under denial-of-service attacks. IEEE Transactions on Control of Network Systems., 4(1), 93–105. https://doi.org/10.1109/TCNS.2016.2613445. 101. Ding, D., Han, Q.-L., Xiang, Y., Ge, X., & Zhang, X.-M. (2018). A survey on security control and attack detection for industrial cyber-physical systems. Neurocomputing, 275(C), 1674–1683. https://doi.org/10.1016/j.neucom.2017.10.009. 102. Yue, D., Tian, E., & Han, Q.-L. (2013). A delay system method for designing event-triggered controllers of networked control systems. IEEE Transactions on Automatic Control, 58(2), 475–481. https://doi.org/10.1109/TAC.2012.2206694. 103. Wu, L., Gao, Y., Liu, J., & Li, H. (2017). Event-triggered sliding mode control of stochastic systems via output feedback. Automatica, 82, 79–92. https://doi.org/10.1016/j.automatica. 2017.04.032. 104. Li, X.-M., Zhou, Q., Li, P., Li, H., & Lu, R. (2020). Event-triggered consensus control for multi-agent systems against false data-injection attacks. IEEE Transactions on Cybernetics, 50(5), 1856–1866. https://doi.org/10.1109/TCYB.2019.2937951. 105. Zhang, L., Liang, H., Sun, Y., & Ahn, C. K. (2021). Adaptive event-triggered fault detection scheme for semi-markovian jump systems with output quantization. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(4), 2370–2381. https://doi.org/10.1109/TSMC. 2019.2912846. 106. Huo, X., Karimi, H. R., Zhao, X., Wang, B., & Zong, G. (2022). Adaptive-critic design for decentralized event-triggered control of constrained nonlinear interconnected systems within an identifier-critic framework. IEEE Transactions on Cybernetics, 52(8), 7478–7491. https:// doi.org/10.1109/TCYB.2020.3037321. 107. Dao, H. V., Tran, D. T., & Ahn, K. K. (2021). Active fault tolerant control system design for hydraulic manipulator with internal leakage faults based on disturbance observer and online adaptive identification. IEEE Access, 9, 23850–23862. https://doi.org/10.1109/ACC ESS.2021.3053596. 108. Yu, X., & Jiang, J. (2015). A survey of fault-tolerant controllers based on safety-related issues. Annual Reviews in Control, 39, 46–57. https://doi.org/10.1016/j.arcontrol.2015.03.004. 109. 
Freddi, A., Longhi, S., Monteriù, A., Ortenzi, D., & Proietti Pagnotta, D. (2019). Fault tolerant control scheme for robotic manipulators affected by torque faults. IFAC-PapersOnLine, 51(24), 886–893. https://doi.org/10.1016/j.ifacol.2018.09.680. 110. Corke, P. (2016). Robotics, vision and control (2nd ed.). Springer. 111. Brock, O., Kuffner, J., & Xiao, J. (2012) Robotic motion planning. In Springer handbook of robotics. Springer.


112. Marturi, N., et al. (2017). Towards advanced robotic manipulation for nuclear decommissioning: A pilot study on tele-operation and autonomy. In International Conference on. Robotics and Automation for Humanitarian Applications RAHA 2016-Conference Proceedings. https://doi.org/10.1109/RAHA.2016.7931866. 113. Spong, M. W., Hutchinson, S., & Vidyasgar, M. (2004). Robot dynamics and control. 114. Lozano-PéRez, T. (1987). A simple motion-planning algorithm for general robot manipulators. IEEE Journal of Robotics and Automation, 3(3), 224–238. https://doi.org/10.1109/JRA. 1987.1087095. 115. Lavalle, S., & Kuffner, J. (2000). Rapidly-exploring random trees: Progress and prospects. Algorithmic Computational Robotics. (New Dir.). 116. Kavraki, L. E., Švestka, P., Latombe, J. C., & Overmars, M. H. (1996). Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4), 566–580. https://doi.org/10.1109/70.508439. 117. Hsueh, H.-Y., et al. (2022). Systematic comparison of path planning algorithms using PathBench (pp. 1–23). http://arxiv.org/abs/2203.03092. 118. Guo, N., Li, C., Gao, T., Liu, G., Li, Y., & Wang, D. (2021). A fusion method of local path planning for mobile robots based on LSTM neural network and reinforcement learning. Mathematical Problems in Engineering, 2021. https://doi.org/10.1155/2021/5524232. 119. Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J., & Quillen, D. (2018). Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. International Journal of Robotics Research, 37(4–5), 421–436. https://doi.org/10.1177/027836491 7710318. 120. Bateux, Q., et al. (2018). Training deep neural networks for visual servoing. In ICRA 2018IEEE International Conference on Robotics and Automation, 2018 (pp. 3307–3314). 121. Treiber, M. (2013). An introduction to object recognition selected algorithms for a wide variety of applications. Springer. 122. Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9). https://doi.org/10.1109/MC.2014.42. 123. Torralba, A., Murphy, K. P., Freeman, W. T., & Rubin, M. A. (2003). Context-based vision system for place and object recognition. In Proceedings of the IEEE International Conference on Computer Vision (Vol. 1, pp. 273–280). https://doi.org/10.1109/iccv.2003.1238354. 124. Zakharov, S., Shugurov, I., & Ilic, S. (2019) DPOD: 6D pose object detector and refiner. In Proceedings of the IEEE International Conference on Computer Vision, (Vol. 2019 Oct, pp. 1941–1950). https://doi.org/10.1109/ICCV.2019.00203. 125. Sun, L., Zhao, C., & Yan, Z. (2019). A novel weakly-supervised approach for RGB-D-based nuclear waste object detection (Vol. 19, no. 9, pp. 3487–3500). 126. Zhao, C., Sun, L., Purkait, P., Duckett, T., & Stolkin, R. (2018). Dense RGB-D semantic mapping with pixel-voxel neural network. Sensors (Switzerland), 18(9). https://doi.org/10. 3390/s18093099. 127. Gorschlüter, F., Rojtberg, P., & Pöllabauer, T. (2022). A Survey of 6D object detection based on 3D models for industrial applications. Journal of Imaging, 8(3), 1–18. https://doi.org/10. 3390/jimaging8030053. 128. Patterson, E. A., Taylor, R. J., & Bankhead, M. (2016). A framework for an integrated nuclear digital environment. Progress in Nuclear Energy, 87, 97–103. https://doi.org/10.1016/j.pnu cene.2015.11.009. 129. Lu, R. 
Y., Karoutas, Z., & Sham, T. L. (2011). CASL virtual reactor predictive simulation: Grid-to-rod fretting wear. JOM Journal of the Minerals Metals and Materials Society, 63(8), 53–58. https://doi.org/10.1007/s11837-011-0139-6. 130. Bowman, D., Dwyer, L., Levers, A., Patterson, E. A., Purdie, S., & Vikhorev, K. (2022) A unified approach to digital twin architecture–Proof-of-concept activity in the nuclear sector. IEEE Access, 1–1. https://doi.org/10.1109/access.2022.3161626. 131. Kawabata, K., & Suzuki, K. (2019) Development of a robot simulator for remote operations for nuclear decommissioning. In 2019 16th Int. Conf. Ubiquitous Robot. UR 2019 (pp. 501–504). https://doi.org/10.1109/URAI.2019.8768640.


132. Partiksha, & Kattepur, A. (2022). Robotic tele-operation performance analysis via digital twin simulations (pp. 415–417). https://doi.org/10.1109/comsnets53615.2022.9668555. 133. Wright, T., West, A., Licata, M., Hawes, N., & Lennox, B. (2021). Simulating ionising radiation in gazebo for robotic nuclear inspection challenges. Robotics, 10(3), 1–27. https://doi. org/10.3390/robotics10030086. 134. Kim, M., Lee, S. U., & Kim, S. S. (2021). Real-time simulator of a six degree-of-freedom hydraulic manipulator for pipe-cutting applications. IEEE Access, 9, 153371–153381. https:// doi.org/10.1109/ACCESS.2021.3127502.

Deep Learning and Robotics, Surgical Robot Applications Muhammad Shahid Iqbal, Rashid Abbasi, Waqas Ahmad, and Fouzia Sher Akbar

Abstract Surgical robots can perform difficult tasks that humans cannot. They can perform repetitive tasks, work with hazardous materials, and handle difficult objects. This has helped businesses save time and money while also preventing numerous accidents. The use of surgical robots, also known as robot-assisted surgery, allows medical professionals to perform a wide range of complex procedures with greater accuracy, adaptability, and control than traditional methods. Minimally invasive surgery, which is frequently associated with robotic surgery, is performed through small incisions; robotic assistance is also used in some traditional open surgical procedures. This chapter discusses advanced robotic surgical systems and deep learning (DL). The purpose of this chapter is to provide an overview of the major issues in artificial intelligence (AI), including how they apply to and limit surgical robots. Each surgical system is thoroughly explained in the chapter, along with its most recent AI-based improvements. Case studies are provided with information on recent advancements and on the role of DL, and future surgical robotics applications in ophthalmology are also thoroughly discussed. The new ideas, comparisons, and updates on surgical robotics and deep learning are all summarized in this chapter. Keywords Robotics · Deep learning · Surgical robot · Application of surgical robot · Modern trends in surgical robotics

M. S. Iqbal (B) · F. S. Akbar
Department of Computer Science and Information Technology, Women University of AJ&K, Bagh, Pakistan
e-mail: [email protected]
R. Abbasi
Anhui Polytechnic University Hefei, Wuhu, China
W. Ahmad
Higher Education Department Govt, AJ&K, Mirpur 10250, Pakistan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_6


1 Introduction

The development, construction, and use of mechanical robots are the subject of robotics, an interdisciplinary field of science and engineering. This chapter provides a thorough understanding of robotic technology, including the various robot types and how they are used in industry [1, 2]. One barrier to robots mimicking humans is a lack of proprioception—a sense of awareness of muscles and body parts—a kind of "intuition" that is essential to how humans coordinate movement. Roboticists have been able to give robots the sense of sight through cameras, the senses of smell and taste through chemical sensors, and hearing through microphones, but they have struggled to give robots this "intuition" about their own bodies. Now, using tactile materials and machine learning algorithms, progress is being made. In one case, randomly positioned sensors detect contact and pressure and send the data to a machine learning algorithm that interprets the signals. In another example, roboticists are attempting to develop a robotic arm that is essentially as capable as a human arm and that can grasp an assortment of objects. Until recent developments, the process involved either individually training a robot to perform each task or giving a machine learning algorithm a huge dataset of experience to learn from. Robert Kwiatkowski and Hod Lipson of Columbia University are working on "task-agnostic self-modeling machines." Like an infant in its first year of life, the robot starts with no knowledge of its own body or of the physics of motion. As it repeats thousands of movements, it observes the outcomes and builds a model of them. The machine learning algorithm is then used to help the robot plan future movements in light of its earlier motion. In this way, the robot learns to interpret its own actions. A team of researchers at the USC Viterbi School of Engineering believe they are the first to develop an AI-controlled robotic limb that can recover from falling without being explicitly programmed to do so; this is groundbreaking work that shows robots learning by doing. Artificial intelligence empowers modern robotics: AI and machine learning help robots see, walk, talk, smell, and move in increasingly human-like ways [3–13]. In this chapter, we compare the automated task performance of a convolutional neural network-based surgical robot with that of other robots and surgical robots, as well as with the industry standard of expert manual assembly. Different convolutional neural network designs can be obtained by changing the number of feature maps and the encoding and decoding layers, and an analysis is carried out to determine how performance is affected by these architectural design parameters. The chapter describes each surgical system in detail, as well as its most recent advancements through the use of AI. Future surgical robotics applications in ophthalmology are thoroughly discussed, with case studies provided alongside recent progress and the role of DL. This chapter summarises the new concepts and comparisons, as well as updates on surgical robotics and deep learning. Figure 1 shows the PRISMA diagram of this chapter.


Fig. 1 Detailed overview of the subtopics of this chapter

2 Related Work

Surgical robots have been available for some time, and robotic medicine has made significant progress over the past decade or so by collecting data and experimenting with unusual situations. Robotic and laparoscopic procedures are known to have less impact on patients because they can complete tasks with minimal invasion [14–19]. A traditional open surgical procedure requires a large, carefully prepared surgical site when an operation is performed on an internal organ. Securing a field of view comes at a cost, because the intricate entanglement of human organs demands it. This is how laparoscopic surgery, with the insertion of an endoscope to obtain a field of view, came about: the abdominal cavity could then be examined and treated. It was also promoted as a more precise and meticulous method of medical treatment. Advances in imaging and visualisation, contact force sensing, and control have made tissue palpation at the


controller possible [20]. The affected area is less damaged by robotic surgery, and the patient's faster recovery reduces the time lost from daily activities [21–23]. Because this kind of work is done by controlling a robot arm, it is difficult for the surgeon to interact with the patient directly, and completing the task requires a great deal of skill and careful attention. On the operating table, the patient is surrounded by numerous machines, while the operating surgeon occupies the robot's console, well away from the surgical site. The surgeon simultaneously controls and monitors the console unit and uses visual information to understand how the robotic arm behaves during surgery [24, 25]. This is closely tied to how well robotic surgery works. In addition, surgeons report that robotic surgery demands greater caution than open surgery because it relies more heavily on visual cues, since the affected area cannot be observed directly. When the surgical site is narrowed and the camera view is inevitably narrowed as well, the surgeon receives even less information [26, 27]. This is a major drawback of surgical robots. A gas-filled working space is created in the abdomen to ensure a clear view, and surgical instruments are then inserted. A camera is also inserted to show the surgeon the state of the abdominal cavity. The pinch-type master controller of the surgical robot enables extremely precise operation, but it is difficult to use and requires a great deal of skill. 3D imaging has recently been developed and integrated into these systems to address surgeons' concerns [26, 27]. Combining images from various vantage points can improve the quality of the information provided, as demonstrated in this example. However, for surgeons who perform direct operations, it remains difficult to improve the visual information. Because the operation is carried out away from the patient, the method of operation can also lead to poor judgment in an emergency. Surgeons can use a variety of simulations to improve their accuracy and become more accustomed to performing operations, and several subtask devices have recently been developed to provide surgeons with a virtual practice area prior to robotic surgery [28]. The da Vinci Research Kit, developed at Johns Hopkins, is the most well-known tool: surgeons can practice by creating a setting from materials that resemble human tissue. Because this kind of surgery is based on what the camera sees in the area being treated, manual dexterity is still essential even with this equipment. To improve the success rate of surgery performed with such a limited view, additional sensory transmission is required. A haptic feedback system is still missing from even the most widely used da Vinci robot [29, 30]. Therefore, if RMIS can provide the surgeon with tactile sensation data in real time during surgery, this issue will be partially resolved. In robotic surgery, it is anticipated that the proposed haptic system will speed up decision-making and enhance surgical quality and accuracy [31–37]. Surgeons who need to operate with great dexterity may require haptic systems [31, 36, 37]. When the surgeon has access to the patient's real-time data, they are able to make decisions quickly and precisely. The internal condition of the human body can vary greatly: tissue stiffness may differ from that of the surrounding area if a tumor has not yet been found, but unless the deeply concealed area is touched directly, the issue might not be apparent. Tactile feedback can help with some of these issues. Using a single tactile feedback device against a variety of human body tissues, it


ought to be possible to adjust the tactile perception of various organs and tissues in real time. Numerous tactile transmission devices have been investigated with these factors in mind. The vibration feedback system is the most widely used tactile feedback system, as previously mentioned [38, 39]: the intensity of the vibration can convey the tactile sensation. However, it is more often used to issue a warning signal in response to external stimuli. It is well known that a piezoelectric-based vibration feedback system can also function as a tactile device. According to numerous sources, the various human organs and tissues are viscoelastic. To ensure high surgical quality and safety, a tactile device with properties comparable or identical to those of human tissues and organs ought to be used, so as to provide surgeons with more precise information about viscoelastic properties. However, because of the time delay introduced by the viscous effect, implementing viscoelastic properties via a vibration feedback system is extremely challenging; such a system is therefore unsuitable as the tactile device of a surgical robot console. Piezoelectric technology is another option [40]. Depending on how it is arranged, this method provides tactile sensation and can succeed given a sufficient range of forces [40]. However, using simple force alone to convey the state of body tissue is inadequate; a method that can simultaneously express all of the viscoelastic properties of the human body is more suitable. To incorporate viscoelastic properties, a pneumatic tactile transmission device has been proposed [41], but because the gas is compressible, pneumatic pressure cannot express the condition of incompressible tissues. In RMIS (robot-assisted minimally invasive surgery), numerous tactile devices made of magnetorheological (MR) materials have recently been proposed to address these points [42–49]. The development of a haptic master with MR materials has been the subject of numerous studies [42, 50], and a method for delivering haptic information directly to the surgeon's hand has been proposed in the form of the MR tactile cell device [51–56]. Because a magnetorheological tactile device can alter its yield stress by varying the intensity of the magnetic field, a single sample can be used to represent the organ characteristics of various human tissues.
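To make the viscoelasticity discussion above concrete, tissue behaviour is often idealised as a spring and a damper acting in parallel (a Kelvin–Voigt element). The short Python sketch below is a minimal illustration of rendering a reaction force from such a model; the stiffness and damping values are assumed for the example and are not taken from the devices cited above.

import numpy as np

def kelvin_voigt_force(displacement, velocity, stiffness, damping):
    """Reaction force of a Kelvin-Voigt element (spring and damper in parallel).

    displacement, velocity : tissue indentation (m) and its rate (m/s)
    stiffness              : elastic constant E (N/m), assumed per tissue type
    damping                : viscous constant eta (N*s/m), assumed per tissue type
    """
    return stiffness * displacement + damping * velocity

# Example: force profile for a 1 cm indentation ramp applied over 0.5 s
t = np.linspace(0.0, 0.5, 500)
x = 0.01 * (t / t[-1])                 # indentation depth (m)
v = np.gradient(x, t)                  # indentation rate (m/s)
f_soft = kelvin_voigt_force(x, v, stiffness=200.0, damping=5.0)     # softer tissue
f_stiff = kelvin_voigt_force(x, v, stiffness=1500.0, damping=20.0)  # stiffer tissue
print(f_soft[-1], f_stiff[-1])

Such a parallel spring-damper element is the simplest model that captures both the elastic and the rate-dependent (viscous) part of the response, which is precisely the part that pure vibration feedback struggles to reproduce.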

3 Machine Learning and Surgical Robots

A crucial component of any surgeon's training is accurate and impartial performance evaluation. Nevertheless, despite the abundance of innovation available to the modern surgeon, surgeons continue to track their performance using relatively basic metrics such as operative duration, postoperative outcomes, and complication rates. These metrics do not accurately capture the surgeon's performance during the operation itself. It is also difficult to track performance consistently because trainer feedback on intraoperative performance is often unstructured and inconsistent. The search for more systematic and objective methods of evaluating intraoperative performance is nothing new. The Objective Structured Assessment of Technical Skills


(OSATS) is one of many rating scales that expert raters can use to evaluate surgeons in a variety of areas, such as efficiency, tissue handling, and operative flow [57]. These scales have also been adapted for robotic platforms [58, 59], laparoscopic procedures [60], and specific research fields [61, 62]. Despite their widespread use in academic research, they are rarely utilized in clinical settings, because they require a professional reviewer, are prone to rater bias, and demand a great deal of time and effort. These issues might be addressed by putting ML to use. Machine learning (ML) is the scientific field that focuses on how computers learn from data. Once trained or constructed empirically, an ML system can quickly generate automated, reproducible feedback without the assistance of professional reviewers, and it can easily process the vast amount of data available from the modern operating room. Thanks to the ever-increasing availability of computational power, machine learning is being utilized in a variety of medical fields, including surgery. Postoperative mortality risk prediction [63], autonomous performance of simple tasks [64], and surgical workflow analysis [65] are just a few of the many surgical applications of ML and artificial intelligence (AI). The widespread use of ML has led to the development of the field of surgical data science, which aims to improve the value and quality of surgery through data collection, organization, analysis, and modeling [63, 66–68]. Over the past ten years, the use of ML in the assessment of surgical skill has increased rapidly; however, the extent to which ML can be used to evaluate surgical performance is still unclear. Hidden Markov models (HMMs), support vector machines (SVMs), and artificial neural networks (ANNs) have been the most commonly used ML techniques for evaluating surgical performance. These three techniques also mirror the research trends in this area, which initially emphasized HMMs before moving on to SVM methods and, more recently, ANNs and deep learning.
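As a hedged illustration of what such feature-based skill classification could look like in practice, the following sketch trains a support vector machine on simple summary statistics of instrument-tip motion. The feature choices, the synthetic trajectories, and the novice/expert labels are assumptions made for this example, not details of the studies cited above.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def kinematic_features(trial):
    """Summarise one trial of instrument-tip positions (N x 3 array, metres) into
    motion-efficiency features: path length, mean speed, speed variability and
    the number of local speed peaks (a rough smoothness proxy)."""
    steps = np.diff(trial, axis=0)
    speed = np.linalg.norm(steps, axis=1)
    peaks = np.sum((speed[1:-1] > speed[:-2]) & (speed[1:-1] > speed[2:]))
    return np.array([speed.sum(), speed.mean(), speed.std(), peaks])

# Hypothetical dataset: 40 trials, each a random-walk trajectory,
# labelled 0 = novice, 1 = expert (placeholder labels only)
rng = np.random.default_rng(0)
trials = [rng.normal(scale=0.01, size=(500, 3)).cumsum(axis=0) for _ in range(40)]
labels = np.array([0] * 20 + [1] * 20)

X = np.stack([kinematic_features(t) for t in trials])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
print(cross_val_score(clf, X, labels, cv=5).mean())  # near chance on random data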

4 Robotics and Deep Learning

When humans and animals engage in object-manipulation behaviours, the interaction inherently involves a fast feedback loop between perception and action. Even complex manipulation tasks, such as extracting a single object from a cluttered bin, can be performed without first localising the objects or planning the scene, relying instead on continuous sensing from touch and vision. In contrast, robotic manipulation often (though not always) relies more heavily on advance planning and analysis, with relatively simple feedback, such as trajectory following, to guarantee stability during execution. Part of the reason for this is that incorporating complex sensory inputs, such as vision, directly into a feedback controller is very difficult. Techniques such as visual servoing perform continuous feedback on visual features, but typically require the features to be specified manually, and both open-loop perception and feedback (for example via visual servoing)


require manual or automatic calibration to determine the precise geometric relationship between the camera and the robot's end-effector. In the work discussed here, the authors propose a learning-based approach to hand-eye coordination for robotic grasping. The approach is data-driven and goal-driven: the method learns to servo a robotic gripper towards poses that are likely to produce successful grasps, with end-to-end training directly from image pixels to task-space gripper motion. By continuously re-computing the most promising motor commands, the method continuously integrates sensory cues from the environment, allowing it to react to perturbations and adjust the grasp to maximise the probability of success [69, 70]. Moreover, the motor commands are issued in the frame of the robot's base, which is not known to the model at test time. This means the model does not require the camera to be precisely calibrated with respect to the end-effector, but instead uses visual cues to determine the spatial relationship between the gripper and the graspable objects in the scene. The aim in designing and evaluating this approach is to understand how well a grasping system can be learned entirely from scratch, with minimal prior knowledge or manual engineering. The method consists of two parts: a grasp success predictor, which uses a deep convolutional neural network (CNN) to determine how likely a given motion is to produce a successful grasp, and a continuous servoing mechanism that uses the CNN to update the robot's motor commands continuously. By continuously choosing the best predicted path to a successful grasp, the servoing mechanism provides the robot with fast feedback against perturbations and object motion, as well as robustness to inaccurate actuation. The principal contributions of this work are: a method for learning continuous visual servoing for robotic grasping from monocular cameras, a novel convolutional neural network architecture for learning to predict the outcome of a grasp attempt, and a large-scale data collection framework for robotic grasps. The authors also present a broad experimental evaluation aimed at establishing the efficacy of this strategy, determining its data requirements, and examining the possibility of reusing grasping data across different types of robots. The work builds on an earlier conference paper by performing a second experimental evaluation on another robotic platform and evaluating transfer learning using data from two different robots [71]. Two large-scale experiments were conducted on two separate robotic platforms. In the core set of experiments, the grasp prediction CNN was trained on a dataset of around 800,000 grasp attempts, collected using a set of 7-degree-of-freedom robotic arms. Although the hardware parameters of each robot were initially identical, each unit experienced different wear and tear over the course of data collection, interacted with different objects, and used slightly different camera poses relative to the robot base. The variety of objects provides a diverse dataset for learning grasp strategies, while the variability in camera poses provides a variety of conditions for learning continuous hand-eye coordination for the grasping task. The first experiment was aimed at assessing the effectiveness


of the proposed method, as well as comparing it with baselines and prior techniques. The dataset used in these experiments is available for download at https://sites.google.com/website/brainrobotdata/home. The second set of experiments was aimed at assessing whether grasping data collected by one type of robot can be used to improve the grasping capability of a different robot. In these experiments, the authors collected more than 900,000 additional grasp attempts using a different robotic manipulator, with a considerably larger assortment of objects. This second robotic platform was used to test whether combining data from multiple robots results in better overall grasping capability. The experiments showed that the convolutional neural network grasping controller achieves a high success rate when grasping in clutter over a wide range of objects. The authors collected 800,000 grasp attempts to train the CNN grasp prediction model, using objects that are large, small, hard, soft, deformable, and transparent. Supplemental videos of the grasping system showed that the robot uses continuous feedback to constantly adjust its grasp, compensating for motion of the objects and inaccurate actuation commands. The authors also compare the approach with open-loop variants to show the importance of continuous feedback, as well as with a hand-engineered grasping baseline that uses manual hand-to-eye calibration and depth sensing; the learned method achieves the highest success rates in these comparisons. Finally, the authors show how data collected for two different types of robots can be combined, and how data from one robot can be used to improve the grasping capability of another [71, 72]. Robotic grasping is one of the most widely investigated areas of manipulation. While a complete survey of grasping is outside the scope of this work, the reader is referred to standard surveys on the subject for a more complete treatment. Broadly, grasping techniques can be categorised as geometrically driven and data-driven. Geometric methods analyse the shape of a target object and plan an appropriate grasp pose based on criteria such as force closure or caging; these techniques typically need to understand the geometry of the scene, using depth or stereo sensors and matching of previously scanned models to observations. The approach discussed here is most closely related to recent work on self-supervised learning of grasp poses by Pinto and Gupta, as well as earlier work on learning from autonomous trial and error, which proposed to learn a network that predicts the optimal grasp orientation for a given image patch, trained with self-supervised data collected using a heuristic grasping system based on object proposals. In contrast to this prior work, the approach achieves continuous hand-eye coordination for grasping by observing the gripper and choosing the best motor command to move the gripper toward a successful grasp, instead of making open-loop predictions. Since the method uses no human annotations, a large real-world dataset can also be collected fully autonomously [73, 74].
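The following sketch is written in the spirit of the two-component approach described above (a grasp-success CNN plus a continuous servoing loop), but it is a toy illustration rather than the authors' actual network, data pipeline, or optimiser: a small CNN scores candidate gripper motions against the current camera image, and a cross-entropy-method search picks the motor command for the next control cycle.

import torch
import torch.nn as nn

class GraspSuccessCNN(nn.Module):
    """Toy grasp-success predictor: scores an image together with a candidate
    end-effector motion (a simplified stand-in for the network described above)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 + 4, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, image, motion):
        # motion: (batch, 4) candidate gripper displacement + rotation
        return torch.sigmoid(self.head(torch.cat([self.features(image), motion], dim=1)))

def servo_step(model, image, n_samples=64, n_elite=6, n_iters=3):
    """Choose a motor command by cross-entropy-method search over the predictor,
    mimicking one cycle of the continuous servoing loop described in the text."""
    mean, std = torch.zeros(4), torch.ones(4)
    for _ in range(n_iters):
        candidates = mean + std * torch.randn(n_samples, 4)
        scores = model(image.expand(n_samples, -1, -1, -1), candidates).squeeze(1)
        elite = candidates[scores.topk(n_elite).indices]
        mean, std = elite.mean(0), elite.std(0) + 1e-3
    return mean  # best predicted command for this control cycle

model = GraspSuccessCNN()
frame = torch.rand(1, 3, 64, 64)   # placeholder monocular camera image
print(servo_step(model, frame))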
Since the method makes considerably weaker assumptions about the available human supervision (none) and the available sensing (only over-the-shoulder RGB), direct comparisons of grasp success rates with values reported in earlier


work are unrealistic. The set of objects used for evaluation includes extremely difficult items, such as transparent bottles, small round objects, deformable objects, and clutter. A mismatch in object difficulty between this work and earlier studies further complicates direct comparison of reported accuracy. The aim of the work is therefore not to establish which system is best, since such comparisons are impossible without standardised benchmarks, but rather to examine how far a grasping method based entirely on learning from raw, autonomously collected data can scale to complex and diverse grasping scenarios [75]. Another area related to this technique is robotic reaching, which deals with coordination and feedback for reaching motions, and visual servoing, which addresses moving a camera or end-effector to a desired pose using visual feedback. In contrast to the approach described here, visual servoing methods are normally concerned with reaching a pose relative to objects in the scene, and often (though not always) rely on manually designed or specified features for feedback control. Photometric visual servoing uses a target image instead of features, and several visual servoing methods have been proposed that do not directly require prior calibration between the robot and the camera. Some recent visual servoing methods have also employed learning and computer vision techniques. To the authors' knowledge, no prior learning-based method had been proposed that uses visual servoing to move directly into a pose that maximises the probability of success on a given task (such as grasping) [76]. To predict the motor commands that maximise grasp success, the authors use convolutional neural networks (CNNs) trained on grasp success prediction. Although the technology behind CNNs has been known for decades, they have recently made remarkable progress on a wide range of challenging computer vision benchmarks, becoming the de facto standard for computer vision systems. Nevertheless, applications of CNNs to robotic control problems have been less prevalent than applications to passive perception tasks such as object recognition, localisation, and segmentation. Several works have proposed using CNNs for deep reinforcement learning applications, including playing video games, executing simple task-space motions for visual servoing, controlling simple simulated robotic systems, and performing a variety of robotic manipulation tasks. Many of these applications have been in simple or synthetic domains, and all of them have focused on relatively constrained environments with small datasets [77].

5 Surgical Robots and Deep Learning

After dominating the market with the remarkable da Vinci system for many years, Intuitive Surgical is now finally up against international companies that are vying for market share with their own iterations of cutting-edge robots [78].


These systems will typically feature open consoles, lighter equipment, and increased mobility. Even interest in automation, not seen in nearly 30 years, has been reignited. The STAR robot can suture soft tissue more effectively than a human hand, without human intervention: to perform a gastrointestinal anastomosis in a pig, it combined three-dimensional imaging and sensors (near-infrared fluorescent/NIRF markers) with the concept of supervised autonomous suturing [63]. The Revo-I, a Korean robot, recently completed its first clinical trials, including Retzius-sparing robot-assisted radical prostatectomy (RARP). Even in the hands of skilled practitioners, three patients required blood transfusion, and the positive margin rate was 23% [79]; this is a commendable example of honest reporting. The new devices may be able to bring the cost of a robotic procedure close to that of laparoscopy, even though the underlying equipment cost may still be substantial. The UK's Cambridge Medical Robotics plans to present newer costing models that cover maintenance, instruments, and even assistance as a whole package in addition to the actual equipment. This may attract high-volume open and laparoscopic surgeons in the East to robotics and encourage multidisciplinary development. For instance, lower costs could support greater acceptance of robotic surgery in eastern India, where prostate cancer is rare but aggressive in those who develop it. According to data from the Vattikuti Foundation, there are currently 60 da Vinci cases in India, with urologists making up about 50% of the users and RARP being the most popular procedure. In a review of a recent series of RARPs from Kolkata, 90% continence and a biochemical recurrence-free survival of 75% at 5 years were found in cases of mostly high-risk prostate cancer. While effective multidisciplinary teamwork will reduce costs, it is almost certain that Markov modelling will be used to determine the medium-term cost-effectiveness of robotic surgery in the developing world. The two developments in the field of new robots that are stirring up excitement are artificial intelligence (AI) and faster digital communication, even though cost may dominate the headlines. The era of surgical AI has begun, even though the concept is not new and can be traced back to Alan Turing, a genius whose code-breaking skills had a significant impact on the outcome of World War II. However trendy it may sound, AI is probably going to be the main force behind the digitisation of surgical practice. Artificial intelligence is the umbrella term for a family of intricate computer programs designed to achieve a goal by making decisions; with examples such as visual discrimination, speech recognition, and language translation, it is comparable to human intelligence in this respect. A subset of AI called machine learning (ML) uses adaptive computer algorithms to understand and respond to specific data. By determining, for example, whether a particular image represents a prostate cancer, a prostate-detection algorithm might enable the machine to reduce the variability in radiologists' interpretations of magnetic resonance imaging. Modern machine learning systems have been transformed by artificial neural networks—specifically deep learning—together with graphics processing units and virtually unlimited data storage, making implementations faster, cheaper, and more powerful than ever before. Video recordings of surgeons performing RARP can now be converted into Automated Performance Metrics through a "black box", and they reveal


astounding findings, such as the observation that the most productive surgeons are not necessarily those who achieve the best results [80]. Through the use of robotic technology, surgical intervention is intended to become a more dependable, safe, and minimally invasive process [81, 82]. New developments are moving in the direction of fully autonomous robotic surgeons and robot-assisted systems. The surgical system used most frequently to date is the da Vinci robot; through remote-controlled laparoscopic surgery in gynaecology, urology, and general surgery, it has already demonstrated its effectiveness [81]. The data available at the surgical console of a robot-assisted surgical system contain crucial details for intraoperative guidance that can support the decision-making process. Typically, this information is presented as 2D images or videos showing surgical tools and human tissues. Understanding these details, which also includes pose estimation of surgical instruments within the surgical scene, is a complex problem. Semantic segmentation of the instruments in the surgical console view is a fundamental component of this process. Semantic segmentation of robotic instruments is a challenging task due to the complexity and dynamic nature of the background tissues, lighting changes such as shadows and specular reflections, and visual obstructions such as blood and camera lens fogging. Segmentation masks can make a significant contribution to instrument navigation systems. This creates a compelling need for the development of accurate and robust computer vision techniques for the semantic segmentation of surgical instruments from operative images and video. Numerous vision-based techniques have been developed for robotic instrument detection and tracking [82]. Instrument-background segmentation can be viewed as a binary or instance segmentation problem, and classical machine learning algorithms have been applied to it, utilising both texture and colour features [83, 84]. Later applications found a solution in semantic segmentation, which refers to the recognition of different instruments or their components [85, 86]. Deep learning-based approaches have recently demonstrated performance improvements over conventional machine learning methods for several biomedical problems [87, 88]. Convolutional neural networks have been successfully used in the field of medical imaging for a variety of purposes, including the analysis of breast cancer histology images [89], the prediction of bone disease [90], age estimation [91], and others [87]. Applications of deep learning-based robotic instrument segmentation have previously shown solid performance in binary segmentation [92, 93] and promising outcomes in multi-class segmentation [94]. Deep neural network variants suitable for deployment on fixed and mobile devices, such as clinical robots, are beginning to emerge [95]. The authors of the paper discussed here offer a deep learning-based approach to the semantic segmentation of robotic instruments that produces state-of-the-art results in both a two-class and a multi-class setting. Using this method, they created a solution for the Robotic Instrument Segmentation MICCAI 2017 Endoscopic Vision Sub-Challenge [96]; this solution placed first in the binary and multi-class instrument segmentation sub-tasks and second in the instrument-parts segmentation sub-task.
Here, the authors describe the details of the solution, which is based on a modified U-Net model [97]. They also present further improvements to this solution using other modern deep architectures.
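As a rough, self-contained illustration of a U-Net-style binary instrument-segmentation model (a toy network, not the challenge-winning architecture referenced above), the sketch below pairs a small encoder–decoder with a soft Dice loss, a common choice for overlap-based segmentation training.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Minimal U-Net-style encoder-decoder with one skip connection."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)              # 16 skip + 16 upsampled channels
        self.out = nn.Conv2d(16, 1, 1)              # 1 channel: instrument vs background

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return torch.sigmoid(self.out(d1))

def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice overlap between predicted and ground-truth masks."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

model = TinyUNet()
frame = torch.rand(2, 3, 128, 128)                  # endoscopic frames (placeholder)
mask = (torch.rand(2, 1, 128, 128) > 0.5).float()   # placeholder ground-truth masks
print(soft_dice_loss(model(frame), mask).item())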


6 Current Innovation in Surgical Robotics

Minimally invasive surgery (MIS) has recently gained popularity for procedures such as oncological surgery, colorectal surgery, and general surgery [98]. Natural Orifice Transluminal Endoscopic Surgery (NOTES), with robotic assistance, has the greatest potential and dependability of any technique for performing tasks inside the peritoneal cavity without abdominal incisions. Phee et al. created a master–slave flexible endoscopic surgical robot [99] to significantly enhance the dexterity of surgeons within the peritoneal cavity. A flexible endoscope delivers the slave robot arms to the precise locations desired, while the master robot moves in accordance with instructions from a surgeon at the proximal end. Although the strength, flexibility, and consistency of robot-assisted NOTES improved rapidly over time, the absence of precise haptic feedback remained a critical flaw; as a result, surgeons rely heavily on experience and visual information to make decisions [100]. Numerous studies [31, 101–103] have shown that providing doctors with haptic feedback not only significantly shortens the time spent in the operating room during the procedure, but also reduces instances of excessive or insufficient force transmission, thereby limiting tissue damage. Although Omega.7, CyberForce, and CyberGrasp [104] are among the well-known haptic devices available on the market, the force data linking the surgical robots and the manipulated objects is missing from the loop. Tendon-sheath mechanisms (TSMs) have been widely used for motion and force transmission in robotic systems for NOTES because of their high flexibility and controllability along constrained and convoluted paths. TSMs are, however, frequently affected by issues such as backlash, hysteresis, and nonlinear tension loss caused by the friction between the tendon and the surrounding sheath, and it is therefore challenging to obtain precise haptic feedback using these systems. At the distal end of a surgical robot, various sensors based on working principles such as displacement [105], current [106], pressure [107], resistance [108], capacitance [109], vibration [110], and optical properties [111] can be mounted to measure the interaction force directly for haptic feedback. However, these sensors are typically limited by the inability to sterilise them, the harsh environment during the procedure, the lack of mounting space at the distal end, issues with the associated wires and fittings, and other factors. Consequently, extensive efforts have been made to model the force transmission of TSM-driven robots mathematically, so that the force at the robot's distal end can be calculated from measurements at its proximal end. Kaneko et al. conducted research on the tension transmission in TSMs [112] based on the Coulomb friction model. Lampaert, Pitkowski, and others [113, 114] proposed the Dahl, LuGre, and Leuven models as alternatives to the Coulomb model in an effort to refine the modelling procedure. However, even as these modelling techniques became more accurate, they began to exhibit inconsistencies between different hysteresis phases and were unable to accurately describe the contact force when the system was operating at zero velocity. In a subsequent development of clinical robots,


the nonlinear friction in TSMs was subsequently modelled using the Bouc-Wen model [115]. Backlash compensation is a common strategy in Bowden-cable control to reduce hysteresis [116, 117]. Do et al. [118, 119] suggested an improved Bouc-Wen model with dynamic properties that uses velocity and acceleration data to describe the friction profile considerably more accurately. It is worth noting that springs have frequently been used in the literature to mimic the response of tissue. In fact, in order to perform force prediction effectively on a haptic device, the nonlinear properties of tissue need to be taken into consideration. Wang et al. [120] used the Voigt, Kelvin, and Hunt-Crossley models, among other approaches, to model the tissue force response for TSMs in NOTES under constant-speed conditions. For each viscoelastic model to be constructed, a number of challenging parameters must be carefully identified, under the assumptions that the tendon tension is sufficiently large to prevent any slack in the system and that the types of interaction with tissue are constrained; this is necessary in order to predict the distal force in TSMs accurately using mathematical models. In robotic control problems where robots derive policies directly from images, neural networks have demonstrated strong empirical results [100, 121]. Learning control policies with convolutional features suggests that these features may also possess additional properties related to the underlying dynamical system. Dynamical systems theory is the impetus for the authors' investigation of methods for integrating the transition-state model, with significant emphasis on segmentation. The authors acknowledge that accurate segmentation requires an appropriate selection of visual features, and that segmentation is an essential first step in many robot learning applications.
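To make the friction-modelling discussion above concrete, the sketch below integrates a generic Bouc-Wen hysteresis element over a cyclic tendon displacement; the parameter values are illustrative assumptions rather than values identified in the cited studies.

import numpy as np

def bouc_wen_force(x, dt, k=1.0, alpha=0.5, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Integrate a generic Bouc-Wen hysteresis model over a displacement history x.

    Restoring force: F = alpha*k*x + (1 - alpha)*k*z, where the hysteretic state z
    evolves as dz/dt = A*dx/dt - beta*|dx/dt|*|z|**(n-1)*z - gamma*(dx/dt)*|z|**n.
    All parameter values here are illustrative, not identified from experiments.
    """
    z = np.zeros_like(x)
    for i in range(1, len(x)):
        dx = (x[i] - x[i - 1]) / dt
        dz = A * dx - beta * abs(dx) * abs(z[i - 1]) ** (n - 1) * z[i - 1] \
             - gamma * dx * abs(z[i - 1]) ** n
        z[i] = z[i - 1] + dz * dt
    return alpha * k * x + (1 - alpha) * k * z

# Example: cyclic proximal tendon displacement -> hysteretic force loop
t = np.linspace(0.0, 4.0 * np.pi, 2000)
dt = t[1] - t[0]
displacement = 0.01 * np.sin(t)          # metres of proximal tendon motion
force = bouc_wen_force(displacement, dt)
print(force.min(), force.max())

Plotting force against displacement would show the characteristic hysteresis loop that simpler Coulomb-type friction models cannot reproduce, which is the behaviour the improved models cited above aim to capture.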

7 Limitations of Surgical Robots

Instrumentation: According to a significant portion of the studies considered in this review, the absence of instrumentation designed specifically for microsurgery severely limits the capabilities of current robotic surgical systems. The majority of the published articles examined the viability of robotic microsurgery performed with the da Vinci surgical robot. Even though this particular system is approved for seven different types of minimally invasive surgery, it is not recommended or intended to be used for open plastic and reconstructive microsurgery. Most of the instruments compatible with the da Vinci are thought to be too large to handle the delicate tissue frequently encountered during microsurgery. Using the da Vinci's Black Diamond Micro Forceps, nerves and small vessels can be operated on successfully. However, handling submillimeter tissue and equipment is time-consuming and difficult because of the absence of a comprehensive set of appropriate microsurgical instruments. Compared with traditional microsurgery, using a surgical robot makes routine microsurgical procedures such as dissecting blood vessels, applying vessel clamps, and handling fine sutures more difficult. The surgical toolkit does not completely cover the variety of tissues encountered during microsurgery, and the instruments are also large. Several operations


involving the upper or lower limbs necessitate manipulating a variety of tissues, including bone, blood vessels, and skin. At present, no robotic system offers all the tools needed to work on different kinds of tissue. Consequently, surgical robotics cannot be used exclusively for procedures involving both soft and hard tissues. Additionally, robotics are ineffective in reconstructive surgery due to the extensive use of both microscopic and macroscopic techniques, so switching between traditional and robotic-assisted microsurgery is challenging and time-consuming. During microsurgery, the right optical aids to magnify and visualise the surgical field are just as essential as the right surgical instruments for working on delicate tissue. The da Vinci surgical robot provides an endoscopic 3D imaging system with a digital zoom that can magnify up to ten times. Unfortunately, the da Vinci's image quality and magnification are below those of surgical microscopes. The use of this system is limited because microsurgical procedures sometimes require more magnification than the da Vinci can provide; it is preferable to use surgical microscopes or other imaging systems that can provide sufficient optical magnification while still maintaining high image quality. Owing to the lack of microsurgical instruments and robotic platforms designed specifically for plastic and reconstructive microsurgery, the theoretical potential of surgical robots cannot yet be realized. Tactile feedback: The lack of tactile feedback during operations has been cited numerous times by medical professionals as a disadvantage of surgical robotics. The absence of haptic feedback in current surgical robots is cited as a limitation in just 17 of the reviewed papers. However, it is debatable whether this is a disadvantage. Even though some are unhappy that there is no haptic feedback, others say that tactile feedback is optional and can be replaced with something else. It has been demonstrated that visual feedback during microsurgery can reliably compensate for this deficit, despite the fact that the capacity to sense the forces applied to delicate tissue may initially appear to be crucial. In addition, some contend that the forces involved in microsurgery are too weak to be felt by humans and should not be relied upon. Although surgical robots do not require tactile feedback, it can still be beneficial. Soft tissue may deform during manipulation, but the rigid instruments used in the procedure do not. Numerous clinical studies have demonstrated that when the needle is handled by two robotic arms, the absence of tactile feedback can result in needle bending. If implemented in such a way that the surgeon's forces are scaled, tactile feedback may also be beneficial: a surgeon can feel and evaluate artificially amplified forces, potentially minimizing unnecessary trauma to delicate tissue. However, extensive testing is required to determine whether tactile feedback can lower the risk of soft tissue damage, and future studies could better investigate the differences in tissue trauma between traditional manual microsurgery and robotic microsurgery with tactile feedback. High price: In the literature, the cost of purchasing, using, and maintaining surgical robotic systems has been a frequent topic. Because a single surgical robot can cost more than $2 million, purchasing one requires a significant financial commitment.
Running the system and providing a safe environment for robot-assisted surgery come with additional direct and indirect costs. The cost of a procedure’s consumables


can range from $1,800 to $4,600 per instrument. Training resources must be allocated to familiarize operating-room personnel with surgical robots, and more staff are required to guarantee these systems' dependability outside the operating room. Because of their intrinsic complexity, the repair and maintenance of surgical robots require specialised knowledge. As a result, hospitals that use surgical robots have to negotiate service agreements with the manufacturers, which results in a 10% increase in the system's annual cost. The use of surgical robots is becoming less appealing because of the rising costs brought on by the increased demands placed on personnel and supplies. If costly treatment options are linked to improved outcomes or increased revenue over time, hospitals may benefit; however, only a small amount of evidence suggests that plastic and reconstructive microsurgery is such a case. There are currently few reasons to spend a lot of money on surgical robots. The operating time for robotic-assisted microsurgery is also longer than that of traditional microsurgery, according to published data. As a result, waiting times may be longer and the number of patients who can be treated may be reduced. The claimed paradoxical cost savings from shorter hospital stays and fewer post-operative complications do not yet outweigh the investment required to justify the use of surgical robots in plastic and reconstructive microsurgery. Very few plastic surgery departments will currently be willing to invest in robotic-assisted surgery unless patient turnover and cost efficiency both rise. In surgical training, the apprenticeship model is frequently used, in which trainees first observe a skilled professional before becoming more involved in procedures. Typically, surgical robots permit only a single surgeon to complete the procedure and operate the entire system. As a result, assistants rarely have the opportunity to participate in robotically assisted tasks. Surgeons' exposure to surgical robotics and opportunities to improve their skills may be limited if they are not actively involved. This problem can be solved by switching surgeons in the middle of the procedure or by using two or more complete surgical robotic systems. Even though switching between different users is quick, clinical outcomes may be jeopardized if critical steps are delayed. It might be safer to train new surgeons with multiple surgical robots: trainees can learn the skills necessary for robotic microsurgery while also providing the lead surgeon with an assistant who can help during the procedure. However, considering that each robotic system can cost more than $2 million, it is difficult to justify purchasing one solely for training purposes. Last but not least, it is important to recognize that surgical robots should not replace traditional microsurgery; rather, they should be seen as an additional tool. The skills required for each type of microsurgery are very different. Because of the very different movements and handling of delicate tissue, the skills required to use a surgical robot successfully in these circumstances cannot be applied directly to conventional microsurgery. For future surgeons to be able to deal with the many different problems that will arise during their careers, they will need training in both conventional and robotic-assisted microsurgery. Therefore, surgical training ought to incorporate both traditional and robotically assisted surgical experience.


Procedure flow: According to research, traditional manual microsurgery may be preferable to robotic surgery in some instances. Willems and his colleagues demonstrated that traditional surgery is quicker than robotic-assisted microsurgery when there is sufficient access to the surgical field. Only by reviewing patients prior to any treatment and planning procedures in advance can the best treatment plan be developed. Because there will always be some degree of uncertainty, it is challenging to predict which procedures will provide good surgical access and which will not. Consequently, in order to achieve the desired outcomes, surgeons may need to switch between robotic and conventional surgery during a procedure. It is absolutely possible to transition during a procedure; however, this is a laborious and time-consuming process that requires the operating room staff to be knowledgeable about surgical robots. Costs could rise if this procedure is put off for too long. In addition, complications may be more likely in situations that extend surgical and anesthetic procedures. Surgical robots must be able to accommodate uncertainty during microsurgery and facilitate a seamless and quick transition between conventional and robotic microsurgery in order to maximize surgical workflow [122].

8 Future Directions of Surgical Robots

RAS systems and supplies ought to begin to become more affordable as a result of market competition in laparoscopic robot-assisted surgery, and laparoscopic RAS should become cheaper as a result. RAS ought to be used more frequently for laparoscopic procedures because of the benefits it provides to the patient and the cost savings it offers. Laparoscopic RAS surgery will continue to become more affordable thanks to the economies of scale that result from lower costs for RAS systems, supplies, and maintenance [123]. Although the da Vinci continues to dominate the market for single-port laparoscopic RAS surgery, a few rival systems are already in the testing phase. The cost of single-port laparoscopic RAS surgery should fall, and its use should become more frequent, as these systems become available. Single-port laparoscopic RAS surgery is likely to become the technique of choice for both surgeons and patients because of the advantages of almost scar-free surgery and the decreasing costs. Hospitals that have purchased the da Vinci Xi system are likely to acquire single-port EndoWrist instruments in order to perform both single-port and multi-port laparoscopic surgery with the same RAS system. As single-port laparoscopic RAS systems become available in the operating room, we are likely to see an increase in the use of NOTES for truly scar-free procedures. Just as Intuitive Surgical introduced a dedicated single-port laparoscopic RAS system in the da Vinci SP [123], they will probably introduce instruments that the da Vinci SP can use for NOTES procedures, to compete with the new NOTES-specific systems on the market. Finally, both new RAS systems and upgrades to existing RAS systems are likely to include augmented reality as a standard feature. Surgeons will be able to overlay real-time endoscope camera feeds on top of elements of the operating room workspace


using augmented reality [53, 86]. This has been made possible by technological advances that can map features such as blood vessels, nerves, and even tumors and overlay their locations on the surgeon's display in real time [54–56, 80]. Overlaid medical images can also include images acquired earlier for diagnosis or intervention planning. By helping the surgeon locate the area of interest and avoid major blood vessels and nerves that could cause the patient problems after surgery, this will help the surgeon provide the safest and best care possible throughout the intervention. New surgical systems that improve either manipulation or imaging, two essential aspects of surgery, must be researched. Given the widespread adoption of these technologies, it seems inevitable that new and improved imaging will be developed; imaging advances must continue in order to keep pace with advances in robotic technology on the manipulation side [124]. The use of robotic surgery is still in its infancy. Equipment is incorporating new technologies to boost performance and cut down on downtime. Siemens employee Balasubramaniac asserts that digital twins and AI will improve future performance. The procedure can undoubtedly be recorded and analyzed in the future for educational and process-improvement purposes using digital twin technology, which requires keeping a minute-by-minute record of the process. There is considerable hope that robotic surgery will eventually improve precision, efficiency, and safety while potentially lowering healthcare costs. Additionally, it may facilitate access to specialists in difficult-to-reach locations. Santosh Kesari, M.D., Ph.D., co-founder and director of neuro-oncology at the Pacific Neuroscience Institute in Santa Monica, California, stated, "Access to surgical expertise is limited in many rural areas of the United States as well as in many parts of the world." It is anticipated that robotic-assisted surgical equipment will be utilized by a growing number of healthcare facilities for both in-person and remote procedures. The technology will keep developing and improving; the technology of the future will be more adaptable, portable, and based on AI. Additional robotic equipment, such as handheld devices, will be developed to accelerate telehealth and remote care. How quickly high-speed communication infrastructure is established will play a role in this. 5G will be useful thanks to its 20 Gbps peak data rate and 1 ms latency, but 6G is anticipated to be even better: with a latency of 0.1 ms, 6G's peak data rate ought theoretically to reach one terabit per second. However, speeds can vary significantly depending on the technology's application and location. Open Signal, a company that monitors 5G performance all over the world, reports that South Korea frequently takes the lead in achieving the fastest 5G performance, such as Ultra-Wideband download speeds of 988.37 Mbps; Verizon, on the other hand, recently achieved a peak speed of 1.13 Gbps. The position of the 5G antennas significantly affects the speed, and reaching peak performance once does not mean it can be sustained. 5G has a long way to go before it reaches 20 Gbps, even though it currently achieves around 1 Gbps. In conclusion, the medical field can benefit greatly from remote robotic-assisted surgery, and the advantages are numerous.
Ramp-up time will be affected by reliable communications systems and secure chips, as well as the capacity to monitor each component in the numerous interconnected systems that must cooperate for RAS to be successful.
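As a back-of-the-envelope illustration of why these link speeds matter for remote RAS, the sketch below estimates the transmission delay of a single compressed video frame at different data rates; the frame size, bit depth, and compression ratio are assumed values, not figures from the sources above. Propagation and processing delays add on top of this, which is why the sub-millisecond air-interface latencies quoted for 5G and 6G are only one part of the end-to-end budget.

def frame_delay_ms(width, height, bits_per_pixel, compression_ratio, link_rate_bps):
    """Transmission time (ms) of one compressed video frame over a given link."""
    frame_bits = width * height * bits_per_pixel / compression_ratio
    return 1e3 * frame_bits / link_rate_bps

# Assumed 4K endoscope frame, 24 bits/pixel, 50:1 compression
for label, rate in [("5G (1 Gbps measured)", 1e9),
                    ("5G (20 Gbps peak)", 20e9),
                    ("6G (1 Tbps theoretical)", 1e12)]:
    print(label, round(frame_delay_ms(3840, 2160, 24, 50, rate), 3), "ms per frame")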


9 Discussion

Several works have proposed using CNNs for deep reinforcement learning applications. Verb Surgical is a joint venture between Johnson and Johnson's medical device division Ethicon and Google's life sciences division Verily. It has recently designed its first digital surgery prototype, boasting leading-edge robotic capabilities and best-in-class medical device technology. Robotics, visualisation, advanced instrumentation, data analytics, and connectivity are its principal pillars. IBM's Watson also aspires to be an intelligent surgical assistant: it acts as a gateway to vast amounts of clinical information, using natural language processing to answer a surgeon's queries. It is currently being used to analyse electronic medical records and classify tumour characteristics with the aim of forming more personalised treatment plans. Surgery may be further democratised by low-latency, ultrafast 5G connectivity. The Internet of Skills could make remote robotic surgery, teaching, and mentorship readily available, regardless of the location of the expert surgeon [125]. In summary, the three watchwords for the future of robotic surgery are cost, data, and connectivity, and the effect of these advances on patient care is being watched with considerable interest. The authors intend to examine whether performance improves when the CNNs are trained with surgical images [85]. They will investigate how to extract consistent structure across inconsistent demonstrations, and find that some surgical performances contain loops, i.e., repetitive motions in which the surgeon repeats a subtask until it succeeds; combining these motions into a single primitive is an important goal. The next step is to apply this and future automated segmentation methods to skill assessment and policy learning. A potential next step of this work is to use the weighting-factor matrix in boosting techniques to train the unified state-estimation model more efficiently. Although modelled as a finite state machine (FSM), the fine-grained states within each surgical task are currently estimated independently, without influence from the previous state(s); another potential next step is to perform state prediction based on the previously estimated state sequence. In the future, the authors also plan to apply this state-estimation framework to applications such as smart assistance technologies and supervised autonomy for surgical subtasks. This study had several limitations. First, the proposed framework was applied to video sets from a training model and from patients with thyroid cancer who underwent BABA (bilateral axillo-breast approach) surgery; it is important to verify the adequacy of the proposed framework using other surgical techniques and surgical regions. Second, the authors could not directly compare the performance of the kinematics-based and the proposed image-based methods, since access to the da Vinci Research Interface is restricted, allowing most researchers to obtain only raw kinematic data [85]. Nevertheless, previous studies have reported that the kinematics method using the da Vinci robot had an error of around 4 mm [9]. Direct comparison of performance is difficult because the surgical images used


Direct comparison of performance is difficult because the surgical images used in the previous study and in this study differed. Nevertheless, the mean RMSE of the proposed image-based tracking algorithm was 3.52 mm, indicating that this method is more accurate than the kinematics method and that the latter cannot be described as superior. The performance of the current method could not be directly compared with previous vision-based approaches, because no comparable study has detected and tracked the tip trajectories of the surgical instruments (SIs). However, some studies have used deep learning-based detection methods to determine the bounding boxes of the SIs and to display the trajectories of the center points of these boxes [94, 95]. Nevertheless, because that approach cannot determine the specific locations of the SIs, it cannot be regarded as an exact tracking method. Comparison of the quantitative performance of the proposed method with other approaches is important, making it necessary to examine other SI tracking techniques. Third, because the SIs are detected on two-dimensional views, errors may occur due to the absence of depth information. Magnification errors were therefore minimized by measuring the width of the SIs in the view and converting pixels to millimeters. Nevertheless, methods are needed that exploit three-dimensional information based on stereoscopic matching of the left and right images during robotic surgery [10, 11]. Fourth, because the proposed method is a combination of several algorithms, longer videos can result in the accumulation of additional errors, degrading the performance of the system. It is therefore particularly important to train additional negative examples for the instance segmentation framework, which is the beginning of the pipeline. For example, gauze or tubes in the robotic surgery view can be mistaken for SIs (Supplementary Figure S4). Finally, because errors from re-identification in the tracking framework could significantly affect the ability to determine correct trajectories, accurate assessment of surgical skills requires manual correction of errors. Despite the progress in the present work, several limitations of deep learning models remain on the way toward competent online skill assessment. First, as confirmed by our results, the classification accuracy of supervised deep learning depends heavily on the labeled samples. The primary concern in this study lies with the JIGSAWS dataset and the lack of strict ground-truth labels of skill levels. It is important to mention that there is a lack of consensus on the ground-truth annotation of surgical skill. In the GRS-based labeling, skill labels were annotated based on a predefined cutoff threshold of GRS scores; however, no commonly accepted cutoff exists. For future work, a refined labeling approach with stronger ground-truth knowledge of surgeon skill may further improve the overall skill assessment [9, 10]. Second, the authors will pursue a detailed optimization of the deep architecture, parameter settings, and augmentation strategies to better handle motion time-series data and further improve the online performance. Likewise, the interpretability of the automatically learned representations is currently limited due to the black-box nature of deep learning models.
It would be interesting to investigate a visualization of the deep hierarchical representations in order to understand hidden skill patterns and thereby better justify the decisions made by a deep learning classifier.


At this point, the authors acknowledge that the major limitation of the deep architecture is its high computational cost. Running multiple deep neural networks simultaneously requires multiple processing units, which limits the update rates of the trackers. Lightweight deep neural networks would be well suited to real-time surgical applications, provided they can be adopted without sacrificing accuracy. As recent progress has been made on deep learning-based reconstruction and rendering methods [117, 118], a future direction could be to use a learnable tissue tracker and instrument tracker to further improve the perception framework. Another direction to pursue is surgical task automation. By including the perceived environment as feedback, controllers applied to the surgical instrument will be able to accomplish tasks in unstructured, deforming surgical scenes.

10 Conclusions

Deep learning techniques are currently surpassing the prior state of the art in a wide range of robotics, medical robotics, and drug development efforts. Returning to our main question: has surgical robot research been transformed by deep learning in light of this rapid progress? Although the answer depends directly on the specific problem and domain, we believe that deep learning has not yet reached or triggered a critical turning point in its transformative capacity. Despite being credited with a strong position in many different areas, issues such as these have not yet been addressed by incremental improvements in predictive ability. This chapter has focused on surgical robots and deep learning: progress, achievements, and future perspectives. The area requires more attention; in the future, more medical and surgical robots will be needed.

References 1. Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1), 1334–1373. 2. Bakshi, G., Kumar, A., & Puranik, A. N. (2022). Adoption of robotics technology in healthcare sector. In Advances in communication, devices and networking (pp. 405–414). Singapore: Springer. 3. Maibaum, A., Bischof, A., Hergesell, J., & Lipp, B. (2022). A critique of robotics in health care. AI & Society, 37(2), 467–477. 4. Tasioulas, J. (2019). First steps towards an ethics of robots and artificial intelligence. Journal of Practical Ethics, 7(1). 5. Hallevy, G. (2013). When robots kill: Artificial intelligence under criminal law. UPNE. 6. Bryndin, E. (2019). Robots with artificial intelligence and spectroscopic sight in hi-tech labor market. International Journal of Systems Science and Applied Mathematic, 4(3), 31–37. 7. Lopes, V., Alexandre, L. A. & Pereira, N. (2019). Controlling robots using artificial intelligence and a consortium blockchain. arXiv:1903.00660.


8. Bataev, A. V., Dedyukhina, N., & Nasrutdinov, M. N. (2020, February). Innovations in the financial sphere: performance evaluation of introducing service robots with artificial intelligence. In 2020 9th International Conference on Industrial Technology and Management (ICITM) (pp. 256–260). IEEE. 9. Nitto, H., Taniyama, D., & Inagaki, H. (2017). Social acceptance and impact of robots and artificial intelligence. Nomura Research Institute Papers, 211, 1–15. 10. Yoganandhan, A., Kanna, G. R., Subhash, S. D., & Jothi, J. H. (2021). Retrospective and prospective application of robots and artificial intelligence in global pandemic and epidemic diseases. Vacunas (English Edition), 22(2), 98–105. 11. Rajan, K., & Saffiotti, A. (2017). Towards a science of integrated AI and Robotics. Artificial Intelligence, 247, 1–9. 12. Chatila, R., Renaudo, E., Andries, M., Chavez-Garcia, R. O., Luce-Vayrac, P., Gottstein, R., Alami, R., Clodic, A., Devin, S., Girard, B., & Khamassi, M. (2018). Toward self-aware robots. Frontiers in Robotics and AI, 5, 88. 13. Gonzalez-Jimenez, H. (2018). Taking the fiction out of science fiction:(Self-aware) robots and what they mean for society, retailers and marketers. Futures, 98, 49–56. 14. Schostek, S., Schurr, M. O., & Buess, G. F. (2009). Review on aspects of artificial tactile feedback in laparoscopic surgery. Medical Engineering & Physics, 31(8), 887–898. 15. Naitoh, T., Gagner, M., Garcia-Ruiz, A., Heniford, B. T., Ise, H., & Matsuno, S. (1999). Handassisted laparoscopic digestive surgery provides safety and tactile sensation for malignancy or obesity. Surgical Endoscopy, 13(2), 157–160. 16. Schostek, S., Ho, C. N., Kalanovic, D., & Schurr, M. O. (2006). Artificial tactile sensing in minimally invasive surgery–a new technical approach. Minimally Invasive Therapy & Allied Technologies, 15(5), 296–304. 17. Kraft, B. M., Jäger, C., Kraft, K., Leibl, B. J., & Bittner, R. (2004). The AESOP robot system in laparoscopic surgery: Increased risk or advantage for surgeon and patient? Surgical Endoscopy And Other Interventional Techniques, 18(8), 1216–1223. 18. Troisi, R. I., Patriti, A., Montalti, R., & Casciola, L. (2013). Robot assistance in liver surgery: A real advantage over a fully laparoscopic approach? Results of a comparative bi-institutional analysis. The International Journal of Medical Robotics and Computer Assisted Surgery, 9(2), 160–166. 19. Dupont, P. E., Nelson, B. J., Goldfarb, M., Hannaford, B., Menciassi, A., O’Malley, M. K., Simaan, N., Valdastri, P., & Yang, G. Z. (2021). A decade retrospective of medical robotics research from 2010 to 2020. Science Robotics, 6(60), eabi8017. 20. Fuchs, K. H. (2002). Minimally invasive surgery. Endoscopy, 34(02), 154–159. 21. Robinson, T. N., & Stiegmann, G. V. (2004). Minimally invasive surgery. Endoscopy, 36(01), 48–51. 22. McDonald, G. J. (2021) Design and modeling of millimeter-scale soft robots for medical applications (Doctoral dissertation, University of Minnesota). 23. Currò, G., La Malfa, G., Caizzone, A., Rampulla, V., & Navarra, G. (2015). Three-dimensional (3D) versus two-dimensional (2D) laparoscopic bariatric surgery: A single-surgeon prospective randomized comparative study. Obesity Surgery, 25(11), 2120–2124. 24. Dogangil, G., Davies, B. L., & Rodriguez, Y., & Baena, F. (2010) A review of medical robotics for minimally invasive soft tissue surgery. Proceedings of the Institution of Mechanical Engineers, Part H: Journal of Engineering in Medicine, 224(5), 653–679. 25. 
Yu, L., Wang, Z., Yu, P., Wang, T., Song, H., & Du, Z. (2014). A new kinematics method based on a dynamic visual window for a surgical robot. Robotica, 32(4), 571–589. 26. Byrn, J. C., Schluender, S., Divino, C. M., Conrad, J., Gurland, B., Shlasko, E., & Szold, A. (2007). Three-dimensional imaging improves surgical performance for both novice and experienced operators using the da Vinci Robot System. The American Journal of Surgery, 193(4), 519–522. 27. Kim, S., Chung, J., Yi, B. J., & Kim, Y. S. (2010). An assistive image-guided surgical robot system using O-arm fluoroscopy for pedicle screw insertion: Preliminary and cadaveric study. Neurosurgery, 67(6), 1757–1767.


28. Nagy, T. D., & Haidegger, T. (2019). A dvrk-based framework for surgical subtask automation. Acta Polytechnica Hungarica (pp.61–78). 29. Millan, B., Nagpal, S., Ding, M., Lee, J. Y., & Kapoor, A. (2021). A scoping review of emerging and established surgical robotic platforms with applications in urologic surgery. Société Internationale d’Urologie Journal, 2(5), 300–310 30. Nagyné Elek, R., & Haidegger, T. (2019). Robot-assisted minimally invasive surgical skill assessment—Manual and automated platforms. Acta Polytechnica Hungarica, 16(8), 141– 169. 31. Okamura, A. M. (2009). Haptic feedback in robot-assisted minimally invasive surgery. Current Opinion Urology, 19(1), 102. 32. Bark, K., McMahan, W., Remington, A., Gewirtz, J., Wedmid, A., Lee, D. I., & Kuchenbecker, K. J. (2013). In vivo validation of a system for haptic feedback of tool vibrations in robotic surgery. Surgical Endoscopy, 27(2), 656–664. 33. Van der Meijden, O. A., & Schijven, M. P. (2009). The value of haptic feedback in conventional and robot-assisted minimal invasive surgery and virtual reality training: A current review. Surgical Endoscopy, 23(6), 1180–1190. 34. Bethea, B. T., Okamura, A. M., Kitagawa, M., Fitton, T. P., Cattaneo, S. M., Gott, V. L., Baumgartner, W. A., & Yuh, D. D. (2004). Application of haptic feedback to robotic surgery. Journal of Laparoendoscopic & Advanced Surgical Techniques, 14(3), 191–195. 35. Amirabdollahian, F., Livatino, S., Vahedi, B., Gudipati, R., Sheen, P., Gawrie-Mohan, S., & Vasdev, N. (2018). Prevalence of haptic feedback in robot-mediated surgery: A systematic review of literature. Journal of robotic surgery, 12(1), 11–25. 36. Okamura, A. M. (2004). Methods for haptic feedback in teleoperated robot-assisted surgery. Industrial Robot: An International Journal, 31(6), 499–508. 37. Pacchierotti, C., Scheggi, S., Prattichizzo, D., & Misra, S. (2016). Haptic feedback for microrobotics applications: A review. Frontiers in Robotics and AI, 3, 53. 38. Yeh, C. H., Su, F. C., Shan, Y. S., Dosaev, M., Selyutskiy, Y., Goryacheva, I., & Ju, M. S. (2020). Application of piezoelectric actuator to simplified haptic feedback system. Sensors and Actuators A: Physical, 303, 111820. 39. Okamura, A. M., Dennerlein, J. T., & Howe, R. D. (1998, May). Vibration feedback models for virtual environments. In Proceedings of the 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146) (Vol. 1, pp. 674–679). IEEE. 40. Luostarinen, L. O., Åman, R., & Handroos, H. (2016, October). Haptic joystick for improving controllability of remote-operated hydraulic mobile machinery. In Fluid Power Systems Technology (Vol. 50473, p. V001T01A003). American Society of Mechanical Engineers. 41. Shang, W., Su, H., Li, G., & Fischer, G. S. (2013, November). Teleoperation system with hybrid pneumatic-piezoelectric actuation for MRI-guided needle insertion with haptic feedback. In 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 4092–4098). IEEE. 42. Kim, P., Kim, S., Park, Y. D., & Choi, S. B. (2016). Force modeling for incisions into various tissues with MRF haptic master. Smart Materials and Structures, 25(3), 035008. 43. Hooshiar, A., Payami, A., Dargahi, J., & Najarian, S. (2021). Magnetostriction-based force feedback for robot-assisted cardiovascular surgery using smart magnetorheological elastomers. Mechanical Systems and Signal Processing, 161, 107918. 44. Shokrollahi, E., Goldenberg, A. A., Drake, J. M., Eastwood, K. W., & Kang, M. (2018, December). 
Application of a nonlinear Hammerstein-Wiener estimator in the development and control of a magnetorheological fluid haptic device for robotic bone biopsy. In Actuators (Vol. 7, No. 4, p. 83). MDPI. 45. Najmaei, N., Asadian, A., Kermani, M. R., & Patel, R. V. (2015). Design and performance evaluation of a prototype MRF-based haptic interface for medical applications. IEEE/ASME Transactions on Mechatronics, 21(1), 110–121. 46. Song, Y., Guo, S., Yin, X., Zhang, L., Wang, Y., Hirata, H., & Ishihara, H. (2018). Design and performance evaluation of a haptic interface based on MR fluids for endovascular tele-surgery. Microsystem Technologies, 24(2), 909–918.


47. Kikuchi, T., Takano, T., Yamaguchi, A., Ikeda, A. and Abe, I. (2021, September). Haptic interface with twin-driven MR fluid actuator for teleoperation endoscopic surgery system. In Actuators (Vol. 10, No. 10, p. 245). MDPI. 48. Najmaei, N., Asadian, A., Kermani, M. R. & Patel, R. V. (2015, September). Performance evaluation of Magneto-Rheological based actuation for haptic feedback in medical applications. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 573–578). IEEE. 49. Gao, Q., Zhan, Y., Song, Y., Liu, J., & Wu, J. (2021, August). An MR fluid based master manipulator of the vascular intervention robot with haptic feedback. In 2021 IEEE International Conference on Mechatronics and Automation (ICMA) (pp. 158–163). IEEE. 50. Nguyen, N. D., Truong, T. D., Nguyen, D. H. & Nguyen, Q. H. (2019, March). Development of a 3D haptic spherical master manipulator based on MRF actuators. In Active and Passive Smart Structures and Integrated Systems XIII (Vol. 10967, pp. 431–440). SPIE. 51. Kim, S., Kim, P., Park, C. Y., & Choi, S. B. (2016). A new tactile device using magnetorheological sponge cells for medical applications: Experimental investigation. Sensors and Actuators A: Physical, 239, 61–69. 52. Cha, S. W., Kang, S. R., Hwang, Y. H., & Choi, S. B. (2017, April). A single of MR sponge tactile sensor design for medical applications. In Active and Passive Smart Structures and Integrated Systems (Vol. 10164, pp. 520–525). SPIE. 53. Oh, J. S., Sohn, J. W., & Choi, S. B. (2018). Material characterization of hardening soft sponge featuring MR fluid and application of 6-DOF MR haptic master for robot-assisted surgery. Materials, 11(8), 1268. 54. Park, Y. J., & Choi, S. B. (2021). A new tactile transfer cell using magnetorheological materials for robot-assisted minimally invasive surgery. Sensors, 21(9), 3034. 55. Park, Y. J., Yoon, J. Y., Kang, B. H., Kim, G. W., & Choi, S. B. (2020). A tactile device generating repulsive forces of various human tissues fabricated from magnetic-responsive fluid in porous polyurethane. Materials, 13(5), 1062. 56. Park, Y. J., Lee, E. S., & Choi, S. B. (2022). A cylindrical grip type of tactile device using Magneto-Responsive materials integrated with surgical robot console: design and analysis. Sensors, 22(3), 1085. 57. Martin, J. A., Regehr, G., Reznick, R., Macrae, H., Murnaghan, J., Hutchison, C., & Brown, M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. British Journal of Surgery, 84(2), 273–278. 58. Vassiliou, M. C., Feldman, L. S., Andrew, C. G., Bergman, S., Leffondré, K., Stanbridge, D., & Fried, G. M. (2005). A global assessment tool for evaluation of intraoperative laparoscopic skills. The American Journal of Surgery, 190(1), 107–113. 59. Goh, A. C., Goldfarb, D. W., Sander, J. C., Miles, B. J., & Dunkin, B. J. (2012). Global evaluative assessment of robotic skills: Validation of a clinical assessment tool to measure robotic surgical skills. The Journal of Urology, 187(1), 247–252. 60. Insel, A., Carofino, B., Leger, R., Arciero, R., & Mazzocca, A. D. (2009). The development of an objective model to assess arthroscopic performance. JBJS, 91(9), 2287–2295. 61. Champagne, B. J., Steele, S. R., Hendren, S. K., Bakaki, P. M., Roberts, P. L., Delaney, C. P., Brady, J. T., & MacRae, H. M. (2017). The American Society of Colon and Rectal Surgeons assessment tool for performance of laparoscopic colectomy. Diseases of the Colon & Rectum, 60(7), 738–744. 62. Koehler, R. 
J., Amsdell, S., Arendt, E. A., Bisson, L. J., Bramen, J. P., Butler, A., Cosgarea, A. J., Harner, C. D., Garrett, W. E., Olson, T., & Warme, W. J. (2013). The arthroscopic surgical skill evaluation tool (ASSET). The American Journal of Sports Medicine, 41(6), 1229–1237. 63. Shademan, A., Decker, R. S., Opfermann, J. D., Leonard, S., Krieger, A., & Kim, P. C. (2016). Supervised autonomous robotic soft tissue surgery. Science Translational Medicine, 8(337), 337ra64–337ra64. 64. Garrow, C. R., Kowalewski, K. F., Li, L., Wagner, M., Schmidt, M. W., Engelhardt, S., Hashimoto, D. A., Kenngott, H. G., Bodenstedt, S., Speidel, S., & Mueller-Stich, B. P. (2021). Machine learning for surgical phase recognition: A systematic review. Annals of Surgery, 273(4), 684–693.


65. Deo, R. C. (2015). Machine learning in medicine. Circulation, 132(20), 1920–1930. 66. Lee, C. K., Hofer, I., Gabel, E., Baldi, P., & Cannesson, M. (2018). Development and validation of a deep neural network model for prediction of postoperative in-hospital mortality. Anesthesiology, 129(4), 649–662. 67. Maier-Hein, L., Vedula, S. S., Speidel, S., Navab, N., Kikinis, R., Park, A., Eisenmann, M., Feussner, H., Forestier, G., Giannarou, S., & Hashizume, M. (2017). Surgical data science for next-generation interventions. Nature Biomedical Engineering, 1(9), 691–696. 68. Maier-Hein, L., Eisenmann, M., Sarikaya, D., März, K., Collins, T., Malpani, A., Fallert, J., Feussner, H., Giannarou, S., Mascagni, P., & Nakawala, H. (2022). Surgical data science–from concepts toward clinical translation. Medical Image Analysis, 76, 102306. 69. Kosak, O., Wanninger, C., Angerer, A., Hoffmann, A., Schiendorfer, A., & Seebach, H. (2016, September). Towards self-organizing swarms of reconfigurable self-aware robots. In 2016 IEEE 1st International Workshops on Foundations and Applications of Self * Systems (FAS* W) (pp. 204–209). IEEE. 70. Pierson, H. A., & Gashler, M. S. (2017). Deep learning in robotics: A review of recent research. Advanced Robotics, 31(16), 821–835. 71. Sünderhauf, N., Brock, O., Scheirer, W., Hadsell, R., Fox, D., Leitner, J., Upcroft, B., Abbeel, P., Burgard, W., Milford, M., & Corke, P. (2018). The limits and potentials of deep learning for robotics. The International Journal of Robotics Research, 37(4–5), 405–420. 72. Miyajima, R. (2017). Deep learning triggers a new era in industrial robotics. IEEE Multimedia, 24(4), 91–96. 73. Degrave, J., Hermans, M., & Dambre, J. (2019) A differentiable physics engine for deep learning in robotics. Frontiers in Neurorobotics, 6. 74. Károly, A. I., Galambos, P., Kuti, J., & Rudas, I. J. (2020). Deep learning in robotics: Survey on model structures and training strategies. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(1), 266–279. 75. Mouha, R. A. (2021). Deep learning for robotics. Journal of Data Analysis and Information Processing, 9(02), 63. 76. Morales, E. F., Murrieta-Cid, R., Becerra, I., & Esquivel-Basaldua, M. A. (2021). A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning. Intelligent Service Robotics, 14(5), 773–805. 77. McLaughlin, E., Charron, N., & Narasimhan, S. (2020). Automated defect quantification in concrete bridges using robotics and deep learning. Journal of Computing in Civil Engineering, 34(5), 04020029. 78. Rassweiler, J. J., Autorino, R., Klein, J., Mottrie, A., Goezen, A. S., Stolzenburg, J. U., Rha, K. H., Schurr, M., Kaouk, J., Patel, V., & Dasgupta, P. (2017). Future of robotic surgery in urology. BJU International, 120(6), 822–841. 79. Chang, K. D., Abdel Raheem, A., Choi, Y. D., Chung, B. H., & Rha, K. H. (2018). Retziussparing robot-assisted radical prostatectomy using the Revo-i robotic surgical system: Surgical technique and results of the first human trial. BJU International, 122(3), 441–448. 80. Chen, J., Oh, P. J., Cheng, N., Shah, A., Montez, J., Jarc, A., Guo, L., Gill, I. S., & Hung, A. J. (2018). Use of automated performance metrics to measure surgeon performance during robotic vesicourethral anastomosis and methodical development of a training tutorial. The Journal of Urology, 200(4), 895–902. 81. Burgner-Kahrs, J., Rucker, D. C., & Choset, H. (2015). Continuum robots for medical applications: A survey. 
IEEE Transactions on Robotics, 31(6), 1261–1280. 82. Münzer, B., Schoeffmann, K., & Böszörmenyi, L. (2018). Content-based processing and analysis of endoscopic images and videos: A survey. Multimedia Tools and Applications, 77(1), 1323–1362. 83. Speidel, S., Delles, M., Gutt, C., & Dillmann, R. (2006, August). Tracking of instruments in minimally invasive surgery for surgical skill analysis. In International Workshop on Medical Imaging and Virtual Reality (pp. 148–155). Berlin, Heidelberg: Springer. 84. Doignon, C., Nageotte, F., & Mathelin, M. D. (2006). Segmentation and guidance of multiple rigid objects for intra-operative endoscopic vision. In Dynamical Vision (pp. 314–327). Berlin, Heidelberg: Springer.


85. Pezzementi, Z., Voros, S., & Hager, G. D. (2009, May). Articulated object tracking by rendering consistent appearance parts. In 2009 IEEE International Conference on Robotics and Automation (pp. 3940–3947). IEEE. 86. Bouget, D., Benenson, R., Omran, M., Riffaud, L., Schiele, B., & Jannin, P. (2015). Detecting surgical tools by modelling local appearance and global shape. IEEE Transactions on Medical Imaging, 34(12), 2603–2617. 87. Ching, T., Himmelstein, D. S., Beaulieu-Jones, B. K., Kalinin, A. A., Do, B. T., Way, G. P., Ferrero, E., Agapow, P. M., Zietz, M., Hoffman, M. M., & Xie, W. (2018). Opportunities and obstacles for deep learning in biology and medicine. Journal of The Royal Society Interface, 15(141), 20170387. 88. Kalinin, A. A., Higgins, G. A., Reamaroon, N., Soroushmehr, S., Allyn-Feuer, A., Dinov, I. D., Najarian, K., & Athey, B. D. (2018). Deep learning in pharmacogenomics: From gene regulation to patient stratification. Pharmacogenomics, 19(7), 629–650. 89. Yong, C. W., Teo, K., Murphy, B. P., Hum, Y. C., Tee, Y. K., Xia, K., & Lai, K. W. (2021). Knee osteoarthritis severity classification with ordinal regression module. Multimedia Tools and Applications, 1–13. 90. Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., & Saarakkala, S. (2018). Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Scientific Reports, 8(1), 1–10. 91. Iglovikov, V. I., Rakhlin, A., Kalinin, A. A., & Shvets, A.A. (2018). Paediatric bone age assessment using deep convolutional neural networks. In Deep learning in medical image analysis and multimodal learning for clinical decision support (pp. 300–308). Cham: Springer. 92. Garcia-Peraza-Herrera, L. C., Li, W., Fidon, L., Gruijthuijsen, C., Devreker, A., Attilakos, G., Deprest, J., Vander Poorten, E., Stoyanov, D., Vercauteren, T., & Ourselin, S. (2017, September). Toolnet: holistically-nested real-time segmentation of robotic surgical tools. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5717–5722). IEEE. 93. Attia, M., Hossny, M., Nahavandi, S., & Asadi, H. (2017, October). Surgical tool segmentation using a hybrid deep CNN-RNN auto encoder-decoder. In 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (pp. 3373–3378). IEEE. 94. Pakhomov, D., Premachandran, V., Allan, M., Azizian, M., & Navab, N. (2019, October). Deep residual learning for instrument segmentation in robotic surgery. In International Workshop on Machine Learning in Medical Imaging (pp. 566–573). Cham: Springer. 95. Solovyev, R., Kustov, A., Telpukhov, D., Rukhlov, V., & Kalinin, A. (2019, January). Fixed-point convolutional neural network for real-time video processing in FPGA. In 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus) (pp. 1605–1611). IEEE. 96. Shvets, A. A., Rakhlin, A., Kalinin, A. A., & Iglovikov, V. I. (2018, December). Automatic instrument segmentation in robot-assisted surgery using deep learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 624–628). IEEE. 97. Ronneberger, O., Fischer, P., & Brox, T. (2015, October). U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 234–241). Cham: Springer. 98. Hamad, G. G., & Curet, M. (2010). Minimally invasive surgery. The American Journal of Surgery, 199(2), 263–265. 99. Phee, S. J., Low, S. C., Huynh, V. 
A., Kencana, A. P., Sun, Z. L. & Yang, K. (2009, September). Master and slave transluminal endoscopic robot (MASTER) for natural orifice transluminal endoscopic surgery. In 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 1192–1195). IEEE. 100. Wang, Z., Sun, Z., & Phee, S. J. (2013). Haptic feedback and control of a flexible surgical endoscopic robot. Computer Methods and Programs in Biomedicine, 112(2), 260–271. 101. Ehrampoosh, S., Dave, M., Kia, M. A., Rablau, C., & Zadeh, M. H. (2013). Providing haptic feedback in robot-assisted minimally invasive surgery: A direct optical force-sensing solution for haptic rendering of deformable bodies. Computer Aided Surgery, 18(5–6), 129–141.


102. Akinbiyi, T., Reiley, C. E., Saha, S., Burschka, D., Hasser, C. J., Yuh, D .D. & Okamura, A. M. (2006, September). Dynamic augmented reality for sensory substitution in robot-assisted surgical systems. In 2006 International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 567–570). IEEE. 103. Tavakoli, M., Aziminejad, A., Patel, R. V., & Moallem, M. (2006). Methods and mechanisms for contact feedback in a robot-assisted minimally invasive environment. Surgical Endoscopy and Other Interventional Techniques, 20(10), 1570–1579. 104. Hayward, V., Astley, O. R., Cruz-Hernandez, M., Grant, D., & Robles-De-La-Torre, G. (2004). Haptic interfaces and devices. Sensor Review. 105. Rosen, J., Hannaford, B., MacFarlane, M. P., & Sinanan, M. N. (1999). Force controlled and teleoperated endoscopic grasper for minimally invasive surgery-experimental performance evaluation. IEEE Transactions on Biomedical Engineering, 46(10), 1212–1221. 106. Tholey, G., Pillarisetti, A., Green, W., & Desai, J. P. (2004, June). Design, development, and testing of an automated laparoscopic grasper with 3-D force measurement capability. In International Symposium on Medical Simulation (pp. 38–48). Berlin, Heidelberg: Springer. 107. Tadano, K., & Kawashima, K. (2010). Development of a master–slave system with forcesensing abilities using pneumatic actuators for laparoscopic surgery. Advanced Robotics, 24(12), 1763–1783. 108. Valdastri, P., Harada, K., Menciassi, A., Beccai, L., Stefanini, C., Fujie, M., & Dario, P. (2006). Integration of a miniaturised triaxial force sensor in a minimally invasive surgical tool. IEEE Transactions on Biomedical Engineering, 53(11), 2397–2400. 109. Howe, R. D., Peine, W. J., Kantarinis, D. A., & Son, J. S. (1995). Remote palpation technology. IEEE Engineering in Medicine and Biology Magazine, 14(3), 318–323. 110. Ohtsuka, T., Furuse, A., Kohno, T., Nakajima, J., Yagyu, K., & Omata, S. (1995). Application of a new tactile sensor to thoracoscopic surgery: Experimental and clinical study. The Annals of Thoracic Surgery, 60(3), 610–614. 111. Lai, W., Cao, L., Xu, Z., Phan, P. T., Shum, P., & Phee, S. J. (2018, May). Distal end force sensing with optical fiber bragg gratings for tendon-sheath mechanisms in flexible endoscopic robots. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 5349–5255). IEEE. 112. Kaneko, M., Wada, M., Maekawa, H., & Tanie, K. (1991, January). A new consideration on tendon-tension control system of robot hands. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation (pp. 1028–1029). IEEE Computer Society. 113. Lampaert, V., Swevers, J., & Al-Bender, F. (2002). Modification of the Leuven integrated friction model structure. IEEE Transactions on Automatic Control, 47(4), 683–687. 114. Piatkowski, T. (2014). Dahl and LuGre dynamic friction models—The analysis of selected properties. Mechanism and Machine Theory, 73, 91–100. 115. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2015). Nonlinear friction modelling and compensation control of hysteresis phenomena for a pair of tendon-sheath actuated surgical robots. Mechanical Systems and Signal Processing, 60, 770–784. 116. Dinh, B. K., Cappello, L., Xiloyannis, M., & Masia, L. Position control using adaptive backlash. 117. Dinh, B. K., Cappello, L., Xiloyannis, M., & Masia, L. (2016, October). Position control using adaptive backlash compensation for bowden cable transmission in soft wearable exoskeleton. 
In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5670–5676). IEEE. 118. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2014). An investigation of frictionbased tendon sheath model appropriate for control purposes. Mechanical Systems and Signal Processing, 42(1–2), 97–114. 119. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., & Phee, S. J. (2015). A new approach of friction model for tendon-sheath actuated surgical systems: Nonlinear modelling and parameter identification. Mechanism and Machine Theory, 85, 14–24. 120. Do, T. N., Tjahjowidodo, T., Lau, M. W. S., Yamamoto, T., & Phee, S. J. (2014). Hysteresis modeling and position control of tendon-sheath mechanism in flexible endoscopic systems. Mechatronics, 24(1), 12–22.


121. Lenz, I., Lee, H., & Saxena, A. (2015). Deep learning for detecting robotic grasps. The International Journal of Robotics Research, 34(4–5), 705–724. 122. Tan, Y. P., Liverneaux, P., & Wong, J. K. (2018). Current limitations of surgical robotics in reconstructive plastic microsurgery. Frontiers in surgery, 5, 22. 123. Longmore, S. K., Naik, G., & Gargiulo, G. D. (2020). Laparoscopic robotic surgery: Current perspective and future directions. Robotics, 9(2), 42. 124. Camarillo, D. B., Krummel, T. M., & Salisbury, J. K., Jr. (2004). Robotic technology in surgery: Past, present, and future. The American Journal of Surgery, 188(4), 2–15. 125. Kim, S. S., Dohler, M., & Dasgupta, P. (2018). The Internet of Skills: Use of fifth-generation telecommunications, haptics and artificial intelligence in robotic surgery. BJU International, 122(3), 356–358.

Deep Reinforcement Learning for Autonomous Mobile Robot Navigation

Armando de Jesús Plasencia-Salgueiro

Abstract Numerous fields, such as the military, agriculture, energy, welding, and the automation of surveillance, have benefited greatly from the contributions of autonomous robots. Since mobile robots need to be able to navigate safely and effectively, there is a strong demand for cutting-edge algorithms. The four requirements for mobile robot navigation are as follows: perception, localization, path planning, and motion control. Numerous algorithms for autonomous robots have been developed over the past two decades. The number of algorithms that can navigate and control robots in dynamic environments is limited, even though the majority of autonomous robot applications take place in dynamic environments. A qualitative comparison of the most recent Autonomous Mobile Robot Navigation techniques for controlling autonomous robots in dynamic environments, with safety and uncertainty considerations, is presented in this chapter. The work covers different aspects such as the design methodology, benchmarking, and the teaching side of the development process. The structure, pseudocode, tools, and practical, in-depth applications of the particular Deep Reinforcement Learning algorithms for autonomous mobile robot navigation are also included. This study provides an overview of the development of suitable Deep Reinforcement Learning techniques for various applications. Keywords Autonomous mobile robot navigation · Deep reinforcement learning · Methodology · Benchmarking · Teaching

A. de J. Plasencia-Salgueiro (B) National Center of Animals for Laboratory (CENPALAB), La Habana, Cuba e-mail: [email protected] BioCubaFarma, National Center of Animal for Laboratory (CENPALAB), La Habana, Cuba © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_7


1 Introduction

Autonomous robots have significantly influenced the development of numerous social sectors. Since mobile robots need to be able to navigate safely and effectively, there is a strong demand for cutting-edge algorithms. Under the data-driven concept, and with the development of Machine Learning, mobile robots gained a variety of effective algorithms for navigation and motion control. The four requirements for mobile robot navigation are as follows: perception, localization, path planning, and motion control. The number of algorithms that can navigate and control robots in dynamic environments is limited, although the majority of autonomous robot applications take place in dynamic environments. Application- and platform-independent systems are created by introducing Deep Reinforcement Learning (DRL) as the general proposed framework for AI learning. At the moment, human-level control is the ultimate goal of AI and robotics. Because robotics and artificial intelligence (AI) are among the most complex and highly multidisciplinary engineering sciences, one should be well-versed in computer science, mathematics, electronics, and mechatronics before beginning the construction of an AI robot [1]. Autonomous robot control methods from the last five years that can control autonomous robots in dynamic environments are the subject of a qualitative comparative study in this chapter. Autonomous Mobile Robot Navigation (AMRN) methods using deep reinforcement learning algorithms are discussed. The experience of theoretical and practical implementation, and validation through simulation and experimentation, are taken into consideration when discussing the evolution of each method's application. Researchers benefit from this investigation by gaining an understanding of the development and applications of appropriate approaches that make use of DRL algorithms for AMRN. The outstanding contributions of this work are as follows:
– Define the benefits of developing mobile robots under a machine learning conception using DRL.
– Give the relation and the detailed configuration of DRL for Mobile Robot Navigation (MRN).
– Explain the methodology and the necessary benchmarking techniques for applying the most representative DRL algorithms for MRN.
– Show the key considerations for teaching DRL for MRN and propose two exercises using CoppeliaSim.
– Define the application requirements of Autonomous Robots (AR), establishing practical safety considerations.
The work is structured in the following sections. Section 2 (Antecedents) briefly relates the historical development of AR control, from conventional linear control to DRL.


Section 3 (Background) presents the theoretical foundations of AMRN and Machine Learning (ML), including the requirements and the applications of ML algorithms such as Reinforcement Learning (RL), Convolutional Neural Networks (CNN), the different approaches to DRL, and Long Short-Term Memory (LSTM), as well as the application requirements, particularly navigation in dynamic environments, safety, and uncertainty. Section 4 (DRL Methods) gives an accurate description of the most common methods reported in the recent scientific literature, including the theoretical conception, representation, logical flow chart, and pseudocode of the different DRL algorithms and their combinations. Section 5 (Design Methodology) describes the necessary steps to follow in the design of DRL systems for autonomous navigation and the particularities and techniques of benchmarking under different conceptions. Section 6 (Teaching) exposes the particularities of the teaching process for DRL algorithms and two exercises to develop in class using a simulation conception. Section 7 (Discussion) treats the principal conceptions and difficulties exposed in the work. Section 8 (Conclusions) provides a brief summary and the future perspectives of the work. The nomenclature used in this chapter is listed in the Abbreviations part.

2 Antecedents

2.1 Control Theory, Linear Control, and Mechatronics

Electronics and ICs (Integrated Circuits) made it possible to control machines with more flexibility and accuracy using conventional linear control systems, with sensors providing feedback from the system output. Linear control is motivated by Control Theory, using mathematical solutions, specifically linear algebra, implemented on hardware using mechatronics, electronics, ICs, and micro-controllers. These systems used sensors to feed back the error and tried to minimize that error to stabilize the system output. Such linear control systems used linear algebra to derive the function that maps input to output. This field of interest was known as Automation, and the goal was to create automatic systems [1].
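As a minimal, hedged illustration of the feedback principle just described, the short Python sketch below closes the loop with a proportional control law on a toy first-order plant. The plant model, gain, and setpoint are illustrative assumptions and are not taken from the chapter.

```python
# Minimal sketch of a linear feedback (proportional) controller.
# The plant model and gain are illustrative assumptions, not from the chapter.

def simulate_p_control(setpoint=1.0, kp=0.8, steps=50, dt=0.1):
    """Drive a first-order plant x' = u toward the setpoint using u = kp * error."""
    x = 0.0                      # system output (initial state)
    history = []
    for _ in range(steps):
        error = setpoint - x     # sensor feedback: deviation from the desired output
        u = kp * error           # linear control law
        x += u * dt              # simple first-order plant integration
        history.append(x)
    return history

if __name__ == "__main__":
    trajectory = simulate_p_control()
    print(f"final output ~ {trajectory[-1]:.3f} (setpoint = 1.0)")
```

The loop simply measures the error between the setpoint and the output and applies a correction proportional to it, which is the essence of the linear control systems discussed above.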

2.2 Non-linear Control

Non-linear control became crucial for deriving the non-linear function (or kernel function) mathematically for more complicated tasks. The reason behind the non-linearity was the fact that input and output had different, and sometimes large, dimensionality, and the complexity simply could not be modeled using linear control and linear algebra.


This was the main motivation and fuel for the rise of non-linear function learning, i.e., how to derive these functions [1].

Fig. 1 Control modules for generating the controlling commands [2] (Pieter Abbeel—UC Berkeley/OpenAI/Gradescope)

2.3 Classical Robotics

With the advancement of the computer industry, non-linear control gave birth to intelligent control, which uses AI for high-level control of robots and systems. Classical robotics was the dominating approach. These approaches were mostly application-dependent and highly platform-dependent. Generally speaking, they were hand-crafted, hand-engineered, and regarded as shallow AI [1]. These architectures are also referred to as GNC (Guidance, Navigation, and Control) architectures, mostly composed of perception, planning, and control modules. Perception modules were mostly used for mapping the environment and localizing the robot inside the environment; planning modules (also referred to as navigation modules) plan the path in terms of motion and mission; and control modules generate the controlling commands (controlling behaviors) required for the robot kinematics [1] (see Fig. 1).

2.4 Probabilistic Robotics

Sebastian Thrun, in “Probabilistic Robotics”, introduced a new way of looking at robotics and how to incorporate ML algorithms for probabilistic decision-making and robot control [1]. These architectures are the best examples of classical robotics with the addition of machine learning in the planning/control (navigation/control) part. Sebastian Thrun impacted the field of robotics by adding machine learning from AI to the high-level control architecture (or system software architecture). By looking at these three architectures, one can see how machine learning and computer vision have been used in a very successful way. Aside from the interfaces, the main core architecture is composed of Perception and Planning/Control or Navigation (planning/control is equivalent to navigation in Sebastian Thrun's terminology). The perception part has been fully regarded as a computer vision problem and has been solved using computer vision approaches.


On the other hand, planning/control or navigation has been successfully solved using ML techniques, mostly Support Vector Machines (SVM) [1]. With the advances of ML algorithms in solving computer vision problems, ML as a whole (end-to-end) started to be taken much more seriously for intelligent robot control and navigation, as an end-to-end approach for a high-level control architecture (AI framework). This gave a huge boost to cognitive robotics among researchers and in the robotics community [1].

2.5 Introduction of Back-Propagation for Feed-Forward Neural Networks

The main boost in reconsidering the use of Neural Networks (NN) was the introduction of the Back-Propagation algorithm as a fast optimization approach. In one of Geoff Hinton's talks, he explained the biological foundation of back-propagation and how it might happen in our brains [1].

2.6 Deep Reinforcement Learning

DRL-based control was initially introduced and coined by a company called Google DeepMind (www.deepmind.com). This company started using this learning approach for simulated agents in Atari games. The idea is to let the agent learn on its own until it reaches the human level of game control, or perhaps a superior level. Recent excitement in AI was brought about by this DRL method, the Deep Q-Network (DQN), in Atari games, a simple simulated environment, and on robots for testing [1]. DRL refers to settings in which a Deep Neural Network (DNN) is used to extract high-dimensional observation features in Reinforcement Learning (RL). Figure 2 shows how a DNN is used to approximate the Q value for each state and how the agent acts by observing the environment accordingly. With the implementation of DRL, the robotics stack will transform as in Fig. 3.

Fig. 2 Working of DRL [3]


Fig. 3 Deep reinforcement learning [2]

3 Background: Autonomous Mobile Robot Navigation and Machine Learning

3.1 Requirements

Examples of mobile robots (MR) include ships that move within their surroundings, autonomous vehicles, and spacecraft. Their navigation involves looking for an optimal or suboptimal route while simultaneously avoiding obstacles and considering their destination. To simplify this challenge, the majority of researchers have concentrated solely on the navigation problem in two-dimensional space. The robot's sense of perception is its ability to perceive its surroundings. The effectiveness with which an intelligent robot completes its mission is influenced in part by the properties of the robot's sensor and control systems, such as its capacity to plan the trajectory and avoid obstacles. Sensor monitoring of environments can be used in a broad range of locations. Mounting sensors on robotic/autonomous systems is one approach to addressing the issues of mobility and adaptability [4]. In order for efficient robot action to be realized in real time, particularly in environments that are unknown or uncertain, strict requirements on the robot's sensor and control system parameters must be met, in particular [5]:
– Increasing the precision of the remote sensor information;
– Reducing the sensor signal formation time to a minimum;
– Reducing the processing time of the sensor data;
– Reducing the amount of time required for the robot's control system to make decisions in a dynamic or uncertain environment with obstacles;
– Spreading the robots' functional characteristics through the use of fast calculation algorithms and effective sensors.
The recommender software, which makes use of machine learning, gets to work once the anomaly detection software has discovered an anomaly. Using the robot's installed navigation system, or a compass equipped with sensors and the warning data from the robot's impact-avoidance system, the sensor data is combined with the robot's current course. An off-policy deep learning (DL) model is used by the recommender to make recommendations for the MR based on the current conditions, surroundings, and sensor readings. Thanks to this DL/ML component, the MR can send the precise coordinates of the anomaly site and, if necessary, sensor data back to the base for additional investigation as required. This is especially important when safety is at stake or when investigators can only wear breathing apparatus or hazardous material suits for a short time. The drone can go straight to the tagged location while it analyzes additional sensors [4].


Localization

Localization is the method of determining where the robot is in its environment. Precise positioning and navigation of ground or aerial vehicles in complex spatial environments is essential for effective planning, unmanned driving, and autonomous operation [6]. In fact, the Kalman filter, combined with reinforcement learning, is regarded as one of the more promising strategies for precise positioning. The RL-AKF (adaptive Kalman filter) navigation algorithm uses the deep deterministic policy gradient to find the optimal state estimation and process-noise covariance matrix from the continuous action space, taking the integrated navigation system as the environment and the negative of the current positioning error as the reward. When the GNSS signal is unavailable, the RL-AKF significantly improves the positioning performance of integrated navigation [6].

Path-planning

In path planning, the robot chooses how to maneuver to reach the goal without collision. Even though the majority of mobile robot applications take place in dynamic environments, there are not many algorithms that can guide robots through them [7]. DRL algorithms are regarded as powerful and promising tools for automatically mapping high-dimensional sensor data to robot motion commands without referring to the ground truth. They only require a scalar reward function to encourage the learning agent to experiment with the environment and determine the best course of action for each state [8]. Building a modular DQN architecture to combine data from a variety of vehicle-mounted sensors is demonstrated in [8]. In the real world, the developed algorithm can fly without hitting anything. Path planning, 3D mapping, and expert demonstrations are not required by the proposed method. Using an end-to-end CNN, it turns the fused sensory data into the robot's velocity control input.

Motion control

The robot's movements are controlled in motion control so that they follow the desired trajectory. Linear and angular velocities, for example, fall under the category of motion control [9]. In plain contrast to the conventional framework of hierarchical planning, data-driven techniques are also being applied to the autonomous navigation problem as a result of recent advancements in ML research. Early work of this kind developed systems that use end-to-end learning algorithms to find navigation policies that map directly from perceptual inputs to motion commands, avoiding the traditional hierarchical paradigm. Without the symbolic, rule-based human knowledge or engineering design of those systems, an MR can move everywhere [9]. In Fig. 4, the mentioned requirements are linked.


Fig. 4 Requirement interrelation in AMRN [7]
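Returning to the localization requirement above, the following Python sketch runs a one-dimensional adaptive Kalman filter in which the process-noise covariance q is supplied by a policy. Here the policy is a hand-written placeholder; in RL-AKF [6] it would be a DDPG actor trained with the negative positioning error as the reward, so every numeric value below is an illustrative assumption.

```python
import numpy as np

# Sketch of an "adaptive" 1-D Kalman filter in the spirit of RL-AKF [6]:
# the process-noise covariance q is not fixed but supplied by a policy.

def policy_process_noise(innovation):
    """Placeholder policy: inflate q when the innovation (residual) is large."""
    return 0.01 + 0.1 * innovation ** 2

def adaptive_kf(measurements, r=0.5):
    x, p = 0.0, 1.0                 # state estimate and its variance
    estimates = []
    for z in measurements:
        innovation = z - x          # measurement residual (observation model H = 1)
        q = policy_process_noise(innovation)
        p = p + q                   # predict step (state transition F = 1)
        k = p / (p + r)             # Kalman gain
        x = x + k * innovation      # update step
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

if __name__ == "__main__":
    true_pos = np.linspace(0, 5, 50)
    noisy = true_pos + np.random.normal(0, 0.7, size=true_pos.shape)
    est = adaptive_kf(noisy)
    print("final estimate:", round(est[-1], 2), "true:", true_pos[-1])
```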

3.2 Review of RL in AMR

A review of reinforcement learning approaches in robotics, proposing robotics as an application platform for RL experiments and case studies, is given in [10]. In 2013, the authors of [11], building on the works of Huang [12] and Hatem [13], proposed a framework for the simulation of RL using an Artificial Neural Network (ANN), as in Fig. 5.

3.3 Introduction of Convolutional Neural Networks

LeCun, together with Bengio's group, was the first to introduce CNNs, and they have consistently proposed them as a leading AI architecture [14]. They successfully applied DL to OCR (Optical Character Recognition) for document analysis.

3.4 Advanced AMR: Introduction to the DRL (Deep Reinforcement Learning) Approach

Closer examination reveals that there are two families of DRL strategies: value-based and policy-based approaches. Value-based DRL indirectly obtains the agent's policy by iteratively updating the value function. When the agent reaches an optimal value, the optimal policy is derived from the optimal value function. Using the function approximation method, the policy-based approach directly builds a policy network. It then selects actions through the network to determine the reward value and optimizes the policy-network parameters in the gradient direction, producing an optimized policy that maximizes the reward [15].


Fig. 5 a An RL structure using an ANN, b The Q function approximated with an MLP-type ANN for Q-learning [13]

Value-Based DRL Methods

Deep Q network

Mnih et al. published an influential work on DQN-related research in Nature in 2015, reporting that, across 49 games, the trained network could perform at a human level. In DQN, the action-value function is represented by a DNN, in this case a CNN, trained with Q-learning. The feedback from the game rewards is used to train the network. The fundamental characteristics of DQN are the following [15]:
(1) The TD error of the temporal-difference algorithm is handled separately by a target network.
(2) An experience replay mechanism is used to select the samples, and an experience pool stores and manages the transitions (s, a, r, s′). In order to train the Q network, these samples are kept in the experience pool, from which minibatches are selected at random. The elimination of sample correlation by the experience replay mechanism results in approximately independent and identically distributed training samples. Gradient descent is used to update the parameters of the NN [15].
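A minimal sketch of the two DQN ingredients just described, an experience pool of (s, a, r, s′) transitions and a separate target network for the TD target, is shown below in Python. A linear Q-function stands in for the CNN, and the state/action sizes, learning rate, and synchronization period are illustrative assumptions.

```python
import random
import numpy as np

STATE_DIM, N_ACTIONS = 4, 2        # illustrative sizes
GAMMA, LR = 0.99, 1e-2

w_online = np.zeros((N_ACTIONS, STATE_DIM))   # stands in for the CNN in DQN
w_target = w_online.copy()                    # separate target network
replay_pool = []                              # experience pool of (s, a, r, s')

def q_values(w, s):
    return w @ s                               # vector of Q values, one per action

def store(s, a, r, s_next):
    replay_pool.append((s, a, r, s_next))

def train_step(batch_size=32):
    """One gradient-descent update on a random minibatch from the pool."""
    batch = random.sample(replay_pool, min(batch_size, len(replay_pool)))
    for s, a, r, s_next in batch:
        td_target = r + GAMMA * np.max(q_values(w_target, s_next))
        td_error = td_target - q_values(w_online, s)[a]
        w_online[a] += LR * td_error * s       # gradient step for the chosen action

def sync_target():
    """Periodically copy the online parameters into the target network."""
    global w_target
    w_target = w_online.copy()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(200):                       # fill the pool with random transitions
        s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
        store(s, int(rng.integers(N_ACTIONS)), float(rng.normal()), s_next)
    for step in range(100):
        train_step()
        if step % 20 == 0:
            sync_target()                      # periodic target-network update
```

Sampling the minibatch at random from the pool is exactly the decorrelation mechanism mentioned above, and sync_target() corresponds to the periodic copy of the online parameters into the target network.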


Double DQN

Double DQN (DDQN) was introduced by Hado van Hasselt. By decomposing the max operation in the target into action selection and action evaluation, it is used to reduce the overestimation problem. A DQN and a second DNN are combined in this scheme. The approach was developed to address the issue of overestimating Q values in the models discussed previously. The agent assumes that the action with the higher Q value is the best option for the next state, but the accuracy of that Q value depends on which actions have been tried, what was obtained, and what the next state of the trial will be. At the beginning of the experiment there are not enough reliable Q values to estimate the best possibility. Since there are fewer Q values to choose from at this point, the highest Q value may cause the agent to take an incorrect action toward the target. DDQN is used to solve this issue. One network is used to select the action, and the other, the target network, calculates the target Q value for that particular action. DDQN thus helps limit the overestimation of the Q values, which in turn helps reduce the training time [3].

Dueling Q network

A dueling Q network is used to tackle the issues in the DQN model by using two networks, the current network and the target network. The current network approximates the Q values; the target network, in turn, chooses the next best action and evaluates the action picked. It may not be necessary to approximate the value of each action in all circumstances, and the dueling Q network exploits this. In some gaming situations, when a collision is imminent it matters whether the agent moves left or right, while in other situations the choice of action hardly matters. A dueling network is an architecture created for a single Q network: it employs two streams rather than a single sequence after the convolution layer. With these two streams, the state-value estimate and the advantage function are computed separately and then combined into a single Q value. As a result, the dueling network produces the Q function, which can be trained with a variety of existing algorithms, such as DDQN and SARSA. The evolution of the dueling deep Q network is the dueling double deep Q network (D3QN), which will be revisited later [3].
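The difference between the vanilla and the double TD target, and the dueling combination of the value and advantage streams, can be written compactly; in the Python sketch below, small arrays of made-up numbers stand in for the network outputs.

```python
import numpy as np

GAMMA = 0.99

def dqn_target(r, q_target_next):
    # Vanilla DQN: the target network both selects and evaluates the action,
    # which tends to overestimate Q values.
    return r + GAMMA * np.max(q_target_next)

def double_dqn_target(r, q_online_next, q_target_next):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, reducing the overestimation described above.
    a_star = int(np.argmax(q_online_next))
    return r + GAMMA * q_target_next[a_star]

def dueling_aggregate(value, advantages):
    # Dueling architecture: combine the state-value and advantage streams
    # into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Toy example with made-up numbers:
q_online_next = np.array([1.0, 2.5, 2.4])
q_target_next = np.array([0.9, 2.0, 2.6])
print(dqn_target(1.0, q_target_next))                        # uses max of the target net
print(double_dqn_target(1.0, q_online_next, q_target_next))  # decoupled selection/evaluation
print(dueling_aggregate(1.5, [0.2, -0.1, 0.4]))
```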


Policy-based DRL methods

Deep Deterministic Policy Gradient (DDPG)

Of particular interest is that policy-based DRL methods can solve problems with a high-dimensional observation space, whereas value-based DRL methods (DQN and its variants) can only deal with discrete, low-dimensional action spaces. However, many relevant tasks require continuous, high-dimensional action spaces, particularly physical control tasks. To address this, the action space can be discretized, but it remains high-dimensional, with the number of actions increasing exponentially with the degrees of freedom [15]. Deep Deterministic Policy Gradient (DDPG), a policy gradient-based method, can be used to directly optimize the policy for problems involving a continuous action space. In contrast to a random strategy represented by a probability distribution function, DDPG employs a deterministic policy function. It also borrows the target network from DQN, uses a CNN to represent the policy and Q functions, and uses experience replay to stabilize the training and guarantee high sample utilization efficiency. Gradient ascent updates the Q network over time, and the K samples from the experience pool are chosen at random [15].
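A hedged sketch of the two DDPG mechanisms mentioned above, the deterministic target y = r + gamma * Q'(s', mu'(s')) and slowly tracking target networks, is given below. The linear actor/critic placeholders, the soft-update rate tau, and all numbers are illustrative assumptions rather than part of the algorithm as presented in [15].

```python
import numpy as np

GAMMA, TAU = 0.99, 0.005   # discount and soft-update rate (illustrative)

def ddpg_target(r, s_next, actor_target, critic_target):
    """TD target for the critic: the target actor picks the continuous action."""
    a_next = actor_target(s_next)            # deterministic policy mu'(s')
    return r + GAMMA * critic_target(s_next, a_next)

def soft_update(target_params, online_params):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return TAU * online_params + (1.0 - TAU) * target_params

# Toy placeholders (linear actor/critic on a 3-D state, 1-D action):
w_actor = np.array([0.1, -0.2, 0.05])
w_critic = np.array([0.3, 0.1, -0.1, 0.4])   # last weight multiplies the action

actor_t = lambda s: float(w_actor @ s)
critic_t = lambda s, a: float(w_critic @ np.append(s, a))

s_next = np.array([0.5, -1.0, 0.2])
print(ddpg_target(1.0, s_next, actor_t, critic_t))
print(soft_update(np.zeros(3), w_actor))
```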


Proximal Policy Optimization (PPO)
In traditional policy gradient methods, which use an on-policy strategy, the sampled minibatch can be used for only one update epoch and must be resampled before the subsequent policy update. The PPO algorithm's ability to carry out minibatch updates over a number of epochs increases the effectiveness of sample utilization [15]. PPO uses a surrogate objective to improve the new policy using samples collected under the old policy, making the new policy's actions better than the previous policy's. However, the training algorithm becomes unstable if the new policy changes too much at once, so the PPO algorithm constrains the objective function. A detailed explanation of PPO is given in Sect. 4.5. Due to its ability to strike a balance between sample complexity, simplicity, and time efficiency, PPO outperforms A3C and other on-policy gradient methods [15].
Long Short-Term Memory (LSTM)
Recurrent Neural Networks (RNNs) are a family of NNs that are not constrained to the feed-forward architecture. RNNs are obtained by introducing auto or backward connections, that is, recurrent connections, into feed-forward neural networks [16]. Introducing a recurrent connection introduces the concept of time. This allows RNNs to take context into account, that is, to remember inputs from the past by capturing the dynamics of the signal. Introducing recurrent connections changes the nature of the NN from static to dynamic, which makes it suitable for analyzing time series. The simplest recurrent neural unit consists of a network with just one hidden layer, with activation function tanh(), and with an auto connection. In this case the output, h(t), is also the state of the network, which is fed back into the input, that is, into the input of the next copy of the unrolled network at time t + 1 [16]. This simple recurrent unit already shows some memory, in the sense that the current output also depends on previously presented samples at the input layer; however, this is often not enough to solve the required tasks. Something more powerful is needed, able to reach farther back into the past than the simple recurrent unit can. LSTM units were introduced to solve this [16]. LSTM is a more complex type of recurrent unit, using an additional hidden vector, the cell state or memory state, s(t), and the concept of gates. Figure 6 shows the structure of an unrolled LSTM unit.

Fig. 6 LSTM layer [16]


An LSTM layer contains three gates: a forget gate, an input gate, and an output gate. LSTM layers are a very powerful recurrent architecture, capable of keeping the memory of a large number of previous inputs. These layers thus fit—and are often used to solve—problems involving ordered sequences of data. If the ordered sequences of data are sorted based on time, then we talk about time series. Indeed, LSTM-based RNNs have been applied often and successfully to time series analysis problems. A classic task to solve in time series analysis is demand prediction [16].
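As a minimal illustration of how such a layer is used for ordered sequences, the PyTorch sketch below feeds a batch of time series through an LSTM layer and reads a one-step-ahead prediction from the last time step; all dimensions are hypothetical and chosen only for illustration.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)  # one LSTM layer
head = nn.Linear(32, 1)                                         # e.g. a demand-prediction head

x = torch.randn(4, 50, 8)            # batch of 4 sequences, 50 time steps, 8 features each
outputs, (h_n, c_n) = lstm(x)        # h_n: final hidden state, c_n: final cell (memory) state
prediction = head(outputs[:, -1])    # use the last time step to predict the next value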

3.5 Application Requirements
Among the application requirements of autonomous robotic systems, only the dynamic environment, safety, and uncertainty are considered here as the most important requirements.
Dynamic environment
Any decision-maker in RL is an agent, and the environment is anything that is not the agent. The agent interacts with the environment and receives a reward value as a training feedback signal, with the aim of maximizing the accumulated reward. Using the components S, A, R, and P, the agent-environment interaction process can be modeled as a Markov Decision Process (MDP): S represents the environment's state, A the agent's action, R the reward, and P the probability of a state transition. The agent's policy π is the mapping from state space to action space. When the state is st ∈ S, the agent takes action at ∈ A and then transfers to the next state st+1 according to the state transition probability P, while receiving reward feedback rt ∈ R from the environment [15]. Although the agent receives an immediate reward at each time step, the objective of RL is to maximize the long-term cumulative reward rather than the short-term one. The agent therefore continuously improves the policy π by optimizing the value function. Because dynamic programming requires massive memory consumption and complete information about the dynamics, both of which are impracticable, researchers have proposed and developed two learning strategies: Monte Carlo learning and temporal-difference (TD) learning. The Q-learning algorithm combines TD learning with the Bellman equations and MDP theory. Since then, RL research has made significant progress, and RL algorithms have been used to solve a wide variety of real-world problems [15].
Safety
Safety tasks in MR are considered within an autonomous Internet of Things framework. The sensors gather data about the state of the system, which is used by intelligent agents in the Internet of Things devices and Edge/Fog/Cloud servers to decide how to control the actuators so they can act. A promising strategy


for autonomous intelligent agents is to use artificial intelligence decision-making techniques, particularly RL and DRL [17].
Uncertainty assessment for Autonomous Systems
Thanks to improvements in robotics, robots can now operate in environments that are increasingly dynamic, unpredictable, unstructured, and only partially observable. A major effect of the lack of information in these settings is uncertainty; therefore, autonomous robots must be able to deal with uncertainty [18]. The operating environment of autonomous or unmanned systems is uncertain, and such systems must plan and decide based on incomplete information and approximate models. The sensor data are inherently uncertain due to noise, inconsistency, and incompleteness. The analysis of such massive amounts of data requires sophisticated decision-making strategies and refined analytical techniques for effectively reviewing and/or predicting future actions with high precision. Without a measure of prediction uncertainty, DL algorithms cannot be fully integrated into robotic systems under these circumstances. In DNNs there are typically two sources of prediction uncertainty: model uncertainty and data uncertainty. Autonomous vehicles operate in highly unpredictable, non-stationary, and dynamic environments. The most promising methods for treating this weakness of DL are Bayesian Deep Learning (BDL), which combines DL and Bayesian probability approaches, and fuzzy-logic-based machine learning (ML) within an Internet of Things structure. Under an informational conception of the Internet of Things and Big Data, proper comprehension of this process requires describing the most well-known control strategies, map-representation techniques, and interactions with external systems, including pedestrian interactions [19].

4 Deep Reinforcement Learning Methods
4.1 Continuous Control
Extending the ideas behind the success of Deep Q-Learning (DQL) to the continuous action domain, Lillicrap et al. [20] developed an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Employing the same learning algorithm, network architecture, and hyper-parameters, the proposed algorithm successfully solves numerous simulated physics problems, including well-known ones like cart-pole swing-up, dexterous manipulation, legged locomotion, and car driving. The algorithm is able to find policies that resemble those of a planning algorithm with full access to all of the domain's components and subsystems. It demonstrates that policies can be learned "end-to-end" for a variety of tasks, directly from raw sensor inputs.


4.2 A Simple Implementation of Q-learning
The Bellman Eq. (1) provides the recognized definition of RL [21]:

V(s) = max_a [R(s, a) + γ V(s′)]    (1)

where:
• V(s): present value of a state s;
• R(s, a): reward related to action a in state s;
• V(s′): future value in the future state s′;
• a: action taken by the agent;
• s: current agent state;
• γ: discount factor.

For the Bellman equation to be applied to Q-learning, it has to be transformed so that it calculates the quality of each of the agent's actions in a state, relating the current time step (t) to the previous one (t − 1), as in formula (2):

Q(s, a)_t = Q(s, a)_{t−1} + α [R(s, a) + γ max_{a′} Q(s′, a′) − Q(s, a)_{t−1}]    (2)

The Q-learning Eq. (2) was the foundation for the DQL algorithm, which makes Q values available based on the agent's state so that actions can be taken. The agent's Q values in its current state were determined using a dense artificial NN with four inputs (the sensors) and a hidden layer with thirty artificial neurons, which performs the Q-learning estimation, as displayed in the network diagram of Fig. 7. The network's output contains four Q values.

Fig. 7 NN used in the learning algorithm [21]


Based on the Q values produced by the network, the agent's final action, that is, the answer most likely to be chosen, was determined using the SoftMax equation (3). After making a decision, the agent received a positive reward for a good decision and a negative reward for a bad one.

SoftMax = e^Q / Σ e^Q    (3)

The effectiveness of DQL increases with each interaction. A technique known as experience replay was used to support this: the agent recalls the decisions it previously made in a given state and feeds these values back into the network. If the previous interaction resulted in a positive reward, it can keep the same decision; if it was punished, it may choose a different path [21]. At the start of execution, the Q-value table is initialized, the episode counter is set to 1, and the maximum number of episodes (max-episode) is set; then the cycle begins. After confirming the environment's state S, the next action A is selected from the Q table, the next state S′ is reached, and the immediate environmental reward R is obtained; the Q table is updated, S is set to S′, the episode counter is increased by one, and the endpoint and maximum path limit are checked. If they have not been reached, the "confirm S, select action A, reach S′" exploration cycle continues. When the agent reaches the desired state, the program checks whether the current number of episodes exceeds the maximum number of episodes; the program iterates until the required maximum is reached in order to obtain the best Q value and path. If the judgment is negative, the program ends; if it is positive, the most recent Q-value table is used. This sequence is shown in the flowchart of Fig. 8, where the numbers indicate the order in which the algorithm's steps are executed [22]. The pseudo-code in Fig. 9 shows a basic formulation of the DQL algorithm.
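As a minimal illustration of the loop just described, the sketch below implements tabular Q-learning with an ε-greedy policy. The environment interface (reset(), step(), n_actions) is a hypothetical placeholder and does not correspond to the exact setup of [21, 22].

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    # Q table: one list of action values per state.
    Q = defaultdict(lambda: [0.0] * env.n_actions)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)
            else:
                a = max(range(env.n_actions), key=lambda i: Q[s][i])
            s_next, r, done = env.step(a)          # hypothetical 3-tuple return
            # Q-table update from Eq. (2)
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q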

4.3 Dueling Double DQN
In the given example, D3QN is used to navigate an unfamiliar indoor environment while the robot avoids hitting anything. The outputs are the linear and angular velocities, and the raw depth-sensor data, which carries no size scale, is used as the input [23]. D3QN combines DDQN and dueling DQN; consequently, it mitigates the overestimation issue and boosts performance. As is standard in RL, the learning sequences are modeled as an MDP. The robot interacts with its surroundings to make decisions: at each time step, it chooses an action based on the current observation st at time t, where the observation is a stack of four frames of depth-sensor data.


Fig. 8 Q-learning Logical flow chart [22]

Fig. 9 DQL algorithm, (https://ailephant.com/category/algorithms/) [21]


The reward function delivers a reward signal r(s_t, a_t). The actions are moving forward, half-turning left, turning left, half-turning right, and turning right. The MR then moves on to the subsequent observation, s_{t+1}. The cumulative discounted future reward (4) is:

R_t = Σ_{τ=t}^{T} γ^{τ−t} r_τ    (4)

and the MR's goal is to maximize this discounted reward. γ is a discount factor between 0 and 1 that weighs the importance of immediate versus future rewards: the smaller γ is, the more significant the immediate reward, and vice versa. The termination time step is denoted by T. The algorithm's goal is to maximize the action-value function Q. In contrast to DQN, D3QN's Q function is (5):

Q(s, a; θ, α, β) = V(s; θ, β) + (A(s, a; θ, α) − max_{a′} A(s, a′; θ, α))    (5)

where θ, α, and β are the parameters of the shared CNN and of the two streams of fully connected layers. The Q-network can be trained with the loss function (6):

L(θ) = (1/n) Σ_{k=1}^{n} (y_k − Q(s, a; θ))²    (6)

Figure 10 displays the network's structure. Its components are the perception network and the control network. The perception network is a three-layer CNN with convolution and activation at each layer. The first CNN layer uses 32 5 × 5 convolution kernels with stride 2.

Fig. 10 The structure of D3QN network. It has a dueling network and a three-layer CNN [23]


The second layer uses 64 3 × 3 convolution kernels with stride 2. In the third layer, 64 2 × 2 convolution kernels are used, also with stride 2. The control network is the dueling network, which consists of two sequences of fully connected (FC) layers; each sequence independently estimates the state value and the advantages of each action. In the FC1 layer, each of the two FC layers has 512 nodes. In the FC2 layer, there are two FC layers, each with six nodes. In the FC3 layer, there is a single FC layer with six nodes. The ReLU function serves as the activation function for every layer. Figure 11 describes the D3QN model's parameters. Fig. 11 Algorithm D3QN [23]
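The dueling architecture just described can be sketched in PyTorch as below. The input resolution, feature sizes, and number of actions are assumptions made for illustration and are not the authors' exact model in [23].

import torch
import torch.nn as nn

class D3QNNet(nn.Module):
    def __init__(self, n_actions=5):  # size of the action set is an assumption
        super().__init__()
        # Perception network: three convolutional layers on a 4-frame depth stack.
        self.perception = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=2, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        feat = self._feature_dim()
        # Control network: dueling streams for state value and action advantages.
        self.value_stream = nn.Sequential(nn.Linear(feat, 512), nn.ReLU(), nn.Linear(512, 1))
        self.adv_stream = nn.Sequential(nn.Linear(feat, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def _feature_dim(self, shape=(4, 64, 64)):  # assumed input resolution
        with torch.no_grad():
            return self.perception(torch.zeros(1, *shape)).shape[1]

    def forward(self, x):
        f = self.perception(x)
        v, a = self.value_stream(f), self.adv_stream(f)
        # Combine the streams as in Eq. (5): Q = V + (A - max_a A)
        return v + (a - a.max(dim=1, keepdim=True).values)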


4.4 Actor-Critic Learning
Hafner et al. developed the Dreamer algorithm for robot learning without the use of simulators. Dreamer acts in the environment and builds a world model from a replay buffer of previous experiences; an actor-critic algorithm then learns actions from the trajectories predicted by the learned model. Learning updates are decoupled from data collection to meet latency requirements and permit rapid training without waiting for the environment: in the implementation, a learner thread continuously trains the world model and the actor-critic behavior, while an actor thread simultaneously computes actions for environment interaction [24].
World Model Learning: The world model, shown in Fig. 13 (left), is a DNN that learns to predict the dynamics of the environment. Because sensory inputs can be large images, future representations rather than future inputs are predicted. This makes massively parallel training with large batch sizes possible and reduces the number of errors that accumulate. As a result, the world model can be interpreted as a compact environment simulation that the robot acquires on its own and that keeps improving as it explores the real world. The foundation of the world model is the Recurrent State-Space Model (RSSM), which has four parts [24]:

Encoder network: enc_θ(s_t | s_{t−1}, a_{t−1}, x_t)
Dynamics network: dyn_θ(s_t | s_{t−1}, a_{t−1})
Decoder network: dec_θ(s_t) ≈ x_t
Reward network: rew_θ(s_{t+1}) ≈ r_t    (7)

The various physical AMR sensors provide proprioceptive joint readings (the sense of one's own movement, force, and body position), force sensors, and high-dimensional inputs such as RGB and depth-camera images. The encoder network combines all of the sensory inputs x_t into the stochastic representations z_t. Using its recurrent state h_t, the dynamics model learns to predict the sequence of stochastic representations. The decoder is not required when learning behaviors from latent rollouts, but it reconstructs the sensory inputs to provide a rich signal for learning representations and permits human inspection of model predictions (Fig. 12). In the authors' real-world experiments, the AMR must interact with the real world to discover task rewards, which the reward network learns to predict; rewards specified by hand on the decoded sensory inputs can also be used. All of the world model's components were jointly optimized using stochastic backpropagation [24].


Fig. 12 Dreamer algorithm uses a direct method for learning on AMR hardware without the use of simulators. The robot’s experience is collected by the current learned policy. The replay buffer is expanded by this experience. Through supervised learning, the world model is trained on replayed off-policy sequences. An actor-critic algorithm uses imagined rollouts in the world model’s latent space to improve a NN policy. Low-latency action computation is made possible by parallel data collection and NN learning, and learning steps can continue while the AMR is moving [24]

Actor-Critic algorithm
Whereas the world model represents task-independent knowledge about the dynamics, the behavior that the actor-critic algorithm learns is specific to the task at hand. Behaviors are learned from rollouts predicted in the world model's latent space, without decoding observations, as depicted in Fig. 13 (right). Similar to specialized modern simulators, this enables massively parallel behavior learning on a single GPU with typical batch sizes of 16 K. The actor-critic algorithm is made up of two NNs [24]:

Actor network: π(a_t | s_t);  Critic network: ν(s_t)    (8)

The job of the actor network is to find, for each latent model state s_t, a distribution of successful actions that maximizes the predicted total sum of task rewards. The critic network learns to anticipate the total amount of future task rewards through temporal-difference learning. This is important because it allows the algorithm to learn long-term strategies, for example by taking into account rewards beyond the H = 16 step planning horizon. The critic is regressed toward the return of the predicted model-state trajectory. A simple option is to calculate the return as the sum of N intermediate rewards plus the critic's prediction at the resulting state. Rather than choosing an arbitrary N, the computed returns average over all N ∈ [1, H−1] [24]:


Fig. 13 Training of the NNs in Hafner et al.'s Dreamer algorithm (2019, 2020) for rapid robot learning in real-world situations. Dreamer comprises two neural network parts. Left: the world model is structured as a deep Kalman filter trained on replay-buffer subsequences. The encoder combines all sensory modalities into discrete codes. The decoder reconstructs the inputs from the codes, providing a rich learning signal and making it possible for humans to examine model predictions. An RSSM is trained to predict subsequent codes from actions, without observing intermediate inputs. Right: the world model enables massively parallel policy optimization with large batch sizes, without having to reconstruct sensory inputs. Dreamer trains a value and policy network using imagined rollouts and a learned reward function [24]

V_t^λ = r_t + γ [(1 − λ) ν(s_{t+1}) + λ V_{t+1}^λ],   with V_H^λ = ν(s_H)    (9)

The critic network is trained to regress the λ-returns, whereas the actor network is trained to maximize them. Two possible gradient estimators for the actor's policy gradient are Reinforce and the reparameterization trick; the latter, used by Rezende and others, backpropagates return gradients directly through the differentiable dynamics network. Following Hafner et al., Reinforce gradients were selected for discrete-action tasks and reparameterization gradients for continuous control tasks. In addition to maximizing returns, the actor is encouraged to maintain high entropy throughout training in order to avoid collapsing to a deterministic policy and to maintain some exploration [24]:

L(π) = −E[ Σ_{t=1}^{H} ( ln π(a_t | s_t) sg(V_t^λ − ν(s_t)) + η H[π(a_t | s_t)] ) ]    (10)

where sg(·) denotes the stop-gradient operator and η the entropy scale.

Both the actor and the critic were optimized with the Adam optimizer (Kingma and Ba, 2014). As is typical in the literature (Mnih et al., 2015; Lillicrap et al., 2015), a slowly updated copy of the critic network was used to calculate the λ-returns. The world model is shielded from the gradients of the actor and critic, because propagating them would lead to incorrect and overly optimistic model predictions [24].
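For illustration, the λ-returns of Eq. (9) can be computed recursively over an imagined rollout as in the sketch below; the function name and tensor layout are assumptions, not code from [24].

import torch

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    # rewards: 1-D tensor r_1..r_{H-1}; values: 1-D tensor nu(s_1)..nu(s_H), one element longer.
    H = len(values)
    returns = [None] * H
    returns[H - 1] = values[H - 1]                       # V_H^lambda = nu(s_H)
    for t in reversed(range(H - 1)):
        returns[t] = rewards[t] + gamma * ((1 - lam) * values[t + 1]
                                           + lam * returns[t + 1])
    return torch.stack(returns)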


4.5 Learning Autonomous Mobile Robotics with Proximal Policy Optimization
Based on the research in [25], the distribution of the AMR's current angular velocity is learned using PPO. The PPO algorithm, which makes it simpler for recurrent networks to work at a large scale by using only first-order gradients, can be considered an approximate variant of trust-region policy optimization. The continuous-control learning algorithm using PPO is shown in pseudo-code in Fig. 14. The proposed algorithm uses the actor-critic architecture. To begin, the mobile robot progresses through the environment one step at a time following the policy πθ; at each step, the state, the action, and the reward are gathered for later training. The advantage function is then given by the temporal-difference (TD) error, the difference between the discounted rewards Σ_{t1>t} γ^{t1−t} r_{t1} and the state value V_ϕ(s_t). The actor updates θ by a gradient method on J_PPO(θ), maximizing a surrogate objective whose probability ratio is πθ(a_t|s_t)/π_old(a_t|s_t); the actor thus optimizes the new policy πθ(a_t|s_t) based on the advantage function and the old policy π_old(a_t|s_t). The larger the advantage function is, the more probable changes to the new policy are; however, if the advantage function is too large, the algorithm is very likely to diverge. Therefore, a KL penalty is introduced to limit how far learning moves from the old policy π_old(a_t|s_t) to the new policy πθ(a_t|s_t). The critic updates ϕ by a gradient method on L_BL(ϕ), which minimizes the loss function of the TD error given data with length-T time steps. The desired change in each policy iteration is set by the hyperparameter KL_target. If the actual change KL[π_old | πθ] falls below or exceeds the range [β_low KL_target, β_high KL_target], a scaling term α > 1 adjusts the coefficient of KL[π_old | πθ]. The clipped surrogate objective is another approach that can be used in place of the KL penalty coefficient for updating the actor network. The primary objective is summarized as follows:

Fig. 14 Continuous control learning algorithm through PPO in pseudo-code


L^{CLIP}(θ) = Ê_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ]    (11)

where ε = 0.2 is the clipping hyperparameter. The clip term clip(r_t(θ), 1 − ε, 1 + ε) Â_t has the same motivation as the KL penalty: it also limits overly large policy updates.
Reward Function. To simplify the reward function, only two distinct conditions are used, without normalization or clipping:

r_t(s_t, a_t) = { r_move,  if no collision;  r_collision,  if collision }    (12)

A positive reward r_move is given to the AMR for operating freely in the environment. Otherwise, a large negative reward r_collision is given if the AMR collides with an obstacle, as detected by a minimum sensor scanning-range check. This reward function encourages the AMR to keep to its lane and avoid collisions as it moves through the environment.
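A minimal PyTorch sketch of the clipped surrogate objective of Eq. (11) is given below, negated so that it can be minimized with a standard optimizer; the argument names are illustrative assumptions.

import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
    # Probability ratio r_t(theta) computed from log-probabilities.
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Negative mean of the element-wise minimum, as in Eq. (11).
    return -torch.mean(torch.min(unclipped, clipped))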

4.6 Multi-Agent Deep Reinforcement Learning
Multi-Agent Reinforcement Learning (MARL) has been shown to perform well in decision-making tasks in such dynamic environments. Multi-agent learning is challenging in itself, requiring agents to learn their policies while taking into account the consequences of the actions of others [25]. In the last-mentioned work [25], a directly collaborative Communication-Enabled Multi-Agent Decentralized Double Deep Q-Network (CMAD-DDQN) method was proposed for the energy optimization of AMRs (or UAVs), where each agent relies on its local observations, as well as on the information it receives from nearby AMRs, for decision making. The information communicated by the nearby AMRs contains the number of connected ground users, the instantaneous energy value, and the distances to nearby AMRs at each time step. The authors propose an approach in which each agent executes actions based on state information, assuming a two-way communication link among neighboring AMRs [25]. For improved system performance, the cooperative CMAD-DDQN strategy, which relies on a communication mechanism between nearby UAVs, was proposed. In the scenario under consideration, each agent's reward reflects its neighborhood's coverage performance. Each AMR is controlled by a DDQN agent, which aims to maximize the system's energy efficiency (EE) by jointly optimizing its 3D trajectory, the number of connected ground users, and the amount of energy used by the AMRs, as shown in Fig. 15 [25].


Fig. 15 Top-left: the CMAD-DDQN framework, in which each AMR (or UAV) j equipped with a DDQN agent interacts with and learns from the closest neighbors in its state space; to boost overall system performance, each AMR collaborates directly with its neighbors. Bottom-left: the multi-agent decentralized double deep Q-network (MAD-DDQN) framework, in which each AMR j equipped with a DDQN agent relies solely on the information it gathers from the surrounding environment and does not collaborate directly with its immediate neighbors. At each time step, UAV j's DDQN agent observes its current state s in the environment and updates its trajectory by selecting an action in accordance with its policy, receiving a reward r and moving to a new state s′ [25]

It was presumed that, as the agents collaborate with one another in a dynamic shared environment, they might observe learning uncertainties brought on by other agents' contradictory policies. The algorithm in Fig. 16 depicts agent j's DDQN with direct collaboration with its neighbors. Agent j follows an ε-greedy policy, carrying out an action in its current state s, then moving to a new state s′ and receiving a reward that reflects its neighborhood's coverage performance. Moreover, the DDQN procedure depicted in lines 23–31 improves the agent's decisions (Fig. 16).

4.7 Fusion Method
Because an AMR has only a limited understanding of the environment when performing local path-planning tasks, the issues of path redundancy and local deadlock arise during planning in unfamiliar and complex environments. A novel algorithm based on the fusion (combination) of an LSTM NN, fuzzy logic control, and RL was proposed by Guo et al.; it exploits the advantages of each algorithm to overcome the disadvantages of the others. For local path planning, a NN model with LSTM units is first developed. Second, a low-dimensional-input fuzzy


Fig. 16 DDQN for Agent j with direct collaboration with its neighbors [25]

logic control (FL) algorithm is used to collect training data, and the network model LSTM_FT is pretrained by transfer learning to acquire the required skill. RL is then combined with autonomous learning of new rules from the environment to adapt better to different situations. In static and dynamic environments, the FL and LSTM_FT algorithms are compared with the fusion algorithm LSTM_FTR. Numerical simulations show that, compared to FL, LSTM_FTR can significantly improve the path-planning success rate, the optimization of the path length, and the decision-making efficiency. LSTM_FTR can learn new rules and has a higher success rate than LSTM_FT [26]. The simulation phase of this research is still ongoing.


4.8 Hybrid Method
Deep neuro-fuzzy systems
New hybrid systems categorized as deep neuro-fuzzy systems (DNFS) were created as a result of the concepts of hybrid approaches. A DNN is a good way to deal with big data; the superior accuracy of the model, however, comes at the cost of high complexity, and a few things should be kept in mind before using this kind of network. A DNN can provide a deeper analytical model because it employs multiple hidden layers; however, the computational complexity increases with each layer. In addition, these networks are based on a conventional NN trained with gradient-descent optimization, so a DNN frequently runs into the problem of getting stuck in local minima. Beyond these difficulties, the main disadvantage of a DNN is that the model is frequently criticized for not being transparent: the black-box nature of the model prevents humans from tracing its predictions, and it is hard to trust the results these deep networks produce. As a result, there is always a risk of a gap between analysts and DNNs. According to Talpur et al., this drawback restricts the usability of such networks in the majority of real-world problems, where verification of the predicted results is a major concern [27]. A few studies in the literature have created novel DNFS by combining a DNN with fuzzy systems to address these issues. Fuzzy systems are information-processing-oriented structures built with fuzzy techniques. They are mostly used in systems where traditional binary logic is difficult or impossible to apply. Their main characteristic is that they use fuzzy conditional IF-THEN rules to represent symbolic knowledge. Thus, the hybridization of DNNs and fuzzy systems has shown a viable way of reducing uncertainty by using fuzzy rules [27]. Figure 17 shows a DNFS, which combines the advantages of a DNN and fuzzy systems (Aviles et al., 2016). Time-series data and other highly nonlinear problems can be addressed with a sequential DNFS. As represented in Fig. 18a, b, in a sequential structural design a DNN and a fuzzy system process data sequentially. In fuzzy theory, a fuzzy set A

Fig. 17 Representation of DNFS by combining the advantages of fuzzy systems and a DNN [27]


Fig. 18 Sequential DNFS: a fuzzy systems incorporated with a DNN and b a DNN incorporated with fuzzy systems [28]

in a universe of discourse X is represented by a membership function μA taking values from the unit interval, μA : X → [0, 1]. The membership function shows the degree of membership of a data point x ∈ X within the universe of discourse. To accurately describe real-world uncertainty, the fuzzy system makes use of the approximate reasoning and decision-making capabilities of fuzzy logic, and it can work with data that lacks precision or certainty, or that is ambiguous [27].
A* hybrid with the deep deterministic policy gradient
The traditional graph-search algorithm A* is characterized by the expansion of nodes based on an evaluation function that is the sum of two costs: (1) the cost from the starting point to the node under consideration and (2) the cost from that node to the goal. In contrast to traditional path-planning methods, current approaches use RL to solve the problem of robot navigation by implicitly learning to deal with the interaction ambiguity of surrounding moving obstacles. Decentralized methods for multi-robot path planning in dynamic scenes are becoming increasingly common because they are unaffected by changes in the number of robots. In this additional hybrid approach, A*, adaptive Monte Carlo localization (AMCL), and RL are used to develop a robot navigation strategy for particular dynamic


scenes. Another hybrid strategy is AMRN with DRL [28]. Model-migration costs are reduced when the A* and DDPG methods are used together. The DDPG algorithm is modeled on the actor-critic algorithm. The procedure, shown in Fig. 20 with the pseudocode of Fig. 19, uses two critics to speed up the training process. One critic advises the actor on how to avoid collisions and estimates the probability of a collision; the other critic, in addition to guiding the actor toward the goal, reduces the difference between the input speed and the output speed.

4.9 Hierarchical Framework
Pursuing sampling efficiency and sim-to-real transfer capability, Wei Zhu and co-authors describe in [29] a hierarchical DRL framework for fast and safe navigation. The low-level DRL policy enables the robot to move toward the target position while maintaining a safe distance from obstacles; the high-level DRL policy further enhances navigational safety. A waypoint on the path from the robot to the ultimate goal is chosen as a sub-goal to avoid sparse rewards and to reduce the state space. The path can be generated with a local or a global map, which further improves the proposed DRL framework's generalizability, safety, and sampling efficiency. The sub-goal can also be used to reduce the action space and increase motion efficiency by creating a target-directed representation of the action space. The objective is a DRL strategy with high training efficiency for fast and safe navigation in complex environments that can be used across a variety of environments and robot platforms. The low-level DRL policy is in charge of fast motion, and the high-level DRL policy was added to improve obstacle-avoidance safety; as a result, a two-layer DRL framework is built, as shown in Fig. 21. Because the sub-goal is a waypoint on the path from the robot to the ultimate goal, the observation space of RL is greatly reduced. When conventional global path-planning strategies are used to generate the path, taking into account both obstacles and the final goal position, the sampling space is reduced further. Due to the inclusion of the sub-goal, the training effectiveness of this DRL framework is significantly higher than that of pure DRL methods. The DNN generates only a discrete linear velocity, and the angular velocity is inversely proportional to the sub-goal's orientation in the robot frame; consequently, there is less room for exploratory action and a smaller observation space. Additionally, the proposed DRL framework is very adaptable to a variety of robot platforms and environments for three reasons: (1) the DRL elements are represented by a sub-goal on a feasible path; (2) the observation includes a high-dimensional sensor scan whose features are extracted by a DNN; (3) generalized linear and angular velocities are used to convert the actions into actuator commands [29]. The value-based RL framework uses the DQN algorithm in both the low-level and the high-level DRL policies.


Fig. 19 A* combined with DDPG method pseudo code



Fig. 20 A* utilized the DDPG architectural strategy. The laser input is provided by the robot sensor. The input for navigation comes from global navigation. Using the vel-output, the mobile base controls the robot. The gradient and computed mean squared error (MSE) are used to update the actors’ neural networks [28]

Fig. 21 DRL framework in a hierarchy. The high-level DRL strategy aims to safely avoid obstacles, while the low-level DRL policy is used for rapid motion. A 37-dimension laser scan, the robot's linear and angular velocities (v and w), and the sub-goal's position in the robot frame (r and θ) form the same state input for both the low-level and high-level DRL policies. The low-level DRL policy generates five specific actions, whereas the high-level DRL policy generates two abstract choices, one of which relates to the low-level DRL policy [29]


The discrete action space was chosen due to its simplicity, even though the DDPG algorithm and the soft actor-critic (SAC) framework are required for smoother motion.
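As a rough illustration of the sub-goal idea, the sketch below picks as sub-goal the first waypoint of a planned path lying at least a fixed look-ahead distance away and expresses it in the robot frame as (r, θ). The look-ahead rule and all names are assumptions made for illustration, not the selection rule of [29].

import math

def select_subgoal(path_xy, robot_pose, lookahead=1.0):
    # path_xy: list of (x, y) waypoints from start to goal; robot_pose: (x, y, yaw).
    rx, ry, yaw = robot_pose
    for wx, wy in path_xy:
        dx, dy = wx - rx, wy - ry
        r = math.hypot(dx, dy)
        if r >= lookahead:
            theta = math.atan2(dy, dx) - yaw                          # bearing in the robot frame
            theta = math.atan2(math.sin(theta), math.cos(theta))      # wrap to [-pi, pi]
            return r, theta
    # Fall back to the final goal if every waypoint is closer than the look-ahead distance.
    dx, dy = path_xy[-1][0] - rx, path_xy[-1][1] - ry
    return math.hypot(dx, dy), math.atan2(dy, dx) - yaw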

5 Design Methodology
The two most significant issues with DRL systems for autonomous navigation are data inefficiency and a lack of generalizability to new goals. A design methodology for DRL applications in autonomous systems is given in [30]. Hillebrand's methodology for developing DRL-based systems was designed to accommodate the need to make trade-offs and consecutive design decisions. The fundamental principles of the V-Model serve as the methodology's foundation. Figure 22 describes the four consecutive steps of the process. The Operational Domain Analysis is the first step: the operational environment and the robot's tasks are defined in terms of the use case and the requirements, and these requirements become the criteria for testing and evaluation. The second step is the Conceptual Design. At this point, the primary characteristics of the reinforcement learning problem are established; key characteristics include the action space, the perception space, the reward range, and the relevant risk conditions. The Systems Design is the third step. This phase covers the various design decisions that need to be made and an understanding of the fundamental factors that influence them. The design of the reward is the first crucial aspect: the goal that the agent is supposed to achieve is implicitly encoded in the reward. The selection of an algorithm is the second design decision. When choosing an algorithm, there are a few things to consider, the first being the kind of action space: whether the DRL algorithm must handle a discrete or a continuous action space.

Fig. 22 Design Process for DRL [30]


The design of the NN is the third design decision. In DRL, NNs are used as function approximators for the value and policy networks. The inductive bias is the fourth design decision: domain heuristics used to accelerate the algorithm's learning process or performance are referred to as inductive bias. The learning rate is the final design factor; it determines the rate at which the NN is trained. Virtual Commissioning is the fourth step. X-in-the-Loop techniques and virtual testbeds are used in this step to evaluate agent performance and integrate the model [30].

5.1 Benchmarking
Benchmarking is a difficult problem because of the stochastic nature of the learning process and the limited datasets examined in algorithm comparisons. This problem gets even worse with DRL: since DRL incorporates the inherent stochasticities of both the environment and model learning, it is particularly challenging to guarantee reproducibility and fair comparisons. To this end, benchmarks have been created using simulations of numerous sequential decision-making tasks [31].
Best practices to benchmark DRL
Number of Trials, Random Seeds, and Significance Testing
The number of trials, random seeds, and significance testing all play a significant role in DRL. Stochasticity comes from randomness in the environments and in the NN initializations; simply changing the random seed can cause significant variations in the results. Therefore, it is essential to conduct a large number of trials with various random seeds when evaluating the performance of algorithms [31]. In DRL, it is common practice to simply use the average of several learning trials to determine an algorithm's effectiveness. A more reasonable benchmarking strategy draws on significance testing, since such methods provide statistically supported arguments in favor of a particular hypothesis. In practice, significance testing can be applied to DRL by using a variety of random seeds and environmental conditions and accounting for the standard deviation across multiple trials. A direct 2-sample t-test, for instance, can be used to determine whether performance gains are significantly attributable to the algorithm or whether the results are too noisy in highly stochastic environments. In particular, it has been argued in several works that simply presenting the top-K trials as performance gains is not sufficient for accurate comparisons [31]. In addition, it is important to exercise caution when interpreting the outcomes: it is possible to demonstrate that a hypothesis holds for one or more specific environments and sets of hyperparameters, but fails in other contexts.
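As a minimal example of the 2-sample t-test mentioned above, the snippet below compares final returns collected under several random seeds using SciPy; the numbers are hypothetical.

from scipy import stats

# Final-episode returns over several random seeds (hypothetical values).
returns_algo_a = [812.0, 790.5, 835.2, 801.7, 822.9]
returns_algo_b = [784.3, 769.8, 805.1, 776.4, 791.0]

# Welch's two-sample t-test (no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(returns_algo_a, returns_algo_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")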


Hyperparameter Tuning and Ablation Comparisons
Ablation and tuning are two additional important considerations. Here, a variety of random-seed combinations are compared using an ablation analysis across a number of trials. For baseline algorithms it is especially important to fine-tune the hyperparameters as much as possible: a false comparison between a novel algorithm and a baseline may occur if the hyperparameters are not selected appropriately. In particular, a large number of additional parameters, such as the learning rate, reward scale, network architecture, and training discount factor, among others, can significantly affect the outcomes. To ensure that a novel algorithm really performs significantly better, the appropriate scientific procedure must be followed when selecting such hyperparameters.
Reporting Results, Benchmark Environments, and Metrics
Some studies have used metrics like the maximum return within Z samples or the average maximum return, but these metrics may be biased toward making the results of highly unstable algorithms appear more significant. Such metrics would, for instance, rate as successful an algorithm that quickly achieves a high maximum return but then diverges. When selecting the measurements to report, it is essential to choose ones that provide a fair examination. If an algorithm performs better in average maximum return but worse in average return, it is essential to highlight both results and describe the algorithm's advantages and disadvantages. The same applies to selecting the benchmark environments for the evaluation: ideally, empirical results should cover a large number of conditions, in order to figure out in which conditions an algorithm succeeds and in which it does not. Identifying real-world applications and capabilities requires this.
Open-source software for DRL simulation
A DRL agent is made up of a learning algorithm (model-based or model-free) and a particular structure, or structures, of function approximators. There is a lot of software available to simulate DRL for autonomous mobile robot navigation. For validation and verification of AMRs, 37 tools are listed under Effect Tools in Araujo [32]; of those 37 tools, 27 have their source artifacts made public and are open-source. According to the same authors [32], the most significant gap is the absence of agreed-upon rigorous measures and real-world benchmarks for evaluating the interventions' efficiency and effectiveness. Measures of efficiency and effectiveness that are applicable to the AMR sub-domains are scarce, and many of them are extremely generic. Due to the absence of domain-specific modeling languages and runtime verification methods, AMR testing strategies leave room for improvement. Another large gap is the lack of quantitative specification languages for stating the desired system properties; property languages must cover topics such as the combination of discrete and continuous dynamics, stochastic and epistemic aspects,


and user- and environment-related aspects of behavior, because of the inherent heterogeneity of AMRs. This is related to the lack of interventions that provide quantitative quality metrics as the test's result and conduct a quantitative system analysis [32].
Techniques for benchmarking
Gazebo (https://gazebosim.org/home): Gazebo is an open-source 3D simulator for robotics applications. From 2004 to 2011, Gazebo was part of the Player Project; it became an independent project in 2012, when the Open Source Robotics Foundation (OSRF) began providing support for it. Open Dynamics Engine (ODE), Bullet, Simbody, and Dynamic Animation and Robotics Toolkit (DART) are among the physics engines incorporated into Gazebo. Each of these physics engines can load a physical model described in XML format by a Simulation Description Format (SDF) or Unified Robotic Description Format (URDF) file. Gazebo also allows users to create their own world, model, sensor, system, visual, and GUI plugins by implementing C++ Gazebo extensions; this capability enables users to extend the simulator to more complex scenarios. OSRF provides a bridge between Gazebo and the Robot Operating System (ROS) through the gazebo_ros plugin package [33]. ROS is an open-source software framework for robot software development maintained by OSRF. ROS is a middleware widely used by robotics researchers to leverage communication between different modules in a robot and between different robots, and to maximize the re-usability of robotics code from simulation to the physical devices. ROS allows different device modules to run as nodes and provides multiple types of communication layers between the nodes, such as the service, publisher-subscriber, and action communication models, to satisfy different purposes. This allows robotics developers to encapsulate, package, and re-use each of the modules independently; additionally, it allows each module to be used in both simulation and physical devices without any modification [33]. In robotics, as well as in many other real-world systems, continuous control frameworks are required. Numerous works provide RL-compatible access to extremely realistic robotic simulations by combining ROS with physics engines like ODE or Bullet; most of them can be run on real robotic systems with the same software. Two examples of the use of Gazebo in benchmarking deep reinforcement learning algorithms follow. Yue et al. [34] address the situation in which the AMR cannot construct an environment map before moving to its desired position and must rely only on what is currently visible. Within the DRL framework, the DQN is used to map the initial image to the mobile robot's best action. As previously stated, it is difficult to apply RL directly in a real-world robot navigation scenario because a large number of training examples is needed; the DQN is therefore first trained in the Gazebo simulation environment before being used to tackle the problem in a real mobile robot navigation scenario. The proposed method has been validated in both simulation and real-world testing. The experimental results of autonomous mobile robot navigation in the Gazebo simulation environment demonstrate that the trained DQN is able to


accurately map the current original image to the AMR's optimal action and to approximate the AMR's state-action value function. The experimental results in real-world indoor scenes demonstrate that a DQN trained in a simulated environment can be used in a real-world indoor environment. The AMR can likewise avoid obstacles and reach the intended location, even in dynamic conditions with interference. As a result, the approach can be used as an effective and environmentally adaptable AMRN method for an AMR operating in an unknown environment. For robots moving in tight spaces [35], the authors argue that mapping, localization, and control noise can result in collisions when motion planning is based on the conventional hierarchical autonomous system; in addition, such planning is disabled when no map is available. To address these issues, the authors employ DRL, a self-decision-making technique, to self-explore small spaces without a map while avoiding collisions. A rectangular safety region, which represents states and detects collisions for robots with a rectangular shape, and a carefully constructed reward function, which does not require information about the destination, were proposed for RL using the Gazebo simulator. The authors then test five reinforcement learning algorithms (DDPG, DQN, SAC, PPO, and PPO-discrete) in a narrow-track simulation. After training, the successful DDPG and DQN models can be applied to three brand-new simulated tracks and three actual tracks (https://sites.google.com/view/rl4exploration).
Benchmarking Techniques for Autonomous Navigation
Article [36] confirms that "a lack of an open-source benchmark and reproducible learning methods specifically for autonomous navigation makes it difficult for roboticists to choose what RL algorithm to use for their mobile robots and for learning researchers to identify current shortcomings of general learning methods for autonomous navigation". Before utilizing DRL approaches for AMRN, four primary requirements must be satisfied: reasoning about safety, generalization to diverse and novel environments, learning from limited trial-and-error data, and reasoning under the uncertainty of partially observed sensory inputs. The four main categories of learning methods that can satisfy one or more of these requirements are safe RL, memory-based NN architectures, model-based RL, and domain randomization. A comprehensive investigation of the extent to which these learning strategies can meet these requirements for RL-based navigation systems is carried out by incorporating them into a new open-source large-scale navigation benchmark. The codebase, datasets, and experiment configurations of this benchmark can be found at https://github.com/Daffan/ros_jackal.
Benchmarking multi-agent deep reinforcement learning algorithms
An open-source framework for Multi-robot Deep Reinforcement Learning (MADRL), named MultiRoboLearn, was proposed by Chen et al. [37]. In terms of generality, efficiency, and capability in large, complex, unstructured environments, it is also important to include support for multi-robot systems in existing robot-learning frameworks. More specifically, complex tasks such as search/rescue,


group formation control, or uneven-terrain exploration require robust, reliable, and dynamic collaboration among robots. The framework acts as a bridge linking multi-agent DRL algorithms with real-world multi-robot systems. Compared with other frameworks, it has two key characteristics: on the one hand, compared with learning-based single-robot frameworks, it considers how robots collaborate to perform tasks intelligently and how they communicate with each other efficiently; on the other, this work extends the system to the domain of learning algorithms (https://github.com/JunfengChenrobotics/MultiRoboLearn).

6 Teaching
In typical educational robotics approaches, robot navigation is usually accomplished through imperative programming. Given the increasing presence of AI in everyday life, however, these methods miss an opportunity to introduce ML techniques in an authentic and engaging learning context. Additionally, the requirement for a lot of physical space and for expensive, specialized equipment is a barrier that prevents many students from participating in robotics experiences [38]. Individual learning-path planning has become more common in online learning systems in recent years [39], but few studies have looked at teaching path planning in traditional classrooms. Based on their experience, the authors of "Open Source Robotic Simulators Platforms for Teaching Deep Reinforcement Learning Algorithms" [40] suggest two open-source robotic simulator platforms for teaching RL and DRL algorithms: the union of Gym, a toolkit for developing and comparing RL algorithms, and the robotic simulator CoppeliaSim (V-REP). This conception is followed in this chapter.
CoppeliaSim: The integrated development environment of the robotics simulator CoppeliaSim (https://www.coppeliarobotics.com/) is built on a distributed control architecture: each object or model can be individually controlled through an embedded script, a plugin, a ROS node, a remote API client, or a custom solution. This makes CoppeliaSim very versatile and ideal for multi-robot applications. Control programs can be written in Octave, C/C++, Python, Java, Lua, or Matlab. CoppeliaSim is used for a variety of purposes, including rapid algorithm development, simulation of factory automation, rapid prototyping and verification, robotics-related education, remote monitoring, safety double-checking, and more [41]. The first proposed exercise in this paper is to control the position of a simulated Khepera IV mobile robot in a virtual environment using RL algorithms. The OpenAI Gym library and the 3D simulation platform CoppeliaSim are used to carry out the experiments and control the robot's movement in the simulated environment. The results of the RL agents used, DDPG and DQN, are compared to those of two control algorithms, Villela and IPC. The analyses conducted under both obstacle and obstacle-free conditions demonstrate that DDPG and DQN are able to learn and exploit the best actions in the environment. This makes it possible to perform position control over various target locations and achieve the best results across a variety of metrics and records (https://github.com/Fco-Quiroga/gym-khepera-position) [42].


Fig. 23 Communication program [42]

analyses conducted under both obstacles and no obstacles conditions demonstrate that DDPG and DQN are able to learn and comprehend the best activities in the environment. This enables us to actually play out the position of control over various target locations and achieve the best results given a variety of metrics and records (https://github.com/Fco-Quiroga/exercise center Khepera position) [42]. An interface that converts the CoppeliaSim simulation into an environment that is compatible with RL agents and control algorithms was developed using the OpenAI Gym library. Gym provides a straightforward description of RL environments, formalized as Partially Observable Markov Decision Processes (POMDPs). Figure 23, which is a diagram of the communication that takes place in the environments, shows the information flow that takes place between the algorithm or agent, the Gym environment, the API, and the CoppeliaSim program. In the second exercise, CoppeliaSim (V-Rep) is used to implement a learningbased mapless motion planner. The objective situation according to the direction outline for the versatile robot is yield by this organizer, which takes constant guiding orders as information [44]. For mobile ground robots equipped with laser range sensors, the primary source of traditional motion planners is the obstacle map of the navigation environment. This requires both the environment’s obstacle map building work and the highly precise laser sensor. Using an unconventional profound support learning strategy, it is demonstrated how a mapless movement organizer can be prepared from beginning to end with virtually no physically planned highlights or previous exhibits. The trained planner can immediately put their knowledge to use, both in environments that are watched and those that are not. The tests demonstrate that the proposed mapless motion planner is able to successfully direct the no holonomic mobile robot to its intended locations. In the examples at https://paperswit hcode.com/paper/virtual-to-real-deep-reinforcement-learning, the gazebo is used.

7 Discussion
The introduction of deep reinforcement learning as a general learning framework for AI leads to application-independent and platform-independent systems.


The ultimate goal in AI and robotics is currently to reach human-level control. Robotics and AI are among the most complicated engineering sciences and are highly multidisciplinary, supported by a substantial amount of basic knowledge from other sciences. The theoretical origins of autonomous mobile robots start in control theory and reach today's machine learning applications. The requirements of mobile robot navigation are perception, localization, path planning, and motion control. In the case of path planning, for mapless robot navigation it may be a fuzzy requirement, as represented in Fig. 4. Continuing with the background, the reader is introduced to the foundations of DRL, with the concepts and functional structures of the machine learning algorithms that contribute to the general DRL conception: RL, CNN, LSTM, the value-based DQN, DDQN, and D3QN, and the policy-based DDPG, A3C, and PPO, all of them addressing the requirements of dynamic behavior, safety, and uncertainty. The Methods section is more theoretical and embraces the explanation, structure, and pseudo-code of algorithms that permit the programming implementation of the more common and promising methods for autonomous mobile robotics, including simple, fusion, hybrid, hierarchical, and multi-agent algorithms, allowing the researcher to visualize the options and possibilities for developing autonomous mobile robots using DRL. The two most important problems with DRL systems for autonomous navigation are data inefficiency and lack of generalization to new goals; the methodology and benchmarking tools described in Sect. 5 support the minimization of these problems in the development effort. The emphasis is on the use of simulation tools. Gazebo is the simulation tool most widely adopted in the deep reinforcement learning world for AMR development. Another important topic shown above is the key role of teaching robotics and machine learning and the importance of simulation for better comprehension; for teaching, the text has proposed the use of CoppeliaSim. Continuous theoretical development is expected in this field, particularly in the fusion, hybrid, and hierarchical implementations. With all this knowledge, it is possible for researchers to find a way to develop autonomous mobile robots using deep reinforcement learning algorithms with better functionality for safe and efficient navigation. At the time of writing, the author has not found another single work that provides such a general characterization of the use of DRL for the development of AMRN and embraces all of these topics.

8 Conclusions

Numerous fields have benefited greatly from the contributions of autonomous robots. Since mobile robots need to navigate safely and effectively, there has been a strong demand for innovative algorithms; contradictorily, however, the number of algorithms that can navigate and control robots in dynamic environments is limited, even though the majority of autonomous robot applications take place in dynamic environments. With the development of machine learning algorithms, in particular reinforcement learning and deep learning, and with the creation of deep reinforcement learning algorithms, a wide field of applications has opened at a new level of autonomous mobile robot navigation techniques in dynamic environments with safety and uncertainty considerations. However, this topic is moving very quickly, and for better development it is necessary to establish a methodological conception that, first, selects and characterizes the fundamental deep reinforcement learning algorithms at the code level, makes a qualitative comparison of the most recent autonomous mobile robot navigation techniques for control in dynamic environments with safety and uncertainty considerations, and highlights the most complex and promising techniques, such as the fusion, hybrid, and hierarchical frameworks. Second, it should include the design methodology and establish the different benchmarking techniques for selecting the better algorithm according to the specific environment. Finally, but not least, it should recommend, based on experience, the most suitable tool and examples for teaching autonomous robot navigation using deep reinforcement learning algorithms. Regarding the future perspective of this work, it is necessary to continue developing the methodology and to write a more homogeneous and practical document that permits the inclusion of newly developed algorithms and a better comprehension of the exposed methodology. It is hoped that this methodology helps students and researchers in their work.

Abbreviations

AC         Actor-Critic Method
AI         Artificial Intelligence
AMRN       Autonomous Mobile Robot Navigation
ANN        Artificial Neural Networks
AR         Autonomous Robot
BDL        Bayesian Deep Learning
CNN        Convolutional Neural Networks
CMAD-DDQN  Communication-Enabled Multi-agent Decentralized DDQN
DRL        Deep Reinforcement Learning
DNN        Deep Neural Networks
DNFS       Deep Neuro-Fuzzy Systems
DL         Deep Learning
DQN        Deep Q-Network
DDQN       Double DQN
D3QN       Dueling Double Deep Q-Network
DDPG       Deep Deterministic Policy Gradient
DDP        Deep Deterministic Policy
DQL        Deep Q-Learning
DART       Dynamic Animation and Robotics Toolkit
FC         Fully Connected
FL         Fuzzy Logic Control
GNC        Guidance, Navigation and Control
LSTM       Long Short-Term Memory
MNR        Mobile Robot Navigation
ML         Machine Learning
MR         Mobile Robot
MDP        Markov Decision Process
MARL       Multi-Agent Reinforcement Learning
MADRL      Multi-Robot Deep Reinforcement Learning
MSE        Mean Square Error
NN         Neural Networks
ODE        Open Dynamics Engine
OSRF       Open Source Robotics Foundation
POMDPs     Partially Observable Markov Decision Processes
PPO        Proximal Policy Optimization
RL         Reinforcement Learning
RL-AKF     Adaptive Kalman Filter Navigation Algorithm Based on Reinforcement Learning
RNN        Recurrent Neural Network
ROS        Robot Operating System
RSSM       Recurrent State-Space Model
SAC        Soft Actor-Critic
SDF        Simulation Description Format
URDF       Unified Robotic Description Format

References 1. Dargazany DRL (2021). Deep Reinforcement Learning for Intelligent Robot Control–Concept, Literature, and Future (Vvol. 13806v1, no. 2105, p. 16). 2. Abbeel, P. (2016). Deep learning for robotics. In DL-workshop-RS. 3. Balhara, S. (2022). A survey on deep reinforcement learning architectures, applications and emerging trends. IET Communications, 16. 4. Hodge, V. J. (2020). Deep reinforcement learning for drone navigation using sensor data. Neural Computing and Applications, 20. 5. Kondratenko, Y., Atamanyuk, I., Sidenko, Machine learning techniques for increasing efficiency of the robot’s sensor and control information processing. Sensors MDPI, 22(1062), 31. 6. Gao, X. (2020). RL-AKF: An adaptive kalman filter navigation algorithm based on reinforcement learning for ground vehicles. Remote Sensing, 12(1704), 25. 7. Hewawasam, H. S. (2022). Past, present and future of path-planning algorithms for mobile robot navigation in dynamic environments. IEEE Industrial Electronics Society, 3(2022), 13.


8. Doukhi, O. (2022). Deep reinforcement learning for autonomous map-less navigation of a flying robot. IEEE Access, 13. 9. Xiao, X. (2022). Motion planning and control for mobile robot navigation using machine learning: A survey. Autonomous Robots, 29. 10. Kober, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, no. Res.0278364913495721. 11. Plasencia, A. (2013). Simulación de la navegación de los robots móviles mediante algoritmos de aprendizaje por refuerzo para fines docentes. In TCA-2013, La Habana. 12. H. B. (2005). Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance. In: Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou. 13. H. M. (2008). Simulation of the navigation of a mobile robot by the Q Learning using artificial neuron networks. In University Hadj Lakhdar, Batna, Algeria. 14. Bengio, Y. (2009). Learning deep architectures for AI. in Now Publishers Inc. 15. Zhu, K. (2021). Deep reinforcement learning based mobile robot navigation: A review. Tsinghua Science and Technology, 26(5), 18. 16. Melcher, K., Silipo, R. (2020). Codeless deep learning with KNIME. Packt Publishing. 17. Plasencia, A.: Autonomous robotics safety. in X Taller Internacional De Cibernética Aplicada, La Habana. 18. González-Rodríguez, L. (2021). Uncertainty-Aware autonomous mobile robot navigation with deep reinforcement learning. In: Deep learning for unmanned systems, Switzerland AG (pp. 225–257). Springer. 19. Plasencia, A. (2021). Managing deep learning uncertainty for unmanned systems. In Deep Learning for Unmanned Systems, Switzerland (pp. 184–223). Cham: Springer. 20. Lillicrap, T. P. (2016). Continuous control with deep reinforcement. In ICLR 2016, London, UK. 21. Rodrigues, M. (2021). Robot training and navigation through the deep Q-Learning algorithm. In IEEE International Conference on Consumer Electronics (ICCE). 22. Jiang, Q. (2022). Path planning method of mobile robot based on Q-learning. in AIIM-2021 Journal of Physics: Conference Series. 23. Ruan, X. (2019). Mobile robot navigation based on deep reinforcement learning. in The 31th Chinese Control and Decision Conference (2019 CCDC), Beijing. 24. Wu, P. (2022). DayDreamer: World models for physical robot learning (p. 15). arXiv:2206.14176v1 [cs.RO]. 25. Omoniwa, Communication-Enabled multi-agent decentralised deep reinforcement learning to optimise energy-efficiency in UAV-Assisted networks. In IEEE transactions on cognitive communications and networking (p. 12). 26. Guo, N. (2021). A fusion method of local path planing for mobile robots based on LSTM neural network and reinforcement learning. Mathematical Problems in Engineering Hindawi, 2021, no. id 5524232, p. 21, 2021. 27. Talpur, N. (2022). Deep Neuro-Fuzzy System application trends, challenges, and future perspectives: a systematic survey. Artificial Intelligence Review, 49. 28. Zhao, K. (2022). Hybrid navigation method for multiple robots facing dynamic obstacles. Tsinghua Science and Technology, 27(6), 8. 29. Zhu, W. (2022). A hierarchical deep reinforcement learning framework with high efficiency and generalization for fast and safe navigation. IEEE Transactions on Industrial Electronics, 10. 30. Hillebrand, M. (2020). A design methodology for deep reinforcement learning in autonomous systems. Procedia Manufacturing, 52, 266–271. 31. François-Lavet, V. (2018). An introduction to deep reinforcement learning. 
Foundations and Trends in Machine Learning, 11(3–4), 140. arXiv:1811.12560v2 [cs.LG]. 32. Araujo, H. (2022). Testing, validation, and verification of robotic and autonomous systems: A systematic review. Association for Computing Machinery ACM, 62.


33. La, W. G. (2022). DeepSim: A reinforcement learning environment build toolkit for ROS and Gazebo (p. 10). arXiv:2205.08034v1 [cs.LG]. 34. Yue, P. (2019). Experimental research on deep reinforcement learning in autonomous navigation of mobile robot (2019) 35. Tian, Z. (2022). Reinforcement Learning for Self-exploration in Narrow Spaces (Vol. 17, p. 7). arXiv:2209.08349v1 [cs.RO]. 36. Xu, Z. Benchmarking reinforcement learning techniques for autonomous navigation. 37. Chen, J. (2022). MultiRoboLearn: An open-source Framework for Multi-robot Deep Reinforcement Learning (p. 7). arXiv:2209.13760v1 [cs.RO]. 38. Dietz, G. (2022). ARtonomous: Introducing middle school students to reinforcement learning through virtual robotics. In IDC ’22: Interaction Design and Children. 39. Yang, T., Zuo (2022). Target-Oriented teaching path planning with deep reinforcement learning for cloud computing-assisted instructions. Applied Sciences, 12(9376), 18. 40. Armando Plasencia, Y. S. (2019). Open source robotic simulators platforms for teaching deep reinforcement learning algorithms. Procedia Computer Science, 150, 9. 41. Coppelia robotics. Retrieved October 10, 2022, from https://www.coppeliarobotics.com/. 42. Quiroga, F. (2022). Position control of a mobile robot through deep reinforcement learning. Applied Sciences, 12(7194), 17. 43. Zeng, T. (2018). Learning continuous control through proximal policy optimization for mobile robot navigation. In: 2018 International Conference on Future Technology and Disruptive Innovation, Hangzhou, China. 44. Tai, L. (2017). Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation. In IROS 2017, Hong Kong.

Event Vision for Autonomous Off-Road Navigation Hamad AlRemeithi, Fakhreddine Zayer, Jorge Dias, and Majid Khonji

Abstract Robotic automation has long been employed to optimize tasks that are deemed repetitive or hazardous for humans. One instance of such an application is transportation, be it in urban environments or in harsher settings. In such scenarios, the platform's operator is required to maintain a heightened level of awareness at all times to ensure the safety of the on-board materials being transported. Additionally, during longer journeys the driver may also be required to traverse difficult terrain under extreme conditions, for instance on low-light, fog-, or haze-ridden paths. To counter this issue, recent studies have shown that the assistance of smart systems is necessary to minimize the risk involved. In order to develop such systems, this chapter discusses the concept of a Deep Learning (DL) based Vision Navigation (VN) approach capable of terrain analysis and of determining an appropriate steering angle within a margin of safety. Within the framework of Neuromorphic Vision (NV) and Event Cameras (EC), the proposed concept tackles several issues in the development of autonomous systems, in particular the use of a Transformer-based backbone for off-road depth estimation with an event camera, for better accuracy and processing time. The implementation of the above-mentioned deep learning system using an event camera is enabled by the necessary data processing of the events prior to the training phase. In addition, binary convolutions and, alternately, spiking convolution paradigms

H. AlRemeithi Tawazun Technology and Innovation, Abu Dhabi, United Arab Emirates e-mail: [email protected] H. AlRemeithi · F. Zayer (B) · J. Dias · M. Khonji Khalifa University, Abu Dhabi, United Arab Emirates e-mail: [email protected] J. Dias e-mail: [email protected] M. Khonji e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_8


based on the latest technology trends have been deployed as acceleration methods, offering efficiency in terms of energy and latency as well as environmental robustness. Initial results hold promising potential for the future development of real-time projects with event cameras. Keywords Autonomous robotics · Off-road navigation · Event camera · Neuromorphic sensing · Robotic vision · Deep learning systems

1 Introduction

Despite the advancements in autonomous driving algorithms, there still exists much room for development within the realm of off-road systems. Current state-of-the-art techniques for self-driving platforms have matured in the context of inter-city travel [1, 2] and thus neglect the challenges faced when navigating environments such as deserts. Uneven terrain and the lack of relevant landmarks and/or significant features pose serious challenges when the platform attempts to localize itself or analyze the terrain to determine suitable navigation routes [3]. In desert navigation, even for skilled drivers, maneuvering is a complex task that requires efficient decision-making. It has been shown that self-driving platforms are capable of meeting such standards when using a combination of sensors that measure the state of the robot and its surroundings [2]. Popular modern approaches include stereo vision in addition to ranging sensors like LiDARs to map the environment [4]. Utilizing several sensors allows for redundancy and safer navigation, but at the cost of increased development complexity, system integration requirements, and financial burden [5]. Researchers have addressed this concern by developing algorithms that minimize on-board sensors, where in extreme cases a monocular vision-based approach is developed [6]. Whilst this reduces the computational resources required to run the system in real time, it is still feasible to further reduce the system complexity. In recent years, neuromorphic computing has been researched heavily to further optimize these systems, and to accommodate these novel architectures a new kind of vision sensor, the event camera, is used in place of traditional frame-based cameras. From the above, efficient off-road navigation is yet to be achieved, especially in the context of extreme environments. The study presented here discusses scenarios that may be experienced in the UAE deserts. The contribution of this chapter is a concept of an end-to-end neural network for depth and steering estimation in the desert. To the best of the authors' knowledge, this work is the first to investigate and argue the implications of utilizing a Transformer-based backbone for off-road depth estimation using an event camera. The implementation of the above-mentioned deep learning system using an event camera is enabled by the necessary data processing of the events prior to the training phase. During inference, an acceleration method, namely


Binary Convolutions, is implemented, and initial results hold promising potential for the future development of real-time projects with event cameras. The remaining sections of the chapter are organized as follows. Section 2 presents related work on off-road navigation and neuromorphic vision as well as the use of event cameras and their feasibility. Section 3 discusses event-based vision navigation. Section 4 presents the proposed end-to-end deep learning navigation model, including event processing, depth estimation, steering prediction, and the data set. Section 5 discusses the implementation of the system and its possible acceleration using the binary convolution method; in addition, energy-efficient processing using a memristive-technology-based neuromorphic infrastructure is proposed in the case study. Results and discussion are presented in Sect. 6 to show the obtained results and the main achievements. Finally, conclusions and future work are drawn in Sect. 7.

2 Related Work

2.1 Off-Road Navigation

Given a desert setting, traditional navigation techniques may not be directly, or even at all, compatible with the setting. Considerable modifications in terms of mechanical and algorithmic design are required when navigating off-road environments. In such environments, exposure to high temperatures, visual distortions due to dust, and instability when driving on uneven terrain are to be expected [7, 8]. Because of these challenges, the risk involved for a human operator increases significantly, and it is often the case that some sort of enhancement to existing systems is needed [9]. Specifically, in night-time transportation, convoys may be more inclined to halt the journey to minimize such risks, which results in profit loss for businesses or delays in critical missions [10–12]. It is crucial to devise a strategy that can be utilized around the clock and that is standalone, so as to allow flexible system integration with multiple platform types. As mentioned in the previous section, multiple paradigms are used to navigate autonomously, for example the well-established vision plus ranging sensor configuration (camera and LiDAR) [9]. Unfortunately, such a configuration is not reliable in harsh and unpredictable environments due to the degradation of LiDAR performance caused by heat and by diffraction from sand particles [13, 14]. The main challenge would be to design a stronger filter to reconstruct the LiDAR data made noisy by the hot temperatures and by the diffraction and refraction between the laser and the sand particles [15, 16]. Although one study managed to overcome this issue by employing a high-power LiDAR [17], such a solution is not preferable. It is not viable in this scenario, as adding a high-power variant will hinder performance when operating at high temperatures. Subsequently, the complexity of integration is also increased, as larger power supplies and cooling solutions must be integrated on the platform. Since


the project proposes a standalone solution, a monocular setup is preferable in order to avoid additional peripherals. Software-wise, extra sensors increase the data fusion complexity because of the uneven terrain [18–20]. This research addresses the challenges faced in a dynamic desert environment when utilizing a purely vision-based strategy by introducing event cameras to the system architecture [21]. Using event cameras allows us to reduce the computational requirements of the system and speed up processing [22]. By nature of their design, these cameras record changes in light intensity asynchronously within a scene rather than synchronously recording the current intensity values of each pixel within a frame [23, 24]. There are two main advantages to this feature. One is the reduced power consumption, which yields a better thermal profile in hot environments and in turn implies better long-term system performance. More importantly, since only changes in light intensity are recorded, i.e. only moving objects are seen and static information is inherently discarded, in practice this is reflected as a reduction in system latency; the internal latency is expected to be almost one million times lower than that of traditional cameras [23, 25, 26]. To develop the desired system a certain due diligence is required, and hence this section is dedicated to a literature review of state-of-the-art techniques in autonomous driving, specifically for off-road settings. The rest of the chapter discusses the following topics: a modest introduction to event-based vision sensors, algorithmic compatibility of event streams with traditional computing paradigms, possible improvements for high-speed vision in robotics, state-of-the-art depth estimation techniques, and finally off-road steering.

2.2 Neuromorphic Vision

Advances in neuromorphic computing over the last couple of decades have manifested in the form of event cameras. Event cameras are bio-inspired vision sensors whose purpose is to tackle the increased demands of dynamic environments in robotic applications [27–29]. These sensors capture the change in light intensity per pixel and record it asynchronously rather than continuously recording frames. The operational concept is illustrated in Fig. 1, with such a neuromorphic sensor presented in Fig. 2. The frames display the rotating disk, while events are shown in red and blue; positive and negative events highlight the change in position. Event cameras have also been demonstrated to be robust against environmental luminosity variations thanks to a high dynamic range (HDR) of approximately 140 dB, compared to 60 dB in traditional setups [24, 29, 31] (see Fig. 3). Additionally, this implies better feature detection, since robustness against luminosity changes also indicates robustness against motion blur. Low-light performance is also superior to traditional frame-based cameras, as the HDR allows for more reception from moonlight and daylight. This is evident in the studies [32–34], where an event camera was used for high-speed (10 µs) feature reconstruction under varying light.


Fig. 1 Operation of frame camera versus event camera [30]

Fig. 2 Simplified Neuromorphic sensor architecture [22]

As mentioned previously, since only non-static artifacts are recorded, the discarded pixels indirectly conserve power during operation. This is seen in the power consumption of a regular event camera being approximately 100–140 mW in an active state, four orders of magnitude less than a frame-based camera [35]. In summary, the advantages are, as listed by [22]:
• Minimal internal latency and high temporal resolution
• High dynamic range analog sensor
• Reduced power consumption and heat dissipation.


Fig. 3 Frame reconstruction under different illuminations [33]

2.3 Key Features of Event Cameras

Event cameras output asynchronous sparse data according to log-intensity differences through time [26]. Expanding on the advantages listed earlier in the chapter, this section elaborates further. The millisecond-scale latency and high dynamic range are a benefit for robotics applications, especially in navigation. The readout data available from the sensor is provided in Address-Event Representation, first introduced in [36]. Since the data is asynchronous, unlike frames, it must first be interpreted in terms of timestamps in order to correlate each event with a pixel counterpart from frames, or with another event from the same stream, for further processing. Techniques are discussed further below.

2.4 Data Registration

Event processing is capable of providing frame-based vision performance with the advantages mentioned in the prior section. Natively, events are not compatible with modern techniques, but by adding the necessary pre-processing steps it becomes possible to leverage existing state-of-the-art computer vision algorithms, such as depth estimation [37]. In order to fully utilize these techniques, event streams


must first be interpreted as frames. Firstly, it must be noted that an event packet from a neuromorphic camera contains the following data: e(p, x, y, ts). This form of data is referred to as Address-Event Representation (AER) [25, 26, 38]. In the AER packet, p is the polarity of the event, which implies direction and shows the old and new position, as seen in Fig. 1. The pixel position is represented by the pair (x, y), and ts indicates the timestamp at which the event was recorded, i.e. when an external trigger was visually detected by the silicon retina. The required pre-processing of AER streams consists of accumulating packets and overlaying them after passing through an integration block [27], as represented visually in Fig. 4; a minimal code sketch of this accumulation is given below. When raw frames are also provided, image enhancement is achieved thanks to the characteristics inherited from the event camera. One drawback of the approach in Fig. 4 is the saturation of the frame over time when no motion is present. This problem is expected when no raw frames are provided and the generated frames are inferred from events only. On a continuously moving platform this might not pose a serious issue, but in surveillance applications the added redundancy of having a dedicated frame sensor can be beneficial. Another study [39] demonstrated this concept by using an event camera unit that houses both a frame and an event sensor (Fig. 5).

Fig. 4 Event-To-Frame block diagram

Fig. 5 Event-To-Frame results comparison [27]
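The following is a minimal sketch of the event-to-frame accumulation outlined above: AER packets e(p, x, y, ts) that fall inside a fixed time window are integrated into a single 2D frame. The window length and the normalisation to [0, 1] are our own assumptions for illustration, not the exact integration block of [27].

```python
import numpy as np

def accumulate_events(events, height, width, t_start, window_us=10_000):
    """Integrate AER packets into one frame.

    events: iterable of (x, y, ts_us, polarity) tuples, polarity in {-1, +1}.
    """
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, ts, p in events:
        if t_start <= ts < t_start + window_us:
            frame[y, x] += p              # signed accumulation: ON events add, OFF events subtract
    peak = np.abs(frame).max()
    if peak > 0:
        frame = 0.5 + 0.5 * frame / peak  # map signed counts to [0, 1], 0.5 = no activity
    else:
        frame[:] = 0.5
    return frame
```

The resulting pseudo-frame can either be used on its own or overlaid on a raw frame when a combined frame/event sensor is available, as in the study cited above.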


2.5 Feature Detection

Feature engineering has been a well-established topic of research for data gathered using the traditional frame-based camera. Using a similar approach, object recognition on event data can be accomplished through feature detection followed by classification. For event data there are a few feature detection techniques; corner features, edges, and lines are the most commonly used. This section describes studies that detect corners from event data but were not further investigated for use in object recognition. However, characteristics around detected corners could be retrieved and fed into a classifier for further classification. The case for using an event-based camera may be argued by addressing the issues commonly faced when operating a regular camera. In high-speed vision, it is seldom the case that ideal lighting and the absence of motion blur coexist. As a result, degraded features are registered onto the receiving sensor, which in turn degrades whatever strategy is being employed. Reconstruction techniques have been developed, as in [8, 40, 41], to de-blur images and generate high-frame-rate video sequences. The method discussed in [40] presents an efficient optimization process which rivals the state-of-the-art in terms of high-speed reconstruction under varying lighting conditions and dynamic scenes. The approach is described as an Event-based Double Integral model; the mathematical formulation and derivation are left to the original material, and the results are shown in Fig. 6. It was also noted in [8] that reconstructing a frame-based video output purely from events is not feasible if the objective is to achieve the same temporal resolution. This factor restricts the output video to be only

Fig. 6 De-blurring results comparison [40]


Fig. 7 Event-only feature tracking

as fast as the physical limitations imposed by the frame-based counterpart allow. This conclusion has stood for some time, although other published research has challenged the perspective and suggests doing away with frame-based cameras completely in such approaches. For instance, in [42] the feature tracking technique defined was independent of actual frames and relied instead on logarithmic intensities reconstructed from events, as depicted in Fig. 7. The authors also argue that this approach retains the high-dynamic-range aspect and is preferable to using the original frames. The tested scenario involved uneven lighting settings or objects with perceived motion blur, where the frames reconstructed from events were resistant to these degradation factors, unlike the original frames.

2.6 Algorithmic Compatibility

Since event cameras are a recent commercial technology, researchers often investigate the applicability of legacy computer vision techniques and their effectiveness with neuromorphic sensors. Firstly, it is crucial to have a distinct pipeline to calibrate a vision sensor prior to any algorithm deployment. One method, shown in [43], tackles this by flashing patterns at specific intervals, which defines sharp features within the video stream. Although the screen is not moving, the flashing means that the intensity value at each position changes with time, which emulates the concept of motion within a frame; as such, calibration patterns are recorded. Moreover, other techniques such as [44] have surfaced which use a deep learning model to achieve a generic event camera calibration system. The paper shows that neural-network-based image reconstruction is ideally suited for the task of intrinsic and extrinsic calibration of event cameras, rather than depending on blinking patterns or external screens as in [43]. The benefit of the suggested method is that it allows conventional calibration patterns, which do not require active lighting, to be employed. Furthermore, the technique enables extrinsic calibration between frame-based and event-based sensors without adding complexity. Both simulation and real-world investigations from the paper show that image-reconstruction-based calibration is accurate under typical distortion models and a wide range of distortion factors (Fig. 8).


Fig. 8 Learning-based calibration pipeline

3 Event-Based Vision Navigation

In this section, relevant techniques used for autonomous navigation are discussed. Challenges related to these techniques are also addressed with respect to the case scenario presented by desert environments. Currently, state-of-the-art approaches utilize a combination of stereoscopic vision setups, LiDAR, and single-view cameras around the vehicle for enhanced situational awareness (as seen in Tesla vehicles). For this study, however, the research has been limited to front-view sensing for driving assistance or full autonomy. As a result, the discussed methods include the well-established stereo-vision configurations, with the optional addition of a LiDAR, or, in extreme cases, a monocular setup. Additionally, there is not much reliable research on steering from event data, apart from [45, 46]. The results presented in these studies mainly address a driving scenario similar to inter-city navigation, which may not transfer well to an off-road environment, and so further investigations such as this chapter must be conducted. Researchers in [47] have also proposed an event-frame driving dataset for end-to-end neural networks. This work shall also be extended to accommodate other environments such as those in the Middle East, specifically the United Arab Emirates (Fig. 9).


Fig. 9 Driving sample for day/night on-road navigation [47]

3.1 Vision and Ranging Sensor Fusion

Different feature extraction methods that handle 3D space generation from 3D representations are discussed in the literature [48]. The algorithms mentioned opt for LiDAR-camera data fusion to generate dense point clouds for geometric and depth-completion purposes. The images are first processed from the camera unit, and the LiDAR records the distances to the surrounding obstacles. Algorithms such as RANSAC are also employed during the correlation phase to determine the geometric information of the detected targets. Machine-learning-based approaches are also gaining popularity, as they can optimize the disparity map when using a stereo configuration alongside a LiDAR, which reduces development complexity when designing an algorithm and running it in real time [1, 48, 49]. The purpose of this study is to investigate reliable techniques and address viable real-time options for off-road navigation; as such, the previous factors are taken into consideration. Environmental challenges are also addressed in [15–17], which highlight the issues of using a ranging sensor, a LiDAR, in adverse environments, where it may yield sub-optimal data readings. Because the LiDAR is essentially a laser, performance degradation is expected when operating in a harsh off-road setting such as those seen in the Arab regions. A study discussing an attenuation profile similar to that of the sand particles seen in the desert is reported next (Fig. 10).

3.2 Stereo Event Vision

An alternative to using a LiDAR and camera combination is to have a stereo setup instead. It is well established that depth may be determined by generating a disparity map between two images of the same scene observed from different perspectives. Given the homography matrix and the intrinsic camera parameters, depth can be estimated using regular cameras. The same can be achieved by using event cameras


Fig. 10 Laser attenuation during Sandstorms [13]

Fig. 11 Event-based stereo configuration [50]

in a stereo configuration. Like its frame-based counterpart, this configuration is applicable to depth estimation and visual odometry (extended to SLAM), and, because of the event stream, high-speed semi-dense reconstruction is possible [50]. For the purposes of off-road navigation the disparity maps can be used for slope detection of incoming dunes; the detailed strategy of the proposed method is discussed in the following sections (Fig. 11). The underlying disparity-to-depth relation is recalled below.
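For a rectified stereo pair (whether frame-based or built from event reconstructions), the standard relation assumed here links depth to disparity; f is the focal length in pixels, B the baseline between the two cameras, and d the disparity of a matched pixel or event pair:

```latex
d = x_{\mathrm{left}} - x_{\mathrm{right}}, \qquad
Z = \frac{f\,B}{d}
```

The inverse relationship explains why stereo depth is most accurate for nearby structures (large disparity) and degrades on distant, low-texture dune fields where the disparity approaches zero.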

3.3 Monocular Depth Estimation from Events

The data fusion difficulties are apparent, as there has been a shift in the research of leading institutes, such as DARPA [51] and NASA [52], toward purely vision-based strategies. Avoiding such complexities allows for simpler hardware integration and simpler software design, with improved real-time performance as well. Also, since monocular setups require less setup and calibration than stereo configurations, it is possible to capitalize further on these optimizations. The reliability of monocular


Fig. 12 Traversability analysis during DARPA challenge [51]

Fig. 13 Event mono-depth network overview [53]

vision with regular frame cameras was tested early on in the literature to push the limits of autonomous systems in the context of off-road terrain analysis [51]. The terrain traversability tests were conducted in the DARPA challenge, involving an extensive 132-mile test over 7 h in off-road environments (Fig. 12). Researchers have also proposed a deep neural network capable of combining both modalities and achieving better results than the state-of-the-art (∼30%) for monocular depth estimation when tested on the KITTI dataset. The method developed in [39, 53] presents a network architecture in which recurrent neural networks generate voxel grids that yield logarithmic depth estimates of the terrain. The model may be reproduced in future work by implementing an encoder-decoder-based model with residual networks as a backbone, or Vision Transformers may also be investigated given their recently proven reliability [54]. From [53], we see that although contrast information about the scene is not provided, meaningful representations of events are recorded, which is evident in the depth estimation shown in the following figures (Figs. 13 and 14).


Fig. 14 Qualitative analysis of night-time depth estimation (On-road)

4 Proposed End-to-End Navigation Model

From the previous findings and the discussed literature, we can theorize a possible solution to address the gaps. The main contribution is a realizable event-based deep learning neural network for autonomous off-road navigation. The system can be designed using well-established frameworks in the community, such as PyTorch, to ensure stability during development. Moreover, the hardware implementation may be done on a CUDA-enabled single-board computer. This is done to further enhance the real-time performance through the on-board GPU parallelism capabilities [55]. The main task is to optimize the steering angle of a mobile platform during off-road navigation with the assistance of a vision sensor. In principle, the strategy shall incorporate depth/height estimation of the terrain and possible obstacles using event and frame fusion. The system shall contain a pre-processor block to filter noise and enhance the fused data before injecting them into the deep learning pipeline. In addition, as seen in [39, 53], encoder-decoder-based architectures are employed for a meaningful latent-space representation. Some examples that reduce unwanted artifacts and recreate the scene from event-frame fusion are variational auto-encoders, adversarial auto-encoders, and U-Net architectures. By including relevant loss functions and optimizers to generate viable depth maps, the yielded results are then taken to the


Fig. 15 Proposed end-to-end deep neural network

steering optimizer block. At this stage, the depth maps are analyzed using modern computer vision techniques or the steering is inferred from a neural network, as shown in [45]. The two main branches in the model, adapted from the aforementioned literature, are the depth estimation branch and the steering branch (Fig. 15). The end-to-end model is derived from two independent models developed by the Department of Informatics at the University of Zurich. The main contribution of our concept is to combine these federated systems into one trainable model and to adjust the asynchronous encoder-decoder-based embedding backbone into a more computationally efficient, lightweight variant suitable for deployment on restricted hardware; further details are discussed in Sect. 6 of the chapter.

4.1 Event Preprocessing

Due to the design of the sensors, they are particularly susceptible to background-activity noise caused by cyclic noise and circuit leakage currents. Since background activity rises when there is less light or when the sensitivity is increased, a filter is frequently required for a variety of applications. A noise filter can also be helpful in certain situations for eliminating real events caused by slight changes in the light, maintaining a clearer delineation between the mobile target and the surroundings. For this study, DV, the neuromorphic camera tool from the Swiss-based startup iniVation, is used for prototyping and for the selection of the denoising algorithm. From the literature, two prevalent noise removal techniques are used, coined knoise [56] and ynoise [57]. The former algorithm is reported to have background-removal capability with O(N) memory complexity. This method is preferred for memory-sensitive tasks where near-sensor implementations and harsh energy and memory constraints are imposed. The method stores recovered events from the stream as long as they are unique per row and column within a specific time window. Doing so minimizes the memory utilization of the on-board processor, and the reported error


Fig. 16 Qualitative comparison of noise removal algorithms

rates were orders of magnitude better than previous spatiotemporal filter designs. The main use case for such a filter would be mobile platforms with limited in-memory computing resources, such as off-road navigation platforms. The latter, ynoise, presents a two-stage filtering solution; the method discards background activity based on the duration of events within a spatiotemporal window around a hot pixel. Results of knoise, ynoise, and a generic noise removal algorithm from iniVation are shown in Fig. 16, and a simplified filter sketch is given below.
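The sketch below is a deliberately simplified spatiotemporal background-activity filter, only in the spirit of the filters cited above; the 8-neighbourhood support test and the 5 ms time threshold are our own assumptions, not the published knoise/ynoise designs [56, 57].

```python
import numpy as np

class BackgroundActivityFilter:
    """Keep an event only if a neighbouring pixel fired recently."""

    def __init__(self, height, width, dt_us=5_000):
        self.last_ts = np.full((height, width), -np.inf)   # last event time per pixel (microseconds)
        self.dt_us = dt_us

    def accept(self, x, y, ts):
        h, w = self.last_ts.shape
        y0, y1 = max(0, y - 1), min(h, y + 2)
        x0, x1 = max(0, x - 1), min(w, x + 2)
        supported = np.any(ts - self.last_ts[y0:y1, x0:x1] <= self.dt_us)
        self.last_ts[y, x] = ts                            # record this event regardless of the decision
        return bool(supported)
```

Isolated events with no recent spatiotemporal support are rejected, which is the behaviour wanted for leakage-induced background activity; correlated events produced by a moving target pass through.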

4.2 Depth Estimation Branch

The depth estimation branch is adopted from [39], where the network architecture is RAMNet. RAMNet is a Recurrent Asynchronous Multimodal Neural Network, a generalized variant of RNNs that can handle asynchronous data streams through sensor-specific learnable encoding parameters [39]. The architecture is a fully convolutional encoder-decoder based on U-Net. The architecture of the depth estimation branch is shown in Fig. 17, and a simplified stand-in for it is sketched after the figure.


Fig. 17 Depth estimation branch
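The following is a deliberately simplified stand-in for the recurrent encoder-decoder branch of Fig. 17, useful only to make the data flow concrete. The channel counts, the single recurrent bottleneck, the 5-bin event voxel input, and the log-depth output are our own assumptions, not the original RAMNet of [39].

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyRecurrentDepthNet(nn.Module):
    def __init__(self, in_ch=5):                      # e.g. a 5-bin event voxel grid
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)              # full-resolution features (skip connection)
        self.pool = nn.MaxPool2d(2)
        self.enc2 = conv_block(32, 64)                 # half-resolution bottleneck
        self.state_mix = nn.Conv2d(64 + 64, 64, 1)     # minimal recurrence: mix in the previous hidden state
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(64 + 32, 32)
        self.head = nn.Conv2d(32, 1, 3, padding=1)     # predicts a log-depth map

    def forward(self, voxel, hidden=None):
        s1 = self.enc1(voxel)
        s2 = self.enc2(self.pool(s1))
        if hidden is None:
            hidden = torch.zeros_like(s2)
        s2 = torch.relu(self.state_mix(torch.cat([s2, hidden], dim=1)))
        x = self.dec1(torch.cat([self.up(s2), s1], dim=1))
        return self.head(x), s2                        # log-depth and the new hidden state
```

Passing the returned hidden state back in at the next time step gives the branch its (much simplified) recurrent, asynchronous flavour: event voxels and raw frames can be fed whenever each sensor produces them.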

4.3 Steering Prediction Branch It was demonstrated in [45] how a deep learning model can take advantage of the event stream from a moving platform to determine the steering angles. The steering branch for this study is based on the previous study, where subtle motion cues during desert navigation will be fed into the model to learn the necessary steering behaviour in harsh terrain. For simplicity, this steering algorithm will not be adjusted to take into consideration slip, but rather only the planar geometry of the terrain to avoid collision with hills and overturning the vehicle. To implement the discussed approach from the reference material, the events are dispatched into an accumulator first to obtain the frames which will be used in a regression task to determine an appropriate angular motion. As a backbone, ResNet models are to be investigated as a baseline for the full model (Fig. 18).


Fig. 18 Steering regression branch
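A minimal sketch of the steering branch described above is given next: accumulated event frames pass through a ResNet backbone and a small head regresses a single normalised steering angle. The use of torchvision's resnet18, the 3-channel input, and the tanh-bounded output are our own illustrative choices, not the exact network of [45].

```python
import torch
import torch.nn as nn
from torchvision import models

class SteeringRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # baseline backbone, trained from scratch
        backbone.fc = nn.Identity()                # keep the 512-d global feature vector
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 1), nn.Tanh())          # steering angle normalised to [-1, 1]

    def forward(self, event_frame):                # (B, 3, H, W) accumulated event frames
        return self.head(self.backbone(event_frame))

# Typical training step: mean-squared-error regression against the recorded angle, e.g.
# loss = torch.nn.functional.mse_loss(model(batch_frames), batch_angles)
```

Because the accumulated event frames mainly contain motion edges, the regression effectively learns the mapping from apparent terrain geometry to the angular command, which is the behaviour this branch is meant to capture.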

Fig. 19 Sample structure—MADMAX dataset

4.4 Desert Driving Dataset To the best of the author’s knowledge, there is yet to be a purely ground-based monocular vision driving dataset for desert navigation in day/night-time. One of the contributions of this study is to produce a UAE specific Driving Dataset. Another off-road driving dataset which was generated from the Moroccan Deserts is used as reference throughout the investigation. Original details pertaining to the dataset organization is in the paper “The MADMAX data set for visual-inertial rover navigation on Mars” [58] (Fig. 19).


Fig. 20 NVIDIA xavier AGX peripherals

5 System Implementation

The proposed embedded system for processing and real-time deployment is the NVIDIA Xavier AGX. According to the official datasheet, the developer kit consists of a 512-core Volta GPU with Tensor Cores, 32 GB of memory, and an 8-core ARM v8.2 64-bit CPU running a Linux-based distribution. NVIDIA Volta also allows for various data operations, which gives flexibility when reducing convolution operations [55, 59]. Moreover, it has a moderate power requirement of 30−65 W while delivering desktop-grade performance in a small form factor. Consequently, the NVIDIA Xavier is a suitable candidate for the deployment of a standalone real-time system (Fig. 20). As a more extreme approach, it is also possible to fully implement the proposed network on an FPGA board, as demonstrated by [60–62]. Since the design of an FPGA framework is not within the scope of the project, as a proof of concept to motivate further research this chapter details the implementation of convolution optimization techniques in simple networks on the PYNQ-Z1. A key feature is that it allows for ease of implementation and rapid prototyping thanks to its Python-enabled development environment [60]. System features are highlighted in the next figure (Fig. 21).


Fig. 21 PYNQ-Z1 key technologies and environment breakdown

5.1 Deep Learning Acceleration

When developing an autonomous driving platform, trade-offs between performance and system complexity are often taken into consideration [9]. Efforts to reduce these challenges are seen in the systems discussed in the previous sections transitioning towards a single-sensor approach. The solutions are bio-inspired and purely linked to vision and neuromorphic computing paradigms, as they tend to offer a significant performance boost [22, 63, 64]. Considerable studies have been published in the deep learning field aiming at less computationally expensive inference setups, mainly through CUDA optimizations [55, 59] and FPGA hardware acceleration [60]. The optimizations mentioned in this previous work take advantage of the hardware primarily used in autonomous system deployment, NVIDIA development boards. The enhancements are achieved by improving the pipeline through CUDA programming and minimizing the operations needed to perform a convolution. Of the discussed methods, one achieved performance comparable to regular convolution by converting floating-point operations into binary operations, while the other reduces the non-trivial elements by averaging the pixels around a point of interest and discarding the surrounding neighbours [59], coined Perforated Convolution (Fig. 22). A sketch of a binary (LBC-style) convolution block follows Fig. 22.


Fig. 22 Local Binary Convolution Layer (LBC) [55]
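The following is a sketch of a Local Binary Convolution block in the spirit of Fig. 22: a fixed, sparse, binary ({-1, 0, +1}) convolution whose weights are never trained, followed by a learnable 1x1 convolution. The sparsity level, intermediate channel count, and ReLU nonlinearity are our own assumptions for illustration, not the exact layer of [55].

```python
import torch
import torch.nn as nn

class LocalBinaryConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, intermediate_ch=32, kernel_size=3, sparsity=0.5):
        super().__init__()
        self.fixed = nn.Conv2d(in_ch, intermediate_ch, kernel_size,
                               padding=kernel_size // 2, bias=False)
        with torch.no_grad():
            w = torch.zeros_like(self.fixed.weight)
            keep = torch.rand_like(w) > sparsity                       # which entries are non-zero
            signs = torch.where(torch.rand_like(w) > 0.5,
                                torch.ones_like(w), -torch.ones_like(w))
            w[keep] = signs[keep]                                      # sparse +/-1 anchor weights
            self.fixed.weight.copy_(w)
        self.fixed.weight.requires_grad_(False)                        # anchor weights stay frozen
        self.act = nn.ReLU(inplace=True)
        self.pointwise = nn.Conv2d(intermediate_ch, out_ch, 1)         # the only learnable weights

    def forward(self, x):
        return self.pointwise(self.act(self.fixed(x)))
```

Because the heavy k x k convolution uses only frozen additions and subtractions, and learning is confined to the cheap 1x1 combination, the layer trades a small amount of accuracy for a large reduction in trainable parameters and arithmetic cost, which is the effect measured later in Tables 1 and 2.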

5.2 Memristive Neuromorphic Computing

When addressing the hardware implementation of deep learning systems, neuromorphic computing platforms are often mentioned. In the context of event-based vision, neuromorphic architectures have recently been favored over traditional von Neumann architectures. This is due to comparative improvements in metrics such as computational speed-up and reduced power consumption [65, 66]. Studies show that developing neuromorphic systems for mobile robotics, for example by employing Spiking Neural Networks (SNN), yields faster and less computationally intensive pipelines with reduced power consumption [67, 68]. SNNs are defined as a neuromorphic architecture that provides benefits such as improved parallelism, since neurons fire asynchronously and possibly at the same time. This is realizable because, by design, neuromorphic systems combine the functionality of processing and memory within the same subsystem, which carries out tasks through the design of the artificial network rather than through a set of algorithms as in von Neumann architectures [66]. The following figures demonstrate the main differences between the two architectures in addition to the working concept of an SNN (Fig. 23).

Fig. 23 Von Neumann and neuromorphic computing architectures


Fig. 24 SNN operation

In SNNs, in contrast to other forms of artificial neural networks such as multilayer perceptrons, the function of the network's neurons and synapses is more closely modeled after biological systems. The most important distinction between standard artificial neural networks and SNNs is that the operation of SNNs takes timing into consideration [66] (Fig. 24). Recently, implementations of neuromorphic hardware have been deployed for robotics applications, as seen in [69–71]. Applications included optical flow and the autonomous navigation of both aerial and ground vehicles. These works also demonstrated power-efficient and computationally modest approaches on neuromorphic hardware, which is relevant to the present off-road study. The previously mentioned studies have proven the concepts discussed in the earlier neuromorphic literature, and, extending on that, this off-road study aims to build on that knowledge by possibly combining SNN-like architectures in conjunction with event-based vision sensors. Furthermore, for a hardware-friendly architecture and efficient processing, the literature has also investigated memristor-based implementations for further reductions in power consumption [72]. In addition to traditional CMOS for signal control, compatible sub-blocks of nanocomposite materials known as resistive memory, also called memristive devices due to their ability to store information in a nonvolatile manner with high-density integration [73, 74], are considered as the processing unit. The latter is ideally suited for the development of in-memory computing engines for computer vision applications [75–77] and of efficient substrates for spiking neural networks. This claim is explained by the fact that crossbar-configured memristive devices can imitate synaptic efficacy and plasticity [78, 79]. Synaptic efficacy is the creation of a synaptic output depending on incoming neural activity; this may be measured using Ohm's law by reading the device's current when a read voltage signal is supplied. Synaptic plasticity is the synapse's capacity to adjust its weight during learning. By applying write pulses along the crossbar's wires, the crossbar design may execute synaptic plasticity in a parallel and efficient manner. The mentioned architecture is shown next (Fig. 25), and a toy numerical illustration of the crossbar read-out follows the figure.


Fig. 25 Memristor based architecture
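The toy example below illustrates how a memristive crossbar performs a vector-matrix product through Ohm's and Kirchhoff's laws: read voltages drive the rows, device conductances act as the stored weights, and each column wire sums the resulting currents. The idealisation (no wire resistance, noise, or device nonlinearity) and the particular numbers are our own assumptions for illustration.

```python
import numpy as np

def crossbar_vmm(voltages, conductances):
    """voltages: (n_rows,) read voltages in V; conductances: (n_rows, n_cols) in S.

    Each device contributes i = G_ij * v_i (Ohm's law) and every column wire
    sums its devices' currents (Kirchhoff's current law), so the read-out is
    exactly a vector-matrix multiplication.
    """
    return voltages @ conductances

# Example: a 3x2 crossbar mapping a 3-element input onto two 'neurons'.
v = np.array([0.2, 0.0, 0.5])                 # read voltages (V)
G = np.array([[1e-6, 5e-6],
              [2e-6, 1e-6],
              [4e-6, 3e-6]])                  # programmed conductances (S)
print(crossbar_vmm(v, G))                     # column currents (A)
```

Synaptic plasticity corresponds to reprogramming the entries of G with write pulses, while synaptic efficacy corresponds to the read-out computed above.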

6 Results and Discussion

We take the two available networks; what we propose is to change the encoder-decoder CNN backbone into a purely attention-based model, the Vision Transformer, and to use binary convolution layers for the steering regression. For the transformer, we argue that there is an apparent benefit to the real-time and hardware implementation. The advantages include reduced power consumption and processing time. This is seen through the reduced number of parameters in the Vision Transformer model, which implies less processing and, in terms of the hardware implementation of an accelerator, fewer arithmetic operations. Fewer physical computations are also preferable, as they yield lower power consumption on power-constrained platforms, such as those in a desert environment, which may have reduced computing capabilities in order to cope with extreme heat and other adverse weather conditions. Our proposed solution is to replace the architecture seen in Fig. 17 with the transformer model (Fig. 26). Studies have proven similar concepts in different domains, and what this chapter aims to contribute is a first step towards efficient means of depth estimation in desert navigation, based on the above rationale and the following results [81].


Fig. 26 Proposed event transformer model for the system’s integration [80]

Architecture            Estimate     Speed (Frames/sec) ↑   Energy (Joules/Coulomb) ↓
CNN-backbone            Depth        84.132                 3.206
CNN-backbone            Intrinsics   97.498                 2.908
Transformer-backbone    Depth        40.215                 5.999
Transformer-backbone    Intrinsics   60.190                 4.021

In terms of the hardware implementation, using a Vision Transformer improves the power consumption profile, which is inferred from the smaller number of parameters in the trainable model. During implementation this of course translates into less heat dissipation and power loss, owing to the reduced number of arithmetic operations carried out by, for example, the memristor-based architecture discussed in the previous section.

Method                  Type    Parameters   FLOPs   Top-1 Accuracy
EST                     frame   21.38M       4.28G   52.4
M-LSTM                  frame   21.43M       4.82G   63.1
MVF-Net                 frame   33.62M       5.62G   59.9
ResNet-50               frame   25.61M       3.87G   55.8
EventNet                voxel   2.81M        0.91G   17.1
RG-CNNs                 voxel   19.46M       0.79G   54.0
EV-VGCNN                voxel   0.84M        0.70G   65.1
VMV-GCN                 voxel   0.86M        1.30G   66.3
Event Transformer [80]  tensor  15.87M       0.51G   71.2

The initial results towards a computationally efficient regression scheme were obtained on the PYNQ-Z1. The implementation was conducted to optimize the matrix multiplication operations in the convolution layers for the steering estimation, by regressing an angle according to the depth map from the previous branch. The first step is

Table 1 Results of SMR with adjusted LBC block

Resource                    SMR Conv2D   BinConv2D
CPU                         ∼40%         ∼40%
RAM                         ∼35%         ∼33%
VRAM                        ∼51.25%      ∼40%
Processing time per batch   1.1477 s     0.9122 s
δT                          20.5193%

to recreate the results on GPU and then validate the approach on FPGA. The specific algorithm optimized was first discussed in [82]. The study by Hu et al. demonstrated Single-Mesh Reconstruction (SMR) to construct a 3D model from a single RGB image. The approach depends on consistency between interpolated features and learnt features through regression, given:
• a single-view RGB image
• a silhouette mask of the detected object.
The applicability of SMR is useful in the context of autonomous platforms. Potentially, the platform can establish a 3D framework of itself with respect to the detected 3D obstacles (instead of the artifacts mentioned in [82]). Evidently, this can enhance navigation strategies in the absence of additional sensors. The adjustment of the convolution layers was aligned with the methods discussed above to create an LBC layer [55]. The purpose of this experiment was to prove the reduction in processing resources required when performing convolution, whether during training or inference (Table 1). Clearly, we cannot establish a high level of confidence in this software-based acceleration technique without examining a more relevant network and, specifically, the KITTI dataset for self-driving vehicles as a standard. The network described in [83] follows an approach similar to SMR, however applied to the pose detection of vehicles. During training, the pipeline is given:
• a single-view RGB image
• the egocentric 3D vehicle pose estimation.
The network architecture is shown in Fig. 27. The 2D/3D Intermediate Representation stage is of interest to us, as the main objective is to recreate a 3D framework from 2D inputs. Instead of regular convolution, implementing the LBC block yields the results in Table 2. From the results shared above, it is seen that the approach is viable, but its potency decreases as the dataset complexity increases. In the first dataset, single artifacts were provided without any background or ambiguous features. In the KITTI dataset, however, the vehicles exist in a larger ecosystem, which may induce incorrect results and thus requires the network more time to determine clear boundaries from the background before transforming the 2D information into a 3D representation. Furthermore,


Fig. 27 EgoNet architecture: dotted outline is the required region of optimization

Table 2 Results of EgoNet with adjusted LBC block

Resource                    EgoNet Conv2D   BinConv2D
CPU                         ∼43%            ∼42%
RAM                         ∼32%            ∼30%
VRAM                        ∼65%            ∼61%
Processing time per batch   0.8554 s        0.7607 s
δT                          11.0708%

Fig. 28 PYNQ-Z1 LBC implementation results

we have successfully deployed an implementation of a binary convolution block on the PYNQ-Z1 to test the improved performance on FPGA hardware. The FPGA-specific metrics of our implementation are shown below (Fig. 28).

7 Conclusions

The implications of our experiments extend beyond the development of autonomous robotics; they also tackle another problem, namely the financial aspect of deploying such systems. Firstly, for the case study addressed, desert navigation has yet to achieve


autonomy, due to the complexities of terrain analysis, ranging from depth and steering to slip estimation. These approaches are often computationally expensive to run in real time, but with the proposed approach we believe our system to be the first attempt towards efficient real-time computing in constrained settings for off-road navigation. Secondly, the proposed pipeline may be implemented for other systems, such as unmanned aerial platforms, which tend to be deployed for search-and-rescue missions and seldom have sufficient on-board computing resources. This chapter served as a modest introduction to event-based camera systems within the context of off-road navigation. The chapter began by establishing the foundations of the neuromorphic sensor hardware driving the camera, before moving on to the data processing aspect and the applicability of traditional techniques. The knowledge base was assessed to determine whether the traditional techniques were indeed viable with this novel sensor. Furthermore, implementations of a deep learning system utilizing an event camera are also possible through the necessary data processing of the events prior to the training phase. During inference, a possible acceleration method, namely Binary Convolutions, was implemented, and the initial results hold promising potential for the future development of real-time projects with event cameras. Future work is still necessary, specifically in addressing the data collection aspect within the UAE environment. However, to summarize, the Event Transformer-Binary CNN (EvT-BCNN) concept proposed in this chapter is a first attempt towards the deployment of memristive-based systems and neuromorphic vision sensors as computing-efficient alternatives to classical vision systems. Acknowledgements This project is funded by Tawazun Technology & Innovation (TTI), under Tawazun Economic Council, through the collaboration with Khalifa University. The work shared is part of an MSc Thesis project by Hamad AlRemeithi, and all equipment is provided by TTI. Professional expertise is also a shared responsibility between both entities, and the authors extend their deepest gratitude for the opportunity to encourage research in this field.


Multi-armed Bandit Approach for Task Scheduling of a Fixed-Base Robot in the Warehouse

Ajay Kumar Sandula, Pradipta Biswas, Arushi Khokhar, and Debasish Ghose

Abstract Robot task scheduling is an increasingly important problem in multi-robot systems. The problem becomes more complicated when the robots are heterogeneous, with complementary capabilities, and must work in coordination to accomplish a task. This chapter describes a scenario where a fixed-base robot and a mobile robot with complementary capabilities accomplish the 'task' of moving a package from a pickup point to a shelf in a warehouse environment. We propose a two-fold optimised task scheduling approach. The proposed approach reduces the task completion time based on the spatial and temporal constraints of the environment, and ensures that the fixed-base robot reaches the mobile robot exactly when the latter brings the package into the reachable workspace of the robotic arm. This helps to reduce the waiting time of the mobile robot. The multi-armed bandit (MAB) based stochastic task scheduler considers the history of the tasks to estimate the probabilities of the corresponding pickup requests (or tasks), and ensures that the mobile robot with the higher probability estimate is given top priority. Results demonstrate that the stochastic multi-armed bandit based approach reduces the time taken to complete a set of tasks compared to a deterministic first-come-first-serve approach to the scheduling problem.

Keywords Task scheduling · Multi-agent coordination · Multi-armed bandit · Heterogeneous robots · Reinforcement learning

A. K. Sandula (B) · P. Biswas · D. Ghose Indian Institute of Science, Bangalore 560012, India e-mail: [email protected] P. Biswas e-mail: [email protected] D. Ghose e-mail: [email protected] A. Khokhar Jaypee University of Information Technology, Waknaghat 173234, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_9


1 Introduction

In the near future, we will experience a high penetration of heterogeneous multi-robot systems in industry and daily life. Fragapane et al. [9] explain how material handling technologies have advanced over the decades with innovation in robotics technology. Multi-robot systems (MRS) are employed in warehouses and industries to handle the logistics within an environment. In multi-robot systems, task allocation is the assignment of tasks to a single agent or a group of agents based on the nature of the task. Task scheduling means the arrangement of the tasks or sub-tasks for execution, depending on the objectives and constraints. The problem of multi-agent pickup and dispatch involves task allocation, task scheduling, multi-agent path planning and control. The objective of task allocation or task scheduling techniques is to allocate or schedule the tasks in a way that optimises a desired cost or objective (time for execution, distance travelled). In previous decades, several approaches have been used to solve the task allocation problem, such as centralised, hybrid and decentralised approaches. Among the decentralised approaches, auction-based approaches have proven to be more efficient. The works by Zlot and Stentz [30], Wang et al. [28] and Viguria et al. [26] have explained the advantage of decentralised auction-based multi-robot task allocation methods for various scenarios.

This chapter focuses on the task scheduling problem. Multi-agent task scheduling becomes challenging when robots with different capabilities need to work in a coalition. Any schedule of tasks or sub-tasks affects the individual task completion times for each agent. In the literature, Stavridis and Doulgeri [23], Borrell Méndez et al. [1] and Szczepanski et al. [24] investigated task schedule optimisation algorithms in specific scenarios where a single robot executes the tasks. The papers by Wang et al. [27], Ho and Liu [11], Zhang and Parker [29], Kalempa et al. [12] and Kousi et al. [14] describe multi-agent task scheduling scenarios where robots need to work in coalition to accomplish the tasks. These works have detailed heuristic and rule-based approaches for the task scheduling problem.

This chapter investigates possible task scheduling strategies for a warehouse-type scenario, where a fixed-base and a mobile robot must collaborate to accomplish a task. The mobile robots carry a package from a pickup point towards the shelf. Once a mobile robot reaches the workspace of the fixed-base robot at the shelf, the fixed-base robot picks and places the load from the mobile robot. When multiple mobile robots arrive at the fixed-base robot, a schedule of sub-tasks needs to be generated for the pick and place execution at the shelf. This schedule affects the individual sub-task completion time of the mobile robots because of the extra waiting time; that is, a mobile robot has to wait for the pick and place execution until the scheduler chooses it. In this chapter, we contribute a multi-armed bandit (Slivkins [22]) formulation of the task scheduling problem. The multi-armed bandit is a classical reinforcement learning problem in which an agent has to prioritise among several arms to receive the best reward. The agent learns the reward probability of each arm through random exploration and exploitation.

We propose a multi-armed bandit-based stochastic scheduler that gives priority to the mobile robot with the higher probability estimate. The proposed approach further ensures the coordination between the agents (fixed-base and mobile), considering the temporal and spatial constraints. The coordination between the fixed-base and mobile robots is ensured while scheduling the sub-tasks: the sub-tasks are scheduled so that the fixed-base robot moves towards the parking spot of the mobile robot at the same time as the mobile robot reaches the shelf. We organise the chapter as follows. In Sect. 2, we present the literature review. In Sect. 3, we elaborate on the problem formulation, along with a detailed explanation of the motion planning algorithms used. In Sect. 4, we propose a multi-armed bandit formulation to organise the sequence of tasks of the robot. We report the results with their analysis and interpretation in Sect. 5. In Sect. 6, we conclude the chapter with a summary of the presented work and directions for future work.

2 Related Work

Task allocation is the assignment of tasks to the agents. In the context of MRS, multi-robot task allocation (MRTA) has been extensively investigated in the literature (Zlot and Stentz [30], Wang et al. [28], Viguria et al. [26] and Tang and Parker [25]). However, the allocated tasks still have to be scheduled by the individual agents for execution. Task scheduling is the arrangement of tasks during execution. Researchers have investigated task scheduling for a robotic arm's pick and place operation in several applications. The work by Stavridis and Doulgeri [23] proposed an online task priority strategy for assembling two parts, considering the relative motions of the robotic arms while avoiding dynamic obstacles. Borrell Méndez et al. [1] investigated a decision tree model to predict the optimal sequence of tasks for a pick and place operation in a dynamic scenario; the application is the assembly of pieces, arriving in a tray, for footwear manufacturing. Szczepanski et al. [24] explored a nature-inspired algorithm for task sequence optimisation considering multiple objectives. Wang et al. [27] investigated heterogeneous multi-robot task scheduling for a robotic arm by predicting the total execution time of the tasks and minimising the pick and place execution time; however, the picking robot did not apply any priority when multiple mobile robots approached at the same time. Ho and Liu [11] investigated the performance of nine pickup-dispatching rules for task scheduling and found that the LTIS (Longest Time In System) rule has the worst performance, whilst GQL (Greater Queue Length) has the best performance for the multiple-load pickup and dispatch problem. Under the LTIS rule, the station that has not been served for the longest time has the top priority; the GQL rule, on the other hand, gives priority to the station with more pending pickup requests. However, the study did not investigate tasks that need collaboration between heterogeneous robots with complementary abilities. Zhang and Parker [29] explored four heuristic approaches to solve multi-robot task scheduling in the case where robots need to work in a coalition to accomplish a task.

The proposed methods attempt to schedule the tasks so as to reduce interference with other tasks; however, the approach did not use the history of the tasks to prioritise the scheduling process. The study by Kalempa et al. [12] reported a robust preemptive task scheduling approach that categorises tasks as 'Minor', 'Normal', 'Major' and 'Critical'. The categories are decided based on the number of robots needed for the task execution and on its urgency. 'Minor' tasks often do not require any robot to perform the job, as there are alternative means to accomplish them. 'Normal' tasks need one robot to finish the task. A task is 'Major' when two robots are required to complete the job. For 'Critical' tasks, execution should ideally start as soon as the task is generated, and a minimum of three robots is required to accomplish it. However, the proposed model did not consider the criticality of the tasks within the categories. Kousi et al. [14] investigated a service-oriented architecture (SOA) for controlling the execution of in-plant logistics. The suggested scheduling algorithm in the architecture is search-based: the scheduler finds all the alternatives available at the decision horizon and calculates the utility of each alternative; the task sequence with the highest utility is then executed, and the scheduler continues to generate task sequences until the task execution is completed. The utility is calculated as the weighted sum of the consequences of the alternatives, considering criteria such as distance travelled and time for execution. However, the study did not consider robots working in a coalition. To the best of our knowledge, none of the existing works in the current literature has used the tasks' history to set the task scheduler's priority. In this chapter, we investigate a multi-armed bandit approach to estimate the probability of a task appearing in the future and, using this information, the task scheduler assigns the priority accordingly.

In robotics, the multi-armed bandit (MAB) approach has been utilised where robots must learn the preferences of the environment in order to allocate limited resources among multiple alternatives. Korein and Veloso [13] reported a MAB approach to learning the users' preferences in order to schedule the mobile robots during their spare time while servicing the users. Claure et al. [2] suggested a MAB approach with fairness constraints for a robot to distribute resources based on the skill level of humans in a human collaboration task. Dahiya et al. [3] investigated a MAB formulation to allocate limited human operators among multiple semi-autonomous robots. Pini et al. [20] explored a MAB formulation for the task partitioning problem in swarm robotics. Task partitioning can be useful for saving resources, reducing physical interference and increasing efficiency, but it can also be costly to coordinate the different sub-tasks linked to one another. The paper by Pini et al. [20] proposed a MAB approach to estimate whether a task needs to be partitioned or not; the results are compared with the ad-hoc algorithm given by Ozgul et al. [19], and the suggested approach is shown to outperform it. Koval et al. [15] investigated a MAB approach to select the most 'robust' trajectory under uncertainty for the rearrangement planning problem (Durrant-Whyte et al. [5]). Eppner and Brock [6] reported a MAB approach to decide on the best trajectory for a robotic arm to grasp an object by exploiting the environment surrounding the object.

Krishnasamy et al. [16] proposed a MAB formulation to reduce the queue regret of a service by learning the service probabilities over time; the learned probabilities help the server choose the service with the higher probability. In this chapter, we propose a MAB formulation to decide the priority for scheduling pick and place operations for a fixed-base robot (a limited resource) among multiple mobile robots (competing alternatives) carrying the load (Fig. 1).

3 Problem Formulation

3.1 Problem Framework and Description

In this subsection, we formulate the problem statement of task scheduling in a heterogeneous multi-robot system. We consider a warehouse environment with pickup points and shelves. A pickup point Φi, where i ∈ [1, p] (p is the number of pickup points), is defined as a point in the warehouse where a load-carrying mobile robot picks up a load of type Ti. We define a shelf υj, where j ∈ [1, s] (s is the number of shelves), as a point in the warehouse where the load is delivered. At the shelves, a fixed-base robot picks up the load carried by a mobile robot and places it on a shelf. The set η = [η1, η2, ..., η|η|] is the set of mobile robots present in the environment, each capable of carrying a specific type of load; we denote by |X| the cardinality of a set X. Similarly, the set Γ = [γ1, γ2, ..., γ|Γ|] is the set of fixed-base robots, which need to work in coalition with the mobile robots η to accomplish a task. A task τ[i, j] is said to be accomplished when a mobile robot carries a load of type Ti from pickup point Φi and moves to υj, where the fixed-base robot γj at the shelf picks and places the load Ti onto the shelf. The task τ[i, j] can be decomposed into the following sub-tasks: (a) the mobile robot moves to the pickup point Φi, (b) the mobile robot picks up the load of type Ti, (c) the mobile robot carries the load towards the shelf υj, (d) the fixed-base robot moves towards the parking spot of the mobile robot, and (e) the fixed-base robot picks and places the load Ti onto the shelf.

We simulate the tasks at every time step using a preset probability matrix P, where P[i, j] is the probability that a task τ[i, j] will be generated within the next time step 't'. Note that P is a constant two-dimensional matrix used to simulate the task requests. At every time step 't', a new set of tasks is generated based on the probability matrix P. Hence, at any given instant, a fixed-base robot can have multiple requests to execute, which are stored in a queue. We define the queue for γj as Qj, which contains the list of tasks of the type τ[:, j]. Now, we investigate the assignment of a priority among the tasks in Qj, using the previous history of tasks, to reduce the overall task completion time.

Fig. 1 Simulation of the environment where η1, η2 and η3 are mobile robots approaching the fixed-base robot γ1 from their respective starting points carrying different load types: (a) side view of the simulation environment; (b) top view of the simulation environment (as seen in RViz)

We use the multi-armed bandit approach to schedule the tasks in Qj by generating a priority queue Qj^p. The tasks are scheduled based on the previous history of tasks and the estimated time(s) of arrival of the mobile robot(s) carrying the load towards the shelf.
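The task-generation process described above can be mimicked with independent Bernoulli draws from P at each time step. The following is a minimal Python sketch under that reading; the matrix values, the single-shelf layout and all names are illustrative assumptions rather than the authors' simulation code.

import random

# Preset probability matrix P: P[i][j] is the probability that task tau[i, j]
# (carry load type T_i from pickup point i to shelf j) is generated in the
# next time step. The values below are illustrative only.
P = [[0.6],   # pickup point 1 -> shelf 1
     [0.4],   # pickup point 2 -> shelf 1
     [0.8]]   # pickup point 3 -> shelf 1

TIME_STEP = 35.0  # seconds between task-generation events (assumed)

def generate_tasks(P):
    """Return the list of tasks tau[i, j] generated in one time step."""
    tasks = []
    for i, row in enumerate(P):
        for j, prob in enumerate(row):
            if random.random() < prob:     # Bernoulli draw with probability P[i][j]
                tasks.append((i, j))
    return tasks

# Each generated task is appended to the queue Q_j of the fixed-base robot at shelf j.
queues = {0: []}
for task in generate_tasks(P):
    queues[task[1]].append(task)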

3.2 Motion Planning

For the motion planning of the robotic arm, we use the RRT-connect algorithm (Kuffner and LaValle [17]). RRT-connect is a sampling-based motion planning algorithm and an extension of the RRT (Rapidly-exploring Random Tree) algorithm by LaValle et al. [18], which is probabilistically complete. The RRT algorithm grows a uniformly exploring random tree with every iteration and finds a path if one exists. Initially, we label the entire workspace into free space and obstacle space, which are assumed to be known: if a state of the robot does not collide with the obstacles present in the environment, that state belongs to the free space, and otherwise to the obstacle space. The algorithm samples a node (a state in the configuration space) with a bias towards the goal and determines whether it lies in free space or in obstacle space. From the nearest neighbour in the tree, we steer towards the sampled node, determine a new node, and add it to the tree, provided that a straight line from the nearest neighbour to the new node exists without colliding with obstacles. The node to which the newly sampled node is attached in the tree is called its parent node. The exploration ends when a sampled node connected to the tree lies within a tolerance of the goal; any node connected to the tree is reachable from the start node. RRT-connect, on the other hand, explores the environment from both the start and the goal regions. The two trees stop exploring when a newly sampled node connected to one tree falls within a tolerance of a node from the other tree. Figure 2 illustrates how the random trees from the start (red) and the goal (green) approach each other as the number of iterations increases. The full path from start to goal can then be found by following the parent nodes from the intersection point of the trees. This algorithm is known to give the quickest solution when the environment does not contain very narrow passages; in our simulation, the robotic arm did not have to move through narrow passages, so RRT-connect is well suited for executing the pick and place task.

We used the ROS 1 navigation framework (also called move_base) for moving the mobile robot to a given goal location while avoiding obstacles (Quigley et al. [21]). The goal of the navigation framework is to localise the robot within the indoor environment map and simultaneously move it towards the goal. Fox et al. [7] proposed a probabilistic localisation algorithm with great practical success, which places computation where it is needed. The mobile robot has a 360-degree laser scan sensor, which gives the distances to the obstacles around the robot (in two dimensions only), that is, a 2D point cloud. The Adaptive Monte-Carlo Localization algorithm by Fox et al. [7] uses the 2D point cloud data and localises the position and orientation of the robot in the indoor environment.
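For readers unfamiliar with the planner, the following is a compact, self-contained sketch of the RRT-connect idea for a 2D point robot with circular obstacles. The step size, tolerance, sampling bounds and obstacle model are assumptions made for illustration only; this is not the implementation used in the chapter, and the collision check is deliberately limited to node endpoints for brevity.

import math, random

STEP = 0.2                          # extension step size (assumed)
TOL = 0.2                           # tolerance for joining the two trees (assumed)
OBSTACLES = [((2.0, 2.0), 0.8)]     # circular obstacles as (centre, radius), assumed

def collision_free(p):
    # endpoint-only check against the circular obstacles
    return all(math.dist(p, c) > r for c, r in OBSTACLES)

def steer(p_from, p_to):
    """Move from p_from towards p_to by at most STEP."""
    d = math.dist(p_from, p_to)
    if d <= STEP:
        return p_to
    t = STEP / d
    return (p_from[0] + t * (p_to[0] - p_from[0]),
            p_from[1] + t * (p_to[1] - p_from[1]))

def extend(tree, parents, target):
    """Grow the tree one step towards target; return the new node or None."""
    q_near = min(tree, key=lambda q: math.dist(q, target))
    q_new = steer(q_near, target)
    if collision_free(q_new):
        tree.append(q_new)
        parents[q_new] = q_near
        return q_new
    return None

def rrt_connect(start, goal, max_iter=5000):
    tree_a, tree_b = [start], [goal]
    parents_a, parents_b = {start: None}, {goal: None}
    for _ in range(max_iter):
        q_rand = (random.uniform(0.0, 5.0), random.uniform(0.0, 5.0))
        q_new = extend(tree_a, parents_a, q_rand)
        if q_new:
            # 'connect' heuristic: greedily grow the other tree towards q_new
            q_conn = extend(tree_b, parents_b, q_new)
            while q_conn and math.dist(q_conn, q_new) > TOL:
                q_conn = extend(tree_b, parents_b, q_new)
            if q_conn and math.dist(q_conn, q_new) <= TOL:
                return True      # trees met; follow the parent links to recover the path
        tree_a, tree_b = tree_b, tree_a          # swap the roles of the trees
        parents_a, parents_b = parents_b, parents_a
    return False

print(rrt_connect((0.5, 0.5), (4.5, 4.5)))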

Fig. 2 RRT-connect algorithm illustration

The ROS navigation stack allows the robot to navigate from its current localised position to the goal point. The navigation framework uses a two-level approach with global and local planning algorithms. The goal of the global planner is to generate a path that avoids the static obstacles in the environment, while the goal of the local planner is to move the mobile robot along the planned global path while avoiding dynamic obstacles. The local planner greatly reduces the computational load of replanning the global path in changing environments with dynamic obstacles. A* (Hart et al. [10]) and Dijkstra's algorithm [4] are popular graph search algorithms which guarantee an optimal solution if a path exists from one point to another. Dijkstra's algorithm is an undirected search that follows a greedy approach, whereas A* is a directed search that uses heuristics to focus the search towards the goal. Both are proven to give optimal solutions, but A* takes less time to reach the goal because the search is directed. We used the A* algorithm as the global planner in the navigation framework.

Fig. 3 Forward simulation

The global planner gives a set of waypoints on the map for the mobile robot to follow to reach the goal. These waypoints avoid the static obstacles on the map. The mobile robot uses the Dynamic-Window Approach (DWA) (Fox et al. [8]), a reactive collision avoidance approach, as the local path planning algorithm in the navigation framework. The DWA algorithm was proposed for robots equipped with a synchro-drive system, in which all the wheels of the robot orient in the same direction and rotate with the same angular velocity; the control inputs for such a system are the linear and angular velocities. The DWA algorithm changes the control inputs of the robot at a fixed interval. The approach considers the mobile robot's dynamic constraints to narrow the search space for choosing the control input: based on the maximum angular and linear accelerations of the motors, the reachable set of control inputs (angular and linear velocities) at any given instant is determined and discretised uniformly. For each sample, a kinematic trajectory is generated, and the algorithm estimates the simulated forward location of the mobile robot. Figure 3 shows the forward simulation of the 'reachable' velocities for the robot, that is, the set of linear and angular velocities in the control input plane that can be reached within the next time step of choosing another velocity. For each trajectory of the forward simulation, a cost is computed. The original implementation of Fox et al. [8] computes the cost based on clearance from obstacles, progress towards the goal and forward velocity. The ROS implementation computes the cost as the weighted sum of three components. The first component is the distance from the endpoint of the simulated trajectory to the global path; increasing the weight of this component makes the robot stay on the global path.

Fig. 4 RViz

The second component of the cost is the distance from the endpoint of the trajectory to the goal; increasing the weight of this component makes the robot choose higher velocities to move towards the goal. The third component is the obstacle cost along the simulated forward trajectory. While computing obstacle costs, the points on the map occupied by obstacles are assigned a very high cost; hence, if a simulated forward trajectory collides at any point, its cost becomes very high and the corresponding (v, ω) pair will not be chosen by the DWA planner (Fig. 4).

cost = path_distance_bias * (distance to the global path from the endpoint of the trajectory) + goal_distance_bias * (distance to the goal from the endpoint of the trajectory) + occdist_scale * (maximum obstacle cost along the trajectory)

This cost depends on the distance from obstacles, the proximity to the goal point and the velocity. The trajectory with the least cost is chosen, and the process is repeated periodically until the goal point is reached.
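The weighted cost above can be written as a small scoring function over sampled (v, ω) pairs. The sketch below is a simplified, self-contained stand-in for the ROS dwa_local_planner scoring, assuming a unicycle model, a straight global path along y = 0 and point obstacles; the weights, horizon, obstacle positions and helper names are illustrative assumptions, not the actual planner code.

import math

# Illustrative weights, loosely following the dwa_local_planner parameter names.
PATH_DISTANCE_BIAS = 32.0
GOAL_DISTANCE_BIAS = 24.0
OCCDIST_SCALE = 0.01
SIM_TIME, DT = 1.5, 0.1            # forward-simulation horizon and step (assumed)
OBSTACLES = [(2.0, 0.5)]           # occupied points in the map (assumed)
GOAL = (3.0, 0.0)

def forward_simulate(x, y, theta, v, omega):
    """Roll the unicycle model forward and return the simulated poses."""
    poses = []
    for _ in range(int(SIM_TIME / DT)):
        x += v * math.cos(theta) * DT
        y += v * math.sin(theta) * DT
        theta += omega * DT
        poses.append((x, y))
    return poses

def obstacle_cost(poses):
    """Very high cost if the rollout passes too close to an obstacle."""
    clearance = min(math.dist(p, o) for p in poses for o in OBSTACLES)
    return float("inf") if clearance < 0.3 else 1.0 / clearance

def trajectory_cost(v, omega):
    poses = forward_simulate(0.0, 0.0, 0.0, v, omega)
    end = poses[-1]
    return (PATH_DISTANCE_BIAS * abs(end[1])              # distance to a global path along y = 0
            + GOAL_DISTANCE_BIAS * math.dist(end, GOAL)   # distance to the goal
            + OCCDIST_SCALE * obstacle_cost(poses))       # obstacle cost along the rollout

# Dynamic window: velocities reachable within the next control interval (assumed bounds).
window = [(v, w) for v in (0.2, 0.4, 0.6) for w in (-0.5, 0.0, 0.5)]
best = min(window, key=lambda vw: trajectory_cost(*vw))
print(best)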

4 Methodology

This section explains the proposed approach to schedule the tasks based on a multi-armed bandit formulation solved with the ε-greedy algorithm. We further optimise the execution of the scheduled tasks so that the fixed-base robot reaches the parking spot of the mobile robot at the same time as the mobile robot reaches the workspace of the fixed-base robot. The following subsections explain the modules associated with the suggested approach and summarise the methodology.


4.1 Multi-armed Bandit Formulation

The multi-armed bandit (MAB) problem (Slivkins [22]) is a classical reinforcement learning problem in which a fixed set of resources must be allocated among competing choices. We formulate the approach such that, at every time step, a choice has to be made by the algorithm. A reward is associated with each choice (arm, or competing alternative) based on a preset probability. A MAB solver works on the principle of exploration and exploitation; the objective is to choose the arm with the higher expected gain. In the context of the current problem, the multiple mobile robots approaching a given shelf are the competing choices, and the fixed-base robot is the limited resource that must collaborate with the mobile robots to accomplish the tasks. The scheduling of the tasks is prioritised in a way that increases the expected gain of the multi-armed bandit problem. We use the Bernoulli multi-armed bandit, where the reward is binary, that is, 0 or 1.

The goal of this module is to prioritise the order of the requests based on the previous history of requests; we schedule the order of the task requests based on the priority we get from the MAB solver(s). The history of task requests is the input to the module, and the output is the estimated order of probabilities of these task requests. Hence, we define a function α : τ* → P*, where τ* represents the history of the tasks accomplished up to that time point. Note that τ* is a three-dimensional binary matrix to which a two-dimensional matrix is appended at every time stamp, and each row represents the list of tasks; P* is the estimated set of probabilities, which is the output of this module. The ε-greedy algorithm works on the principle of exploration and exploitation, as explained in Algorithm 1. The parameter ε is the probability that the algorithm is in the state of exploration rather than exploitation at a given time step. In the state of exploitation, we use the cumulative reward and pick the best (greedy) arm available. Here, we make a total of |υ| choices, one corresponding to each shelf, at a given time stamp. The aim is to find the most probable pickup-shelf task request for each shelf, so that the robot carrying the load type of the corresponding pickup point can be given priority when multiple load-carrying mobile robots reach a shelf simultaneously. The value of ε should be set sufficiently high to let the algorithm explore all the arms and obtain the correct priority order. The order of the arms keeps changing as the algorithm learns from the updated history of tasks. The following equation gives the estimates of the bandits. This module prioritises the order of requests by calculating the estimated probability P*, which is updated at every time stamp and helps us to schedule the tasks.

P^*(i, j) = \frac{1}{N_j(a_T = a(i, j))} \sum_{T=1}^{T=\mathrm{cur}} \tau^*(i, j, T)\, \beta(a_T = a(i, j))    (1)


Algorithm 1 ε-greedy(Timesteps)
  Ts = 0
  n = a randomly generated number at each time step
  for Ts < Timesteps do
    if ε > n then
      Explore
      Ts = Ts + 1
    else
      Exploit
      Ts = Ts + 1
    end if
    Update reward
  end for
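A minimal Python rendering of Algorithm 1 for a single shelf is shown below; the arm probabilities, the value of ε and the horizon are illustrative assumptions, and the estimate update is the incremental form of the sample mean in Eq. 1.

import random

def epsilon_greedy(true_probs, epsilon=0.3, timesteps=1000):
    """Bernoulli multi-armed bandit solved with the epsilon-greedy rule."""
    n_arms = len(true_probs)
    counts = [0] * n_arms          # N_j(a_T = a(i, j)): times each arm was chosen
    estimates = [0.0] * n_arms     # P*(i, j): running sample mean of the rewards
    for _ in range(timesteps):
        if random.random() < epsilon:                     # explore
            arm = random.randrange(n_arms)
        else:                                             # exploit the current best estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if random.random() < true_probs[arm] else 0   # Bernoulli reward
        counts[arm] += 1
        # incremental form of the sample mean in Eq. 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

# Example: three pickup points feeding one shelf with request probabilities 0.6, 0.4, 0.8.
print(epsilon_greedy([0.6, 0.4, 0.8]))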

Algorithm 2 Grid Approach(i, Gi, vi, ωi)
  Robot Id ← i
  Current location of robot ← Gi = [xgi, ygi]
  Current linear velocity of robot ← vi
  Current angular velocity of robot ← ωi
  if GRID_LOCK(i, Gi) then
    POLICY2(Gi)
    Grid Approach(i, Gi, vi, ωi)
  end if
  Forward simulation(i, vi, ωi) → Priorityi
  if Conflict_Check(Priorityi) then
    POLICY1(Gi, Priorityi)
  end if
  Continue moving to the goal using ROS-NAV1 architecture

In Eq. 1, P*(i, j) represents the estimated probability of a task request from the ith pickup point to the jth shelf, and T is the index of the time step at which the tasks are generated. N_j(a_T = a(i, j)) represents the number of times the action a(i, j) was chosen by the MAB solver corresponding to shelf j up to the current time stamp, that is, T = cur. The function β(X) is a binary function that returns one if the condition X is satisfied and zero otherwise. The denominator N_j(a_T = a(i, j)) = \sum_{T=1}^{T=\mathrm{cur}} \beta(a_T = a(i, j)) therefore counts the number of times arm i was chosen by the MAB solver at shelf j up to the current time step.
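As a small worked example with invented numbers: suppose that, at shelf j, the arm a(i, j) has been chosen four times so far and the corresponding task request appeared in three of those time steps. The estimate is then

P^*(i, j) = \frac{1}{N_j(a_T = a(i, j))} \sum_{T} \tau^*(i, j, T)\, \beta(a_T = a(i, j)) = \frac{1 + 0 + 1 + 1}{4} = 0.75 .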

4.2 Task Scheduling Based on Time Synchronization

In this subsection, we present the scheduling of the task requests that a fixed-base robot at a shelf must execute. We assign the priority among the task requests based on the estimates P* received from the MAB solvers and on the estimated time(s) of arrival of the mobile robot(s). As explained in Sect. 3.1, accomplishing a task requires a mobile robot and a fixed-base robot to work in a coalition.

The mobile robot carries the package and parks itself at a parking slot within the reachable workspace of the fixed-base robot. We schedule the tasks in such a way that the fixed-base robot reaches the mobile robot at the same time as the mobile robot reaches the parking slot. This formulation helps us achieve collaboration between them and reduces the overall time to finish the task. From P*, we obtain the priority order Π_p by finding the element with the highest probability in each row and sorting the row indices from the highest estimate to the lowest.

Algorithm 3 MAB Scheduler(ETA, Π_p)
  Estimated Times of Arrival ← ETA
  for all x ∈ ETA do
    if x < δ then
      Append load type of x to list E
    end if
  end for
  count = 0
  for all y ∈ Π_p do
    if y ∈ E then
      Move y in E to position count
      Shift the requests after count by one position
      count = count + 1
    end if
  end for
  E is now sorted subject to Π_p

So, we define the priority order as

\Pi_p = \mathrm{sort}\big(i \leftarrow \text{max-to-min}\big(\max(P^*(i, :))\big)\big)    (2)

The priority order of load types Π_p is obtained by sorting the estimated task-request probabilities P* from highest to lowest with respect to the maximum element in each row of P*. That is, we take the highest probability value in every row, compare it across rows, and sort the row numbers from highest to lowest to form the priority order Π_p. We conclude that the robots carrying a load type that appears at the beginning of Π_p have a higher probability of accumulating tasks than those appearing later. We denote by t_j the set of pickup requests for the fixed-base robot at shelf j at any given instant. The estimated time(s) of arrival of the mobile robot(s) is(are) calculated from the remaining path length(s) and the current velocity, as shown in Fig. 5. We use the RRT-connect algorithm (Kuffner and LaValle [17]) for the motion planning of the robotic arm; in this algorithm, a tree from the source and a tree from the goal point are grown towards each other until they meet, and the shortest path from the set of nodes (tree) is then chosen to execute the movement of the robotic arm. The scheduled requests are executed by the fixed-base robot when the movement time of the fixed-base robot equals the estimated time of arrival of the mobile robot. The estimated arrival time is calculated as the distance remaining for the mobile robot to travel divided by its current velocity.
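Bringing Eq. 2 and Algorithm 3 together, the reordering step can be rendered in Python as the short sketch below; the estimate matrix, the value of δ, the dictionary-based ETA representation and all names are illustrative assumptions rather than the authors' code.

DELTA = 4.0  # threshold on the estimated time of arrival, in seconds (assumed)

def priority_order(p_star):
    """Eq. 2: sort load types (rows) by their largest estimated probability."""
    row_max = [max(row) for row in p_star]
    return sorted(range(len(p_star)), key=lambda i: row_max[i], reverse=True)

def mab_scheduler(eta, pi_p):
    """Algorithm 3: reorder the imminent requests according to the priority Pi_p.

    eta maps a load type to the estimated time of arrival (s) of its mobile robot."""
    # Requests whose mobile robot is expected within the threshold delta
    imminent = [load for load, t in eta.items() if t < DELTA]
    count = 0
    for load in pi_p:                        # walk the priority order
        if load in imminent:
            imminent.remove(load)
            imminent.insert(count, load)     # move it to the front portion of the queue
            count += 1
    return imminent

# Example with three load types and one shelf (values invented for illustration).
p_star = [[0.58], [0.35], [0.79]]
pi_p = priority_order(p_star)                            # -> [2, 0, 1]
print(mab_scheduler({0: 3.2, 1: 1.5, 2: 2.8}, pi_p))     # -> [2, 0, 1]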

284

A. K. Sandula et al.

Fig. 5 Calculation of estimated time of arrival

The time taken by the fixed-base robot is calculated from the angular distance θ that the base joint has to travel and from the velocity profile of the controller. The mobile robot navigates using a global planner (A*) and a local planner, the dynamic-window approach (Fox et al. [8]), which is an online collision avoidance algorithm. The fixed-base robot has the velocity profile shown in Fig. 6: the angular velocity ω increases with a uniform angular acceleration for θ < 10°, after which ω = 20°/s; for the last 10° of the angular displacement, ω decreases until it becomes zero. As shown in Algorithm 4, the time taken to execute the movement of the fixed-base robot γ1 is calculated using the angular displacement θ of the base joint of the fixed-base robot from its current position.

Algorithm 4 Movement Time(θ)
  Movement Time ← MT
  if θ ≤ 10° then
    MT = √(2θ)/2
  end if
  if 10° ≤ θ ≤ 20° then
    MT = 1 + √(2θ − 10)/2
  else
    MT = 2 + (θ − 20)/2
  end if
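For completeness, the movement-time computation and the synchronisation test can be sketched as follows. The piecewise expressions simply transcribe Algorithm 4 as reproduced above, so they inherit whatever imprecision remains in that reconstruction; the helper names and the trigger condition are assumptions for illustration.

import math

def movement_time(theta):
    """Algorithm 4: time (s) for the base joint to sweep theta degrees."""
    if theta <= 10.0:
        return math.sqrt(2.0 * theta) / 2.0              # acceleration phase
    if theta <= 20.0:
        return 1.0 + math.sqrt(2.0 * theta - 10.0) / 2.0
    return 2.0 + (theta - 20.0) / 2.0                    # remaining angular displacement

def estimated_time_of_arrival(remaining_path_length, current_velocity):
    """ETA of the mobile robot, as in Fig. 5."""
    return remaining_path_length / max(current_velocity, 1e-6)

def should_start_arm_motion(theta, remaining_path_length, current_velocity):
    """Start the arm when its movement time has caught up with the robot's ETA."""
    return movement_time(theta) >= estimated_time_of_arrival(
        remaining_path_length, current_velocity)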

Fig. 6 Velocity profile of the base joint of the fixed base robot

Algorithm 5 MR_Task_Executor_ηi(T, statei)
  if i in T then
    if statei = start then
      Wait for the package deployment
      MR_Task_Executor_ηi(T, to shelf)
    end if
    if statei = to shelf then
      Send_Goal(parking point)
      Wait for execution
      MR_Task_Executor_ηi(T, pick)
    end if
    if statei = pick then
      Wait for the pick execution of the fixed-base robot
      MR_Task_Executor_ηi(T, to start)
    end if
    if statei = to start then
      Send_Goal(pickup point)
      Wait for execution
      MR_Task_Executor_ηi(T, start)
    end if
  else
    Update(T)
    MR_Task_Executor_ηi(T, statei)
  end if

Figure 7 shows the entire architecture of the proposed methodology and the modules involved in the work. The two modules of the architecture are titled 'Multi-armed bandit' and 'Multi-agent coordination'. The 'Multi-armed bandit' module gives us the estimated probabilities of the task requests based on the history of the tasks, as explained in Sect. 4.1. The 'Multi-agent coordination' module takes into consideration the movement time, the estimated times of arrival and the probability estimates to plan the sequence of tasks.

Fig. 7 Data flow between the sub-modules of the proposed architecture

Since this work focuses on the task scheduling of fixed-base robots, we considered only one shelf and three pickup points. We have a total of four agents: one fixed-base robotic arm γ1 and three different load-carrying mobile robots (η1, η2, η3), which start from three different pickup points. Tasks are allocated to the mobile robot that can carry the particular load type; each mobile robot can carry a specific load type from a specific pickup point. Hence, after finishing a task, the mobile robot moves back from the shelf to its pickup point to execute future tasks, if any. Figure 8 shows the flow chart of the execution of requests by a load-carrying robot, and the detailed procedure is given in Algorithm 5. The input T is the list of tasks which are not yet accomplished. If 'i' is in the list T, a load from pickup point 'i' is to be carried to the shelf. We define four states of the mobile robot. The state 'start' means the robot is waiting for the load to be deployed at the pickup point. The state 'to shelf' means that the mobile robot will move with the load to a parking point at the shelf reachable by the robotic arm. The state 'pick' means the robot is waiting for the fixed-base robot to reach the package and execute the pick and place operation. The state 'to start' means the robot has finished its task and is moving back to its corresponding pickup point. Once the robot reaches the start position, it follows the same loop if there are unfinished tasks.
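In ROS 1, the Send_Goal step of Algorithm 5 is commonly realised through the move_base action interface. The following is a minimal, hedged sketch of that pattern; the node name, frame id and the parking-point coordinates are placeholders, and the snippet assumes a standard move_base setup rather than the authors' exact code.

import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def send_goal(x, y, yaw_w=1.0, frame="map"):
    """Send a navigation goal to move_base and wait for the result."""
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = yaw_w       # identity orientation by default
    client.send_goal(goal)
    client.wait_for_result()                           # 'Wait for execution' in Algorithm 5
    return client.get_state()

if __name__ == "__main__":
    rospy.init_node("mr_task_executor_sketch")
    send_goal(0.0, 0.7)    # e.g. a parking point near the shelf (placeholder coordinates)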

Fig. 8 Mobile robot task execution

Now, the fixed-base robot at a given shelf prioritises the pickup requests based on the MAB scheduler, as explained in Algorithm 3. The output E of the algorithm gives the order in which the fixed-base robot must execute the tasks. The algorithm uses the multi-armed bandit formulation to estimate the priority to be allocated among the mobile robots at the current instant. We only prioritise the requests approaching the fixed-base robot within a threshold δ equal to the time taken to execute a pick and place task.

To make the scheduler robust, we move the fixed-base robot towards the parking spot of the mobile robot in such a way that the robotic arm reaches the pickup position precisely when the mobile robot delivers the package. This is achieved by moving the robotic arm when the movement time equals the estimated time of arrival of the mobile robot. We schedule the tasks based on the priority order Π_p because we want to reduce the waiting time of the mobile robot that has the higher probability of accumulating tasks in the future. We compare the performance of the MAB task scheduler with a deterministic scheduler that works on a first-come-first-serve (FCFS) basis. In the FCFS approach, the mobile robot that is estimated (based on the ETA) to arrive earliest at the shelf is scheduled first for the pick and place operation, irrespective of the history of the task requests.

The position of the load is estimated using a classical colour-based object detection approach. A mask is applied to the image frame to recognise the object in real time; the mask only passes a particular colour and is created from the upper and lower limits of the hue, saturation and value (brightness). Any colour that falls within the specified range can be detected by the camera attached to the end-effector of the robotic arm. In our simulation, the red-coloured region in the camera view is first detected using the colour-based object detection technique, as explained in Algorithm 6. A contour is then created around the red object and used to find the object's centroid, the weighted average of all the pixels that make up the object; a contour is a curve that joins all the continuous (along the boundary) points having the same colour.

Fig. 9 View from the camera attached to the end-effector of the robotic arm


As the depth (Z) was constant, the X and Y coordinates of the block can be determined from the difference (in pixels) between the centre of the red bounded region and the centre of the image (Fig. 9).

Algorithm 6 Colour-based object detection
  Lower limit of colour range ← L
  Upper limit of colour range ← U
  Area of contour ← A
  while true do
    Find contours for the colour range
    Calculate A
    if A > 2000 then
      Find centroid of object
    end if
  end while
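Algorithm 6 maps almost directly onto OpenCV primitives. The sketch below is one possible realisation assuming OpenCV 4 and a BGR camera frame; the HSV limits chosen for 'red' and the function names are illustrative assumptions, while the 2000-pixel area threshold is taken from the algorithm above.

import cv2
import numpy as np

LOWER_RED = np.array([0, 120, 70])      # lower HSV limit L (assumed values)
UPPER_RED = np.array([10, 255, 255])    # upper HSV limit U (assumed values)
AREA_THRESHOLD = 2000                   # minimum contour area A, in pixels

def detect_load_centroid(frame_bgr):
    """Return the (cx, cy) pixel centroid of the red load, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_RED, UPPER_RED)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    if cv2.contourArea(largest) <= AREA_THRESHOLD:
        return None
    m = cv2.moments(largest)             # centroid = weighted average of the pixels
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))

def pixel_offset_from_centre(centroid, frame_shape):
    """Offset (in pixels) of the load from the image centre, used to infer X and Y."""
    h, w = frame_shape[:2]
    return centroid[0] - w // 2, centroid[1] - h // 2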

5 Results and Analysis

5.1 Simulation Results

We simulated a warehouse scenario in the ROS-Gazebo architecture with pickup points 1, 2 and 3 at [1, 4], [−3.5, 0.5] and [1, −3], respectively. The fixed-base robotic arm is placed at the shelf located at [0, 0.7]. The task requests are generated with probabilities of 0.6, 0.4 and 0.8 from the respective pickup points to the shelf, with a time step of 35 s. After empirical analysis, the threshold δ is set to 4.0 s. Figure 10 shows a comparison between the deterministic and the stochastic task scheduling approach for Simulation 1, in which ε is chosen to be 0.3. Figure 11 shows the same comparison for Simulation 2, in which ε is also chosen to be 0.3. Table 1 shows the total time taken by Robot 1, Robot 2 and Robot 3 to complete the tasks using the deterministic first-come-first-serve approach and the stochastic multi-armed bandit approach in Simulation 1 and Simulation 2. In Simulation 1, the stochastic approach reduced the total time taken to complete the tasks of Robot 1, Robot 2 and Robot 3 by 25.3%, 64.9% and 41.8%, respectively. In Simulation 2, the stochastic approach reduced the total time taken to complete the tasks of Robot 2 and Robot 3 by 2.608% and 11.966%, respectively; however, the total time taken to complete the tasks of Robot 1 increased by 44.033% with the multi-armed bandit based approach. Table 2 shows the cumulative time taken by Robot 3 to complete consecutive sets of 20 tasks using the deterministic and the stochastic approach. In Simulation 2, the difference between the time taken by Robot 3 to complete the first 20 tasks using the suggested multi-armed bandit based approach and the first-come-first-serve approach is 0.76 h; for Simulation 1, the difference is 1.1 h. This difference grows substantially for higher sets of consecutive tasks because of the accumulated waiting time. Figures 12 and 13 capture this difference.

Fig. 10 Task duration in Simulation 1

For Robot 2, the difference between the time taken to complete the first six tasks using the proposed multi-armed bandit based approach and the first-come-first-serve approach is 0.02 h in Simulation 1 (see Figs. 14 and 15); in Simulation 2, however, the FCFS approach is faster by 0.05 h.


Fig. 11 Task duration in Simulation 2

based approach is faster for the last set of 6 tasks for Robot 2 by 0.45 h and 0.23 h in Simulation 1 and Simulation 2, respectively. In Simulation 1, the difference between the time for Robot 1 to complete the first 11 tasks using the suggested multi-armed


Table 1 Time taken (in hours) to complete 100 tasks in Simulation 1 and Simulation 2

Robot | Simulation 1 FCFS (h) | Simulation 1 MAB (h) | Simulation 2 FCFS (h) | Simulation 2 MAB (h)
1     | 7.2                   | 5.4                  | 7.2                   | 10.4
2     | 1.2                   | 0.4                  | 1.2                   | 1.2
3     | 34.9                  | 20.3                 | 34.9                  | 30.7

Table 2 Time taken (in hours) to complete consecutive sets of 20 tasks by Robot 3 in Simulation 1 and Simulation 2

Tasks | Simulation 1 FCFS (h) | Simulation 1 MAB (h) | Simulation 2 FCFS (h) | Simulation 2 MAB (h)
20    | 2.52                  | 1.42                 | 2.52                  | 1.76
40    | 6.33                  | 3.73                 | 6.33                  | 5.41
60    | 10                    | 6.25                 | 10                    | 9.42
80    | 15.29                 | 8.54                 | 15.29                 | 13.53

Fig. 12 Cumulative task duration in Simulation 1 for Robot 3

bandit approach and the first-come-first-serve approach is 0.01 h, with FCFS being the faster approach in this case. For Simulation 2, the difference is 0.17 h, again in favour of the deterministic FCFS approach. However, for later sets of 11 consecutive tasks, the stochastic multi-armed bandit based approach is faster in Simulation 1, which is not the case in Simulation 2, as shown in Figs. 16 and 17.


Fig. 13 Cumulative task duration in Simulation 2 for Robot 3

Fig. 14 Cumulative task duration in Simulation 1 for Robot 2

Table 3 shows the cumulative time taken by Robot 2 to complete consecutive sets of 6 tasks using the deterministic and the stochastic approach, and Table 4 shows the corresponding times for Robot 1 over consecutive sets of 11 tasks. We observe that, with the exception of Robot 1 in Simulation 2, the total task completion time of every robot was higher with the FCFS approach than with the MAB approach in both simulations.


Fig. 15 Cumulative task duration in Simulation 2 for Robot 2

Fig. 16 Cumulative task duration in Simulation 1 for Robot 1

5.2 Discussion

We observe that the proposed approach can outperform the deterministic task scheduler. However, uncertainty in the execution of the path by the mobile robot can affect the task completion time. In Simulation 2, even though Robot 1 was given priority, its task completion time was higher than in Simulation 1. The reason for this difference is the uncertainty in the execution of the path by the mobile robot. The mobile robot uses the Dynamic Window Approach of Fox et al. [8], a robust local path planning algorithm, to avoid dynamic obstacles. Even though the global path decided by the robot is the same in every case, it keeps being updated based on the trajectory decided by the local path


Fig. 17 Cumulative task duration in Simulation 2 for Robot 1

Table 3 Time taken (in hours) to complete consecutive sets of 6 tasks by Robot 2 in Simulation 1 and Simulation 2

Tasks | Simulation 1 FCFS (h) | Simulation 1 MAB (h) | Simulation 2 FCFS (h) | Simulation 2 MAB (h)
6     | 0.09                  | 0.07                 | 0.09                  | 0.14
12    | 0.24                  | 0.08                 | 0.24                  | 0.38
18    | 0.14                  | 0.1                  | 0.14                  | 0.25
24    | 0.2                   | 0.07                 | 0.2                   | 0.15
30    | 0.45                  | 0.09                 | 0.45                  | 0.22

Table 4 Time taken (in hours) to complete consecutive sets of 11 tasks by Robot 1 in Simulation 1 and Simulation 2

Tasks | Simulation 1 FCFS (h) | Simulation 1 MAB (h) | Simulation 2 FCFS (h) | Simulation 2 MAB (h)
11    | 0.26                  | 0.27                 | 0.26                  | 0.43
22    | 0.62                  | 0.42                 | 0.62                  | 0.94
33    | 1.15                  | 0.85                 | 1.15                  | 1.71
44    | 1.91                  | 1.53                 | 1.91                  | 2.73
55    | 2.52                  | 1.8                  | 2.52                  | 3.52

planner. Hence, executing the path between the same initial and final goal points does not necessarily take the mobile robot the same amount of time. In this chapter, we have proposed a novel task scheduling approach in the context of heterogeneous robot collaboration. However, we did not consider the case where robots are semi-autonomous. We observe that the difference between the time taken


Fig. 18 Top view of the mixed-reality warehouse environment

to complete the tasks using the deterministic first-come-first-serve approach and the stochastic multi-armed bandit based approach was significantly high for Robot 3 in Simulation 1 and Simulation 2. Although the total task completion time for the MAB approach was still lower in Simulation 2, making the scheduler robust to uncertainties would save more time. A robust task scheduler that accounts for the uncertainty of task execution has to be investigated in future work. In this work, we considered only one shelf and three pickup points; the performance of this approach could be better assessed with multiple shelf points coupled with multi-robot task allocation, which needs to be investigated further. In a warehouse scenario, some tasks require human intervention; that is when the robots have to work collaboratively with humans. Hence, we simulated a mixed-reality warehouse environment with real robots in an indoor environment. Figure 18 shows the top view of the simulated mixed-reality interface, and Figs. 19, 20 and 21 illustrate the experimental setup in the mixed-reality environment. The user manually places the load on top of the mobile robot, which carries it to the fixed-base robot (a shelf point). Once the load is placed, the user presses the button corresponding to where the load is to be carried. The experimental setup considers avoiding collision with the virtual obstacles present in the indoor environment, which helps us recreate a warehouse-type scenario indoors. In the same mixed-reality environment, we added two virtual buttons that follow the user wherever he moves; these two buttons serve as the user input. We considered two fixed-base robots that the user can choose between for transporting the load. In Fig. 19, the user presses the button on the left. The goal of the mobile robot


Fig. 19 The mobile robot carries the load towards the Fixed-base Station 1 when the user presses the first button

is to autonomously avoid real and virtual obstacles and reach the desired goal. The robot can be clearly observed avoiding the virtual obstacles while reaching the goal. The top part of the figure shows the HoloLens camera view from the user; HoloLens is a holographic device developed and manufactured by Microsoft. The bottom part shows the real-world view of the environment. In Fig. 20, the user presses the other button, choosing the other station. Hence, a new goal position is sent to the navigation stack. The robot changes the global path and, consequently, stops at that moment; immediately afterwards, it starts moving along the new global path. The arrows in the HoloLens view and the real-world view represent the direction of the robot's velocity at that moment. It can be observed that the direction of the velocity is such that the robot avoids the virtual obstacles present on both of its sides. This is done by providing an edited map to the navigation stack: we send the edited map to the move_base node of ROS as a parameter, while the unedited map is used as an input parameter to the amcl node, which is responsible for the indoor localisation of the robot. In Fig. 21a, we can observe that the mobile robot avoids both the real and virtual obstacles while simultaneously progressing towards the new goal. The real-world camera view of the figure shows the real obstacle and the end-effector of the fixed-base robot. In Fig. 21b, we show that the mobile robot reached the workspace of the fixed-base robot, and the fixed-base robot picks and places the load carried towards


Fig. 20 Mobile robot changes its trajectory when the second button is pressed

(a) Mobile robot moves towards Fixed-base Station 2, avoiding the real obstacle

(b) Fixed-base robot executes the pick and place task

Fig. 21 Pick and place task execution in a mixed-reality warehouse environment with real and virtual obstacle avoidance - link to video


it using a camera attached to the end-effector. This work can be extended to a case where the scheduler considers human input and prioritises the mobile robots for collaboration to finish the task.

6 Conclusion

This chapter has presented the challenges of a heterogeneous multi-robot task scheduling scenario. It has proposed a solution and has discussed the multi-armed bandit technique used in this application domain. Future work will investigate a scenario with multiple shelves coupled with dynamic task allocation. The uncertainty in the execution of a mobile robot's path can increase the task completion time, and delays in the initial tasks can add up in the later tasks; future work will therefore also investigate a task scheduler that accounts for the uncertainty in task execution. A task scheduler that considers a human-in-the-loop system for the case of semi-autonomous robots is also to be investigated further.

References
1. Borrell Méndez, J., Perez-Vidal, C., Segura Heras, J. V., & Pérez-Hernández, J. J. (2020). Robotic pick-and-place time optimization: Application to footwear production. IEEE Access, 8, 209428–209440.
2. Claure, H., Chen, Y., Modi, J., Jung, M. F., & Nikolaidis, S. (2019). Reinforcement learning with fairness constraints for resource distribution in human-robot teams. arXiv:1907.00313.
3. Dahiya, A., Akbarzadeh, N., Mahajan, A., & Smith, S. L. (2022). Scalable operator allocation for multi-robot assistance: A restless bandit approach. IEEE Transactions on Control of Network Systems, 1.
4. Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.
5. Durrant-Whyte, H., Roy, N., & Abbeel, P. (2012). A framework for push-grasping in clutter (pp. 65–72).
6. Eppner, C., & Brock, O. (2017). Visual detection of opportunities to exploit contact in grasping using contextual multi-armed bandits. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 273–278).
7. Fox, D., Burgard, W., Dellaert, F., & Thrun, S. (1999). Monte Carlo localization: Efficient position estimation for mobile robots. AAAI/IAAI (pp. 343–349), 2.
8. Fox, D., Burgard, W., & Thrun, S. (1997). The dynamic window approach to collision avoidance. IEEE Robotics and Automation Magazine, 4(1), 23–33.
9. Fragapane, G., de Koster, R., Sgarbossa, F., & Strandhagen, J. O. (2021). Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda. European Journal of Operational Research, 294(2), 405–426.
10. Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107. https://doi.org/10.1109/tssc.1968.300136
11. Ho, Y.-C., & Liu, H.-C. (2006). A simulation study on the performance of pickup-dispatching rules for multiple-load AGVs. Computers and Industrial Engineering, 51(3), 445–463. Special Issue on Selected Papers from the 34th International Conference on Computers and Industrial Engineering (ICC&IE). https://www.sciencedirect.com/science/article/pii/S0360835206001069
12. Kalempa, V. C., Piardi, L., Limeira, M., & de Oliveira, A. S. (2021). Multi-robot preemptive task scheduling with fault recovery: A novel approach to automatic logistics of smart factories. Sensors, 21(19). https://www.mdpi.com/1424-8220/21/19/6536
13. Korein, M., & Veloso, M. (2018). Multi-armed bandit algorithms for a mobile service robot's spare time in a structured environment. In D. Lee, A. Steen, & T. Walsh (Eds.), GCAI-2018. 4th Global Conference on Artificial Intelligence. EPiC Series in Computing, EasyChair (Vol. 55, pp. 121–133). https://easychair.org/publications/paper/cLdH
14. Kousi, N., Koukas, S., Michalos, G., & Makris, S. (2019). Scheduling of smart intra-factory material supply operations using mobile robots. International Journal of Production Research, 57(3), 801–814. https://doi.org/10.1080/00207543.2018.1483587
15. Koval, M. C., King, J. E., Pollard, N. S., & Srinivasa, S. S. (2015). Robust trajectory selection for rearrangement planning as a multi-armed bandit problem. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2678–2685).
16. Krishnasamy, S., Sen, R., Johari, R., & Shakkottai, S. (2021). Learning unknown service rates in queues: A multiarmed bandit approach. Operations Research, 69(1), 315–330. https://doi.org/10.1287/opre.2020.1995
17. Kuffner, J., & LaValle, S. (2000). RRT-Connect: An efficient approach to single-query path planning. In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065) (Vol. 2, pp. 995–1001).
18. LaValle, S. M., et al. (1998). Rapidly-exploring random trees: A new tool for path planning.
19. Ozgul, E. B., Liemhetcharat, S., & Low, K. H. (2014). Multi-agent ad hoc team partitioning by observing and modeling single-agent performance. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific (pp. 1–7).
20. Pini, G., Brutschy, A., Francesca, G., Dorigo, M., & Birattari, M. (2012). Multi-armed bandit formulation of the task partitioning problem in swarm robotics. In International Conference on Swarm Intelligence (pp. 109–120). Springer.
21. Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A. Y., et al. (2009). ROS: An open-source robot operating system. In ICRA Workshop on Open Source Software (Vol. 3, p. 5). Kobe, Japan.
22. Slivkins, A. (2019). Introduction to multi-armed bandits. CoRR. arXiv:1904.07272
23. Stavridis, S., & Doulgeri, Z. (2018). Bimanual assembly of two parts with relative motion generation and task related optimization. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 7131–7136). IEEE Press. https://doi.org/10.1109/IROS.2018.8593928
24. Szczepanski, R., Erwinski, K., Tejer, M., Bereit, A., & Tarczewski, T. (2022). Optimal scheduling for palletizing task using robotic arm and artificial bee colony algorithm. Engineering Applications of Artificial Intelligence, 113, 104976. https://www.sciencedirect.com/science/article/pii/S0952197622001774
25. Tang, F., & Parker, L. E. (2005). Distributed multi-robot coalitions through ASyMTRe-D. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Alberta, Canada, August 2–6, 2005 (pp. 2606–2613). IEEE. https://doi.org/10.1109/IROS.2005.1545216
26. Viguria, A., Maza, I., & Ollero, A. (2008). S+T: An algorithm for distributed multirobot task allocation based on services for improving robot cooperation. In 2008 IEEE International Conference on Robotics and Automation (pp. 3163–3168).
27. Wang, H., Chen, W., & Wang, J. (2020). Coupled task scheduling for heterogeneous multi-robot system of two robot types performing complex-schedule order fulfillment tasks. Robotics and Autonomous Systems, 131, 103560. https://www.sciencedirect.com/science/article/pii/S0921889020304000
28. Wang, J., Gu, Y., & Li, X. (2012). Multi-robot task allocation based on ant colony algorithm. Journal of Computers, 7.


29. Zhang, Y., & Parker, L. E. (2013). Multi-robot task scheduling. In 2013 IEEE International Conference on Robotics and Automation (pp. 2992–2998).
30. Zlot, R., & Stentz, A. (2006). Market-based multirobot coordination for complex tasks. The International Journal of Robotics Research, 25(1), 73–101. https://doi.org/10.1177/0278364906061160

Machine Learning and Deep Learning Approaches for Robotics Applications Lina E. Alatabani , Elmustafa Sayed Ali , and Rashid A. Saeed

Abstract Robotics plays a significant part in raising the standard of living, with a variety of useful applications in several service sectors such as transportation, manufacturing, and healthcare. Continuous improvement is required to make these services effective and efficient and to have robots obey the directions supplied to them by their programs. Intensive research has focused on improving these services, which has led to the use of sub-fields of artificial intelligence, represented by ML and DL, whose state-of-the-art algorithms and architectures have brought positive improvements to the field of robotics. Recent studies apply various ML/DL algorithms to robotic system architectures to offer solutions for different issues related to robotics autonomy and decision making. This chapter provides a thorough review of autonomous and automatic robotics along with their uses. Additionally, the chapter discusses machine learning techniques for robotics in depth. Finally, it discusses the issues and the future of artificial intelligence applications in robotics.

Keywords Robotics applications · Machine learning · Deep learning · Visioning applications · Assistive technology · Imitation learning · Soft robotics

L. E. Alatabani
Faculty of Telecommunications, Department of Data Communications and Network Engineering, Future University, Khartoum, Sudan
E. S. Ali (B)
Faculty of Engineering, Department of Electrical and Electronics Engineering, Red Sea University (RSU), Port Sudan, Sudan
e-mail: [email protected]
E. S. Ali · R. A. Saeed
Department of Electronics Engineering, College of Engineering, Sudan University of Science and Technology (SUST), Khartoum, Sudan
R. A. Saeed
Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_10


1 Introduction

Robotics has recently emerged as one of the most significant and pervasive technologies. Artificial intelligence has played a significant role in the development of advanced robots, making them more coherent and responsive [1]. Machine learning (ML) and deep learning (DL) approaches have helped create enhanced and intelligent control capabilities as well as smart solutions to a variety of problems affecting robotics applications [2]. Artificial intelligence techniques have recently been used to create a variety of robots, giving them the capacity to increase correlation, human traits, and productivity in addition to enhanced humanistic cognitive capacities [3]. Robots can learn using precise ML techniques to increase their precision and their understanding of spatial relations, grasp objects, control movement, and perform other tasks that let them comprehend and respond to unobserved data and circumstances. Recently, robotic process automation and the capacity to interact with the environment have been incorporated into the mechanisms of DL [4], which also enables robots to perform various tasks by understanding physical and logistical data patterns and acting accordingly. Due to the difficulty of translating and analyzing natural events, event control is one of the hardest activities to design through code when creating robots, especially when there is a wide variety of actions that the robot performs in reality [5]. Therefore, algorithms that can capture expert human knowledge of the robot as structured parameters and improve control techniques are needed when constructing robots [6]. For these reasons, ongoing modifications to the robot's programming are required, because the world around it is constantly changing and because a composite analytical model is required to create application solutions [7]. One strategy that can address these challenges in a comprehensive and unique manner is the use of ML and DL architectures. The rationale for utilizing ML and DL in robotics is that they are more general, and deep networks are excellent for robots in unstructured environments since they are capable of high-level reasoning and conceptualization [8]. Given the importance of AI approaches in solving complex tasks in robotic automation, the contribution of this chapter is to provide a brief account of autonomous and automatic robots and the differences between them. The chapter also discusses the most important robotics applications and the solutions that AI approaches can provide, in addition to reviewing the concept of extreme learning machine methods for robotics. Moreover, it reviews different robot learning approaches such as multi-agent, self-supervised, and imitation learning. The remaining sections of this chapter are arranged as follows. In Sect. 2, the differences between autonomous robots and automatic robots are reviewed. Section 3 reviews different robotics applications with respect to AI solution approaches. Extreme learning machine methods for robotics are presented in Sect. 4. Machine learning for soft robotics and machine learning based robotics applications are presented in Sects. 5 and 6, respectively. Section 7 provides challenges and open issues in robotics applications, and the chapter is concluded in Sect. 8.


2 Autonomous Versus Automatic Robots

Autonomous robots were defined by the Massachusetts Institute of Technology (MIT) as intelligent machines that can execute tasks in the world by themselves, without explicit human control [9]. Modern robots are expected to execute complex commands to perform tasks in sensitive fields; therefore, machine learning and deep learning approaches were introduced to improve the accuracy of command execution in autonomous robots. The use of these approaches improved the decision-making capabilities of robots, adding self-learning abilities to autonomous robotic systems [10]. Using neural networks (NN) as an ML approach adds more time to the learning process because of the size of the network required by the complexity of the commands to be learnt. As a result, the convolutional neural network (CNN) was introduced; CNNs offer several ways of decreasing learning time: (a) tensor decompositions of the weights and activations of the convolutional layers, (b) quantization of weights, (c) thinning weights to decrease network size, (d) flexible network architectures, and (e) partitioning the training so that a large network trains a small network [6, 11]. The complex control tasks in robotic systems require innovative planning strategies to let robots interact like humans when performing tasks. Some classical planning mechanisms, such as fast-forward planning, were used but are limited in the span of scenarios they can handle; for example, if some objects in the environment are not known, the robot cannot execute the actions. Thus, knowledge-based planning was produced, aimed at making robots complete complex control tasks; the mechanism was successful but limited in the computation of long series of actions [12]. To overcome this limitation, another approach was introduced, knowledge-based reasoning, which is used to accomplish complex control tasks with motion planning: along with a perception module, a semantic description is provided for the robot to follow [13]. Knowledge representation is accomplished through numerous approaches, such as ontologies, aimed at enabling the use of structured concepts and relations in the reasoning tasks used by robots. The authors in [9] define an ontology as a formal, explicit specification of a shared conceptualization, where conceptualization refers to the abstraction or concept of the entities in a certain domain [14]. The abstractions are obtained by determining the related concepts and relations. In robotics, some knowledge-oriented approaches use models to describe actions for the control domain, while others use terminologies and inference in a given domain [7, 10]. Although these approaches have contributed to the performance of autonomous robots, some limitations remain: they do not adopt a generic framework and are largely task specific. Thus, the range of heterogeneous collaborative robotic tasks can be increased by other methodologies, for example the Core Ontology for Robotics and Automation (CORA), which enables the addition of more value to autonomous robotic systems and knowledge development [11]. The task and motion planning approach, which combines these domains of autonomous robotics, is illustrated in Fig. 1.


Fig. 1 Task and motion planning (TAMP) for robotics

Automatic robots is a concept that defines expert systems in which a software robot imitates human responses or actions. The automation of human processes is accomplished through the application of robotic process automation (RPA). RPA comprises a set of tools that operate on a computer system's interface and enable a robot to act like a human [12, 15]. RPA has a variety of applications in today's industries, such as agriculture, power plants, and manufacturing; this technology targets the automation of simple, repetitive, and traditional work steps [13, 14]. Figure 2 represents the potential of process automation in relation to cognitive and routine tasks.

Fig. 2 Robotics process automation based on tasks characteristics


Fig. 3 Intelligent robotic process automation (RPA) framework with AI approach

Artificial Intelligence (AI) gives the traditional concept of RPA more sense in multiple areas, as AI has a range of capabilities that can add value to bots in two major areas: (a) capturing information and (b) understanding the captured information. In capturing information, the aim is speech recognition, image recognition, search, and data analysis/clustering. In understanding the captured information, the aim is natural language understanding, i.e. acting as a translator between humans and machines, optimization, and prediction. An intelligent framework to be used with RPA includes classifying tasks according to their characteristics and fitting them with the AI capabilities in order to select the task that is most suitable to be automated [15]. The potential framework is illustrated in Fig. 3.

3 Robotics Applications

Robotics has made a significant contribution to the advancement of both current and next-generation technology by incorporating features that increase the efficiency of numerous fields and by creatively integrating machine learning (ML) into a variety of applications, some of which are discussed in the following subsections.


3.1 Computer Vision

Complex digital image problems such as image segmentation, detection, colorization, and classification frequently involve machine learning. DL methods such as the Convolutional Neural Network (CNN) have improved prediction accuracy using a variety of resources. Deep learning is a branch of machine learning developed in accordance with Artificial Neural Networks (ANN), which simulate how the human brain works [16]. The comparison between traditional and deep learning computer vision is shown in Fig. 4. In a CNN, multiple layers are used to build the network; the data passes through several pre-processing steps, such as subtraction and normalization, before being processed by the CNN [17].


Fig. 4 Traditional computer vision versus DL workflows


A. Convolutional layer
The network's convolutional layer is made up of several sets of filters (kernels) that take a certain input and output a feature map. A filter is a multi-dimensional grid of discrete numbers; these numbers are the weights of the filter, which are learnt during the training phase of the network. CNNs use sub-sampling to reduce the spatial dimensions, producing a smaller feature map at the output. This provides mild invariance to the scale and pose of objects, which is useful for applications such as object recognition in image processing. Zero-padding is introduced when an image's spatial size needs to be kept constant or enlarged after convolution, for instance for noise reduction, super-resolution, or segmentation, because these operations require dense per-pixel predictions; it also gives more room to design deeper networks. Zero-padding enlarges the output feature map obtained with multi-dimensional filters by adding zeros around the borders of the input feature map [18].

B. Pooling layers
This layer defines parts or blocks of the input feature map and then aggregates the feature activations within them. The aggregation is performed by a pooling function, such as the max or average function, and the size of the pooling region must be specified. If we consider a pooled region of size f × f applied with stride s, the size of the output feature map is

h' = \left\lfloor \frac{h - f + s}{s} \right\rfloor, \quad w' = \left\lfloor \frac{w - f + s}{s} \right\rfloor    (1)

where h is the height, w is the width, and s is the stride. By aggregating activations, the pooling operation efficiently down-sizes the input feature map into a compressed feature representation [19].

C. Fully Connected Layers
This layer is usually placed at the end of the network, although some research has shown it can also be placed in the middle of the network. It corresponds to layers with a filter size of 1 × 1 in which each unit is densely connected to all units of the previous layer. The layer's output is computed by a straightforward matrix multiplication, the addition of a bias vector, and an element-wise nonlinear function:

y = f(W^{T} x + b)    (2)

where x and y denote the input and output, b represents the bias vector, and W holds the weights of the connections between the units [20].

D. Region of Interest (ROI)
Used in object detection, this layer is an important element of a CNN. By creating a bounding box and labeling each object with a specific object class, this method


makes it possible to precisely pinpoint every object in a picture. Objects may be located in any area of an image and have a variety of different attributes. To determine the approximate location of an object, the ROI pooling layer simply takes the input feature map of an image and the coordinates of each region as its inputs. However, the varied spatial sizes mean that each ROI has different dimensions. Because the rest of the CNN only operates on a fixed-dimensional input, the ROI pooling layer converts these variable-sized features into output feature maps of a predetermined size. ROI pooling has been shown to improve the performance of a deep network by using a single set of input feature maps to create a feature representation for each region [21].
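As an illustration of how the layers above fit together, the following minimal PyTorch sketch stacks a convolutional layer, a max-pooling layer, and a fully connected layer. The channel sizes, kernel sizes, and the 32 × 32 input are arbitrary choices rather than values from the text, and the printed shape can be checked against the pooling-size relation in Eq. (1).

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: convolution -> ReLU -> max pooling -> fully connected."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=8,
                              kernel_size=3, padding=1)    # zero-padding keeps 32x32
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # Eq. (1): 32 -> 16
        self.fc = nn.Linear(8 * 16 * 16, num_classes)      # y = f(W^T x + b), Eq. (2)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = self.pool(x)
        x = x.flatten(start_dim=1)       # flatten feature maps for the dense layer
        return self.fc(x)

if __name__ == "__main__":
    out = TinyCNN()(torch.randn(1, 3, 32, 32))
    print(out.shape)                     # torch.Size([1, 10])
```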

3.2 Learning Through Imitation

The idea of a robot was first proposed in the early 1960s, when a mobile industrial robot with two fingers was designed to transport goods and execute specified duties; since then, much research has been done to enhance gripping and directional processes. Learning through demonstration, also known as imitation learning, has been shown to enable the performance of difficult maneuvering tasks by recognizing and copying human motion without the use of sophisticated behavior algorithms [22, 27]. Deep Reinforcement Learning (DRL) is extremely valuable in the imitation learning field because it can create policies on its own, which is not possible with traditional imitation learning techniques that require prior knowledge of the full model of the learning system. Robots can learn actions directly from images thanks to DRL, which combines decision-making and insight abilities. The Markov Decision Process (MDP), which is the basis of reinforcement learning, produces the expected sum of rewards as the output of the action-state value function:

Q^{\pi}(s, a) = E_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_{t} \,\middle|\, s_{t} = s, a_{t} = a \right]    (3)

where Q^{\pi}(s, a) represents the action-state value, E_{\pi} is the expected outcome under the motion strategy π, r_t is the reward value, and γ denotes the discount factor [23, 24]. Robots learn motions and maneuvers by watching an expert's demonstration through the process known as imitation learning, which is the concept of exactly mimicking the instructor's behavior or action. In order to optimize the learning process, the robot also learns to correlate the observed motion with the performance [27, 28]. Instead of having to learn the entire process from scratch, training data can be obtained from available motion samples, which has a significant positive impact on learning efficiency. Combining many reinforcement


Fig. 5 Imitation learning classification

learning approaches can increase the speed and accuracy of imitation learning [25]. The three major types into which imitation learning is divided are presented in Fig. 5. After the policy is acquired, learning proceeds in the behavior reproduction process so that the distribution of the state-action paths generated by the agent matches the demonstrated paths. A robotic arm or other platform would typically only be able to repeat a specific movement after receiving manual instruction or a teaching pack, and could not adjust to an unfamiliar change in the environment. The availability of data-driven machine learning techniques allows the robot to recognize superior maneuvering policies and adapt to environmental changes. In inverse reinforcement learning, a reward function is introduced to test whether an action is performed as it should be; this approach outperforms traditional behavior cloning in its adaptation to different environments and is regarded as an efficient form of imitation learning. Generative adversarial imitation learning is satisfied by ensuring that the generated strategy is aligned with the expert strategy [26]. The imitation learning framework contains trajectory learning and force learning. In the trajectory learning part, an existing trajectory profile for a task is taken as input and used to create the nominal trajectory of the next task. The force learning part of the framework uses a Reinforcement Learning agent together with an equivalent controller to learn both the position commands and the parameters of the controller [27]. Figure 6 illustrates an imitation learning framework in which dynamic movement primitives (DMPs) are updated using a modular learning strategy.


Fig. 6 Adaptive robotic imitation framework
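To make the behavior cloning branch of Fig. 5 concrete, the sketch below trains a small policy network on expert state-action pairs by plain supervised regression. The state/action dimensions and the randomly generated "expert" data are placeholders, not part of the chapter's framework.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2          # hypothetical robot state/action sizes

# Expert demonstrations: random stand-ins for recorded (state, action) pairs.
expert_states = torch.randn(1000, STATE_DIM)
expert_actions = torch.randn(1000, ACTION_DIM)

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACTION_DIM))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                # imitate the expert by minimizing action error

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(policy(expert_states), expert_actions)
    loss.backward()
    optimizer.step()
# After training, policy(state) predicts an expert-like action for a new state.
```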

3.3 Self-supervised Learning

Self-supervised learning has been used to improve several characteristics of robotic applications, such as robot navigation, movement, and vision. Since they rely on positional information for efficient movement and task fulfilment, almost all robots navigate by evaluating input from sensors. Robots use motion-capture and GPS systems, as well as other external sources, to determine their positions; they can also use on-board sensors that are currently popular, such as 3D LiDARs that record varying distances [29]. Self-supervised learning, in which the robot does not require supervision and the target does not require labeling, is a successful learning technique; it is therefore most suitable when the data being investigated is unlabeled [30]. A variety of machine learning approaches has been used in the visual localization area to enhance the effectiveness of vision-based manipulation. Researchers have established that self-supervised learning techniques for feature extraction based on image generation and translation enhance the performance of robotic systems. Feature extraction using Domain-Invariant Super Point (DISP) is accomplished through two major tasks: key point detection and description. A function detects key points on a certain image via map calculation; the aim of this process is to compare and match


key points found in other images using a similarity matrix; the output is a fixed-length vector describing each pixel of the image. With the goal of obtaining a domain-invariant feature extraction function, image domain adaptation and self-supervised training are combined. The image is parsed into subdomains, and the network is trained from a specified function to locate associated key points. Instead of using domain translation for the feature detector and descriptor, optimization is employed to reduce the matching loss, which aids in extracting a better feature extraction function. This gives the network the ability to filter essential points and match them under varying circumstances. When applied within deep convolutional layers, this cross-domain idea expands the scope of the learning process by using scenes and objects rather than a single visual space. When particular conditions are being examined, feature extraction has stronger abilities when image-to-image translation is used [31].
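The key point matching step described above can be pictured with the small NumPy sketch below, which builds a cosine-similarity matrix between two sets of fixed-length descriptors and pairs each key point with its nearest neighbour; the descriptor length and the mutual-consistency check are illustrative choices, not details of DISP itself.

```python
import numpy as np

def match_keypoints(desc_a: np.ndarray, desc_b: np.ndarray):
    """Pair each descriptor in desc_a with its most similar descriptor in desc_b.

    desc_a: (Na, D) and desc_b: (Nb, D) fixed-length descriptor vectors.
    Returns a list of (i, j) index pairs that are mutual nearest neighbours.
    """
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                      # cosine similarity matrix
    best_ab = sim.argmax(axis=1)       # best match in B for each A
    best_ba = sim.argmax(axis=0)       # best match in A for each B
    # keep only mutually consistent matches
    return [(i, j) for i, j in enumerate(best_ab) if best_ba[j] == i]

matches = match_keypoints(np.random.rand(50, 128), np.random.rand(60, 128))
```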

3.4 Assistive and Medical Technologies

By definition, assistive technology is a class of applications designed to help the elderly and people with long-term disabilities overcome a lack or decline of capability. The demand for assistive technology has increased quickly, particularly in a post-COVID-19 world, which has necessitated improved performance. Recent advancements in machine learning methods have eased the development and improvement of autonomous robots, giving them the ability to adapt, respond, and interact with the surrounding environment. This advancement has enhanced human-machine collaboration when it comes to building companion robots, exoskeletons, and autonomous vehicles. Companion robots are designed to perform duties that improve the patient's quality of life by keeping an eye on their mental and physical well-being, communicating with them, and providing them with entertainment [32]. The use of care robots primarily focuses on human-robot interaction, which necessitates careful consideration of the user's background; these users, who typically work in the medical profession, lack the technical know-how to interact with robots. The development of assistive technology also places a significant emphasis on the nursing staff and the patients' families, because they are regarded as secondary users of the system [33]. Incorporating the Part Affinity Field (PAF) approach for human body pose prediction after thoroughly analyzing human photos has a positive effect on the performance of assistive technology when a convolutional neural network is used as a sub-system of deep learning methodologies. High efficiency and accuracy are two features of the PAF approach, which can successfully identify a human body's 2D posture using image analysis [34].


3.5 Multi-agent Learning

The use of learning-based methods for multi-robot planning has yielded promising results due to their ability to manage a multi-dimensional environment with a state-space representation. Difficult robotics problems, such as teaching numerous robots or multi-agents to perform a task simultaneously, have been resolved by applying reinforcement learning [35, 36]. Multi-agent systems must create strategies to overcome problems such as energy consumption and computational complexity, for example by moving the computation to the cloud. The performance of multi-agent systems has substantially improved as a result of integrating multi-robot systems with edge and cloud computing, adding value and enhancing user experience. Depending on the application, especially when deploying robots in the medical industry, robotic applications have increased the demand for faster processing. Resource allocation makes it simple to handle this issue: resources are allocated to jobs based on their availability so as to meet the Quality-of-Service requirements, which vary from one robot to another according to its application. Robotics applications require a variety of latency-sensitive, data-intensive, and computational operations, and these tasks need a variety of resources, including computing, network, and storage resources [37]. In Multi-Agent Reinforcement Learning (MARL), the Markov decision process is applied to a Markov model in which a group of N agents is considered together with a state space S, a joint action space A, and a reward function R. At each time-step, the reward function computes N rewards, one for each agent. T is the transition function that gives the likelihood that a state will be reached after taking a joint action a. At every time-step, an observation function O is sampled for each agent from that agent's observation space Z. Multi-agent systems can be either heterogeneous or homogeneous, i.e. the agents either have distinct action spaces or share the same action space, respectively. MARL system configurations vary depending on their reward functions, which can be either cooperative or competitive, in addition to their learning setups, which directly impact the type of policies learnt [38]. A minimal interface for such an environment is sketched below.
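The sketch assumes a generic toy environment; the class name, the two-agent random dynamics, and the shared cooperative reward are placeholders meant only to show the per-agent observation and reward bookkeeping described above.

```python
import numpy as np

class TwoAgentEnv:
    """Toy MARL environment: N agents, joint action in, per-agent rewards out."""
    def __init__(self, n_agents: int = 2, obs_dim: int = 4):
        self.n_agents = n_agents
        self.obs_dim = obs_dim
        self.state = np.zeros(obs_dim)

    def reset(self):
        self.state = np.random.rand(self.obs_dim)
        return [self._observe(i) for i in range(self.n_agents)]

    def step(self, joint_action):
        # transition T: toy dynamics driven by the joint action
        self.state = self.state + 0.1 * np.mean(joint_action)
        observations = [self._observe(i) for i in range(self.n_agents)]
        rewards = [-np.abs(self.state).sum()] * self.n_agents   # one reward per agent
        return observations, rewards

    def _observe(self, agent_id: int):
        # observation function O: each agent sees a noisy copy of the state
        return self.state + 0.01 * np.random.randn(self.obs_dim)

env = TwoAgentEnv()
obs = env.reset()
obs, rewards = env.step([0.5, -0.2])
```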

4 Extreme Learning Machines Methods for Robotics

The extreme learning machine (ELM) was produced to overcome the restrictions of gradient-based learning algorithms in terms of classification productivity; ELM convergence is faster because there are no iterations in the learning process. Due to its superior learning speed and generalization capability, ELM has been deployed in a wide range of learning problems such as clustering, classification, regression, and feature mapping. ELM has since evolved to further improve its performance by elevating the


classification accuracy, reducing the number of manual interventions, and shortening the training time. ELM uses randomly chosen parameters and freezes them during the training process; these random parameters are kept in the ELM hidden layer. To make the training process more efficient, the input vector is mapped into a random feature space with random configurations and nonlinear activation functions. In the linear-parameter solution step, β_i is acquired by the Moore–Penrose inverse, since Hβ = T is a linear problem [39, 41]. The Extreme Learning Machine and a Colour Feature variant were combined to create a single-layer feed-forward neural network (CF-ELM), which includes a fully connected output layer and three tiers of hidden layers. During the training phase, the product of the random weight values and the inputs is recorded in the hidden-layer output matrix H. An activation function g(x) is used to process the input signal and convert it to an output; the CF-ELM uses the soft-sign activation function, represented by

g(x) = \frac{x}{1 + |x|}    (4)

The H matrix can be seen as follows:

H(W_1, \ldots, W_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, Y_1, \ldots, Y_N, U_1, \ldots, U_N, V_1, \ldots, V_N) =
\begin{bmatrix}
g(W_1 \cdot Y_1 + b_1) & \cdots & g(W_{\tilde{N}} \cdot Y_1 + b_{\tilde{N}}) \\
\vdots & & \vdots \\
g(W_1 \cdot Y_N + b_1) & \cdots & g(W_{\tilde{N}} \cdot Y_N + b_{\tilde{N}}) \\
g(W_1 \cdot U_1 + b_1) & \cdots & g(W_{\tilde{N}} \cdot U_1 + b_{\tilde{N}}) \\
\vdots & & \vdots \\
g(W_1 \cdot U_N + b_1) & \cdots & g(W_{\tilde{N}} \cdot U_N + b_{\tilde{N}}) \\
g(W_1 \cdot V_1 + b_1) & \cdots & g(W_{\tilde{N}} \cdot V_1 + b_{\tilde{N}}) \\
\vdots & & \vdots \\
g(W_1 \cdot V_N + b_1) & \cdots & g(W_{\tilde{N}} \cdot V_N + b_{\tilde{N}})
\end{bmatrix}_{3N \times \tilde{N}}    (5)

where N represents the number of training samples, \tilde{N} denotes the number of neurons in the hidden layer, W represents the input weights, b represents the biases, and Y, U and V are the colour input samples for each pixel. The difference between CF-ELM and ELM is that ELM uses grey-scale images. The output of the hidden layer becomes the input multiplier for the output weights β, and T represents the output target, which equals β·H:

T = \beta \cdot H    (6)

The matrix β can be expressed as


\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_{\tilde{N}}^{T} \end{bmatrix}_{\tilde{N} \times m}    (7)

where m is the total number of neurons in the output layer. The target output matrix T can be represented as

T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m}    (8)

A vector of ones (1 s) typically represents the value of each t, stored according to the input training sample. The value of β can be obtained by making it the subject:

\beta = H^{-1} \cdot T    (9)

where H^{-1} represents the Moore–Penrose pseudo-inverse of the matrix H. The classification function is formed by adding up the feed-forward dot product between the input picture and the weights in the hidden layer and the dot product with the output layer. The hidden layer needs to be broken up into three sub-layers in order to reflect the colour properties; therefore, the number of neurons in the hidden layer must be divisible by 3. To obtain the output y_i, the m classes are processed using the classifier function

y_i = \sum_{j=1}^{\tilde{N}/3} \Big( \beta_{ij}\,(W_j \cdot Y + b_j) + \beta_{i,j+\tilde{N}}\,(W_{j+\tilde{N}} \cdot U + b_{j+\tilde{N}}) + \beta_{i,j+2\tilde{N}}\,(W_{j+2\tilde{N}} \cdot V + b_{j+2\tilde{N}}) \Big)    (10)

where W_j represents the weight vector for the jth neuron [40]. Another joint approach, based on convolutional neural networks and ELM, has been proposed to harness the strength of CNNs while avoiding the gradient calculations used for updating network weights; Convolutional Extreme Learning Machines (CELM) are fast-training CNNs in which feature extraction is performed by effectively defined filters [41]. A Level-based Learning Swarm Optimizer (LLSO) based ELM approach has also been introduced to address the main limitation of the ordinary ELM, namely its reduced generalization performance. The concept is to use LLSO to search for suitable parameters for the ELM; the optimization problem in ELM is large scale because of the fully


connected input and hidden layers. The essential considerations in the optimization of ELM parameters are fitness and particle encoding. The input layer's weight vectors and the hidden neurons' bias vector make up the LLSO-ELM particles. A particle P can be represented numerically as

P = [w_{11}, w_{12}, \ldots, w_{1L}, \ldots, w_{n1}, \ldots, w_{nL}, b_1, \ldots, b_L]    (11)

where n is the dimension of the input dataset and L is the number of hidden neurons. The particle's length is then

LenOfParticle = (n + 1) \times L    (12)

Fitness is used to assess the quality of the particles, with smaller values indicating better classification. The fitness value can therefore be determined as

Fitness = 1 - Acc    (13)

where Acc is the ratio of the number of correctly classified samples to the total number of samples obtained by the ELM algorithm after the LLSO has searched for and selected the optimal parameters [42].
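A minimal NumPy sketch of the basic ELM training step described above (random, frozen hidden-layer parameters and output weights obtained with the Moore–Penrose pseudo-inverse, following the Hβ = T formulation and Eq. (9)) is shown below. It uses the soft-sign activation of Eq. (4); the data are random placeholders rather than the CF-ELM colour channels.

```python
import numpy as np

def softsign(x):
    return x / (1.0 + np.abs(x))          # Eq. (4)

def train_elm(X, T, n_hidden: int, rng=np.random.default_rng(0)):
    """Single-hidden-layer ELM: random frozen W, b; beta from the pseudo-inverse."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))   # random input weights (frozen)
    b = rng.standard_normal(n_hidden)                 # random biases (frozen)
    H = softsign(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                      # Moore-Penrose solution, Eq. (9)
    return W, b, beta

def predict(X, W, b, beta):
    return softsign(X @ W + b) @ beta

# toy usage: 100 samples, 20 features, 3 one-hot classes
X = np.random.rand(100, 20)
T = np.eye(3)[np.random.randint(0, 3, 100)]
W, b, beta = train_elm(X, T, n_hidden=50)
pred = predict(X, W, b, beta).argmax(axis=1)
```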

5 Machine Learning for Soft Robotics

The general concept of robotics has long been associated with precision and rigidity. In recent years, however, an improved technology was introduced with modern concepts such as flexibility and adaptability: soft robotics presents abilities that are not available in stiff robot structures [43]. ML can extract features from large, multi-dimensional data sets, and when coupled with the application of soft sensors to robots, the performance of soft robotics has increased significantly. Using soft sensors, the sensitivity and adaptability of robots may be improved; however, due to the non-linearity and high hysteresis of soft materials, soft sensors are constrained in their capabilities and are difficult to calibrate. Learning approaches can overcome the limitations of implementing soft sensors: ML enables accurate calibration and characterization by accounting for their non-linearity and hysteresis [44]. Robotics tasks can be divided into two parts, perception and control. Perception tasks aim at collecting information about the environment via sensors to extract the target properties, while in the control part the agent interacts with the environment for the purpose of learning a certain behavior depending on the received reward. Control in soft robotics, however, is much more complicated and needs more effort for the following reasons:


• Data distribution: perception observations are identically distributed, while in control the data are collected at one place and stored online.
• Supervision signal: perception tasks are fully supervised, while in control only selected rewards are available.
• Data collection: data are collected offline for perception and online for control.

Control tasks necessitate a large amount of training data, expensive interaction between the soft robot and its environment, a sizable action space, and a robot whose structure is constantly changing due to its constant bending and twisting. These challenges have drawn attention to the development of models that can easily adapt and evolve, learn from previous experiences, and also handle large datasets. DL techniques are known to perform well in control-type tasks: DL can learn from previous experience, treating it as a factor in the learning algorithm, and can process large datasets, and DL algorithms have excelled over other approaches in accuracy and precision. The continuous growth and evolution of soft robotics and its applications has raised the need for smarter control systems that can perform tasks involving objects of different shapes, adapt to different environments, and perform a considerable number of tasks involving soft robots [45]. Due to the wide action and state space that soft robot applications present, policy optimization poses a significant difficulty. By combining reinforcement learning and neural networks, soft robotics applications have performed better overall, and a wide variety of Deep Reinforcement Learning (DRL) techniques have been introduced to address control-related issues.

A. Deep Q-Networks (DQN)
A DQN is a deep neural network based on a CNN whose aim is to obtain the value of the Q-function Q*(s, a), with the network weights denoted by θ^Q. The error and target can be represented mathematically by the following equations:

\delta_t^{DQN} = y_t^{DQN} - Q(s_t, a_t; \theta_t^{Q})    (14)

y_t^{DQN} = R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a; \theta_t^{-})    (15)

The weights are iteratively updated by

\theta_{t+1} \leftarrow \theta_t - \alpha \, \frac{\partial \big(\delta_t^{DQN}\big)^{2}}{\partial \theta_t^{Q}}    (16)
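The following PyTorch-style sketch shows how the TD error and target of Eqs. (14)-(16) are typically computed for a mini-batch drawn from replay memory; the network sizes, variable names, and the random mini-batch are illustrative assumptions, and the actor/replay loop is omitted.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 3, 0.99   # illustrative sizes

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())        # target network copies Q-network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states):
    """One gradient step on the squared TD error of Eqs. (14)-(16)."""
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s_t, a_t; θ^Q)
    with torch.no_grad():                                             # Eq. (15): target y_t
        y = rewards + GAMMA * target_net(next_states).max(dim=1).values
    loss = ((y - q_sa) ** 2).mean()                                   # squared δ_t, Eq. (14)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                                  # update of Eq. (16)

# one illustrative mini-batch, as if sampled from a replay memory
dqn_update(torch.randn(32, STATE_DIM), torch.randint(0, N_ACTIONS, (32,)),
           torch.randn(32), torch.randn(32, STATE_DIM))
```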

Two learning strategies are used by DQN, including the target network, which uses the same framework as the Q-network and updates the weights on the Q-networks while iteratively copying them to the weights of the target network. Additionally,


the experience replay shows how the network's input is gathered as state-action pairs, together with their rewards, and saved in a replay memory before being retrieved as samples to serve as mini-batches for learning. Gradient descent is then used to reduce the loss between the learnt Q-network and the target Q-network.

B. Deep Deterministic Policy Gradients (DDPG)
DDPG is an actor-critic method aimed at modeling problems with a high-dimensional action space. DDPG is represented mathematically by the stochastic and deterministic policies in Eqs. (17) and (18):

Q^{\pi}(s_t, a_t) = E_{R_{t+1}, s_{t+1} \sim E}\big[ R_{t+1} + \gamma \, E_{a_{t+1} \sim \pi}[ Q^{\pi}(s_{t+1}, a_{t+1}) ] \big]    (17)

Q^{\mu}(s_t, a_t) = E_{R_{t+1}, s_{t+1} \sim E}\big[ R_{t+1} + \gamma \, Q^{\mu}(s_{t+1}, \mu(s_{t+1})) \big]    (18)

This method was one of the first DRL algorithms applied to soft robots.

C. Normalized Advantage Function (NAF)
With the aid of deep learning, Q-learning is made possible in continuous, high-dimensional action spaces. This technique differs from the standard DQN in that the final layer of the neural network outputs V, μ and L, where μ and L are used to predict the advantage required for the learning strategy. To lessen correlations in the observation data gathered over time, NAF uses a target network and experience replay.

A(s, a; \theta^{\mu}, \theta^{L}) = -\tfrac{1}{2}\,\big(a - \mu(s; \theta^{\mu})\big)^{T} P(s; \theta^{L}) \big(a - \mu(s; \theta^{\mu})\big)    (19)

where

P(s; \theta^{L}) = L(s; \theta^{L})\, L(s; \theta^{L})^{T}    (20)

320

L. E. Alatabani et al.



π

A st , a t , θ , θ

V

T −1   



π k−t T −t V V = γ +γ V s T ; θ − V st ; θ ; θ

(21)

k=t

where: θ π , θ V are the network parameters, s is the state, a, is the action, and t is the learning step. Instead of updating the parameters sequentially, they are updated simultaneously, negating the need to learn stabilizing strategies like memory replay. The mechanism uses action-learners who aid in inspecting a larger perspective of the environment to aid in learning the best course of action. E. Trust Region Policy Optimization (TRPO) This algorithm was introduced to overcome the limitations which might occur using other algorithms, for optimizing large nonlinear policies that enhances the accuracy. Cost function is used in place of reward function to achieve this. Utilizing conjugate gradient followed by line search to solve optimization problems has been shown to improve performance [46]. The equation that follows illustrates it,  η(π ) = E π

∞ 

 γ c(st )|s0 ∼ ρ0 t

(22)

t=0

With the same concept the state-value function is replaced, represented by Eq. (23), Aπ = Q pi (s, a) − V π (s)

(23)

The result of the optimization function would be an updating rule for the policy given by Eq. (24),  η(π ) = η(πold ) + E π

∞ 

γ A t

t=0

πold

   (st , at )s0 ∼ ρ0 

(24)

6 ML-Based Robotics Applications By incorporating improved sensorimotor capabilities that can give a robot the ability to adapt to a changing environment, robotics technology is a field that is quickly developing. This is made possible by the integration of AI into robotics, which enables optimizing the level of autonomy through learning using ML techniques. The usefulness of adding intelligence to machines is determined by their capacity to predict the future through planning how to finish a job and interacting with the environment through successfully manipulating or navigating [47, 48].


6.1 Robotics Recommendation Systems Using ML

The development of modern life has made it clear that better decision-making processes are required for e-services that support client decision making. These systems combine personalized e-services with artificial-intelligence-based approaches and procedures to identify user profiles and preferences [49]. With the application of multiple ML methods, recommendation quality has been raised and user experience improved. Recommendation systems are primarily designed to help people who lack the knowledge or skills to deal with a multitude of alternatives, by estimating user preferences from data gathered from multiple sources. Three types of recommender system exist: knowledge-based, collaborative filtering-based, and content-based [50].

The aim of content-based recommender systems is to recommend products that are comparable to those that have previously caught the user's attention. Item attributes are collected from documents or pictures using retrieval techniques such as the vector space model [51]. As a result, information about the user's preferences, i.e. what they like and dislike, is included in the user profile.

Collaborative filtering (CF), the most popular technique in recommender systems, is predicated on the idea that consumers who are similar to one another will typically consume comparable products; a system based on user preferences therefore operates on information about users with comparable interests. CF techniques fall into two categories, memory-based and model-based. Memory-based CF is the more traditional type and uses heuristic algorithms to identify similarities between users and items; its fundamental method is nearest neighbor, which is simple, efficient, and precise, and it is further divided into user-based CF and item-based CF. Model-based CF was initially proposed as a remedy for the shortcomings of the memory-based approach, but its use has since spread to additional applications; it uses machine learning and data mining techniques to forecast user behavior.

Knowledge-based recommender systems are built on existing user knowledge gleaned from past behavior. A popular method known as case-based reasoning uses previous cases to address present challenges. Knowledge-based application fields include, but are not limited to, real estate sales, financial services, and health decision assistance; each of these application areas deals with a different issue and requires thorough knowledge of that issue.

Figure 7 displays robotics applications for machine learning systems. Figure 8 illustrates how the implementation of AI methods has improved the performance of many techniques in numerous fields, including knowledge engineering, reasoning, planning, communication, perception, and motion [52]. Simply defined, recommender systems use historical behavior to predict client needs [53].
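As an illustration of the memory-based, nearest-neighbor CF described above, the sketch below predicts a missing rating from the k most similar users of a toy rating matrix. The matrix and parameter values are invented for the example.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items, 0 = unrated).
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def predict_rating(user, item, k=2):
    """Memory-based, user-based CF: weight the k most similar users' ratings."""
    sims = np.array([cosine_similarity(ratings[user], ratings[u])
                     for u in range(len(ratings))])
    sims[user] = -1.0                      # exclude the user themselves
    neighbours = np.argsort(sims)[-k:]     # indices of the k nearest neighbours
    rated = [u for u in neighbours if ratings[u, item] > 0]
    if not rated:
        return 0.0
    weights = sims[rated]
    return float(weights @ ratings[rated, item] / weights.sum())

print(predict_rating(user=0, item=2))
```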


Fig. 7 Recommended machine learning system in robotics

The most basic quantitative performance measure is the root mean square error (RMSE), which is widely used to evaluate prediction accuracy. It is the square root of the mean squared error (MSE), calculated by dividing the sum of the squared differences between the actual and predicted scores by the total number of predicted scores. Additional qualitative evaluations are derived from a confusion matrix and include precision, recall, accuracy, F-measure, the ROC curve, and the area under the curve (AUC). By recording whether or not an item matching the user's preference was recommended, this matrix enables the evaluation of a recommender system. Each row in Table 1 represents whether an item is preferred by the user, and each column shows whether the recommender model has suggested it. Here TP is the number of recommended items that fit the user's preferences; FN is the number of user favorites that the recommendation system fails to suggest; FP is the number of recommended items that the user dislikes; and TN is the number of disliked items that the system correctly does not recommend. Table 2 lists the qualitative measures: accuracy, the proportion of recommendations that are correct; precision, the proportion of recommended items that match the user's preference; recall, the proportion of the user's preferred items that the recommender system actually recommends; the F-measure, the harmonic mean of precision and recall; and the ROC curve, a graph of the true positive rate (TPR) against the false positive rate (FPR) [54].
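A minimal sketch of the RMSE computation described above, assuming a small set of held-out actual ratings and the recommender's predictions (the numbers are illustrative):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared rating difference."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Example: five held-out ratings versus the recommender's predictions.
print(rmse([4, 3, 5, 2, 4], [3.8, 3.4, 4.6, 2.5, 4.1]))
```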

Fig. 8 Artificial intelligence techniques in robotics applications, mapping AI areas (knowledge engineering, reasoning, planning, communication, perception) to techniques such as transfer learning, active learning, deep neural networks, fuzzy technologies, evolutionary algorithms, reinforcement learning, natural language processing, and computer vision

Table 1 Confusion matrix in the qualitative evaluation matrix

Priority                  Recommended             Not recommended
User's favorite item      True Positive (TP)      False Negative (FN)
User unfavorable item     False Positive (FP)     True Negative (TN)

6.2 Nano-Health Applications Based on Machine Learning With the development of robots and ML applications toward smart life, nanotechnology is advancing quickly. Nanotechnology is still in its infancy, though, and this has researchers interested in expanding this subject. The term “Nano” describes the


Table 2 Qualitative evaluation using confusion matrix components

Evaluation metric    Equation
Precision            TP / (TP + FP)
Recall               TP / (TP + FN)
Accuracy             (TP + TN) / (TP + FN + FP + TN)
F-measure            2 × (Precision × Recall) / (Precision + Recall)
ROC curve            Plot of the TP rate = TP / (TP + FN) against the FP rate = FP / (FP + TN)
AUC                  Area under the ROC curve

creation of objects with a diameter smaller than a human hair [55]. Reference [56] states that nanotechnology entails the process of creating, constructing, and manufacturing materials at the Nano scale. Robots built at the Nano scale are known as Nano robots [57]. According to the authors, Nano robots are objects that can sense, actuate, transmit signals, process information, exhibit intelligence, or exhibit swarm behavior at the Nano scale. They are made up of several parts that work together to carry out particular functions; the parts are constructed at the Nano scale, where sizes range from 1 to 100 nm. In the medical industry, Nano robots are frequently utilized in procedures including surgery, dentistry, sensing and imaging, drug delivery, and gene therapy [58]; Fig. 9 shows these applications of Nano robotics. Adding value to image processing through picture recognition, grouping, and classification in medical image processing with ML improves the performance of Nano-health applications, and incorporating ML into the biological analysis of microscopic images can increase disease-identification accuracy. To better understand the influence that nanoparticle characteristics have on their interactions with the targeted tissue and cells, ML methods have been used to predict the pathological response to breast cancer treatment with a high degree of accuracy; artificial neural networks reduced the prediction error rate and allow such techniques to be used without the need for enormous data sets [59].

Fig. 9 Nano robots for medical applications: surgery, dentistry, sensing and imaging, drug delivery, and gene therapy


6.3 Localizations Based on ML Applications

Locating objects within a predetermined area is a process called localization. Machine-learning-based localization has been used in a wide variety of applications, ranging from simultaneous localization and mapping (SLAM) to pedestrian localization systems. Numerous fields, including autonomous driving, indoor robotics, and building surveying and mapping, utilize SLAM, which is applied both indoors and outdoors. A SLAM system typically consists of front-end sensors that gather information about unknown scenes and back-end algorithms that map the scene and pinpoint the sensor's location within it. Fringe projection profilometry (FPP), a technique that has demonstrated good accuracy and speed, is one of the optical metrological tools utilized in SLAM [60]. Simple hardware configuration, flexible application, and dense point-cloud gathering are benefits of FPP.

Front-end FPP technique: the system consists of a projector and a camera; the projector projects coded fringe patterns while the camera simultaneously captures the height-modulated fringe patterns. Using transform-based or phase-shifting algorithms, the desired phase is then determined from the collected patterns. Because the computed phase is wrapped into the range (−π, π], a phase-unwrapping technique is needed to obtain the absolute phase. Together with system calibration and 3-D reconstruction, the absolute phase establishes the pixel correspondence between the projector and the camera, which then yields the 3-D shape [60]; the wrapped phase and the absolute phase are thus used to achieve system calibration and 3-D reconstruction. Using a phase-shifting algorithm, the fringe pattern can be represented as follows:







I_n(u^c, v^c) = a(u^c, v^c) + b(u^c, v^c)\cos\!\big(\phi(u^c, v^c) - \delta_n\big), \quad n = 1, 2, \ldots, N   (25)

where (u^c, v^c) are the camera pixel coordinates, N is the number of phase steps, \delta_n = 2\pi(n-1)/N is the phase shift, a is the background intensity, b is the modulation amplitude, and \phi is the desired phase. The desired phase is then calculated using the least-squares algorithm of Eq. (26):

\phi(u^c, v^c) = \arctan\frac{\sum_{n=1}^{N} I_n(u^c, v^c)\sin\delta_n}{\sum_{n=1}^{N} I_n(u^c, v^c)\cos\delta_n}   (26)
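A minimal numerical sketch of the N-step phase-shifting computation of Eqs. (25)-(26), assuming NumPy and synthetic fringe images generated with a known phase (the intensities and image size are illustrative):

```python
import numpy as np

def wrapped_phase(images):
    """N-step phase-shifting (Eq. 26): images holds N fringe images I_n(u, v)
    captured with shifts delta_n = 2*pi*(n-1)/N."""
    images = np.asarray(images, dtype=float)
    N = images.shape[0]
    deltas = 2.0 * np.pi * np.arange(N) / N
    num = np.tensordot(np.sin(deltas), images, axes=1)
    den = np.tensordot(np.cos(deltas), images, axes=1)
    return np.arctan2(num, den)   # wrapped into (-pi, pi]

# Example with four synthetic 2x2 fringe images built from a known phase map.
phi_true = np.array([[0.3, 1.2], [2.0, -0.7]])
imgs = [10 + 5 * np.cos(phi_true - 2 * np.pi * n / 4) for n in range(4)]
print(wrapped_phase(imgs))   # recovers phi_true up to wrapping
```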

A gray-code-based phase unwrapping technique, expressed theoretically by Eq. (27), is used to obtain accurate absolute phase data:

\Phi(u^c, v^c) = \phi(u^c, v^c) + 2\pi K(u^c, v^c)   (27)

where K represents the fringe order and \Phi is the absolute phase. The mapping takes the form of 3-D points, expressed as a matrix in the world coordinate frame (x^w, y^w, z^w) by the following equations:


s^c \begin{bmatrix} u^c \\ v^c \\ 1 \end{bmatrix} = A^c \begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix} = \begin{bmatrix} a^c_{11} & a^c_{12} & a^c_{13} & a^c_{14} \\ a^c_{21} & a^c_{22} & a^c_{23} & a^c_{24} \\ a^c_{31} & a^c_{32} & a^c_{33} & a^c_{34} \end{bmatrix} \begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix}   (28)

s^p \begin{bmatrix} u^p \\ v^p \\ 1 \end{bmatrix} = A^p \begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix} = \begin{bmatrix} a^p_{11} & a^p_{12} & a^p_{13} & a^p_{14} \\ a^p_{21} & a^p_{22} & a^p_{23} & a^p_{24} \\ a^p_{31} & a^p_{32} & a^p_{33} & a^p_{34} \end{bmatrix} \begin{bmatrix} x^w \\ y^w \\ z^w \\ 1 \end{bmatrix}   (29)

where the superscript c denotes the camera and p the projector, s is a scaling factor, and A is the product of the intrinsic and extrinsic matrices, represented as follows:

A^c = I^c \times [R^c \mid T^c] = \begin{bmatrix} f^c_x & 0 & u^c_0 \\ 0 & f^c_y & v^c_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r^c_{11} & r^c_{12} & r^c_{13} & t^c_1 \\ r^c_{21} & r^c_{22} & r^c_{23} & t^c_2 \\ r^c_{31} & r^c_{32} & r^c_{33} & t^c_3 \end{bmatrix}   (30)

A^p = I^p \times [R^p \mid T^p] = \begin{bmatrix} f^p_x & 0 & u^p_0 \\ 0 & f^p_y & v^p_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r^p_{11} & r^p_{12} & r^p_{13} & t^p_1 \\ r^p_{21} & r^p_{22} & r^p_{23} & t^p_2 \\ r^p_{31} & r^p_{32} & r^p_{33} & t^p_3 \end{bmatrix}   (31)

where I is the intrinsic matrix, f_x and f_y are the focal lengths, (u_0, v_0) is the principal point, and [R \mid T] is the extrinsic matrix. The matching of the absolute phase can then be satisfied through

u^p = \frac{\Phi(u^c, v^c)\,\lambda}{2\pi}   (32)

Equation (32) gives the matching between points from the camera and the projector. The back end: following the collection of high-quality data, rapid and accurate mapping is required. This is accomplished by using a registration technique based on


a 2D-to-3D descriptor. Accurate mapping is accomplished by solving for a transformation matrix that converts the transformation between 2D feature points into a transformation of the 3D point cloud, which carries out the registration. The Speeded Up Robust Features (SURF) algorithm is used to extract 2D matching points, resulting in a 2D transformation matrix [60, 61]. The 2D transformation matrix is then converted to a 3D transformation matrix, after which the corresponding 3D points are extracted. Next, the transformation matrix is combined with the initial prior, and finally the point-cloud registration is performed by applying the resulting transformation matrix [62]. The registration objective can be represented mathematically as

\min \sum_{i=1}^{n} \left\| P_1 - R P_2 - T \right\|^{2}   (33)

where P_1 and P_2 represent matched 3D points, P_1 = (x_{w1}, y_{w1}, z_{w1}) and P_2 = (x_{w2}, y_{w2}, z_{w2}), and R and T are the rotation and translation to be estimated.

Pedestrian localization systems have also been growing recently, and machine learning has been applied to several types of pedestrian localization. There is a tendency to apply supervised learning and scene analysis in pedestrian localization because of their accuracy, and DL is employed for its high processing capacity. Scene analysis is the most frequently used ML approach in pedestrian localization owing to its easy implementation and fair performance [61, 62].
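The minimisation in Eq. (33) has a closed-form least-squares solution for R and T via the singular value decomposition (the Kabsch procedure); the sketch below is one possible implementation under the assumption of known point correspondences, with synthetic data for the example.

```python
import numpy as np

def rigid_registration(P1, P2):
    """Least-squares R, T minimising sum ||P1 - (R @ P2 + T)||^2 (cf. Eq. 33)
    via the SVD-based Kabsch solution; P1, P2 are (n, 3) matched point sets."""
    c1, c2 = P1.mean(axis=0), P2.mean(axis=0)
    H = (P2 - c2).T @ (P1 - c1)              # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    T = c1 - R @ c2
    return R, T

# Example: recover a known rotation and translation from matched 3D points.
rng = np.random.default_rng(0)
P2 = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
P1 = P2 @ R_true.T + np.array([0.5, -1.0, 2.0])
R, T = rigid_registration(P1, P2)
```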

6.4 Control of Dynamic Traffic Robots

Automated guided vehicles (AGVs) have developed into autonomous mobile robots (AMRs) as a result of recent breakthroughs in robotic applications. To reach the vision-based systems we have today, the guidance component of AGV material-handling systems has advanced through numerous mechanical, optical, inductive, inertial, and laser-guided milestones [63]. The technologies that improve the performance of these systems include sensors, strong on-board processors, artificial intelligence, and simultaneous localization and mapping; they allow the robot to comprehend its workplace. AMRs apply AI algorithms to improve their navigation so that they can travel on their own through unpredictable terrain. Machine learning methods such as fuzzy logic, neural networks, genetic algorithms, and neuro-fuzzy systems can be used to identify and categorize obstacles, and all of these techniques are routinely used to move the robot from one place to another while avoiding collisions; the ability of the brain to perform specific tasks serves as their source of inspiration [63, 64]. For example, for a dual-arm robot we may construct and analyze a control algorithm for the dual-arm robot's oscillation, position, and speed, which requires a dynamic model of the system. The system design incorporates time delay control and pole placement-based feedback


control for the control of oscillation (angular displacement), precise position control, and speed control, respectively.
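As a hedged illustration of pole-placement state feedback, the sketch below places the closed-loop poles of a simplified single-axis arm model (a double integrator with an assumed inertia); it uses SciPy's place_poles and omits the time-delay component discussed above. The model, inertia value, and chosen poles are illustrative assumptions.

```python
import numpy as np
from scipy.signal import place_poles

# Illustrative single-axis arm model (double integrator), an assumed stand-in
# for the dual-arm dynamics discussed above: x = [angle, angular_velocity].
J = 0.05                               # assumed joint inertia (kg*m^2)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0 / J]])

# Pole placement: choose closed-loop poles for a well-damped, fast response.
K = place_poles(A, B, [-4.0, -5.0]).gain_matrix

def control(x, x_ref):
    """State-feedback law u = -K (x - x_ref) driving the arm towards x_ref."""
    return float(-K @ (np.asarray(x) - np.asarray(x_ref)))

print(K, control([0.2, 0.0], [0.0, 0.0]))
```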

7 Robotics Applications Challenges

As robots are employed in homes, offices, the healthcare industry, driving automobiles, and education, robotics applications have become a significant part of our lives. To improve performance, including accuracy, efficiency, and security, it is increasingly common to deploy robots that integrate several applications using machine learning techniques [65]. Open challenges in AI for robots include the type of learning to be used, the ML application, the ML architecture, standardization, and the incorporation of performance evaluation criteria beyond accuracy [66]. Exploring different learning approaches would benefit performance and advancement, even though supervised learning is the most common type of learning in robotic applications [67]. ML tools can also be used to address issues caused by wireless connectivity, which increases multipath and lowers system performance. Robots are adopting DL architectures more frequently, particularly for localization, but their use is restricted because DL designs require a significant amount of training data that is difficult to obtain. To analyze the efficacy of ML systems, it is crucial to identify best practices in these fields and to consider alternative evaluation criteria, because standard performance evaluation criteria are constrained [68]. Machine learning has many uses, including in forensics, energy management, health, and security. Because they are evolving so quickly, new trends in robotics and machine learning require further study; among them are end-to-end automated, common, and continuous DL approaches for data-driven intelligent systems. Quickly evolving technologies such as 5G, cloud computing, and blockchain offer new opportunities to improve the system as a whole [69, 70], although issues with user security, privacy, and safety must be resolved. Black-box smart systems have opportunities in AI health applications because of their low deployment costs, rapid performance, and accuracy [71, 72]; such applications will aid monitoring, rehabilitation, and diagnosis. Future research trends also include:
• AI algorithms, which play an essential role in data analytics and decision making for robotics operations.
• IT infrastructure such as 5G, which provides the low latency, high traffic capacity, and fast connections needed for robot-based industrial applications.
• Human-robot collaboration (HRC), which has gained a strong reputation lately in health applications during the pandemic.
• Big data and cloud-based applications, which are expected to accelerate in the coming years and to be applied with robotics for their powerful analytics that support the decision-making process.


8 Conclusions

Machine learning provides effective solutions in several areas, including robotics. ML has enabled the development of autonomous robots for applications related to industrial automation, the Internet of Things, and autonomous vehicles, and it has also contributed significantly to the medical field. When ML is combined with other technologies such as computer vision, a number of improved applications can be developed that serve healthcare, with the possibility of high classification efficiency and a generation of robots able to move and interact with patients according to their movements and gestures [73]. ML-based robotics applications add value to the performance of robotic systems: recommender systems are used in marketing to analyze user behavior and recommend certain choices; Nano-health applications introduce Nano-sized robots to perform health tasks in treatment and diagnosis; localization determines the location of items or users within a given area; and ML-based dynamic traffic control improves the movement and manipulation of robots through obstacle avoidance and smooth motion [74, 75]. The challenges of robotics applications can be addressed by extended research in the trending areas identified above, which promises solutions and improvements to the currently open issues.

References 1. Saeed, M., OmriS., Abdel-KhalekE. S., AliM. F., & Alotaibi M.: Optimal path planning for drones based on swarm intelligence algorithm. Neural Computing and Applications, 34, 10133– 10155 (2022). https://doi.org/10.1007/s00521-022-06998-9. 2. Niko, S., et al. (2018). The limits and potentials of deep learning for robotics. The International Journal of Robotics Research, 37(4), 405–420. https://doi.org/10.1177/0278364918770733 3. Ali, E. S., Zahraa, T., & Mona, H. (2021). Algorithms optimization for intelligent IoV applications. In J. Zhao & Vinoth K. (Eds.), Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies (pp. 1–25). Hershey, PA: IGI Global (2021). https://doi. org/10.4018/978-1-7998-6870-5.ch001 4. Matt, L, Marie, F, Louise, A., Clare, D, & Michael, F. (2020). Formal specification and verification of autonomous robotic systems: A survey. ACM Computing Surveys, 52I(5), 100, 1–41. https://doi.org/10.1145/3342355. 5. Alexander, L., Konstantin, M., & Pavol. B. (2021). Convolutional Neural Networks Training for Autonomous Robotics, 29, 1, 75–79. https://doi.org/10.2478/mspe-2021-0010. 6. Hassan, M., Mohammad, H., Othman, O., & Aisha, H. (2022). Performance evaluation of uplink shared channel for cooperative relay based narrow band internet of things network. In 2022 International Conference on Business Analytics for Technology and Security (ICBATS). IEEE. 7. Fahad, A., Alsolami, F., & Abdel-Khalek, S. (2022). Machine learning techniques in internet of UAVs for smart cities applications. Journal of Intelligent & Fuzzy Systems, 42(4), 3203–3226 8. Salih, A., & Sayed A.: Machine learning in cyber-physical systems in industry 4.0. In A. Luhach & E. Atilla (Eds.), Artificial Intelligence Paradigms for Smart Cyber-Physical Systems. (pp. 20–41). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-5101-1.ch002. 9. Gruber, T. R. (1995). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43, 907–928.

330

L. E. Alatabani et al.

10. Lim, G., Suh, I., & Suh, H. (2011). Ontology-Based unified robot knowledge for service robots in indoor environments. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 41, 492–509. 11. Mohammed, D., Aliakbar, A., Muhayy, U., & Jan, R. (2019). PMK—A knowledge processing framework for autonomous robotics perception and manipulation. Sensors, 19, 1166. https:// doi.org/10.3390/s19051166 12. Wil, M., Martin, B., & Armin, H. (2018). Robotic Process Automation, Springer Fachmedien Wiesbaden GmbH, part of Springer Nature (2018) 13. Aguirre, S., & Rodriguez, A. (2017). Automation of a business process using robotic process automation (RPA): A case study. Applied Computational Science and Engineering Communications in Computer and Information Science. 14. Ilmari, P., & Juha, L. (2021). Robotic process automation (RPA) as a digitalization related tool to process enhancement and time saving. Research. https://doi.org/10.13140/RG.2.2.13974. 68161 15. Mona, B., & Sayed, A. (2021). Intelligence IoT Wireless Networks. Intelligent Wireless Communications, IET Book Publisher. 16. Niall, O. et al. (2020). In K. Arai & S. Kapoor (Eds.), Deep Learning versus Traditional Computer Vision. Springer Nature Switzerland AG 2020: CVC 2019, AISC 943 (pp. 128–144). https://doi.org/10.1007/978-3-030-17795-9_10. 17. Othman, O., & Muhammad, H. et al. (2022). Vehicle detection for vision-based intelligent transportation systems using convolutional neural network algorithm. Journal of Advanced Transportation, Article ID 9189600. https://doi.org/10.1155/2022/9189600. 18. Ross, G., Jeff, D., Trevor, D., & Jitendra, M. (2019). Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 142–158. 19. Ian, G., Yoshua, B., & Aaron. C. (2016). Deep Learning (Adaptive Computation and Machine Learning series) Deep Learning. MIT Press. 20. Macaulay, M. O., & Shafiee, M. (2022). Machine learning techniques for robotic and autonomous inspection of mechanical systems and civil infrastructure. Autonomous Intelligent Systems, 2, 8. https://doi.org/10.1007/s43684-022-00025-3 21. Khan, S., Rahmani, H., Shah, S. A. A., Bennamoun, M. (2018). A guide to convolutional neural networks for computer vision. Springer. https://doi.org/10.2200/S00822ED1V01Y201712CO V01. 22. Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. International Journal of Robotics Research, 32, 1238–1274. 23. Bakri, H., & Elmustafa, A., & Rashid, A.: Machine learning for industrial IoT systems. In J. Zhao & V. Vinoth Kumar, (Eds.), Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies (pp. 336–358). Hershey, PA: IGI Global, (2021). https:// doi.org/10.4018/978-1-7998-6870-5.ch023 24. Luong, N. C., Hoang, D. T., Gong, S., Niyato, D., Wang, P., Liang, Y. C., & Kim, D. I. (2019). Applications of deep reinforcement learning in communications and networking: A survey. IEEE Communications Surveys Tutorials, 21, 3133–3174. 25. Chen, Z., & Huang, X. (2017). End-to-end learning for lane keeping of self-driving cars. In 2017 IEEE Intelligent Vehicles Symposium (IV) (pp. 1856–1860). https://doi.org/10.1109/IVS. 2017.7995975. 26. Jiang, H., Liangcai, Z., Gongfa, L., & Zhaojie, J. (2021). Learning for a robot: Deep reinforcement learning, imitation learning, transfer learning, learning for a robot: Deep reinforcement learning, imitation Learning. Transfer Learning. 
Sensors, 21, 1278. https://doi.org/10.3390/ s21041278 27. Yan, W., Cristian, C., Beltran, H., Weiwei, W., & Kensuke, H. (2022). An adaptive imitation learning framework for robotic complex contact-rich insertion tasks. Frontiers in Robotics and AI, 8, 90–123. 28. Ali, E. S., Hassan, M. B., & Saeed, R. (2020). Machine learning technologies in internet of vehicles. In: M. Magaia, G. Mastorakis, C. Mavromoustakis, E. Pallis & E. K Markakis (Eds.),

Intelligent Technologies for Internet of Vehicles. Internet of Things. Cham: Springer. https://doi.org/10.1007/978-3-030-76493-7_7.
29. Alatabani, L. E., Ali, E. S., & Saeed, R. A. (2021). Deep learning approaches for IoV applications and services. In: N. Magaia, G. Mastorakis, C. Mavromoustakis, E. Pallis, E. K. Markakis (Eds.), Intelligent Technologies for Internet of Vehicles. Internet of Things. Cham: Springer. https://doi.org/10.1007/978-3-030-76493-7_8
30. Lina, E., Ali, E., & Mokhtar, A. et al. (2022). Deep and reinforcement learning technologies on internet of vehicle (IoV) applications: Current issues and future trends. Journal of Advanced Transportation, Article ID 1947886. https://doi.org/10.1155/2022/1947886
31. Venator, M. et al. (2021). Self-supervised learning of domain-invariant local features for robust visual localization under challenging conditions. IEEE Robotics and Automation Letters, 6(2).
32. Abbas, A., Rania, A., Hesham, A. et al. (2021). Quality of services based on intelligent IoT WLAN MAC protocol dynamic real-time applications in smart cities. Computational Intelligence and Neuroscience, 2021, Article ID 2287531. https://doi.org/10.1155/2021/2287531
33. Maibaum, A., Bischof, A., Hergesell, J., et al. (2022). A critique of robotics in health care. AI & Society, 37, 467–477. https://doi.org/10.1007/s00146-021-01206-z
34. Yanxue, C., Moorhe, C., & Zhangbo, X. (2021). Artificial intelligence assistive technology in hospital professional nursing technology. Journal of Healthcare Engineering, Article ID 1721529. https://doi.org/10.1155/2021/1721529
35. Amanda, P., Jan, B., & Qingbiao, L. (2022). The holy grail of multi-robot planning: Learning to generate online-scalable solutions from offline-optimal experts. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022).
36. Lorenzo, C., Gian, C., Cardarilli, L., et al. (2021). Multi-agent reinforcement learning: A review of challenges and applications. Applied Sciences, 11, 4948. https://doi.org/10.3390/app11114948
37. Mahbuba, A., Jiong, J., Akhlaqur, R., Ashfaqur, R., Jiafu, W., & Ekram, H. (2021). Resource allocation and service provisioning in multi-agent cloud robotics: A comprehensive survey. Manuscript. IEEE. Retrieved February 10, 2021.
38. Wang, Y., Damani, M., Wang, P., et al. (2022). Distributed reinforcement learning for robot teams: A review. Current Robotics Reports. https://doi.org/10.1007/s43154-022-00091-8
39. Elfatih, N. M., et al. (2022). Internet of vehicle's resource management in 5G networks using AI technologies: Current status and trends. IET Communications, 16, 400–420. https://doi.org/10.1049/cmu2.12315
40. Edmund, J., Greg, F., David, M., & David, W. (2021). The segmented colour feature extreme learning machine: Applications in agricultural robotics. Agronomy, 11, 2290. https://doi.org/10.3390/agronomy11112290
41. Rodrigues, I. R., da Silva Neto, S. R., Kelner, J., Sadok, D., & Endo, P. T. (2021). Convolutional extreme learning machines: A systematic review. Informatics, 8, 33. https://doi.org/10.3390/informatics8020033
42. Jianwen, G., Xiaoyan, L., Zhenpeng, I., & Yandong, L. et al. (2021). Fault diagnosis of industrial robot reducer by an extreme learning machine with a level-based learning swarm optimizer. Advances in Mechanical Engineering, 13(5), 1–10. https://doi.org/10.1177/16878140211019540
43. Ali, Z., Lorena, D., Saleh, G., Bernard, R., Akif, K., & Mahdi, B. (2021). 4D printing soft robots guided by machine learning and finite element models. Sensors and Actuators A: Physical, 322, 112774.
44. Elmustafa, S. et al. (2021). Machine learning technologies for secure vehicular communication in internet of vehicles: Recent advances and applications. Security and Communication Networks, Article ID 8868355. https://doi.org/10.1155/2021/8868355
45. Ho, S., Banerjee, H., Foo, Y., Godaba, H., Aye, W., Zhu, J., & Yap, C. (2017). Experimental characterization of a dielectric elastomer fluid pump and optimizing performance via composite materials. Journal of Intelligent Material Systems and Structures, 28, 3054–3065.
46. Sarthak, B., Hritwick, B., Zion, T., & Hongliang, R. (2019). Deep reinforcement learning for soft, flexible robots: Brief review with impending challenges. Robotics, 8, 4. https://doi.org/10.3390/robotics8010004


47. Estifanos, T., & Mihret, M.: Robotics and artificial intelligence. International Journal of Artificial Intelligence and Machine Learning, 10(2). 48. Andrius, D., Jurga, S., Žemaitien, E., & Ernestas, Š. et al. (2022). Advanced applications of industrial robotics: New trends and possibilities. Applied Science, 12, 135. https://doi.org/10. 3390/app12010135. 49. Elmustafa, S. A. A., & Mujtaba, E. Y. (2019). Internet of things in smart environment: Concept, applications, challenges, and future directions. World Scientific News (WSN), 134(1), 151. 50. Ali, E. S., Sohal, H. S. (2017). Nanotechnology in communication engineering: Issues, applications, and future possibilities. World Scientific News (WSN), 66, 134-148. 51. Reham, A. A., Elmustafa, S. A., Rania, A. M., & Rashid, A. S. (2022). Blockchain for IoT-Based cyber-physical systems (CPS): Applications and challenges. In: D. De, S. Bhattacharyya, & Rodrigues, J. J. P. C. (Eds.), Blockchain based Internet of Things. Lecture Notes on Data Engineering and Communications Technologies (Vol. 112). Singapore: Springer. https://doi. org/10.1007/978-981-16-9260-4_4. 52. Zhang, Q, Lu, J., & Jin, Y. (2020). Artificial intelligence in recommender systems. Complex & Intelligent Systems. Retrieved September 28, 2020 from, https://doi.org/10.1007/s40747-02000212-w. 53. Abdalla, R. S., Mahbub, S. A., Mokhtar, R. A., Elmustafa, S. A., Rashid, A. S. (2021). IoE design principles and architecture. In Book: Internet of Energy for Smart Cities: Machine Learning Models and Techniques. USA: Publisher: CRC group, Taylor & Francis Group. 54. Hyeyoung, K., Suyeon, L., Yoonseo, P., & Anna, C. (2022). A survey of recommendation systems: recommendation models, techniques, and application fields, recommendation systems: recommendation models, techniques, and application fields. Electronics, 11, 141. https:// doi.org/https://doi.org/10.3390/electronics11010141. 55. Yuanyuan, C., Dixiao, C., Shuzhang, L. et al. (2021). Recent advances in field-controlled micro– nano manipulations and micro–nano robots. Advanced Intelligent Systems, 4(3), 2100116, 1–23 (2021). https://doi.org/10.1002/aisy.202100116, 56. Mona, B., et al. (2021). Artificial intelligence in IoT and its applications. Intelligent Wireless Communications, IET Book Publisher. 57. Neto, A., Lopes, I. A., & Pirota, K. (2010). A Review on Nanorobotics. Journal of Computational and Theoretical Nanoscience, 7, 1870–1877. 58. Gautham, G., Yaser, M., & Kourosh, Z. (2021). A Brief review on challenges in design and development of nanorobots for medical applications. Applied Sciences, 11, 10385. https://doi. org/10.3390/app112110385 59. Egorov, E., Pieters, C., Korach-Rechtman, H., et al. (2021). Robotics, microfluidics, nanotechnology and AI in the synthesis and evaluation of liposomes and polymeric drug delivery systems. Drug Delivery and Translational Research, 11, 345–352. DOI: 10.1007/s13346-02100929-2. 60. Yang, Z., Kai, Z., Haotian, Y., Yi, Z., Dongliang, Z., & Jing, H. (2022). Indoor simultaneous localization and mapping based on fringe projection profilometry 23, arXiv:2204.11020v1 [cs.RO]. 61. Miramá, V. F., Díez, L. E., Bahillo, A., & Quintero, V. (2021). A survey of machine learning in pedestrian localization systems: applications, open issues and challenges. IEEE Access, 9, 120138–120157. https://doi.org/10.1109/ACCESS.2021.3108073 62. Tian, Y., Adnane, C., & Houcine, C. (2021). A survey of recent indoor localization scenarios and methodologies. Sensors, 21, 8086. https://doi.org/10.3390/s21238086 63. 
Giuseppe, F., René, D., Fabio, S., Strandhagen, J. O. (2021). Planning and control of autonomous mobile robots for intralogistics: Literature review and research agenda European Journal of Operational Research, 294(2), (405–426). https://doi.org/10.1016/j.ejor.2021.01. 019. Published 2021. 64. Alfieri, A., Cantamessa, M., Monchiero, A., & Montagna, F. (2012). Heuristics for puzzle-based storage systems driven by a limited set of automated guided vehicles. Journal of Intelligent Manufacturing, 23(5), 1695–1705.


65. Ahmad, B., Xiaodong, Z., & Haiming, S. et al. (2022). Precise motion control of a power line inspection robot using hybrid time delay and state feedback control. Frontiers in Robotics and AI 9(24). https://doi.org/10.3389/frobt.2022.746991. 66. Elsa, J., Hung, K., & Emel, D. (2022). A survey of human gait-based artificial intelligence applications. Frontiers in Robotics and AI, 8. https://doi.org/10.3389/frobt.2021.749274. 67. Xi, V., & Lihui, W. (2021). A literature survey of the robotic technologies during the COVID19 pandemic. Journal of Manufacturing Systems, 60, 823–836. https://doi.org/10.1016/j.jmsy. 2021.02.005 68. Ahir, S., Telavane, D., & Thomas, R. (2020). The impact of artificial intelligence, blockchain, big data and evolving technologies in coronavirus disease-2019 (COVID-19) curtailment. In: Proceedings of the International Conference of Smart Electronics Communication ICOSEC 2020 (pp. 113–120). https://doi.org/10.1109/ICOSEC49089.2020.9215294. 69. Lana, I. S., Elmustafa, S., & Saeed, A. (2022). Machine learning in healthcare: Theory, applications, and future trends. In R. El Ouazzani & M. Fattah & N. Benamar (Eds.), AI Applications for Disease Diagnosis and Treatment (pp. 1–38). Hershey, PA: IGI Global, (2022). https://doi. org/10.4018/978-1-6684-2304-2.ch001 70. Jat, D., & Singh, C. (2020). Artificial intelligence-enabled robotic drones for COVID-19 outbreak. Springer Briefs Applied Science Technology, 37–46 (2020). DOI: https://doi.org/ 10.1007/978–981–15–6572–4_5. 71. Schulte, P., Streit, J., Sheriff, F., & Delclos, G. et al. (2020). Potential scenarios and hazards in the work of the future: a systematic review of the peer-reviewed and gray literatures. Annals of Work Exposures and Health, 64, 786–816, (2020), DOI: https://doi.org/10.1093/annweh/wxa a051. 72. Alsolami, F., Alqurashi, F., & Hasan, M. K. et al. (2021). Development of self-synchronized drones’ network using cluster-based swarm intelligence approach. IEEE Access, 9, 48010– 48022. https://doi.org/10.1109/ACCESS.2021.3064905. 73. Alatabani, L. E., Ali, E. S., Mokhtar, R. A., Khalifa, O. O., & Saeed, R. A. (2022). Robotics architectures based machine learning and deep learning approaches. In 8th International Conference on Mechatronics Engineering (ICOM 2022), Online Conference, Kuala Lumpur, Malaysia (pp. 107–113). https://doi.org/10.1049/icp.2022.2274. 74. Malik, A. A., Masood, T., & Kousar, R. (2020). Repurposing factories with robotics in the face of COVID-19. IEEE Transactions on Automation Science and Engineering, 5(43), 133–145. https://doi.org/10.1126/scirobotics.abc2782. 75. Yoon, S. (2020). A study on the transformation of accounting based on new technologies: Evidence from korea. Sustain, 12, 1–23. https://doi.org/10.3390/su12208669

A Review on Deep Learning on UAV Monitoring Systems for Agricultural Applications Tinao Petso and Rodrigo S. Jamisola Jr

Abstract In this chapter we present a literature review of UAV monitoring systems that utilize deep learning algorithms to improve plant and animal production yields. This work is important because the growing world population increases the demand for food production, threatening food security and national economies. Hence there is a need to ensure sustainable food production, which is made more complicated by global warming, the occupational preference for food consumption over food production, and diminishing land and water resources. We consider studies that utilize the UAV platform, rather than satellites, to collect images, because UAVs are easily available, cheaper to maintain, and the collected images can be updated at any time. Previous studies with research foci on plant and animal monitoring are evaluated in terms of their advantages and disadvantages. We look into different deep learning models and compare their performance across various types of drones and different environmental conditions during data gathering. Topics on plant monitoring include pest infiltration, plant growth, fruit conditions, and weed invasion, while topics on animal monitoring include animal identification and animal population counting. The monitoring systems used in previous studies rely on computer vision processes such as image classification, object detection, and segmentation, which provide efficient, accurate, automatic, and intelligent systems for a particular task. Recent advancements in deep learning models and off-the-shelf drones open more opportunities, with lower costs and faster operations, in most agricultural monitoring applications.

Keywords Deep learning algorithms · UAV monitoring systems · Animal agricultural applications · Plant agricultural applications · Computer vision · Literature review

T. Petso (B) · R. S. Jamisola Jr Botswana International University of Science and Technology, Palapye, Botswana e-mail: [email protected] R. S. Jamisola Jr e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_11



1 Introduction

Agriculture contributes significantly to the growth of a country's economy [95, 125]. The rapidly growing population and the negative effects of changing environmental conditions, among other challenges, create the need to research technological applications that enhance sustainable agricultural productivity with better monitoring strategies [114]. The capability to employ remote sensing technology has greatly advanced precision agriculture. Precision agriculture is the capacity to make appropriate decisions at the ideal time by evaluating agricultural data [136]. It optimizes input elements such as water and fertilizer to enhance productivity, quality, and yield, and minimizes pests and diseases through targeted application of the appropriate amounts at the appropriate times in crop and livestock management [73, 81]. Efficient plant monitoring is important because excessive use of chemicals such as pesticides and fertilizers can destroy the fertility of the soil [110], and in animal monitoring the inappropriate use of antibiotics, or zoonosis, can harm human health [81]. The implementation of precision agriculture ensures an efficient approach to food security in the world [112].

Conventional open-field farming has been used for many years, but it has been shown to suffer from environmental conditions that hamper agricultural production, hence the need for extensive monitoring [116]. Drones have been identified as an ideal complement to existing agricultural monitoring applications for effective and better production yield [94]. The common reason for using drones in agricultural applications is their remote sensing capability [77]. Drones have increasingly replaced satellites, which are deemed costly and offer lower efficiency and accuracy in comparison [76, 93]. They also hold the capability to access inaccessible agricultural areas [82]. Controlled-environment farming, also known as indoor farming, holds great benefits such as reduced labour and increased production yield for both plants and livestock [18], and the challenge of crop diseases and pests is also reduced [38]. Indoor drone operation has been established to be a challenge, hence an investigation in a greenhouse and a dairy farm barn was conducted [59]; it was found to be effective with the implementation of visual simultaneous localization and mapping (VSLAM). This approach can assist real-time agricultural monitoring to aid precision farming.

Drones have been shown to reduce labour and time, hence saving costs [30], and to increase agricultural productivity, efficiency, and sustainability [92, 96, 121]. They can provide near real-time information, on both crop and livestock conditions, for farmers to make appropriate decisions in agricultural management [10]. Deep learning has been applied in several agricultural application fields, such as automatic disease detection, pest detection, weed detection, fruit detection, and livestock detection [39, 71, 105, 122]. Its greatest attribute is the capability of automatic feature extraction [126]. Recently, an increase in deep learning applications has been driven by several aspects, such as the affordability of graphics process-


Fig. 1 DJI Phantom 3 SE performing fruit condition monitoring

ing unit (GPU) hardware and advancements in deep learning architectures in the computer vision domain [36]. There are several drone-based agricultural monitoring applications; an example is illustrated in Fig. 1.

The application of drones with deep learning algorithms improves agricultural management efforts [13]. The combination serves as a powerful tool for agricultural applications to help ensure food security [20, 57], and it has proven to hold promising near real-time agricultural monitoring capability for better production yield compared to human effort [20]. Crop diseases and pest infections have been established to cause significant losses to agricultural production yields [34]; losses from pest infections alone can amount to as much as 30–33% of crop production in a year [126]. The study conducted by [7] used drones for automatic early detection of crop diseases to prevent further spreading. Advancements in unmanned systems for automated livestock monitoring have likewise been established as a promising alternative approach to better production yield [75]. The implementation of drones and deep learning algorithms has some limitations, such as reduced image quality due to drone altitude, shadows, and natural occlusion. Continuously advancing technology and hardware improvements highlight the need to periodically review deep learning algorithms for drone-based systems and establish vital potential research directions. This chapter strives to achieve the following objectives:
• Identification of research studies with contributions to incorporating drone applications with deep learning strategies in agricultural applications


• Identification of research focus, advantages, and disadvantages
• Identification of possible future research areas
This chapter provides researchers with the fundamentals of using drones with deep learning strategies for agricultural applications and with directions for future research. It also guides future research work and highlights UAV operational capabilities and the advantages and disadvantages of using deep learning models. The rest of the chapter is organised as follows: Sect. 2 outlines the methodology and the basic steps involved in the development of UAV agricultural monitoring applications. Sections 3 and 4 present findings on applications of deep learning models in plant and animal monitoring, respectively. Section 5 discusses the findings, their contributions, and their advantages and disadvantages. Finally, Sect. 6 concludes the chapter and highlights possible future research areas.

2 Proposed Methodology

In this literature review study, we analyse the use of drones with deep learning strategies for different agricultural applications. Google Scholar was mainly used as the search engine for data collection. The keywords used include “animal agricultural applications”, “computer vision”, “deep learning algorithms”, “plant agricultural application”, and “UAV monitoring system”. The first step was the collection of related work, and the second a detailed review of that research. The approach yields a detailed overview of deep learning strategies, their advantages and disadvantages, and future research ideas that can be exploited in agriculture.

2.1 An Overview of Deep Learning Strategies Used in Agriculture

In recent years, advancements in deep learning have led to tremendous achievements in agricultural applications, saving labour cost, time, and the expertise needed [72]. Deep learning architectures pass the input data through several layers, each of which extracts features along the way [74]. Figure 2 highlights the feature extraction and classification process. Deep-learning-based models are made up of different elements, such as convolution layers, pooling layers, and activation functions, depending on the architecture [51]. Each architecture has its own pros and cons, hence an appropriate selection is needed for a specific task [90]. For agricultural applications, most architectures are pre-trained on a larger dataset, a process known as transfer learning [31, 128], which greatly reduces training time. The two most common types of detectors are one-stage and two-

Fig. 2 The basic architecture of the convolutional neural network: an input layer, convolution and pooling layers performing feature extraction, and fully connected and output layers performing classification

stage detectors. The difference between the two is a trade-off between detection speed and accuracy. One-stage detectors achieve higher detection speeds than two-stage detectors [63]; they include the single shot detector (SSD), the you-only-look-once (YOLO) models, and RetinaNet, to name a few [87]. Two-stage detectors achieve higher detection accuracy than one-stage detectors [26]; they include the region-based convolutional neural network (R-CNN), Fast R-CNN, Faster R-CNN, and Mask R-CNN, among others [102]. The five basic steps for the development of a UAV agricultural monitoring application are the UAV platform, data collection, the deep learning model, performance evaluation, and the agricultural application (Fig. 3).
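A brief sketch of the transfer-learning step described above, assuming PyTorch/torchvision: an ImageNet-pretrained backbone is frozen and only a new classification head is fine-tuned on (hypothetical) agricultural image crops. The class count, learning rate, and data are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone and fine-tune it for a small
# agricultural dataset (e.g. "pest" vs "no pest"); the class count is illustrative.
num_classes = 2
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for param in model.parameters():          # freeze the pretrained feature extractor
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)   # new classification head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a mini-batch of drone image crops."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```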

UAV Platform
The three most common types of drones used for agricultural applications are rotary-wing, fixed-wing, and fixed-wing hybrid vertical take-off and landing (VTOL) platforms, each with different advantages and disadvantages. Rotary-wing drones are capable of flying at low altitude, can hover, and are highly manoeuvrable, hence they are beneficial for image-related agricultural monitoring; their greatest challenge is low endurance due to high power usage. Fixed-wing drones have high endurance and payload capability, which can be used for agricultural pesticide spraying. The fixed-wing hybrid combines rotary- and fixed-wing characteristics and holds the attributes needed for agricultural applications, such as hovering, high endurance, and better manoeuvrability.

Data Collection
The images used for the development of deep learning agricultural monitoring systems are obtained by remote sensing from the drone. The limited public datasets of agricultural drone videos and images for the development of deep learning algorithms

Fig. 3 Deep learning agricultural monitoring application development steps include data collection through a UAV platform, deep learning model selection, model performance evaluation, and agricultural application through an intelligent robotic system

highlight the need to collect new datasets. The remote sensing specification and the drone altitude contribute strongly to the image quality and hence to the monitoring system's capabilities. Different environmental conditions, such as brightness, contrast, drone altitude, and sharpness, are taken into account to ensure the development of a robust deep learning agricultural monitoring system. The effect of drone altitude on model performance, through the feature characteristics available for extraction, is highlighted in Figs. 4 and 5.

Deep Learning Model
The selection of the deep learning algorithm used for an agricultural application depends on the research objective and the hardware capability. To ensure proper

Fig. 4 Low drone altitude captures the detailed feature characteristics needed by the deep learning model


Fig. 5 High drone altitude susceptible to limited feature characteristics

training, data augmentation and hyperparameters such as the optimizer, batch size, and learning rate are set for optimum results during model training. Data augmentation is primarily used to strengthen the size and quality of the training dataset so that a robust agricultural deep learning model can be developed [113]; since collecting a training dataset tends to be expensive, data augmentation also provides a way to enlarge a limited training dataset [23]. The hyperparameters are mainly used to fine-tune the deep learning models for improved performance [83].

Performance Evaluation
The four fundamental evaluation elements are graphically represented in Fig. 6. These elements are used to compute the performance evaluation metrics: precision, recall, F1 score, and accuracy. Equations (1)–(6) present their mathematical definitions; other metrics commonly used in research studies are average precision (AP) and mean average precision (mAP). The fundamental evaluation elements are defined as follows. True positive (TP): the model correctly detects a true object. True negative (TN): the model correctly identifies the absence of an object. False positive (FP): the model detects an object that is not present. False negative (FN): the model fails to detect a true object. The performance evaluation metrics are as follows:

Precision\ (P) = \frac{TP}{TP + FP}   (1)

Recall\ (R) = \frac{TP}{TP + FN}   (2)


Fig. 6 Basic confusion matrix for model evaluation

                          Model class: Positive    Model class: Negative
Actual class: Positive    True Positive (TP)       False Negative (FN)
Actual class: Negative    False Positive (FP)      True Negative (TN)

F1\ Score = \frac{2PR}{P + R}   (3)

Accuracy\ (A) = \frac{TP + TN}{TP + TN + FP + FN}   (4)

Average\ Precision\ (AP) = \sum_{n} \big(Recall_n - Recall_{n-1}\big)\, Precision_n   (5)

Mean\ Average\ Precision\ (mAP) = \frac{1}{N} \sum AP   (6)
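A small Python sketch of Eqs. (1)-(5), computing the metrics directly from confusion-matrix counts and a sampled precision-recall curve (the counts in the example are arbitrary):

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts (Eqs. 1-4)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

def average_precision(recalls, precisions):
    """AP as the area under a sampled precision-recall curve (Eq. 5)."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

# Example: 80 correct detections, 10 false alarms, 20 missed objects.
print(detection_metrics(tp=80, fp=10, fn=20, tn=0))
print(average_precision([0.2, 0.5, 0.8], [1.0, 0.9, 0.7]))
```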

Agricultural Applications
The agricultural applications reviewed here cover both plant and animal monitoring. A graphical representation of these UAV monitoring systems is given in Fig. 7.

3 Findings on Applications of Deep Learning Models in Plant Monitoring The traditional agricultural monitoring systems to tackle challenges such as changing climate, food security, growing population had been established to be insufficient to address these requirements [118]. The recent advancement of automated UAV monitoring systems with deep learning models had proven to improve plant yield efficiently and minimum environmental damage [134]. Plant monitoring at different growth stages is an important attribute to ensure combating plant problems in time for

A Review on Deep Learning on UAV Monitoring Systems …

343

Pest Infiltration Plant Growth Plant/Crop Fruit Conditions Monitoring Weed Invasion Drone Monitoring

Crop Disease Detection

Systems

Livestock Livestock

Detection

Monitoring

Livestock Counting

Fig. 7 Agricultural applications

better management strategy [29]. It is vital for farmers to have proper yield estimates to enable preparation for harvesting and market supply projections [9]. Pest outbreaks in crops are unpredictable, continuous monitoring of them to prevent crop losses are of adamant importance [42]. The capability of pest detection in near real-time aids to make immediate and appropriate decision on time [25]. The implementation of deep learning algorithms with drones contribute to appropriate decision making at the right time for prompt plant monitoring tactics [125]. It has been established to increase agricultural productivity as well as efficiency and save costs [29]. The study conducted by [101] addressed insect monitoring through automated detection to protect soft-skinned fruits, it was established that to be cost-effective, less timeconsuming and labour-demanding as compared to the existing monitoring methods.

3.1 Pest Infiltration Pest infiltration greatly impacts the overall plant production yield [70]. They can cause a production yield loss of roughly 20–40% [49]. Due to the gradual climate change over the years pest outbreaks are common [84]. The usage of deep learning techniques with drone to combat pests in plants is a practical approach [120]. The study conducted by [25] automatically detected pest through deep learning technique (YOLO v3 tiny) near real-time. This capability enhanced the drone to spray the pest to the appropriate location thereby ensuring less destruction and improving production

344

T. Petso and R. S. Jamisola Jr

yield. The capability to detect, locate and quantify pest distribution with the aid of drones and deep learning techniques eliminates human labour which is costly and time-consuming [33]. The capability to detect pests during production minimise yield loss on time [24]. Rice is one of the primary agricultural produce in Asian countries, an effective monitoring is vital to ensure quality and high production yield [17, 58]. Table 1 highlight the advantages and disadvantages of a summary of studies with the use of UAV monitoring systems for automatic pest detection with the application deep learning models. The capability of deep learning models yield capability of near real-time plant monitoring. The mobile hardware required needed to achieve these needs a high graphical processing unit (GPU) to improve the model performance. Deep learning models hold great opportunities for better feature extraction capabilities vital for pest detection. An increase drone altitude and occlusion reduced the capability of automatic pest detection.

3.2 Plant Growth Agriculture has a great impact to the human survivability. Agricultural yield production during plant growth is ideal to ensure improvement for food planning and security. The ability to acquire agricultural data at different growth stages are vital for better farm management strategies [28]. The capability of near real-time monitoring for seedling germination is vital for early crop yield estimation [48]. The determination of plant flowing is vital for agricultural monitoring purposes [3]. It is also essential to establish the plant maturity at the appropriate time for decision-making purposes [79]. Thus, enabling yield prediction for better field management. The use of deep learning models with drones had been established to save costs and time as compared to the physical traditional approach [139]. The study conducted by [79] highlighted the effectiveness of saving time and costs for estimating the maturity of soybeans over time. The ability to attain pre-harvest information such as product number and size is important for management decision-making and marketing planning [127]. The conventional maize plant growth monitoring was established to be time demanding and labour intensive as compared to the use of deep learning with drones [88]. A study by [45] highlighted the capability of automatic oil palm not accessible to human to establish if they are ready for harvest or not. Table 2 highlights the capability of plant growth monitoring at different stages such as germination, flowering, immaturity and maturity for UAV agricultural applications. The capabilities of plant growth monitoring aids an effective approach to early decision making needed to ensure high production yield and better agricultural planning. Figure 8 demonstrates the capability of automatic seedling detection with lightweight deep learning algorithm from a UAV platform. Some factors such as occlusion due to plant growth, environmental background conditions, and environment thickness to name a few impact the overall performance of UAV monitoring systems. Hardware with graphic processing units are needed for the capability to aid near real-time essential for appropriate needed action.


Table 1 Advantages and disadvantages of UAV based pest infiltration monitoring applications

Main focus | Pests | Advantages | Disadvantages | References
Rice pest detection | Leafhopper, Insect, Arthropod, Invertebrate, Bug | Instant evaluation and prevention of rice damage in a timely manner | Requires an internet platform | [17]
Coconut trees pest detection | Rhinoceros beetle, Red palm weevil, Black headed caterpillar, Coconut Eriophyid mite, Termite | Immediate pest monitoring capability | High rate Wi-Fi required | [24]
Longan crop pest detection | Tessaratoma papillosa | Approach holds near real-time capability with optimum pest location and distribution | Limited power for GPU systems is a challenge | [25]
Maize pest detection | Spodoptera frugiperda | Early detection of infected maize leaves in time | Pest damage could not be detected on some leaves under occlusion | [33]
Maize leaves affected by fall armyworms | Fall armyworm | Capability of near real-time results | Challenge to establish exact location for future reference | [43]
Detect and measure leaf-cutting ant nests | Insects | Deep learning models (YOLOv5s) outperformed the traditional multilayer perceptron neural network | Hardware with high memory (GPU) needed | [106]
Soybean crop pest detection | Acrididae, Anticarsia gemmatalis, Coccinellidae, Diabrotica speciosa, Edessa meditabunda, Euschistus heros adult, Euschistus heros nymph, Gastropoda, Lagria villosa, Nezara viridula adult, Nezara viridula nymph, Spodoptera spp. | Deep learning approach outperformed the other feature extraction models | Pest detection challenge at higher drone altitudes | [120]


Table 2 Advantages and disadvantages of plant growth monitoring applications

Main focus | Stage | Advantages | Disadvantages | References
Maize growth | Flowering | Feasibility of automatic tassel detection | Challenge for thin tassel detection | [3]
Sea cucumber distribution | Population growth | The ability to monitor sea cucumber efficiently over a large piece of land | Challenge for sea cucumber detection in a dense environment | [65]
Peanut seedling detection and counting | Seedling emergence | The approach saved approximately 80% of the total time to count germinated seedlings as compared to human effort | Requires GPU hardware to achieve near real-time capability | [67]
Cotton seedling detection and count | Seedling emergence | Positive capability for accurate and timely cotton detection and count | Environmental conditions such as soil background, lighting, clouds and wind affect deep learning model performance | [68]
Soybean maturity | Maturity | The approach aided decision making on soybean maturity over time | Weather or soil conditions can lead to discrepancy in results | [79]
Individual tree | Olive tree biovolume estimation | Positive estimation of biovolume for automatic tree growth monitoring | Fine resolution provided better detection accuracy as compared to coarse resolution | [104]
Rice phenology | Germination, leaf development, tillering, stem elongation, inflorescence emergence and heading, flowering and anthesis, development of fruit, ripening | Estimated harvest time corresponded well with expected harvest time | Challenge for detection of rice phenology at early stage | [133]


Fig. 8 Sample seedling detection from a UAV platform with custom YOLO v4 tiny

3.3 Fruit Conditions

The capability to estimate fruit yield and location is important for farm management and planning purposes [45]. The combination of drones with deep learning algorithms has been established as an effective method to locate the ideal areas to pick fruits [64], whereas the traditional approach has been determined to be time-consuming and labour demanding [130]. This approach also enhances fruit detection in challenging terrain. The study conducted by [8] established that automatic detection, counting and sizing of citrus fruits on individual trees aided yield estimation. Another study by [129] highlighted the capability of automatic mango detection and estimation using a UAV monitoring system with deep learning algorithms. The study conducted by [50] showed melon detection, size estimation and localisation through a UAV monitoring system to be less labour intensive. Table 3 highlights the positive capabilities and research gaps of UAV monitoring systems using deep learning algorithms for fruit condition monitoring. The advantages highlighted include promising automatic counting accuracy compared to the manual approach. Figure 9 illustrates the capability of automatic ripe lemon detection needed for harvest planning purposes. High errors are contributed by the fruit tree canopy during fruit detection and counting for yield estimation [78]. Environmental conditions such as occlusion and lighting variation persist as challenges to fruit condition monitoring.
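Fruit size estimates of the kind reported in [8, 50] ultimately rest on relating pixels to metres. The sketch below shows a common back-of-the-envelope conversion via the ground sampling distance (GSD); the camera parameters, flight altitude and example bounding box are hypothetical, and the nadir-view approximation ignores canopy height.

```python
# Illustrative sketch: converting a detected fruit bounding box into an
# approximate physical diameter using the ground sampling distance (GSD).
# All numeric values are hypothetical, not from the reviewed studies.

def ground_sampling_distance(sensor_width_mm: float, focal_length_mm: float,
                             altitude_m: float, image_width_px: int) -> float:
    """Metres covered by one pixel on the ground (nadir view approximation)."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)

def fruit_diameter_cm(box_width_px: float, box_height_px: float, gsd_m: float) -> float:
    """Approximate fruit diameter as the mean of the box sides, in centimetres."""
    return 0.5 * (box_width_px + box_height_px) * gsd_m * 100.0

gsd = ground_sampling_distance(sensor_width_mm=13.2, focal_length_mm=8.8,
                               altitude_m=20.0, image_width_px=5472)
print(f"GSD: {gsd * 100:.2f} cm/pixel")
print(f"Estimated diameter: {fruit_diameter_cm(14, 15, gsd):.1f} cm")
```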

3.4 Weed Invasion

The presence of weeds poses a great challenge to the overall agricultural production yield [80]. Weeds are capable of causing a production loss of as much as 45% of the yield [99].


Table 3 Advantages and disadvantages of fruit conditions monitoring applications

Main focus | Advantages | Disadvantages | References
Fruit size and yield estimation | Positive yield estimation of citrus fruits with average standard error of 6.59% | Lighting conditions and distance to the citrus fruit, to name a few, affected the detection capability | [8]
Strawberry yield prediction | Promising strawberry forecast monitoring system with counting accuracy of 84.1% | Misclassification of mature and immature strawberries with dead leaves; occlusion still exists for strawberries under leaves | [27]
Detection of oil palm loose fruits | The capability of lightweight models aids fruit detection at lower costs | Some fruits are subject to occlusion | [47]
To detect and locate longan fruits | Capable of establishing an accurate location to pick longan fruits with a total time of 13.7 s | Occlusion from branches and leaves hampered the detection capability | [64]
Detection of green mangoes | An effective mango yield estimation approach with estimation error rate of 1.1% | Lighting variation affected the mango detection accuracy | [129]
Strawberry monitoring | Efficient and effective approach to monitor strawberry plant growth | False classification and missed strawberry detection still exist | [135]
Strawberry maturity monitoring | Automatic detection of strawberry flowering, immature and mature fruits | Strawberry flowers and fruits under leaves were not detected | [139]

Weeds, also known as unwanted crops, compete in the agricultural field for needed resources such as sunlight, water, soil nutrients and growing space [107]. An effective approach to weed monitoring is vital for appropriate farm management [66]. Early weed detection is important to improve agricultural productivity and ensure sustainable agriculture [86]. The use of herbicides to combat weeds has negative consequences, such as destruction of the surrounding environment and harm to human health [11]. Applying an appropriate quantity of herbicides and pesticides based on correct weed identification and location is therefore an important factor [41, 56]. Drones provide the capability for precise localisation and appropriate chemical usage [56]. The ability to detect weeds in near real-time is vital for farm management [76, 124].


Fig. 9 Automatic ripe lemon detection from a UAV platform with custom YOLO v4 tiny

The traditional approach to weed detection is manual, labour intensive and time-consuming [85]. Table 4 highlights the advantages and disadvantages of UAV monitoring systems for weed detection through deep learning models. Weed detection with deep learning models is an effective approach to early weed management strategies that help ensure a high agricultural production yield. From the reviewed studies, some of the challenges encountered include high drone altitude, lighting conditions, and automatic detection of weed species not included during deep learning model training.
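One way the precise localisation capability mentioned above can feed targeted spraying is to aggregate weed detections into a coarse grid and only treat cells above a density threshold. The sketch below is a simplified stand-in for such a spray map; the grid size, threshold and example boxes are hypothetical, not from the reviewed studies.

```python
# Illustrative sketch: aggregating weed detections into a coarse grid so a
# sprayer only treats cells above a weed-density threshold.
from collections import Counter

def spray_map(weed_boxes, image_w, image_h, grid=(8, 8), min_hits=3):
    """Return the set of (row, col) grid cells with at least `min_hits` weed detections."""
    counts = Counter()
    cell_w, cell_h = image_w / grid[1], image_h / grid[0]
    for (x, y, w, h) in weed_boxes:
        cx, cy = x + w / 2, y + h / 2          # box centre in pixels
        col, row = int(cx // cell_w), int(cy // cell_h)
        counts[(row, col)] += 1
    return {cell for cell, n in counts.items() if n >= min_hits}

# Hypothetical detections (x, y, width, height) from one orthophoto tile.
boxes = [(120, 80, 30, 30), (140, 95, 25, 25), (150, 70, 20, 20), (900, 600, 40, 40)]
print(spray_map(boxes, image_w=1024, image_h=1024))  # -> {(0, 1)} with these values
```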

3.5 Crop Disease Monitoring

Crop diseases have been established to hamper the production yield, hence the need for extensive crop monitoring [19, 119, 138]. They also have an economic impact on agricultural production in terms of quality and quantity [55]. Automatic crop disease detection is critically important for crop management and efficiency [54]. Early and correct crop disease detection aids plant management strategies in a timely manner and helps ensure a high production yield [52, 62]. Early disease symptoms are likely to look similar, and proper classification is vital to tackle them [1, 35]. Plant diseases have been established to affect the continuity of the food supply. The study by [117] used Mask R-CNN for successful automatic detection of northern leaf blight, which affects maize. The traditional manual methods of disease identification have been established to be time consuming and labour demanding compared to drones [60], and they are also susceptible to human error [46]. Figure 10 presents a visualisation of a crop disease symptom vital for monitoring purposes. The advantages and disadvantages of crop disease monitoring are highlighted in Table 5.
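For readers unfamiliar with instance segmentation of the kind used in [117], the sketch below shows what Mask R-CNN inference looks like with an off-the-shelf torchvision model (version 0.13 or later assumed). The COCO-pretrained weights and the input file name are placeholders; the cited study trained its own model on annotated maize imagery.

```python
# Illustrative sketch of instance segmentation for leaf-lesion style detection
# with a stock Mask R-CNN from torchvision. The pre-trained COCO weights are
# only a stand-in for a model trained on labelled crop-disease imagery.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("uav_canopy_tile.jpg").convert("RGB")  # hypothetical file name
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]

# Keep confident instances; each has a box, a label and a per-pixel mask.
keep = outputs["scores"] > 0.5
for box, mask in zip(outputs["boxes"][keep], outputs["masks"][keep]):
    lesion_area_px = (mask[0] > 0.5).sum().item()  # pixels covered by the mask
    print(f"instance at {box.tolist()} covering {lesion_area_px} pixels")
```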


Table 4 Advantages and disadvantages of weed invasion monitoring applications

Main focus | Weed | Advantages | Disadvantages | References
Weed detection based on inter-row detection | Species name not provided | Capability of automatic weed localisation | Uneven plant spacing hampered the model performance as weeds are identical to plants | [11]
Crop and weed classification | Matricaria chamomilla L., Papaver rhoeas L., Veronica hederifolia L., Viola arvensis ssp. arvensis | Promising classification capability | Limited capability to classify unknown species | [22]
Weed detection and localisation | – | A cost-effective approach for weed detection and location estimation | Demanding task of labelling the dataset for the growing season | [44]
Automatic near real-time rice weed detection | Rice weed | Aids the capability of near real-time weed detection | Different drone altitudes and angles under various light conditions affect detection capability | [61]
Detection of hogweed | Heracleum sosnówskyi | Capability for hogweed identification on an embedded system (NVIDIA Jetson Nano) in near real-time to combat weed spreading in a timely manner | Power consumption of the embedded system is sensitive to input voltage | [76]
Weed estimation | Species name not provided | Capability of automatic plant detection and weed estimation | With an increase in area coverage the deep learning model (YOLO) overestimated the weed distribution | [86]
Weed detection and location in wheat field | Avena fatua, Cirsium setosum, Descurainia sophia, Euphorbia helioscopia, Veronica didyma | Capability of near real-time automatic detection with lightweight models on mobile devices | Lightweight models have lower detection accuracy as compared to full models | [137]


Fig. 10 Discolouration at the edges of the vegetable leaves

4 Findings on Applications of Deep Learning Models in Animal Monitoring

Drones have been established as an effective approach to monitoring livestock remotely, compared to stationary cameras, for farm management [13, 69]. Their combination with deep learning techniques has shown positive outcomes for automatic livestock identification [100]. They also make it feasible to monitor animals for various purposes such as behaviour, health and movement patterns, which is a critical element of proper farm management [12, 91, 98]. They are a time-saving and cost-effective alternative to traditional approaches such as horse riding or driving around the farm for frequent visual animal monitoring [2, 14, 16]. Continuous animal monitoring is deemed a complex task, and alternative approaches such as GPS devices are expensive [123]. The study conducted by [6] on automatic individual cattle identification using a drone and deep learning yielded a promising proof-of-concept for agricultural automation of cattle production monitoring in near real-time. Another study by [4] highlighted that deep learning techniques can be implemented for automatic cattle counting that is less labour demanding than the manual approach.


Table 5 Advantages and disadvantages of crop disease monitoring applications

Main focus | Crop disease | Advantages | Disadvantages | References
Automatic detection of Esca disease | Esca | Efficient monitoring approach | Data preprocessing required a human with expertise to label data for training purposes | [52]
Automatic detection of vine disease | Mildew | Monitoring approach incorporates visible spectrum, infrared spectrum and depth mapping | Colour variations slightly affect the model performance | [53]
Automatic detection of mildew disease | Mildew | Potential early monitoring under visible and infrared spectrum | Limited training dataset | [54]
Identification of affected okra plants | Cercospora leaf spot | Cost-effective approach to okra disease plant monitoring | Drone motion highlighted to hamper the detection accuracy | [97]
Detection and classification of disease-infected wheat crops | Brown rust and yellow rust | Capability to automatically detect diseased individual leaves | Colour variation and similar background hampered detection capability | [103]
Automatic soybean leaf disease identification | Powdery mildew | Efficient approach to monitor soybean disease conditions | Different lighting and background variation can hamper identification capabilities | [119]
Automatic detection of yellow rust disease | Yellow rust | Eliminates the manual approach and skilled personnel | Low possibility of misclassification still exists | [138]


Table 6 UAV based deep learning models used for plant monitoring applications

Deep learning | UAV platform | Application | Findings | Performance | References
YOLO v3, YOLO v3 tiny | Self-assembled APD-616X | Pest detection, pest location | Efficient pesticide usage | mAP: YOLO v3: 93.00%, YOLO v3 tiny: 89.00%; Speed: YOLO v3: 2.95 FPS, YOLO v3 tiny: 8.71 FPS | [25]
ResNeSt-50, ResNet-50, EfficientNet, RegNet | DJI Mavic Air 2 | Pest detection, pest location, pest quantification | Effective pest monitoring approach | Validation accuracy: ResNeSt-50: 98.77%, ResNet-50: 97.59%, EfficientNet: 97.89%, RegNet: 98.07% | [33]
VGG-16, VGG-19, Xception v3, MobileNet v2 | DJI Phantom 4 Pro | Pest damage | Effective approach to increase crop yield | Accuracy: VGG-16: 96.00%, VGG-19: 93.08%, Xception v3: 96.75%, MobileNet v2: 98.25% | [43]
YOLO v5xl, YOLO v5l, YOLO v5m, YOLO v5s | DJI Phantom 4 Adv | Ant nest pest detection | Precise monitoring approach | Accuracy: YOLO v5xl: 97.62%, YOLO v5l: 98.45%, YOLO v5m: 97.89%, YOLO v5s: 97.65% | [106]
Inception-V3, ResNet-50, VGG-16, VGG-19, Xception | DJI Phantom 4 Adv | Pest control | Alternative pest monitoring strategies | Accuracy: Inception-V3: 91.87%, ResNet-50: 93.82%, VGG-16: 91.80%, VGG-19: 91.33%, Xception: 90.52% | [120]
Faster R-CNN, CNN | DJI Phantom 3 Pro | Maize tassel detection | Positive productivity growth monitoring | F1 Score: Faster R-CNN: 97.90%, CNN: 95.90% | [3]
VGG-16 | eBee Plus | Crop identification | Useful crop identification | F1 Score: 86.00% | [28]
YOLO v3 | DJI Phantom 4 Pro | Sea cucumber detection, sea cucumber density | Successful growth density estimation | mAP: 85.50%, Precision: 82.00%, Recall: 83.00%, F1 Score: 82.00% | [65]
CenterNet, MobileNet | DJI Phantom 4 Pro | Cotton stand detection, cotton stand count | Feasibility of early seedling monitoring | mAP: CenterNet: 79.00%, MobileNet: 86.00%; Average precision: CenterNet: 73.00%, MobileNet: 72.00% | [68]
Faster R-CNN | DJI Phantom 3 Pro | Citrus grove detection, count estimation, size estimation | Positive yield estimation | SE: 6.59% | [8]
R-CNN | DJI Phantom 4 Pro | Apple detection, apple estimation | Effective approach for yield prediction | F1 Score > 87.00%, Precision > 90.00% | [9]
RetinaNet | DJI Phantom 4 Pro | Melon detection, melon number estimation, melon weight estimation | Successful yield estimation | Precision: 92.00%, F1 Score > 90.00% | [50]
FPN, SSD, YOLO v3, YOLO v4, MobileNet-YOLO v4 | DJI Jingwei M600 Pro | Longan fruit detection, longan fruit location | Effective fruit detection | mAP: FPN: 54.22%, SSD: 66.53%, YOLO v3: 72.56%, YOLO v4: 81.72%, MobileNet-YOLO v4: 89.73% | [64]
YOLO v2 | DJI Phantom 3 | Mango detection, mango estimation | Effective approach for mango estimation | mAP: 86.40%, Precision: 96.10%, Recall: 89.00% | [129]
YOLO v3 | DJI Phantom 4 Pro | Flower detection, immature fruit detection, mature fruit detection | Effective approach for yield prediction | mAP: 88.00%, AP: 93.00% | [139]
YOLO v3 | DJI Matrice 600 Pro | Monocot weed detection, dicot weed detection | Capability of weed detection in the field | AP monocot: 91.48%, AP dicot: 86.13% | [32]
FCNN | DJI Phantom 3, DJI Mavic Pro | Hogweed detection, embedded devices | Positive results for hogweed detection | ROC AUC: 96.00%, Speed: 0.46 FPS | [76]
Faster R-CNN, SSD | DJI Matrice 600 Pro | Weed detection | Weed monitoring | Precision: Faster R-CNN: 65.00%, SSD: 66.00%; Recall: Faster R-CNN: 68.00%, SSD: 68.00%; F1 Score: Faster R-CNN: 66.00%, SSD: 67.00% | [124]
YOLO v3 tiny | DJI Phantom 3 | Weed detection | Effective approach for weed detection in wheat field | mAP: 72.50% | [137]
SegNet | Quadcopter (Scanopy) | Mildew disease detection | Promising disease detection in grapes | Accuracy: grapevine-level > 92%, leaf-level > 87% | [54]
SqueezeNet, ResNet-18 | Quadcopter (customised) | Cercospora leaf spot disease detection | Promising disease detection | Validation accuracy: SqueezeNet: 99.10%, ResNet-18: 99.00% | [97]
DCNN | DJI S1000 | Yellow rust detection | Crop disease monitoring | Accuracy: 85.00% | [138]
Inception-V3, ResNet-50, VGG-19, Xception | DJI Phantom 3 Pro | Leaf disease detection | Crop disease monitoring | Accuracy: Inception-V3: 99.04%, ResNet-50: 99.02%, VGG-19: 99.02%, Xception: 98.56% | [119]

[DCNN—Deep Convolutional Neural Network; Faster R-CNN—Faster Region-based Convolutional Neural Network; FCNN—Fully Convolutional Neural Network; FPN—Feature Pyramid Network; VGG-16—Visual Geometry Group 16; VGG-19—Visual Geometry Group 19; YOLO v2—You Only Look Once version 2; YOLO v3—You Only Look Once version 3; YOLO v4—You Only Look Once version 4; YOLO v5—You Only Look Once version 5; R-CNN—Region-based Convolutional Neural Network; CNN—Convolutional Neural Network; SSD—Single Shot Detector; SE—Standard Error]

4.1 Animal Population

Farm management is deemed a demanding task on a large piece of land [15]. The traditional approach to livestock counting is mainly manual and labour demanding, and it is susceptible to errors [108]. Farms are most likely to be in rural areas with limited personnel to perform frequent animal counts [37]. Drones have been investigated to enhance animal population management [131]. The study conducted by [123] investigated an alternative approach for automatic detection of animals and their activities in their natural habitat using a drone. Figure 11 illustrates automatic identification of sheep from a UAV platform with a deep learning algorithm. Low drone altitudes carry a high probability of altering animal behaviour as well as providing minimal animal coverage [89, 123]. The study conducted by [109] highlighted that increasing the drone altitude from 80 to 120 m increased sheep coverage, whereas the capability to detect sheep contours was hampered.

Fig. 11 Sample sheep detection from a UAV image with custom YOLO v4 tiny model


The study conducted by [21] highlighted great unease in a flock of sheep towards the presence of a drone compared to the presence of dogs and humans. Other challenges hampering sheep detection capability included vegetation growth and old trees. Environmental conditions such as high winds or rain also affect the deployment of a drone. The advantages and disadvantages of the reviewed studies are highlighted in Table 7.
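A recurring problem in the counting studies summarised below is double counting of animals that appear in several overlapping frames. The sketch that follows illustrates one simple deduplication strategy based on overlap (IoU) of ground-projected boxes; it is a simplified stand-in, not the method of any cited study, and the example boxes are hypothetical.

```python
# Illustrative sketch: avoiding double counting of animals detected in
# overlapping, georeferenced UAV frames by merging detections whose ground
# footprints overlap strongly (IoU above a threshold).
def iou(a, b):
    """Intersection over union of two ground-plane boxes (x1, y1, x2, y2) in metres."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def count_unique_animals(detections, iou_threshold=0.5):
    """Greedy merge: a detection is a duplicate if it overlaps an already kept one."""
    kept = []
    for box in detections:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return len(kept)

# Hypothetical ground-projected boxes from two overlapping frames (metres).
frame_1 = [(0.0, 0.0, 1.8, 2.4), (5.0, 5.0, 6.8, 7.4)]
frame_2 = [(0.1, 0.1, 1.9, 2.5), (10.0, 3.0, 11.8, 5.4)]
print(count_unique_animals(frame_1 + frame_2))  # -> 3 with these values
```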

Table 7 Advantages and disadvantages of animal identification and population count monitoring applications

Main focus | Advantages | Disadvantages | References
Sheep detection and counting | Online approach yielded promising results immediately as compared to offline | Online approach used more power as compared to offline | [2]
Individual cattle identification | Capability of non-intrusive cattle detection and individual identification | Challenge of false positives exists in cases such as multi-cattle alignment and cattle with similar features on the coat | [5]
Aerial cattle monitoring | Several deep learning algorithms highlighted the capability of livestock monitoring | Challenging conditions such as blurred images hampered monitoring | [13]
Aerial livestock monitoring | Automatic sheep detection to aid sheep counting | Challenges such as sheep in close contact for individual identification; sheep under trees and bushes could not be detected | [109]
Cattle detection and count | Capability for cattle management for grazing purposes | The model performance decreases with fast moving animals | [111]
Detecting and counting cattle | Capability of cattle counting by deleting duplicate animals | Challenge of cattle movement hampers cattle counting capability | [115]
Livestock detection for counting capability | Capability of livestock monitoring based on mask detection | Challenge with overestimation due to limited training images | [132]


5 Discussion and Comparison of Deep Learning Strategies in Agricultural Applications

The capability of effective agricultural monitoring is vital to ensure sustainable food security for a growing human population under a changing climate. Recent advancements in openly available deep learning models and readily available drones have opened up research into automatic agricultural monitoring applications. Tables 6 and 8 show that the reviewed studies primarily (over 91%) used off-the-shelf drones rather than self-assembled or customised drones. The advantages and disadvantages of different deep learning models for different agricultural applications on UAV monitoring platforms are summarised under each subsection: pest infiltration, plant growth, fruit conditions, weed invasion, crop disease, and animal identification and count. Deep learning strategies on a UAV monitoring system greatly contribute to efficient and effective agricultural outputs; they reduce skilled human involvement and the time needed for immediate decision-making.

From the reviewed studies, transfer learning is a widely used approach across the different agricultural deep learning models. Model size in terms of layers, power consumption and inference speed are the deciding factors in deep learning architecture selection, and recent architecture versions generally outperform older versions.

An effective pest monitoring strategy is vital for localisation and appropriate pesticide usage to combat plant pests. Pest detection can also be interpreted through the extent of the damage, and other indicators include ant nests. Identification of the pest location is essential so that pesticides are applied to the appropriate site to save time and costs. Based on the reviewed studies, the investigated deep learning models yielded a detection accuracy greater than 90.00%; the highest accuracy was achieved by ResNeSt-50 with 98.77%. A possible reason is its high number of layers for the feature extraction needed for better pest detection. It is vital to investigate different deep learning models to identify the optimum one with respect to performance evaluation. The study conducted by [106] highlighted that significantly increasing the number of convolutional kernels and hidden layers in the model architecture increases model accuracy only up to a certain degree: YOLO v5xl, with more convolutional kernels and hidden layers, had a lower accuracy than YOLO v5l.

Plant growth monitoring at different stages aids better agricultural management, such as appropriate and timely interventions to ensure a high production yield. Early, timely intervention to replenish unsuccessfully germinated crops helps catch up to the appropriate plant density, thus increasing the probability of a high production yield. The later stages of plant growth monitoring aid harvest estimation, which helps with market decisions, manpower estimation and equipment requirements. Plant growth is not significant within a day; thus, two stage detectors are suitable for this application. Lightweight and non-lightweight deep learning models were investigated in the reviewed studies. The two stage detectors attained high model performance in terms of F1 score, with Faster R-CNN at 97.90%.
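Since transfer learning is the dominant training strategy across the reviewed studies, the sketch below illustrates the general recipe: start from an ImageNet-pretrained backbone and retrain only a new classification head. It assumes torchvision 0.13 or later, and the class labels and hyperparameters are placeholders rather than settings from any cited study.

```python
# Minimal transfer-learning sketch in the spirit of the reviewed studies:
# reuse a pre-trained backbone and train only a small task-specific head.
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 4  # e.g. healthy, pest damage, disease, weed (placeholder labels)

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
for param in model.parameters():          # freeze the pre-trained backbone
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of labelled UAV image crops."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch just to show the call signature; real training would iterate
# over a DataLoader built from labelled UAV imagery.
print(train_step(torch.randn(8, 3, 224, 224), torch.randint(0, NUM_CLASSES, (8,))))
```

Freezing the backbone keeps training fast on the small labelled datasets typical of UAV studies; unfreezing the last backbone stage is a common refinement once the new head has converged.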


Table 8 UAV based deep learning models used for animal monitoring applications

Deep learning | UAV platform | Application | Findings | Performance | References
YOLO | Self-assembled drone | Sheep detection, sheep counting | Promising on-board system | Accuracy: offline processing: 60.00%, online pre-processing: 89.00%, online processing: 97.00% | [2]
LRCN | DJI Inspire Mk1 | Cattle detection, single frame individual identification, video based individual identification | Non-intrusive | mAP: cattle detection: 99.30%, single frame individual: 86.07%; Accuracy: video based individual: 98.13% | [5]
YOLO v2, Inception v3 | DJI Matrice 100 | Individual cattle identification | Practical biometric identification | Accuracy: YOLO v2: 92.40%, Inception v3: 93.60% | [6]
CNN | DJI Phantom 4 Pro, DJI Mavic 2 | Cattle detection, cattle counting | Effective and efficient approach | Accuracy > 90.00% | [15]
YOLO v3, Deep SORT | DJI Tello | Cattle detection, cattle counting | Improved livestock monitoring | Not provided | [37]
YOLO v2 | DJI Phantom 4 | Cattle detection, cattle counting | Positive cattle grazing management | Precision: 95.70%, Recall: 94.60%, F1 Score: 95.20% | [111]
VGG-16, VGG-19, ResNet-50 v2, ResNet-101 v2, ResNet-152 v2, MobileNet, MobileNet v2, DenseNet 121, DenseNet 169, DenseNet 201, Xception v3, Inception ResNet v2, NASNet Mobile, NASNet Large | DJI Phantom 4 Pro | Canchim cattle detection | Promising results for detection | Accuracy: VGG-16: 97.22%, VGG-19: 97.30%, ResNet-50 v2: 97.70%, ResNet-101 v2: 98.30%, ResNet-152 v2: 96.70%, MobileNet: 98.30%, MobileNet v2: 78.70%, DenseNet 121: 85.20%, DenseNet 169: 93.50%, DenseNet 201: 93.50%, Xception v3: 97.90%, Inception ResNet v2: 98.30%, NASNet Mobile: 85.70%, NASNet Large: 99.20% | [13]
Mask R-CNN | DJI Mavic Pro | Cattle detection, cattle counting | Potential cattle monitoring | Accuracy: pastures: 94.00%, feedlot: 92.00% | [131]
Mask R-CNN | DJI Mavic Pro | Livestock classification, livestock counting | Effective approach for livestock monitoring | Accuracy: cattle: 96%, sheep: 92% | [132]

[LRCN—Long-term Recurrent Convolutional Network; YOLO—You Only Look Once; YOLO v2—You Only Look Once version 2; YOLO v3—You Only Look Once version 3; R-CNN—Region-based Convolutional Neural Network; VGG-16—Visual Geometry Group 16; VGG-19—Visual Geometry Group 19]


Two stage detectors have been established to hold high detection accuracy due to their division of regions of interest. The disadvantage they hold is a lower detection speed compared to one stage detectors for near real-time capability. Thus, applying a two stage detector is appropriate for plant growth agricultural monitoring.

Effective fruit condition monitoring is beneficial for better agricultural decision-making in relation to fruit quantity, size, weight and degree of maturity estimation, to name a few. These capabilities are needed for the production yield prediction required for agricultural management planning such as fruit picking and market value estimation. Fruit detection can help in planning fruit picking by considering the ease or difficulty of picking and possible dangers, which helps in acquiring appropriate equipment to ensure a smooth harvest process at fruit picking time. Fruit detection is performed at different stages (flowering, immature and mature) to help decide the harvest time and ensure the maximum number of ripe fruits. The reviewed studies investigated both lightweight and non-lightweight deep learning models. This approach requires minimal time, demands less labour and is less error-prone than manual fruit monitoring. The highest performance evaluation was 91.10% precision for mango detection and estimation from the reviewed studies. Although model performance can be good, challenges such as tree occlusion and lighting variations can hamper the overall model performance.

The presence of weeds hampers plant growth, as they compete for sunlight, water, space and soil nutrients. Detecting them early and addressing them appropriately greatly contributes to better agricultural production yield. Detecting weeds is a challenging task because their characteristic features are similar to those of the crop. To improve the accuracy of weed detection, we have to increase our knowledge of the weeds expected to be associated with a particular crop. Considering that there are many types of weeds, we can concentrate on the most relevant ones for a specific crop and disregard others; this way, we can save time in deep learning model training. The highest performance evaluation for weed detection was established to be 96.00%, achieved with the FCN classification model. A possible reason for this high performance is that the FCN simplifies and speeds up feature extraction learning because it avoids dense layers in the model architecture.

Plant disease detection is commonly characterised by a change in leaf colour, such as isolated spots, widespread spots, isolated lesions or a cluster of lesions, to name a few. SqueezeNet had the highest accuracy of 99.10%, and other studies highlighted model accuracies of over 85.00%. A possible reason for this high accuracy is that the change in plant leaves provides a clear cue for detection purposes. We recommend studies on detecting fallen or broken leaves caused by an external force to help determine the plant's health.

Automatic livestock identification and counting from a UAV also encompasses minimal animal behavioural change from the presence of a drone. Most of the reviewed studies individually identified livestock for population and health monitoring; however, other studies count livestock without individual identification. The higher the drone altitude, the greater the challenge of acquiring the distinguishing features required by deep learning. There are limited studies establishing livestock response towards drones at appropriate altitudes; thus, great caution must be taken in livestock monitoring applications [40].


The highest performance evaluation for livestock detection from the reviewed studies was 99.30% mean average precision, achieved with the LRCN model. LRCN is a model approach well suited to visual features in videos, activity recognition and image classification. The incorporation of deep learning algorithms and livestock grazing monitoring capability on drones can aid an animal grazing management system. Ensuring proper grazing management is essential for agricultural sustainability and for maintaining continuous animal production [111].
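To illustrate why sequence-based models such as LRCN help with individual identification, the sketch below aggregates per-frame predictions into one video-level decision by confidence-weighted voting. This is only a simplified stand-in: an LRCN learns temporal features end to end rather than voting over frames, and the animal identifiers and confidences are hypothetical.

```python
# Simplified stand-in for video-based individual identification: aggregate
# per-frame predictions into one video-level decision by confidence-weighted
# voting, showing why sequences are more robust than single frames.
from collections import defaultdict

def video_level_identity(frame_predictions):
    """frame_predictions: list of (animal_id, confidence) pairs, one per frame."""
    weight = defaultdict(float)
    for animal_id, confidence in frame_predictions:
        weight[animal_id] += confidence
    return max(weight, key=weight.get)

# Hypothetical per-frame outputs for one tracked cow across a short clip.
predictions = [("cow_07", 0.61), ("cow_12", 0.55), ("cow_07", 0.83), ("cow_07", 0.78)]
print(video_level_identity(predictions))  # -> "cow_07"
```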

6 Conclusions

The conventional approach of manual agricultural monitoring has been established to be human skill-oriented and demanding in terms of time and labour. The UAV monitoring system in agriculture provides an automatic tool to speed up assessment and the application of intervention methods that can improve productivity. Deep learning is the most used machine learning tool in automatic agricultural applications, and its advancements provide an efficient approach to intelligent drone systems. The application of transfer learning speeds up the deep learning training process, and the reviewed studies report high performance evaluations. Near real-time capability has been established to be vital for immediate agricultural management decision making to ensure a high probability of good production yield. However, developing deep learning models with appropriate hardware, given high power consumption and the need for good internet connectivity in the field, remains an area for improvement. Innovative methods to tackle environmental conditions such as lighting variations, tree occlusion and drone altitude can also be developed to improve UAV monitoring systems. Collecting a vast, diverse amount of data under these different conditions would support the development of accurate UAV systems. Additionally, the availability of publicly shared datasets obtained from UAVs would allow different deep learning networks to be compared and would accelerate the development of better monitoring systems. The overall benefits motivate the development of an integrated, robust intelligent system for sustainable agriculture to ensure world food security.

Acknowledgements Not applicable.

References

1. Abdulridha, J., Ampatzidis, Y., Kakarla, S. C., & Roberts, P. (2020). Detection of target spot and bacterial spot diseases in tomato using UAV-based and benchtop-based hyperspectral imaging techniques. Precision Agriculture, 21(5), 955–978.
2. Al-Thani, N., Albuainain, A., Alnaimi, F., & Zorba, N. (2020). Drones for sheep livestock monitoring. In 2020 IEEE 20th Mediterranean Electrotechnical Conference (MELECON) (pp. 672–676). IEEE.


3. Alzadjali, A., Alali, M. H., Sivakumar, A. N. V., Deogun, J. S., Scott, S., Schnable, J. C., & Shi, Y. (2021). Maize tassel detection from UAV imagery using deep learning. Frontiers in Robotics and AI, 8. 4. de Andrade Porto, J. V., Rezende, F. P. C., Astolfi, G., de Moraes Weber, V. A., Pache, M. C. B., & Pistori, H. (2021). Automatic counting of cattle with faster R-CNN on UAV images. In Anais do XVII Workshop de Visão Computacional, SBC (pp. 1–6). 5. Andrew, W., Greatwood, C., & Burghardt, T. (2017). Visual localisation and individual identification of holstein friesian cattle via deep learning. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 2850–2859). 6. Andrew, W., Greatwood, C., & Burghardt, T. (2019). Aerial animal biometrics: Individual friesian cattle recovery and visual identification via an autonomous UAV with onboard deep inference. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 237–243). IEEE. 7. Anghelache, D., Persu, C., Dumitru, D., B˘altatu, C., et al. (2021). Intelligent monitoring of diseased plants using drones. Annals of the University of Craiova-Agriculture, Montanology, Cadastre Series, 51(2), 146–151. 8. Apolo, O. E. A., Guanter, J. M., Cegarra, G. E., Raja, P., & Ruiz, M. P. (2020). Deep learning techniques for estimation of the yield and size of citrus fruits using a UAV. European Journal of Agronomy: The Official Journal of the European Society for Agronomy, 115(4), 183–194. 9. Apolo-Apolo, O. E., Pérez-Ruiz, M., Martínez-Guanter, J., & Valente, J. (2020). A cloudbased environment for generating yield estimation maps from apple orchards using UAV imagery and a deep learning technique. Frontiers in Plant Science, 11, 1086. 10. Ayamga, M., Akaba, S., & Nyaaba, A. A. (2021). Multifaceted applicability of drones: A review. Technological Forecasting and Social Change, 167(120), 677. 11. Bah, M. D., Hafiane, A., & Canals, R. (2018). Deep learning with unsupervised data labeling for weed detection in line crops in UAV images. Remote Sensing, 10(11), 1690. 12. Barbedo, J. G. A., & Koenigkan, L. V. (2018). Perspectives on the use of unmanned aerial systems to monitor cattle. Outlook on Agriculture, 47(3), 214–222. 13. Barbedo, J. G. A., Koenigkan, L. V., Santos, T. T., & Santos, P. M. (2019). A study on the detection of cattle in UAV images using deep learning. Sensors, 19(24), 5436. 14. Barbedo, J. G. A., Koenigkan, L. V., & Santos, P. M. (2020). Cattle detection using oblique UAV images. Drones, 4(4), 75. 15. Barbedo, J. G. A., Koenigkan, L. V., Santos, P. M., & Ribeiro, A. R. B. (2020). Counting cattle in UAV images-dealing with clustered animals and animal/background contrast changes. Sensors, 20(7), 2126. 16. Behjati, M., Mohd Noh, A. B., Alobaidy, H. A., Zulkifley, M. A., Nordin, R., & Abdullah, N. F. (2021). Lora communications as an enabler for internet of drones towards large-scale livestock monitoring in rural farms. Sensors, 21(15), 5044. 17. Bhoi, S. K., Jena, K. K., Panda, S. K., Long, H. V., Kumar, R., Subbulakshmi, P., & Jebreen, H. B. (2021). An internet of things assisted unmanned aerial vehicle based artificial intelligence model for rice pest detection. Microprocessors and Microsystems, 80(103), 607. 18. Bhoj, S., Tarafdar, A., Singh, M., Gaur, G. (2022). Smart and automatic milking systems: Benefits and prospects. In Smart and sustainable food technologies (pp. 87–121). Springer. 19. Bouguettaya, A., Zarzour, H., Kechida, A., Taberkit, A. M. (2021). 
Recent advances on UAV and deep learning for early crop diseases identification: A short review. In 2021 International Conference on Information Technology (ICIT) (pp. 334–339). IEEE. 20. Bouguettaya, A., Zarzour, H., Kechida, A., & Taberkit, A. M. (2022). Deep learning techniques to classify agricultural crops through UAV imagery: A review. Neural Computing and Applications, 34(12), 9511–9536. 21. Brunberg, E., Eythórsdóttir, E., D`yrmundsson, Ó. R., & Grøva, L. (2020). The presence of icelandic leadersheep affects flock behaviour when exposed to a predator test. Applied Animal Behaviour Science, 232(105), 128. 22. de Camargo, T., Schirrmann, M., Landwehr, N., Dammer, K. H., & Pflanz, M. (2021). Optimized deep learning model as a basis for fast UAV mapping of weed species in winter wheat crops. Remote Sensing, 13(9), 1704.


23. Cauli, N., & Reforgiato Recupero, D. (2022). Survey on videos data augmentation for deep learning models. Future Internet, 14(3), 93. 24. Chandy, A., et al. (2019). Pest infestation identification in coconut trees using deep learning. Journal of Artificial Intelligence, 1(01), 10–18. 25. Chen, C. J., Huang, Y. Y., Li, Y. S., Chen, Y. C., Chang, C. Y., & Huang, Y. M. (2021) Identification of fruit tree pests with deep learning on embedded drone to achieve accurate pesticide spraying. IEEE Access, 9, 21,986–21,997. 26. Chen, J. W., Lin, W. J., Cheng, H. J., Hung, C. L., Lin, C. Y., & Chen, S. P. (2021). A smartphone-based application for scale pest detection using multiple-object detection methods. Electronics, 10(4), 372. 27. Chen, Y., Lee, W. S., Gan, H., Peres, N., Fraisse, C., Zhang, Y., & He, Y. (2019). Strawberry yield prediction based on a deep neural network using high-resolution aerial orthoimages. Remote Sensing, 11(13), 1584. 28. Chew, R., Rineer, J., Beach, R., O’Neil, M., Ujeneza, N., Lapidus, D., Miano, T., HegartyCraver, M., Polly, J., & Temple, D. S. (2020). Deep neural networks and transfer learning for food crop identification in UAV images. Drones, 4(1), 7. 29. Delavarpour, N., Koparan, C., Nowatzki, J., Bajwa, S., & Sun, X. (2021). A technical study on UAV characteristics for precision agriculture applications and associated practical challenges. Remote Sensing, 13(6), 1204. 30. Dileep, M., Navaneeth, A., Ullagaddi, S., & Danti, A. (2020). A study and analysis on various types of agricultural drones and its applications. In 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 181– 185). IEEE 31. Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Vali, E., & Fountas, S. (2021). Combining generative adversarial networks and agricultural transfer learning for weeds identification. Biosystems Engineering, 204, 79–89. 32. Etienne, A., Ahmad, A., Aggarwal, V., & Saraswat, D. (2021). Deep learning-based object detection system for identifying weeds using UAS imagery. Remote Sensing, 13(24), 5182. 33. Feng, J., Sun, Y., Zhang, K., Zhao, Y., Ren, Y., Chen, Y., Zhuang, H., & Chen, S. (2022). Autonomous detection of spodoptera frugiperda by feeding symptoms directly from UAV RGB imagery. Applied Sciences, 12(5), 2592. 34. Fenu, G., & Malloci, F. M. (2021). Forecasting plant and crop disease: an explorative study on current algorithms. Big Data and Cognitive Computing, 5(1), 2. 35. Görlich, F., Marks, E., Mahlein, A. K., König, K., Lottes, P., & Stachniss, C. (2021). UAVbased classification of cercospora leaf spot using RGB images. Drones, 5(2), 34. 36. Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., & Lew, M. S. (2016). Deep learning for visual understanding: A review. Neurocomputing, 187, 27–48. 37. Hajar, M. M. A., Lazim, I. M., Rosdi, A. R., & Ramli, L. (2021). Autonomous UAV-based cattle detection and counting using YOLOv3 and deep sort. 38. Hande, M. J. (2021). Indoor farming hydroponic plant grow chamber. International Journal of Scientific Research and Engineering Trends, 7, 2050–2052. 39. Hasan, A. M., Sohel, F., Diepeveen, D., Laga, H., & Jones, M. G. (2021). A survey of deep learning techniques for weed detection from images. Computers and Electronics in Agriculture, 184(106), 067. 40. Herlin, A., Brunberg, E., Hultgren, J., Högberg, N., Rydberg, A., & Skarin, A. (2021). Animal welfare implications of digital tools for monitoring and management of cattle and sheep on pasture. Animals, 11(3), 829. 41. 
Huang, H., Lan, Y., Yang, A., Zhang, Y., Wen, S., & Deng, J. (2020). Deep learning versus object-based image analysis (OBIA) in weed mapping of UAV imagery. International Journal of Remote Sensing, 41(9), 3446–3479. 42. Iost Filho, F. H., Heldens, W. B., Kong, Z., & de Lange, E. S. (2020). Drones: innovative technology for use in precision pest management. Journal of Economic Entomology, 113(1), 1–25.


43. Ishengoma, F. S., Rai, I. A., & Said, R. N. (2021). Identification of maize leaves infected by fall armyworms using UAV-based imagery and convolutional neural networks. Computers and Electronics in Agriculture, 184(106), 124. 44. Islam, N., Rashid, M. M., Wibowo, S., Wasimi, S., Morshed, A., Xu, C., & Moore, S. (2020). Machine learning based approach for weed detection in chilli field using RGB images. In The International Conference on Natural Computation (pp. 1097–1105). Fuzzy Systems and Knowledge Discovery: Springer. 45. Jintasuttisak, T., Edirisinghe, E., & Elbattay, A. (2022). Deep neural network based date palm tree detection in drone imagery. Computers and Electronics in Agriculture, 192(106), 560. 46. Joshi, R. C., Kaushik, M., Dutta, M. K., Srivastava, A., & Choudhary, N. (2021). Virleafnet: Automatic analysis and viral disease diagnosis using deep-learning in vigna mungo plant. Ecological Informatics, 61(101), 197. 47. Junos, M. H., Mohd Khairuddin, A. S., Thannirmalai, S., & Dahari, M. (2022). Automatic detection of oil palm fruits from UAV images using an improved YOLO model. The Visual Computer, 38(7), 2341–2355. 48. Juyal, P., & Sharma, S. (2021). Crop growth monitoring using unmanned aerial vehicle for farm field management. In 2021 6th International Conference on Communication and Electronics Systems (ICCES) (pp. 880–884). IEEE 49. Kaivosoja, J., Hautsalo, J., Heikkinen, J., Hiltunen, L., Ruuttunen, P., Näsi, R., Niemeläinen, O., Lemsalu, M., Honkavaara, E., & Salonen, J. (2021). Reference measurements in developing UAV systems for detecting pests, weeds, and diseases. Remote Sensing, 13(7), 1238. 50. Kalantar, A., Edan, Y., Gur, A., & Klapp, I. (2020). A deep learning system for single and overall weight estimation of melons using unmanned aerial vehicle images. Computers and Electronics in Agriculture, 178(105), 748. 51. Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture, 147, 70–90. 52. Kerkech, M., Hafiane, A., & Canals, R. (2018). Deep leaning approach with colorimetric spaces and vegetation indices for vine diseases detection in UAV images. Computers and Electronics in Agriculture, 155, 237–243. 53. Kerkech, M., Hafiane, A., & Canals, R. (2020). Vddnet: Vine disease detection network based on multispectral images and depth map. Remote Sensing, 12(20), 3305. 54. Kerkech, M., Hafiane, A., & Canals, R. (2020). Vine disease detection in UAV multispectral images using optimized image registration and deep learning segmentation approach. Computers and Electronics in Agriculture, 174(105), 446. 55. Kerkech, M., Hafiane, A., Canals, R., & Ros, F. (2020). Vine disease detection by deep learning method combined with 3D depth information. In International Conference on Image and Signal Processing (pp. 82–90). Springer. 56. Khan, S., Tufail, M., Khan, M. T., Khan, Z. A., & Anwar, S. (2021). Deep learning-based identification system of weeds and crops in strawberry and pea fields for a precision agriculture sprayer. Precision Agriculture, 22(6), 1711–1727. 57. Kitano, B. T., Mendes, C. C., Geus, A. R., Oliveira, H. C., & Souza, J. R. (2019). Corn plant counting using deep learning and UAV images. IEEE Geoscience and Remote Sensing Letters. 58. Kitpo, N., & Inoue, M. (2018). Early rice disease detection and position mapping system using drone and IoT architecture. In 2018 12th South East Asian Technical University Consortium (SEATUC) (Vol. 1, pp. 1–5). IEEE 59. 
Krul, S., Pantos, C., Frangulea, M., & Valente, J. (2021). Visual SLAM for indoor livestock and farming using a small drone with a monocular camera: A feasibility study. Drones, 5(2), 41. 60. Lan, Y., Huang, Z., Deng, X., Zhu, Z., Huang, H., Zheng, Z., Lian, B., Zeng, G., & Tong, Z. (2020). Comparison of machine learning methods for citrus greening detection on UAV multispectral images. Computers and Electronics in Agriculture, 171(105), 234. 61. Lan, Y., Huang, K., Yang, C., Lei, L., Ye, J., Zhang, J., Zeng, W., Zhang, Y., & Deng, J. (2021). Real-time identification of rice weeds by UAV low-altitude remote sensing based on improved semantic segmentation model. Remote Sensing, 13(21), 4370.


62. León-Rueda, W. A., León, C., Caro, S. G., & Ramírez-Gil, J. G. (2022). Identification of diseases and physiological disorders in potato via multispectral drone imagery using machine learning tools. Tropical Plant Pathology, 47(1), 152–167. 63. Li, B., Yang, B., Liu, C., Liu, F., Ji, R., & Ye, Q. (2021) Beyond max-margin: Class margin equilibrium for few-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7363–7372). 64. Li, D., Sun, X., Elkhouchlaa, H., Jia, Y., Yao, Z., Lin, P., Li, J., & Lu, H. (2021). Fast detection and location of longan fruits using UAV images. Computers and Electronics in Agriculture, 190(106), 465. 65. Li, J. Y., Duce, S., Joyce, K. E., & Xiang, W. (2021). Seecucumbers: Using deep learning and drone imagery to detect sea cucumbers on coral reef flats. Drones, 5(2), 28. 66. Liang, W. C., Yang, Y. J., & Chao, C. M. (2019). Low-cost weed identification system using drones. In 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW) (pp. 260–263). IEEE. 67. Lin, Y., Chen, T., Liu, S., Cai, Y., Shi, H., Zheng, D., Lan, Y., Yue, X., & Zhang, L. (2022). Quick and accurate monitoring peanut seedlings emergence rate through UAV video and deep learning. Computers and Electronics in Agriculture, 197(106), 938. 68. Lin, Z., & Guo, W. (2021). Cotton stand counting from unmanned aerial system imagery using mobilenet and centernet deep learning models. Remote Sensing, 13(14), 2822. 69. Liu, C., Jian, Z., Xie, M., & Cheng, I. (2021). A real-time mobile application for cattle tracking using video captured from a drone. In 2021 International Symposium on Networks (pp. 1–6). IEEE: Computers and Communications (ISNCC). 70. Liu, J., & Wang, X. (2021). Plant diseases and pests detection based on deep learning: A review. Plant Methods, 17(1), 1–18. 71. Liu, J., Abbas, I., & Noor, R. S. (2021). Development of deep learning-based variable rate agrochemical spraying system for targeted weeds control in strawberry crop. Agronomy, 11(8), 1480. 72. Loey, M., ElSawy, A., & Afify, M. (2020). Deep learning in plant diseases detection for agricultural crops: A survey. International Journal of Service Science, Management, Engineering, and Technology (IJSSMET), 11(2), 41–58. 73. Maes, W. H., & Steppe, K. (2019). Perspectives for remote sensing with unmanned aerial vehicles in precision agriculture. Trends in plant science, 24(2), 152–164. 74. Mathew, A., Amudha, P., & Sivakumari, S. (2020). Deep learning techniques: An overview. In International Conference on Advanced Machine Learning Technologies and Applications (pp. 599–608). Springer 75. Meena, S. D., & Agilandeeswari, L. (2021). Smart animal detection and counting framework for monitoring livestock in an autonomous unmanned ground vehicle using restricted supervised learning and image fusion. Neural Processing Letters, 53(2), 1253–1285. 76. Menshchikov, A., Shadrin, D., Prutyanov, V., Lopatkin, D., Sosnin, S., Tsykunov, E., Iakovlev, E., & Somov, A. (2021). Real-time detection of hogweed: UAV platform empowered by deep learning. IEEE Transactions on Computers, 70(8), 1175–1188. 77. van der Merwe, D., Burchfield, D. R., Witt, T. D., Price, K. P., & Sharda, A. (2020). Drones in agriculture. In Advances in agronomy (Vo. 162, pp. 1–30). 78. Mirhaji, H., Soleymani, M., Asakereh, A., & Mehdizadeh, S. A. (2021). 
Fruit detection and load estimation of an orange orchard using the YOLO models through simple approaches in different imaging and illumination conditions. Computers and Electronics in Agriculture, 191(106), 533. 79. Moeinizade, S., Pham, H., Han, Y., Dobbels, A., & Hu, G. (2022). An applied deep learning approach for estimating soybean relative maturity from UAV imagery to aid plant breeding decisions. Machine Learning with Applications, 7(100), 233. 80. Mohidem, N. A., Che’Ya, N. N., Juraimi, A. S., Fazlil Ilahi, W. F., Mohd Roslim, M. H., Sulaiman, N., Saberioon, M., & Mohd Noor, N. (2021). How can unmanned aerial vehicles be used for detecting weeds in agricultural fields? Agriculture, 11(10), 1004.


81. Monteiro, A., Santos, S., & Gonçalves, P. (2021). Precision agriculture for crop and livestock farming-brief review. Animals, 11(8), 2345. 82. Nazir, S., & Kaleem, M. (2021). Advances in image acquisition and processing technologies transforming animal ecological studies. Ecological Informatics, 61(101), 212. 83. Nematzadeh, S., Kiani, F., Torkamanian-Afshar, M., & Aydin, N. (2022). Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases. Computational Biology and Chemistry, 97(107), 619. 84. Nguyen, H. T., Lopez Caceres, M. L., Moritake, K., Kentsch, S., Shu, H., & Diez, Y. (2021). Individual sick fir tree (abies mariesii) identification in insect infested forests by means of UAV images and deep learning. Remote Sensing, 13(2), 260. 85. Ofori, M., El-Gayar, O. F. (2020). Towards deep learning for weed detection: Deep convolutional neural network architectures for plant seedling classification. 86. Osorio, K., Puerto, A., Pedraza, C., Jamaica, D., & Rodríguez, L. (2020). A deep learning approach for weed detection in lettuce crops using multispectral images. AgriEngineering, 2(3), 471–488. 87. Ouchra, H., & Belangour, A. (2021). Object detection approaches in images: A survey. In Thirteenth International Conference on Digital Image Processing (ICDIP 2021) (Vol. 11878, pp. 118780H). International Society for Optics and Photonics. 88. Pang, Y., Shi, Y., Gao, S., Jiang, F., Veeranampalayam-Sivakumar, A. N., Thompson, L., Luck, J., & Liu, C. (2020). Improved crop row detection with deep neural network for early-season maize stand count in UAV imagery. Computers and Electronics in Agriculture, 178(105), 766. 89. Petso, T., Jamisola, R. S., Mpoeleng, D., & Mmereki, W. (2021) Individual animal and herd identification using custom YOLO v3 and v4 with images taken from a UAV camera at different altitudes. In 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP) (pp. 33–39). IEEE. 90. Petso, T., Jamisola, R. S., Jr., Mpoeleng, D., Bennitt, E., & Mmereki, W. (2021). Automatic animal identification from drone camera based on point pattern analysis of herd behaviour. Ecological Informatics, 66(101), 485. 91. Petso, T., Jamisola, R. S., & Mpoeleng, D. (2022). Review on methods used for wildlife species and individual identification. European Journal of Wildlife Research, 68(1), 1–18. 92. Ponnusamy, V., & Natarajan, S. (2021). Precision agriculture using advanced technology of iot, unmanned aerial vehicle, augmented reality, and machine learning. In Smart Sensors for Industrial Internet of Things (pp. 207–229). Springer. 93. Qian, W., Huang, Y., Liu, Q., Fan, W., Sun, Z., Dong, H., Wan, F., & Qiao, X. (2020). UAV and a deep convolutional neural network for monitoring invasive alien plants in the wild. Computers and Electronics in Agriculture, 174(105), 519. 94. Rachmawati, S., Putra, A. S., Priyatama, A., Parulian, D., Katarina, D., Habibie, M. T., Siahaan, M., Ningrum, E. P., Medikano, A., & Valentino, V. (2021). Application of drone technology for mapping and monitoring of corn agricultural land. In 2021 International Conference on ICT for Smart Society (ICISS) (pp. 1–5). IEEE. 95. Raheem, D., Dayoub, M., Birech, R., & Nakiyemba, A. (2021). The contribution of cereal grains to food security and sustainability in Africa: potential application of UAV in Ghana, Nigeria, Uganda, and Namibia. Urban Science, 5(1), 8. 96. Rahman, M. F. F., Fan, S., Zhang, Y., & Chen, L. (2021). 
A comparative study on application of unmanned aerial vehicle systems in agriculture. Agriculture, 11(1), 22. 97. Rangarajan, A. K., Balu, E. J., Boligala, M. S., Jagannath, A., & Ranganathan, B. N. (2022). A low-cost UAV for detection of Cercospora leaf spot in okra using deep convolutional neural network. Multimedia Tools and Applications, 81(15), 21,565–21,589. 98. Raoult, V., Colefax, A. P., Allan, B. M., Cagnazzi, D., Castelblanco-Martínez, N., Ierodiaconou, D., Johnston, D. W., Landeo-Yauri, S., Lyons, M., Pirotta, V., et al. (2020). Operational protocols for the use of drones in marine animal research. Drones, 4(4), 64. 99. Razfar, N., True, J., Bassiouny, R., Venkatesh, V., & Kashef, R. (2022). Weed detection in soybean crops using custom lightweight deep learning models. Journal of Agriculture and Food Research, 8(100), 308.


100. Rivas, A., Chamoso, P., González-Briones, A., & Corchado, J. M. (2018). Detection of cattle using drones and convolutional neural networks. Sensors, 18(7), 2048. 101. Roosjen, P. P., Kellenberger, B., Kooistra, L., Green, D. R., & Fahrentrapp, J. (2020). Deep learning for automated detection of Drosophila suzukii: Potential for UAV-based monitoring. Pest Management Science, 76(9), 2994–3002. 102. Roy, A. M., Bose, R., & Bhaduri, J. (2022). A fast accurate fine-grain object detection model based on YOLOv4 deep neural network. Neural Computing and Applications, 34(5), 3895– 3921. 103. Safarijalal, B., Alborzi, Y., & Najafi, E. (2022). Automated wheat disease detection using a ROS-based autonomous guided UAV. 104. Safonova, A., Guirado, E., Maglinets, Y., Alcaraz-Segura, D., & Tabik, S. (2021). Olive tree biovolume from UAV multi-resolution image segmentation with mask R-CNN. Sensors, 21(5), 1617. 105. Saleem, M. H., Potgieter, J., & Arif, K. M. (2021). Automation in agriculture by machine and deep learning techniques: A review of recent developments. Precision Agriculture, 22(6), 2053–2091. 106. dos Santos, A., Biesseck, B. J. G., Latte, N., de Lima Santos, I. C., dos Santos, W. P., Zanetti, R., & Zanuncio, J. C. (2022). Remote detection and measurement of leaf-cutting ant nests using deep learning and an unmanned aerial vehicle. Computers and Electronics in Agriculture, 198(107), 071. 107. dos Santos, Ferreira A., Freitas, D. M., da Silva, G. G., Pistori, H., & Folhes, M. T. (2017). Weed detection in soybean crops using convnets. Computers and Electronics in Agriculture, 143, 314–324. 108. Sarwar, F., Griffin, A., Periasamy, P., Portas, K., & Law, J. (2018). Detecting and counting sheep with a convolutional neural network. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1–6). IEEE. 109. Sarwar, F., Griffin, A., Rehman, S. U., & Pasang, T. (2021). Detecting sheep in UAV images. Computers and Electronics in Agriculture, 187(106), 219. 110. Shankar, R. H., Veeraraghavan, A., Sivaraman, K., Ramachandran, S. S., et al. (2018). Application of UAV for pest, weeds and disease detection using open computer vision. In 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 287–292). IEEE 111. Shao, W., Kawakami, R., Yoshihashi, R., You, S., Kawase, H., & Naemura, T. (2020). Cattle detection and counting in UAV images based on convolutional neural networks. International Journal of Remote Sensing, 41(1), 31–52. 112. Sharma, A., Jain, A., Gupta, P., & Chowdary, V. (2020). Machine learning applications for precision agriculture: A comprehensive review. IEEE Access, 9, 4843–4873. 113. Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of Big Data, 6(1), 1–48. 114. Skendži´c, S., Zovko, M., Živkovi´c, I. P., Leši´c, V., & Lemi´c, D. (2021). The impact of climate change on agricultural insect pests. Insects, 12(5), 440. 115. Soares, V., Ponti, M., Gonçalves, R., & Campello, R. (2021). Cattle counting in the wild with geolocated aerial images in large pasture areas. Computers and Electronics in Agriculture, 189(106), 354. 116. Stein, E. W. (2021). The transformative environmental effects large-scale indoor farming may have on air, water, and soil. Air, Soil and Water Research, 14(1178622121995), 819. 117. Stewart, E. L., Wiesner-Hanks, T., Kaczmar, N., DeChant, C., Wu, H., Lipson, H., Nelson, R. J., & Gore, M. A. (2019). 
Quantitative phenotyping of northern leaf blight in UAV images using deep learning. Remote Sensing, 11(19), 2209. 118. Talaviya, T., Shah, D., Patel, N., Yagnik, H., & Shah, M. (2020). Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artificial Intelligence in Agriculture, 4, 58–73. 119. Tetila, E. C., Machado, B. B., Menezes, G. K., Oliveira, Ad. S., Alvarez, M., Amorim, W. P., Belete, N. A. D. S., Da Silva, G. G., & Pistori, H. (2019). Automatic recognition of soybean

A Review on Deep Learning on UAV Monitoring Systems …

120.

121.

122.

123. 124.

125.

126.

127.

128.

129.

130. 131.

132.

133.

134. 135.

136.

137.

367

leaf diseases using UAV images and deep convolutional neural networks. IEEE Geoscience and Remote Sensing Letters, 17(5), 903–907. Tetila, E. C., Machado, B. B., Astolfi, G., de Souza Belete, N. A., Amorim, W. P., Roel, A. R., & Pistori, H. (2020). Detection and classification of soybean pests using deep learning with UAV images. Computers and Electronics in Agriculture, 179(105), 836. Tiwari, A., Sachdeva, K., & Jain, N. (2021). Computer vision and deep learningbased framework for cattle monitoring. In 2021 IEEE 8th Uttar Pradesh Section International Conference on Electrical (pp. 1–6). IEEE: Electronics and Computer Engineering (UPCON). Ukwuoma, C. C., Zhiguang, Q., Bin Heyat, M. B., Ali, L., Almaspoor, Z., & Monday, H. N. (2022). Recent advancements in fruit detection and classification using deep learning techniques. Mathematical Problems in Engineering. Vayssade, J. A., Arquet, R., & Bonneau, M. (2019). Automatic activity tracking of goats using drone camera. Computers and Electronics in Agriculture, 162, 767–772. Veeranampalayam Sivakumar, A. N., Li, J., Scott, S., Psota, E., Jhala, A., Luck, J. D., & Shi, Y. (2020). Comparison of object detection and patch-based classification deep learning models on mid-to late-season weed detection in UAV imagery. Remote Sensing, 12(13), 2136. Velusamy, P., Rajendran, S., Mahendran, R. K., Naseer, S., Shafiq, M., & Choi, J. G. (2021). Unmanned aerial vehicles (UAV) in precision agriculture: Applications and challenges. Energies, 15(1), 217. Wani, J. A., Sharma, S., Muzamil, M., Ahmed, S., Sharma, S., & Singh, S. (2021). Machine learning and deep learning based computational techniques in automatic agricultural diseases detection: Methodologies, applications, and challenges. In Archives of Computational Methods in Engineering (pp. 1–37). Wittstruck, L., Kühling, I., Trautz, D., Kohlbrecher, M., & Jarmer, T. (2020). UAV-based RGB imagery for hokkaido pumpkin (cucurbita max.) detection and yield estimation. Sensors, 21(1), 118. Xie, W., Wei, S., Zheng, Z., Jiang, Y., & Yang, D. (2021). Recognition of defective carrots based on deep learning and transfer learning. Food and Bioprocess Technology, 14(7), 1361– 1374. Xiong, J., Liu, Z., Chen, S., Liu, B., Zheng, Z., Zhong, Z., Yang, Z., & Peng, H. (2020). Visual detection of green mangoes by an unmanned aerial vehicle in orchards based on a deep learning method. Biosystems Engineering, 194, 261–272. Xiong, Y., Zeng, X., Chen, Y., Liao, J., Lai, W., & Zhu, M. (2022). An approach to detecting and mapping individual fruit trees integrated YOLOv5 with UAV remote sensing. Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Chen, G., Tait, A., & Schneider, D. (2020). Automated cattle counting using mask R-CNN in quadcopter vision system. Computers and Electronics in Agriculture, 171(105), 300. Xu, B., Wang, W., Falzon, G., Kwan, P., Guo, L., Sun, Z., & Li, C. (2020). Livestock classification and counting in quadcopter aerial images using mask R-CNN. International Journal of Remote Sensing, 41(21), 8121–8142. Yang, Q., Shi, L., Han, J., Yu, J., & Huang, K. (2020). A near real-time deep learning approach for detecting rice phenology based on UAV images. Agricultural and Forest Meteorology, 287(107), 938. Yang, S., Yang, X., & Mo, J. (2018). The application of unmanned aircraft systems to plant protection in china. Precision Agriculture, 19(2), 278–292. Zhang, H., Lin, P., He, J., & Chen, Y. (2020) Accurate strawberry plant detection system based on low-altitude remote sensing and deep learning technologies. 
In 2020 3rd International Conference on Artificial Intelligence and Big Data (ICAIBD) (pp. 1–5). IEEE. Zhang, H., Wang, L., Tian, T., & Yin, J. (2021). A review of unmanned aerial vehicle lowaltitude remote sensing (UAV-LARS) use in agricultural monitoring in china. Remote Sensing, 13(6), 1221. Zhang, R., Wang, C., Hu, X., Liu, Y., Chen, S., et al. (2020) Weed location and recognition based on UAV imaging and deep learning. International Journal of Precision Agricultural Aviation, 3(1).

368

T. Petso and R. S. Jamisola Jr

138. Zhang, X., Han, L., Dong, Y., Shi, Y., Huang, W., Han, L., González-Moreno, P., Ma, H., Ye, H., & Sobeih, T. (2019). A deep learning-based approach for automated yellow rust disease detection from high-resolution hyperspectral UAV images. Remote Sensing, 11(13), 1554. 139. Zhou, X., Lee, W. S., Ampatzidis, Y., Chen, Y., Peres, N., & Fraisse, C. (2021). Strawberry Maturity Classification from UAV and Near-Ground Imaging Using Deep Learning. Smart Agricultural Technology, 1(100), 001.

Navigation and Trajectory Planning Techniques for Unmanned Aerial Vehicles Swarm Nada Mohammed Elfatih, Elmustafa Sayed Ali , and Rashid A. Saeed

Abstract Navigation and trajectory planning algorithms are among the most important issues in unmanned aerial vehicle (UAV) and robotics research. Recently, the UAV swarm, or flying ad-hoc network, has attracted extensive attention from the aviation industry, academia and the research community, as it has become one of the great tools for smart cities, rescue and disaster management, and military applications. A UAV swarm is a scenario in which the UAVs interact with each other. The control and communication structure of a UAV swarm requires specific decisions to improve the trajectory planning and navigation operations of the swarm. In addition, it requires high processing time and power under resource scarcity to operate the flight plans efficiently. Artificial intelligence (AI) is a powerful tool that provides optimized and accurate solutions for decision and power management issues, although it comes with high data communication and processing demands. Leveraging AI for navigation and path planning adds much value and yields great results for system robustness. The UAV industry is moving toward AI approaches in developing UAV swarms, promising more intelligent swarm interaction. According to the importance of this topic, this chapter provides a systematic review of the AI approaches and the main algorithms that enable the development of navigation and trajectory planning strategies for UAV swarms. Keywords UAV swarm · Drones · Small unmanned aircraft systems (UASs) · Flight robotics · Artificial intelligence · Control and communication N. M. Elfatih · E. S. Ali (B) · R. A. Saeed Department of Electrical and Electronics Engineering, Red Sea University (RSU), Port Sudan, Sudan e-mail: [email protected] E. S. Ali Department of Electronics Engineering, College of Engineering, Sudan University of Science and Technology (SUST), Khartoum, Sudan R. A. Saeed Department of Computer Engineering, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_12



1 Introduction Drones, also known as unmanned aerial vehicles (UAVs), can operate remotely without humans on board [1]. UAVs have been investigated as a disruptive technology that complements and supports operations traditionally performed by humans. Owing to their excellent mobility, flexibility, easy deployment, high performance, low maintenance, and adaptive altitude, UAVs are widely used in many civil and military applications, for example wildfire monitoring, traffic control, emergency rescue, the medical field and intelligent transportation. UAVs can provide wide-coverage sensing in different environments [2]. Various communication technologies and standards have emerged for UAVs, such as cloud computing and software-defined networking, in addition to big data analytics. UAV design has also passed through several communication evolutions, beginning with 3G broadband signals and moving toward higher data rates and 5G end-to-end connectivity. The evolution of UAV communications from 4G to 5G provides new technologies to support cellular communications for UAV operations with high reliability and efficient energy utilization [3]. 5G cellular networks provide enhanced UAV broadband communication and also enable UAVs to act as flying base stations for a UAV swarm and as gateways to ground cellular stations. Navigation and trajectory planning are crucial issues for UAVs. Planning UAV trajectories in complex environments containing many obstacles is one of the major challenges facing their application [4]. In addition, establishing a network of UAVs that can avoid collisions while taking the kinematic characteristics into account is the most important requirement in UAV swarm applications. In accordance with these challenges, and to achieve operational efficiency of the UAV swarm together with its safety, it is important to estimate the flight plan intelligently, especially in complex environments. Therefore, trajectory planning for UAVs has become a research hotspot. In this chapter we provide a comprehensive review of the technical concepts of UAV swarm architectures, applications, navigation and trajectory planning technologies. Our main contributions are summarized as follows.
• A review of UAV swarm architectures, communications and control systems.
• A discussion of the classifications of UAV swarm navigation and trajectory planning.
• A review of the most important intelligent technologies used for UAV swarm trajectory planning.
The rest of this chapter is organized as follows. Section 2 provides the UAV technical background in addition to UAV swarm advantages and applications. Swarm communication and control system architectures are provided in Sect. 3. Section 4 presents navigation and path planning for UAV swarms. The classical techniques for UAV


swarm navigation and path planning are reviewed in Sect. 5. In Sect. 6, the reactive approaches for UAV Swarm navigation and path planning are discussed. Finally, the conclusion is provided in Sect. 7.

2 UAV Technical Background The development of drone technology accelerated when the United States announced federal rules in 2016 regulating the public use of UAVs. Depending on the purpose, UAVs have been used in a number of fields such as agricultural monitoring, power-line inspection and photography, in addition to various search and rescue operations [5]. In recent years, the concept of a UAV swarm has become an important research topic: the possibility of managing a swarm of drones and enabling interaction between them through intelligent algorithms that allow the swarm to conduct joint operations according to predefined paths controlled from an operations center. The operation of UAVs depends on their capabilities for control, maneuvering and power utilization. The following section provides a brief overview of UAV architecture and intelligent operation.

2.1 UAV Architecture The architecture of a UAV consists of three layers, as shown in Fig. 1. These layers relate to data collection, processing and operation [7]. The data collection layer consists of a number of devices such as sensors, light detectors and cameras. The other layers contain processing devices, control systems, and further systems related to maps and decision-making [8]. In a UAV, the central control system shown in Fig. 2 controls the trajectory in the real environment. The controller adjusts the speed, flight control, radio and power source. More specifically, the components are described as follows [9].
• Speed controller: provides high-frequency signals to operate the UAV motors and control their speed.
• Positioning system: calculates the time and location information of the UAV and determines the coordinates and altitude of the aircraft.
• Flight controller: manages flight operations by reading positioning system information while controlling communications.
• Battery: UAV batteries are made of materials that give high energy density and long range, such as lithium polymer; additional batteries may be added to support long-range flight.
• Gimbal: stabilizes the UAV payload on its three-dimensional axes.


Fig. 1 UAVs architecture layers

Fig. 2 UAV systems and devices

• Sensors: a number of sensors on the UAV capture 3D images or detect obstacles to avoid collisions.

2.2 UAV Swarm Current State UAVs can act in different operational scenarios as a swarm, i.e., as a group of UAVs. Recent studies have tried to explore the benefits and features of swarming insect behavior in nature [10]. For example, bee and bird swarms provide an intelligent concept of flying that can offer good solutions for UAV tasks. The concept of a swarm, as a behavior in which complex collective operations take place through interactions between large numbers of UAVs in an intelligent manner [11], allows UAV swarms to provide tactical operations with high efficiency and performance, in addition to increasing operational quality and reliability when used in different applications [12].


A UAV swarm can carry out higher-level tasks than a single UAV. Moreover, UAV swarms also allow for fault tolerance: if one UAV of the swarm is lost, other group members can accomplish the assigned tasks by reallocating the missions to the surviving team members [13]. A swarm of UAVs can be deployed to perform various missions including searching, target tracking, high-accuracy search and surveillance [14]. All of these operations are carried out by a squadron or group of UAVs directed by navigation, guidance and control systems that manage UAV allocation, flying-area coordination and communications. All these tasks operate within a complex system that includes various integrated technologies [15]. Additionally, artificial intelligence (AI) technologies are used in building UAV systems for purposes such as intelligent maneuvering, trajectory planning, and swarm interaction.

2.3 UAV Swarm Advantages Many previous papers have discussed individual UAV systems and their various applications; however, few have studied UAV swarms and their limitations versus advantages. The literature makes clear that a single UAV works well in a number of surveillance applications and scenarios [15]. However, operating UAVs in a swarm gives additional advantages over a single UAV, especially in search tasks: with a swarm of UAVs, the searching task can be performed in parallel and the range of operations can be increased greatly. Even so, a UAV swarm can face issues in trajectory planning and interaction. In general, the advantages of a UAV swarm compared with a single UAV are summarized in Table 1 [16].
A. Synchronized Actions A swarm of UAVs can simultaneously collect information from different locations, and it can also use the collected information to build a decision-making model for complex tasks [17].

Table 1 Comparison between a single-UAV and a swarm-UAV system

Features                   | Single UAV | Swarm UAV
Operations duration        | Poor       | High
Scalability                | Limited    | High
Mission speed              | Slow       | Fast
Independence               | Low        | High
Cost                       | High       | Low
Communication requirements | High       | Low
Radar cross section        | Large      | Small


B. Time Efficiency A swarm of UAVs reduces the time needed for search or monitoring tasks and missions. As an example, the authors in [7] provide a study using a UAV swarm to detect nuclear radiation and build a map for rescue operations.
C. Complementarity of Team Members With a swarm of heterogeneous UAVs, more advantages can be achieved thanks to the possibility of integrating them in different operations and tasks at the same time [18].
D. Reliability A UAV swarm system delivers solutions with greater fault tolerance and flexibility in case a single UAV mission fails.
E. Technology Evaluation With the development of integrated systems and miniaturization techniques, UAV models that operate in a swarm can be produced that are light and small in size [18].
F. Cost A single high-performance UAV that performs complex tasks is very costly compared with using a number of low-cost UAVs to perform the same task, where cost is related to power, size and weight [18].

2.4 UAV Swarm Applications A wide variety of applications, as shown in Fig. 3, use UAV swarm systems. Figure 4 shows the number of publications on UAV swarms and their applications between 2018 and 2022. The following subsections provide an overview of the most important UAV applications.
A. Photogrammetry Photogrammetry extracts quantitative information from scanned images, in addition to recovering point surface positions. Several works have addressed UAV swarms performing imagery collection; for example, [19] presented a low-altitude thermal imaging system able to observe a specific area according to a flight plan.
B. Security and Surveillance Many applications use UAV swarms for video surveillance with cameras to cover specific targets [20]. Swarms also help in monitoring and traffic control operations, in addition to many military monitoring operations.


Fig. 3 UAVs swarm systems applications

Fig. 4 Number of publications on UAV swarm systems and applications, 2018–2022 (Google Scholar)

C. Battlefields Swarms of UAVs help cover battlefields to gather intelligence and transfer it to ground receiving stations for decision-making [21]. In many military applications,


UAV swarms serve to locate enemy positions in urban or remote areas, on land or at sea.
D. Earth Monitoring UAV swarms are used to monitor geophysical processes and pollutant levels by means of sensors and processing units that make independent surveys along a predetermined path [21].
E. Precision Agriculture In the agricultural field, UAV swarms help spray pesticides on plants to combat agricultural pests while ensuring high productivity and efficiency [22]. They can also monitor specific areas and analyze data to make spraying decisions.
F. Disaster Management and Goods Delivery UAV swarms assist in rescue operations during disasters, especially in the early hours, and help deliver emergency medical supplies [24]. They can also assess risks and damages in a timely manner by means of integrated fog computing. Some companies, such as Amazon, are working on using UAV swarms to deliver goods through computing systems and the Internet [25, 26].
G. Healthcare Application In healthcare, UAVs help collect data at different medical levels, from sensor information related to patients up to health centers [27]. One example of these applications is a UAV star network topology that uses radio alert technology to allocate resources and consists of the following stages.
• Stage 1: data collection, in which the UAV gathers patients' information.
• Stage 2: data reporting, in which the information collected by the UAV is reported to medical servers or doctors' end devices.
• Stage 3: data processing, in which decisions about the patient's healthcare are taken to provide diagnosis and prescription.

3 Swarm Communication and Control System Architectures To design efficient and highly stable UAV swarm communication architectures and protocols, several challenges must be taken into account [28]. Figure 5 shows a general reliable communication service scenario for UAVs. UAV swarm communication architectures and the services they provide must deal with the following requirements.
• Increasing spectrum demand corresponding to the expected UAV applications.
• Higher bandwidth and data rates to support upstream data traffic for several UAV surveillance applications; accordingly, new strategies are needed to handle the large data traffic between the UAV members of a swarm.


Fig. 5 UAVs network communication architecture

• Heterogeneous QoS is required in both uplink and downlink communications to integrate the operations of the UAV swarm through the cellular network [19].
• High mobility and dynamic topology of UAV swarms, in addition to high speeds, require highly reliable and low-latency communication networks.
• Ability to manage spectrum overcrowding, since UAVs can operate on different IEEE bands such as the L and S bands, in addition to medical and industrial bands [13]. UAVs are also able to communicate over other wireless technologies such as Bluetooth and Wi-Fi networks [29]. Accordingly, UAVs carry many operating devices to deal with all these communication bands.

3.1 Centralized Communication Architecture The UAV swarm communication architecture describes the mechanism for exchanging information between the UAVs themselves or with the ground infrastructure, and it plays an essential role in network performance, intelligent control and cooperation of UAV swarms [30]. Figure 6 shows the general UAV swarm communication architecture in the centralized approach. The central station is known as the Ground Control Station (GCS), which communicates with all UAV swarm members [31]. The centralized architecture lets a UAV swarm network extend from a single UAV to many managed UAVs. The GCS monitors the UAV swarm and makes decisions for the UAVs to manage their speeds and positions [32]. The GCS also provides message control to let the UAVs communicate with each other [14].


Fig. 6 Centralized UAVs swarm architecture

3.2 Decentralized Communication Architecture As the number of UAVs in the swarm increases, a decentralized communication approach can be used; it provides an organizational structure that reduces the number of UAVs connected directly to the central network and gives some UAVs independence [33]. Also, over the long distances that UAVs can travel they may lose their connection to the central network, so decentralized networks are allocated to the aircraft to carry out interactive communications in real time [15].

3.2.1 Single Group Swarm Ad Hoc Architecture

In a single-group swarm ad hoc network, as shown in Fig. 7, the swarm's internal communication does not depend on infrastructure. The communication between the swarm and the infrastructure is a single-point link based on a specific UAV acting as a gateway [34]. In single-group swarm ad hoc networks, some UAVs act as relay nodes to forward data between UAV members of the swarm. The UAVs can share situational information in real time to improve collaborative control and efficiency. Likewise, the interconnection between the UAV gateway and the infrastructure enables exchange of swarm information [35]. The gateway UAV serves as the means of communicating with UAVs at short range and with the infrastructure at long range. The gateway reduces the burden on the other UAVs by reducing their equipment and cost, which helps to expand the communication range and speed up the maneuvering performance of the UAVs [16, 35]. However, in order to ensure consistent swarm communication, the flight patterns of all UAVs in the swarm must be similar and operate under a single scenario proportional to their size and speed [17, 36].


Fig. 7 Single group swarm Ad hoc network architecture

3.2.2 Multi Group Swarm Ad Hoc Architecture

A dedicated single-group swarm network can be combined with other networks, as shown in Fig. 8, so that each network has a central architecture and a dedicated architecture for different applications depending on the task. The overall architecture is organized in a centralized manner, but the difference lies at the level of the UAVs within each private network group [37]. The communication architecture within each UAV swarm group is similar to that within a single swarm, with a mechanism for communication between groups defined by the infrastructure. The gateway UAVs are responsible for connecting to the infrastructure and for coordinating communications between the missions of the various UAV groups. This architecture makes it possible to run multitasking applications across groups, for example joint multi-theater military operations in which the central control center communicates with different UAV swarms [18, 37].

3.2.3 Multi-Layer Swarm Ad Hoc Architecture

An ad hoc multi-layer swarm network architecture is an important type of architecture that is suitable for a large number of UAVs, as shown in Fig. 9. A group of neighboring drones of the same type forms a dedicated network as the first layer of the communication infrastructure [38]. In this architecture the drone gateways of the different groups form a second layer, which enables connection to the drone gateway nearest to the infrastructure. At the third layer, communication between any two drones is not required in an ad hoc multi-layer swarm network architecture; the interconnection of drones in the same group is handled at the first level. The multi-layer ad hoc network architecture compensates for increases or decreases in the number of UAV nodes and quickly performs network reconstruction [39]. The multi-layer ad hoc network architecture works with scenarios where the swarm UAV


Fig. 8 Multi-group swarm Ad hoc network architecture

missions are complex and a large number of UAVs perform the missions, allowing for changes in the network topology and in the communication between the UAVs [19, 39]. As reviewed above, UAV communication engineering has evolved significantly to serve a number of different and important scenarios, and there are different communication structures to choose from. Table 2 summarizes the advantages and disadvantages of the discussed architectures. It turns out that the centralized communication architecture

Fig. 9 Multi-layer swarm Ad hoc network architecture


Table 2 UAV swarm communication architectures summary. The centralized architecture and the decentralized single-group, multi-group and multi-layer architectures are compared with respect to multi-hop communication, UAV relay, heterogeneous UAVs, auto-configuration, limited coverage, single point of failure, and robustness ("✓" = supported, "✕" = not supported).

is suitable for scenarios with a small number of UAVs and relatively simple tasks. The more complex the tasks and the larger the swarm, the more the other architectures are used according to the required scenario [40]. Where coverage must be expanded through a multi-hop network scenario, the decentralized communication architecture is the suitable choice [41]. Many communication technologies can provide UAV communications; Fig. 10 classifies UAV communication technologies into four types: cellular, satellite, Wi-Fi-based, and cognitive-radio UAV communications.

4 Navigation and Path Planning for UAV Swarm UAV navigation can be described as the procedure by which a robot forms a strategy for reaching the goal position quickly and safely, which typically depends on the current location and the environment. In order to complete the scheduled assignment effectively, a UAV should be fully aware of its status, comprising heading direction, navigation speed and location, as well as the target location and starting point [41]. Today, several navigation techniques have been introduced; they can basically be divided into three groups: satellite, vision-based and inertial navigation. However, none of these techniques is perfect, so it is critical to choose a suitable one for UAV navigation according to the specific mission [42]. Vision-based navigation has proven to be a promising primary direction of autonomous navigation research thanks to the fast development of computer vision. First, visual sensors can offer rich operational information about the surroundings; second, they are highly suitable for active environment perception because of their strong anti-interference capability; third, most visual sensors are passive, which also helps them avoid detection by attackers [43].


Table 3 Summary of trajectory planning methods

Algorithm | Approach                      | Extends to 3D | Advantages                                                                                   | Disadvantages
CD        | Workspace modeling            | Yes           | Can be extended to 3D; mobile-robot applications                                             | Needs a search algorithm such as A*
VD        | Workspace modeling and roadmap| Yes           | Planned routes stay far from obstacles                                                       | Difficult to apply in a 3D environment
VG        | Workspace modeling and roadmap| Yes           | Better performance in polygon-based and regular environments                                 | Planned routes pass close to obstacles; hard in cluttered environments
APF       | Potential field               | Yes           | No environment-modeling algorithm needed; low time complexity; easy path generation; real-time obstacle avoidance | Easy to fall into a local minimum
A*        | Searching                     | Yes           | Low-cost, short path; can be combined with an environment-modeling algorithm (i.e., CD)      | Possibly high time complexity
Dijkstra  | Searching                     | Yes           | The shortest path is guaranteed                                                              | High time complexity
RM        | Workspace modeling and roadmap| Yes           | Path finding is guaranteed (the number of sampling nodes must increase without bound)        | No optimal path is guaranteed; hard to generate a path through a narrow gap; computational complexity

Path planning is defined as the method of finding the shortest, optimal path between a source and a destination, and it is one of the most important problems to be explored in the UAV arena. The core goal of UAV path planning is to find a flight path with efficient cost that fulfils the UAV performance requirements with a small collision probability during the flight [20]. UAV route planning normally comprises three main terms [21, 44]: motion planning, navigation, and trajectory planning. Motion planning handles constraints such as the flight route, turning the motion crank of route planning. Trajectory planning, on the other side, covers route planning with velocity, time and the kinematics of UAV mobility, whereas navigation is concerned with localization and collision avoidance.


Fig. 10 UAV swarm communication technologies

For UAV route planning, a three-dimensional representation of the environment is essential, as in a complex environment a two-dimensional (2D) route planning technique would not be able to capture the objects and obstacles. There are several UAV route planning techniques for obstacle navigation; the three-dimensional (3D) techniques formulate route planning as an optimization problem [45].

4.1 UAVs Network Communication and Path Planning Architecture A UAV needs to accomplish route planning while moving from a source to a destination. UAVs perceive the neighboring environment by using sensors to navigate, control and plan flight mobility. The UAV route planning stages that need to be followed during operation are (i) climate and weather sensing, (ii) navigation, and (iii) UAV movement control; these stages are applied throughout the trip [46]. Sensing the climate and weather of the environment gives the UAVs awareness. The route planning and navigation methods are applied continuously, seeking an optimal route. The UAV movement and velocity are monitored by a central controller for collision avoidance. Furthermore, the UAVs need to communicate with neighboring UAVs for network management while pursuing their mission goal [47].


There are strong requirements for 3D route planning in complex environments. 2D route planning techniques are not suitable for such environments, since they struggle to discover objects and obstacles compared with sophisticated 3D representations. Hence, 3D route planning methods are in severe demand for UAV navigation and surveillance applications in complex and cluttered environments [47]. The communication energy of UAVs and base stations can be decreased by optimizing the transmission power. Likewise, reducing the mechanical energy of the UAV requires a consumption model for UAV systems, where the UAV energy can be modeled as:

E = (Pmin + α·h)·t + Pmax·(h/s)    (1)

where t is the operating time, h is the height, and s is the UAV velocity. Pmin and Pmax depend on the motor specification and weight: Pmin is the lowest power required for the UAV to start, and α is the motor speed multiplier. Henceforth, the entire communication cost (Tcom), which adds to the UAV cost and time, can be modeled as: Tcom = ts + (to + th)·l

(2)

where ts denotes the UAV start time, to the overhead time, th the UAV hop time, and l the accumulated number of links between the start point and the target. These parameters, together with collision avoidance, robustness and completeness aspects, are used and considered for optimal path finding in UAV algorithms.
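To make the cost models of Eqs. (1) and (2) concrete, the short Python sketch below evaluates both expressions for a hypothetical flight. The numeric parameter values (motor constants, flight height, hop count and timing overheads) are illustrative assumptions, not values given in this chapter.

def flight_energy(p_min, alpha, p_max, height, speed, op_time):
    """UAV energy model of Eq. (1): E = (Pmin + alpha*h)*t + Pmax*(h/s)."""
    return (p_min + alpha * height) * op_time + p_max * (height / speed)

def communication_cost(t_start, t_overhead, t_hop, num_links):
    """Communication cost model of Eq. (2): Tcom = ts + (to + th)*l."""
    return t_start + (t_overhead + t_hop) * num_links

# Assumed example values for a small multirotor and a 4-hop swarm link
energy = flight_energy(p_min=120.0, alpha=0.8, p_max=300.0,
                       height=50.0, speed=10.0, op_time=600.0)
t_com = communication_cost(t_start=0.5, t_overhead=0.02, t_hop=0.01, num_links=4)
print(f"energy cost ~ {energy:.0f} units, communication cost ~ {t_com:.2f} s")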

4.1.1 Trajectory Planning in 2D Environments

In conventional route planning techniques, the environment information has generally been defined on a 2D level; in the UAV case, the vehicle is supposed to maintain its flight height manually or through adjustment. Optimization of a 2D route planning problem is a non-deterministic polynomial-time hard (NP-hard) problem, so no general solution exists [48]. Fortunately, Collective Influence (CI) algorithms reduce the requirement of computing the gradients of the constraint and cost functions, which allows the NP-hard problem to be optimized and resolved. Usually, 2D route planning algorithms can be categorized into three forms according to the UAV constraints. The first class of algorithms treats the UAV as a particle. In this situation, designers can concentrate on computing the optimal path, though this computation is quite sophisticated and hard to implement, as NP-hard problems are generally converted into an optimal


route constraint solved with spatial search [49]. Although the NP-hard problem has no general solution, multiple CI algorithms [23, 24] can perform route planning optimization by simplifying the computation of the cost and constraint function gradients. The second class of algorithms models the problem based on the shape of the UAV. In shape-based algorithms, the problem can be converted [25, 49] to a 2D shape with shape parameters, i.e., the center of gravity and wing span, and then resolved by the method that treats the UAV as a particle. In the third technique, the UAV is modeled with dynamic and kinematic constraints, such as its maximum and minimum turning radii. Compared with the previous techniques, the third technique is more complicated but more applicable in practice. For the dynamic and kinematic constraints, CI methods [26, 50] can be used, with advantages in computation and fast convergence.

4.1.2 Trajectory Planning in 3D Environments

A growing range of fields, for example navigation, detection, operations and transportation, requires UAV applications. Because of the complexity of the environment, which involves many uncertain and unstructured factors, robust 3D route planning methods are crucially required [51]. Although UAV route planning in 3D space presents great opportunities, in contrast to 2D route planning the challenges increase dramatically because of kinematic constraints. A traditional problem can be modeled in 3D space while considering the kinematic constraints for collision-free route planning. Bear in mind that kinematic constraints such as temporal, geometric and physical ones are difficult to resolve with conventional CI methods, which may encounter numerous difficulties, i.e., a low convergence rate and a wide exploration range [52]. This chapter focuses on and discusses 3D environments, with emphasis on the challenges mentioned in the sections above. 3D techniques have various advantages and characteristics when combined with suitable CI algorithms. To avoid the challenges of immature and slow convergence in low-altitude UAV route planning, numerous genetic algorithms (GA) can be used for route planning [27, 28]. Enhanced particle swarm optimization (PSO) techniques [29, 30] can be utilized to solve the blind exploration of wide-range problems and execute comprehensive 3D route optimization. Improved ant colony optimization (ACO) techniques [31] for 3D route planning have been discussed extensively; they can also increase the selection speed and reduce the probability of settling on local optima. Unlike swarm methods, federated learning (FL) algorithms have generally been utilized for vision navigation in UAVs to enable image-based decisions and detection [32]. To address the UAV route planning attack problem, a fusion neural-network (NN) based technique has been proposed [33]; this method can readily be enhanced by parallelization. Recently, with the development of computing chips, the high performance and long computing times needed


Fig. 11 Classification of trajectory planning algorithms

by deep learning (DL) and machine learning (ML) techniques [34, 35] can be guaranteed. These techniques (i.e., ML and DL methods) have been widely used in UAV 3D route planning to resolve NP-hard problems more accurately over a wide search region [53].

4.2 Trajectory Planning for UAVs Navigation Classifications Route planning for UAVs can be categorized into three classes of methods, namely combinative, sampling-based and biologically-inspired methods, as presented in Fig. 11.

4.3 Route Planning Challenges
• Route length: the total path length the UAV travels from the start point to the end point.
• Optimization: the route computation and parameters should be efficient in time, energy and cost; routes can be classified into three classes, i.e., non-optimal, sub-optimal and optimal.
• Extensiveness: the characteristics used in route planning for discovering the route; it offers the UAVs a platform and an optimal route solution.


• Cost-efficiency: relies on the UAV network cost computation; it comprises several factors such as peer-to-peer cost, fuel cost, battery charging cost, memory space cost, and hardware and software cost.
• Time-efficiency: the minimum time in which the UAV can move from the start point to the target point assuming there are obstacles on the path; this is possible if the UAV uses the shortest, optimal path.
• Energy-efficiency: the minimum UAV consumption of energy, fuel and battery used to travel from the starting point to the destination.
• Robustness: the tolerance of the UAVs and their resilience against errors, i.e., hardware, software, protocol and communication errors during route planning.
• Collision avoidance: the capability of the UAVs to detect and avoid collisions so as to prevent any crashes or physical damage to the UAV.

5 Classical Techniques for UAV Swarm Navigation and Path Planning 5.1 Roadmap Approach (RA) The Roadmap Approach (RA) is also known as the highway approach. It is a two-dimensional network of straight lines connecting the start and the destination points without intersecting the obstacles defined in the map [54]. The basic idea of this algorithm is to generate sampling nodes randomly in the C-space and connect them. The RA consists of generating a fixed number of random points, which can be called milestones, in the search space. Milestones that fall within an obstacle are discarded. All remaining milestones are sequentially interconnected by straight lines, starting from the robot's starting point, and straight-line segments that pass through obstacles are discarded [55]. The remaining line segments become the edges along which the robot can travel collision-free. For a given start point (Ps) and target or finish point (Pf), all possible connecting paths or routes that avoid the obstacles are generated. A typical search technique such as A* is then used to find the shortest route between the initial point and the destination. The resulting route consists of a series of waypoints connecting the start and target locations. Overall, the algorithm works as shown in Fig. 12. The map is described by a region named the total range, Crange, which is then separated into the obstacle-free range Cfree and the obstacle range Cobst. A connectivity graph Qfree is then created by choosing a group of points that can be linked by straight lines such that the resulting discretization generates a set of polygons that surround the obstacles in Crange. The obtained connectivity graph is used to propose all possible collision-free paths. Then, the A* search algorithm is used to discover one or more paths, based on the chosen parameters, from the start to the end positions or points in between [56].
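As a minimal illustration of the roadmap idea described above (random milestones, collision-free straight-line edges, then a graph search over the resulting network), the following Python sketch builds a small probabilistic roadmap around circular obstacles and extracts a shortest milestone sequence with Dijkstra's algorithm. The obstacle layout, number of samples and connection radius are assumptions chosen for the example, not parameters from the chapter.

import math, random, heapq

OBSTACLES = [((4.0, 4.0), 1.5), ((7.0, 2.0), 1.0)]   # (centre, radius), assumed layout

def in_obstacle(p):
    return any(math.dist(p, c) <= r for c, r in OBSTACLES)

def segment_free(a, b, step=0.1):
    """Check a straight edge for collision by sampling points along it."""
    n = max(1, int(math.dist(a, b) / step))
    return all(not in_obstacle((a[0] + (b[0] - a[0]) * k / n,
                                a[1] + (b[1] - a[1]) * k / n))
               for k in range(n + 1))

def build_roadmap(start, goal, n_samples=150, radius=2.0):
    """Milestones in free space plus collision-free straight-line edges."""
    nodes = [start, goal] + [
        p for p in ((random.uniform(0, 10), random.uniform(0, 10)) for _ in range(n_samples))
        if not in_obstacle(p)]
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and segment_free(nodes[i], nodes[j]):
                edges[i].append((j, d)); edges[j].append((i, d))
    return nodes, edges

def shortest_path(nodes, edges, src=0, dst=1):
    """Dijkstra over the roadmap; may return None if sampling fails to connect them."""
    dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst or d > dist.get(u, float("inf")):
            if u == dst: break
            continue
        for v, w in edges[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w; prev[v] = u; heapq.heappush(pq, (d + w, v))
    if dst not in dist:
        return None
    path, u = [dst], dst
    while u != src:
        u = prev[u]; path.append(u)
    return [nodes[i] for i in reversed(path)]

nodes, edges = build_roadmap(start=(0.5, 0.5), goal=(9.5, 9.5))
print(shortest_path(nodes, edges))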


Fig. 12 The road map method

This technique is also utilized for obstacles in polygonal environments, in which the polygon edges and vertices are represented by graph edges and nodes. Two such techniques that represent paths through graph connectivity are the visibility graph and the Voronoi diagram [36, 56].
A. Visibility Graphs The visibility graph (VG) is widely utilized in route planning algorithms based on the C-space-modeling roadmap approach and is one of the earliest methods used for path planning. As the name suggests, the VG produces line-of-sight (LoS) routes through an environment. In the visibility graph, the obstacle vertices are represented by a finite number of nodes between the start and end point. Each VG node represents a point location, while the route between points is represented by connecting lines [57]. If a connecting line does not cross any obstacle, the path is feasible: it is considered a visible path and is drawn in the visibility graph as a solid line, as shown in Fig. 13. Otherwise, it is considered an infeasible route and must be deleted from the visibility graph. The same procedure is repeated for the remaining nodes until the finish point/node is reached. The VG builds the roadmap that identifies the free space around obstacles, thus translating the connected C-space into a graph-skeleton structure. Lastly, a path is produced using a graph-search algorithm such as Dijkstra's to discover the shortest route linking the start to the end point [58]. The VG concept can be extended to a 3D environment by using 3D planes rather than lines. Several papers in the literature discuss the use of VGs in 3D spaces; for example, the authors in [38] presented a technique for transforming the 3D problem into a 2D one, discovering the route using legacy 2D VG algorithms, and finally adding an additional dimension, the path altitude [59].
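The primitive underlying a visibility graph is the line-of-sight test between two points against the obstacle edges. The sketch below implements only that test with a standard segment-intersection check; the polygon coordinates are assumed for the example, and degenerate cases such as an edge touching an obstacle exactly at a vertex are ignored for brevity.

def ccw(a, b, c):
    """Positive if a -> b -> c turns counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 properly intersect."""
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def line_of_sight(a, b, polygons):
    """A candidate VG edge a-b is 'visible' if it crosses no obstacle edge."""
    for poly in polygons:
        for i in range(len(poly)):
            if segments_cross(a, b, poly[i], poly[(i + 1) % len(poly)]):
                return False
    return True

# One square obstacle (assumed layout)
square = [(3, 3), (6, 3), (6, 6), (3, 6)]
print(line_of_sight((0, 4.5), (9, 4.5), [square]))   # False, crosses the square
print(line_of_sight((0, 0), (9, 1), [square]))       # True, passes below it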


Fig. 13 Visibility graph

B. A* Algorithm A* is a graph traversal and route searching algorithm that is commonly utilized for discovering the optimum route thanks to its optimality and completeness [60]. It discovers the optimum route with little processing time, but it has to store and remember all nodes that have been visited earlier; it uses this memory to identify the best path that can be taken from its current state. To find the following node, one can use the expression below.

f(n) = g(n) + h(n)

(3)

where n denotes the following node on the route, g(n) is the route cost from the start node S to n, and h(n) is a heuristic function that estimates the lowest path cost from n to the goal G. The minimum path cost is estimated to reach the next optimum node point, and the optimum node points are repeatedly estimated based on these costs so that the optimum route avoids the obstacles [61]. Algorithm 1 illustrates the steps involved in searching for the optimum route with the A* search algorithm. It basically depends on an efficient heuristic cost, expands a large search area, and is appropriate only in static circumstances. Since the A* algorithm builds the optimum route from neighboring nodes of the roadmap, the resulting route can be jagged and long.


Algorithm 1: The A* Algorithm
Input: start, goal(n), h(n), expand(n)
Output: path
if goal(start) = true then
    return makePath(start)
end
open ← {start}
closed ← ∅
while open ≠ ∅ do
    sort(open)
    n ← open.pop()
    kids ← expand(n)
    forall kid ∈ kids do
        kid.f ← (n.g + 1) + h(kid)
        if goal(kid) = true then
            return makePath(kid)
        if kid ∉ closed then
            open ← open ∪ {kid}
    end
    closed ← closed ∪ {n}
end
return path
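A runnable Python counterpart of the pseudocode above is sketched below for a 4-connected occupancy grid, using f(n) = g(n) + h(n) with a Manhattan-distance heuristic. The grid contents and the unit cost per move are assumptions of this example rather than part of the original algorithm listing.

import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])   # Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    g, parent, closed = {start: 0}, {}, set()
    while open_heap:
        f, g_n, n = heapq.heappop(open_heap)
        if n == goal:                                # reconstruct the route back to start
            path = [n]
            while n in parent:
                n = parent[n]; path.append(n)
            return path[::-1]
        if n in closed:
            continue
        closed.add(n)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            kid = (n[0] + dr, n[1] + dc)
            if not (0 <= kid[0] < rows and 0 <= kid[1] < cols) or grid[kid[0]][kid[1]]:
                continue                             # out of the map or inside an obstacle
            if g_n + 1 < g.get(kid, float("inf")):
                g[kid] = g_n + 1
                parent[kid] = n
                heapq.heappush(open_heap, (g[kid] + h(kid), g[kid], kid))
    return None                                      # no collision-free route exists

grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (4, 3)))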

5.2 Cell Decomposition (CD) Cell decomposition (CD) is a path-planning algorithm based on a 2D C-space modeling approach. In the cell decomposition method, the environment is divided into non-overlapping grids (cells), and connectivity graphs are used to traverse from one cell to another to reach the goal. Possible routes from the start to the finish point are then created that pass through neighboring free cells (cells with no obstacles) [62]. The obstacles are isolated by finding the connectivity among the free cells; thus a discrete version of the environment is created, and search algorithms are utilized to connect neighboring free cells. Figure 14 presents a schematic of the process: the shaded cells, shown in grey, are removed because they are occupied by obstacles, and a connection between the start and destination points is computed by linking the free cells with a series of straight lines [63]. The figure represents a simple division of the environment, which is known as the exact cell decomposition method. If no path is found, the cells are decomposed into smaller cells and a new search is completed. The CD method is characterized as exact, adaptive, or approximate. In exact CD, the cells do not have a fixed size and shape but are determined by the environment map and the location and shape of the obstacles [63]. This approach uses the regular grid in several ways. Firstly, the available free space of the environment is decomposed into small elements (triangular and trapezoidal) and each element is numbered. Each element of the environment represents a node in the connectivity graph. The neighboring nodes are then permitted to join in the space


Fig. 14 Cell decomposition (CD)

arrangement, and a route in this graph corresponds to a channel in the free space, which is depicted by the succession of striped cells [64]. These channels are then converted into a free route by linking the initial configuration to the goal configuration through the midpoints of the intersections of the adjacent cells in the channel. In approximate CD, the planning space is divided into a regular grid with an explicit size and shape, and hence it is easy to configure. Adaptive CD uses the information available in the free space and follows the basic obstacle-avoidance concept of the free space in regular CD [44, 64]. The benefits of this method are that it is practical to implement in more than two dimensions and relatively quick to compute. However, because it is an iterative process, it is not necessarily practical to compute online, as there is no guarantee when, or whether, a solution will be found. Additionally, while there are both exact and approximate cell decomposition methods, the approximate method (shown in the figure above) can provide very suboptimal solutions.
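A minimal sketch of the approximate variant is given below: the workspace is cut into a uniform grid of cells, a cell is marked occupied when its centre falls inside an obstacle, and a breadth-first search over adjacent free cells plays the role of the connectivity graph. The workspace size, cell size and single circular obstacle are assumed example values.

from collections import deque
import math

def decompose(width, height, cell, obstacles):
    """Approximate CD: a square cell is occupied if its centre lies in an obstacle."""
    nx, ny = int(width / cell), int(height / cell)
    free = {}
    for i in range(nx):
        for j in range(ny):
            centre = ((i + 0.5) * cell, (j + 0.5) * cell)
            free[(i, j)] = all(math.dist(centre, c) > r for c, r in obstacles)
    return free, nx, ny

def connectivity_path(free, nx, ny, start, goal):
    """Breadth-first search over adjacent free cells (the connectivity graph)."""
    parent, q = {start: None}, deque([start])
    while q:
        cell = q.popleft()
        if cell == goal:                      # rebuild the cell sequence
            path = []
            while cell is not None:
                path.append(cell); cell = parent[cell]
            return path[::-1]
        i, j = cell
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= nb[0] < nx and 0 <= nb[1] < ny and free.get(nb) and nb not in parent:
                parent[nb] = cell; q.append(nb)
    return None

# Assumed 10 m x 10 m workspace with one circular obstacle of radius 2 m
free, nx, ny = decompose(10.0, 10.0, cell=1.0, obstacles=[((5.0, 5.0), 2.0)])
print(connectivity_path(free, nx, ny, start=(0, 0), goal=(9, 9)))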

5.3 Artificial Potential Field (APF) Motion planning using the APF was initially used for online collision avoidance where the UAV has no previous knowledge about the obstacles and must avoid them in real time. This comparatively simple concept treats the vehicle as a point under the effect of an artificial potential field whose variations over space characterize the environment structure [65]. The attractive potential reflects the pull of the vehicle toward the goal, and the repulsive potential reflects the push of the UAV away from the obstacles [44, 66]. Consequently, the environment is decomposed into a set of values where high values are linked to obstacles and low values are linked to the goal. Several steps are used to


Fig. 15 Example of the potential field method

construct the map using potential fields. First, the target point is assigned a large negative value and Cfree is assigned increasing values as the distance from the goal increases; typically, a function of the distance from the goal is used as the value [10, 65]. Second, Cobstacle is assigned the highest values and Cfree is assigned decreasing values as the distance from the obstacles increases; typically, the inverse of the distance from the obstacle is used as the value. Finally, the two potentials in Cfree are added and a steepest-descent approach is used to find an appropriate path from the start point to the end point (see Fig. 15 on the right) [45]. Table 3 summarizes these trajectory planning methods.
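The following sketch follows that recipe in a simplified form: an attractive quadratic potential toward the goal, a repulsive potential active within a limited influence distance of the obstacle, and a steepest-descent walk over the summed field. The gains, influence radius, step size and gradient clamp are assumed tuning values, and the sketch inherits the local-minimum weakness noted in Table 3.

import math

GOAL = (9.0, 9.0)
OBSTACLES = [((5.0, 4.0), 1.5)]                      # (centre, radius) of an assumed obstacle
K_ATT, K_REP, INFLUENCE, STEP = 1.0, 20.0, 3.0, 0.05

def gradient(p):
    """Gradient of U = U_att + U_rep at p; descending it moves toward the goal."""
    gx = K_ATT * (p[0] - GOAL[0])                    # attractive part
    gy = K_ATT * (p[1] - GOAL[1])
    for (cx, cy), r in OBSTACLES:
        norm = math.dist(p, (cx, cy))
        d = norm - r                                 # distance to the obstacle surface
        if 0 < d < INFLUENCE:
            coef = K_REP * (1.0 / d - 1.0 / INFLUENCE) / d ** 2
            gx -= coef * (p[0] - cx) / norm          # repulsive part pushes away
            gy -= coef * (p[1] - cy) / norm
    return gx, gy

def descend(start, iters=5000, tol=0.1, g_max=5.0):
    p, path = start, [start]
    for _ in range(iters):
        if math.dist(p, GOAL) < tol:
            break
        gx, gy = gradient(p)
        gnorm = math.hypot(gx, gy)
        if gnorm > g_max:                            # clamp steps near the obstacle
            gx, gy = gx / gnorm * g_max, gy / gnorm * g_max
        p = (p[0] - STEP * gx, p[1] - STEP * gy)     # steepest-descent step
        path.append(p)
    return path

path = descend((0.5, 0.5))
print(len(path), "steps, final point ~ (%.2f, %.2f)" % path[-1])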

6 Reactive Approaches for UAV Swarm Navigation and Path Planning 6.1 Genetic Algorithm (GA) The genetic algorithm (GA) is a dynamic stochastic search algorithm based on natural genetics and selection, utilized to resolve optimization problems [46, 66]. In terms of route planning, the genes are points that serve as waypoints on the route, and the GA uses genetic operations for initial route optimization. There are five genetic operation stages in GA route planning [47, 66]:
• The cross operation: randomly select one point in each of two routes and exchange the remainder of the routes after the selected points.
• The mutation operation: randomly select one point from a route and swap it with a point that is not selected by any route.
• The mobile operation: randomly select a point in the route and move it to a neighboring location.


• The delete operation: randomly select a point in the route and connect its two neighboring nodes; if eliminating the selected node results in a shorter, collision-free route, eliminate this point from the route.
• The enhance operation: can only be used on collision-free routes. Choose a point on the route and enclose two new points on either side of the chosen point; then link the two new points with a route segment, and if the new route is viable, eliminate the chosen point.
The above genetic operations are applied to parent routes to create an optimized child route; a minimal sketch of this evolution loop is given below. In GAs, the parent route is the preliminary route obtained from a preceding route planning operation, which can be obtained, for example, with the roadmap method. The parent routes should be line segments that link the start and the end via several intermediate points. GAs are robust search algorithms that need very little information about the environment to search efficiently [67]. Most studies have investigated navigation in static environments only using GAs; navigation in dynamic environments with moving obstacles has not been discussed extensively in the literature. To achieve excellent performance in UAV route planning, several studies have applied GAs jointly with other intelligent algorithms, in what are sometimes called hybrid approaches [50].
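The sketch below uses only two of the operators listed above (the cross operation at a random cut point and a point mutation) together with truncation selection; the fitness penalizes route length and waypoints that fall inside obstacles. Population size, mutation rate and the obstacle layout are assumptions of the example, not values from the chapter.

import math, random

OBSTACLES = [((5.0, 5.0), 1.5)]                       # assumed circular no-fly zones
START, GOAL = (0.0, 0.0), (10.0, 10.0)

def collides(p):
    return any(math.dist(p, c) <= r for c, r in OBSTACLES)

def route_cost(route):
    """Fitness: path length plus a heavy penalty for waypoints inside obstacles."""
    pts = [START] + route + [GOAL]
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    return length + 100.0 * sum(collides(p) for p in route)

def random_route(n=4):
    return [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(n)]

def crossover(a, b):
    """Cross operation: swap the tails of two parent routes at a random cut point."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(route, rate=0.2):
    """Mutation operation: move a waypoint to a nearby random location."""
    return [(x + random.uniform(-1, 1), y + random.uniform(-1, 1))
            if random.random() < rate else (x, y) for x, y in route]

def evolve(pop_size=40, generations=200):
    pop = [random_route() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=route_cost)
        survivors = pop[:pop_size // 2]               # truncation selection
        children = []
        while len(children) < pop_size - len(survivors):
            c1, c2 = crossover(*random.sample(survivors, 2))
            children += [mutate(c1), mutate(c2)]
        pop = survivors + children[:pop_size - len(survivors)]
    return min(pop, key=route_cost)

best = evolve()
print(f"best route cost ~ {route_cost(best):.2f}: {best}")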

6.2 Neural Network (NN) The structure of an artificial neural network (ANN) is inspired by the operation of biological neural networks. It is built from a group of connected computation units known as artificial neurons (ANs); each link between ANs can transmit a signal from one point to another [68]. Each AN processes the received signal and then signals the ANs connected to it. In the ANN configuration for UAV route planning, the link between neurons carries a signal that is usually a real number, and the neuron output is computed by a nonlinear function. ANNs are typically optimized with stochastic mathematical approaches based on fitting large amounts of data [69]; a suitable solution can then be obtained and expressed as a mathematical function. ANN algorithms reduce the mathematical complexity by eliminating the collocation requirement of the computational environment and exploiting fast computing hardware [62]. Since an ANN performs parallel computation, convergence is generally very fast, and the created route is safe and optimal [63]. Two key forms of ANN approach have been used in UAV route planning. In the first, a UAV builds its route on a sample trajectory and uses a direct association approach to optimize and compute the trajectory [64]. In the second, NNs are used to estimate the system dynamics, objective function, and gradient, which eliminates the collocation requirement and thus reduces the size of the nonlinear programming problem [65]. Presently, the second type of approach is more popular, and it has been extended to solve multi-UAV problems [66]. Additionally, ANNs are often combined with other approaches and algorithms [67, 68] such as


the PFM, PSO, and GA, to maximize their advantages. Deep neural networks (DNNs) are multi-layer NNs and have recently been used extensively in the AI field, for example in speech recognition and image processing. Thanks to their ability to characterize and extract features precisely, they can be applied to facilitate future UAV route planning in complex environments.

6.3 Firefly Algorithm (FA) Firefly algorithms (FAs) are inspired by the behavior and flashing activity of fireflies and belong to the class of metaheuristic algorithms. Their concept combines general identification with the random trial-and-error states that fireflies exhibit statistically in nature [70]. The firefly is a flying beetle of the Lampyridae family, usually called a lightning bug because of its ability to create light. It creates light by oxidation of luciferin in the presence of the enzyme luciferase, which occurs very rapidly; this light creation process is known as bioluminescence, and fireflies glow without spending heat energy. Fireflies use the light for mate selection and message communication, and occasionally to frighten off insects that try to attack them. Recently, FAs have been utilized as optimization tools and their applications are spreading over nearly all engineering areas, such as mobile robot navigation. In [70], the authors presented an FA-based mobile robot navigation approach in the presence of static obstacles; the work achieved the three primary navigation objectives of route safety, route length, and route smoothness. In [71], the authors showed an FA finding the shortest collision-free path for single mobile robot navigation in a simulation environment. The authors of [72] applied the FA to underwater robot navigation and established a scheduling strategy for swarm robots to avoid jamming and interference in a 3D marine environment. Reference [73] discusses a similar environment, where real-life underwater robot navigation in a partially pre-known environment is presented using a Lévy-flight firefly-based method. An FA-based cooperative strategy for dead-robot detection in a multi-mobile-robot environment is discussed in [74]. A 3D FA application for world exploration with aerial navigation was implemented and developed in [75]. An enhanced version of the FA has been applied to unmanned combat aerial vehicle (UCAV) route planning in a crowded, complex environment to avoid hazard areas and minimize fuel cost. A concentric-sphere-based modified FA has been presented [76] to avoid random movement of the fireflies with less computational effort; the experimental and simulation results show great promise in achieving the navigation goals in a complex environment. Reference [77] addressed the navigation problem specifically in dynamic conditions. The basic FA attraction step is sketched below.
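In the sketch, each firefly encodes a flattened list of waypoints, brighter (lower-cost) fireflies attract the others with an attractiveness that decays as exp(-γr²), and a small random step is added and reduced over time. The constants β0, γ, α and the route-cost definition are assumed example values rather than settings from the cited works.

import math, random

OBSTACLES = [((5.0, 5.0), 1.5)]                       # assumed no-fly zone
START, GOAL, N_WP = (0.0, 0.0), (10.0, 10.0), 3

def cost(x):
    """Route cost of a flattened waypoint vector: length plus obstacle penalty."""
    pts = [START] + [(x[2*i], x[2*i + 1]) for i in range(N_WP)] + [GOAL]
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    return length + 100.0 * sum(any(math.dist(p, c) <= r for c, r in OBSTACLES) for p in pts)

def firefly_search(n_fireflies=25, iters=300, beta0=1.0, gamma=0.1, alpha=0.3):
    dim = 2 * N_WP
    swarm = [[random.uniform(0, 10) for _ in range(dim)] for _ in range(n_fireflies)]
    for _ in range(iters):
        intensity = [cost(x) for x in swarm]          # lower cost = brighter firefly
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if intensity[j] < intensity[i]:       # firefly i moves toward brighter j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)   # attractiveness decays with distance
                    swarm[i] = [a + beta * (b - a) + alpha * random.uniform(-0.5, 0.5)
                                for a, b in zip(swarm[i], swarm[j])]
            intensity[i] = cost(swarm[i])
        alpha *= 0.99                                 # shrink the random walk over time
    return min(swarm, key=cost)

best = firefly_search()
print(f"best route cost ~ {cost(best):.2f}")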


6.4 Ant Colony Optimization (ACO) ACO algorithms originate from the behavior of ant colonies and their ability to find the shortest route from the nest to a food source [78]. In a route-planning setting, the routes followed by the ant swarm form the solution space of the optimization problem. Pheromone accumulates more rapidly on shorter routes, so the number of ants selecting those routes grows; eventually, under this positive feedback, the whole colony concentrates on the shortest route, and the corresponding solution is the optimum of the route-planning problem [80]. ACO for UAV route planning is typically implemented by dividing the flight area into a grid and improving a route between a grid point and the destination [85], so that the optimal route can be searched efficiently and rapidly [81]. An improved algorithm with a climbing weight and a 3D grid was discussed in [81]. Today, ACO is used for efficient route planning and to handle obstacle avoidance in mobile robot navigation. Compared with other collective intelligence (CI) algorithms, ACO offers solid robustness and a strong ability to find good solutions. Furthermore, ACO is a population-based evolutionary algorithm that is inherently simple and easy to run in parallel. To further enhance ACO performance on route-planning problems, it can readily be combined with various heuristic algorithms.
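The sketch below is a minimal, hypothetical grid-based ACO in the spirit described above: ants walk from a start cell to a goal cell, choosing moves with probability proportional to pheromone raised to α times a distance heuristic raised to β, and pheromone evaporates and is reinforced in proportion to route quality. The grid size, wall position, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10                                        # N x N grid
blocked = {(4, y) for y in range(2, 9)}       # a wall the ants must go around
start, goal = (0, 0), (9, 9)

def neighbors(c):
    x, y = c
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < N and 0 <= ny < N and (nx, ny) not in blocked:
            yield (nx, ny)

tau = {}                                      # pheromone on directed edges
alpha, beta, rho, Q = 1.0, 2.0, 0.1, 1.0      # typical ACO parameters
best_path, best_len = None, np.inf

for iteration in range(60):
    paths = []
    for ant in range(30):
        path, visited, cur = [start], {start}, start
        while cur != goal and len(path) < 4 * N * N:
            cands = [n for n in neighbors(cur) if n not in visited]
            if not cands:
                break
            # Transition weight: pheromone^alpha * (1 / Manhattan-distance-to-goal)^beta
            w = np.array([tau.get((cur, n), 1.0) ** alpha *
                          (1.0 / (1 + abs(goal[0] - n[0]) + abs(goal[1] - n[1]))) ** beta
                          for n in cands])
            cur = cands[rng.choice(len(cands), p=w / w.sum())]
            path.append(cur); visited.add(cur)
        if cur == goal:
            paths.append(path)
            if len(path) < best_len:
                best_len, best_path = len(path), path
    # Evaporation, then deposit proportional to route quality (shorter = more).
    tau = {e: (1 - rho) * t for e, t in tau.items()}
    for path in paths:
        for a, b in zip(path, path[1:]):
            tau[(a, b)] = tau.get((a, b), 1.0) + Q / len(path)

print("best route length (cells):", best_len)
```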

6.5 Cuckoo Search (CS) CS algorithms are based on the brood-parasitic behavior of cuckoos, which lay their eggs in the nests of other birds. The algorithm follows three basic rules for solving an optimization problem, as discussed in [79]: each cuckoo lays one egg at a time in a randomly selected nest; the best nests, those with high-quality eggs, are carried over to the next generation; and the number of available nests is fixed, with a probability P ∈ (0, 1) that the host bird discovers the cuckoo's egg, in which case the host either abandons the nest and builds a new one or discards the egg. CS is regarded as an effective approach because of its efficiency and rate of convergence, and it is therefore widely adopted for various engineering optimization problems. Mobile robot navigation is one area in which computational time and performance need to be optimized [80]. CS has been used for wheeled robot navigation in a static, partially known environment, with real-life experiments and simulations in complex environments; the simulation and experimental results agree well, with only small deviation errors [81]. CS-based algorithms also perform well when combined with other navigation methods. One such method combines an adaptive neuro-fuzzy inference system


(ANFIS) with CS to obtain better navigation results in uncertain environments. Another hybrid route-planning method for an uncertain 3D environment hybridizes CS with differential evolution (DE) to accelerate global convergence; the enhanced convergence speed helps the aerial robot explore the 3D environment. A 3D CS application, particularly for a battlefield, has been discussed in [82]. In that work, a hybrid method combining CS and DE was proposed for the aerial 3D route-planning optimization problem. The DE step is added to optimize the cuckoo selection process, which noticeably enhances the CS algorithm, with the cuckoos acting as search agents for the optimum route.
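A minimal, hypothetical CS sketch is given below: new candidate solutions are generated by Lévy flights around the current best nest (using Mantegna's method for the Lévy step), and a fraction pa of nests is "discovered" and replaced by random ones each generation. The toy objective stands in for a route-quality measure and is an assumption for illustration only.

```python
import numpy as np
from math import gamma as G

rng = np.random.default_rng(3)

def levy_step(dim, beta=1.5):
    """Mantegna's algorithm for a Lévy-distributed step."""
    sigma = (G(1 + beta) * np.sin(np.pi * beta / 2) /
             (G((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return u / np.abs(v) ** (1 / beta)

def cost(x):
    # Toy route-planning surrogate: deviation of waypoints from a reference path.
    return np.sum((x - np.linspace(0, 10, x.size)) ** 2)

n, dim, pa = 15, 6, 0.25                 # nests, dimension, discovery probability
nests = rng.uniform(0, 10, (n, dim))
fit = np.array([cost(x) for x in nests])

for it in range(300):
    best = nests[np.argmin(fit)]
    # Generate new solutions by Lévy flights biased around the current best nest.
    for i in range(n):
        new = nests[i] + 0.01 * levy_step(dim) * (nests[i] - best)
        if cost(new) < fit[i]:
            nests[i], fit[i] = new, cost(new)
    # A fraction pa of nests is discovered and replaced by random new nests.
    discovered = rng.random(n) < pa
    nests[discovered] = rng.uniform(0, 10, (discovered.sum(), dim))
    fit[discovered] = [cost(x) for x in nests[discovered]]

print("best cost:", fit.min())
```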

6.6 Particle Swarm Optimization (PSO) PSO is an optimization approach modeled on the flocking of birds. Each particle has two attributes, position and velocity: the position encodes a candidate solution, and the velocity determines how the particle moves through the search space. Each particle searches the space individually for the optimum, stores the best solution it has found as its personal best, shares this value with the rest of the swarm, and the best of all personal bests becomes the global best of the swarm [82]. At every step, all particles adapt their position and velocity according to their own personal best and the global best that is shared across the whole swarm [83]. Extensive studies on UAV route planning have applied PSO and its variants. In PSO, each particle is initialized randomly, represents a possible solution of the path-planning problem, and searches within a bounded space for the optimum position. Compared with other computational approaches, PSO can often find a solution faster [81, 83]. Each particle i in the swarm has its own velocity Vi and position Xi, and it is guided toward its local optimal position Pi and the global optimal position Pg. The local optimal position is the point at which the particle achieved its best fitness during the evaluation phase; the global optimal position is the best position found by any particle of the whole swarm. The optimum solution is reached through iterations: in each iteration, every particle updates its position and velocity until the maximum number of iterations is reached.
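The verbal description above corresponds to the classic update equations V_i ← w·V_i + c1·r1·(P_i − X_i) + c2·r2·(P_g − X_i) and X_i ← X_i + V_i. The sketch below is a minimal, hypothetical implementation of these updates on a toy objective; the objective function, swarm size, and coefficient values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def cost(x):
    # Toy surrogate for a path-planning objective; minimum at (3, 3, ..., 3).
    return np.sum((x - 3.0) ** 2)

n, dim = 25, 5
w, c1, c2 = 0.7, 1.5, 1.5                 # inertia and acceleration coefficients
X = rng.uniform(-10, 10, (n, dim))        # positions
V = np.zeros((n, dim))                    # velocities
P = X.copy()                              # personal best positions P_i
p_fit = np.array([cost(x) for x in X])
g = P[np.argmin(p_fit)]                   # global best position P_g

for it in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # Classic PSO update: V = w*V + c1*r1*(P_i - X) + c2*r2*(P_g - X); X = X + V
    V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
    X = X + V
    f = np.array([cost(x) for x in X])
    improved = f < p_fit
    P[improved], p_fit[improved] = X[improved], f[improved]
    g = P[np.argmin(p_fit)]

print("global best:", g, "cost:", p_fit.min())
```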

6.7 Bacterial Foraging Optimization (BFO) BFO is inspired by the foraging behavior of bacteria such as M. xanthus and E. coli. A bacterium searches for nutrients so as to make the best use of the energy obtained per unit time. BFO algorithms are characterized by chemotaxis, the process of following chemical gradients, through which bacteria also exchange signals with one another. This


process has four key components: chemotaxis, reproduction/swarming, elimination, and dispersal. The nutrient-searching behavior of the bacteria [84] can be summarized as follows.
• Bacteria continually move in search of regions of the space richer in nutrients. Bacteria with enough food live longer and split into two identical parts, while bacteria in poorer nutrient regions die and disperse.
• Bacteria in richer nutrient regions attract others through a chemical phenomenon, while bacteria in poorer nutrient regions send a warning signal to the others.
• Bacteria congregate in the most nutrient-rich regions of the space.
• Bacteria are dispersed across the space to find new nutrient regions.
The application of BFO to mobile robot navigation in static environments was first discussed in [84], with variable step sizes based on Cauchy, uniform, and Gaussian distributions. The same strategy was discussed for navigation in static environments with obstacles. Real-time navigation of a single mobile robot in building-floor, corridor, and lobby environments is discussed in [85]. To improve the route-planning performance of wheeled robots, an improved BFO algorithm is proposed in [86]. The proposed method models the environment with an APF built from two opposing forces, a repulsive force for obstacles and an attractive force for the goal, and it uses negative feedback from the algorithm to choose direction vectors that steer the search toward promising areas with an effective local search. Navigation with several robots is itself a challenging problem, and BFO algorithms have been proposed to deal with it [87]; the authors combined the harmony search algorithm with BFO. Beyond wheeled robots, BFO has been validated for an industrial manipulator, as reviewed in [88], where the enhanced BFO gave better results than the conventional BFO. A UAV navigation method based on BFO was also proposed in [88]: BFO was combined with a proportional-integral-derivative (PID) controller to obtain optimum search coefficients in 3D space and to avoid complex models while tuning the controller for UAVs.
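The sketch below is a minimal, hypothetical rendering of the BFO loop described above: chemotaxis (a random tumble followed by swims while the nutrient value improves), reproduction of the healthier half of the population, and occasional elimination-dispersal. The nutrient landscape and parameter values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)

def nutrient(x):
    # Toy nutrient landscape: richer (lower cost) near the goal at (5, 5).
    return np.sum((x - 5.0) ** 2)

S, dim = 20, 2            # population size and search-space dimension
n_chem, n_swim = 30, 4    # chemotaxis steps and maximum swim length
step = 0.2
bacteria = rng.uniform(0, 10, (S, dim))
fitness = np.array([nutrient(b) for b in bacteria])

for rep in range(5):                          # reproduction loops
    for c in range(n_chem):                   # chemotaxis: tumble then swim
        for i in range(S):
            direction = rng.normal(size=dim)
            direction /= np.linalg.norm(direction)       # random tumble
            for s in range(n_swim):                      # keep swimming while improving
                candidate = bacteria[i] + step * direction
                if nutrient(candidate) < fitness[i]:
                    bacteria[i], fitness[i] = candidate, nutrient(candidate)
                else:
                    break
    # Reproduction: the healthier half splits, the weaker half is eliminated.
    order = np.argsort(fitness)
    bacteria = np.concatenate([bacteria[order[:S // 2]]] * 2)
    fitness = np.array([nutrient(b) for b in bacteria])
    # Elimination-dispersal: randomly relocate a few bacteria.
    disperse = rng.random(S) < 0.1
    bacteria[disperse] = rng.uniform(0, 10, (disperse.sum(), dim))
    fitness[disperse] = [nutrient(b) for b in bacteria[disperse]]

print("best position:", bacteria[np.argmin(fitness)], "cost:", fitness.min())
```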

6.8 Artificial Bee Colony (ABC) The ABC algorithm is a swarm intelligence technique adapted from the food-searching behavior of honey bees, first introduced in [83]. ABC is a population-based method whose population consists of candidate solutions (i.e., food sources for the bees). It is comparatively simple, computationally light, and belongs to the family of population-based stochastic search methods in the swarm-algorithm field. An ABC food-search cycle comprises three stages: sending the employed bees to food sources and assessing the nectar quality; onlooker bees selecting food sources


after receiving information from the employed bees and computing the nectar quality; and dispatching scout bees to probable new food sources [87]. The application of ABC to mobile robot navigation in static environments is proposed in [89]; the method applies ABC for local search together with evolutionary algorithms to identify the optimum route, and real-time indoor experiments are reported for verification. Similar techniques in static environments are also discussed in [89], although the results were limited to simulation. To meet the navigation goal in real-life dynamic environments, an ABC-based technique is proposed in [90], where the authors combine ABC with a rolling-time-window protocol. Navigation of several mobile robots is a challenging problem, and ABC has been developed successfully for it in static environments. As with wheeled mobile robots, ABC has been examined for the routing problems of aerial, underwater, and autonomous vehicles [83]. UCAV route planning aims to attain an optimum 3D flight path while considering the constraints and threats of the battlefield. Researchers have addressed UCAV navigation using an enhanced ABC in which a balance-evolution strategy (BES) fully exploits the convergence information gathered over the iterations to improve search accuracy and to balance global exploration against local exploitation [89]. ABC applications in the military sector are discussed in [90], where an unmanned helicopter is examined for demanding missions such as accurate measurement and information gathering.
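A minimal, hypothetical ABC sketch covering the three phases described above (employed bees, onlooker bees, and scouts) is given below; the objective function, colony size, and abandonment limit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def cost(x):
    # Toy objective standing in for route quality; minimum at (2, 2, ..., 2).
    return np.sum((x - 2.0) ** 2)

n, dim, limit = 10, 4, 20                 # food sources, dimension, abandonment limit
lo, hi = -10.0, 10.0
food = rng.uniform(lo, hi, (n, dim))
fit = np.array([cost(x) for x in food])
trials = np.zeros(n, dtype=int)

def neighbour(i):
    """Perturb one dimension of source i toward/away from a random partner k."""
    k = rng.choice([j for j in range(n) if j != i])
    d = rng.integers(dim)
    new = food[i].copy()
    new[d] += rng.uniform(-1, 1) * (food[i][d] - food[k][d])
    return np.clip(new, lo, hi)

for it in range(300):
    # Employed-bee phase: each source gets one local search attempt.
    for i in range(n):
        new = neighbour(i)
        if cost(new) < fit[i]:
            food[i], fit[i], trials[i] = new, cost(new), 0
        else:
            trials[i] += 1
    # Onlooker phase: sources are chosen with probability proportional to quality.
    p = 1 / (1 + fit); p /= p.sum()
    for _ in range(n):
        i = rng.choice(n, p=p)
        new = neighbour(i)
        if cost(new) < fit[i]:
            food[i], fit[i], trials[i] = new, cost(new), 0
        else:
            trials[i] += 1
    # Scout phase: abandon exhausted sources and explore randomly.
    for i in np.where(trials > limit)[0]:
        food[i] = rng.uniform(lo, hi, dim)
        fit[i], trials[i] = cost(food[i]), 0

print("best source:", food[np.argmin(fit)], "cost:", fit.min())
```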

6.9 Adaptive Artificial Fish Swarm Algorithm (AFSA) AFSA is a swarm intelligence method proposed in [91]. Fish generally move toward locations with the most food by executing social search behaviors, and AFSA models roughly four of these behaviors: prey, follow, swarm, and leap [90]. Recently, thanks to its strong global search capability, good robustness, and fast convergence rate, AFSA has been widely used for robot route-planning problems. Several studies have therefore proposed methods that enhance the standard AFSA by modeling the fictitious entities of real fish more closely. A novel AFSA, known as NAFSA, has been presented to address the weaknesses of the standard AFSA and to accelerate its convergence. A modified form of AFSA called MAFSA, with dynamic parameter control, has been proposed to select the optimum feature subset and improve the classification accuracy of support vector machines; experimental results show that it outperforms the standard AFSA [91]. Another optimized AFS was presented to make the imitation of fish behavior closer to reality and improve the environmental awareness of the fishes' foraging behavior: by probing the environment, the artificial fish monitor the surrounding information in order to reach an optimum state and a better movement direction. The adaptive hybrid niche artificial fish swarm algorithm (AHSNAFSA) was


proposed to solve vehicle routing problems, and the ecological niche concept is introduced to remedy the deficiency of the conventional AFSA in reaching an optimum solution [92].
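The sketch below is a minimal, hypothetical illustration of the prey, swarm, and follow behaviors of AFSA on a toy "food concentration" landscape; the visual range, step size, crowding factor, and landscape are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)

def food(x):
    # Toy "food concentration": higher is better, peak at (4, 4).
    return -np.sum((x - 4.0) ** 2)

N, dim = 15, 2
visual, step, crowd = 2.5, 0.5, 0.6       # visual range, step size, crowding factor
fish = rng.uniform(0, 10, (N, dim))

def move_towards(x, target):
    d = target - x
    n = np.linalg.norm(d)
    return x if n == 0 else x + step * rng.random() * d / n

for it in range(200):
    for i in range(N):
        x = fish[i]
        mates = [f for f in fish if 0 < np.linalg.norm(f - x) < visual]
        new = None
        if mates:
            centre = np.mean(mates, axis=0)
            best_mate = max(mates, key=food)
            # Swarm behavior: move to the centre if it is better and not crowded.
            if food(centre) > food(x) and len(mates) / N < crowd:
                new = move_towards(x, centre)
            # Follow behavior: otherwise chase the best neighbour if it is better.
            elif food(best_mate) > food(x):
                new = move_towards(x, best_mate)
        if new is None:
            # Prey behavior: try a few random points inside the visual range.
            for _ in range(5):
                trial = x + visual * (rng.random(dim) - 0.5)
                if food(trial) > food(x):
                    new = move_towards(x, trial)
                    break
            if new is None:                 # leap/random move when prey fails
                new = x + step * (rng.random(dim) - 0.5)
        fish[i] = new

best = fish[np.argmax([food(f) for f in fish])]
print("best position:", best, "food value:", food(best))
```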

7 Conclusions This chapter extensively reviewed UAV navigation and route-planning approaches for autonomous mobile robots, together with the advantages and disadvantages of these algorithms. An inclusive discussion of each method in the research field of UAV route planning and navigation was presented. Despite the major progress of recent studies over those of a few years ago, only a fraction of these works could be reported in this chapter. This survey categorizes the various techniques into conventional and reactive techniques. The main findings of this review are listed below.
• Reactive techniques perform much better than conventional techniques because of their higher ability to handle uncertainty in the environment. Far fewer studies address dynamic environments than static environments.
• Reactive approaches are commonly used for real-time navigation problems.
• In dynamic environments, there is less research on UAV navigation toward moving goals than on avoiding moving obstacles.
• Most studies are conducted in simulation environments; research in real-time environments is much scarcer.
• Research on the navigation of multiple UAVs is scarce compared with that on a single UAV.
• There is ample scope for applying newly developed algorithms such as CS, SFLA, BA, FA, DE, HS, ABC, BFO, and IWO to navigation in uncertain, complex environments with high uncertainty, and they could be used to propose new types of hybrid mechanisms.
• The efficiency of classical approaches can be improved by hybridizing them with reactive mechanisms.

References 1. Lu, Y., Zhucun, X., Xia, G.-S., & Zhang, L. (2018). A survey on vision-based UAV navigation. Geo-Spatial Information Science, 21(1), 1–12. 2. Rashid, A., & Mohamed, O. (2022). Optimal path planning for drones based on swarm intelligence algorithm. Neural Computing and Applications, 34, 10133–10155. https://doi.org/10.1007/s00521-022-06998-9 3. Aggarwal, S., & Kumar, N. (2019). Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Computer Communications, 149.


4. Lina, E., Ali, A., & Rania, A., et al, (2022). Deep and reinforcement learning technologies on internet of vehicle (IoV) applications: Current issues and future trends. Journal of Advanced Transportation, Article ID 1947886. https://doi.org/10.1155/2022/1947886 5. Farshad, K., Ismail, G., & Mihail, L. S. (2018). Autonomous tracking of intermittent RF source using a UAV swarm. IEEE Access, 6, 15884–15897. 6. Saeed, M. M., Saeed, R. A., Mokhtar, R. A., Alhumyani, H., & Ali, E. S. (2022). A novel variable pseudonym scheme for preserving privacy user location in 5G networks. Security and Communication Networks, Article ID 7487600. https://doi.org/10.1155/2022/7487600 7. Han, J., Xu, Y., Di, L., & Chen, Y. (2013). Low-cost multi-uav technologies for contour mapping of nuclear radiation field. Journal of Intelligent and Robotic Systems, 70(1–4), 401–410. 8. Merino, L., Martínez, J. R., & Ollero, A. (2015). Cooperative unmanned aerial systems for fire detection, monitoring, and extinguishing. In Handbook of unmanned aerial vehicles (pp. 2693– 2722). 9. Othman, O. et al. (2022). Vehicle detection for vision-based intelligent transportation systems using convolutional neural network algorithm. Journal of Advanced Transportation, Article ID 9189600. https://doi.org/10.1155/2022/9189600 10. Elfatih, N. M., et al. (2022). Internet of vehicle’s resource management in 5G networks using AI technologies: Current status and trends. IET Communications, 16, 400–420. https://doi.org/ 10.1049/cmu2.12315 11. Sana, U., Ki-Il, K., Kyong, H., Muhammad, I., et al. (2009). UAV-enabled healthcare architecture: Issues and challenges”. Future Generation Computer Systems, 97, 425–432. 12. Haifa, T., Amira, C., Hichem, S., & Farouk, K. (2021). Cognitive radio and dynamic TDMA for efficient UAVs swarm Communications. Computer Networks, 196. 13. Saleem, Y., Rehmani, M. H., & Zeadally, S. (2015). Integration of cognitive radio technologywith unmanned aerial vehicles: Issues, opportunities, and future research challenges. Journal of Network and Computer Applications, 50, 15–31. https://doi.org/10.1016/j.jnca.2014.12.002 14. Rashid, A., Sabira, K., Borhanuddin, M., & Mohd, A. (2006). UWB-TOA geolocation techniques in indoor environments. Institution of Engineers Malaysia (IEM), 67(3), 65–69, Malaysia. 15. Xi, C., Jun, T., & Songyang, L. (2020). Review of unmanned aerial vehicle Swarm communication architectures and routing protocols. Applied Sciences, 10, 3661. https://doi.org/10.3390/ app10103661 16. Sahingoz, O. K. (2013). Mobile networking with UAVs: Opportunities and challenges. In Proceedings of the 2013 international conference on unmanned aircraft systems (ICUAS), Atlanta, GA, USA, 28–31 May 2013 (pp. 933–941). New York, NY, USA: IEEE. 17. Kaleem, Z., Qamar, A., Duong, T., & Choi, W. (2019). UAV-empowered disaster-resilient edge architecture for delay-sensitive communication. IEEE Network, 33, 124–132. 18. Sun, Y., Wang, H., Jiang, Y., Zhao, N. (2019). Research on UAV cluster routing strategy based on distributed SDN. In Proceedings of the 2019 IEEE 19th International Conference on Communication Technology (ICCT), Xi’an, China, 2019 (pp. 1269–1274). New York, NY, USA: IEEE. 19. Khan, M., Qureshi, I, & Khan, I. (2017). Flying ad-hoc networks (FANETs): A review of communication architectures, and routing protocols. In Proceedings of the 2017 first international conference on latest trends in electrical engineering and computing technologies (INTELLECT). (pp. 1–9). New York, NY, USA. 20. Shubhani, A., & Neeraj, K. 
(2020). Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Computer Communications, 149, 270–299. 21. Mamoon, M., et al. (2022). A comprehensive review on the users’ identity privacy for 5G networks. IET Communications, 16, 384–399. https://doi.org/10.1049/cmu2.12327 22. Yijing, Z., Zheng, Z., & Yang, L. (2018). Survey on computational-intelligence-based UAV path planning. Knowledge-Based Systems, 158, 54–64. 23. Zhao, Y., Zheng, Z., Zhang, X., & Liu Y. (2017). Q learning algorithm-based UAV path learning and obstacle avoidance approach. In: 2017 thirty-sixth chinese control conference (CCC)


24. Zhang, H. (2017). Three-dimensional path planning for uninhabited combat aerial vehicle based on predator-prey pigeon-inspired optimization in dynamic environment. Press. 25. Alaa, M., et al. (2022). Performance evaluation of downlink coordinated multipoint joint transmission under heavy IoT traffic load. Wireless Communications and Mobile Computing, Article ID 6837780. 26. Sharma, R., & Ghose, D. (2009). Collision avoidance between uav clusters using swarm intelligence techniques. International Journal of Systems Science, 40(5), 521–538. 27. Abdurrahman, B., & Mehmetnder, E. (2016). Fpga based offline 3d UAV local path planner using evolutionary algorithms for unknown environments. Proceedings of the Conference of the IEEE Industrial Electronics Society, IECON, 2016, 4778–4783. 28. Yang, X., Cai, M., Li, J. (2016). Path planning for unmanned aerial vehicles based on genetic programming. In Chinese control and decision conference (pp. 717–722). 29. Luciano, B., Simeone, B., & Egidio, D. (2017). A mixed probabilistic-geometric strategy for UAV optimum flight path identification based on bit-coded basic manoeuvres. Aerospace Science Technology, 71. 30. Phung, M., Cong, H., Dinh, T., & Ha, Q. (2017). Enhanced discrete particle swarm optimization path planning for UAV vision-based surface inspection. Automation in Construction, 81, 25–33. 31. Ugur, O., Koray, S. O. (2016). Multi colony ant optimization for UAV path planning with obstacle avoidance. In International conference on unmanned aircraft systems (pp 47–52). 32. Adhikari, E., & Reza, H. (2017). A fuzzy adaptive differential evolution for multi-objective 3d UAV path optimization. Evolutionary Computation, 6(9). 33. Choi, Y., Jimenez, H., & Mavris, D. (2017). Two-layer obstacle collision avoidance with machine learning for more energy-efficient unmanned aircraft trajectories. Robotics and Autonomous Systems, 6(2). 34. Abdul, Q. (2017). Saeed M: Scene classification for aerial images based on CNN using sparse coding technique. International Journal of Remote Sensing, 38(8–10), 2662–2685. 35. Kang, Y., Kim, N., Kim, B., Tahk, M. (2017). Autopilot design for tilt-rotor unmanned aerial vehicle with nacelle mounted wing extension using single hidden layer perceptron neural network. In Proceedings of the Institution of Mechanical Engineers G Journal of Aerospace Engineering, 2(6), 743–789. 36. Bygi, M., & Mohammad, G. (2007). 3D visibility graph. In International conference on computational science and its applications, conference: computational science and its applications, 2007. ICCSA 2007. Kuala Lampur. 37. Rashid, A., Rania, A., & Jalel, C., Aisha, H. (2012). TVBDs coexistence by leverage sensing and geo-location database. In IEEE international conference on computer & communication engineering (ICCCE2012) (pp. 33–39). 38. Fahad, A., Alsolami, F., & Abdel-Khalek, S. (2022). Machine learning techniques in internet of UAVs for smart cities applications. Journal of Intelligent and Fuzzy Systems, 42(4), 3203–3226. 39. Ali, S., Hasan, M., & Rosilah, H, et al. (2021). Machine learning technologies for secure vehicular communication in internet of vehicles: recent advances and applications. Security and Communication Networks, Article ID 8868355. https://doi.org/10.1155/2021/8868355 40. Zeinab, K., & Ali, S. (2017). Internet of things applications, challenges and related future technologies. World Scientific News (WSN), 67(2), 126–148. 41. Wang, Y., & Yuan, Q. (2011). Application of Dijkstra algorithm in robot path-planning. 
In 2011 2nd international conference mechnical automation control engineering (MACE 2011) (pp. 1067–1069). 42. Patle, B. K., Ganesh, L., Anish, P., Parhi, D. R. K., & Jagadeesh, A. (2019). A review: On path planning strategies for navigation of mobile robot. Defense Technology, 15, 582e606. https:// doi.org/10.1016/j.dt.2019.04.011 43. Reham, A, Ali, A., et al. (2022). Blockchain for IoT-based cyber-physical systems (CPS): applications and challenges. In: De, D., Bhattacharyya, S., Rodrigues, J. J. P. C. (Eds.), Blockchain based internet of things. Lecture notes on data engineering and communications technologies (Vol. 112). Springer. https://doi.org/10.1007/978-981-16-9260-4_4


44. Jia, Q., & Wang, X. (2009). Path planning for mobile robots based on a modified potential model. In Proceedings of the IEEE international conference on mechatronics and automation, China. 45. Gul, W., & Nazli, A. (2019). A comprehensive study for robot navigation techniques. Cogent Engineering, 6(1),1632046. 46. Hu, Y., & Yang, S. (2004). A knowledge based genetic algorithm for path-planning of a mobile robot. In IEEE international conference on robotics automation. 47. Pratihar, D., Deb, K., & Ghosh, A. (1999). Fuzzy-genetic algorithm and time-optimal obstacle free path generation for mobile robots. Engineering Optimization, 32(1), 117e42. 48. Hui, N. B., & Pratihar, D. K. (2009). A comparative study on some navigation schemes of a real robot tackling moving obstacles. Robot Computer Integrated Manufacture, 25, 810e28. 49. Wang, X., Shi, Y., Ding, D., & Gu, X. (2016). Double global optimum genetic algorithm particle swarm optimization-based welding robot path planning. Engineering Optimization, 48(2), 299e316. 50. Vachtsevanos, K., & Hexmoor, H. (1986). A fuzzy logic approach to robotic path planning with obstacle avoidance. In 25th IEEE conference on decision and control (pp. 1262–1264). 51. Ali Ahmed, E. S., & Zahraa, T, et al. (2021). Algorithms optimization for intelligent IoV applications. In Zhao, J., and Vinoth Kumar, V. (Eds.), Handbook of research on innovations and applications of AI, IoT, and cognitive technologies (pp. 1–25). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-6870-5.ch001 52. Rashid, A., & Khatun, S. (2005) Ultra-wideband (UWB) geolocation in NLOS multipath fading environments. In Proceeding of IEEE Malaysian international communications conference– IEEE conference on networking 2005 (MICC-ICON’05) (pp. 1068–1073). Kuala Lumpur, Malaysia. 53. Hassan, M. B., & Saeed, R. (2021). Machine learning for industrial IoT systems. In Zhao, J., & Vinoth, K. (). Handbook of research on innovations and applications of AI, IoT, and cognitive technologies (pp. 336–358). Hershey, PA: IGI Global. https://doi.org/10.4018/9781-7998-6870-5.ch023 54. Ali, E. S., & Hassan, M. B. et al. (2021). Terahertz Communication Channel characteristics and measurements Book: Next Generation Wireless Terahertz Communication Networks Publisher. CRC group, Taylor & Francis Group. 55. Rania, S., Sara, A., & Rania, A., et al. (2021). IoE design principles and architecture. In Book: Internet of energy for smart cities: Machine learning models and techniques, publisher. CRC group, Taylor & Francis Group. 56. Jaradat, M., Al-Rousan, M., & Quadan, L. (2011). Reinforcement based mobile robot navigation in dynamic environment. Robot Computer Integrated Manufacture, 27, 135e49. 57. Tschichold, N. (1997). The neural network model Rule-Net and its application to mobile robot navigation. Fuzzy Sets System, 85, 287e303. 58. Alsaqour, R., Ali, E. S., Mokhtar, R. A., et al. (2022). Efficient energy mechanism in heterogeneous WSNs for underground mining monitoring applications. IEEE Access, 10, 72907–72924. https://doi.org/10.1109/ACCESS.2022.3188654 59. Jaradat, M., Garibeh, M., & Feilat, E. A. (2012). Autonomous mobile robot planning using hybrid fuzzy potential field. Soft Computing, 16, 153e64. 60. Yen, C., & Cheng, M. (2018). A study of fuzzy control with ant colony algorithm used in mobile robot for shortest path planning and obstacle avoidance. Microsystem Technology, 24(1), 125e35. 61. Duan, L. (2014). 
Imperialist competitive algorithm optimized artificial neural networks for UCAV global path planning. Neurocomputing, 125, 166–171. 62. Liang, K. (2010). The application of neural network in mobile robot path planning. Journal of System Simulation, 9(3), 87–99. 63. Horn, E., Schmidt, B., & Geiger, M. (2012). Neural network-based trajectory optimization for unmanned aerial vehicles. Journal of Guidance, Control, and Dynamics, 35(2), 548–562. 64. Geiger, B., Schmidt, E., & Horn, J. (2009). Use of neural network approximation in multiple unmanned aerial vehicle trajectory optimization. In Proceedings of the AIAA guidance, navigation, and control conference, Chicago, IL.


65. Ali, E., Hassan, M., & Saeed, R. (2021). Machine learning technologies in internet of vehicles. In: Magaia, N., Mastorakis, G., Mavromoustakis, C., Pallis, E., Markakis, E. K. (Eds.), Intelligent technologies for internet of vehicles. Internet of things. Cham : Springer. https://doi.org/ 10.1007/978-3-030-76493-7_7 66. Gautam, S., & Verma, N., Path planning for unmanned aerial vehicle based on genetic algorithm & artificial neural network in 3d. In Proceedings of the 2014 international conference on data mining and intelligent computing (ICDMIC) (pp. 1–5). IEEE. 67. Wang, N., Gu, X., Chen, J., Shen, L., & Ren, M. (2009). A hybrid neural network method for UAV attack route integrated planning. In Proceedings of the advances in neural networks–ISNN 2009 (pp. 226–235). Springer. 68. Alatabani, L, & Ali, S. et al. (2021). Deep learning approaches for IoV applications and services. In Magaia, N., Mastorakis, G., Mavromoustakis, C., Pallis, E., & Markakis, E. K. (Eds.), Intelligent technologies for internet of vehicles. Internet of things. Cham : Springer. https://doi.org/10.1007/978-3-030-76493-7_8 69. Hidalgo, A., Miguel, A., Vegae, R., Ferruz, J., & Pavon, N. (2015). Solving the multi-objective path planning problem in mobile robotics with a firefly-based approach. Soft Computing, 1e16. 70. Brand, M., & Yu, H. (2013). Autonomous robot path optimization using firefly algorithm. In International conference on machine learning and cybernetics, Tianjin (Vol. 3, p. 14e7). 71. Salih, A., & Rania, A. A., et al. (2021). Machine learning in cyber-physical systems in industry 4.0. In Luhach, A. K., and Elçi, A. (Eds.), Artificial intelligence paradigms for smart cyberphysical systems (pp. 20–41). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-79985101-1.ch002 72. Mahboub, A., & Ali, A., et al. (2021). Smart IDS and IPS for cyber-physical systems. In Luhach, A. K., and Elçi, A. (Eds.), Artificial intelligence paradigms for smart cyber-physical systems (pp. 109–136). Hershey, PA: IGI Global. https://doi.org/10.4018/978-1-7998-5101-1. ch006 73. Christensen, A., & Rehan, O. (2008). Synchronization and fault detection in autonomous robots. In IEEE/RSJ intelligent conference on robots and systems (p. 4139e40). 74. Wang, G., Guo, L., Hong, D., Duan, H., Liu, L., & Wang, H. (2012). A modified firefly algorithm for UCAV path planning. International Journal of Information Technology, 5(3), 123e44. 75. Patle, B., Parhi, D., Jagadeesh, A., & Kashyap, S. (2017). On firefly algorithm: optimization and application in mobile robot navigation. World Journal of Engineering, 14(1):65e76, (2017). 76. Patle, B., Pandey, A., Jagadeesh, A., & Parhi, D. (2018). Path planning in uncertain environment by using firefly algorithm. Defense Technology, 14(6), 691e701. https://doi.org/10.1016/j.dt. 2018.06.004. 77. Ebrahimi, J., Hosseinian, S., & Gharehpetian, G. (2011). Unit commitment problem solution using shuffled frog leaping algorithm. IEEE Transactions on Power Systems, 26(2), 573–581. 78. Tang, D., Yang, J., & Cai, X. (2012). Grid task scheduling strategy based on differential evolution-shuffled frog leaping algorithm. In Proceedings of the 2012 international conference on computer science and service system, (CSSS 2012) (pp. 1702–1708). 79. Hassanzadeh, H., Madani, K., & Badamchizadeh, M. (2010). Mobile robot path planning based on shuffled frog leaping optimization algorithm. In 2010 IEEE international conference on automation science and engineering, (CASE 2010) (pp. 680–685). 80. 
Cekmez, U., Ozsiginan, M., & Sahingoz, O. (2014). A UAV path planning with parallel ACO algorithm on CUDA platform. In Proceedings of the 2014 international conference on unmanned aircraft systems (ICUAS) (pp. 347–354). 81. Zhang, C., Zhen, Z., Wang, D., & Li, M. (2010). UAV path planning method based on ant colony optimization. In Proceedings of the 2010 Chinese Control and Decision Conference (CCDC) (pp. 3790–3792). IEEE. 82. Brand, M., Masuda, M., Wehner, N., & Yu, X. (2010). Ant colony optimization algorithm for robot path planning. In 2010 international conference on computer design and applications, 3(V3-V436-V3), 440. 83. Mohanty, P., & Parhi, D. (2015). A new hybrid optimization algorithm for multiple mobile robots’ navigation based on the CS-ANFIS approach. Memetic Computing, 7(4), 255e73.


84. Wang, G., Guo, L., Duan, H., Wang, H., Liu, L., & Shao, M. (2012). A hybrid metaheuristic DE/ CS algorithm for UCAV three-dimension path planning. The Scientific World Journal, 2012, 83973. https://doi.org/10.1100/2012/583973.11pages 85. Abbas, N., & Ali, F. (2017). Path planning of an autonomous mobile robot using enhanced bacterial foraging optimization algorithm. Al-Khwarizmi Engineering Journal, 12(4), 26e35. 86. Jati, A., Singh, G., Rakshit, P., Konar, A., Kim, E., & Nagar, A. (2012). A hybridization of improved harmony search and bacterial foraging for multi-robot motion planning. In: Evolutionary computation (CEC), IEEE congress, 1e8, (2012). 87. Asif, K., Jian, P., Mohammad, K., Naushad, V., Zulkefli, M., et al. (2022). PackerRobo: Modelbased robot vision self-supervised learning in CART. Alexandria Engineering Journal, 61(12), 12549–12566. https://doi.org/10.1016/j.aej.2022.05.043 88. Mohanty, P., & Parhi, D. (2016). Optimal path planning for a mobile robot using cuckoo search algorithm. Journal of Experimental and Theoretical Artificial Intelligence, 28(1e2), 35e52. 89. Wang, G., Guo, L., Duan, H., Wang, H., Liu, L., & Shao, M. (2012). A hybrid metaheuristic DE/ CS algorithm for UCAV three-dimension path planning. The Scientific World Journal, 583973, 11 pages. https://doi.org/10.1100/2012/583973 90. Ghorpade, S. N., Zennaro, M., & Chaudhari, B. S., et al. (2021). A novel enhanced quantum PSO for optimal network configuration in heterogeneous industrial IoT, in IEEE access, 9, 134022–134036. https://doi.org/10.1109/ACCESS.2021.3115026 91. Ghorpade, S. N., Zennaro, M., Chaudhari, B. S., et al. (2021). Enhanced differential crossover and quantum particle Swarm optimization for IoT applications. IEEE Access, 9, 93831–93846. https://doi.org/10.1109/ACCESS.2021.3093113 92. Saeed, R. A., Omri, M., Abdel-Khalek, S., et al. (2022). Optimal path planning for drones based on swarm intelligence algorithm. Neural Computing and Applications. https://doi.org/ 10.1007/s00521-022-06998-9

Intelligent Control System for Hybrid Electric Vehicle with Autonomous Charging Mohamed Naoui, Aymen Flah, Lassaad Sbita, Mouna Ben Hamed, and Ahmad Taher Azar

Abstract The present chapter provides a general review of electric vehicles (EVs) and tests the efficiency of modern charging systems. The work also concentrates on hybrid vehicle architectures and recharge systems. In the first step, a global study of the different EV architectures and technologies examines the battery, the electric motor, and the various sensor actions in electric vehicles. The second part discusses the different types of charging systems used in EVs, which are divided into two groups: classic chargers and autonomous chargers. In addition, an overview of the autonomous charger is presented along with its mathematical modeling, covering the photovoltaic (PV) charger and the wireless (WR) charging system. After a clear mathematical description of each part, and after showing the electronic equipment needed for each tool to play its role, a simple management loop is designed and implemented. A hybrid charging system combining PV and WR is then proposed, together with an intelligent power distribution system. Matlab/Simulink software is used to simulate the energetic performance of an electric vehicle equipped with this hybrid recharge tool under various simulation conditions. At the end of this study, the results and their corresponding discussions show the benefits and drawbacks of each solution and prove the importance of this hybrid recharge tool for increasing vehicle autonomy. Keywords Hybrid electric vehicle · Wireless charging system · Batteries technology · Intelligent control · Photovoltaic · Fuzzy logic control M. Naoui (B) · A. Flah · L. Sbita · M. Ben Hamed Environment and Electrical Systems LR18ES34, National Engineering School of Gabes, University of Gabes, Zrig Eddakhlania, Tunisia e-mail: [email protected] A. Flah e-mail: [email protected] A. T. Azar College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia e-mail: [email protected]; [email protected]; [email protected] Faculty of Computers and Artificial Intelligence, Benha University, Benha 13518, Egypt © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_13


1 Introduction Electrifying transport systems have become a necessity in the modern city, and having an electric vehicle instead of a fuel-powered one is becoming essential given the technological advantages of communication techniques as well as the advantages of taxes and the reduced price of electrical energy, compared to that of fuels [1]. This transport system has been applied to several models and architectures which differ internally. In recent years, most countries have sought to develop their transport systems. Indeed, the facilities offered by new technologies and the global orientation in saving the planet from atmospheric pollution have pushed toward the electrification of transport systems. The future objective has therefore become the elimination of all transport systems based on polluting energies, the replacement of which by other systems using clean energy has become a necessity in most countries. This means that modern transport systems are based, either totally or partially, on electrical energy, which is non-polluting energy. The scarcity of fossil fuels and ecological concerns are leading to a phase of the energy transition. The transport sector absorbed nearly 66% of global oil production in 2011, producing more than 70% of global greenhouse gas emissions [2]. The automotive sector is at the heart of these problems. Therefore, research and development of technologies related to electric vehicles have become indispensable [3]. About this sector and its basic elements, the storage of electrical energy is one of the main success factors in this field [4]. Generally, for an electric vehicle to be competitive in the market, the factor mentioned above must be high profitability. About the latter, the model of the charging system used, directly influences the classification of the vehicle model, compared to other models. These charging systems have been treated and studied in various research works since the appearance of accumulators. This led to the appearance of the notion of a charger connected to the network [5]. These conventional systems immobilize the system that carries the accumulators and limit the area of movement of the latter [6]. This behavior includes weaknesses and strengths for some applications. Concerning the objective studied in this thesis, electric vehicles using grid-connected chargers are exposed to various problems [7]. These are especially linked to the recharging time, which is generally high, and to the need for stops for a long journey, and this for a long time [5]. Weaknesses appeared with this type of vehicle. The resolution of this problem began with the appearance of new lithium battery technology as well as with mobile charging systems, such as the photovoltaic, hybrid, or even mobile contact system (example of the metro). In this context, the authors, in [8, 9], proposed the first version of an adaptable photovoltaic energy system for an electric vehicle [10]. This system is used in fixed stations (doors of photovoltaic panels or panels installed on the vehicle itself). From another point of view, the integration of the case of wireless charging when parking the vehicle appeared in [11], and then problems appeared in relation to the frequency factor. The researchers in [12, 13] have proposed a version characterized by a frequency study of this system taking into account the static and dynamic problems [14]. However, the efficiency of all charging systems is related to the type of control


and energy distribution used. These kinds of control are indirect, or may employ procedures that do not require information or knowledge about the recharge system, such as fuzzy logic (FL), neural network (NN), and ANFIS-based strategies [15–17]. In this context, the work presented in this chapter reviews the present state of electric vehicles. We present this system in accordance with what the literature suggests, covering in particular the different topologies that exist as well as the known architectures. The rest of this part surveys the recharging systems that are known and in use, examining both their modern and classic architectures. An intelligent power distribution system is then used to save the energy in the battery and to use the available energy sources in the most efficient way. Finally, in this chapter, the authors study the incorporation of several recharging devices within the vehicle so that it can also be charged while in motion. This hybrid recharge system consists of photovoltaic panels mounted on the body of the car, which collect solar energy for use in the power pack. In addition, the wireless charging device is placed outside the car, so that charging can be provided even while the vehicle is in motion on rechargeable paths. These systems are modeled and explained in order to define the hybrid recharge system, which is then studied with the Matlab/Simulink tool to obtain information about the power flux and about how external factors and vehicle speed affect the battery state of charge. The chapter is therefore organized into four sections. After the introduction, a general review presents the general classification of electric vehicles. The third section explains the architecture of electric vehicles and presents its components in detail. The next section describes electric vehicle charging, the mathematical model of the autonomous charging system, and the simulation results. Finally, a conclusion summarizes the chapter.

2 Preliminaries 2.1 Hybrid Vehicle and Pure Electric Vehicle The electric vehicle comes in two forms, the hybrid version and the pure EV. The combustion engine is the only essential difference between the two models, as it exists only in the hybrid version. The initial pack of components groups an energy source together with a battery system; this block is connected to the power electronic converter that feeds the main electric motor. The functional block requires a control system and a high-performance processor to supervise all the energy management and vehicle speed control processes [18]. Numerous problems and advantages have been described for each of these models [19]. The pure electric vehicle is friendly to the environment: driven by environmental and gas-emission concerns, the pure electric vehicle field has encouraged research into making this transport solution more efficient. Optimizing the size of the motor, improving battery technologies, and devising various solutions


Fig. 1 General classification of electric vehicle

for recharging are the main research directions in this sector [20]. EVs use electrical energy to drive the vehicle and to supply the vehicle's electrical system. According to the Technical Committee of the International Electro-Technical Commission (IETCTC), a vehicle that uses two or more energy sources, storage systems, or converters to drive the vehicle is referred to as a hybrid electric vehicle (HEV) as long as at least one source supplies electricity [21]. EVs are grouped into various categories according to the combination of sources [3]: the battery electric vehicle (BEV) uses the battery alone as its source, the fuel cell electric vehicle (FCEV) combines a fuel cell and a battery, the HEV combines a battery and an ICE, and the PEV combines a battery with the grid or an external charging station, as shown in Fig. 1. In the following section, the specifics of these EV forms are discussed.

2.2 Hybrid Vehicle Architecture The HEV can be classified into three major architectures. Architecture refers to the configuration of the main elements of the powertrain; in our case, these are the heat engine, an electric machine, and a battery. The three architectures are characterized by how thermal and electrical energy is channeled to the wheels: series, parallel, or power bypass (series–parallel) [22].


Fig. 2 The series architecture of the hybrid vehicle

2.2.1 Hybrid Series

In this configuration (Fig. 2), the heat engine drives an alternator that supplies the battery when it is discharged and the electric motor when high power is demanded. This type of model allows great flexibility of propulsion: it consists in operating the heat engine in its highest-efficiency range and in increasing the autonomy of the vehicle. On the other hand, the overall efficiency is low because of the double energy conversion, and a relatively powerful electric motor is required because it alone provides all of the propulsion [23]. However, this architecture makes it possible to satisfy one of the constraints raised in the problem statement, in particular low emissions in the urban cycle and a saving of 15 to 30% in consumption.

2.2.2 Parallel Hybrid

In a parallel hybrid structure, the heat engine supplies its power to the wheels as for a traditional vehicle. It is mechanically coupled to an electric machine which helps it. This configuration is shown in Fig. 3 [24].

Fig. 3 Double-shaft and single-shaft parallel hybrid configuration


The peculiarity of its coupling also gives it the name of parallel hybrid with the addition of torque or of addition of speed depending on the structure and design of the vehicle. The torque addition structure adds the torques of the electric machine and the heat engine to propel the vehicle (or to recharge the battery). This connection can be made by belts, pulleys, or gears (a technology called parallel double-shaft hybrid). The electric machine can also be placed on the shaft connecting the transmission to the heat engine (a technology called parallel single shaft). The speed addition structure adds the speeds of the heat engine and the electric machine. The resulting speed is related to the transmission. This type of coupling allows great flexibility in terms of speeds. The connection is made mechanically by planetary gear (also called epicyclic gear). This architecture requires more complex control than that of the serial architecture and requires additional work for the physical integration of power sources. Nevertheless, not insignificant gains can be obtained, even by using electrical components of low power and low capacity. Also, these gains make it possible to compensate for the additional cost of this architecture and the excess weight associated with the batteries and the electric motor.

2.2.3 Series–Parallel Hybrid

A series–parallel architecture combines the operating modes and the advantages of both series and parallel architectures. The best-known of the series / parallel hybrid architectures is that of the Toyota Prius. The latter uses a planetary gear and a first electric machine that brings the engine to its best performance points, a second machine participates in traction. Within these structures, part of the energy delivered by the heat engine is transmitted mechanically to the wheels. At the same time, electric machines take or supply energy to the transmission to meet the objectives (acceleration, charge, or discharge of the battery, optimal consumption of the heat engine). In most cases, there are two electric machines, each of which can be either a motor or a generator. This configuration, therefore, allows at least 4 operating modes each having certain advantages. Such an architecture is described in Fig. 4. This architecture has the advantage of being very efficient, without the use of a clutch or variable speed drive, but with very delicate management [25].

2.2.4 Comparison of the Different Propulsion Structures

Depending on the configuration used, here are some advantages and disadvantages of each one presented in Table 1.

Fig. 4 Schematic of the double hybridization vehicle

3 The Architecture of Electric Vehicles 3.1 Battery Technologies The battery is defined as a device that stores energy in order to deliver electrical energy. Batteries fall into two categories: primary batteries and secondary batteries. Primary batteries provide energy only once, during a single discharge, whereas secondary batteries can store and release energy repeatedly through charge and discharge cycles over their whole life. The characteristics of a battery are generally defined by several performance criteria: energy density, market cost, number of charging cycles, discharge behavior, environmental impact, temperature range, and memory effects. This section focuses on secondary batteries, as they are the ones used in electric or hybrid vehicles. Many battery chemistries are used in EVs, generally based on lead-acid, nickel, lithium metal, silver, and sodium-sulfur. The following battery technologies used in EVs will be described: lead-acid (Pb-acid), nickel-cadmium (Ni-Cd), nickel-metal-hydride (Ni-MH), and lithium-ion (Li-ion) [26].


Table 1 Comparison of various architectures

Hybrid series
Advantages: – Good energy efficiency at low speeds (all-electric mode in urban areas) – Good control of the heat engine – The generator set is not necessarily placed next to the electric traction machine: additional degree of freedom to place the various components (example of the low-floor bus) – Relatively easy to manage (compared to other architectures) – Easy to design and control, requiring very little mechanical equipment (no clutch or gearbox)
Disadvantages: – Poor energy efficiency of the overall chain (in extra-urban use) – Use of 3 machines, one of which (the electric traction machine) is at least of high power (maximum size) – Not all thermal modes are possible

Parallel hybrid
Advantages: – Good energy efficiency – Use of a single electric machine – All-thermal and all-electric modes (in some cases) are possible – Transmission is little modified (in some cases) compared to the conventional vehicle
Disadvantages: – More demanding operation of the heat engine: poor dynamics – The torque setpoint must be distributed at all times between the two torque sources – Mechanical coupling and complex energy control

Mixed (series–parallel) hybrid
Advantages: – Good energy efficiency – Very good energy distribution – Vehicle flexibility: all modes are authorized (thermal, electric, series, parallel, or series–parallel) – No break-in torque at the wheel
Disadvantages: – Use of 3 machines, or 2 machines with 2 clutches – Very complex coupling and very delicate management – It requires at least two electric machines in addition to the heat engine, which makes it expensive and very heavy

Table 2 shows the characteristics of the different batteries for hybrid vehicles, their electrification systems, costs, and CO2 emission minimization in each case [27].

Table 2 Comparative table of battery technologies

Battery | Lead-acid | Ni–Cd | Ni-MH | Li-ion
Energy density (Wh/kg) | 30–50 | 45–80 | 60–120 | 160–200
Number of cycles (charge/discharge) | 500–800 | 1000–2000 | 600–1500 | 400–1200
Charging time | 6–12 h | 1–2 h | 2–4 h | 2–4 h
Operating temperature | −20 to 60 °C | −40 to 60 °C | −20 to 60 °C | −20 to 60 °C
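As a quick, hypothetical illustration of what the specific-energy figures in Table 2 imply, the short calculation below estimates the cell mass required for a 40 kWh pack with each chemistry; the pack size is an arbitrary assumption, and packaging, cooling, and electronics are ignored.

```python
# Hypothetical example: approximate cell mass for a 40 kWh pack using the
# specific-energy ranges of Table 2 (Wh/kg). Packaging and electronics ignored.
pack_wh = 40_000
ranges = {"Lead-acid": (30, 50), "Ni-Cd": (45, 80),
          "Ni-MH": (60, 120), "Li-ion": (160, 200)}

for chem, (lo, hi) in ranges.items():
    # Mass = energy / specific energy; best case uses the upper Wh/kg figure.
    print(f"{chem:9s}: {pack_wh / hi:6.0f} - {pack_wh / lo:6.0f} kg")
```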


Fig. 5 Example of supercapacitor

3.2 Super-Capacitors The principle of supercapacitors (SCs), shown in Fig. 5, is to store energy in electrostatic form; they are energy storage systems of low energy density but of significant power density. Consequently, they are used during transient phases to provide the requested power peaks, in order to reduce current stresses and to reduce the size and increase the lifespan of the main energy source (batteries or fuel cells) [28]. The supercapacitor consists of two metal collectors (see Fig. 6), each coupled to a carbonaceous, porous electrode impregnated with an electrolyte. To remedy the problem of oversizing batteries in HEV applications, supercapacitors have very interesting properties: their charge-transfer kinetics are faster than those of batteries, and their lifetime is of the order of a few hundred thousand charge/discharge cycles [29].

3.3 The Electric Motor The motor is a relatively simple component at the heart of an electric vehicle; it operates on the interaction forces (force vectors) between an electromagnet and a permanent magnet. When braking, the mechanical chain becomes part of the power source and the main energy source (the battery) becomes the receiver. The electric motor is an actuator that creates rotational motion from electrical energy. Electric motors are widely used


Fig. 6 Composition of a supercapacitor

because of their reliability, simplicity, and good performance. An electric motor is composed of an output shaft, a frame body, and two electric spindles [30]. There are many types of motor.

3.3.1 DC Motors

DC motor drives have long been used in electric vehicles because they provide simple speed control. Furthermore, this type of motor has excellent electric propulsion properties (a very favorable torque curve at low speed). However, their production is costly, and the brush-commutator system must be maintained [31]. Their speed is limited, and they have low specific power, typically 0.3 to 0.5 kW/kg, whereas gasoline engines reach 0.75 to 1.1 kW/kg. As a result, they are less reliable and less suitable for this purpose [32].

3.3.2 Asynchronous Motors

The asynchronous motor is made up of a stator and a rotor. The stator is the fixed part of the motor; it carries three windings that can be connected in star (Y) or in delta (Δ) depending on the supply network. The rotor is the rotating, cylindrical part of the motor; it carries either a winding (usually three-phase, like the stator) accessible through three slip rings and three brushes, or an inaccessible squirrel cage made of aluminum conducting bars. In both cases, the rotor circuit is short-circuited (by rings or a rheostat) [33]. Thanks to its simplicity of manufacture and maintenance, the asynchronous machine is currently the most widespread in the industrial sector and offers very good performance. These machines nevertheless have a lower torque-to-mass ratio, efficiency, and power factor than permanent magnet machines.

3.3.3 Synchronous Motors

Although more difficult to control, more expensive, and potentially less robust, the synchronous motor has become a key choice in electric and hybrid vehicles. In both generator and motor mode, the synchronous machine has the highest efficiency. A synchronous motor, like an asynchronous motor, consists of a stator and a rotor separated by an air gap; the only difference lies in the rotor design.

3.3.4 Operation of the Electric Motor

Electric vehicles are increasingly part of our daily lives, so it was time to look at the operation of their motor as well as the different versions (synchronous, asynchronous, permanent, induction, etc.). So, let’s see the general principle of this technology which however is not new [34]. A. The principle of an electric motor The principle of an electric motor in Fig. 7 regardless of its construction, is to use magnetic force to generate movement. Magnetic force is recognizable to us since magnets may repel or attract other magnets. We shall employ two primary elements for this: permanent magnets and copper coils (ideal materials for this work because they are the most conductive…), or even copper coils in some cases (therefore without a permanent magnet). Everything will be mounted on a circular axis to achieve a permanent and linear movement; the idea is to create something with a cycle that will repeat itself as long as we feed the motor.

Fig. 7 Electric motor


It should also be known that a coil traversed by current (thus electrons) then behaves like a magnet, with an electromagnetic field with two poles: north and south. All electric motors are reversible: if we move the magnet manually, this generates an electric current in the coil (we can then recharge the battery, for example, this is regeneration) If we inject current into the coil, then the magnet begins to move. In reality, the current goes from – to + . If the convention was decided that it would go from + to – (we decided on this convention before having the true direction of the current). B. Parts of an electric motor B.1 Accumulator: This is where the current that will power the motor comes from, generally from a Lithium-Ion battery or Ninh battery. B.2 Stator: This is the peripheral part of the engine, the one that does not rotate. To help you remember, tell yourself it’s the static part (stator). It is in 99% of the cases made up of coils that we will more or less supply (but also more or less alternate in = /- with alternating current motors) to make the rotor turn. B.3 Rotor: This is the moving part, and to remind you of this, think of the word rotation (rotor). It is generally not powered because being mobile it is difficult to do (or in any case, it is not sustainable over time). C. Transmission: Because the electric motor has a very high operating range (16,000 rpm on a Model S (model of electric vehicles, for example) and torque available quickly (the lower the revs, the more torque), it was not necessary to produce a gearbox, so we have a type of motor that is directly connected to the wheels! The gear ratio remains constant whether you are traveling at 15 or 200 km/h. The rhythm of the electric motor is not exactly set on that of the wheels; there is what is known as a reduction. On a Model S, the ratio is around 10%, which means that the wheel turns 10 times slower than the electric motor. An epicyclic gear train, which is common in automatic gearboxes, is used to obtain the reduction ratio. Figure 8 depicts this global structure. After this reducer, there is finally the differential which allows the wheels to rotate at different speeds. No need for a clutch or a torque converter because if a thermal engine needs to be in motion all the time, this is not the case with an electric motor. It, therefore, has no idling speed or need for a clutch that acts as a bridge between the wheels and the engine: when the wheels stop, there is no need to disengage.


Fig. 8 Transmission system

3.3.5 The Calculator and Different Sensors of Electric Vehicles

The calculator (the vehicle's electronic control unit) manages many things, for example the energy flows, thanks to the many sensors it reads. For example, when the driver presses the accelerator pedal, a sensor called a potentiometer (the same device is used on modern thermal vehicles) informs the computer, which then manages the flow of energy sent to the motor according to the degree of acceleration. Likewise, when the pedal is released, it manages energy recovery by sending the current generated by the (reversible) electric motor back to the battery while modulating the electrical flow. It can chop the current (battery to motor) or rectify it (recovery of AC energy into the DC battery). The different sensor actions in electric vehicles are shown in Table 3.

4 Electric Vehicles Charging

4.1 Types of Classic Chargers

Integrated chargers can reuse all, or part, of the components of the traction chain to perform the recharge. For example, the traction chain of the second-generation Renault ZOE has a power of 80 kW with an on-board battery capacity of 40 kWh, which makes it possible to envisage a substantial recharge in 30 min using the components of the traction inverter. The tree in Fig. 9, taken from a review of integrated on-board chargers carried out as part of a CIFRE thesis with Renault S.A.S., lists the different means of exploiting the traction chain for charging. This classification is based on the study of 67 publications, including patented topologies, journal articles, and conference papers [35].

Table 3 Different sensor actions in the electric vehicles

Name of sensor
Battery discharge status sensor
Brake pedal action sensor
Parking brake sensor
Electric motor temperature sensors
Outdoor temperature sensor
4 tire pressure sensors
Front obstacle sensors
Reversing radar sensor
Light sensor


Fig. 9 Classification of on-board and powertrain-integrated chargers

The reuse of the on-board power electronics and/or the electrical machine windings can cause EMC interference problems with other equipment connected to the electrical system, and also with domestic protection devices. This may affect the availability of the EV charging. If the high-frequency components of the leakage currents are too high, two things can happen: blinding or untimely tripping of protection devices, such as the differential circuit breaker. Any event that can blind an RCD poses a high safety risk to the user. Therefore, the IEC 61851-21 safety standard specifies that the leakage current must not exceed 3.5 mA RMS. Thus, the reduction of emissions conducted towards the network, and more particularly of the common-mode currents, over a large frequency range [150 kHz–30 MHz], is often achieved by galvanic isolation through the use of topologies based on power transformers. Given the charging power levels, galvanic isolation has an impact on the cost and volume of the charger. When the charger is not isolated, manufacturers use passive and active filtering in order to limit the disturbances generated by the charger.
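To illustrate the 3.5 mA RMS limit mentioned above, here is a minimal sketch that computes the RMS value of a sampled common-mode leakage current and compares it with the limit. The waveform, its amplitude, and the sampling settings are synthetic assumptions, not measured data.

```python
import math

IEC_61851_21_LIMIT_A = 3.5e-3  # maximum allowed leakage current (A RMS)

def rms(samples):
    """Root-mean-square value of a sampled current waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Synthetic leakage current: 2 mA peak sine at 150 kHz, sampled at 10 MS/s for 1 ms.
fs, f, peak = 10e6, 150e3, 2e-3
samples = [peak * math.sin(2 * math.pi * f * n / fs) for n in range(int(fs * 1e-3))]

i_rms = rms(samples)
print(f"leakage = {i_rms * 1e3:.2f} mA RMS, compliant = {i_rms <= IEC_61851_21_LIMIT_A}")
```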

4.2 Autonomous Charger

Autonomous chargers, or non-traditional chargers, are charging systems that combine new energy sources with advanced charging techniques, ensuring simplicity of the charging task, energy savings, and even savings in recharging time.

4.2.1 Photovoltaic Charger

The average intensity of solar energy reaching the earth is 1367 W/m². Benefiting from this amount of energy has encouraged researchers to design solar receivers intended to transform this solar energy into electrical energy. The results obtained have guided vehicle manufacturers towards another energy source that can be used to improve vehicle autonomy. This solar charging system is essentially based on a set of components, chief among them the solar receivers, which deliver direct-current electricity when light reaches them. The efficiency of this conversion depends mainly on the type of solar panel, such as polycrystalline, monocrystalline, or amorphous silicon. Charge controllers are also indispensable in this operating loop, since the outputs of the panels are variable and must be adjusted before being stored in the battery or supplied to the load. Charge controllers work by monitoring the battery voltage: they take the variable voltage from the photovoltaic panels and regulate it according to the safety limits of the battery. Once the battery is fully charged, the controller can short out the solar panel to prevent further charge build-up in the battery. These controllers are usually DC-DC converters. Figure 10 shows the architecture of the solar vehicle and the location of the charge controller [36]. Most of these controllers measure the voltage in the battery and supply current to the battery accordingly, or completely stop the flow of current. This is done by measuring the present capacity of the battery rather than looking at its state of charge (SOC). The maximum battery voltage allowed is called the "charging set point". Deep discharge, battery sulfation, overcurrent, and short circuits are also prevented by the controller. A deep discharge can be detected by the microcontroller, which will then initiate an automatic acceleration

Fig. 10 Solar vehicle architecture


charge to keep the battery activated. Depending on the connections, charge controllers can be of two types: the parallel controller, which is connected in parallel with the battery and the load, and the series controller, which is placed in series between the solar panel, the battery, and the load.
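As a rough sketch of the series-controller behaviour just described, the following loop disconnects the panel once the battery voltage reaches the charging set point and reconnects it after the voltage has dropped again. The voltage thresholds and readings are illustrative assumptions, not values from the studied system.

```python
def series_charge_controller(battery_voltage, set_point=14.4, resume_level=13.8,
                             charging=True):
    """Very simplified series charge controller: opens the PV-to-battery path when
    the charging set point is reached and closes it again once the voltage drops."""
    if battery_voltage >= set_point:
        return False          # disconnect the panel: battery considered full
    if battery_voltage <= resume_level:
        return True           # reconnect the panel and resume charging
    return charging           # otherwise keep the previous state (hysteresis band)

state = True
for v in [13.2, 13.9, 14.5, 14.1, 13.7]:   # assumed battery voltage readings (V)
    state = series_charge_controller(v, charging=state)
    print(f"V = {v} V -> charging = {state}")
```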

4.2.2 Inductive Power Transfer

Implemented through inductive power transfer, wireless vehicle charging is attractive in terms of safety and convenience: the user does not need to handle power cords, thus avoiding any risk of electrocution, and simply parks the vehicle in the appropriate space so that the charging operation can take place automatically. The coils are generally placed in the following way: the one connected to the grid is placed in the ground, and the other, connected to the battery, is placed below the chassis of the vehicle, as can be seen in Fig. 11 [37]. The minimum power of the electric vehicle charging level is generally 3 kW. Various examples of commercial wireless charging stations for electric vehicles can be cited, as electric vehicle companies are increasingly interested in this innovation. Among vehicle manufacturers, Toyota, Nissan, General Motors, and Ford are some of the companies showing interest in inductive charging [38]. Among companies producing wireless charging systems for electric vehicles, Evatran and HaloIPT are leaders in providing and improving inductive charging technology. Evatran has created the Plugless Power inductive charging system. HaloIPT, one of whose inductive chargers is presented in Fig. 11, was acquired by Qualcomm. The opportunity for fast charging would make IPT even more attractive for electric vehicles [39, 40].

5 The Mathematical Model for the Autonomous Charging System

In this part, the efficiency of the autonomous charging system is examined. We consider the two systems most used in electric and hybrid vehicle applications, the photovoltaic charging system and the wireless charging system, and then propose a hybrid system combining PV and wireless (WR) charging and test its efficiency. The different blocks of the hybrid recharge system are shown in Fig. 12.


Fig. 11 Wireless charging of electric vehicles based on IPT: a wireless V2G, b plug-in V2G

5.1 Inductive Power Transfer Model

A simplified representation of the IPT system is given in Fig. 13, where "V1" and "V2" indicate the input and output voltages of the system. Each part consists of a resistance and a capacitance placed in series between the source and the emitting or receiving coil. This system is similar to that of a moving transformer [22].


Fig. 12 Proposed hybrid system

Fig. 13 The IPT system: a simplified representation

From this representation, it is possible to express the primary voltage delivered by the DC-AC stage according to the parameters of the transmitting coil, Eq. (1):

\vec{V}_1 = \left( j\omega L_1 + \frac{1}{j\omega C_1} + R_1 \right)\vec{I}_1 - j\omega M\,\vec{I}_2
\vec{V}_2 = j\omega M\,\vec{I}_1 - \left( j\omega L_2 + \frac{1}{j\omega C_2} + R_2 \right)\vec{I}_2     (1)


Subsequently, the vectors linked to V1 and V2, considering that ϕ1 and ϕ2 are their phases with respect to a zero-phase reference vector, are given by (2):

\vec{V}_1 = \frac{2\sqrt{2}}{\pi}\, V_1 (\cos\varphi_1 + j\sin\varphi_1)
\vec{V}_2 = \frac{2\sqrt{2}}{\pi}\, V_2 (\cos\varphi_2 + j\sin\varphi_2)     (2)

The real part of the power on the primary and secondary sides is equivalent to the active power, as expressed in Eq. (3):

P_1 = \mathrm{Re}\left(\vec{V}_1\,\vec{I}_1\right), \qquad P_2 = \mathrm{Re}\left(\vec{V}_2\,\vec{I}_2\right)     (3)

From Eq. (1), the vector of the primary current is expressed according to Eq. (4):

\vec{I}_1 = \frac{\vec{V}_1 - \frac{j\omega M}{R_2}\vec{V}_2}{R_1 + \frac{(\omega M)^2}{R_2}}, \qquad \omega = \frac{1}{\sqrt{LC}} = 2\pi f     (4)

where L is the intrinsic inductance of the primary and secondary coils, assumed to be identical, and C is the value of the series compensation capacitors C1 and C2, assumed to be equal (C1 = C2). The expression of the current on the emitting side is therefore given by Eq. (5):

\vec{I}_1 = \frac{2\sqrt{2}}{\pi}\cdot\frac{X - Y}{R_1 + \frac{(\omega M)^2}{R_2}}
X = V_1(\cos\varphi_1 + j\sin\varphi_1)
Y = \frac{j\omega M V_2}{R_2}(\sin\varphi_2 + j\cos\varphi_2)     (5)

However, the phase delay is defined as the phase difference between V2 and V1, hence:

\varphi_D = \varphi_1 - \varphi_2     (6)

According to Eqs. (2), (3), and (5), the real power on the primary side is defined according to Eq. (7):

P_1 = \frac{8}{\pi^2}\left[\frac{V_1^2 + \frac{V_1 V_2\,\omega M}{R_2}\sin\varphi_D}{R_1 + \frac{(\omega M)^2}{R_2}}\right]     (7)

From Eq. (1), the secondary current vector is as follows:


\vec{I}_2 = \frac{\frac{j\omega M}{R_1}\vec{V}_1 - \vec{V}_2}{R_2 + \frac{(\omega M)^2}{R_1}}     (8)

The secondary current vector is, therefore:

\vec{I}_2 = \frac{2\sqrt{2}}{\pi}\cdot\frac{A - B}{R_2 + \frac{(\omega M)^2}{R_1}}
A = \frac{j\omega M V_1}{R_1}(\sin\varphi_1 + j\cos\varphi_1)
B = V_2(\cos\varphi_2 + j\sin\varphi_2)     (9)

According to Eqs. (3), (8), and (9), the real power on the secondary side is defined by Eq. (10):

P_2 = \frac{8}{\pi^2}\left[\frac{\frac{V_1 V_2\,\omega M}{R_1}\sin\varphi_D - V_2^2}{R_2 + \frac{(\omega M)^2}{R_1}}\right]     (10)

The real effective values of the primary and secondary waveforms are related to the direct voltages V1 and V2 as a function of the phase-shift values ϕs1 and ϕs2 of the primary and secondary bridges, respectively. Considering Vdc and Vbatt as the amplitudes of V1 and V2:

V_1 = V_{dc}\sin\!\left(\frac{\varphi_{s1}}{2}\right), \qquad V_2 = V_{batt}\sin\!\left(\frac{\varphi_{s2}}{2}\right)     (11)

Finally, substituting (11) into (7) and (10), the real powers are obtained:

P_1 = \frac{8}{\pi^2}\cdot\frac{V_{dc}\sin\!\left(\frac{\varphi_{s1}}{2}\right)}{R_1 + \frac{(\omega M)^2}{R_2}}\left(V_{dc}\sin\!\left(\frac{\varphi_{s1}}{2}\right) + E\right)
P_2 = \frac{8}{\pi^2}\cdot\frac{V_{batt}\sin\!\left(\frac{\varphi_{s2}}{2}\right)}{R_2 + \frac{(\omega M)^2}{R_1}}\left(F - V_{batt}\sin\!\left(\frac{\varphi_{s2}}{2}\right)\right)
E = \frac{V_{batt}\,\omega M\sin\!\left(\frac{\varphi_{s2}}{2}\right)\sin\varphi_D}{R_2}, \qquad F = \frac{V_{dc}\,\omega M\sin\!\left(\frac{\varphi_{s1}}{2}\right)\sin\varphi_D}{R_1}     (12)
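To make Eqs. (4), (11), and (12) easier to follow, here is a minimal numerical sketch that evaluates the primary and secondary real powers of the series-series IPT link. All parameter values (DC-link voltage, coil parameters, phase shifts) are illustrative assumptions, not those of the studied prototype.

```python
import math

def ipt_real_powers(Vdc, Vbatt, phi_s1, phi_s2, phi_D, R1, R2, L, C, M):
    """Primary/secondary real powers of the series-series IPT link, following
    Eqs. (4), (11) and (12); all angles are in radians."""
    w = 1.0 / math.sqrt(L * C)                      # resonant pulsation, Eq. (4)
    V1 = Vdc * math.sin(phi_s1 / 2.0)               # Eq. (11)
    V2 = Vbatt * math.sin(phi_s2 / 2.0)
    E = Vbatt * w * M * math.sin(phi_s2 / 2.0) * math.sin(phi_D) / R2
    F = Vdc * w * M * math.sin(phi_s1 / 2.0) * math.sin(phi_D) / R1
    P1 = (8 / math.pi**2) * V1 / (R1 + (w * M)**2 / R2) * (V1 + E)   # Eq. (12)
    P2 = (8 / math.pi**2) * V2 / (R2 + (w * M)**2 / R1) * (F - V2)
    return P1, P2

# Illustrative values: 400 V DC link, 288 V battery, an 85 kHz-class coil set.
P1, P2 = ipt_real_powers(Vdc=400, Vbatt=288, phi_s1=math.pi, phi_s2=math.pi,
                         phi_D=math.pi / 2, R1=0.3, R2=0.3,
                         L=120e-6, C=29e-9, M=30e-6)
print(f"P1 = {P1:.0f} W, P2 = {P2:.0f} W")
```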

5.2 Photovoltaic Generator Model

The solar cell is an electrical component used in applications such as the electric vehicle to convert solar energy into electricity and thus cover part of the electrical energy requirements. Many authors have suggested various


models of the solar cell in their research work [41–50]. The cell current I_c can be given by:

I_c = I_{ph} + I_{sh} + I_d     (13)

The current I_ph (the PV cell photo-current) can be evaluated as:

I_{ph} = \frac{G}{G_{ref}}\left(I_{rs\text{-}ref} + K_{SCT}\,(T_c - T_{c\text{-}ref})\right)
I_d = I_{rs}\left[\exp\!\left(\frac{q(V_c + R_s I_c)}{\alpha k T}\right) - 1\right]
I_{sh} = \frac{1}{R_p}(V_c + R_s I_c)     (14)

The current I_rs can be approximately obtained as:

I_{rs} = \frac{I_{rs\text{-}ref}}{\exp\!\left(\frac{q\,V_{oc}}{n_s\,n\,\beta\,T_c}\right) - 1}     (15)

Finally, the current I_c can be given by:

I_c = I_{ph} - I_{rs}\left[\exp\!\left(\frac{q(V_c + R_s I_c)}{\alpha k T}\right) - 1\right] - \frac{1}{R_p}(V_c + R_s I_c)     (16)

The model of a photovoltaic generator depends on the numbers of parallel and series cells, respectively N_p and N_s:

I_p = N_p I_c, \qquad V_p = N_s\,n_s\,V_c     (17)

Finally, the photovoltaic generator current can be given by:

I_p = N_p I_{ph} - N_p I_{rs}(K - 1) - \frac{N_p}{R_p}\left(\frac{V_p}{n_s N_s} + \frac{R_s I_p}{N_p}\right)
K = \exp\!\left[\frac{q}{\alpha k T}\left(\frac{V_p}{n_s N_s} + \frac{R_s I_p}{N_p}\right)\right]     (18)

When the resistances R_p and R_s are neglected (R_p \gg R_s), i.e., with R_p = \infty and R_s = 0, the model becomes:

I_p = N_p I_{ph} - N_p I_{rs}\left[\exp\!\left(\frac{q\,V_p}{n\,\beta\,T_c\,n_s N_s}\right) - 1\right]     (19)
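The simplified generator model of Eq. (19) can be coded in a few lines. In the sketch below, β is taken as Boltzmann's constant and all cell data (photo-current, saturation current, ideality factor, temperature) are illustrative assumptions rather than parameters of the simulated panel.

```python
import math

Q = 1.602e-19      # electron charge (C)
BETA = 1.381e-23   # Boltzmann constant (J/K), the beta of Eq. (19) (assumption)

def pv_generator_current(Vp, Iph, Irs, Np=1, Ns=1, ns=145, n=1.3, Tc=298.0):
    """Simplified PV generator current of Eq. (19), with Rp = inf and Rs = 0."""
    return Np * Iph - Np * Irs * (math.exp(Q * Vp / (n * BETA * Tc * ns * Ns)) - 1.0)

# Illustrative string: 145 series-connected cells (as in Table 4), one string (Np = 1),
# 5 A photo-current and 1e-7 A reverse saturation current.
for Vp in (0.0, 40.0, 60.0, 70.0):
    print(Vp, "V ->", round(pv_generator_current(Vp, Iph=5.0, Irs=1e-7), 3), "A")
```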


6 Simulation Results and Discussion

Before presenting the results, it is important to state the conditions of the simulation carried out during this phase. Table 4 gives the technical specifications of the hybrid system, as well as the driving cycle applied to the vehicle. The driving cycle exhibits several forms of acceleration (low, medium, and high) together with a deceleration phase, in order to demonstrate that this hybrid system can achieve a visible energy gain, especially when the traction motor is not consuming. This cycle is summarized in Fig. 14, especially at times from 8 to 13 s.

Table 4 Characteristics of the hybrid system

Electrical characteristics of the hybrid system
Electric motor power: 50 kW
Type of electric motor: PMSM
Type of battery: Lithium
Battery voltage: 288 V
Maximum vehicle speed: 120 km/h
Max motor torque: 150 Nm

Mechanical characteristics of the vehicle
Max vehicle weight: 332 kg
Tilt angle αr: Variable according to route
Vehicle front surface: 2.7 m²
Air density: 1.225 kg/m³

Features of the PV charging system
Surface: 1.5 m²
Number of PV cells: 145

Fig. 14 Driving cycle selected during the simulation phase


6.1 Fuzzy Logic Algorithms

The fuzzy logic technique has recently been established as one of the intelligent methods used in power distribution systems to detect the power generated by the recharge systems and to distribute it in the most efficient way, so as to recharge the battery with as much power as possible. This control is more robust than traditional control techniques and does not require a perfect knowledge of the system's mathematical model. The three basic functional blocks of a fuzzy logic supervisor are fuzzification, the inference engine, and defuzzification; it is therefore identified by its input variables, output variables, membership functions, and fuzzy rules. The success of any fuzzy controller is determined by factors such as the number and relevance of the chosen inputs, the fuzzification method, and the number of rules. In this application, the chosen variables are related to the three power signals and are based on multiple tests performed in the study of [51] and in our earlier works. To pilot this energy, we propose a simple energy-management flowchart, presented in Table 5 and sketched in the listing below.
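The rule base of Table 5 can be read as a simple supervisor. The sketch below is only a crisp approximation of it: it replaces the fuzzy membership functions with hard speed thresholds, and those thresholds (30 and 80 km/h) are illustrative assumptions, not values used by the authors.

```python
def energy_management(speed_kmh, accel):
    """Crisp approximation of the Table 5 rule base: returns the requested
    contribution level of each source (battery, PV, wireless) for the DC bus."""
    if accel < 0:                      # deceleration: battery not solicited
        return {"P_batt": "zero", "P_PV": "High", "P_WR": "Medium"}
    if speed_kmh < 30:                 # low speed (assumed threshold)
        return {"P_batt": "Low", "P_PV": "High", "P_WR": "High"}
    if speed_kmh < 80:                 # medium speed (assumed threshold)
        return {"P_batt": "Low", "P_PV": "High", "P_WR": "Medium"}
    return {"P_batt": "High", "P_PV": "High", "P_WR": "Low"}   # high speed

print(energy_management(speed_kmh=25, accel=0.4))    # low-speed rule
print(energy_management(speed_kmh=95, accel=-0.2))   # deceleration rule
```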

6.2 Power Delivered by the Charging System

To test the profitability of this hybrid system, especially with respect to the state of charge of the battery while the vehicle is in motion, we refer in what follows to the simulation conditions mentioned above. We study the energetic behavior of the vehicle for the speed profile given in Fig. 14. The powers delivered by the studied sources are shown in Fig. 15, corresponding respectively to the photovoltaic and wireless cases. Combining these two energy sources gives a new form of power output: the average value of the power obtained from the hybrid system is greater than that of the photovoltaic or wireless mode alone. As explained in the previous part, the average power acquired by the hybrid system is quite remarkable. During the action phase of the traction system, the power consumed by the electric machine follows a variable path proportional to the selected acceleration or driving state. The main power source is usually the battery, whose supplied voltage is quite stable; the other sources are used as additional sources to minimize the load on the accumulators. The state of charge of the battery then depends on various conditions and situations, especially the driving state and external factors related to the climate and to the sizing of the wireless system. Coordination between the various energy sources is handled by the power management control. In our work, the battery is the device that begins supplying electricity, the WR system generates electricity, and the photovoltaic system converts the solar irradiation into electrical energy supplied to the DC bus. The total power is measured as presented in Fig. 15 [52–54].
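A minimal bookkeeping sketch of the power balance just described follows: whatever the motor needs beyond the PV and wireless contributions is drawn from the battery, and a negative result means the battery is being recharged by the surplus. The operating points are illustrative assumptions.

```python
def battery_power(p_motor, p_pv, p_wr):
    """DC-bus power balance: battery power requested for a given motor demand,
    given the PV and wireless (WR) contributions; negative means recharging."""
    return p_motor - (p_pv + p_wr)

# Illustrative operating points (W): acceleration, cruising, and standstill.
for p_motor, p_pv, p_wr in [(30000, 800, 2500), (8000, 800, 2500), (0, 800, 2500)]:
    print(p_motor, "->", battery_power(p_motor, p_pv, p_wr), "W from the battery")
```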

430

M. Naoui et al.

Fig. 15 Power dynamics of the hybrid system at a predefined driving cycle

6.3 Power Distribution and SOC Evolution

It is important to note that the selected driving cycle is the one used in Fig. 14. The contribution of the additional sources is clearly visible in the power delivered by the accumulators in Fig. 16. The first part of the simulation shows that the drops observed in the power delivered by the accumulator do not influence the power consumed by the machine. On the other hand, during the "zero acceleration" phase of the motor, the power within the battery is negative, which confirms that the battery is being charged by the photovoltaic and wireless sources. For low accelerations, the implemented hybrid system provides enough power to drive the motor and charge the battery simultaneously; Fig. 16 shows this conclusion. Along with this energy behavior, and thanks to this hybrid system, it is possible to monitor the state of charge of the battery, in order to validate whether this model is

Fig. 16 Evolution of power in relation to speed (battery power)

Fig. 17 Evolution of the SOC taking into account the layout of the hybrid charging system

profitable or not. Figure 17 shows the state of charge of the battery and proves that during weak acceleration the SOC increases even though the vehicle is in motion, and likewise during the stop phase. In the same context, we wanted to test the contribution of this hybrid charging system against the purely wireless or purely photovoltaic one. Figure 18 shows two cases of SOC evolution: in the first case the hybrid charging system is deactivated and there is only battery consumption; the second case presents the evolution of the SOC with this hybrid system in operation. The difference between the two curves is clear, and the energy gain is equal to 0.86%. On the other hand, it is possible to monitor the evolution of the power supplied by the photovoltaic, wireless, and hybrid charging systems. Figure 19 shows that the power of the hybrid system represents the sum of the powers obtained by the single charging systems. It is clear that the power obtained by the wireless system is zero as long as the vehicle has not yet passed over a transmitter coil. Table 6 summarizes the energy statistics of the charging systems studied and proves that the hybrid system provides a greater gain compared to a purely photovoltaic or

Fig. 18 Hybrid system SOC variation (PV + WR)

Fig. 19 Summary of the average power of the three charging systems

purely wireless charging system. This performance will ensure a gain in terms of the distance traveled and will increase the life of the battery, which improves the overall performance of the electric vehicle.

7 Conclusion

In this chapter, we have discussed the state of the art of the electrified transport system relating to electric vehicles. We have presented the different architectures and models cited in the literature, such as pure electric and hybrid models. More precisely, the recharge systems used, as well as their different internal architectures, have been presented. During this study, we divided these types of


Table 5 Fuzzy logic algorithms

Step 1. Start
Step 2. Measure the parameters Speed(k), P-batt(k), P-PV(k), P-WR(k)
Step 3. Fuzzification
Step 4. Rules

Energy/Speed | Deceleration | Low speed | Medium speed | High speed
P-batt | zero | Low | Low | High
P-PV | High | High | High | High
P-WR | Medium | High | Medium | Low

Step 5. Apply inferences
If (Speed = Low speed) then P-batt = Low, P-PV = High, P-WR = High
If (Speed = Medium speed) then P-batt = Low, P-PV = High, P-WR = Medium
If (Speed = High speed) then P-batt = High, P-PV = High, P-WR = Low
If (Speed = Deceleration) then P-batt = zero, P-PV = High, P-WR = Medium
Step 6. Defuzzification
Step 7. End

chargers into two categories: classic chargers and modern or advanced chargers, the latter comprising solar chargers and inductive power transfer. Focusing on the second category, the rest of this work offers an in-depth study of these charging systems and presents detailed modeling of their different blocks. An operating simulation is applied to determine their performance under specific operating conditions. Then, a detailed mathematical model and the corresponding simulation results are given for the hybrid recharge system and its energy behavior. This recharge system is installed in an electric vehicle to improve the vehicle's autonomy. Each recharge block, such as the photovoltaic recharge system and the wireless recharge tool, was modeled, and the corresponding mathematical expressions are given. Then, the


Table 6 Summary of the effectiveness of the three charging systems

Charging system | SOC loss (%) | Average power sum | Minimum power | Maximum power | Power harvested from the battery
Only WR generator | 0.27% | 7588 W | 0 | 6533 W | 5772 W
Only PV generator | 0.8% | 22,290 W | 5662 W | 10,235 W | 6972 W
PV + WR generator | 1.07% | 26,360 W | 5662 W | 12,219 W | 12,760 W

energetic performance was improved, as the battery state of charge became higher, which proves the overall profitability for the vehicle.

Acknowledgements The authors would like to thank Prince Sultan University, Riyadh, Saudi Arabia for supporting this work. Special acknowledgement to Automated Systems & Soft Computing Lab (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia.

References 1. Bai, H., & Mi, C. (2011). The impact of bidirectional DC-DC converter on the inverter operation and battery current in hybrid electric vehicles. In 8th international conference power electron. - ECCE Asia "Green world with power electron. ICPE 2011-ECCE Asia (pp. 1013–1015). https://doi.org/10.1109/ICPE.2011.5944686. 2. Sreedhar, V. (2006). Plug-in hybrid electric vehicles with full performance. In 2006 IEEE configuration electrical hybrid vehicle ICEHV (pp. 1–2). https://doi.org/10.1109/ICEHV.2006. 352291. 3. Mohamed, N., Aymen, F., Ali, Z. M., Zobaa, A. F., & Aleem, S. H. E. A. (2021). Efficient power management strategy of electric vehicles based hybrid renewable energy. Sustainability, 13(13), 7351. https://doi.org/10.3390/su13137351 4. Ertan H. B., & Arikan, F. R. (2018). Sizing of series hybrid electric vehicle with hybrid energy storage system. In SPEEDAM 2018 - proceedings: international symposium on power electronics, electrical drives, automation and motion (pp. 377–382). https://doi.org/10.1109/SPE EDAM.2018.8445422. 5. Kisacikoglu, M. C., Ozpineci, B., & Tolbert, L. M. (2013). EV/PHEV bidirectional charger assessment for V2G reactive power operation. IEEE Transactions on Power Electronics, 28(12), 5717–5727. https://doi.org/10.1109/TPEL.2013.2251007 6. Lee, J. Y., & Han, B. M. (2015). A bidirectional wireless power transfer EV charger using self-resonant PWM. IEEE Transactions on Power Electronics, 30(4), 1784–1787. https://doi. org/10.1109/TPEL.2014.2346255 7. Tan, L., Wu, B., Yaramasu, V., Rivera, S., & Guo, X. (2016). Effective voltage balance control for bipolar-DC-Bus-Fed EV charging station with three-level DC-DC Fast Charger. IEEE Transactions on Industrial Electronics, 63(7), 4031–4041. https://doi.org/10.1109/TIE.2016. 2539248 8. Abdelwahab O. M., & Shaaban, M. F. (2019). PV and EV charger allocation with V2G capabilities. In Proceedings - 2019 IEEE 13th international conference on compatibility, power

electronics and power engineering, CPE-POWERENG 2019 (pp. 1–5). https://doi.org/10.1109/CPE.2019.8862370.
9. Domínguez-Navarro, J. A., Dufo-López, R., Yusta-Loyo, J. M., Artal-Sevil, J. S., & Bernal-Agustín, J. L. (2019). Design of an electric vehicle fast-charging station with integration of renewable energy and storage systems. International Journal of Electrical Power and Energy Systems, 105, 46–58. https://doi.org/10.1016/j.ijepes.2018.08.001
10. Ali, Z. M., Aleem, S. H. E. A., Omar, A. I., & Mahmoud, B. S. (2022). Economical-environmental-technical operation of power networks with high penetration of renewable energy systems using multi-objective coronavirus herd immunity algorithm. Mathematics, 10(7), 1201. https://doi.org/10.3390/math10071201
11. Omori, H., Tsuno, M., Kimura, N., & Morizane, T. (2018). A novel type of single-ended wireless V2H with stable power transfer operation against circuit constants variation. In 2018 7th international conference on renewable energy research and applications (vol. 5, pp. 1–5).
12. Maeno, R., Omori, H., Michikoshi, H., Kimura, N., & Morizane, T. (2018). A 3 kW single-ended wireless EV charger with a newly developed SiC-VMOSFET. In 7th international IEEE conference on renewable energy research and application, ICRERA 2018 (pp. 418–423). https://doi.org/10.1109/ICRERA.2018.8566866
13. Colak, K., Bojarski, M., Asa, E., & Czarkowski, D. (2015). A constant resistance analysis and control of cascaded buck and boost converter for wireless EV chargers. In Conference proceedings - IEEE applied power electronics conference and exposition - APEC (vol. 2015, pp. 3157–3161). https://doi.org/10.1109/APEC.2015.7104803
14. Mohamed, N., et al. (2021). A new wireless charging system for electric vehicles using two receiver coils. Ain Shams Engineering Journal. https://doi.org/10.1016/j.asej.2021.08.012
15. Azar, A. T., Serrano, F. E., Flores, M. A., Kamal, N. A., Ruiz, F., Ibraheem, I. K., Humaidi, A. J., Fekik, A., Alain, K. S. T., Romanic, K., Rana, K. P. S., Kumar, V., Gorripotu, T. S., Pilla, R., & Mittal, S. (2021). Fractional-order controller design and implementation for maximum power point tracking in photovoltaic panels. In Advances in nonlinear dynamics and chaos (ANDC), renewable energy systems (pp. 255–277). Academic. https://doi.org/10.1016/B978-0-12-820004-9.00031-0
16. Azar, A. T., Abed, A. M., Abdulmajeed, F. A., Hameed, I. A., Kamal, N. A., Jawad, A. J. M., Abbas, A. H., Rashed, Z. A., Hashim, Z. S., Sahib, M. A., Ibraheem, I. K., & Thabit, R. (2022). A new nonlinear controller for the maximum power point tracking of photovoltaic systems in micro grid applications based on modified anti-disturbance compensation. Sustainability, 14(17), 10511.
17. Tian, X., He, R., Sun, X., Cai, Y., & Xu, Y. (2020). An ANFIS-based ECMS for energy optimization of parallel hybrid electric bus. IEEE Transactions on Vehicular Technology, 69(2), 1473–1483. https://doi.org/10.1109/TVT.2019.2960593
18. Lulhe, A. M., & Date, T. N. (2016). A technology review paper for drives used in electrical vehicle (EV) and hybrid electrical vehicles (HEV). In 2015 international conference on control, instrumentation, communication and computational technologies, ICCICCT 2015 (pp. 632–636). https://doi.org/10.1109/ICCICCT.2015.7475355
19. Datta, U. (2019). A price-regulated electric vehicle charge-discharge strategy. In Energy research (pp. 1032–1042). https://doi.org/10.1002/er.4330
20. Rawat, T., Niazi, K. R., Gupta, N., & Sharma, S. (2019). Impact assessment of electric vehicle charging/discharging strategies on the operation management of grid accessible and remote microgrids. International Journal of Energy Research, 43(15), 9034–9048. https://doi.org/10.1002/er.4882
21. Hu, Y., et al. (2015). Split converter-fed SRM drive for flexible charging in EV/HEV applications, 62(10), 6085–6095.
22. Mohamed, N., et al. (2022). A comprehensive analysis of wireless charging systems for electric vehicles. IEEE Access, 10, 43865–43881. https://doi.org/10.1109/ACCESS.2022.3168727
23. Wang, J., Cai, Y., Chen, L., Shi, D., Wang, R., & Zhu, Z. (2020). Review on multi-power sources dynamic coordinated control of hybrid electric vehicle during driving mode transition process. International Journal of Energy Research, 44(8), 6128–6148. https://doi.org/10.1002/er.5264

24. Zhao, C., Zu, B., Xu, Y., Wang, Z., Zhou, J., & Liu, L. (2020). Design and analysis of an engine-start control strategy for a single-shaft parallel hybrid electric vehicle. Energy, 202(5), 2354–2363. https://doi.org/10.1016/j.energy.2020.117621 25. Cheng, M., Sun, L., Buja, G., & Song, L. (2015). Advanced electrical machines and machinebased systems for electric and hybrid vehicles. Energies, 8(9), 9541–9564. https://doi.org/10. 3390/en8099541 26. Naoui, M., Aymen, F., Ben Hamed, M., & Lassaad, S. (2019). Analysis of battery-EV state of charge for a dynamic wireless charging system. Energy Storage, 2(2). https://doi.org/10.1002/ est2.117. 27. Rajashekara, K. (2013). Present status and future trends in electric vehicle propulsion technologies. IEEE Journal of Emerging and Selected Topics in Power Electronics, 1(1), 3–10. https://doi.org/10.1109/JESTPE.2013.2259614 28. Paladini, V., Donateo, T., de Risi, A., & Laforgia, D. (2007). Super-capacitors fuel-cell hybrid electric vehicle optimization and control strategy development. Energy Conversion and Management, 48(11), 3001–3008. https://doi.org/10.1016/j.enconman.2007.07.014 29. Chopra, S. (2011). Contactless power transfer for electric vehicle charging application. Science (80). 30. Emadi, A. (2017). Handbook of automotive power electronics and motor drives. 31. Naoui, M., Flah, A., Ben Hamed, M., & Lassaad, S. (2020). Brushless motor and wireless recharge system for electric vehicle design modeling and control. In Handbook of research on modeling, analysis, and control of complex systems. 32. Guarnieri, M. (2011). When cars went electric, Part 2. IEEE Industrial Electronics Magazine, 5(2), 46–53. https://doi.org/10.1109/MIE.2011.941122 33. Levi, E., Bojoi, R., Profumo, F., Toliyat, H. A., & Williamson, S. (2007). Multiphase induction motor drives-a technology status review. IET Electric Power Applications, 1(5), 643–656. https://doi.org/10.1049/iet-epa 34. Mohamed, N., Flah, A., Ben Hamed, M., & Lassaad, S. (2021). Modeling and simulation of vector control for a permanent magnet synchronous motor in electric vehicle. In 2021 4th international symposium on advanced electrical and communication technologies (ISAECT), 2021 (pp. 1–5). https://doi.org/10.1109/ISAECT53699.2021.9668411. 35. Yilmaz, M., & Krein, P. T. (2013). Review of battery charger topologies, charging power levels, and infrastructure for plug-in electric and hybrid vehicles. IEEE Transactions on Power Electronics, 28(5), 2151–2169. https://doi.org/10.1109/TPEL.2012.2212917 36. Mohamed, N., Flah, A., & Ben Hamed, M. (2020). Influences of photovoltaics cells number for the charging system electric vehicle. In Proceedings of the 17th international multi-conference system signals devices, SSD 2020 (pp. 244–248). https://doi.org/10.1109/SSD49366.2020.936 4141. 37. Wu, H. H., Gilchrist, A., Sealy, K., Israelsen, P., & Muhs, J. (2011). A review on inductive charging for electric vehicles. 2011 IEEE international electrical machine drives conference IEMDC, 2011 (pp. 143–147). https://doi.org/10.1109/IEMDC.2011.5994820 38. Xie, L., Shi, Y., Hou, Y. T., & Lou, A. (2013). Wireless power transfer and applications to sensor networks. IEEE Wireless Communications, 20(4), 140–145. https://doi.org/10.1109/ MWC.2013.6590061 39. Cao, P. et al. (2018). An IPT system with constant current and constant voltage output features for EV charging. In Proceedings of the IECON 2018 - 44th annual conference IEEE industrial electronics society (vol. 1, pp. 4775–4780). 
https://doi.org/10.1109/IECON.2018.8591213. 40. Nagendra, G. R., Chen, L., Covic, G. A., & Boys, J. T. (2014). Detection of EVs on IPT highways. In Conference proceedings of the - IEEE applied power electronics conference and exposition - APEC (pp. 1604–1611). https://doi.org/10.1109/APEC.2014.6803521. 41. Mohamed, N., Aymen, F., & Ben Hamed, M. (2019). Characteristic of photovoltaic generator for the electric vehicle. International Journal of Scientific and Technology Research, 8(10), 871–876. 42. Dheeban, S. S., Selvan, N. M., & Kumar, C. S. (2019). Design of standalone pv system. International Journal of Scientific and Technology Research (vol. 8, no. 11, pp. 684–688).


43. Kamal, N. A., & Ibrahim, A. M. (2018). Conventional, intelligent, and fractional-order control method for maximum power point tracking of a photovoltaic system: A review. In Advances in nonlinear dynamics and chaos (ANDC), fractional order systems (pp. 603–671). Academic. 44. Amara, K., Malek, A., Bakir, T., Fekik, A., Azar, A. T., Almustafa, K. M., Bourennane, E., & Hocine, D. (2019). Adaptive neuro-fuzzy inference system based maximum power point tracking for stand-alone photovoltaic system. International Journal of Modelling, Identification and Control, 2019, 33(4), 311–321. 45. Fekik, A., Hamida, M. L., Houassine, H., Azar, A. T., Kamal, N. A., Denoun, H., Vaidyanathan, S., & Sambas, A. (2022). Power quality improvement for grid-connected photovoltaic panels using direct power control. In A. Fekik, & N. Benamrouche (Ed.), Modeling and control of static converters for hybrid storage systems, 2022 (pp. 107–142). IGI Global. https://doi.org/ 10.4018/978-1-7998-7447-8.ch005. 46. Fekik, A., Azar A. T., Kamal, N. A., Serrano, F. E., Hamida, M. L., Denoun, H., & Yassa, N. (2021). Maximum power extraction from a photovoltaic panel connected to a multi-cell converter. In Hassanien, A. E., Slowik, A., Snášel, V., El-Deeb, H., & Tolba, F. M. (Eds.), Proceedings of the international conference on advanced intelligent systems and informatics 2020. AISI 2020. Advances in intelligent systems and computing (vol. 1261, pp. 873–882). Springer, Cham. https://doi.org/10.1007/978-3-030-58669-0_77. 47. Kamal, N. A., Azar, A. T., Elbasuony, G. S., Almustafa, K. A., & Almakhles, D. (2019). PSO-based adaptive perturb and observe MPPT technique for photovoltaic systems. In The international conference on advanced intelligent systems and informatics AISI 2019. Advances in intelligent systems and computing (vol. 1058, pp. 125–135). Springer. 48. Ammar, H. H., Azar, A. T., Shalaby, R., Mahmoud, M. I. (2019). Metaheuristic optimization of fractional order incremental conductance (FO-INC) Maximum power point tracking (MPPT). Complexity, 2019, Article ID 7687891, 1–13. https://doi.org/10.1155/2019/7687891 49. Rana, K. P. S., Kumar, V., Sehgal, N., George, S., & Azar, A. T. (2021). Efficient maximum power point tracking in fuel cell using the fractional-order PID controller. In Advances in nonlinear dynamics and chaos (ANDC), renewable energy systems (pp. 111–132). Academic. https://doi.org/10.1016/B978-0-12-820004-9.00017-6 50. Ben Smida, M., Sakly, A., Vaidyanathan, S., & Azar, A. T. (2018). Control-based maximum power point tracking for a grid-connected hybrid renewable energy system optimized by particle swarm optimization. Advances in system dynamics and control (pp. 58–89). IGI-Global, USA. https://doi.org/10.4018/978-1-5225-4077-9.ch003 51. Ghoudelbourk, S., Dib, D., Omeiri, A., & Azar, A. T. (2016). MPPT Control in wind energy conversion systems and the application of fractional control (PIα ) in pitch wind turbine. International Journal of Modelling, Identification and Control (IJMIC), 26(2), 140–151. 52. Kraiem, H., et al. (2022). Decreasing the battery recharge time if using a fuzzy based power management loop for an isolated micro-grid farm. Sustain, 14(5), 1–23. https://doi.org/10. 3390/su14052870 53. Liu, H. C., Wu, S.-M., Wang, Z.-L., & Li, X.-Y. (2021). A new method for quality function deployment with extended prospect theory under hesitant linguistic environment. IEEE Transactions on Engineering Management, 68(2), 442–451. https://doi.org/10.1109/TEM.2018.286 4103 54. Nguyen, B. H., Trovão, J. P. 
F., German, R., & Bouscayrol, A. (2020). Real-time energy management of parallel hybrid electric vehicles using linear quadratic regulation. Energies, 13(21), 1–19. https://doi.org/10.3390/en13215538 55. Guo, J., He, H., & Sun, C. (2019). ARIMA-based road gradient and vehicle velocity prediction for hybrid electric vehicle energy management. IEEE Transactions on Vehicular Technology, 68(6), 5309–5320. https://doi.org/10.1109/TVT.2019.2912893

Advanced Sensor Systems for Robotics and Autonomous Vehicles Manoj Tolani, Abiodun Afis Ajasa, Arun Balodi, Ambar Bajpai, Yazeed AlZaharani, and Sunny

Abstract In robotic and autonomous vehicle applications, sensor systems play a critical role. Machine learning (ML), data science, artificial intelligence (AI), and the internet of things (IoT) are all advancing, which opens up new possibilities for autonomous vehicles. For vehicle control, traffic monitoring, and traffic management applications, the integration of robotics, IoT, and AI is a very powerful combination. For effective robotic and vehicle control, robot sensor devices require an advanced sensor system. As a result, the AI-based system seeks the attention of the researcher to make the best use of sensor data for various robotic applications while conserving energy. The efficient collection of the data from sensors is a significant difficulty that AI technologies can effectively address. The data consistency method can also be used for time-constraint data collection applications. The present chapter discusses three important methods to improve the quality of service (QoS) and quality of experience (QoE) parameters of the robotic and autonomous vehicle applications. The first one is consistency-guaranteed and collision-resistant approach that can be used by the advanced sensor devices for the data aggregation and the removal of the redundant data. The second one is aggregation aware AI-based methods to improve the lifetime of the robotic devices and the last one is dividing the sensors M. Tolani (B) Manipal Institute of Technology, udupi, Manipal, Karnataka, India e-mail: [email protected] A. A. Ajasa Universiti Teknologi Malaysia, Skudai, JB, Malaysia e-mail: [email protected] A. Balodi · A. Bajpai Atria Institute of Technology, Bangalore, India e-mail: [email protected] A. Bajpai e-mail: [email protected] Y. AlZaharani University of Wollongong, Wollongong, Australia Sunny Indian Institute of Information Technology, Allahabad, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_14


devices based on continuous and event-monitoring robotic application and usage of the application-specific protocol to deal with the corresponding data. In addition the present chapter also discusses the role of sensor systems for various applications. Keywords Machine learning (ML) · Artificial intelligence (AI) · Continuous monitoring · Event-monitoring · Consistency-guaranteed

1 Introduction

Sensors play an important role in various robotic applications. Robots are used for industrial, commercial, as well as domestic applications [1–11]. Advances in machine learning methods make these systems more intelligent, closer to human-like behaviour, and the use of advanced sensor systems takes robot intelligence to the next level. The present work deals with the need for, and uses of, advanced sensor systems in autonomous vehicle applications. In this work, robotic vehicle applications are mainly divided into railway trains and road vehicles. For railways, the advanced sensor systems are mainly installed on the track or inside the train in the case of automatic operation. Similarly, in the case of road vehicles, the sensors are used on board for the efficient operation of the vehicle and at the roadside for automatic driving and other services. In this chapter, the sensors are divided into three categories of operation [12–79]. The first category comprises sensors that continuously sense data according to the operation requirement; these are called continuous monitoring sensors [80]. The next category, event monitoring sensors, generates data when events occur. A further category is periodic sensors, which transmit data periodically to the monitoring station. The roadside monitoring system for vehicle applications and the track monitoring system for railway applications are discussed in the subsections below [14–122].

1.1 Automatic Driving Application

Nowadays, advanced sensors are used for automatic driving and vehicle monitoring applications. For automatic driving, efficient and accurate monitoring enables better prediction and tracking, and researchers are working on various prediction methods for vehicle tracking. Kalman and extended Kalman filter based prediction methods are widely used for this purpose, and advanced algorithms such as the cuckoo search algorithm and particle swarm optimization are also used for accurate prediction. The prediction algorithm plays an important role, but it has its limits; the prediction efficiency can be improved with the help of advanced sensors. Nowadays, advanced sensors with efficient communication systems are used for the monitoring application. The advanced sensor system for vehicle applications


Fig. 1 Advanced sensors of vehicle application

is shown in Fig. 1. As shown in the figure, a sensor device is mounted on each vehicle. The sensor devices communicate directly with the roadside devices and transmit their data to them; all the roadside devices then transmit the data to the monitoring station, which analyzes the data and generates the control signals for the vehicles. The advanced sensors used in the vehicle make the system more efficient [91, 120, 121]. Automatic driving is an important application that requires advanced sensors for autonomous robotic operation. The advanced sensors of the vehicle can be subdivided into continuous monitoring, event monitoring, and periodic monitoring sensors.
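To illustrate the Kalman-filter-based prediction mentioned above, the following is a minimal one-dimensional constant-velocity sketch; the measurement sequence, time step, and noise covariances are illustrative assumptions, not values from any cited work.

```python
import numpy as np

# One-dimensional constant-velocity Kalman filter: state x = [position, velocity].
F = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition for a 0.1 s step
H = np.array([[1.0, 0.0]])               # only position is measured
Q = np.eye(2) * 1e-3                     # process noise covariance
R = np.array([[0.5]])                    # measurement noise covariance

x = np.array([[0.0], [0.0]])             # initial state estimate
P = np.eye(2)                            # initial estimate covariance

for z in [0.9, 2.1, 2.9, 4.2, 5.1]:      # assumed roadside position measurements (m)
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print("estimated position/velocity:", x.ravel())
```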

1.2 Railway Monitoring Application

Railway monitoring is an important field for making railway trains robotic and autonomous [13]. Researchers have worked on railway monitoring applications in different fields; most of them are working on MAC protocols [14–58, 81, 86–100, 104, 114–119], aggregation protocols [59–83, 101, 102, 122], and data consistency [91, 120, 121]. However, the performance efficiency of the railway monitoring system depends upon the advanced sensor devices. Nowadays, many advanced sensor devices are available for wireless sensor network and IoT


Fig. 2 Advanced sensors of railway application

applications. The operation of the railway track monitoring application is shown in Fig. 2. The sensor devices are placed on the track and transmit their data to the monitoring station via a base station. Railway monitoring is an important application that requires advanced sensors for autonomous robotic operation. The advanced sensors for railway monitoring can be subdivided into continuous monitoring, event monitoring, and periodic monitoring sensors.

The advancement of artificial intelligence (AI) and machine learning (ML) opens a new door in the field of monitoring applications, and the integration of advanced sensors with AI/ML is now used for advanced applications. In the present chapter, various methods are discussed for advanced robotic devices: data consistency, data aggregation, and application-specific data transmission protocols are a few of the methods discussed for advanced robotic sensor applications. In the rest of the chapter, the literature review and its analysis are discussed in Sect. 2, the uses of advanced sensors for various applications are discussed in Sect. 3, and finally Sect. 4 discusses the overall research contribution, future scope, and conclusions based on the results and findings.

2 Related Works

Researchers have reported many works related to advanced sensor applications. Victoria et al. discussed various uses of advanced sensors for railway track


condition monitoring systems. The authors reported various research works for the monitoring application using WSN, where WSNs are used to (1) maintain process tolerance, (2) verify and protect machines, (3) detect maintenance requirements, (4) minimize downtime, and (5) prevent failures and save business money and time. In this work, the authors reported many advanced sensors for bridge monitoring, tunnel monitoring, track monitoring, and rail-bed monitoring. Eila et al. proposed a system to measure bogie vibration [56]. Maly et al. proposed a heterogeneous sensor model to integrate data [94]. Many other research works have been reported by other researchers. For a better analysis of the contributions in this direction, the research papers were selected based on inclusion and exclusion criteria: approximately 280 research papers were identified at the first stage, and based on these criteria a total of 70% of the papers were rejected.

Inclusion and exclusion criteria play an important role in the identification of the papers. The research papers closely related to advanced sensors and automatic robot applications were identified.

The research works related purely to core data communication were rejected at the screening stage, and research papers whose core idea involves advanced sensors were identified for full review, as shown in Fig. 3. The keywords of the works closely related to advanced sensors are also listed; the identified keywords follow the inclusion criteria. As a continuation of this process, the year-wise contribution of the researchers in the field of advanced sensors was also analyzed. The analysis shows that the research contribution has increased in the last 4–5 years, as shown in Fig. 4. The main reason for the researchers' attraction to this direction is the advancement of AI/ML and IoT: advanced sensors are a primary requirement of the IoT, and AI/ML methods make the IoT system more powerful. Therefore, the requirement for advanced sensors is increasing exponentially. The literature study shows that the researchers' contribution has been increasing for the last 5–10 years, the advancement of IoT and AI/ML being the major cause of the demand for advanced sensors, and automatic driving efficiency strongly depends upon them.

The main inclusion terms are mentioned in Fig. 3, and the cumulative use of each term is shown in Fig. 5. The result in Fig. 5 shows that most of the papers are based on event monitoring sensors, while continuous monitoring is also mentioned in many of them. Therefore, the discussion of advanced sensors is categorized into continuous monitoring and event monitoring sensors.


Fig. 3 Screening and Identification process of the research papers (Inclusion/Exclusion criteria, Keywords)


Fig. 4 Year-wise contribution of the researchers in the field of advanced sensors

Fig. 5 Cumulative use of inclusion terms in manuscripts


Fig. 6 The cumulative count of the keywords

Apart from this, the main focus of the researchers is on AI/ML-based advanced methods to control the vehicle. The vehicle sensors are advanced sensors used for autonomous driving, and researchers have used various different types of sensor devices. The sensor devices are mainly categorized into reduced function devices (RFD) and full function devices (FFD). RFD devices can only perform the sensing operation and the transmission of the data, whereas FFD devices can perform all the different types of computational operations. Researchers have reported clustering and aggregation operations, which can be performed by FFD devices. An RFD mainly works as an end device, while an FFD can work as an intermediate device, perform all the mathematical operations, and also act as a cluster head. The keywords of the inclusion criteria mainly describe the focus of the researchers in a particular direction; therefore, the cumulative count of the keywords was also identified. The keywords of the shortlisted manuscripts also show that the main focus of the researchers is either event monitoring or continuous monitoring sensors, used for WSN or IoT applications. Similarly, data aggregation and filtration of the data collected from advanced sensors are also reported in the research works (Fig. 6).
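The RFD/FFD distinction described above can be captured with a small data-structure sketch; the class names and capability labels below are illustrative assumptions rather than definitions from the cited standards.

```python
from dataclasses import dataclass, field

@dataclass
class SensorDevice:
    """Generic sensor node: every device can at least sense and transmit."""
    name: str
    capabilities: set = field(default_factory=lambda: {"sense", "transmit"})

@dataclass
class ReducedFunctionDevice(SensorDevice):
    """RFD: end device only, with no aggregation or routing duties."""
    pass

@dataclass
class FullFunctionDevice(SensorDevice):
    """FFD: can additionally aggregate data, route it, and act as a cluster head."""
    def __post_init__(self):
        self.capabilities |= {"aggregate", "route", "cluster_head"}

rfd = ReducedFunctionDevice("wheel vibration node")
ffd = FullFunctionDevice("trackside gateway")
print(rfd.capabilities)
print(ffd.capabilities)
```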


Fig. 7 Research Contribution in various fields

The cumulative count of the keywords signifies the focus of the researchers in a particular direction. The current cumulative count of the keywords clearly indicates that advanced continuous and event monitoring sensors play an important role in autonomous robotic applications.

As already mentioned, the researchers are working in various domains of the autonomous vehicle. Their contributions in various fields are analyzed as shown in Fig. 7. The analysis shows that the major focus of the researchers is to reduce energy consumption, and advanced sensors play an important role in this reduction. Various other works for the reduction of energy consumption have also been reported; the efficient design of MAC protocols is one such field, with various contention-based and contention-less protocols proposed. Road/railway safety based intelligent systems have also been reported by various researchers; these types of systems can only be built with the help of AI/ML and IoT. There are various ways to reduce energy consumption, but the energy-efficient methods can be categorized into two different fields, i.e., hardware and software related. In the software-related fields, the researchers are working on MAC protocols, routing protocols, and aggregation protocols; in the hardware-related field, they are working on advanced sensor design and fabrication.


The researchers have reported many works related to bridge monitoring [4–9]. The acoustic emission sensor is used for crack/fatigue detection. Similarly, the strain gauge sensor is used for stress detection on the railway track [1, 3, 8, 12], and researchers have used the piezoelectric strain gauge sensor for the bridge monitoring application [11]. Likewise, the strain gauge sensor is used for the weight measurement of the train, as reported in [10]. For dynamic load applications, the accelerometer sensor is used in various applications [2, 8]. Many other research works have been reported for various other applications.

3 Types of Sensors for Various Applications

The advanced sensors can be used for different applications. In this section, we discuss two categories of advanced sensor use, as given below:
• Efficient road monitoring
• Efficient railway monitoring

3.1 Efficient Road Monitoring

The requirements for efficient road monitoring, and its dependencies on various technologies and parameters, are discussed here. Efficient road monitoring depends upon various other technologies, as shown in Fig. 8. Nowadays the data rate demand is increasing day by day; to fulfill this huge demand, researchers are moving to higher frequency ranges, and 5G technology fulfills the current bandwidth requirement. Advanced sensors integrated with device-to-device communication and IoT increase the efficiency of road monitoring. Researchers have proposed various aggregation protocols for energy-efficient data transmission. Data consistency methods are also important, and there is a trade-off between energy-efficient data transmission and data consistency. The data is transmitted from the sensor devices to the roadside equipment via direct communication; the monitoring station receives data from every roadside device, evaluates the information, and produces the vehicle's control signal. The technology is made more effective by the sophisticated sensors used in the vehicle.


Fig. 8 Various dependencies of efficient road monitoring

3.2 Efficient Railway Monitoring

Efficient railway monitoring depends upon various other technologies, similar to road monitoring, as shown in Fig. 9. The demand for data rates is rising nowadays, and researchers are stepping up the frequency range in order to meet this enormous demand; the present need for bandwidth in railway applications is met by 5G technology. The effectiveness of railway monitoring is increased by modern sensors combined with device-to-device communication and IoT. Researchers have put forward a number of aggregation protocols for data transfer that uses little energy. Methods for ensuring data consistency are also crucial, and there is a trade-off between data consistency and energy-efficient data transmission [80, 84, 85, 90, 92, 95, 103, 105–113]. The researchers have reported various uses of advanced sensors for the different railway application needs. The accelerometer, gyroscope, and FB strain sensors are used for the train shell monitoring application. Humidity, motion detector, and vibration sensors are used for the wagon monitoring application. Surface acoustic sensors and inertia sensors are used for bogie monitoring. Gyro sensors and gap sensors are used for wheel monitoring. Wind pressure sensors are used for brake monitoring. A few other sensors are also used for other applications, as given in Table 1.


Fig. 9 Various dependencies of efficient railway monitoring

Table 1 Advanced sensors and their railway monitoring applications

Advance sensors | Application
Accelerometer, gyroscope, FB strain | Train shell monitoring
Humidity, motion detector, vibration sensor | Wagon monitoring
Surface acoustic sensor, inertia sensors | Bogie monitoring
Gyro, gap sensors | Wheels monitoring
Wind pressure sensors | Brakes monitoring
Load cell, FBS, FBG, FBT | Pantograph
Thermo-couple, SAW temperature | Axles monitoring

The field of railroad monitoring is crucial to the development of robotic and autonomous trains. The researchers have developed applications for railway monitoring in several sectors. The majority of researches are focused on data consistency, MAC protocol, and aggregation protocol. However, the modern sensor devices are what make the railway monitoring system operate effectively. There are several cutting-edge sensor devices available now for use with IoT applications and wireless sensor networks.

The roadside application requires various different types of sensors. As shown in Fig. 10, footpath sensors, light sensors, service road shoulder width, safety barrier, median width, gyroscope, road capacity, traffic signal, operating speed, and many other types of sensors are used; most of these sensors are listed in Fig. 10.


Fig. 10 Advanced sensors for Robotic Road Application

Fig. 11 Advanced sensors for Robotic Railway Application

Similarly, advanced sensors can also be used for railway applications. The sensors used for railway applications are shown in Fig. 11: accelerometer sensors, FBG sensors, inclinometer sensors, magnetoelectric sensors, acoustic emission sensors, gyroscopes, displacement sensors, and many other sensors are used for railway monitoring.


Bridge monitoring is a topic on which researchers have published extensively. Crack and fatigue detection is performed using acoustic emission sensors, and strain gauge sensors are similarly used on railway track to monitor stresses. Piezoelectric strain gauge sensors have been developed to monitor bridges. As described in [118], the strain gauge sensor is also used to measure the train's weight. Accelerometer sensors are used for dynamic loads in a variety of applications, and numerous further studies have been reported for many other uses.

4 Conclusion

Sensor systems are essential in robotics and autonomous vehicle applications. The advancement of data science, artificial intelligence, machine learning, and the Internet of Things (IoT) creates new opportunities for autonomous cars. The fusion of robots, IoT, and AI is a particularly potent combination for applications such as vehicle control, traffic monitoring, and traffic management. Advanced sensor systems are necessary for efficient robotic and vehicle control with robot sensor devices. As a result, AI-based systems attract researchers' attention in order to maximize the utilization of sensor data for diverse robotic applications while minimizing energy consumption. One key challenge that AI technology can successfully address is the effective collection of data from sensors. Applications requiring time-constrained data collection can also make use of data consistency methods. The current chapter examines several crucial ways to raise the quality of service (QoS) and quality of experience (QoE) standards of robotic and autonomous vehicle applications. In the future, nano-electromechanical system (NEMS) advanced sensors and actuators can be developed for low-energy, long-lifetime monitoring applications, and new physical layer protocols can be developed for efficient operation.

References

1. Bischoff, R., Meyer, J., Enochsson, O., Feltrin, G., & Elfgren, L. (2009). Event-based strain monitoring on a railway bridge with a wireless sensor network. In Proceedings of the 4th International Conference on Structural Health Monitor (pp. 1–8). Zurich, Switzerland: Intell. Infrastructure. 2. Chebrolu, K., Raman, B., Mishra, N., Valiveti, P., & Kumar, R. (2008). Brimon: A sensor network system for railway bridge monitoring. In Proceedings of the 6th International Conference on Mobile System and Application Services, Breckenridge, CO, USA (pp. 2–14).


3. Feltrin, G. (2012). Wireless sensor networks: A monitoring tool for improving remaining lifetime estimation. In Civil Struct (Ed.), Health Monitoring Workshop (pp. 1–8). Berlin: Germany. 4. Grosse, C., et al. (2006). Wireless acoustic emission sensor networks for structural health monitoring in civil engineering. In Proceedings of the European Conference on Non-Destructive Testing (pp. 1–8), Berlin, Germany. 5. Grosse, C., Glaser, S., & Kruger, M. (2010). Initial development of wireless acoustic emission sensor Motes for civil infrastructure state monitoring. Smart Structures and Systems, 6(3), 197–209. 6. Hay, T. et al. (2006). Transforming bridge monitoring from time-based to predictive maintenance using acoustic emission MEMS sensors and artificial intelligence. In Proceedings of the 7th World Congress on Railway Research, Montreal, Canada, CD-ROM. 7. Hay, T. (2007). Wireless remote structural integrity monitoring for railway bridges. Transportation Research Board, Washington, DC, DC, USA, Technical report no. HSR-IDEA Project 54. 8. Krüger, M. et al. (2007). Sustainable Bridges. Technical Report on Wireless Sensor Networks using MEMS for Acoustic Emission Analysis including other Monitoring Tasks. Stuttgart, Germany: European Union. 9. Ledeczi, A., et al. (2009). Wireless acoustic emission sensor network for structural monitoring. IEEE Sensors Journal, 9(11), 1370–1377. 10. Reyer, M., Hurlebaus, S., Mander, J., & Ozbulut, O. E. (2011). Design of a wireless sensor network for structural health monitoring of bridges. In Proceedings of the 5th International Conference on Sens Technology, Palmerston North, New Zealand (pp. 515–520). 11. Sala, D., Motylewski, J., & Koaakowsk, P. (2009). Wireless transmission system for a railway bridge subject to structural health monitoring. Diagnostyka, 50(2), 69–72. 12. Townsend, C., & Arms, S. (2005). Wireless sensor networks. Principles and applications. In J. Wilson (Ed.), Sensor Technology Handbook (Chap. 22). Oxford, UK: Elsevier. 13. Tolani, M., Sunny, R., Singh, K., Shubham, K., & Kumar, R. (2017). Two-Layer optimized railway monitoring system using Wi-Fi and ZigBee interfaced WSN. IEEE Sensors Journal, 17(7), 2241–2248. 14. Rasouli, H., Kavian, Y. S., & Rashvand, H. F. (2014). ADCA: Adaptive duty cycle algorithm for energy efficient IEEE 802.15.4 beacon-enabled WSN. IEEE Sensors Journal, 14(11), 3893–3902. 15. Misic, J., Misic, V. B., & Shafi, S. (2004). Performance of IEEE 802.15.4 beacon enabled PAN with uplink transmissions in non-saturation mode-access delay for finite buffers. In First International Conference on Broadband Networks, San Jose, CA, USA (pp. 416–425). 16. Jung, C. Y., Hwang, H. Y., Sung, D. K., & Hwang, G. U. (2009). Enhanced markov chain model and throughput analysis of the slotted CSMA/CA for IEEE 802.15.4 under unsaturated traffic conditions. In IEEE Transactions on Vehicular Technology (Vol. 58, no. 1, pp. 473–478), January 2009. 17. Zhang, H., Xin, S., Yu, R., Lin, Z., & Guo, Y. (2009). An adaptive GTS allocation mechanism in IEEE 802.15.4 for various rate applications. In 2009 Fourth International Conference on Communications and Networking in China. 18. Ho, C., Lin, C., & Hwang, W. (2012). Dynamic GTS allocation scheme in IEEE 802.15.4 by multi-factor. In 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing. 19. Yang, L., Zeng, S. (2012). A new GTS allocation schemes For IEEE 802.15.4. 
In 2012 5th International Conference on BioMedical Engineering and Informatics (BMEI 2012). 20. Hurtado-López, J., & Casilari, E. (2013). An adaptive algorithm to optimize the dynamics of IEEE 802.15.4 network. In Mobile Networks and Management (pp. 136–148). 21. Standard for Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer Specifications for Low Rate Wireless Personal Area Networks (LR-WPAN), IEEE Standard 802.15.4, January 2006.


22. Pei, G., & Chien, C. (2001). Low power TDMA in large WSNs. In 2001 MILCOM Proceedings Communications for Network-Centric Operations: Creating the Information Force (Cat. No.01CH37277) (Vol. 1, pp. 347–351). 23. Shafiullah, G. M., Thompson, A., Wolf, P., & Ali, S. (2008). Energy-efficient TDMA MAC protocol for WSNs applications. In Proceedings of the 5th ICECE, Dhaka, Bangladesh, December 24–27, 2008 (pp. 85–90). 24. Hoesel & Havinga. (2004). A lightweight medium access protocol (LMAC) for WSNs: Reducing preamble transmissions and transceiver state switches. In 1st International Workshop on Networked Sensing Systems (pp. 205–208). 25. Alvi, A. N., Bouk, S. H., Ahmed, S. H., Yaqub, M. A., Sarkar, M., & Song, H. (2016). BESTMAC: Bitmap-Assisted efficient and scalable TDMA-Based WSN MAC protocol for smart cities. IEEE Access, 4, 312–322. 26. Li, J., & Lazarou, G. Y. (2004). A bit-map-assisted energy-efficient MAC scheme for WSNs. In Third International Symposium on Information Processing in Sensor Networks. IPSN 2004 (pp. 55–60). 27. Shafiullah, G., Azad, S. A., & Ali, A. B. M. S. (2013). Energy-efficient wireless MAC protocols for railway monitoring applications. IEEE Transactions on Intelligent Transportation Systems, 14(2), 649–659. 28. Patro, R. K., Raina, M., Ganapathy, V., Shamaiah, M., & Thejaswi, C. (2007). Analysis and improvement of contention access protocol in IEEE 802.15.4 star network. In 2007 IEEE International Conference on Mobile Adhoc and Sensor Systems, Pisa (pp. 1–8). 29. Pollin, S. et al. (2008). Performance analysis of slotted carrier sense IEEE 802.15.4 medium access layer. In IEEE Transactions on Wireless Communications (Vol. 7, no. 9, pp. 3359– 3371), September 2008. 30. Park, P., Di Marco, P., Soldati, P., Fischione, C., & Johansson, K. H. (2009). A generalized Markov chain model for effective analysis of slotted IEEE 802.15.4. In IEEE 6th International Conference on Mobile Adhoc and Sensor Systems Macau (pp. 130–139). 31. Aboelela, E., Edberg, W., Papakonstantinou, C., & Vokkarane, V. (2006). WSN based model for secure railway operations. In Proceedings 25th IEEE International Performance, Computer Communication Conference, Phoenix, AZ, USA (pp. 1–6). 32. Shafiullah, G., Gyasi-Agyei, A., & Wolfs, P. (2007). Survey of wireless communications applications in the railway industry. In Proceedings of 2nd International Conferences on Wireless Broadband Ultra Wideband Communication, Sydney, NSW, Australia (p. 65). 33. Shrestha, B., Hossain, E., & Camorlinga, S. (2010). A Markov model for IEEE 802.15.4 MAC with GTS transmissions and heterogeneous traffic in non-saturation mode. In IEEE International Conference on Communication Systems, Singapore (pp. 56–61). 34. Park, P., Di Marco, P., Fischione, C., & Johansson, K. H. (2013). Modeling and optimization of the IEEE 802.15.4 protocol for reliable and timely communications. In IEEE Transactions on Parallel and Distributed Systems (Vol. 24, no. 3, pp. 550–564), March 2013. 35. Farhad, A., Zia, Y., Farid, S., & Hussain, F. B. (2015). A traffic aware dynamic super-frame adaptation algorithm for the IEEE 802.15.4 based networks. In IEEE Asia Pacific Conference on Wireless and Mobile (APWiMob), Bandung (pp. 261–266). 36. Moulik, S., Misra, S., & Das, D. (2017). AT-MAC: Adaptive MAC-Frame payload tuning for reliable communication in wireless body area network. In IEEE Transactions on Mobile Computing (Vol. 16, no. 6, pp. 1516–1529), June 1, 2017. 37. Choudhury, N., & Matam, R. (2016). 
Distributed beacon scheduling for IEEE 802.15.4 clustertree topology. In IEEE Annual India Conference (INDICON), Bangalore, (pp. 1–6). 38. Choudhury, N., Matam, R., Mukherjee, M., & Shu, L. (2017). Adaptive duty cycling in IEEE 802.15.4 Cluster Tree Networks Using MAC Parameters. In Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing, Mobihoc’17, Chennai, India (pp. 37:1–37:2). 39. Moulik, S., Misra, S., & Chakraborty, C. (2019). Performance evaluation and Delay-Power Trade-off analysis of ZigBee Protocol. In IEEE Transactions on Mobile Computing (Vol. 18, no. 2, pp. 404–416), February 1, 2019.


40. Barbieri, A., Chiti, F., & Fantacci, R. (2006). Proposal of an adaptive MAC protocol for efficient IEEE 802.15.4 low power communications. In Proceedings of IEEE 49th Global Telecommunication Conference, December 2006 (pp. 1–5). 41. Lee, B.-H., & Wu, H.-K. (2010). Study on a dynamic superframe adjustment algorithm for IEEE 802.15.4 LR-WPAN. In Proceedings of Vehicular Technology Conference (VTC), May 2010 (pp. 1–5). 42. Jeon, J., Lee, J. W., Ha, J. Y., & Kwon, W. H. (2007). DCA: Duty-cycle adaptation algorithm for IEEE 802.15.4 beacon-enabled networks. In Proceedings of the 65th IEEE Vehicular Technology Conference, April 2007 (pp. 110–113). 43. Goyal, R., Patel, R. B., Bhadauria, H. S., & Prasad, D. (2014). Dynamic slot allocation scheme for efficient bandwidth utilization in Wireless Body Area Network. In 9th International Conference on Industrial and Information Systems (ICIIS), Gwalior (pp. 1–7). 44. Na, C., Yang, Y., & Mishra, A. (2008). An optimal GTS scheduling algorithm for timesensitive transactions in IEEE 802.15.4 networks. In Computer Networks (Vol. 52 no. 13 pp. 2543–2557), September 2008. 45. Akbar, M. S., Yu, H., & Cang, S. (2017). TMP: Tele-Medicine protocol for slotted 802.15.4 with duty-cycle optimization in wireless body area sensor networks. IEEE Sensors Journal, 17(6), 1925–1936. 46. Koubaa, A., Alves, M., & Tovar, E. (2006). GTS allocation analysis in IEEE 802.15.4 for real-time WSNs. In Proceedings 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island (p. 8). 47. Park, P., Fischione, C., & Johansson, K. H. (2013). Modeling and stability analysis of hybrid multiple access in the IEEE 802.15.4 protocol. ACM Transactions on Sensor Networks, 9(2), 13:1–13:55. 48. Alvi, A., Mehmood, R., Ahmed, M., Abdullah, M., & Bouk, S. H. (2018). Optimized GTS utilization for IEEE 802.15.4 standard. In International Workshop on Architectures for Future Mobile Computing and Internet of Things. 49. Song, J., Ryoo1, J., Kim, S., Kim, J., Kim, H., & Mah, P. (2007). A dynamic GTS allocation algorithm in IEEE 802.15.4 for QoS guaranteed real-time applications. In IEEE International Symposium on Consumer Electronics. ISCE 2007. 50. Lee, H., Lee, K., & Shin, Y. (2012). A GTS Allocation Scheme for Emergency Data Transmission in Cluster-Tree WSNs, ICACT2012, February 2012 (pp. 19–22). 51. Lei, X., Choi, Y., Park, S., & Hyong Rhee, S. (2012). GTS allocation for emergency data in low-rate WPAN. In 18th Asia-Pacific Conference on Communications (APCC), October 2012. 52. Yang, L., & Zeng, S. (2012). A new GTS allocation schemes For IEEE 802.15.4. In 2012 5th International Conference on BioMedical Engineering and Informatics (BMEI 2012). 53. Cheng, L., Bourgeois, A. G., & Zhang, X. (2007). A new GTS allocation scheme for IEEE 802.15.4 networks with improved bandwidth utilization. In International Symposium on Communications and Information Technologies 54. Udin Harun Al Rasyid, M., Lee, B., & Sudarsono, A. (2013). PEGAS: Partitioned GTS allocation scheme for IEEE 802.15.4 networks. In International Conference on Computer, Control, Informatics and Its Applications. 55. Roy, S., Mallik, I., Poddar, A., & Moulik, S. (2017). PAG-MAC: Prioritized allocation of GTSs in IEEE 802.15.4 MAC protocol—A dynamic approach based on Analytic Hierarchy Process. In 14th IEEE India Council International Conference (INDICON), December 2017. 56. Heinzelman, W. B., Chandrakasan, A. P., & Balakrishnan, H. (2002). 
An application-specific protocol architecture for wireless microsensor networks. IEEE Wireless Communication Transactions, 1(4), 660–670. 57. Philipose, A., & Rajesh, A. (2015). Performance analysis of an improved energy aware MAC protocol for railway systems. In 2nd International Conference on Electronics and Communication Systems (ICECS), Coimbatore, (pp. 233–236). 58. Kumar, D., & Singh, M. P. (2018). Bit-Map-Assisted Energy-Efficient MAC protocol for WSNs. International Journal of Advanced Science and Technology, 119, 111–122.


59. Duarte-Melo, E. J., & Liu, M. (2002). Analysis of energy-consumption and lifetime of heterogeneous WSNs. In Global Telecommunications Conference. GLOBECOM ’02. IEEE, 2002 (Vol. 1. pp. 21–25). 60. Shabna, V. C., Jamshid, K., & Kumar, S. M. (2014). Energy minimization by removing data redundancy in WSNs. In 2014 International Conference on Communication and Signal Processing, Melmaruvathur (pp. 1658–1663). 61. Yetgin, H., Cheung, K. T. K., El-Hajjar, M., & Hanzo, L. (2015). Network-Lifetime maximization of WSNs. IEEE Access, 3, 2191–2226. 62. Rajagopalan, R., & Varshney, P. K. (2006). Data-aggregation techniques in sensor networks: A survey. In IEEE Communications Surveys & Tutorials (Vol. 8, no. 4, pp. 48–63). Fourth Quarter 2006. 63. Jesus, P., Baquero, C., & Almeida, P. S. (2015). A survey of distributed data aggregation algorithms. In IEEE Communications Surveys Tutorials (Vol. 17, no. 1, pp. 381–404). Firstquarter 2015. 64. Zhou, F., Chen, Z., Guo, S., & Li, J. (2016). Maximizing lifetime of Data-Gathering trees with different aggregation modes in WSNs. IEEE Sensors Journal, 16(22), 8167–8177. 65. Sofra, N., He, T., Zerfos, P., Ko, B. J., Lee, K. W., & Leung, K. K. (2008). Accuracy analysis of data aggregation for network monitoring. MILCOM 2008–2008 IEEE Military Communications Conference, San Diego, CA (pp. 1–7). 66. Heinzelman, W., Chandrakasan, A., & Balakrishnan, H. (2000). Energy-Efficient communication protocols for wireless microsensor networks. In Proceedings of the 33rd Hawaaian International Conference on Systems Science (HICSS), January 2000. 67. Liang, J., Wang, J., Cao, J., Chen, J., & Lu, M. (2010). Algorithm, an efficient, & for constructing maximum lifetime tree for data gathering without aggregation in WSNs. In Proceedings IEEE INFOCOM, San Diego, CA (pp. 1–5). 68. Wu, Y., Mao, Z., Fahmy, S., & Shroff, N. B. (2010). Constructing maximum-lifetime datagathering forests in sensor networks. IEEE/ACM Transactions on Networking, 18(5), 1571– 1584. 69. Luo, D., Zhu, X., Wu, X., & Chen, G. (2011). Maximizing lifetime for the shortest path aggregation tree in WSNs. Proceedings IEEE INFOCOM, Shanghai (pp. 1566–1574). 70. Hua, C., & Yum, T. S. P. (2008). Optimal routing and data aggregation for maximizing lifetime of WSNs. IEEE/ACM Transactions on Networking, 16(4), 892–903. 71. Choi, K., & Chae, K. (2014). Data aggregation using temporal and spatial correlations in Advanced Metering Infrastructure. In The International Conference on Information Networking 2014 (ICOIN2014), Phuket (pp. 541–544). 72. Villas, L. A., Boukerche, A., Guidoni, D. L., de Oliveira, H. A. B. F., de Araujo, R. B., & Loureiro, A. A. F. (2013). An energy-aware spatio-temporal correlation mechanism to perform efficient data collection in WSNs. Computer Communications, 36(9), 1054–1066. 73. Liu, C., Wu, K., & Pei, J. (2007). An energy-efficient data collection framework for WSNs by exploiting spatiotemporal correlation. IEEE Transactions on Parallel and Distributed Systems, 18(7), 1010–1023. 74. Kandukuri, S., Lebreton, J., Lorion, R., Murad, N., & Daniel Lan-Sun-Luk, J. (2016). Energyefficient data aggregation techniques for exploiting spatio-temporal correlations in WSNs. Wireless Telecommunications Symposium (WTS) (pp. 1–6), London. 75. Mantri, D., Prasad, N. R., & Prasad, R. (2014). Wireless Personal Communications, 5, 2589. https://doi.org/10.1007/s11277-013-1489-x. 76. Mantri, D., Prasad, N. R., Prasad, R., & Ohmori, S. (2012). Two tier cluster based data aggregation (TTCDA) in WSN. 
In 2012 IEEE International Conference on Advanced Networks and Telecommunciations Systems (ANTS). 77. Pham, N. D., Le, T. D., Park, K., & Choo, H. SCCS: Spatiotemporal clustering and compressing schemes for efficient data collection applications in WSNs. International Journal of Communication Systems, 23, 1311–1333. 78. Villas, L. A., Boukerche, A., de Oliveira, H. A. B. F., de Araujo, R. B., & Loureiro, A. A. F. (2014). A spatial correlation aware algorithm to perform efficient data collection in WSNs. Ad Hoc Networks, 12, 69–85. ISSN 1570-8705.


79. Krishnamachari, B., Estrin, D., & Wicker, S. B. (2002). The impact of data aggregation in WSNs. In ICDCSW ’02: Proceedings of the 22nd International Conference on Distributed Computing Systems (pp. 575–578). Washington, DC, USA: IEEE Computer Society. 80. Tolani, M., & Sunny, R. K. S. (2019). Lifetime improvement of WSN by information sensitive aggregation method for railway condition monitoring. Ad Hoc Networks, 87, 128–145. ISSN 1570-8705. https://doi.org/10.1016/j.adhoc.2018.11.009. 81. Tolani, M., & Sunny, R. K. S. (2019). Energy Efficient Adaptive Bit-Map-Assisted Medium Access Control Protocol, Wireless Personal Communication (Vol. 108, pp. 1595–1610). https://doi.org/10.1007/s11277-019-06486-9. 82. MacQueen, J. B. Some Methods for classification and Analysis of Multivariate Observations. In Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press. 83. Miši´c, J., Shafi, S., & Miši´c, V. B. (2005). The impact of MAC parameters on the performance of 802.15.4 PAN. Ad Hoc Network. 3, 5 (September 2005), 509–528. https://doi.org/10.1016/ j.adhoc.2004.08.002. 84. An IEEE 802.15.4 complaint and ZigBee-ready 2.4 GHz RF transceiver. (2004). Microwave Journal, 47(6), 130–135. 85. Dargie, W., & Poellabauer, C. (2010). Fundamentals of WSNs: Theory and Practice. Wiley Publishing. 86. Park, P., Fischione, C., & Johansson, K. H. (2013). Modeling and stability analysis of hybrid multiple access in the IEEE 802.15.4 protocol. ACM Transactions on Sensor Networks, 9, 2, Article 13, 55 pages. 87. Zhan, Y., & Xia, M. A. (2016). GTS size adaptation algorithm for IEEE 802.15.4 wireless networks. Ad Hoc Networks, 37, Part 2, pp. 486–498. ISSN 1570-8705, https://doi.org/10. 1016/j.adhoc.2015.09.012. 88. Iala, I, Dbibih, I., & Zytoune, O. (2018). Adaptive duty-cycle scheme based on a new prediction mechanism for energy optimization over IEEE 802.15.4 wireless network. International Journal of Intelligent Engineering and Systems, 11(5). https://doi.org/10.22266/ijies2018. 1031.10. 89. Boulis, A. (2011). Castalia: A simulator for WSNs and Body Area Networks, user’s manual version 3.2, NICTA. 90. Kolakowski1, P., Szelazek, J., Sekula, K., Swiercz, A., Mizerski, K., & Gutkiewicz, P. (2011). Structural health monitoring of a railway truss bridge using vibration-based and ultrasonic methods. Smart Materials and Structures, 20(3), 035016. 91. Al-Janabi, T. A., & Al-Raweshidy, H. S. (2019). An energy efficient hybrid MAC protocol with dynamic sleep-based scheduling for high density IoT networks. IEEE Internet of Things Journal, 6(2), 2273–2287. 92. Penella-López, M. T., & Gasulla-Forner, M. (2011). Powering autonomous sensors: An integral approach with focus on solar and RF energy harvesting. Springer Link. https://doi.org/ 10.1007/978-94-007-1573-8. 93. Farag, H., Gidlund, M., & Österberg, P. (2018). A delay-bounded MAC protocol for missionand time-critical applications in industrial WSNs. IEEE Sensors Journal, 18(6), 2607–2616. 94. Lin, C. H., Lin, K. C. J., & Chen, W. T. (2017). Channel-Aware polling-based MAC protocol for body area networks: Design and analysis. IEEE Sensors Journal, 17(9), 2936–2948 95. Hodge, V. J., O’Keefe, S., Weeks, M., & Moulds, A. (2015). WSNs for condition monitoring in the railway Industry: A survey. IEEE Transactions on Intelligent Transportation Systems, 16(3), 1088–1106. 96. Ye, W., Heidemann, J., & Estrin, D. (2002). An energy-efficient MAC protocol for WSNs. 
In Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies (Vol. 3, pp. 1567–1576). 97. Siddiqui, S., Ghani, S., & Khan, A. A. (2018). ADP-MAC: An adaptive and dynamic pollingbased MAC protocol for WSNs. IEEE Sensors Journal, 18(2), 860–874. 98. Stem, M., & Katz, R. H. (1997). Measuring and reducing energy-consumption of network interfaces in hand held devices. IEICE Transactions on Communications, E80-B(8), 1125– 1131.


99. Lee, A. H., Jing, M. H., & Kao, C. Y. (2008). LMAC: An energy-latency trade-off MAC protocol for WSNs. International Symposium on Computer Science and its Applications, Hobart, ACT (pp. 233–238). 100. Karl, H., & Willig, A. (2005). Protocols and Architectures for WSNs. Wiley. 101. Balakrishnan, C., Vijayalakshmi, E., & Vinayagasundaram, B. (2016). An enhanced iterative filtering technique for data aggregation in WSN. In 2016 International Conference on Information Communication and Embedded Systems (ICICES), Chennai (pp. 1–6). 102. Nayak, P., & Devulapalli, A. (2016). A fuzzy logic-based clustering algorithm for WSN to extend the network lifetime. In IEEE Sensors Journal, 16(1), 137–144. 103. Tolani, M., Bajpai, A., Sunny, R. K. S., Wuttisittikulkij, L., & Kovintavewat, P. (2021). Energy efficient hybrid medium access control protocol for WSN. In The 36th International Technical Conference on Circuits/Systems, Computers and Communications, June 28th(Mon)– 30th(Wed)/Grand Hyatt Jeju, Republic of Korea. 104. María Gabriela Calle Torres, energy-consumption in WSNs Using GSP, University of Pittsburgh, M.Sc. Thesis, April. 105. Chebrolu, K., Raman, B., Mishra, N., Valiveti, P., & Kumar, R. (2008). Brimon: A sensor network system for railway bridge monitoring. In Proceeding 6th International Conference on Mobile Systems, Applications, and Services, Breckenridge, CO, USA, pp. 2–14. 106. Pascale, A., Varanese, N., Maier, G., & Spagnolini, U. (2012). A WSN architecture for railway signalling. In Proceedings of 9th Italian Network Workshop, Courmayeur, Italy (pp. 1–4). 107. Grudén, M., Westman, A., Platbardis, J., Hallbjorner, P., & Rydberg, A. (2009). Reliability experiments for WSNs in train environment. in Proceedings of European Wireless Technology Conferences, (pp. 37–40). 108. Rabatel, J., Bringay, S., & Poncelet, P. (2009). SO-MAD: Sensor mining for anomaly detection in railway data. Advances in Data Mining: Applications and Theoretical Aspects, LNCS (Vol. 5633, pp. 191–205). 109. Rabatel, J., Bringay, S., & Poncelet, P. (2011). Anomaly detection in monitoring sensor data for preventive maintenance. Expert Systems With Applications, 38(6), 7003–7015. 110. Reason, J., Chen, H., Crepaldi, R., & Duri, S. (2010). Intelligent telemetry for freight trains. Mobile computing, applications, services (Vol. 35, pp. 72–91). Berlin, Germany: Springer. 111. Reason, J., & Crepaldi, R. (2009). Ambient intelligence for freight railroads. IBM Journal of Research and Development, 53(3), 1–14. 112. Tuck, K. (2010). Using the 32 Samples First In First Out (FIFO) in the MMA8450Q, Energy Scale Solutions by free scale, FreeScale Solutions, 2010. http://www.nxp.com/docs/ en/application-note/AN3920.pdf. 113. Pagano, S., Peirani, S., & Valle, M. (2015). Indoor ranging and localisation algorithm based on received signal strength indicator using statistic parameters for WSNs. In IET Wireless Sensor Systems (Vol. 5, no. 5, pp. 243–249), October 2015. 114. Tolani, M., Bajpai, A., Sharma, S., Singh, R. K., Wuttisittikulkij, L., & Kovintavewat, Energy efficient hybrid medium access control protocol for WSN. In 36th International Technical Conference on Circuits/Systems, Computers and Communications, (ITC-CSCC 21), at Jeju, South Korea, 28–30 June 2021. 115. Tolani, M., Sunny, R. K. S. (2020). Energy-Efficient adaptive GTS allocation algorithm for IEEE 802.15.4 MAC protocol. Telecommunication systems. Springer. https://doi.org/10.1007/ s11235-020-00719-0. 116. Tolani, M., Sunny, R. K. S. 
Adaptive Duty Cycle Enabled Energy-Efficient Bit-Map-Assisted MAC Protocol. Springer, SN Computer Science. https://doi.org/10.1007/s42979-020-001627. 117. Tolani, M., Sunny, R. K. S. (2020). Energy-Efficient Hybrid MAC Protocol for Railway Monitoring Sensor Network (Vol. 2, p. 1404). Springer, SN Applied Sciences (2020). https://doi. org/10.1007/s42452-020-3194-1. 118. Tolani, M., Sunny, R. K. S. (2018). Energy-efficient aggregation-aware IEEE 802.15.4 MAC protocol for railway, tele-medicine & industrial applications. In 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), Gorakhpur (pp. 1–5).


119. Khan, A. A., Jamal, M. S., & Siddiqui, S. (2017). Dynamic duty-cycle control for WSNs using artificial neural network (ANN). International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2017, 420–424. https://doi.org/10.1109/ CyberC.2017.93 120. Wahyono, I. D., Asfani, K., Mohamad, M. M., Rosyid, H., Afandi, A., & Aripriharta (2020). The new intelligent WSN using artificial intelligence for building fire disasters. In 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE) (pp. 1–6). https://doi.org/10.1109/ICVEE50212.2020.9243210. 121. Aliyu, F., Umar, S., & Al-Duwaish, H. (2019). A survey of applications of artificial neural networks in WSNs. In 2019 8th International Conference on Modeling Simulation and Applied Optimization (ICMSAO) (pp. 1–5). https://doi.org/10.1109/ICMSAO.2019.8880364. 122. Sun, L., Cai, W., & Huang, X. (2010). Data aggregation scheme using neural networks in WSNs. In 2010 2nd International Conference on Future Computer and Communication, May 2010 (Vol. 1, pp. V1-725–V1-729). 123. Elia, M. et al. (2006). Condition monitoring of the railway line and overhead equipment through onboard train measurement-an Italian experience. In Proceedings of IET International Conference on Railway Condition Monitor, Birmingham, UK (pp. 102–107). 124. Maly, T., Rumpler, M., Schweinzer, H., & Schoebel, A. (2005). New development of an overall train inspection system for increased operational safety. In Proceedings of IEEE Intelligent Transportation Systems, Vienna, Austria (pp. 188–193).

Four Wheeled Humanoid Second-Order Cascade Control of Holonomic Trajectories

A. A. Torres-Martínez, E. A. Martínez-García, R. Lavrenov, and E. Magid

Abstract This work develops a model-based second-order cascade motion controller for a holonomic humanoid-like wheeled robot. The locomotion structure comprises four radially arranged mecanum wheels. The model is given as a function of the contribution of all wheels, adding maneuverability to the upper limbs. High-order derivatives are synchronized through numeric differentiation and integration obtained online for consistent performance of the inner feedback loops. The controller deploys reference input vectors, both global and local to each cascade loop. In this approach, the controller decreases errors in position, velocity and acceleration simultaneously through Newton-based recursive numerical approximations. A main advantage of this approach is the robustness obtained by three recursive feedback cascades: distance, velocity and acceleration. Observers are modeled by combining multi-sensor inputs. The controller showed relative complexity, effectiveness, and robustness. The proposed approach demonstrated good performance, re-routing flexibility and maneuverability through numerical simulations.

Keywords Cascade-control · Holonomy · Wheeled-humanoid · Path-tracking · Control-loop · Mobile-robot

A. A. Torres-Martínez · E. A. Martínez-García (B) Institute of Engineering and Technology, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Mexico e-mail: [email protected] R. Lavrenov · E. Magid Institute of Information Technology and Intelligent Systems, Kazan Federal University, Kazan, Russian Federation e-mail: [email protected] E. Magid HSE Tikhonov Moscow Institute of Electronics and Mathematics, HSE University, Moscow, Russian Federation © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 A. T. Azar and A. Koubaa (eds.), Artificial Intelligence for Robotics and Autonomous Systems Applications, Studies in Computational Intelligence 1093, https://doi.org/10.1007/978-3-031-28715-2_15


1 Introduction

Wheeled mobile robots are widely used in a number of applications. The performance of a wheeled robot is considerably good, particularly on flat and structured floors. For instance, wheeled robots are faster, more efficient in reaching positions and usually more effective in terms of mechanical energy requirements than walking bipeds. In numerous robotic applications, particularly in cluttered environments, omnidirectional rolling locomotion makes it easy to change the robot's body posture and move in essentially any direction without explicit yaw control. Nowadays, the deployment of omnidirectional wheels as a means of locomotion in mobile robotics is in high demand in a number of applications due to their ability to move in any direction, particularly when driving in plain, structured, confined spaces [1, 2]. Unlike conventional wheels, omniwheels impose fewer kinematic constraints and allow the robot to move with a wide range of mobility, adding holonomy and considerable maneuverability. The cases of omniwheel-based holonomic robots developed for different applications are nowadays numerous and relevant. For instance, personal assistant robots used as walking-helper tools have demonstrated the capability to provide guidance and dynamic support for people with impaired walking [3]. There are also other types of human-robot assistants in which mobile robots are purposed to perform socially assistive interaction [4]. In healthcare, robotic systems have been designed with mecanum wheels to provide omnidirectional motion to wheelchairs [5]. Although manipulators onboard mobile platforms are not a new approach, mobile manipulators with omnidirectional locomotion provide interesting advantages. Mecanum-wheeled platforms have been used as holonomic vehicular manipulators moving in industrial working spaces [6]. Park et al. [7] presented a controller for velocity tracking and vibration reduction of a cart-pole inverted-pendulum-like model of an omnidirectional assistive mobile robot. The robot adopted mecanum wheel rolling with suspension to keep consistent contact between the mecanum wheels and the ground while transporting heavy goods placed on high locations. Moreover, instrumented omnidirectional platforms with visually guided servoing devices have been reported [8]. Furthermore, holonomic robotic platforms have been exploited as robotized sporting and training technology to provide assistance and training in racquet sports [9]. A traditional robotic application exploits the advantages of omniwheel-based mobile robots deployed as domestic assistants in household environments [10]. The application of mecanum wheels in omnidirectional mobile robots has been critical in industrial fields, for instance in autonomous indoor material transportation [11] as well as in robotic platforms with four omnidirectional wheels working in warehouses [12]. The work [13] performed collaborative manipulation by multiple robots displacing payloads transported to desired locations in planar obstacle-cluttered scenarios, maneuvering through narrow pathways, for which the use of mecanum-wheeled robots positioning without body-orientation change was advocated. The work [14] developed modular reconfiguration by deploying a group of vehicles to perform different mission tasks.


Reconfiguration was done at the level of motion planning, deploying four-wheel-drive mecanum mobile robots. A critical deployment of omniwheel robotics is in a field of growing popularity for omnidirectional humanoids, namely nursing and rehabilitation [15]. The demands on the use of robots differ considerably in how they are deployed, for instance in industry with robots working on-site with highly accurate robotic arms, or mobile platforms moving heavy loads and assisting human workers in close proximity [16]. This chapter presents the trajectory tracking model of a humanoid robot at the stage of kinematic modeling and simulation. This work approaches a model of two upper limbs of three joints fixed on a trunk that is placed on a mecanum-wheeled platform with four asynchronous rolling drives. This research's main contribution is the development of a three-cascade kinematic trajectory tracking controller. Each cascade comprises a different-order derivative deduced from the robot's kinematic model. Different observers complementing the control are developed under a deterministic approach based on wheel encoders and an inertial measurement unit. The omniwheels' physical arrangement is radially equidistant and tangentially rotated with respect to (w.r.t.) the center of reference. This work shows numerical simulation results that allow validating and understanding the proposed models and ideas, as well as refining them before converting them into feasible and operational physical systems. This chapter organizes the sections as follows. Section 2 briefly discusses similar works. Section 3 deduces motion equations of the robot's arms and its four-wheel four-drive omnidirectional rolling platform. Section 4 defines the sensing model and observers used as online feedback elements. Section 5 describes the three-cascade controller. Finally, Sect. 7 provides the conclusion of the work.

2 Related Work

The study of position errors and calibration methods for robot locomotion with omnidirectional wheels has been shown to be relevant [17]. The work [18] developed a reference-based control for a mecanum-wheeled omnidirectional robot platform, relying on the robot's kinematic model to generate trajectories and optimal constrained navigation. The cost function quantified differences between the robot's path prediction and a family of parameterized reference trajectories. [19] demonstrated control of a time-varying proportional-integral-derivative model for trajectory tracking of a mecanum-wheeled robot. It used linearization of a nonlinear kinematic error model, with the controller's parametric coefficients adjusted by trial and error. Omniwheel-based robot motion is affected by systematic perturbations differently than conventional wheeled robots. Identifying the sources of pose errors is critical to developing methods for kinematic error reduction in omnidirectional robotic systems [20]. The work [21] evaluated a method to correct systematic odometry errors of a humanoid-like three-wheeled omnidirectional mobile robot. Correction was made


by iteratively adjusting effective values of the robot's kinematic parameters, matching referenced positions by estimation. The correct functionality of a four-mecanum-wheel robot was approached with Dijkstra's algorithm and tested in [22]. The work [23] proposed an odometry-based calibration of kinematic parameter errors, deploying least-squares linear regression for a mobile robot with three omniwheels. Similarly, [24] presented a calibration system to reduce pose errors based on the kinematic formulation of a three-wheeled omnidirectional robot, considering compensation of systematic and non-systematic errors. Another three-omniwheel robot was reported in [27], where motion calibration is obtained by optimizing the effective kinematic parameters and the inverse Jacobian elements are minimized through a cost function during path tracking. The previously cited works reported different solutions to calibrate odometric position errors in omniwheel-based mobile robots. Such works highlight two main approaches: numerical estimation and modeling of deterministic metrical errors. A main difference with respect to the present context is the focus on tracking control of local Cartesian points within a global trajectory; besides encoders, other pose measurement methods are considered, for instance data fusion of online heterogeneous inertial measurements. The research [25] reported a radially arranged omnidirectional four-wheeled robot controlled by three proportional-integral-derivative (PID) controllers. The PIDs controlled speed, heading, and position during trajectory tracking, using odometry to measure the robot's posture. The research [26] developed a theoretical kinematic basis for accurate motion control of combined mobility configurations based on coefficients for speed compensation, mainly caused by wheel slippage, in a four-mecanum-wheel industrial robot. The work [28] reported an observer-based, high-order sliding-mode controller for a multirobot system of three-wheeled omnidirectional platforms. The previously cited works reported motion control approaches for omniwheel robotic platforms that tackled either slippage problems or motion inaccuracies to improve the robot's posture. In contrast, the present work assumes that pose observation is already adequate and focuses on robustly controlling the robot's path motion along a linear trajectory segment by simultaneous triple kinematic control of high-order derivatives. The research reported in [29] introduced a general model for analysis of symmetrical multi-wheel omnidirectional robots, implementing constrained trajectory planning optimization for collision-free navigation. The reported work in [30] introduced a kinematic model for trajectory tracking control of a radially oriented four-wheel omnidirectional robot using odometry as feedback. The work [31] presented a four-mecanum-wheel omnidirectional mobile robot for motion planning tasks, implementing fault tolerance on the wheels with a fuzzy controller. The work conducted by [32] proposed a neural control algorithm for adapting neural network weights under parametric disturbances as an intelligent control for path motion of a four-mecanum-wheel mobile robot. Fault-tolerant navigation control on a four-mecanum-wheel structure was developed by [33], using adaptive control with second-order dynamics and parametric uncertainty.
A controller for an omni-wheeled industrial manipulator was presented by [34]; it adaptively combined a fuzzy wavelet neural network, a sliding mode and a fractional-order criterion.


Finally, a path-following control using extended Kalman filtering for sensor fusion was introduced in [35]. Some of the previously cited works reported approaches using soft-computing techniques combined with traditional control methods for tracking, either to recover from disturbances or to tolerate faults in tracking motion control. In contrast, the present research proposes a model-based recursive control with the particularity of implementing inner multi-cascades combining multiple higher-order inputs. Numerical errors are reduced with respect to a reference model by successive approximations used as convergence laws. The focus presented in this research differs from most of the cited related work, fundamentally in the class of the control structure and the kind of observer models. For instance, while a traditional PID controller might combine three different-order derivatives as a summation of terms in an algebraic expression, the proposed approach exploits each derivative inside another of lower order and faster sampling as different recursive control cycles.

3 Robot Motion Model

This section describes the essential design parts of the proposed robotic structure at the level of the simulation model. Additionally, both kinematic models are illustrated: the onboard manipulators and the four-mecanum-wheel omnidirectional locomotive structure. Figure 1a depicts the humanoid CAD concept of the proposed robotic platform. Figure 1b shows a basic figure created in C/C++ as a resource for numerical simulations, which deploys the Open Dynamics Engine (ODE) libraries to create simulated animations.

Fig. 1 Mecanum four-wheeled humanoid structure. a A CAD model. b A simulation model from the physics engine ODE


Fig. 2 Onboard arms basic mechanism. Joints and links kinematic parameters (above). Side of the elbow mechanism (middle). Side of the wrist mechanism and shoulder gray-color gear (below)

The four mecanum wheels are symmetrically and radially arranged, equidistant beneath the chassis structure. Each wheel is independently driven in both rotary directions. This work places the emphasis on the omnidirectional locomotion controller, since motion over the ground plane impacts the manipulators' position; the robot's translation and orientation are given in separate models along the manuscript. Figure 2 illustrates a basic conceptual design intended to help describe the joints' functional form. The purpose of the limbs in this manuscript is to illustrate general interaction with manipulable objects in general scenarios. Therefore, the onboard arms may be modeled with multiple degrees of freedom. However, in this manuscript the manipulators have been established as symmetrically planar with three rotary joints: shoulder (θ0), elbow (θ1) and wrist (θ2), all turning in pitch (see Fig. 2). Additionally, the robot's orientation is assumed to be the arms' yaw motion (θt). The onboard arm's side view is shown in Fig. 2 (below), where the gray-color gear is the actuating device ϕ0 that rotates a shoulder. The arm's joint ϕl1 describes angular displacements for link l1. Figure 2 (middle) shows an antagonistic arm's side view where the orange-color joint mechanism for θ1 (elbow) is depicted. The mechanism device for θ1 has asynchronous motion from θ0 and θ2. Additionally,


Fig. 2 (middle) shows a yellow-color gearing system to depict how the wrist motion is obtained and transmitted from the actuating gear ϕ2 towards ϕ5. The wrist rotary angle is the joint θ2 that rotates the gripper's elevation angle. Hence, without loss of generality, the Cartesian position is a system of equations established for now in two dimensions, with $z = 0$, $\dot{z} = 0$ and $\ddot{z} = 0$; let z be the depth dimension, not treated in this section. Subsequently, a third Cartesian component may be stated when the robot's yaw is defined, as it impacts the arms' pose, given in the next sections. From the depiction of Fig. 2 (above), the following arm position expressions $x_a, y_a$ are deduced to describe the motion in the sagittal plane (pitch),

$$x_a = l_1 \cos(\theta_0) + l_2 \cos(\theta_0 + \theta_1) + l_3 \cos(\theta_0 + \theta_1 + \theta_2), \tag{1}$$

and

$$y_a = l_1 \sin(\theta_0) + l_2 \sin(\theta_0 + \theta_1) + l_3 \sin(\theta_0 + \theta_1 + \theta_2). \tag{2}$$
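As an illustration only, the following minimal Python sketch evaluates the planar forward kinematics of Eqs. (1)–(2); the link lengths and joint angles are arbitrary example values, not parameters taken from the chapter.

```python
import numpy as np

def arm_forward_kinematics(theta, lengths):
    """Planar 3-link forward kinematics, Eqs. (1)-(2).

    theta   : (theta0, theta1, theta2) joint angles [rad]
    lengths : (l1, l2, l3) link lengths [m]
    Returns the end-effector position (x_a, y_a) in the sagittal plane.
    """
    t0, t1, t2 = theta
    l1, l2, l3 = lengths
    a01 = t0 + t1              # cumulative angle theta0 + theta1
    a012 = t0 + t1 + t2        # cumulative angle theta0 + theta1 + theta2
    x_a = l1 * np.cos(t0) + l2 * np.cos(a01) + l3 * np.cos(a012)
    y_a = l1 * np.sin(t0) + l2 * np.sin(a01) + l3 * np.sin(a012)
    return x_a, y_a

# Example with arbitrary joint angles and link lengths (illustrative values)
print(arm_forward_kinematics((0.3, -0.5, 0.2), (0.25, 0.20, 0.10)))
```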

where the functional forms of the actuating joints are described in the following Definition 1.

Definition 1 (Joints functional forms) Assuming gear angles and teeth numbers ϕi and nj, respectively, let ϕ0 be the actuating joint,

$$\theta_0 = \varphi_0. \tag{3}$$

Let ϕ6 be an actuating gear that transmits rotation to gear ϕ8 ≡ θ1 for link l2,

$$\varphi_8 = \left(\frac{n_6 n_7}{n_7 n_8}\right) \cdot \varphi_6 = \frac{n_6}{n_8} \cdot \varphi_6, \tag{4}$$

therefore θ1 = ϕ8. Let ϕ1 transmit motion to ϕ5 ≡ θ2 for the rotation of l3 by

$$\varphi_5 = \left(\frac{n_1 n_2 n_3 n_4}{n_2 n_3 n_4 n_5}\right) \cdot \varphi_1 = \frac{n_1}{n_5} \cdot \varphi_1, \tag{5}$$

therefore θ2 = ϕ5. The previous statements lead to the following Proposition 1.

Proposition 1 (Arm's kinematic law) The kinematic control, including the gears' mechanical advantages n6/n8 and n1/n5, reaches the reference angular positions θ0,1,2 while varying the joint angles ϕ0,8,5 by

$$\begin{pmatrix} x_a^{t+1} - x_a^t \\ y_a^{t+1} - y_a^t \end{pmatrix} = \begin{pmatrix} -l_1 c_0 & -\frac{n_6}{n_8}(l_1 c_0 + l_2 c_{01}) & -\frac{n_1}{n_5}(l_1 c_0 + l_2 c_{01} + l_3 c_{012}) \\ l_1 s_0 & \frac{n_6}{n_8}(l_1 s_0 + l_2 s_{01}) & \frac{n_1}{n_5}(l_1 s_0 + l_2 s_{01} + l_3 s_{012}) \end{pmatrix} \begin{pmatrix} \theta_0 - \varphi_0 \\ \theta_1 - \varphi_8 \\ \theta_2 - \varphi_5 \end{pmatrix} \tag{6}$$
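Purely as a sketch of how the control law of Eq. (6) could be evaluated and iterated numerically (the matrix entries follow the reconstruction above; the gear ratios, joint values and the ideal-convergence comment are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def arm_cartesian_step(theta_ref, phi, lengths, n6_n8, n1_n5):
    """Right-hand side of Eq. (6): predicted Cartesian increment.

    theta_ref : desired joint angles (theta0, theta1, theta2)
    phi       : current actuated angles (phi0, phi8, phi5)
    """
    l1, l2, l3 = lengths
    a0, a01, a012 = phi[0], phi[0] + phi[1], phi[0] + phi[1] + phi[2]
    c0, c01, c012 = np.cos(a0), np.cos(a01), np.cos(a012)
    s0, s01, s012 = np.sin(a0), np.sin(a01), np.sin(a012)
    J = np.array([
        [-l1*c0, -n6_n8*(l1*c0 + l2*c01), -n1_n5*(l1*c0 + l2*c01 + l3*c012)],
        [ l1*s0,  n6_n8*(l1*s0 + l2*s01),  n1_n5*(l1*s0 + l2*s01 + l3*s012)],
    ])
    err = np.asarray(theta_ref) - np.asarray(phi)   # instantaneous joint error
    return J @ err

# As phi is driven towards theta_ref, the Cartesian increment tends to zero,
# i.e. the condition lim_{phi->theta} (x, y)^{t+1} - (x, y)^t = 0 below.
print(arm_cartesian_step((0.4, -0.3, 0.1), (0.35, -0.25, 0.05),
                         (0.25, 0.20, 0.10), n6_n8=0.5, n1_n5=0.8))
```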


Fig. 3 Onboard arms local Cartesian motion simulation for an arbitrary trajectory

Hence, the law (6) is satisfied when $\lim_{\varphi_i \to \theta_i} (x, y)^{t+1} - (x, y)^t = 0$, with x, y a Cartesian position and (θi − ϕi) an instantaneous joint error. Validating the previous kinematic expressions (1) and (2), Fig. 3 shows a numerical simulation of the Cartesian position along an arbitrary trajectory. Moreover, from the system of nonlinear equations modeling the position (1) and (2), and hereafter assuming that the joints θj(ϕk) are functions of the gear rotations, the first-order derivative w.r.t. time is algebraically deduced and the Cartesian velocities are described by

$$\begin{pmatrix} \dot{x}_a \\ \dot{y}_a \end{pmatrix} = l_1 \begin{pmatrix} -s_0 \\ c_0 \end{pmatrix} \dot{\theta}_0 + l_2 \begin{pmatrix} -s_{01} \\ c_{01} \end{pmatrix} \sum_{i=0}^{1} \dot{\theta}_i + l_3 \begin{pmatrix} -s_{012} \\ c_{012} \end{pmatrix} \sum_{i=0}^{2} \dot{\theta}_i. \tag{7}$$

It follows the second-order derivative, which describes the arm's Cartesian accelerations, where the serial links' Jacobian is assumed to be a non-stationary matrix $J_t \in \mathbb{R}^{2\times 3}$, such that

$$\begin{pmatrix} \ddot{x}_a \\ \ddot{y}_a \end{pmatrix} = J_t \cdot \begin{pmatrix} \ddot{\theta}_0 \\ \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{pmatrix} + \dot{J}_t \cdot \begin{pmatrix} \dot{\theta}_0 \\ \dot{\theta}_1 \\ \dot{\theta}_2 \end{pmatrix} \tag{8}$$

The ultimate purpose of this section is merely to establish kinematic models as a basis for the following sections. However, the essential focus of this research is the locomotive control of the mobile holonomic structure.
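To make Eq. (7) concrete, the following hedged Python sketch builds the 2×3 arm Jacobian implied by that expression and maps joint rates to Cartesian velocity; the numerical values are placeholders chosen only for illustration.

```python
import numpy as np

def arm_jacobian(theta, lengths):
    """2x3 Jacobian of the planar 3-link arm, consistent with Eq. (7)."""
    l1, l2, l3 = lengths
    a0 = theta[0]
    a01 = theta[0] + theta[1]
    a012 = theta[0] + theta[1] + theta[2]
    # Column j = partial derivative of (x_a, y_a) w.r.t. theta_j
    return np.array([
        [-l1*np.sin(a0) - l2*np.sin(a01) - l3*np.sin(a012),
         -l2*np.sin(a01) - l3*np.sin(a012),
         -l3*np.sin(a012)],
        [ l1*np.cos(a0) + l2*np.cos(a01) + l3*np.cos(a012),
          l2*np.cos(a01) + l3*np.cos(a012),
          l3*np.cos(a012)],
    ])

theta = np.array([0.3, -0.5, 0.2])         # example joint angles [rad]
theta_dot = np.array([0.1, 0.05, -0.02])   # example joint rates [rad/s]
xy_dot = arm_jacobian(theta, (0.25, 0.20, 0.10)) @ theta_dot  # Cartesian velocity, Eq. (7)
print(xy_dot)
```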


Fig. 4 4W4D holonomic kinematics. Mecanum wheels location without twist (left). Wheels positions twisted ±π/2 w.r.t. the center (right)

At this point, the two-dimensional manipulators can therefore exploit the native holonomic mobility, such as position and rotation, to provide three-dimensional spatial manipulator trajectories. Additionally, omnidirectional mobility as a complement to the arms allows a reduction of the arms' degrees-of-freedom complexity. Let us establish the mobility kinematic constraints depicted in Fig. 4. Therefore, without loss of generality, let us state the following Proposition 2.

Proposition 2 (Holonomic motion model) Let $\mathbf{u}_t$ be the robot state vector in its Cartesian form with components (x, y), such that $\mathbf{u}_t \in \mathbb{R}^2$, $\mathbf{u} = (x, y)^\top$. Hence, the forward kinematics is

$$\dot{\mathbf{u}} = r\, \mathbf{K} \cdot \dot{\boldsymbol{\Phi}}, \tag{9}$$

where K is a stationary kinematic control matrix containing geometrical parameters and r is the wheels' radius. Let $\boldsymbol{\Phi} = (\phi_1, \phi_2, \phi_3, \phi_4)^\top$ be the four-wheel angular position vector, so that $\dot{\boldsymbol{\Phi}}$ gathers the wheels' angular velocities. Likewise, the backward kinematics, where the constraints matrix is a non-square system, is

$$\dot{\boldsymbol{\Phi}} = \frac{1}{r} \cdot \mathbf{K}^{+} \cdot \dot{\mathbf{u}} = \frac{1}{r} \cdot \mathbf{K}^\top (\mathbf{K} \cdot \mathbf{K}^\top)^{-1} \cdot \dot{\mathbf{u}}. \tag{10}$$

Therefore, according to the geometry of Fig. 4 and the general models of the previous Proposition 2, the following algebraic deduction arises: the Cartesian speeds $\dot{x}$ and $\dot{y}$ in holonomic motion are obtained from the wheels' tangential velocities $V_k$, expressed as

$$\dot{x} = V_1 \cos\!\left(\alpha_1 - \tfrac{\pi}{2}\right) + V_2 \cos\!\left(\alpha_2 - \tfrac{\pi}{2}\right) + V_3 \cos\!\left(\alpha_3 - \tfrac{\pi}{2}\right) + V_4 \cos\!\left(\alpha_4 - \tfrac{\pi}{2}\right), \tag{11}$$

as well as

$$\dot{y} = V_1 \sin\!\left(\alpha_1 - \tfrac{\pi}{2}\right) + V_2 \sin\!\left(\alpha_2 - \tfrac{\pi}{2}\right) + V_3 \sin\!\left(\alpha_3 - \tfrac{\pi}{2}\right) + V_4 \sin\!\left(\alpha_4 - \tfrac{\pi}{2}\right). \tag{12}$$

Moreover, let $V_k$ be the wheels' tangential velocities described in terms of the angular speeds $\dot{\phi}_k$, such that the following equality is stated,

$$V_k = r \cdot \dot{\phi}_k. \tag{13}$$

From this, the stationary non-square kinematic control matrix K is provided by Definition 2.

Definition 2 (4W4D holonomic kinematic matrix) Each wheel has an angle αk w.r.t. the robot's geometric center, thus

$$\mathbf{K} = \begin{pmatrix} \cos(\alpha_1 - \frac{\pi}{2}) & \cos(\alpha_2 - \frac{\pi}{2}) & \cos(\alpha_3 - \frac{\pi}{2}) & \cos(\alpha_4 - \frac{\pi}{2}) \\ \sin(\alpha_1 - \frac{\pi}{2}) & \sin(\alpha_2 - \frac{\pi}{2}) & \sin(\alpha_3 - \frac{\pi}{2}) & \sin(\alpha_4 - \frac{\pi}{2}) \end{pmatrix}. \tag{14}$$

It follows that the holonomic speed model, as a function of the wheels' angular speeds and the matrix K, is

$$\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = r \cdot \mathbf{K} \cdot \begin{pmatrix} \dot{\phi}_1 \\ \dot{\phi}_2 \\ \dot{\phi}_3 \\ \dot{\phi}_4 \end{pmatrix}. \tag{15}$$

Similarly, from the previous model, higher-order derivatives are deduced for subsequent treatment for the sake of building the controller cascades. Thus, the second-order kinematic model is

$$\begin{pmatrix} \ddot{x} \\ \ddot{y} \end{pmatrix} = r \cdot \begin{pmatrix} \cos(\alpha_1 - \frac{\pi}{2}) & \cos(\alpha_2 - \frac{\pi}{2}) & \cos(\alpha_3 - \frac{\pi}{2}) & \cos(\alpha_4 - \frac{\pi}{2}) \\ \sin(\alpha_1 - \frac{\pi}{2}) & \sin(\alpha_2 - \frac{\pi}{2}) & \sin(\alpha_3 - \frac{\pi}{2}) & \sin(\alpha_4 - \frac{\pi}{2}) \end{pmatrix} \cdot \begin{pmatrix} \ddot{\phi}_1 \\ \ddot{\phi}_2 \\ \ddot{\phi}_3 \\ \ddot{\phi}_4 \end{pmatrix}. \tag{16}$$

Likewise, a third-order derivative is provided by the model

Fig. 5 General higher order derivatives for the 4W4D holonomic model. a First-order derivative performance (velocity, above). b Second-order derivative performance (acceleration, below)

$$\begin{pmatrix} \dddot{x} \\ \dddot{y} \end{pmatrix} = r \cdot \begin{pmatrix} \cos(\alpha_1 - \frac{\pi}{2}) & \cos(\alpha_2 - \frac{\pi}{2}) & \cos(\alpha_3 - \frac{\pi}{2}) & \cos(\alpha_4 - \frac{\pi}{2}) \\ \sin(\alpha_1 - \frac{\pi}{2}) & \sin(\alpha_2 - \frac{\pi}{2}) & \sin(\alpha_3 - \frac{\pi}{2}) & \sin(\alpha_4 - \frac{\pi}{2}) \end{pmatrix} \cdot \begin{pmatrix} \dddot{\phi}_1 \\ \dddot{\phi}_2 \\ \dddot{\phi}_3 \\ \dddot{\phi}_4 \end{pmatrix}. \tag{17}$$

The fact that the matrix K is stationary keeps the linear derivative expressions simple. For this type of four-wheel holonomic platform, the kinematic models produce the behavior curves shown in Fig. 5.
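As an illustrative sketch only (wheel angles and radius are assumed example values), the following Python code builds the stationary matrix K of Eq. (14) and applies the forward map of Eq. (15) and the pseudoinverse backward map of Eq. (10):

```python
import numpy as np

def kinematic_matrix(alphas):
    """Stationary 2x4 matrix K of Eq. (14) for wheel angles alpha_k [rad]."""
    a = np.asarray(alphas) - np.pi / 2.0
    return np.vstack([np.cos(a), np.sin(a)])

r = 0.05                                          # assumed wheel radius [m]
alphas = np.deg2rad([45.0, 135.0, 225.0, 315.0])  # assumed radial wheel placement
K = kinematic_matrix(alphas)

phi_dot = np.array([2.0, -2.0, 2.0, -2.0])        # example wheel speeds [rad/s]
u_dot = r * K @ phi_dot                           # forward kinematics, Eq. (15)

# Backward kinematics via the Moore-Penrose pseudoinverse, Eq. (10)
K_pinv = K.T @ np.linalg.inv(K @ K.T)
phi_dot_back = (1.0 / r) * K_pinv @ u_dot
print(u_dot, phi_dot_back)
```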

4 Observer Models

This section establishes the main sensing models that are assumed deterministic and used in the cascade controller as feedback elements for observing the robot's model state. It is worth saying that perturbation models, noisy sensor measurements and calibration methods are out of the scope of this manuscript. Thus, let us assume a pulse shaft encoder fixed to each wheel. Hence, let $\hat{\phi}_{\varepsilon k}$ be a measurement of the angular position of the k-th wheel,

$$\hat{\phi}_{\varepsilon k_t}(\eta) = \frac{2\pi \eta_t}{R}, \tag{18}$$

where $\eta_t$ is the instantaneous number of pulses detected while the wheel is rotating. Let R be defined as the encoder angular resolution. Furthermore, the encoder-based angular velocity observation is given by the backward high-precision first-order derivative,

$$\hat{\dot{\phi}}_{\varepsilon}(\eta, t) = \frac{3\hat{\phi}_t - 4\hat{\phi}_{t-1} + \hat{\phi}_{t-2}}{(t_k - t_{k-1})(t_{k-1} - t_{k-2})}, \tag{19}$$

with three previous measurements of the angle $\hat{\theta}_\varepsilon$ and time $t_k$. Hence, the k-th wheel's tangential velocity is obtained by

$$\upsilon_k = \frac{\pi r\,(3\eta_t - 4\eta_{t-1} + \eta_{t-2})}{R\,\Delta t}, \tag{20}$$

where r is the wheel's radius and, considering varying time loops, let $\Delta t = (t_k - t_{k-1})(t_{k-1} - t_{k-2})$. Without loss of generality, let us substitute the previous statements in Proposition 3 to describe the Cartesian speed observation, such that

Proposition 3 (Encoder-based velocity observer) For simplicity, let us define the constants $\beta_k = \alpha_k - \frac{\pi}{2}$ as constant angles for the wheels' orientation. Therefore, the encoder-based velocity observers $\hat{\dot{x}}, \hat{\dot{y}}$ are modeled by

$$\hat{\dot{x}}_k = \sum_{k=1}^{4} \upsilon_k \sin(\beta_k) \tag{21}$$

and

$$\hat{\dot{y}}_k = \sum_{k=1}^{4} \upsilon_k \cos(\beta_k). \tag{22}$$

Moreover, the four wheels' tangential speeds contribute to exerting yaw motion w.r.t. the center of the robot. Thus, since the wheels' linear displacements can be inferred from the encoders, an encoder-based yaw observation $\hat{\theta}_\varepsilon$ is possible,

$$\hat{\theta}_\varepsilon = \frac{\pi r}{2L} \sum_{k=1}^{4} \eta_k, \tag{23}$$

where L is the distance between any wheel and the robot's center of rotation. Thus, the robot's angular velocity observer based only on the encoder measurements is

$$\hat{\dot{\theta}}_\varepsilon = \frac{\pi r}{4 R L \Delta t} \sum_{k=1}^{4} \left( 3\eta_{k_t} - 4\eta_{k_{t-1}} + \eta_{k_{t-2}} \right). \tag{24}$$
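A hedged Python sketch of these encoder-based observers follows; the encoder resolution, geometry and pulse counts are invented example values, and the backward-difference form mirrors Eqs. (20)–(23).

```python
import numpy as np

R_ENC = 1024          # assumed encoder resolution [pulses/rev]
r_wheel = 0.05        # assumed wheel radius [m]
L = 0.20              # assumed wheel-to-center distance [m]

def tangential_velocity(eta, dt_prod):
    """Eq. (20): backward-difference tangential speed of one wheel.

    eta     : last three pulse counts (eta_t, eta_{t-1}, eta_{t-2})
    dt_prod : Delta t = (t_k - t_{k-1}) * (t_{k-1} - t_{k-2})
    """
    return np.pi * r_wheel * (3*eta[0] - 4*eta[1] + eta[2]) / (R_ENC * dt_prod)

def cartesian_velocity_observer(etas, dt_prod, betas):
    """Eqs. (21)-(22): encoder-based Cartesian velocity observer."""
    v = np.array([tangential_velocity(e, dt_prod) for e in etas])
    return np.sum(v * np.sin(betas)), np.sum(v * np.cos(betas))

def yaw_observer(etas):
    """Eq. (23): encoder-based yaw estimate from accumulated pulses."""
    return np.pi * r_wheel / (2 * L) * sum(e[0] for e in etas)

# Example pulse histories (eta_t, eta_{t-1}, eta_{t-2}) for the four wheels
etas = [(120, 110, 101), (118, 109, 100), (121, 111, 102), (119, 110, 101)]
print(cartesian_velocity_observer(etas, 1e-4, np.deg2rad([-45, 45, 135, 225])))
print(yaw_observer(etas))
```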

In addition, in order to decrease the time-based derivative order of the gyroscope's raw measurements $\dot{\theta}_g$, let us integrate the sequence of raw measurements according to the Newton-Cotes integration approach of Definition 3.

Definition 3 (Online sensor data integration) The robot's yaw observation is inferred by time integration of the raw angular velocity measurements, such that

$$\hat{\theta}_g = \int_{t_0}^{t_N} \hat{\dot{\theta}}_g\, dt = \frac{t_N - t_0}{2N} \left( \hat{\dot{\theta}}_{g_0} + 2\sum_{k=1}^{N-1} \hat{\dot{\theta}}_{g_k} + \hat{\dot{\theta}}_{g_N} \right), \tag{25}$$

with N available measurements in the time segment $t_N - t_0$. Furthermore, for the robot's yaw motion let us assume a fusion of encoder and inertial measurements of the angle $\theta_\iota$ (inclinometer [rad]) and angular velocity $\omega_g$ (gyroscope [rad/s]), such that a complete robot yaw-rate observer is provided by Proposition 4.

Proposition 4 (Yaw deterministic observers) The robot's yaw is modeled as an average of three sensing models, encoder $\hat{\theta}_\varepsilon$, inclinometer $\hat{\theta}_\iota$ and gyroscope $\hat{\theta}_g$, such that

$$\hat{\theta}_t = \frac{1}{3}\hat{\theta}_\varepsilon + \frac{1}{3}\hat{\theta}_\iota + \frac{1}{3}\int_{t_1}^{t_2} \hat{\dot{\theta}}_g\, dt, \tag{26}$$


where $\hat{\theta}_\varepsilon$ is substituted by (23). It follows that its first-order derivative is

$$\hat{\omega}_t = \frac{1}{3}\hat{\dot{\theta}}_\varepsilon + \frac{1}{3}\frac{d\hat{\theta}_\iota}{dt} + \frac{1}{3}\hat{\dot{\theta}}_g, \tag{27}$$

where $\hat{\dot{\theta}}_\varepsilon$ is substituted by (24).
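The sketch below illustrates, under stated assumptions, the trapezoidal (Newton-Cotes) integration of Eq. (25) and the equally weighted fusion of Eqs. (26)–(27); the sample data are placeholders, not measurements from the chapter.

```python
import numpy as np

def integrate_gyro(gyro_rates, t0, tN):
    """Eq. (25): trapezoidal Newton-Cotes integration of the gyro yaw rate."""
    w = np.asarray(gyro_rates)
    N = len(w) - 1
    return (tN - t0) / (2 * N) * (w[0] + 2 * np.sum(w[1:-1]) + w[-1])

def fused_yaw(theta_enc, theta_incl, gyro_rates, t0, tN):
    """Eq. (26): equally weighted average of encoder, inclinometer and gyro yaw."""
    return (theta_enc + theta_incl + integrate_gyro(gyro_rates, t0, tN)) / 3.0

def fused_yaw_rate(theta_dot_enc, theta_incl_deriv, gyro_rate):
    """Eq. (27): equally weighted average of the three yaw-rate observations."""
    return (theta_dot_enc + theta_incl_deriv + gyro_rate) / 3.0

# Example with fabricated measurements (illustration only)
rates = [0.10, 0.12, 0.11, 0.13, 0.12]   # gyro yaw-rate samples [rad/s]
print(fused_yaw(0.05, 0.04, rates, t0=0.0, tN=0.4))
print(fused_yaw_rate(0.11, 0.12, rates[-1]))
```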

5 Omnidirectional Cascade Controller

The central topic of this manuscript is a second-order cascade controller. It implies three nested control cycles. The highest frequency is the acceleration loop within the velocity cycle, and both in turn run inside the position loop, the latter being the slowest. The global position cycle uses global references, while the inner cycles work on predictions as local references. The proposed cascade controller considers three feedback loops simultaneously and reduces the errors by recursive calculations and successive approximations with different sampling frequencies. Stating the equation provided in Proposition 2 in the form of a differential equation,

$$\frac{d\mathbf{u}}{dt} = r \cdot \mathbf{K} \cdot \frac{d\boldsymbol{\Phi}}{dt}, \tag{28}$$

and solving it according to the following expression, where both differentials dt are reduced and the limits are established on both sides of the equation by integrating w.r.t. time,

$$\int_{\mathbf{u}_1}^{\mathbf{u}_2} d\mathbf{u}\, dt = r \cdot \mathbf{K} \int_{\boldsymbol{\Phi}_1}^{\boldsymbol{\Phi}_2} d\boldsymbol{\Phi}\, dt, \tag{29}$$

results in the following equality:

$$\mathbf{u}_2 - \mathbf{u}_1 = r \cdot \mathbf{K} \cdot (\boldsymbol{\Phi}_2 - \boldsymbol{\Phi}_1). \tag{30}$$

Therefore, considering the Moore-Penrose approach to obtain the pseudoinverse of the non-square stationary kinematic matrix K, and by solving and algebraically rearranging in continuous-time notation for an expression in terms of a recursive backward solution,

$$\boldsymbol{\Phi}_{t+1} = \boldsymbol{\Phi}_t + \frac{1}{r} \cdot \mathbf{K}^\top (\mathbf{K} \cdot \mathbf{K}^\top)^{-1} \cdot (\mathbf{u}_{ref} - \hat{\mathbf{u}}_t), \tag{31}$$

where the prediction vector $\mathbf{u}_{t+1}$ is re-formulated as the global reference $\mathbf{u}_{ref}$, i.e. the goal the robot is desired to reach. Likewise, the forward kinematic solution for $\mathbf{u}_{t+1}$ is

$$\mathbf{u}_{t+1} = \mathbf{u}_t + r \cdot \mathbf{K} \cdot (\boldsymbol{\Phi}_{t+1} - \hat{\boldsymbol{\Phi}}_t). \tag{32}$$


Fig. 6 A global cascade recursive simulation for Cartesian position

Therefore, the first global controller cascade is formed by the pair of recursive expressions (31) and (32). Proposition 5 summarizes this global cascade.

Proposition 5 (Feedback position cascade) Given the inverse kinematic motion with observation $\hat{\mathbf{u}}_t$ in the workspace,

$$\boldsymbol{\Phi}_{t+1} = \boldsymbol{\Phi}_t + \frac{1}{r}\cdot\mathbf{K}^T(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\mathbf{u}_{ref} - \hat{\mathbf{u}}_t), \qquad (33)$$

and the direct kinematic motion with observation $\hat{\boldsymbol{\Phi}}_t$ in the control variables space,

$$\mathbf{u}_{t+1} = \mathbf{u}_t + r\cdot\mathbf{K}\cdot(\boldsymbol{\Phi}_{t+1} - \hat{\boldsymbol{\Phi}}_t). \qquad (34)$$
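A minimal sketch of one recursion of Proposition 5 is shown below. The 3x4 mixing matrix K, the wheel radius and the goal are illustrative assumptions (the chapter's actual kinematic matrix is derived in its earlier sections), and the observations are simply taken equal to the current states.

```python
import numpy as np

def position_cascade_step(Phi_t, u_t, u_hat_t, Phi_hat_t, u_ref, K, r):
    """One recursion of the outer position cascade (Proposition 5).
    Eq. (33): Phi_{t+1} = Phi_t + (1/r) K^T (K K^T)^{-1} (u_ref - u_hat_t)
    Eq. (34): u_{t+1}   = u_t   + r K (Phi_{t+1} - Phi_hat_t)
    """
    K_pinv = K.T @ np.linalg.inv(K @ K.T)   # Moore-Penrose right pseudoinverse of full-row-rank K
    Phi_next = Phi_t + (1.0 / r) * K_pinv @ (u_ref - u_hat_t)
    u_next = u_t + r * K @ (Phi_next - Phi_hat_t)
    return Phi_next, u_next

# Hypothetical example: 3 workspace variables (x, y, yaw) driven by 4 wheel variables.
r = 0.05
K = 0.25 * np.array([[ 1.0, 1.0, 1.0, 1.0],
                     [-1.0, 1.0, 1.0,-1.0],
                     [-1.0, 1.0,-1.0, 1.0]])   # illustrative mecanum-like matrix, not the chapter's K
Phi, u = np.zeros(4), np.zeros(3)
u_ref = np.array([1.0, 0.5, 0.0])
Phi, u = position_cascade_step(Phi, u, u, Phi, u_ref, K, r)  # observations = current states here
print(Phi, u)
```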

Proposition 5 is validated through the numerical simulations shown in Fig. 6, where the automatic Cartesian segments and the decreasing feedback position errors can be observed. Without loss of generality, and following the same approach as Proposition 5, let us build a second controller cascade to control velocity. Thus, the second-order derivative kinematic expression given in (16) is stated as a differential equation,

$$\frac{d\dot{\mathbf{u}}}{dt} = r\cdot\mathbf{K}\cdot\frac{d\dot{\boldsymbol{\Phi}}}{dt}, \qquad (35)$$

and solving the definite integrals,


Fig. 7 Numerical simulation for second cascade inner recursive in terms of velocities



$$\int_{\dot{\mathbf{u}}_1}^{\dot{\mathbf{u}}_2} \frac{d\dot{\mathbf{u}}}{dt}\,dt = r\cdot\mathbf{K}\int_{\dot{\boldsymbol{\Phi}}_1}^{\dot{\boldsymbol{\Phi}}_2} \frac{d\dot{\boldsymbol{\Phi}}}{dt}\,dt, \qquad (36)$$

and similarly obtaining the following first-order equality,

$$\dot{\mathbf{u}}_2 - \dot{\mathbf{u}}_1 = r\cdot\mathbf{K}\cdot(\dot{\boldsymbol{\Phi}}_2 - \dot{\boldsymbol{\Phi}}_1). \qquad (37)$$

It follows Proposition 6, which establishes the second cascade controlling the first-order derivatives.

Proposition 6 (Feedback velocity cascade) The backward kinematic recursive function with in-loop velocity observers and the prediction $\dot{\mathbf{u}}_{t+1}$ used as local reference $\dot{\mathbf{u}}_{ref}$ is given by

$$\dot{\boldsymbol{\Phi}}_{t+1} = \dot{\boldsymbol{\Phi}}_t + \frac{1}{r}\cdot\mathbf{K}^T\cdot(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\dot{\mathbf{u}}_{ref} - \hat{\dot{\mathbf{u}}}_t), \qquad (38)$$

and likewise the forward speed kinematic model,

$$\dot{\mathbf{u}}_{t+1} = \dot{\mathbf{u}}_t + r\cdot\mathbf{K}\cdot(\dot{\boldsymbol{\Phi}}_{t+1} - \hat{\dot{\boldsymbol{\Phi}}}_t). \qquad (39)$$

Proposition 6 is validated by the simulation of Fig. 7. Similarly, the third-order model from Eq. (17),


$$\dddot{\mathbf{u}} = \mathbf{K}\cdot\dddot{\boldsymbol{\Phi}}, \qquad (40)$$

is stated as a differential equation,

$$\frac{d\ddot{\mathbf{u}}}{dt} = r\cdot\mathbf{K}\cdot\frac{d\ddot{\boldsymbol{\Phi}}}{dt}, \qquad (41)$$

which is solved by definite integration on both sides of the equality,

$$\int_{\ddot{\mathbf{u}}_1}^{\ddot{\mathbf{u}}_2} \frac{d\ddot{\mathbf{u}}}{dt}\,dt = r\cdot\mathbf{K}\int_{\ddot{\boldsymbol{\Phi}}_1}^{\ddot{\boldsymbol{\Phi}}_2} \frac{d\ddot{\boldsymbol{\Phi}}}{dt}\,dt, \qquad (42)$$

thus obtaining a consistent second-order derivative (acceleration) equality,

$$\ddot{\mathbf{u}}_2 - \ddot{\mathbf{u}}_1 = r\cdot\mathbf{K}\cdot(\ddot{\boldsymbol{\Phi}}_2 - \ddot{\boldsymbol{\Phi}}_1). \qquad (43)$$

Therefore, the following Proposition 7 is provided, with the notation rearranged for a third recursive inner control loop in terms of accelerations.

Proposition 7 (Feedback acceleration cascade) The backward kinematic recursive function with in-loop acceleration observers and the prediction $\ddot{\mathbf{u}}_{t+1}$ used as local reference $\ddot{\mathbf{u}}_{ref}$ is given by

$$\ddot{\boldsymbol{\Phi}}_{t+1} = \ddot{\boldsymbol{\Phi}}_t + \frac{1}{r}\cdot\mathbf{K}^T\cdot(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\ddot{\mathbf{u}}_{ref} - \hat{\ddot{\mathbf{u}}}_t). \qquad (44)$$

Additionally, the forward acceleration kinematic model is

$$\ddot{\mathbf{u}}_{t+1} = \ddot{\mathbf{u}}_t + r\cdot\mathbf{K}\cdot(\ddot{\boldsymbol{\Phi}}_{t+1} - \hat{\ddot{\boldsymbol{\Phi}}}_t). \qquad (45)$$

Proposition 7 is validated through the numerical simulation of Fig. 8, showing an arbitrary performance. At this point it is worth highlighting a general convergence criterion for the recursive control loops. The cycles end by satisfying a feedback-error numerical precision $\varepsilon_{u,\Phi}$, as stated in Definition 4.

Definition 4 (Convergence criterion) The feedback error numerically meets a local general reference according to the limit

$$\lim_{\Delta\boldsymbol{\Phi}\to 0}\left(\boldsymbol{\Phi}_{ref} - \hat{\boldsymbol{\Phi}}\right) = \mathbf{0},$$

where the feedback error $\mathbf{e}_\Phi = \boldsymbol{\Phi}_{ref} - \hat{\boldsymbol{\Phi}}$ numerically approaches zero. Thus, the criterion $\mathbf{e}_\Phi < \varepsilon_\Phi$ is stated in normalized form as


Fig. 8 Numerical simulation for third cascade inner recursive in terms of accelerations

Fig. 9 Second-order cascade controller block diagram

$$\left|\frac{\boldsymbol{\Phi}_{ref} - \hat{\boldsymbol{\Phi}}_t}{\boldsymbol{\Phi}_{ref}}\right| < \varepsilon. \qquad (46)$$
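A one-line check of this criterion could be sketched as below; the use of a Euclidean norm and the tolerance value are assumptions, since the chapter only states the normalized inequality.

```python
import numpy as np

def converged(ref, estimate, eps):
    """Normalized convergence test of Definition 4 / Eq. (46):
    ||ref - estimate|| / ||ref|| < eps, applicable to any controlled variable."""
    ref, estimate = np.asarray(ref, float), np.asarray(estimate, float)
    return np.linalg.norm(ref - estimate) / np.linalg.norm(ref) < eps

print(converged([1.0, 0.5, 0.0], [0.999, 0.5004, 0.0], eps=1e-2))  # True for this tolerance
```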

Although Definition 4 was described as a criterion for $\boldsymbol{\Phi}$, it has general applicability and can condition any other control variable in the process. Therefore, given Propositions 5, 6 and 7, which establish the recursive control loops for each derivative order involved, Fig. 9 shows the coupling of the cascades forming the controller. Fig. 9 is summarized in Table 1, which depicts only the coupling order of the solutions provided in Propositions 5–7. Essentially, the coupling element between steps 1 and 2 of Table 1 is a first-order time derivative producing the wheels' angular velocities. Likewise, the following list briefly describes the observers or sensing models interconnecting every step in the controller cascades.


Table 1 Cascade-coupled recursive controllers

Step 1: $\boldsymbol{\Phi}_{t+1} = \boldsymbol{\Phi}_t + \frac{1}{r}\,\mathbf{K}^T(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\mathbf{u}_{ref} - \hat{\mathbf{u}}_t)$

Step 2: $\dot{\mathbf{u}}_{t+1} = \dot{\mathbf{u}}_t + r\cdot\mathbf{K}\cdot(\dot{\boldsymbol{\Phi}}_{t+1} - \hat{\dot{\boldsymbol{\Phi}}}_t)$

Step 3: $\ddot{\boldsymbol{\Phi}}_{t+1} = \ddot{\boldsymbol{\Phi}}_t + \frac{1}{r}\,\mathbf{K}^T(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\ddot{\mathbf{u}}_{t+1} - \hat{\ddot{\mathbf{u}}}_t)$

Step 4: $\ddot{\mathbf{u}}_{t+1} = \ddot{\mathbf{u}}_t + r\cdot\mathbf{K}\cdot(\ddot{\boldsymbol{\Phi}}_{t+1} - \hat{\ddot{\boldsymbol{\Phi}}}_t)$

Step 5: $\dot{\boldsymbol{\Phi}}_{t+1} = \dot{\boldsymbol{\Phi}}_t + \frac{1}{r}\,\mathbf{K}^T(\mathbf{K}\cdot\mathbf{K}^T)^{-1}\cdot(\dot{\mathbf{u}}_{t+1} - \hat{\dot{\mathbf{u}}}_t)$

Step 6: $\mathbf{u}_{t+1} = \mathbf{u}_t + r\cdot\mathbf{K}\cdot(\boldsymbol{\Phi}_{t+1} - \hat{\boldsymbol{\Phi}}_t)$

Basically, these couplings are variational operations with fundamentals in numerical differentiation and integration, transforming sensor data into consistent physical units (a small helper sketch follows the list).

1. Feedback from step 1 to step 2:

$$\dot{\boldsymbol{\Phi}}_{t+1} = \frac{d}{dt}\boldsymbol{\Phi}_{t+1} = \frac{3\hat{\boldsymbol{\Phi}}_t - 4\hat{\boldsymbol{\Phi}}_{t-1} + \hat{\boldsymbol{\Phi}}_{t-2}}{\Delta t},$$

2. Feedback from step 2 to step 3:

$$\ddot{\mathbf{u}}_{t+1} = \frac{d}{dt}\dot{\mathbf{u}}_{t+1} = \frac{3\hat{\dot{\mathbf{u}}}_t - 4\hat{\dot{\mathbf{u}}}_{t-1} + \hat{\dot{\mathbf{u}}}_{t-2}}{\Delta t},$$

3. Feedback from step 4 to step 5:

$$\dot{\mathbf{u}}_{t+1} = \int_{t_0}^{t}\ddot{\mathbf{u}}_{t+1}\,dt = \frac{t_n - t_0}{2n}\left(\hat{\ddot{\mathbf{u}}}_0 + 2\sum_{i=1}^{n-1}\hat{\ddot{\mathbf{u}}}_i + \hat{\ddot{\mathbf{u}}}_n\right),$$

4. Feedback from step 5 to step 6:

$$\boldsymbol{\Phi}_{t+1} = \int_{t_0}^{t}\dot{\boldsymbol{\Phi}}_{t+1}\,dt = \frac{t_n - t_0}{2n}\left(\hat{\dot{\boldsymbol{\Phi}}}_0 + 2\sum_{i=1}^{n-1}\hat{\dot{\boldsymbol{\Phi}}}_i + \hat{\dot{\boldsymbol{\Phi}}}_n\right).$$
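A small Python helper sketch for these couplings is given below; it mirrors the formulas as written (including the $\Delta t$ denominator of the backward difference), with hypothetical sample values, and stacks vector samples row-wise for the Newton-Cotes rule.

```python
import numpy as np

def backward_diff(x_t, x_tm1, x_tm2, dt):
    """Couplings 1->2 and 2->3: three-point backward difference
    (3*x_t - 4*x_{t-1} + x_{t-2}) / dt, with the denominator written as in the chapter."""
    return (3.0 * np.asarray(x_t) - 4.0 * np.asarray(x_tm1) + np.asarray(x_tm2)) / dt

def trapezoid(samples, t0, tn):
    """Couplings 4->5 and 5->6: composite Newton-Cotes (trapezoidal) rule over n+1
    uniformly spaced vector samples stacked as rows of `samples`."""
    x = np.asarray(samples, dtype=float)
    n = x.shape[0] - 1
    return (tn - t0) / (2.0 * n) * (x[0] + 2.0 * x[1:-1].sum(axis=0) + x[-1])

# Hypothetical wheel-angle observations at t, t-1, t-2 and a short window of velocity samples.
dt = 0.01
print(backward_diff([0.30, -0.28, 0.31, -0.29],
                    [0.27, -0.25, 0.28, -0.26],
                    [0.24, -0.22, 0.25, -0.23], dt))
print(trapezoid([[0.10, 0.20], [0.12, 0.21], [0.11, 0.19]], t0=0.0, tn=0.02))
```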

Additionally, Algorithm 1 presents the controller in pseudocode notation.


Data: $r$, $\mathbf{K}$, $\Delta t$, $\mathbf{u}_{ref}$, and the current states and observations $\mathbf{u}_t$, $\hat{\mathbf{u}}_t$, $\dot{\mathbf{u}}_t$, $\hat{\dot{\mathbf{u}}}_t$, $\ddot{\mathbf{u}}_t$, $\hat{\ddot{\mathbf{u}}}_t$, $\boldsymbol{\Phi}_t$, $\hat{\boldsymbol{\Phi}}_t$, $\dot{\boldsymbol{\Phi}}_t$, $\hat{\dot{\boldsymbol{\Phi}}}_t$, $\ddot{\boldsymbol{\Phi}}_t$, $\hat{\ddot{\boldsymbol{\Phi}}}_t$

$\mathbf{u}_{ref} = (x_i, y_i)^T$;
while $\lVert\mathbf{u}_{ref} - \mathbf{u}_t\rVert > \varepsilon_u$ do
  $\boldsymbol{\Phi}_{t+1} = \boldsymbol{\Phi}_t + \frac{1}{r}\mathbf{K}^T(\mathbf{K}\mathbf{K}^T)^{-1}(\mathbf{u}_{ref} - \hat{\mathbf{u}}_t)$;
  $\dot{\boldsymbol{\Phi}}_{t+1} = \frac{d}{dt}\boldsymbol{\Phi}_{t+1} = \frac{3\hat{\boldsymbol{\Phi}}_t - 4\hat{\boldsymbol{\Phi}}_{t-1} + \hat{\boldsymbol{\Phi}}_{t-2}}{\Delta t}$;
  while $\lVert\dot{\boldsymbol{\Phi}}_{t+1} - \hat{\dot{\boldsymbol{\Phi}}}_t\rVert > \varepsilon_{\dot{\Phi}}$ do
    $\dot{\mathbf{u}}_{t+1} = \dot{\mathbf{u}}_t + r\mathbf{K}(\dot{\boldsymbol{\Phi}}_{t+1} - \hat{\dot{\boldsymbol{\Phi}}}_t)$;
    $\ddot{\mathbf{u}}_{t+1} = \frac{d}{dt}\dot{\mathbf{u}}_{t+1} = \frac{3\hat{\dot{\mathbf{u}}}_t - 4\hat{\dot{\mathbf{u}}}_{t-1} + \hat{\dot{\mathbf{u}}}_{t-2}}{\Delta t}$;
    while $\lVert\ddot{\mathbf{u}}_{t+1} - \hat{\ddot{\mathbf{u}}}_t\rVert > \varepsilon_{\ddot{u}}$ do
      $\ddot{\boldsymbol{\Phi}}_{t+1} = \ddot{\boldsymbol{\Phi}}_t + \frac{1}{r}\mathbf{K}^T(\mathbf{K}\mathbf{K}^T)^{-1}(\ddot{\mathbf{u}}_{t+1} - \hat{\ddot{\mathbf{u}}}_t)$;
      $\ddot{\mathbf{u}}_{t+1} = \ddot{\mathbf{u}}_t + r\mathbf{K}(\ddot{\boldsymbol{\Phi}}_{t+1} - \hat{\ddot{\boldsymbol{\Phi}}}_t)$;
      $\dot{\mathbf{u}}_{t+1} = \int_a^b \ddot{\mathbf{u}}_{t+1}\,dt = \frac{b-a}{2n}\left(\hat{\ddot{\mathbf{u}}}_0 + 2\sum_{j=1}^{n-1}\hat{\ddot{\mathbf{u}}}_{k_j} + \hat{\ddot{\mathbf{u}}}_{k_n}\right)$;
    end
    $\dot{\boldsymbol{\Phi}}_{t+1} = \dot{\boldsymbol{\Phi}}_t + \frac{1}{r}\mathbf{K}^T(\mathbf{K}\mathbf{K}^T)^{-1}(\dot{\mathbf{u}}_{t+1} - \hat{\dot{\mathbf{u}}}_t)$;
    $\boldsymbol{\Phi}_{t+1} = \int_a^b \dot{\boldsymbol{\Phi}}_{t+1}\,dt = \frac{b-a}{2n}\left(\hat{\dot{\boldsymbol{\Phi}}}_0 + 2\sum_{j=1}^{n-1}\hat{\dot{\boldsymbol{\Phi}}}_{k_j} + \hat{\dot{\boldsymbol{\Phi}}}_{k_n}\right)$;
  end
  $\mathbf{u}_{t+1} = \mathbf{u}_t + r\mathbf{K}(\boldsymbol{\Phi}_{t+1} - \hat{\boldsymbol{\Phi}}_t)$;
end

Algorithm 1: Second-order three-cascade controller pseudocode
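To make the coupling of Table 1 and Algorithm 1 concrete, the following Python sketch runs the three nested recursions in a deliberately simplified form. Everything in it is an assumption for illustration: K is an arbitrary mecanum-like mixing matrix (not the chapter's kinematic matrix), the sensor observers are replaced by a mismatched plant matrix K_plant so the feedback must iterate, each inner cascade performs a single pass per outer iteration rather than looping to its own tolerance, and the derivative/integral couplings use first-order differences instead of the chapter's three-point backward difference and Newton-Cotes rule.

```python
import numpy as np

r, dt = 0.05, 0.01
K = 0.25 * np.array([[ 1.0, 1.0, 1.0, 1.0],
                     [-1.0, 1.0, 1.0,-1.0],
                     [-1.0, 1.0,-1.0, 1.0]])     # illustrative 3x4 mixing matrix
K_pinv  = K.T @ np.linalg.inv(K @ K.T)           # Moore-Penrose right pseudoinverse
K_plant = 0.9 * K                                # mismatched "true" plant standing in for the observers

def backward_step(x, u_goal, u_obs):
    """Backward (inverse) recursions: steps 1, 3 and 5 of Table 1."""
    return x + (1.0 / r) * K_pinv @ (u_goal - u_obs)

def forward_step(u, x_new, x_obs):
    """Forward (direct) recursions: steps 2, 4 and 6 of Table 1."""
    return u + r * K @ (x_new - x_obs)

u_ref = np.array([1.0, 0.5, 0.0])                # global Cartesian goal (x, y, yaw)
Phi, u     = np.zeros(4), np.zeros(3)            # positions
dPhi, du   = np.zeros(4), np.zeros(3)            # first derivatives
ddPhi, ddu = np.zeros(4), np.zeros(3)            # second derivatives

for _ in range(100):                             # outer position cascade
    if np.linalg.norm(u_ref - u) < 1e-9:         # convergence criterion (Definition 4)
        break
    Phi_s1   = backward_step(Phi, u_ref, u)      # step 1
    dPhi_ref = (Phi_s1 - Phi) / dt               # coupling 1 -> 2 (numerical derivative)
    du_s2    = forward_step(du, dPhi_ref, dPhi)  # step 2
    ddu_ref  = (du_s2 - du) / dt                 # coupling 2 -> 3 (numerical derivative)
    ddPhi_s3 = backward_step(ddPhi, ddu_ref, ddu)    # step 3
    ddu_s4   = forward_step(ddu, ddPhi_s3, ddPhi)    # step 4
    du_s5ref = du + ddu_s4 * dt                  # coupling 4 -> 5 (numerical integration)
    dPhi_s5  = backward_step(dPhi, du_s5ref, du)     # step 5
    Phi_next = Phi + dPhi_s5 * dt                # coupling 5 -> 6 (numerical integration)
    u = u + r * K_plant @ (Phi_next - Phi)       # step 6, "observed" through the mismatched plant
    Phi, dPhi, du, ddPhi, ddu = Phi_next, dPhi_s5, du_s5ref, ddPhi_s3, ddu_s4

print("reached:", u, "error:", np.linalg.norm(u_ref - u))
```

Under these assumptions the workspace error contracts geometrically toward u_ref, which mirrors only qualitatively the successive-approximation behaviour reported in Figs. 6, 7 and 8.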

The figures of Sect. 6 show the numerical simulation results under the controlled scheme. The robot navigates to different Cartesian positions; within each trajectory segment the cascade controller controls the position, then controls the velocity exerted within that segment of distance, and similarly the acceleration is controlled within the small velocity window currently being controlled.

6 Results Analysis and Discussion

The three cascade controllers required couplings between them through numerical differentiation and integration over time. In this case, backward high-precision derivatives and Newton-Cotes integration were used. Although the traditional PID also deploys three derivative orders, their use is by far different in implementation. The proposed cascade model proved considerably stable, reliable and numerically precise. A fundamental aspect of the proposed method is that the three loops are nested. The slowest loop establishes the current metric error distance toward a global vector reference. Then, the second and third nested control loops establish the local reference velocity and acceleration, both corresponding to the actual incremental distance. The three loops are conditioned to recursively reduce the errors down to a numerical precision value by means of successive approximations.


Fig. 10 Controlled Cartesian position along a trajectory composed of four global points

For instance, Fig. 10 shows a nonlinear Cartesian trajectory comprised of four main linear local segments, each with a different slope. The first cascade loop basically controls the local displacements between pairs of global points. Figure 11 shows four peaks that represent the frontier of each global point reached. The controller locally starts the metric displacement after reaching each goal point and approaches the next one by means of successive approximations, producing nonlinear rates of motion. The second inner looped cascade controls the robot's Cartesian velocities w.r.t. time, as shown in Fig. 12. At each goal reached, the first-order derivative shows local minima or maxima with magnitudes depending on the speeds due to the loop reference values (local predictions as references), where the last trajectory point reached is the global control reference. In this case, the velocity references are local values to be fitted, also known as the predicted values for the next local loop calculation at $t+1$, such as $\dot{\mathbf{u}}_{t+1}$ or $\dot{\boldsymbol{\Phi}}_{t+1}$. As the velocity control loop is inside the metric displacement loop, the velocity is controlled by a set of loops, only for the velocity segment being calculated in the current displacement loop. The third inner control loop, which manages the second-order derivative, produces the results shown in Fig. 13. This control loop is the fastest iterative process and exerts the highest sampling frequency. In this case, the acceleration references are local values to be fitted, or the predicted values for the next local loop calculation, such as $\ddot{\mathbf{u}}_{t+1}$ or $\ddot{\boldsymbol{\Phi}}_{t+1}$. As the acceleration control loop is inside the velocity loop, the


Fig. 11 Controlled displacement performance

Fig. 12 Controlled linear velocity



Fig. 13 Controlled acceleration

acceleration is controlled by a set of loops, only for the velocity segment being calculated in the current velocity loop. Finally, the observers that provided feedback were stated as single compact expressions representing a feasible model of sensor fusion (summarized by Proposition 4). The robot's angular motion (angle and yaw velocity) combined wheel motion and inertial measurements into a compound observer model. The in-loop transition between numerical derivatives worked highly reliably. The multiple inner control cascades approach turned out to be numerically accurate, working online considerably fast. This type of cascade controller has the advantage that input, reference and state vectors and matrices can easily be augmented without any alteration to the algorithm; however, compared with a PID controller in terms of speed, the latter is faster due to its lower computational complexity.

7 Conclusions

The natural kinematic constraints of a genuinely omnidirectional platform always produce straight paths. This is an advantage because it facilitates the platform's displacement. In strictly necessary situations, to generate deliberately curved or even discontinuous displacements, an omnidirectional platform traverses them through linearization of infinitesimal segments.


The proposed cascade control model was derived directly from the physics of the robotic mechanism. In such an approach, the kinematic model was obtained as an independent cascade and established as a proportional control type with a constant gain represented by the robot's geometric parameters. The gain, or convergence factor, resulted in a non-square stationary matrix (MIMO). Unlike a PID controller, the inverse analytic solution was formulated to obtain a system of linear differential equations. In its solution, definite integration produced a recursive controller, which converged to a solution by successive approximations of the feedback error. The strategy proposed in this work focuses on connecting all the higher-order derivatives of the system in nested forms (cascades), unlike a PID controller, which is described as a linear summation of all derivative orders. Likewise, the cascade approach does not need gain adjustment. The lowest-order derivative was organized in the outer cascade, being the loop with the slowest control cycle frequency and containing the global control references (desired positions). Further, the intermediate cascade is a derivative of the next higher order and is governed by a local speed reference; that is, this cascade controls the speed during the displacement segment projected by the cycle of the external cascade. Finally, the acceleration cascade cycle is the fastest loop and controls the portions of acceleration during a small interval of displacement along the trajectory towards the next Cartesian goal. The proposed research evidenced a good performance, showing controlled limits of disturbances due to the three controllers acting over the same portion of motion. The controller was robust, and the precision error $\varepsilon$ allowed adjusting the accuracy of the robot's closeness to the goal.

Acknowledgements The corresponding author acknowledges the support of Laboratorio de Robótica. The third and fourth authors acknowledge the support of the Kazan Federal University Strategic Academic Leadership Program ('PRIORITY-2030').
