Fuchun Sun · Angelo Cangelosi · Jianwei Zhang · Yuanlong Yu · Huaping Liu · Bin Fang (Eds.)
Communications in Computer and Information Science
1787
Cognitive Systems and Information Processing 7th International Conference, ICCSIP 2022 Fuzhou, China, December 17–18, 2022 Revised Selected Papers
Communications in Computer and Information Science

Editorial Board Members
Joaquim Filipe, Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh, Indian Statistical Institute, Kolkata, India
Raquel Oliveira Prates, Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil
Lizhu Zhou, Tsinghua University, Beijing, China
1787
Rationale
The CCIS series is devoted to the publication of proceedings of computer science conferences. Its aim is to efficiently disseminate original research results in informatics in printed and electronic form. While the focus is on publication of peer-reviewed full papers presenting mature work, inclusion of reviewed short papers reporting on work in progress is welcome, too. Besides globally relevant meetings with internationally representative program committees guaranteeing a strict peer-reviewing and paper selection process, conferences run by societies or of high regional or national relevance are also considered for publication.

Topics
The topical scope of CCIS spans the entire spectrum of informatics ranging from foundational topics in the theory of computing to information and communications science and technology and a broad variety of interdisciplinary application fields.

Information for Volume Editors and Authors
Publication in CCIS is free of charge. No royalties are paid, however, we offer registered conference participants temporary free access to the online version of the conference proceedings on SpringerLink (http://link.springer.com) by means of an http referrer from the conference website and/or a number of complimentary printed copies, as specified in the official acceptance email of the event. CCIS proceedings can be published in time for distribution at conferences or as post-proceedings, and delivered in the form of printed books and/or electronically as USBs and/or e-content licenses for accessing proceedings at SpringerLink. Furthermore, CCIS proceedings are included in the CCIS electronic book series hosted in the SpringerLink digital library at http://link.springer.com/bookseries/7899. Conferences publishing in CCIS are allowed to use Online Conference Service (OCS) for managing the whole proceedings lifecycle (from submission and reviewing to preparing for publication) free of charge.

Publication process
The language of publication is exclusively English. Authors publishing in CCIS have to sign the Springer CCIS copyright transfer form, however, they are free to use their material published in CCIS for substantially changed, more elaborate subsequent publications elsewhere. For the preparation of the camera-ready papers/files, authors have to strictly adhere to the Springer CCIS Authors' Instructions and are strongly encouraged to use the CCIS LaTeX style files or templates.

Abstracting/Indexing
CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for the inclusion in ISI Proceedings.

How to start
To start the evaluation of your proposal for inclusion in the CCIS series, please send an e-mail to ccis@springer.com.
Fuchun Sun · Angelo Cangelosi · Jianwei Zhang · Yuanlong Yu · Huaping Liu · Bin Fang Editors
Cognitive Systems and Information Processing 7th International Conference, ICCSIP 2022 Fuzhou, China, December 17–18, 2022 Revised Selected Papers
Editors Fuchun Sun Tsinghua University Beijing, China
Angelo Cangelosi University of Manchester Manchester, UK
Jianwei Zhang Universität Hamburg Hamburg, Germany
Yuanlong Yu Fuzhou University Fuzhou, China
Huaping Liu Tsinghua University Beijing, China
Bin Fang Tsinghua University Beijing, China
ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-981-99-0616-1 ISBN 978-981-99-0617-8 (eBook) https://doi.org/10.1007/978-981-99-0617-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023, corrected publication 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This volume contains the papers from the Seventh International Conference on Cognitive Systems and Information Processing (ICCSIP 2022), which was held in Fuzhou, during December 17–18, 2022. ICCSIP is the prestigious biennial conference on Cognitive Systems and Information Processing with past events held in Beijing (2012, 2014, 2016, 2018), Zhuhai (2020) and Suzhou (2021). Over the past few years, ICCSIP has matured into a well-established series of international conferences on cognitive information processing and related fields. Similarly to the previous editions, ICCSIP 2022 provided an academic forum for the participants to share their new research findings and discuss emerging areas of research. It also established a stimulating environment for the participants to exchange ideas on future trends and opportunities in cognitive information processing research. Currently, cognitive systems and information processing are applied in an increasing number of research domains such as cognitive sciences and technology, visual cognition and computation, big data and intelligent information processing, and bioinformatics and applications. We believe that cognitive systems and information processing will certainly exhibit greater-than-ever advances in the near future. With the aim of promoting research and technical innovation in relevant fields domestically and internationally, the fundamental objective of ICCSIP is defined as providing a premier forum for researchers and practitioners from academia, industry, and government to share their ideas, research results, and experiences. ICCSIP 2022 received 121 submissions, all of which were written in English. After a thorough reviewing process with 3 reviews per paper, 47 papers were selected for presentation as full papers, resulting in an approximate acceptance rate of 39%. The accepted papers not only address challenging issues in various aspects of cognitive systems and information processing but also showcase contributions from related disciplines that illuminate the state of the art. In addition to the contributed papers, the ICCSIP 2022 technical program included six plenary speeches by Dewen Hu, Yongduan Song, Xin Xu, Xiaoli Li, Huaping Liu, and Li Wen. We would like to thank the members of the Advisory Committee for their guidance, the members of the International Program Committee and additional reviewers for reviewing the papers, and the members of the Publications Committee for checking the accepted papers in a short period of time. Last but not least, we would like to thank all the speakers, authors, and reviewers as well as the participants for their great contributions that made ICCSIP 2022 successful
and all the hard work worthwhile. We also thank Springer for their trust and for publishing the proceedings of ICCSIP 2022.

December 2022
Fuchun Sun Angelo Cangelosi Jianwei Zhang Yuanlong Yu Huaping Liu Bin Fang
Organization
Hosts
Chinese Association for Artificial Intelligence, Chinese Association of Automation, IEEE Computational Intelligence Society

Organizers
Cognitive Systems and Information Processing Society of Chinese Association for Artificial Intelligence, Cognitive Computing and Systems Society of Chinese Association of Automation, Tsinghua University, Gusu Laboratory of Material Science

Co-organizers
Nanjing Tsingzhan Institute of Artificial Intelligence, China Center for Information Industry Development, Artificial Intelligence and Sensing Technology Institute (SIP) Co., Ltd.

Technical Sponsor
NVIDIA-IM
Conference Committee

Honorary Chairs
Bo Zhang, Tsinghua University, China
Nanning Zheng, Xi'an Jiaotong University, China
Deyi Li, Chinese Association for Artificial Intelligence, China
Advisory Committee Chairs
Qionghai Dai, Tsinghua University, China
Fuji Ren, University of Tokyo, Japan
Shiming Hu, Tsinghua University, China
General Chairs
Fuchun Sun, Tsinghua University, China
Angelo Cangelosi, University of Manchester, UK
Jianwei Zhang, University of Hamburg, Germany
Yuanlong Yu, Fuzhou University, China
Program Committee Chairs
Dewen Hu, National University of Defense Technology, China
Wenzhong Guo, Fuzhou University, China
Stefan Wermter, University of Hamburg, Germany
Huaping Liu, Tsinghua University, China
Publication Chair
Bin Fang, Tsinghua University, China
Program Committee
Chenguang Yang, University of the West of England, UK
Guang-Bin Huang, Nanyang Technological University, Singapore
Katharina Rohlfing, University of Paderborn, Germany
Antonio Chella, Università degli Studi di Palermo, Italy
Yufei Hao, EPFL, Switzerland
Zhen Deng, Fuzhou University, China
Jun Ren, Hubei University of Technology, China
Chunfang Liu, Beijing University of Technology, China
Changsheng Li, Beijing Institute of Technology, China
Mingjie Dong, Beihang University, China
Rui Huang, University of Electronic Science and Technology of China, China
Tian Liu, Beijing Information Science and Technology University, China
Haiyuan Li, Beijing University of Posts and Telecommunications, China
Yong Cao, Northwestern Polytechnical University, China
Taogang Hou, Beijing Jiaotong University, China
Contents
Award

Multi-modal Ankle Muscle Strength Training Based on Torque and Angle Sensors . . . 3
Mingjie Dong, Zeyu Wang, Ran Jiao, and Jianfeng Li

T3SFNet: A Tuned Topological Temporal-Spatial Fusion Network for Motor Imagery with Rehabilitation Exoskeleton . . . 16
Kecheng Shi, Fengjun Mu, Chaobin Zou, Yizhe Qin, Zhinan Peng, Rui Huang, and Hong Cheng

Joint Trajectory Generation of Obstacle Avoidance in Tight Space for Robot Manipulator . . . 30
Chunfang Liu, Jinbiao Zhang, Pan Yu, and Xiaoli Li

Manipulating Adaptive Analysis and Performance Test of a Negative-Pressure Actuated Adhesive Gripper . . . 45
Jiejiang Su, Huimin Liu, Jing Cui, and Zhongyi Chu

Depth Control of a Biomimetic Manta Robot via Reinforcement Learning . . . 59
Daili Zhang, Guang Pan, Yonghui Cao, Qiaogao Huang, and Yong Cao

Inchworm-Gecko Inspired Robot with Adhesion State Detection and CPG Control . . . 70
Zijian Zhang, Zhongyi Chu, Bolun Zhang, and Jing Cui

Coherence Matrix Based Early Infantile Epileptic Encephalopathy Analysis with ResNet . . . 85
Yaohui Chen, Xiaonan Cui, Runze Zheng, Yuanmeng Feng, Tiejia Jiang, Feng Gao, Danping Wang, and Jiuwen Cao
Constrained Canonical Correlation Analysis for fMRI Analysis Utilizing Experimental Paradigm Information . . . 102
Ming Li, Yan Zhang, Pengfei Tang, and Dewen Hu

Algorithm

BFAct: Out-of-Distribution Detection with Butterworth Filter Rectified Activations . . . 115
Haojia Kong and Haoan Li
NKB-S: Network Intrusion Detection Based on SMOTE Sample Generation . . . 130
Yuhan Suo, Rui Wang, Senchun Chai, Runqi Chai, and Mengwei Su

Mastering “Gongzhu” with Self-play Deep Reinforcement Learning . . . 148
Licheng Wu, Qifei Wu, Hongming Zhong, and Xiali Li

Improved Vanishing Gradient Problem for Deep Multi-layer Neural Networks . . . 159
Di Wang, Xia Liu, and Jingqiu Zhang

Incremental Quaternion Random Neural Networks . . . 174
Xiaonan Cui, Tianlei Wang, Hao Chen, Baiying Lei, Pierre-Paul Vidal, and Jiuwen Cao

Application

Question Answering on Agricultural Knowledge Graph Based on Multi-label Text Classification . . . 195
Pengxuan Zhu, Yuan Yuan, Lei Chen, and Huarui Wu

Dairy Cow Individual Identification System Based on Deep Learning . . . 209
Zhijun Li, Huai Zhang, Yufang Chen, Ying Wang, Jiacheng Zhang, Lingfeng Hu, Lichen Shu, and Lei Yang

Automatic Packaging System Based on Machine Vision . . . 222
Chunfang Liu, Jiali Fang, and Pan Yu

Meteorological and Hydrological Monitoring Technology Based on Wireless Sensor Network Model and Its Application . . . 234
Ni Wang, Zhongwen Guo, Jinxin Wang, and Suiping Qi

A Review of Deep Reinforcement Learning Exploration Methods: Prospects and Challenges for Application to Robot Attitude Control Tasks . . . 247
Chao Li, Fengge Wu, and Junsuo Zhao

AeroBotSim: A High-Photo-Fidelity Simulator for Heterogeneous Aerial Systems Under Physical Interaction . . . 274
Jianrui Du, Yingjun Fan, Kaidi Wang, Yuting Feng, and Yushu Yu

Trailer Tag Hitch: An Automatic Reverse Hanging System Using Fiducial Markers . . . 288
Dongxi Lu, Wei Yuan, Chaochun Lian, Yongchun Yao, Yan Cai, and Ming Yang
Anatomical and Vision-Guided Path Generation Method for Nasopharyngeal Swabs Sampling . . . 301
Jing Luo, Wenbai Chen, Fuchun Sun, Junjie Ma, and Guocai Yao

A Hierarchical Model for Dynamic Simulation of the Fault in Satellite Operations . . . 316
Danni Nian, Jiawei Wang, and Sibo Zhang

Manipulation and Control

Design and Implementation of Autonomous Navigation System Based on Tracked Mobile Robot . . . 329
Hui Li, Junhan Cui, Yifan Ma, Jiawei Tan, Xiaolei Cao, Chunlong Yin, and Zhihong Jiang

Towards Flying Carpet: Dynamics Modeling, and Differential-Flatness-Based Control and Planning . . . 351
Jiali Sun, Yushu Yu, and Bin Xu

Region Clustering for Mobile Robot Autonomous Exploration in Unknown Environment . . . 371
Haoping Zheng, Liwei Zhang, and Meng Chen

Human Intention Understanding and Trajectory Planning Based on Multi-modal Data . . . 389
Chunfang Liu, Xiaoyue Cao, and Xiaoli Li

Robotic Arm Movement Primitives Assembly Planning Method Based on BT and DMP . . . 400
Meng Liu, Wenbo Zhu, Lufeng Luo, Qinghua Lu, Weichang Yeh, Yunzhi Zhang, and Qingwu Shi

Center-of-Mass-Based Regrasping of Unknown Objects Using Reinforcement Learning and Tactile Sensing . . . 413
Renpeng Wang, Yu Xie, Xinya Zhang, Jiangtao Xiao, Houde Liu, and Wei Zhou

Alongshore Circumnavigating Control of a Manta Robot Based on Fuzzy Control and an Obstacle Avoidance Strategy . . . 425
Beida Yang, Yong Cao, Yu Xie, Yonghui Cao, and Guang Pan

Robot Calligraphy Based on Footprint Model and Brush Trajectory Extraction . . . 437
Guang Yan, Dongmei Guo, and Huasong Min
Hierarchical Knowledge Representation of Complex Tasks Based on Dynamic Motion Primitives . . . 452
Shengyi Miao, Daming Zhong, Runqing Miao, Fuchun Sun, Zhenkun Wen, Haiming Huang, Xiaodong Zhang, and Na Wang

Saturation Function and Rule Library-Based Control Strategy for Obstacle Avoidance of Robot Manta . . . 463
Yu Xie, Shumin Ma, Yue He, Yonghui Cao, Yong Cao, and Qiaogao Huang

Perception-Aware Motion Control of Multiple Aerial Vehicle Transportation Systems . . . 474
Mingfei Jiang, Mingshuo Zuo, Xinming Yu, Rong Guo, Ruixi Wang, and Yushu Yu

NSGA-II Optimization-Based CPG Phase Transition Control Method of Manta Ray Robot . . . 489
Shumin Ma, Yu Xie, Yingzhuo Cao, Yue He, Yonghui Cao, Yong Cao, and Qiaogao Huang

Hardware

High Resolution Multi-indicator MIM Nano-Sensor Based on Aperture-Coupled Asymmetric Square Resonator . . . 503
Congzhi Yu, Naijing Lv, Lan Wei, Yifan Zhang, and Xiongce Lv

Rigid-Flexible Coupled Soft Gripper with Treble Modular Fingers . . . 512
Shijie Tang, Haiming Huang, Wenqing Chai, Di’en Wu, and Qinghua Lu

Ecofriendly Soft Material-Based Sensors Capable of Monitoring Health . . . 521
Nana Kwame Ofotsu, Xinyu Zhu, and Rui Chen

Teleoperation of a Dexterous Hand Using a Wearable Hand . . . 532
Hongze Yu, Haiyuan Li, and Yan Wang

Vision

A Cuboid Volume Measuring Method Based on a Single RGB Image . . . 547
Xingyu Ding, Jianhua Shan, Ding Zhang, Yuhao Sun, and Lei Zhao

Surface Defect Detection of Electronic Components Based on FaSB R-CNN . . . 555
Zihao Zheng, Wenjing Zhao, Haodong Wang, and Xinying Xu
A LED Module Number Detection for LED Screen Calibration . . . 570
Yang Zhang, Zhuang Ma, Yimin Zhou, Lihong Zhao, Yong Wang, and Liqiang Wang

Estimating the Pose of Irregular Surface Contact Based on Multi-colliders Collision Information and Nonlinear Neural Network in Virtual Environment . . . 585
Ruize Sun and Yongjia Zhao

Vision-Tactile Fusion Based Detection of Deformation and Slippage of Deformable Objects During Grasping . . . 593
Wenjun Ruan, Wenbo Zhu, Kai Wang, Qinghua Lu, Weichang Yeh, Lufeng Luo, Caihong Su, and Quan Wang

A MobileNet Based Model for Tongue Shape Classification . . . 605
Shasha Wang, Ruijuan Zheng, Lin Wang, and Mingchuan Zhang

A 3D Point Cloud Object Detection Algorithm Based on MSCS-Pointpillars . . . 617
Zengfeng Song, Yang Gao, and Honggang Luan

Polar Grid Based Point Cloud Ground Segmentation . . . 632
Jiyang Zhou and Liwei Zhang

High-Precision Localization of Mobile Robot Based on Particle Filters Combined with Triangle Matching . . . 644
Huaidong Zhou, Wanchen Tuo, and Wusheng Chou

Correction to: A LED Module Number Detection for LED Screen Calibration . . . C1
Yang Zhang, Zhuang Ma, Yimin Zhou, Lihong Zhao, Yong Wang, and Liqiang Wang

Author Index . . . 653
Award
Multi-modal Ankle Muscle Strength Training Based on Torque and Angle Sensors Mingjie Dong, Zeyu Wang, Ran Jiao, and Jianfeng Li(B) Beijing Key Laboratory of Advanced Manufacturing Technology, Faculty of Materials and Manufacturing, Beijing University of Technology, Beijing 100124, People’s Republic of China [email protected]
Abstract. With the aggravation of aging, a series of ankle injuries, including muscle weakness symptoms, caused by stroke or other reasons have made the problem of ankle rehabilitation increasingly prominent. Muscle strength training is one of the main rehabilitation methods for the ankle joint complex (AJC). Given the currently incomplete development of muscle strength training modes for all forms of ankle movement, this study developed six muscle strength training modes for the human ankle on our parallel ankle rehabilitation robot, namely continuous passive motion (CPM), isotonic exercise, isometric exercise, isokinetic exercise, centripetal exercise and centrifugal exercise, based on the position inverse solution of the robot combined with admittance control or position control. The dorsiflexion (DO) movement was used as an example to analyze the training effect of each training mode, with the results showing good functional performance of the developed ankle muscle strength training methods.

Keywords: Ankle muscle strength training · Multi-modal · Ankle rehabilitation robot · Admittance control
1 Introduction

With the aggravation of aging, a series of ankle injuries, including muscle weakness symptoms, caused by stroke or other reasons have made the problem of ankle rehabilitation increasingly prominent [1–5]. Ankle rehabilitation mainly includes rehabilitation training and muscle strength training. Rehabilitation training focuses on the process of regeneration and repair of the central nervous system [6–8]. By comparison, muscle strength training is mainly aimed at muscle strengthening needs and symptoms such as muscle weakness caused by disease: it can improve the targeted muscle groups, reduce muscle atrophy and raise the muscle strength level [9, 10]. In addition, muscle strength training programs are also recommended for muscle function recovery in clinical applications [11, 12]. The most commonly used muscle strength training mode is isokinetic exercise.

To achieve muscle strength training, many muscle training systems have been developed to improve the motor function of the relevant joints in the human body [10, 13–17]. Also, many muscle strength training devices have been developed for providing specific

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 3–15, 2023. https://doi.org/10.1007/978-981-99-0617-8_1
tasks, human-machine interaction and assessment analysis for the ankle joint [18–20]. During the initial stage of ankle muscle strength training, the patient's ankle is unable to move on its own, so continuous passive motion (CPM) is used to gradually improve muscle strength and restore the ankle's range of motion [21–23]. Further, the patient's muscle strength level can be improved through active exercise once the patient has a certain level of muscle strength. To enhance the suppleness and interactivity of active exercise, a force/torque sensor system is incorporated, which also ensures safety during training. Currently, many muscle strength training methods have been developed, such as passive training, isotonic exercises [9, 21, 24], etc. However, the development of muscle strength training in the three degrees of freedom (DOFs) of the ankle joint at this stage is relatively limited and one-sided, and cannot meet the plyometric needs of different groups for different training periods.

Our previous work proposed different compliant rehabilitation training strategies, including isokinetic muscle strength training based on the position inverse solution and admittance controller, isotonic exercise, etc. [9, 21]. Given the currently incomplete development of muscle strength training for the ankle joint, this paper develops six muscle strength training modes by using the 2-UPS/RRR ankle rehabilitation robot [21], namely CPM, isotonic exercise, isometric exercise, isokinetic exercise, centripetal exercise and centrifugal exercise. The ankle joint complex (AJC) has 3 DOFs: dorsiflexion/plantarflexion (DO/PL), eversion/inversion (EV/IN), and abduction/adduction (AB/AD). The proposed training strategies can meet the muscle strength demands of the human ankle joint in all directions of freedom from the beginning to the end of training. The DO was used as an example to analyze the training effect of each training mode based on the experiments.

The remainder of this paper is organized as follows. Section 2 introduces the developed muscle strength training strategies. Section 3 presents the experiments and results of the developed six muscle strength training modes in the direction of DO movement. Conclusions and future work are presented in Sect. 4.
2 Muscle Strength Training Strategies

2.1 Control Strategies of Robot-Assisted Ankle Muscle Strength Training
The six developed robot-assisted ankle muscle strength training methods are realized by using the admittance controller, the position inverse solution and the velocity Jacobian matrix, with the training strategies shown in Fig. 1, where ḋ1, ḋ2 and ḋ3 denote the running speed of each motor, z13 and z23 denote the corresponding direction vectors, and Q denotes the coefficient matrix. Among them, CPM combines the position inverse solution controller with the velocity Jacobian matrix controller to exercise muscle strength at the early stage; isotonic exercise combines the admittance controller with the position inverse solution controller to meet the resistance demand during training; isometric exercise realizes the training function at a specific position through the position inverse solution controller; isokinetic exercise keeps the ankle training speed constant while resisting the resistance demand by combining the position inverse solution, admittance controller and velocity Jacobian matrix. Centripetal and centrifugal exercise
are based on isotonic exercise and isokinetic exercise to analyze and train the stretching and contracting states of specific muscles.
Fig. 1. Control strategy diagram of six muscle strength training modes.
2.2 Principles of the Six Muscle Strength Training Modes

2.2.1 Continuous Passive Motion
The CPM is the passive exercise of the limbs through the equipment to increase the range of motion. This mode can drive the ankle joint back and forth at a fixed speed. It is used in the early stage of ankle muscle strength training and plays an important role in recovering the range of motion and preliminary muscle strength. At the same time, extremely low-speed passive activity can overcome the stretch reflex. The control strategy is shown in Fig. 2.

Before training, the moving platform of the parallel ankle rehabilitation robot is in the horizontal position; the training range (Pmax) and the speed of passive exercise (ω) are set by the host computer, and the ankle movement mode is selected. CPM takes the training speed set by the host as one of the inputs of the velocity Jacobian matrix. During the training process, the movement angle of the ankle is finally mapped to the movement position of each joint motor through the position inverse solution. At the same time, the controller judges whether the upper limit of the training range has been reached by constantly detecting the position of the joint motor (P), and the current angle of the motion platform is fed back as the other input of the velocity Jacobian matrix. If the upper limit of the training range has been reached, the return motion is performed; otherwise the platform continues to run towards the set upper limit, thus achieving reciprocal motion.
Fig. 2. Control strategy chart of CPM.
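For concreteness, a minimal sketch of the reciprocation logic described above is given below. All robot-side names (velocity_jacobian, inverse_position, command_motors, training_active) are hypothetical placeholders rather than the authors' actual interface; only the reverse-at-limit logic and the Jacobian mapping follow the text.

```python
import numpy as np

DT = 0.01  # control period in seconds (assumed)

def cpm_step(angle, direction, omega, p_max):
    """One CPM update: reverse at the training-range limits, else keep moving."""
    if angle >= p_max:
        direction = -1.0   # upper limit of training range reached: return motion
    elif angle <= 0.0:
        direction = +1.0   # back at the horizontal start position: move upward
    return angle + direction * omega * DT, direction

def run_cpm(robot, omega_dps, p_max_deg):
    angle, direction = 0.0, +1.0
    while robot.training_active():
        angle, direction = cpm_step(angle, direction, omega_dps, p_max_deg)
        # Platform angular velocity (DO axis only in this example).
        w = np.array([np.deg2rad(direction * omega_dps), 0.0, 0.0])
        # [d1_dot, d2_dot, theta3_dot]^T = [z13^T; z23^T; (0 0 1)] * w
        motor_speeds = robot.velocity_jacobian(angle) @ w
        robot.command_motors(robot.inverse_position(angle), motor_speeds)
```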
The position inverse solution and the velocity Jacobian matrix used in CPM can be found in our previous work [25].

2.2.2 Isotonic Exercise
Isotonic contraction refers to muscle contraction with constant tension and varying length, which realizes the various acceleration and displacement movements of the human body. When the applied load is large, the muscle takes a long time to overcome the load, and the contraction speed is slow. When the muscle tension reaches the maximum, the muscle contraction speed is zero, and then the isotonic contraction occurs. In this exercise mode, when the patient's muscle strength remains stable, the degree of muscle extension is basically unchanged. When the patient has recovered a certain muscle strength, isotonic exercise can strengthen the patient's muscles, free the patient from the completely passive state, stimulate the patient's active movement consciousness, and improve rehabilitation efficiency.

In this mode, the admittance controller is used to resist the active motion of the patient by continuously detecting the interaction torque and using it as an input to obtain the output angle θ(t). The final position of each joint motor is obtained by using the output angle as the input to the position inverse solution. At the same time, the difficulty of training can be changed by changing the admittance parameters. The admittance control used in isotonic exercise can be modeled as a mass-spring-damper system. Equation (1) is obtained according to Newton's second law and the Laplace transform [21]:

X(s)/F(s) = 1/(Ms^2 + Bs + K) = (1/M)/(s^2 + (B/M)s + K/M)    (1)

where M denotes the mass, B denotes the damping coefficient, and K denotes the stiffness coefficient. θ(t) is the output of the admittance controller for isotonic exercise, which can also be found in our previous work [21]. The control strategy is shown in Fig. 3.
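A discrete-time sketch of this admittance law, integrating M·θ̈ + B·θ̇ + K·θ = T_int with a semi-implicit Euler step, is shown below; the parameter values follow the experimental settings reported later (M = 1, B = 0.8, K = 1), while the time step is an assumption.

```python
class Admittance:
    """Maps the measured interaction torque t_int to an output angle theta(t)."""
    def __init__(self, M=1.0, B=0.8, K=1.0, dt=0.01):
        self.M, self.B, self.K, self.dt = M, B, K, dt
        self.theta = 0.0  # output angle, rad
        self.omega = 0.0  # its time derivative

    def update(self, t_int):
        # M*theta'' + B*theta' + K*theta = t_int, semi-implicit Euler step
        acc = (t_int - self.B * self.omega - self.K * self.theta) / self.M
        self.omega += acc * self.dt
        self.theta += self.omega * self.dt
        return self.theta

# Each control cycle: read the six-axis torque sensor, update the admittance
# model, and send the output angle to the position inverse solution.
ctrl = Admittance()
theta_cmd = ctrl.update(t_int=0.5)  # e.g. 0.5 Nm of dorsiflexion torque
```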
Fig. 3. Isotonic exercise control strategy.
2.2.3 Isometric Exercise
Isometric exercise refers to muscle contraction without shortening of the muscle fibers: the muscle tension increases while the muscle length and joint angle remain unchanged, that is, the angle is constant and the resistance changes. Isometric exercise can effectively increase muscle strength and reduce joint exudation. Isometric contraction refers to muscle contraction with constant length and varying tension. The isometric exercise runs at a fixed speed to the set end position, determined by the set end angle of the exercise, and constantly detects the interaction torque. The control strategy is shown in Fig. 4.

Fig. 4. Isometric exercise control strategy chart.
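A minimal sketch of this hold-and-measure behavior follows; the device calls are hypothetical placeholders, and the elaboration of the mode continues below.

```python
def isometric_exercise(robot, target_angle_deg, hold_time_s, dt=0.01):
    """Run to the set end position, hold it, and log the interaction torque."""
    robot.move_to(robot.inverse_position(target_angle_deg))  # position mode
    torques = []
    for _ in range(int(hold_time_s / dt)):
        # Platform stays fixed: muscle length constant, tension varies.
        torques.append(robot.read_interaction_torque())
    return torques
```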
Before training, the ankle rehabilitation robot is kept level in the initial position. The initial operating speed of the robot and the position angle to be trained can be set on the host computer, and then the training position angle is mapped to the operating space of each joint servo motor based on the position inverse solution. After the ankle rehabilitation robot runs to the exercise position at the initially set running speed, the interaction torque of the patient is continuously detected by the six-axis torque sensor. In this state of contraction, muscle tension can be increased to a maximum. Although there is no displacement, so that physically speaking the muscle performs no external work, it still needs to consume a lot of energy.

2.2.4 Isokinetic Exercise
Isokinetic refers to movement at a fixed speed (constant angular velocity) in which the patient must use maximum force to resist the resistance. The speed and angle are constant, and the resistance varies with the patient's application of force. Isokinetic
exercise is achieved by admittance controller, position inverse solution and velocity Jacobian matrix. By setting the ideal ankle movement speed, the interaction torque is constantly detected during the exercise process, and the real-time output angle is obtained by admittance control, the angle and the speed are used as the input of velocity Jacobian control, and the speed of each joint motor at different exercise angles is obtained to ensure the constant ankle speed. The control strategy is shown in Fig. 5.
Fig. 5. Isokinetic exercise control strategy chart.
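A compact sketch of this isokinetic loop, reusing the Admittance class from Sect. 2.2.2, is given below; the speed-segmentation thresholds are invented for illustration (the paper's own speed segmentation is elaborated in the next paragraph), and the robot interface is a placeholder.

```python
import numpy as np

def segment_speed(max_torque_nm):
    """Pick the constant training speed from the pre-assessed peak torque."""
    if max_torque_nm < 0.5:       # thresholds and speeds are assumed values
        return 2.0                # deg/s
    elif max_torque_nm < 1.5:
        return 5.0
    return 10.0

def isokinetic_step(robot, adm, omega_dps):
    t_int = robot.read_interaction_torque()
    theta = adm.update(t_int)                        # admittance output angle
    # Constant ankle speed in the direction the subject is pushing.
    w = np.array([np.sign(t_int) * np.deg2rad(omega_dps), 0.0, 0.0])
    motor_speeds = robot.velocity_jacobian(theta) @ w
    robot.command_motors(robot.inverse_position(theta), motor_speeds)
```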
An initial training assessment is performed prior to training to determine the maximum applied torque of the subject, and the training speed (ω) is determined based on this, which is called speed segmentation processing. During the training process, according to the real-time detected interaction torque, the movement angle of the moving platform of the robot is obtained through the admittance controller, and then ω is obtained by speed segmentation. Together, they are used as the input of the velocity Jacobian matrix control to obtain the running speed of the equipment motors, and finally they are mapped to the running results of each joint drive through the position inverse solution, to achieve the goal of isokinetic exercise at the ankle joint.

2.2.5 Centripetal and Centrifugal Exercise
Centripetal contraction refers to the contraction state of muscles in which the length of the muscle fibers is shortened, for example when the weight is lifted upwards during weight-lifting exercise. Centripetal exercise can be divided into an isotonic exercise mode and an isokinetic exercise mode. Centrifugal contraction is a kind of contraction in which the muscle is stretched while producing tension: the muscle is gradually stretched under resistance, so that the movement link moves in the opposite direction to the muscle tension. Centrifugal contraction is a kind of dynamic contraction, also known as retreating contraction, and it can also be divided into an isotonic exercise mode and an isokinetic exercise mode. The control strategy is shown in Fig. 6.
Fig. 6. Isotonic centripetal/centrifugal exercise, Isokinetic concentric/centrifugal exercise.
Take the DO of the ankle as an example. During the training process, tibialis anterior (TA) is a kind of eccentric exercise, while lateral gastrocnemius (LG), medial gastrocnemius (MG) and soleus (SO) are centripetal exercise. Based on isotonic exercise and isokinetic exercise, the training forms of muscles are analyzed, and the training states of each muscle in different exercise modes are obtained.
3 Experimental Scheme and Results

3.1 Experimental Scheme
One healthy male experimenter volunteered for the experiment. Prior to the experiment, the experimenter's left ankle was immobilized on the powered platform of the 2-UPS/RRR ankle rehabilitation robot. The ankle joint was driven by the rehabilitation robot to achieve the six muscle strength training modes in the six movement directions DO/PL, EV/IN and AB/AD. The experimenter rested for five minutes between each plyometric mode to avoid muscle fatigue. During the ankle training period, we recorded the interaction moments and angles of motion of the subject, and we present the experimental results with DO-direction motion as an example. The initial parameters of the admittance controller were determined as M = 1, B = 0.8 and K = 1 based on experimental tests and experience. Before training, the thresholds of the device in the six directions of ankle movement were set to avoid secondary injuries caused by exceeding the range of motion of the human ankle during the operation of the device. The set angle thresholds are shown in Table 1.
Table 1. Motion angle threshold

Axis                 | X       | Y       | Z
Direction of motion  | DO | PL | EV | IN | AB | AD
Threshold (°)        | 20 | 37 | 10 | 15 | 16 | 22
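A simple safety clamp over these thresholds might look as follows; the sign convention pairing DO with positive X, PL with negative X, and so on, is an assumption for illustration.

```python
THRESHOLDS_DEG = {"DO": 20, "PL": 37, "EV": 10, "IN": 15, "AB": 16, "AD": 22}

def clamp_axis(angle_deg, pos_dir, neg_dir):
    """Saturate one axis command, e.g. clamp_axis(x, 'DO', 'PL') for X."""
    return min(max(angle_deg, -THRESHOLDS_DEG[neg_dir]), THRESHOLDS_DEG[pos_dir])

x_safe = clamp_axis(25.0, "DO", "PL")  # -> 20.0, DO capped at its threshold
```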
3.2 Experimental Results

3.2.1 Continuous Passive Motion
The range of motion set in the DO direction during CPM is 13° and the training speed is 50 rpm. During the experiment, no active force was applied by the subject, and the ankle rehabilitation robot drove the experimenter's ankle for continuous passive training. The experimental results for the DO motion direction at the subject's ankle are shown in Fig. 7.
Fig. 7. Experiment results of continuous passive motion in DO.
According to the experimental results, it was concluded that during the CPM, the experimenter's ankle was driven by the device and generated interaction torque, but the level of the torque was low, which is suitable for the initial arousal of the experimenter's muscle strength and for increasing the range of motion.

3.2.2 Isotonic Exercise
After setting the training parameters, the isotonic exercise was started by opening the channel feeding the interaction torque into the admittance controller. In this rehabilitation training mode, the physician can adjust the admittance controller parameters M, B and K to achieve the desired resistance. The results of this experiment are shown in Fig. 8.
Fig. 8. Experiment results of isotonic exercise in DO.
According to the experimental results, we can see that during the training process the experimenter applied a torque in the direction of DO, and the torque level could reach the set level of admittance resistance so that the device produced the theoretical angle. After the torque was sensed by the admittance controller, the device made a corresponding offset in the direction of the torque with respect to the initial position (horizontal position), which can be used to strengthen the recovered muscle strength.

3.2.3 Isometric Exercise
The initial position was chosen for isometric exercise to facilitate the experimenter's perception of force generation. The experimenter's muscle length was kept constant while the tension was constantly changing, to reduce joint exudation. The results of this experiment are shown in Fig. 9. According to the experimental results, the subject performed multiple cycles of training and the equipment platform did not move while force was applied. The experimenter continuously applied torque, so the tension changed without a change in muscle length, which was conducive to maintaining muscle strength with low fatigue.

3.2.4 Isokinetic Exercise
The subject's training speed was obtained by an exercise assessment prior to exercise, and the torque channel to the admittance controller was opened. Together with the velocity Jacobian matrix controller and the position inverse solution, the training speed of the ankle was kept constant. The results of this experiment are shown in Fig. 10. According to the experimental results, the experimenter applied a torque in the direction of DO motion during the training process, and the torque level could reach the set level of the admittance resistance so that the device produced the theoretical angle. After the torque was sensed by the admittance controller and the velocity Jacobian matrix
Fig. 9. Experiment results of isometric exercise in DO.
Fig. 10. Experiment results of isokinetic exercise in DO.
controller, the device made a corresponding constant-velocity offset in the direction of the torque relative to the initial position (horizontal position), which reflected the experimenter's torque level and sustained high-torque application capability.

3.2.5 Centripetal and Centrifugal Exercise
Centripetal and centrifugal exercise are available in both the isotonic and the isokinetic exercise modes. Under the DO training behavior, the TA undergoes centrifugal training while the LG, MG and SO undergo centripetal training. Therefore, to a certain extent, it could be assumed that the other three muscles were trained centripetally during the
Multi-modal Ankle Muscle Strength Training
13
centrifugal training of the TA. The results of this experiment are shown in Fig. 11 and Fig. 12. Theoretical trajectory
Actual trajectory
Angle (°)
20
10
0 0
20
40
60
80 Measured torque
1
Torque (Nm)
100
0.5 0 0
20
40
60
80
100
Time(s)
Fig. 11. TA – isotonic centrifugal/LG, MG and SO – isotonic centripetal Theoretical trajectory
Angle (°)
20
Actual trajectory
10 0 0
10
20
30
40
50
60
70
1
Torque (Nm)
80
90
Measured torque
0.5 0 -0.5 0
10
20
30
40
50
60
70
80
90
Time(s)
Fig. 12. TA – isokinetic centrifugal/LG, MG and SO – isokinetic centripetal
According to the experimental results, the division of centripetal and centrifugal exercise modes based on isotonic exercise and isokinetic exercise was highly targeted for muscle training. In comparison, the isokinetic mode had advantages in terms of torque level and sustained torque output, and its muscle training was more effective.
4 Discussion and Future Work

In this work, we developed six muscle strength training modes for the ankle joint, namely CPM, isotonic exercise, isometric exercise, isokinetic exercise, centripetal exercise and
centrifugal exercise, and the DO was used as an example to analyze the training effect of each training mode. According to the experimental results: in CPM, the level of the torque was low and suitable for the initial arousal of the experimenter's muscle strength and for increasing the range of motion; in isometric exercise, the subject applied torque periodically, so that the muscle tension changed while the length remained the same; in isotonic exercise and isokinetic exercise, the subject applied torque in the direction of DO, the torque level could reach the set level of resistance, and a corresponding offset was made in the direction of the torque with respect to the initial position, which can be used to strengthen the recovered muscle strength; in isokinetic exercise, the velocity Jacobian matrix controller allowed the subject's ankle joint to run at a constant speed; centripetal and centrifugal exercise were analyzed for specific muscles on the basis of the isotonic and isokinetic exercise modes, and the experimental results showed that muscle strength training can be used to strengthen recovered muscles. The developed muscle strength training methods can meet the muscle strength demands of the full training phase of the ankle joint.

Future work will evaluate the effect of muscle-specific training by incorporating surface EMG (sEMG). We will evaluate the training effect of the selected muscles by preprocessing the collected sEMG and extracting features of the muscle activation level, combined with force and actual trajectory. Later, we will perform real-time variable resistance training based on the evaluation and detection of sEMG.

Acknowledgements. This research was supported in part by the National Natural Science Foundation of China under Grant numbers 61903011 and 52175001, and in part by the General Program of Science and Technology Development Project of Beijing Municipal Education Commission under Grant number KM202010005021.
References 1. Lotfian, M., Noroozi, S., Dadashi, F., Kharazi, M.R., Mirbagheri, M.M.: Therapeutic effects of robotic rehabilitation on neural and muscular abnormalities associated with the spastic ankle in stroke survivors. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 3860–3863. Montreal, QC, Canada (2020) 2. Lu, Z. et al.: Development of a three freedoms ankle rehabilitation robot for ankle training. In: TENCON 2015 – 2015 IEEE Region 10 Conference, pp. 1–5. Macao, China (2015) 3. Ren, Y., Yi-Ning, W., Yang, C.-Y., Tao, X., Harvey, R.L., Zhang, L.-Q.: Developing a wearable ankle rehabilitation robotic device for in-bed acute stroke rehabilitation. IEEE Trans. Neural Syst. Rehabil. Eng. 25(6), 589–596 (2017). https://doi.org/10.1109/TNSRE.2016.2584003 4. Bai, Y., Li, F., Zhao, J., Li, J., Jin, F., Gao, X.: A powered ankle-foot orthoses for ankle rehabilitation. In: 2012 IEEE International Conference on Automation and Logistics, pp. 288– 293. Zhengzhou, China (2012) 5. Takahashi, K., Lewek, M., Sawicki, G.: A neuromechanics-based powered ankle exoskeleton to assist walking post-stroke: a feasibility study. J NeuroEng. Rehabil. 12, 23 (2015) 6. Ma, L., Deng, Q., Dong, L., Tang, Y., Fan, L.: Effect of rehabilitation training on the recovery of hemiplegic limb in patients with cerebral infarction. Indian J. Pharm. Sci. 83(2), 30–35 (2021)
7. Wang, H., et al.: The reorganization of resting-state brain networks associated with motor imagery training in chronic stroke patients. IEEE Trans. Neural Syst. Rehabil. Eng. 27(10), 2237–2245 (2019) 8. Dong, M., et al.: State of the art in parallel ankle rehabilitation robot: a systematic review. J. Neuroeng. Rehabil. 18(1), 52 (2021) 9. Li, J., Zhou, Y., Dong, M., Rong, X.: Isokinetic muscle strength training strategy of an ankle rehabilitation robot based on adaptive gain and cascade PID control. IEEE Trans. Cognitive Dev. Syst. (2022). https://doi.org/10.1109/TCDS.2022.3145998 10. Li, J., Zhang, P., Cao, Q., Jiang, L., Dong, M.: Configuration synthesis and structure design of a reconfigurable robot for muscle strength training. In: 2021 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 341–346. Xining, China (2021) 11. Tole, G., Raymond, M.J., Williams, G., Clark, R.A., Holland, A.E.: Strength training to improve walking after stroke: how physiotherapist, patient and workplace factors influence exercise prescription. Physiother. Theory Pract. 38(9), 1198–1206 (2022) 12. Cho, J., Lee, W., Shin, J., Kim, H.: Effects of bi-axial ankle strengthening on muscle cocontraction during gait in chronic stroke patients: a randomized controlled pilot study. Gait Posture 87, 177–183 (2021) 13. Ma, H.: Research on promotion of lower limb movement function recovery after stroke by using lower limb rehabilitation robot in combination with constant velocity muscle strength training. In: 7th International Symposium on Mechatronics and Industrial Informatics (ISMII), pp. 70–73. IEEE, Zhuhai, China (2021) 14. Zhang, X., Bi, X., Shao, J., Sun, D., Zhang, C., Liu, Z.: Curative effects on muscle function and proprioception in patients with chronic lumbar disk herniation using isokinetic trunk muscle strength training. Int. J. Clin. Exp. Med. 12(4), 4311–4320 (2019) 15. Hu, B., Su, Y., Zou, H., Sun, T., Yang, J., Yu, H.: Disturbance rejection speed control based on linear extended state observer for isokinetic muscle strength training system. IEEE Trans. Autom. Sci. Eng. (2022). https://doi.org/10.1109/TASE.2022.3190210 16. Fischer, H., et al.: Use of a portable assistive glove to facilitate rehabilitation in stroke survivors with severe hand impairment. IEEE Trans. Neural Syst. Rehabil. Eng. 24(3), 344–351 (2016) 17. Khor, K., et al.: Portable and reconfigurable wrist robot improves hand function for post-stroke subjects. IEEE Trans. Neural Syst. Rehabil. Eng. 25(10), 1864–1873 (2017) 18. Hou, Z., et al.: Characteristics and predictors of muscle strength deficit in mechanical ankle instability. BMC Musculoskelet. Disord. 21, 730 (2020) 19. Zhai, X., et al.: Effects of robot-aided rehabilitation on the ankle joint properties and balance function in stroke survivors: a randomized controlled trial. Front Neurol. 12, 719305 (2021) 20. Zhang, C., et al.: Intensive in-bed sensorimotor rehabilitation of early subacute stroke survivors with severe hemiplegia using a wearable robot. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 2252–2259 (2021) 21. Dong, M., et al.: A new ankle robotic system enabling whole-stage compliance rehabilitation training. IEEE/ASME Trans. Mechatron. 26(3), 1490–1500 (2021) 22. Li, J., Fan, W., Dong, M., et al.: Implementation of passive compliance training on a parallel ankle rehabilitation robot to enhance safety. Ind. Robot 47(5), 747–755 (2020) 23. 
Zhang, M., McDaid, A., Veale, A.J., Peng, Y., Xie, S.Q.: Adaptive trajectory tracking control of a parallel ankle rehabilitation robot with joint-space force distribution. IEEE Access 7, 85812–85820 (2019) 24. Zhang, L., et al.: Design and workspace analysis of a parallel ankle rehabilitation robot (PARR). J. Healthcare Eng 2019, 7345780 (2019) 25. Li, J., et al.: Mechanical design and performance analysis of a novel parallel robot for ankle rehabilitation. ASME J. Mech. Robot. 12(5), 051007 (2020)
T3SFNet: A Tuned Topological Temporal-Spatial Fusion Network for Motor Imagery with Rehabilitation Exoskeleton Kecheng Shi1,2 , Fengjun Mu1,2 , Chaobin Zou1,2 , Yizhe Qin1,2 , Zhinan Peng1,2 , Rui Huang1,2(B) , and Hong Cheng1,2 1
1 Center for Robotics, School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China [email protected]
2 School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
http://www.uestcrobot.net/

Abstract. In recent years, motor imagery-based brain computer interfaces (MI-BCI) combined with exoskeleton robots have proved to be a promising method for spinal cord injury (SCI) rehabilitation training. The core of BCI is to achieve highly accurate movement prediction based on the patient's MI. The inconsistent response frequency of MI across different trials and subjects limits the accuracy of MI movement prediction methods for a single subject, and individual differences in the activation patterns of MI brain regions pose an even greater challenge to the generalization ability of such methods. Based on the MI mechanism, this paper proposes a graph-based tuned topological temporal-spatial fusion network (T3SFNet) for MI electroencephalography (MI-EEG) limb movement prediction. The proposed method designs a learnable EEG tuning mechanism to fuse and enhance the subject's MI response band data, and then uses a channel-node-based graph convolutional network and a temporal-spatial fusion convolutional network to extract the topological features and the spatiotemporal coupling features of the fused band data, respectively. We evaluate the proposed approach on two MI datasets and show that our method outperforms state-of-the-art methods in both within-subject and cross-subject situations. Furthermore, our method shows surprising results on the small-sample migration test, reaching the prediction baseline with only 5% of the data sample size. Ablation experiments demonstrate the effectiveness and necessity of each component of the proposed framework.

Keywords: Spinal cord injury · Motor imagery-based brain computer interface · Movement prediction · Tuned topological temporal-spatial fusion network · Graph convolutional network
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 16–29, 2023. https://doi.org/10.1007/978-981-99-0617-8_2
1 Introduction
Every year, about 250,000–500,000 people worldwide suffer a spinal cord injury (SCI) [1]. SCI causes severe impairments of the sensory, motor, and autonomic functions below the lesion level, which leaves patients unable to take care of themselves and places a heavy burden on their body, mind, and family. The functional recovery of patients has always been a rehabilitation problem [2]. In recent years, methods combining brain computer interfaces and exoskeleton robots have been applied in the rehabilitation training of spinal cord injury and have made gratifying progress [3–5]. A BCI can obtain the movement intention of spinal cord injury patients through biological signals like electroencephalography (EEG) and then convert the intention into exoskeleton control instructions. MI-BCI is one of the most important noninvasive BCI paradigms. It plays an important role for SCI patients because patients can activate the same brain regions during motor imagery as during limb movement [6, 7].

In MI-BCI, the main topic is the motor imagery-based EEG (MI-EEG) classification method, and the core goal is to achieve accurate and stable prediction of the patient's limb movement. Machine learning methods like filter bank common spatial patterns (FBCSP) [8] and wavelet packet decomposition-common spatial patterns (WPD-CSP) [9] are commonly used in MI-EEG classification because they can better capture the event-related desynchronization (ERD) and event-related synchronization (ERS) induced by motor imagery [10]. However, due to the limited information in hand-crafted features, the performance of machine learning methods cannot meet the needs of practical applications. Achieving high performance is still the challenge of MI-BCI studies.

Compared with machine learning methods, deep learning methods can automatically extract the underlying EEG features of patients from raw data or minimally processed data. Considerable effort has been devoted to developing MI-EEG classification methods using deep learning techniques, with promising results [11–13]. [13] combined a filter bank and a channel attention mechanism, and achieved 79.4% four-class movement prediction accuracy on the MI dataset BCI Competition IV 2a. Multiple brain regions are activated simultaneously and communicate with each other during MI, and it is difficult for convolutional neural networks (CNN) based on Euclidean distance to extract such non-Euclidean EEG topological features [14]. Recently, graph convolutional networks (GCN) have prevailed in the deep learning community [15–17] and, owing to their strong representation power on non-Euclidean structures, have achieved better limb movement prediction performance than conventional deep learning methods [18, 19]. Reference [18] used five functional connectivity indicators to construct different adjacency matrices, divided EEG electrode channels according to the adjacency matrix, and then combined GCN and CNN to extract functional topological and temporal features of EEG signals. It achieved an accuracy of 83.26% in the fine movement prediction task of four parts of the upper limb.

Although GCN have been successful in MI-EEG movement prediction, there are still some problems: 1) Existing work ignores the characteristic that MI only responds in a specific frequency band and that the response frequency is inconsistent
across subjects [20]. As a result, the features extracted by the GCN model contain much useless information, and the accuracy of the model's limb movement prediction needs to be improved. 2) Different subjects have different brain activation patterns during MI [14], making it difficult for existing GCN methods to generalize to new subjects, which severely limits the application of MI-BCI in SCI rehabilitation. In response to these problems, and drawing on the MI mechanism of the brain, this paper proposes a Tuned Topological Temporal-Spatial Fusion Network (T3SFNet) to achieve limb movement prediction in patients with SCI. The proposed T3SFNet first utilizes a learnable EEG tuner to enhance the frequencies of movement-related signals with a weighted fusion. The fused EEG data are then represented by spatial embeddings using the EEG node channel topology and a two-layer GCN to extract information directly related to MI movements. Finally, a temporal-spatial convolutional network extracts the spatio-temporal fused features of the EEG signal from the embedded representation to realize the prediction of the MI movements. The main contributions are as follows:
– Combined with the MI mechanism of the brain, T3SFNet is proposed to realize limb movement prediction in patients with SCI. The network uses a learnable tuner to enhance the frequencies of movement-related signals and utilizes a GCN and a temporal-spatial fusion convolutional network to extract topological spatio-temporal fused features of EEG signals.
– The limb movement prediction performance and generalization performance of T3SFNet are verified on the public dataset BCI Competition IV 2a and the self-built dataset LLM-BCImotion and compared with five mainstream methods. In addition, model ablation experiments are used to test the effectiveness of each module of the proposed framework.
2 Method
This section introduces the proposed T3SFNet MI-EEG movement prediction framework. Section 2.1 describes the overall framework of the proposed method, and Sect. 2.2 describes the mathematical models and expressions of T3SFNet's component modules.

2.1 Overview
Figure 1 visualizes the proposed T3SFNet movement prediction framework. The entire model architecture consists of three parts. The Signal Band Tuning Module reconstructs the original EEG signal at the frequency level. The Topological Feature Extraction Module uses GCN modules to exploit the connections between brain regions during MI. The Spatio-Temporal Fusion Module employs heterogeneous feature extractors to generate spatio-temporal fused features, which are densely coupled with the movement prediction task.
Fig. 1. The proposed T3SFNet movement prediction framework.
The whole framework predicts the patient's limb movement from a set of EEG signal fragments collected while subjects perform MI. The preset movements contain N_m classes. The movement prediction method first maps the input EEG signals into a confidence for each movement type, and then makes the movement decision.

2.2 Module Description
Signal Band Tuning Module uses a filter bank group to split the original EEG signal into five filtered EEG signals with finer frequency bands, and uses a learnable weighting strategy for hierarchical tuning to enhance the frequencies of movement-related signals. MI only responds in a specific frequency band, which means that the movement-related EEG signals usually exist only in that band. However, the activated frequency bands vary among subjects, and this significant deviation harms the correlation between the extracted features and the movement prediction task. Assume the original EEG signal acquired from a subject by the EEG acquisition device is X^{ori} (size C_e × L, where C_e denotes the number of signal channels acquired from the electrodes and L is the length of the signal slices). Using the filter bank group, X^{ori} is transformed into X^f (size C_f × C_e × L), which consists of C_f different frequency channels without overlapping bands. We then ask whether each individual frequency band contributes to the movement prediction task, and weight the C_f divided signal clips:

X^{tuned} = \sum_{i=1}^{C_f} w_i \cdot X_i^f,    (1)

where W = \{w_i\} contains the learnable weights for each band, used to tune the salience of the output EEG signal. By dividing and re-tuning the original EEG signal, the frequency band distribution of each subject can be learned and used to enhance the movement-related signal strength, making it easier to activate the neurons of the subsequent neural network. Finally, we slice X^{tuned} into C_{clips} clips with a sliding window (length L_{clips}) and re-organize them into X^{clips} (size C_{clips} × C_e × L).
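As a rough illustration, the weighted fusion of Eq. (1) can be sketched in PyTorch; the module name, the five-band split, and the tensor shapes here are illustrative assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class SignalBandTuner(nn.Module):
    """Learnable band weighting of Eq. (1): X_tuned = sum_i w_i * X_i^f."""

    def __init__(self, num_bands: int = 5):
        super().__init__()
        # one learnable scalar weight per frequency band
        self.weights = nn.Parameter(torch.ones(num_bands))

    def forward(self, x_f: torch.Tensor) -> torch.Tensor:
        # x_f: (batch, C_f, C_e, L), the output of a filter bank with
        # C_f non-overlapping bands; sum over the band axis
        w = self.weights.view(1, -1, 1, 1)
        return (w * x_f).sum(dim=1)  # -> (batch, C_e, L)

# usage: 5 bands, 22 electrode channels, 750 time points
tuner = SignalBandTuner(num_bands=5)
x_tuned = tuner(torch.randn(8, 5, 22, 750))  # -> (8, 22, 750)
```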
Topological Feature Extraction Module mainly contains a two-layer graph convolutional network. Its main function is to utilize the topological relationships among the EEG signals from the C_e individual electrode channels to obtain the spatial embeddings of X^{clips}. Because multiple areas of the brain are activated simultaneously and communicate with each other during MI, information related to the MI task is propagated through the topological relationships of the different brain areas. Based on the topological connections extracted from the EEG activation regions, we employ a GCN to construct the cerebral cortex's activation-response model of MI, and the resulting spatial embeddings are the MI information propagated to each EEG electrode channel. The GCN requires as input an explicit graph structure representing the topological relationship between the EEG electrode channels. The graph structure is constructed from the large-scale EEG dataset X. We first calculate the mean of the correlation coefficient matrix over all samples:

M_{corr} = \frac{1}{N} \sum_{X^{ori} \in X} F_{corr}(X^{ori}),    (2)

where N is the number of original EEG signal slices X^{ori}, and the function F_{corr} calculates the correlation coefficient matrix over the C_e channels. The graph structure A is determined at the raw-data level according to the following formulas:

A = \{a_{ij}\}, \quad i, j \in [1, \ldots, C_e],    (3)

a_{ij} = \begin{cases} 0, & |m_{ij}| < \mathrm{avg}(|M_{corr}|) \\ 1, & |m_{ij}| > \mathrm{avg}(|M_{corr}|) \end{cases}    (4)

where m_{ij} is the correlation coefficient of the ith and jth electrode channels, and |·| denotes the absolute value. After the graph is initialized, the tuned EEG signal X^{clips} is fed into the corresponding nodes. Spatial embeddings are propagated and dynamically updated along the undirected edges of the brain activation model to extract the MI information contained in every electrode channel. We input the tuned EEG signal X^{clips} and the graph structure A into the GCN and define our graph as

G = (V, E),    (5)

where V is the set of nodes in the graph; V_i ∈ V represents a node, defined here as an electrode channel. (V_i, V_j) ∈ E represents an edge between nodes, that is, the connection relation between electrode channels. The goal is to learn the spatial embeddings of the input EEG signal X^{clips} on the graph structure A (typically an adjacency matrix A). The output of the GCN is defined as O, a C × F feature matrix, where F is the node-level embedding dimension.
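A minimal NumPy sketch of the graph construction in Eqs. (2)–(4) is given below, assuming the dataset is available as an array of raw slices; the function and variable names are ours, not the paper's.

```python
import numpy as np

def build_graph(eeg_slices: np.ndarray) -> np.ndarray:
    """Binary adjacency from channel correlations (Eqs. 2-4).

    eeg_slices: (N, C_e, L) array of N raw EEG slices.
    """
    # mean correlation-coefficient matrix over all slices (Eq. 2)
    m_corr = np.mean([np.corrcoef(x) for x in eeg_slices], axis=0)
    # a_ij = 1 where |m_ij| exceeds the average absolute correlation (Eqs. 3-4)
    return (np.abs(m_corr) > np.mean(np.abs(m_corr))).astype(np.float32)

A = build_graph(np.random.randn(100, 22, 750))  # (22, 22) adjacency matrix
```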
Each layer of the GCN can be written as a function

H^{(l+1)} = f(H^{(l)}, A),    (6)

where H^{(l)} denotes the lth hidden layer, with H^{(0)} = X and H^{(L)} = O, and L is the number of GCN layers. The GCN uses self-loops and symmetric normalization in its propagation rule:

f(H^{(l)}, A) = \sigma\big(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)}\big),    (7)

where \hat{D} is the diagonal node degree matrix of \hat{A} (\hat{A} = A + I), I is the identity matrix, and σ(·) is a non-linear activation function. After the GCN, we obtain the spatial embeddings X^{embedded} (size C_{clips} × F).

Spatio-Temporal Fusion Module performs two individual feature extraction steps to obtain the fused features. First, we use the lightweight EEGNet [11] to extract spatial features for every window of X^{embedded}. This operation provides a direct way to learn spatial filters for different timing information. Then, we employ an LSTM module over the C_{clips} channel to extract further timing and spatial features. In this module, X^{embedded} is transformed as follows:

X_i^{spatio} = N_{EEGNet}(X_i^{embedded}), \quad i \in [1, \ldots, C_{clips}],    (8)

X^{final} = N_{LSTM}(X^{spatio}).    (9)

Finally, we use a fully connected network to classify the features and obtain the predicted movements.
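The propagation rule of Eq. (7) can be written compactly in PyTorch as below; this is a generic two-layer GCN sketch with assumed feature sizes, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One step of Eq. (7): H' = sigma(D^-1/2 (A + I) D^-1/2 H W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))      # D^-1/2
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt                 # symmetric normalization
        return torch.relu(a_norm @ self.linear(h))

# two-layer GCN over 22 electrode nodes with 750-point node signals
gcn1, gcn2 = GCNLayer(750, 128), GCNLayer(128, 64)
adj = (torch.rand(22, 22) > 0.5).float()
embedding = gcn2(gcn1(torch.randn(22, 750), adj), adj)  # (22, 64) spatial embeddings
```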
3 Experiment and Result
In this part, the performance of the T3SFNet framework on MI-EEG movement prediction tasks is evaluated on two benchmark datasets: the BCI Competition IV 2a dataset [21] (BCICIV2a hereafter) and the LLM-BCImotion dataset [22]. Section 3.1 presents the datasets used to evaluate the performance. The machine learning and deep learning MI-EEG movement prediction methods compared with T3SFNet and the hyperparameter settings are described in Sect. 3.2. Section 3.3 evaluates and analyzes the strengths and weaknesses of T3SFNet on the different datasets.

3.1 Dataset Description and Processing
BCICIV2a is the most widely used MI dataset. It contains EEG signals of 22 nodes recorded from nine healthy subjects over two sessions on two different days. Each session consists of 288 four-second trials of MI per subject (imagining the movement of the left hand, the right hand, the feet, and the tongue), and
each trial has a 3-s duration with one movement. The signals are sampled at 250 Hz and bandpass-filtered between 0.5 Hz and 100 Hz by the dataset provider before release. For each subject, there are 576 trials (288 trials × 2 sessions) for training and testing. LLM-BCImotion is a dataset specially constructed for applying MI-BCI to SCI rehabilitation exoskeleton robots. The acquisition of the entire dataset follows an experimental paradigm of asynchronous acquisition of EEG and electromyography signals [23]. All subjects performed MI tasks with a duration of 3 s and then immediately performed the corresponding movements. This paper only uses the EEG signal data collected during the subjects' motor imagery task. The LLM-BCImotion dataset comprises 10 healthy subjects executing standing, sitting, and walking imagery. The EEG data is collected using Waveguard-Original instrumentation with 32 EEG nodes and a 1000 Hz sampling rate. For each subject, there are around 90 trials with a roughly balanced ratio among the standing, sitting, and walking MI. Dataset Processing. To ensure consistent EEG signal lengths for each subject per trial in both datasets, we downsampled the LLM-BCImotion dataset to 250 Hz, which means that samples from both datasets contain 750 time points (250 Hz × 3 s). We then normalized the EEG signals in both datasets to reduce the effect of noise. Considering the timeliness of the practical application of MI-BCI, we sliced each sample in both datasets with a sliding window size of 400 time points and a moving step size of 50 time points. After data processing, each subject in the BCICIV2a dataset contains 4608 samples (576 trials × 8 windows), and each subject in the LLM-BCImotion dataset contains 720 samples (90 trials × 8 windows), consistent with Table 1.
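The slicing step can be reproduced with a few lines of NumPy; the function below is an illustrative sketch that simply encodes the window/step arithmetic above ((750 − 400)/50 + 1 = 8 windows per trial).

```python
import numpy as np

def slice_trials(trials: np.ndarray, win: int = 400, step: int = 50) -> np.ndarray:
    """Slice (n_trials, C_e, 750) trials into 400-point windows with step 50."""
    n_win = (trials.shape[-1] - win) // step + 1  # = 8 for 750 time points
    return np.stack([trials[..., i * step:i * step + win] for i in range(n_win)], axis=1)

# 576 BCICIV2a trials -> 576 x 8 = 4608 samples; 90 trials -> 720 samples
samples = slice_trials(np.random.randn(576, 22, 750))  # (576, 8, 22, 400)
```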
3.2 Compared Methods and Implementation Details
To evaluate the performance of the proposed method for MI-EEG limb movement prediction and to show the advantages of the proposed framework's structure, the most recent state-of-the-art methods whose implementation code is available online are selected for comparison. We first compare with the traditional machine learning method FBCSP [8], as FBCSP is the most widely used method in MI-BCI. It uses filter banks to extract spatial features of EEG signals in different frequency bands, and then uses a support vector machine with a Gaussian radial basis function kernel (RBF-SVM) and a K-nearest neighbor (KNN) classifier to classify the features. The proposed model is then compared with the recently published deep learning method EEGNet [11], which encapsulates well-known EEG feature extraction concepts for BCI to construct a uniform approach for different BCI paradigms. A further comparison with the channel synergy-based network (MCSNet) [23] is performed. MCSNet couples temporal and spatial features at different frequencies and introduces a channel attention mechanism to select important spatiotemporal coupling features, achieving good results in EMG signal classification.
The RGCN method reported in [15] is also used for comparison, which dynamically updates the graph structure by the distance between EEG nodes and is the latest GCN method applied in BCI. All methods are compared in both within-subject and cross-subject situations, and we use the prediction accuracy of the method as the observation index to test the limb motion prediction performance of the above method. Repeatedmeasures analysis of variance (ANOVA) is used to test the results statistically (using the number of subjects and the classification method as factors, and the prediction accuracy as the response variable). In the within-subject case, the data for model training and testing come from the same subject, and the performance of different methods in the within-subject case is the most intuitive indicator to measure their pros and cons. On the BCICIV2a and LLM-BCImotion datasets, we divided all samples into the training set and the test set according to the ratio of 8:2 (see Table 1 for the specific number). The performance of different methods in the cross-subject situation reflects the generalization ability of the method to new subjects, which is a key indicator for the practical application of MI-BCI. We randomly selected the data of five subjects as the training set and the data of two subjects as the test set respectively in the two datasets. The whole process is repeated 5 times, producing five different folds. Table 1. Number of train set and test set in within-subject and cross-subject situations for BCICIV2a and LLM-BCImotion datasets. BCICIV2a Train set
Test set
LLM-BCImotion Train set Test set
Withinsubject (per subject)
3680 samples
928 samples
576 samples
Cross-subject
23040 samples 9216 samples 3600 samples 1440 samples
144 samples
The preprocessing method of the datasets is given in Sect. 3.1. We use the Classification Learner toolbox of MATLAB 2021a to train and test the machine learning-based MI-EEG limb movement prediction models. For the deep learning and GCN MI-EEG prediction models, we use the PyTorch framework for a GPU-based implementation using matrix multiplications. Stochastic gradient descent with the Adam update rule is used to minimize the cross-entropy loss function. The network parameters are optimized with a learning rate of 10^{-4}, and the number of training epochs is set to 1000. The structural parameters of the model and the settings of the other hyperparameters can be found at https://github.com/mufengjun260/T3SFNet.
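The optimization setup amounts to the standard PyTorch loop below; the stand-in model and random data are placeholders, while the Adam rule, the 10^{-4} learning rate, the cross-entropy loss, and the 1000 epochs follow the text.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(64, 4)  # stand-in for the full T3SFNet network
loader = DataLoader(TensorDataset(torch.randn(256, 64),
                                  torch.randint(0, 4, (256,))), batch_size=32)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate 10^-4
criterion = torch.nn.CrossEntropyLoss()                    # cross-entropy loss

for epoch in range(1000):  # number of training epochs set to 1000
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
```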
3.3 Experimental Results
Within-Subject Classification. We compare the performance of T3SFNet with five other mainstream methods (FBCSP-SVM, FBCSP-KNN, EEGNet, MCSNet, and RGCN) on the BCICIV2a and LLM-BCImotion datasets; the results of all methods in the within-subject case are shown in Fig. 2. It can be intuitively seen that our proposed method's limb movement prediction accuracy is significantly better than the other methods on both datasets. T3SFNet achieves an average movement prediction accuracy of 81.42% on the four-class BCICIV2a dataset and 76.11% on the three-class LLM-BCImotion dataset. An interesting phenomenon is that the FBCSP method outperforms the deep learning methods on the BCICIV2a dataset. A possible reason is that FBCSP uses a filter bank similar to T3SFNet's to extract the band features most relevant to MI, which illustrates the effectiveness of the learnable EEG tuning module. RGCN achieves the lowest average prediction accuracy on both datasets; combined with the framework structures of RGCN and T3SFNet, this shows that topological features alone cannot effectively extract MI information. Tables 2 and 3 show the movement prediction accuracy of the different methods for each subject. T3SFNet obtains a higher movement prediction accuracy on each subject, showing that the T3SFNet framework, designed around the MI mechanism, can effectively and accurately extract the feature information most relevant to MI.
Fig. 2. Within-subject movement prediction performance on the BCICIV2a and LLM-BCImotion datasets.
Cross-Subject Classification. Figure 3 shows the limb movement prediction performance of the different methods in the cross-subject situation. It can be clearly seen that the generalization ability of the machine learning-based MI-EEG movement prediction methods to new subjects is essentially zero, owing to the different activation patterns of brain regions during MI across subjects. Our proposed T3SFNet method achieves the best performance among all methods, but its movement prediction accuracy for new subjects is only about 50%, which does not meet the requirements of practical applications.
Table 2. Within-subject movement prediction performance per subject on the BCICIV2a dataset (prediction accuracy).

Subject ID          SVM     KNN     EEGNet  MCSNet  RGCN    Ours
S1                  64.93%  58.16%  31.90%  44.83%  42.15%  74.14%
S2                  39.41%  37.28%  33.62%  52.59%  26.67%  85.34%
S3                  72.31%  66.71%  40.52%  44.83%  34.86%  94.83%
S4                  33.42%  29.64%  31.90%  41.38%  31.11%  57.76%
S5                  30.51%  28.39%  41.38%  45.69%  31.11%  84.48%
S6                  29.34%  30.82%  41.38%  39.66%  32.22%  86.21%
S7                  53.60%  48.22%  60.34%  37.07%  29.93%  84.48%
S8                  53.69%  46.79%  32.76%  44.83%  31.81%  89.66%
S9                  43.27%  38.54%  61.21%  43.10%  37.57%  75.86%
Average accuracy    46.72%  42.73%  41.67%  43.77%  33.05%  81.42%
Standard deviation  14.49   12.59   10.89   4.12    4.30    10.27
Table 3. Within-subject movement prediction performance per subject on the LLM-BCImotion dataset.

Subject ID          SVM     KNN     EEGNet  MCSNet  RGCN    Ours
S1                  51.85%  55.56%  50.00%  72.22%  53.33%  77.78%
S2                  62.50%  61.11%  66.67%  72.22%  51.11%  83.33%
S3                  70.83%  63.89%  72.22%  88.89%  51.11%  77.77%
S4                  45.37%  48.61%  61.11%  50.00%  46.67%  77.78%
S5                  49.07%  47.69%  61.11%  77.78%  47.78%  66.67%
S6                  54.17%  56.48%  77.78%  66.67%  55.56%  77.78%
S7                  64.35%  61.11%  83.33%  77.78%  54.44%  83.33%
S8                  55.56%  61.11%  66.67%  61.11%  46.67%  77.78%
S9                  55.09%  50.46%  72.22%  66.67%  52.22%  72.22%
S10                 57.41%  53.24%  61.11%  72.22%  54.44%  66.67%
Average accuracy    56.62%  55.93%  67.22%  70.56%  51.33%  76.11%
Standard deviation  7.16    5.50    9.11    9.95    3.13    5.58
The model's parameters contain implicit knowledge related to movement prediction acquired through training on existing data, which could help improve performance when only limited training data are available from a new subject. We therefore performed a few-shot model transfer experiment with the proposed method on the BCICIV2a dataset. We randomly selected 5 subjects as the library of existing data and, after training, obtained a pre-trained model. We then employed several different presets to complete the test on new subjects. Finally, we observed the results of the model on the test set, as shown in Table 4.
Fig. 3. Cross-subject movement prediction performance on the BCICIV2a and LLM-BCImotion datasets, averaged over all folds.

Table 4. Test results of the few-shot model transfer experiment.

Type                          Accuracy (%)
Pre-training                  51.41%
Directly test with 80% data   41.00%
Train from zero               25.60%
Transfer with 5% data         53.65%
Transfer with 10% data        56.26%
Transfer with 20% data        60.30%
First, according to the preset division method, we take 20% of the data from the new subject as the training set and train from scratch. Owing to the limited scale of the training set, the model cannot learn the motion intents of the subject and reaches only a meaningless accuracy of 25.59%. We then fine-tune the parameters of the pre-trained model trained on the existing 5 subjects and reach a decent accuracy of 68.33%, which outperforms the cross-subject accuracy of 40.99%. In addition, for scenarios where it is difficult to collect a subject's EEG data for initial training of the model, we further reduce the size of the training data to 5% and 10%. Even under such harsh conditions, our method still achieves practically valuable accuracy.
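In code, the transfer step is ordinary fine-tuning: copy the pre-trained weights and continue training on the new subject's small training split. The sketch below uses a stand-in model and invented sizes; only the procedure mirrors the experiment.

```python
import copy
import torch

pretrained = torch.nn.Linear(64, 4)  # stand-in for the model trained on 5 subjects
model = copy.deepcopy(pretrained)    # initialize the new subject's model from it
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# a small labelled split from the new subject (5-20% of their data)
x_few, y_few = torch.randn(36, 64), torch.randint(0, 4, (36,))
for epoch in range(50):  # brief fine-tuning pass
    optimizer.zero_grad()
    criterion(model(x_few), y_few).backward()
    optimizer.step()
```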
Table 5. Test results of the model ablation experiment.

Ablation module                                 Prediction accuracy
Without signal band tuning module               45.21%
Without topological feature extraction module   55.27%
Without EEGNet module                           51.72%
Without LSTM module                             68.96%
The complete model                              81.42%
4 Conclusion
This paper presents an end-to-end movement prediction model named T3SFNet, which can recognize a subject's motion intents. The proposed approach first divides and reorganizes the input EEG signal to enhance mission-relevant signal bands. The graph structure is then built based on the brain's activation relationships and used to extract topological features. Finally, we extract spatial features using EEGNet and feed them into an LSTM module to better resolve the EEG's propagation in the temporal dimension. The experimental results prove the effectiveness of our method: T3SFNet obtains better within-subject and cross-subject accuracy than the compared methods. The additional transfer experiments also provide a new solution for BCI under limited data.
Acknowledgement. This work was supported by the National Key Research and Development Program of China (No. 2018AAA0102504), the National Natural Science Foundation of China (NSFC) (No. 62003073, No. 62103084, No. 62203089), the Sichuan Science and Technology Program (No. 2021YFG0184, No. 2020YFSY0012, No. 2022NSFSC0890), the Medico-Engineering Cooperation Funds from UESTC (No. ZYGX2021YGLH003, No. ZYGX2022YGRH003), and the China Postdoctoral Science Foundation Program (No. 2021M700695).
References
1. World Health Organization: Spinal Cord Injury, 384 (2013)
2. Chen, X., Chen, D., Chen, C., et al.: The epidemiology and disease burden of traumatic spinal cord injury in China: a systematic review. Chin. J. Evid. Based Med. 18(2), 143–150 (2018)
3. Samejima, S., Khorasani, A., Ranganathan, V., et al.: Brain-computer-spinal interface restores upper limb function after spinal cord injury. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 1233–1242 (2021)
4. Davis, K., Meschede-Krasa, B., Cajigas, I., et al.: Design-development of an at-home modular brain-computer interface (BCI) platform in a case study of cervical spinal cord injury. J. Neuroeng. Rehabil. 19(1), 114 (2022)
5. Zulauf-Czaja, A., Al-Taleb, M., Purcell, M., et al.: On the way home: a BCI-FES hand therapy self-managed by sub-acute SCI participants and their caregivers: a usability study. J. Neuroeng. Rehabil. 18(1), 118 (2021)
6. Burianová, H., Marstaller, L., Rich, A., et al.: Motor neuroplasticity: a MEG-fMRI study of motor imagery and execution in healthy ageing. Neuropsychologia 146, 107539 (2022)
7. Zhou, L., Zhu, Q., Wu, B., et al.: A comparison of directed functional connectivity among fist-related brain activities during movement imagery, movement execution, and movement observation. Brain Res. 1777, 147769 (2022)
8. Ang, K., Chin, Z., Zhang, H., et al.: Filter bank common spatial pattern (FBCSP) in brain-computer interface. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 2390–2397. IEEE (2008)
9. Park, H., Kim, J., Min, B., et al.: Motor imagery EEG classification with optimal subset of wavelet based common spatial pattern and kernel extreme learning machine. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 2863–2866. IEEE (2017)
10. Graimann, B., Allison, B., Pfurtscheller, G.: Brain-computer interfaces: a gentle introduction. In: Graimann, B., Pfurtscheller, G., Allison, B. (eds.) Brain-Computer Interfaces, pp. 1–27. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02091-9_1
11. Lawhern, V., Solon, A., Waytowich, N., et al.: EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces. J. Neural Eng. 15(5), 056013 (2018)
12. Tabar, Y., Halici, U.: A novel deep learning approach for classification of EEG motor imagery signals. J. Neural Eng. 14(1), 016003 (2016)
13. Chen, J., Yi, W., Wang, D., et al.: FB-CGANet: filter bank channel group attention network for multi-class motor imagery classification. J. Neural Eng. 19(1), 016011 (2022)
14. McEvoy, L., Smith, M., Gevins, A.: Dynamic cortical networks of verbal and spatial working memory: effects of memory load and task practice. Cereb. Cortex 8(7), 563–574 (1998)
15. Zhang, Y., Huang, H.: New graph-blind convolutional network for brain connectome data analysis. In: Chung, A.C.S., Gee, J.C., Yushkevich, P.A., Bao, S. (eds.) IPMI 2019. LNCS, vol. 11492, pp. 669–681. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20351-1_52
16. Zhao, M., Yan, W., Luo, N., et al.: An attention-based hybrid deep learning framework integrating brain connectivity and activity of resting-state functional MRI data. Med. Image Anal. 78, 102413 (2022)
17. Li, Y., Zhong, N., Taniar, D., et al.: MCGNet+: an improved motor imagery classification based on cosine similarity. Brain Inform. 9(1), 1–11 (2022)
18. Feng, N., Hu, F., Wang, H., et al.: Motor intention decoding from the upper limb by graph convolutional network based on functional connectivity. Int. J. Neural Syst. 31(12), 2150047 (2021)
19. Hou, Y., Jia, S., Lun, X., et al.: GCNs-Net: a graph convolutional neural network approach for decoding time-resolved EEG motor imagery signals. IEEE Trans. Neural Netw. Learn. Syst. 1–12 (2022)
20. Hamedi, M., Salleh, S., Noor, A.: Electroencephalographic motor imagery brain connectivity analysis for BCI: a review. Neural Comput. 28(6), 999–1041 (2016)
21. Brunner, C., Leeb, R., Müller-Putz, G., et al.: BCI Competition 2008 – Graz Data Set A. Institute for Knowledge Discovery (Laboratory of Brain-Computer Interfaces), Graz University of Technology, vol. 16, pp. 1–6 (2008)
22. Shi, K., Huang, R., Mu, F., et al.: A novel multimodal human-exoskeleton interface based on EEG and sEMG activity for rehabilitation training. In: 2022 International Conference on Robotics and Automation (ICRA), pp. 8076–8082. IEEE (2022)
23. Shi, K., Huang, R., Peng, Z., et al.: MCSNet: channel synergy-based human-exoskeleton interface with surface electromyogram. Front. Neurosci. 15, 704603 (2021)
Joint Trajectory Generation of Obstacle Avoidance in Tight Space for Robot Manipulator

Chunfang Liu(B), Jinbiao Zhang, Pan Yu, and Xiaoli Li

Department of Information, Beijing University of Technology, Beijing, China
{cfliu1985,panyu}@bjut.edu.cn, [email protected]
Abstract. It is difficult for a robot to work in a tight, narrow space with obstacles because collisions may occur. To solve this problem, this paper proposes a joint trajectory generation method for obstacle avoidance. Besides the end-effector, our work plans a collision-free trajectory for each joint in the narrow space. Considering the complexity of the obstacle distribution, the presented method combines Dynamic Movement Primitives (DMP) with the RRT-Connect algorithm: first, in the joint space, DMPs generate trajectories for each manipulator joint; then, in the Cartesian space, a collision detection model checks the DMP-generated trajectories. If any of the links collides with an obstacle, a collision-free path is planned over the trajectory points that encounter obstacles by employing the RRT-Connect algorithm. Based on the ROS platform, the experiments build a tight and narrow simulated environment and test the method on a UR3 robot manipulator, which shows the effectiveness of the presented method.
Keywords: DMP-RRT-Connect · Collision detection · Obstacle avoidance · Narrow space

1 Introduction
As we know, it is tedious and tiring for humans to spend large amounts of time working in small spaces. If we could replace human workers with robotic manipulators for such work, productivity would undoubtedly improve significantly. Researchers have done a lot of work in this field. However, among these works, they either consider only obstacle avoidance between the end-effector of the robotic manipulator and the obstacle, or do not involve obstacle avoidance at all. There are very few studies on robotic manipulators generating trajectories in narrow spaces while considering obstacle avoidance for all links, so this work is both necessary and meaningful. Traditional programming of industrial robots is inefficient and costly, and is not suitable for small spaces and unstructured environments. To enable robots to perform dexterous tasks, researchers have proposed several methods,
one of which is imitation learning. Imitation learning describes the process by which humans or other organisms learn through observation, comprehension, and imitation. In our imitation learning setting, we use DMP to encode, reproduce, and generalize the demonstrated joint angle data. To obtain the postures of the end-effector and the other links of the manipulator in Cartesian space, we use the Denavit-Hartenberg (DH) coordinate transformation to solve the forward kinematics. With these postures, whether the links collide with obstacles in a narrow space can be checked. RRT-Connect is an improved version of RRT. By building search trees from both the start configuration and the goal configuration simultaneously, it greatly improves search efficiency. It is well suited to path planning for multi-degree-of-freedom robots in complex environments and high-dimensional spaces. To make the manipulator work in a narrow space, this paper proposes a method that combines DMP with the RRT-Connect algorithm: DMP is used to learn and reproduce the demonstrated joint angles, and the RRT-Connect algorithm is used to explore a collision-free path. Section 2 introduces work related to manipulator motion planning in small spaces. Section 3 presents the original basic formulation of dynamic movement primitives. Section 4 shows the transformation from joint space to Cartesian space. Section 5 presents the work done for manipulator obstacle avoidance in narrow space. Section 6 presents the related experiments, and Sect. 7 presents the conclusions drawn.
2 Related Works
In this section, some related works on DMP and path planning algorithms are introduced. Based on the idea of learning from biological systems, DMP was first proposed by Stefan Schaal in 2002 [7]. By introducing a task-oriented regression analysis algorithm and relying on model switching [13], You Zhou and Tamim Asfour addressed the poor generalization of a single DMP model in 2017. In the following year, Iman Kardan et al. proposed a fuzzy dynamic movement primitive framework [6] that embeds velocity scaling and also features obstacle avoidance. In 2008, Dae-Hyung Park et al. added a potential field term centered on the obstacle to the DMP model [12]; in this way, DMP can keep the manipulator from colliding with obstacles. However, this method is prone to falling into local minimum traps when there are numerous obstacles. A steering angle was incorporated into the formulation of DMP [5] in 2009 so that the robot can steer around obstacles to avoid collision. For trajectory learning and obstacle avoidance, Yuqiang Jin et al. proposed a method that combines DMP with a Rapidly-exploring Random Tree [14]. Unfortunately, the obstacle avoidance methods proposed in [5,14] consider only the obstacle avoidance of the end-effector and not that of the manipulator's links; as a result, neither method is suitable for narrow spaces. As for path planning, Xue Han et al. proposed a new method for visual servo design based on an improved multi-RRT algorithm [4], which has a much higher convergence speed and shows better performance in narrow spaces compared to
the traditional RRT algorithm. However, the method was not extended to online visual servo controllers. Using the schematic path as the heuristic term and combining non-uniform and uniform sampling to generate PRM stochastic roadmaps offline [10], the algorithm proposed by M.K. Sun et al. can find the final executable path quickly and stably. Kai Cao et al. improved the sampling strategy of PRM and proposed a PRM based on an optimal sampling strategy [1]; its path planning time and success rate are greatly improved compared to the original PRM. For better use of known information during the search process, Chenxi Feng et al. proposed the RRT*-Local Directional Visibility algorithm [3]. It not only improves the node sampling strategy and the algorithm's convergence in narrow channels, but also enhances its planning success rate. Using dynamic repulsive fields and introducing parametric decision forces, Wei Zhang et al. improved the robustness and flexibility of an obstacle avoidance algorithm by filtering preplanned trajectories [15]. By combining the advantages of the original Weighted A* and SMHA* together with adding stagnation detection when expanding nodes [9], the SMHA* algorithm can be used to overcome the difficulties of planning in narrow spaces, greatly improving the efficiency of robotic arm planning there. Based on the traditional A* algorithm, a three-dimensional A* algorithm combined with collision detection [8] was proposed by Yuchen Li et al.; it can be used for motion control in the confined environment of a space station experiment box. Gang Chen et al. proposed a virtual force-based sampling method, designed a connection strategy with ordinary connection and extended component connection, and proposed an improved PRM algorithm [2]. The algorithm can effectively avoid obstacles in cluttered environments.
3 Dynamic Movement Primitives
The reproduction of the original joint angles is based on dynamic movement primitives; here we briefly introduce DMP. The starting point of dynamic movement primitives is to construct a "point of attraction" model using a stable second-order system: when the point of attraction is changed, the final state of the system, i.e., its target position, changes. The stable second-order system used in dynamic movement primitives is a common spring-damper system, expressed as

\ddot{y} = \alpha_y(\beta_y(g - y) - \dot{y}),    (1)
where y denotes the state of the system, \dot{y} and \ddot{y} are the first- and second-order derivatives of the system state y, g represents the target state to which the system finally converges, and \alpha_y and \beta_y are two constants. Although the above formula ensures that the system converges to the target state, it controls neither the learned trajectory nor the rate at which the trajectory changes, while the following formula achieves such an effect:

\tau^2 \ddot{y} = \alpha_y(\beta_y(g - y) - \tau\dot{y}) + f,    (2)
where f is a force term that controls the shape of the trajectory, and \tau denotes a scaling term that controls the convergence rate of the trajectory. Moreover, f is a nonlinear function realized by the normalized superposition of multiple nonlinear basis functions, described as

f(u) = \frac{\sum_{i=1}^{N} \Psi_i(u)\,\omega_i}{\sum_{i=1}^{N} \Psi_i}\, u,    (3)

\tau \cdot \dot{u} = -\alpha_u u,    (4)
where \Psi_i is the ith kernel function (a Gaussian basis function), N is the total number of Gaussian basis functions, and \omega_i denotes the weight coefficient of the corresponding kernel function. u is the canonical value determined by the canonical system in Eq. (4); it eventually decays to 0. The \tau here is the same as the \tau of the DMP.
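For concreteness, Eqs. (2)–(4) can be integrated with simple Euler steps as below; the gains and basis-function placement are typical illustrative choices, not values from the paper.

```python
import numpy as np

def dmp_rollout(y0, g, weights, centers, widths, tau=1.0, dt=0.001,
                alpha_y=25.0, beta_y=6.25, alpha_u=4.0, duration=1.0):
    """Euler integration of one DMP dimension (Eqs. 2-4)."""
    y, yd, u, traj = y0, 0.0, 1.0, []
    for _ in range(int(duration / dt)):
        psi = np.exp(-widths * (u - centers) ** 2)     # Gaussian basis functions
        f = (psi @ weights) / (psi.sum() + 1e-10) * u  # forcing term, Eq. (3)
        ydd = (alpha_y * (beta_y * (g - y) - tau * yd) + f) / tau ** 2  # Eq. (2)
        yd += ydd * dt
        y += yd * dt
        u += (-alpha_u * u / tau) * dt                 # canonical system, Eq. (4)
        traj.append(y)
    return np.array(traj)

# with zero forcing-term weights the rollout converges smoothly from 0 to g
traj = dmp_rollout(y0=0.0, g=1.0, weights=np.zeros(20),
                   centers=np.linspace(0, 1, 20), widths=np.full(20, 100.0))
```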
4 Transformation of Joint Space to Cartesian Space
DH coordinate transformation is a convenient way to map joint space to Cartesian space. Using this method, we can easily obtain the postures of the end-effector and the other links of the manipulator in Cartesian space from the joint angles produced by the DMP. In this paper, we use the Modified Denavit-Hartenberg (MDH) convention to establish the coordinate frame of each link of the UR3 manipulator. Figure 1 shows the relationship between the links of the UR3 built with MDH. Based on the size parameters of the UR3 manipulator, we can obtain its DH parameter table, as shown in Table 1. In Table 1, a_i is the distance along the X_{i-1}-axis, moving from the Z_{i-1}-axis to the Z_i-axis; \alpha_{i-1} is the angle about the X_{i-1}-axis, rotating from the Z_{i-1}-axis to the Z_i-axis; d_i is the distance along the Z_i-axis, moving from the X_{i-1}-axis to the X_i-axis; and \theta_i is the angle about the Z_i-axis, rotating from the X_{i-1}-axis to the X_i-axis.

Table 1. MDH parameters of the UR3 manipulator
Serial number of links ai /mm
Toring angle of link αi−1 /rad
Distance between links di /mm
The corner of joint θi /rad
1 2 3 4 5 6
0 0 −243.65 −213 0 0
0 π/2 0 0 π/2 −π/2
151.9 119.85 0 −9.45 83.4 82.4
θ1 θ2 θ3 θ4 θ5 θ6
According to the relationship between adjacent links, each transform can be expressed as

T_i^{i-1} = \mathrm{Trans}(x, a_{i-1}) \cdot \mathrm{Rot}(x, \alpha_{i-1}) \cdot \mathrm{Trans}(z, d_i) \cdot \mathrm{Rot}(z, \theta_i),    (5)

where i = 1, 2, ..., 6. Expanding Eq. (5) gives

T_i^{i-1} = \begin{bmatrix} \cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\ \sin\theta_i\cos\alpha_{i-1} & \cos\theta_i\cos\alpha_{i-1} & -\sin\alpha_{i-1} & -d_i\sin\alpha_{i-1} \\ \sin\theta_i\sin\alpha_{i-1} & \cos\theta_i\sin\alpha_{i-1} & \cos\alpha_{i-1} & d_i\cos\alpha_{i-1} \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (6)

The transformation matrix of the UR3 can then be calculated as

T_6^0 = T_1^0 \cdot T_2^1 \cdot T_3^2 \cdot T_4^3 \cdot T_5^4 \cdot T_6^5 = \begin{bmatrix} n_x & o_x & a_x & p_x \\ n_y & o_y & a_y & p_y \\ n_z & o_z & a_z & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (7)

Using c_i = \cos\theta_i, s_i = \sin\theta_i, c_{ij} = \cos(\theta_i + \theta_j), s_{ij} = \sin(\theta_i + \theta_j), c_{ijk} = \cos(\theta_i + \theta_j + \theta_k), and s_{ijk} = \sin(\theta_i + \theta_j + \theta_k), Eq. (7) can be written out as

n_x = c_6(s_1 s_5 + c_1 c_{234} c_5) + c_1 s_{234} s_6
n_y = s_1 s_{234} s_6 - c_6(c_1 s_5 - s_1 c_{234} c_5)
n_z = s_{234} c_5 c_6 - c_{234} s_6
o_x = c_1 s_{234} c_6 - s_6(s_1 s_5 + c_1 c_{234} c_5)
o_y = s_6(c_1 s_5 - s_1 c_{234} c_5) + s_1 s_{234} c_6
o_z = -c_{234} c_6 - s_{234} c_5 s_6
a_x = c_1 c_{234} s_5 - s_1 c_5
a_y = s_1 c_{234} s_5 + c_1 c_5
a_z = s_{234} s_5
p_x = (c_1 c_{234} s_5 - s_1 c_5) d_6 + c_1 s_{234} d_5 + s_1 d_4 + c_1 c_{23} a_3 + s_1 d_3 + c_1 c_2 a_2 + s_1 d_2
p_y = (s_1 c_{234} s_5 + c_1 c_5) d_6 + s_1 s_{234} d_5 - c_1 d_4 + s_1 c_{23} a_3 - s_1 d_3 + s_1 c_2 a_2 - c_1 d_2
p_z = s_{234} s_5 d_6 - c_{234} d_5 + s_{23} a_3 + s_2 a_2    (8)

Through the forward kinematic model built with MDH, the postures of the end-effector and the other links of the UR3 manipulator can be calculated. This part prepares for the following collision detection and the search for a collision-free path.
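A NumPy sketch of Eqs. (5)–(7) using the Table 1 parameters is given below; it chains the per-link MDH transform rather than the closed-form expressions of Eq. (8), which it should reproduce numerically.

```python
import numpy as np

# MDH parameters from Table 1: (a [mm], alpha [rad], d [mm]) per link
MDH = [(0.0, 0.0, 151.9), (0.0, np.pi / 2, 119.85), (-243.65, 0.0, 0.0),
       (-213.0, 0.0, -9.45), (0.0, np.pi / 2, 83.4), (0.0, -np.pi / 2, 82.4)]

def mdh_transform(a, alpha, d, theta):
    """Homogeneous link transform of Eq. (6)."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct,      -st,      0.0,  a],
                     [st * ca,  ct * ca, -sa, -d * sa],
                     [st * sa,  ct * sa,  ca,  d * ca],
                     [0.0,      0.0,     0.0,  1.0]])

def forward_kinematics(joint_angles):
    """Chain the six link transforms to obtain T_6^0 (Eq. 7)."""
    T = np.eye(4)
    for (a, alpha, d), theta in zip(MDH, joint_angles):
        T = T @ mdh_transform(a, alpha, d, theta)
    return T

T06 = forward_kinematics([1.56201, -0.87403, 0.75898, -1.45006, 0.477, -0.00696])
```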
Fig. 1. The relationship between each link of UR3 built by MDH
5 Obstacle Avoidance
Our robotic manipulator works in a small space with obstacles. To ensure the safety of the manipulator in the environment during operation, the obstacle avoidance of every link must be taken into consideration. In this section, two obstacle avoidance approaches are introduced: one adds a steering angle to the basic DMP for obstacle avoidance; the other uses the obstacle avoidance idea of the RRT-Connect algorithm.

5.1 DMP Obstacle Avoidance
The idea of coupling-term obstacle avoidance is to calculate the angle between the current velocity vector and the vector pointing from the current end position to the center of the obstacle, and then steer away from it [5]. Figure 2 depicts this process.
Fig. 2. Definition of steering angle ϕ
The steering angle is implemented as follows:

P(x, v) = \gamma R v \varphi \exp(-\beta\varphi),    (9)
where \varphi is the angle between the velocity vector v and the difference vector (o − x) between the coordinates of the obstacle center and the current position, and R is the rotation matrix that determines the direction in which the motion trajectory rotates around the obstacle. The effect of adding the coupling term to the basic DMP formula is shown in Fig. 3. When encountering a single obstacle, DMP avoids it successfully; however, with multiple obstacles it may fail, so this approach alone is not well suited to our work.
Fig. 3. The effect of DMP with the steering-angle coupling term: (a) single obstacle; (b) multiple obstacles (green point is the target point, red point is an obstacle point). (Color figure online)
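A planar sketch of the coupling term in Eq. (9) is shown below; the rotation matrix here is the 2-D 90° rotation, and the gains γ and β are typical values from the steering-angle literature [5], not from this paper.

```python
import numpy as np

def steering_term(x, v, o, gamma=1000.0, beta=20.0 / np.pi):
    """Obstacle-avoidance coupling term P(x, v) of Eq. (9) in 2-D."""
    if np.linalg.norm(v) < 1e-10:
        return np.zeros(2)  # no steering without motion
    d = o - x  # vector from the current position to the obstacle centre
    cos_phi = v @ d / (np.linalg.norm(v) * np.linalg.norm(d))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))   # steering angle
    R = np.array([[0.0, -1.0], [1.0, 0.0]])        # 90-degree rotation matrix
    return gamma * (R @ v) * phi * np.exp(-beta * phi)

p = steering_term(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 0.2]))
```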
5.2 RRT-Connect Obstacle Avoidance
RRT-Connect. The RRT-Connect algorithm is an improved version of the RRT algorithm and is more efficient for path planning.
The search process of RRT-Connect is shown in Fig. 4. The basic idea of the RRT-Connect algorithm is to grow two rapidly-exploring random trees simultaneously, one from the start point and one from the end point.
Fig. 4. The search process of RRT-Connect (red point is starting point while green point is end point) (Color figure online)
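The bidirectional growth can be sketched as follows; this simplified version extends both trees toward each random sample and omits the greedy connect step and path extraction of the full algorithm.

```python
import numpy as np

def rrt_connect(q_start, q_goal, collision_free, step=0.05, iters=5000):
    """Grow two trees, one from each end, until they meet (simplified sketch)."""
    trees = [[(np.asarray(q_start, float), None)], [(np.asarray(q_goal, float), None)]]
    rng = np.random.default_rng(0)
    for _ in range(iters):
        q_rand = rng.uniform(-np.pi, np.pi, size=len(q_start))  # sample joint space
        for tree in trees:
            i_near = min(range(len(tree)),
                         key=lambda i: np.linalg.norm(tree[i][0] - q_rand))
            q_near = tree[i_near][0]
            direction = q_rand - q_near
            q_new = q_near + step * direction / (np.linalg.norm(direction) + 1e-10)
            if collision_free(q_new):
                tree.append((q_new, i_near))  # store node with its parent index
        # if the two newest nodes are close enough, the trees have met;
        # backtracking the parent indices then yields the path
        if np.linalg.norm(trees[0][-1][0] - trees[1][-1][0]) < step:
            return trees
    return None

trees = rrt_connect(np.zeros(6), np.full(6, 1.0), lambda q: True)
```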
Collision Detection. The collision detection of the manipulator includes self-collision detection between the links and collision detection between all links and the environment. For each link of the manipulator, links that are far away or unlikely to cause collisions are marked as collision-free, and we do the same for collisions between the robot's links and the environment. Because of the complex surface shapes of the links and of the obstacles in the environment, direct detection would be difficult, so the links and the obstacles are each encapsulated in hierarchical bounding boxes to reduce the complexity of collision detection. In this paper, we construct a bounding box for each link of the UR3 manipulator and, based on these, a hierarchical bounding box for the whole UR3; corresponding bounding boxes are also built for the obstacles in the environment. Figure 5 shows the hierarchical bounding boxes of the manipulator in different postures. This section uses the Flexible Collision Library (FCL) [11] for collision detection; the flow chart of collision detection is shown in Fig. 6. FCL checks whether there is a collision between the bounding boxes. If not, no collision occurs; otherwise, we check whether the bounding boxes overlap. If they do not, there is no collision; if they do, we assert that a collision happens.
Fig. 5. Hierarchical surround box of UR3
Fig. 6. Flow chart of collision detection
The results of the collision detection are displayed in Fig. 7. When a collision appears, the corresponding link turns red.
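Using the python-fcl bindings (an assumption: the paper cites FCL [11] but does not name the bindings), a single box-box check looks like this; the box sizes and obstacle pose are illustrative.

```python
import numpy as np
import fcl

# enclose one link and one obstacle in simple boxes (illustrative dimensions, metres)
link = fcl.CollisionObject(fcl.Box(0.05, 0.05, 0.24), fcl.Transform())
obstacle = fcl.CollisionObject(fcl.Box(0.10, 0.10, 0.10),
                               fcl.Transform(np.array([0.04, 0.73, 1.22])))

request, result = fcl.CollisionRequest(), fcl.CollisionResult()
fcl.collide(link, obstacle, request, result)  # fills `result`
print(result.is_collision)  # True if the bounding boxes overlap
```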
Fig. 7. The results of collision detection (Color figure online)
Track Correction. Although the basic DMP formulation can reproduce and generalize the original trajectory, it has no ability to actively avoid obstacles, which greatly limits the flexibility and safety of the robot in environments with obstacles. To solve this problem, the RRT-Connect algorithm is selected in this section to correct the trajectory points generated by the DMP model, achieving active obstacle avoidance. The specific search process of the algorithm is shown in Fig. 8.
Fig. 8. Correction of DMP node
Based on the trajectory generated by the DMP model, collision detection is performed for each node and between adjacent nodes. Assume that the nodes causing collisions of the robot arm are adjacent: let the first colliding node be q_1 and the last be q_n. We remove the nodes between q_1 and q_n from the original path, take the node q_0 just before q_1 as the starting node q_start of the RRT-Connect algorithm, and take the node q_{n+1} just after q_n as its ending node q_end. A collision-free path is then obtained by exploring from the two nodes simultaneously, as sketched below.
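The splice itself reduces to a few lines; the helper names below are hypothetical, and the code assumes, as the text does, that the colliding nodes form one contiguous run strictly inside the path.

```python
def correct_path(path, in_collision, plan_rrt_connect):
    """Replace the colliding segment of a DMP path with an RRT-Connect detour.

    path: list of joint configurations from the DMP.
    in_collision(q): collision test for one configuration (hypothetical helper).
    plan_rrt_connect(q_start, q_end): collision-free sub-path incl. endpoints
        (hypothetical helper wrapping the planner above).
    """
    bad = [i for i, q in enumerate(path) if in_collision(q)]
    if not bad:
        return path  # nothing to correct
    first, last = bad[0], bad[-1]                      # q_1 ... q_n in the text
    q_start, q_end = path[first - 1], path[last + 1]   # q_0 and q_{n+1}
    detour = plan_rrt_connect(q_start, q_end)
    # keep the path up to q_0, insert the detour interior, resume at q_{n+1}
    return path[:first] + detour[1:-1] + path[last + 1:]
```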
6 Simulation on ROS
In this part, a UR3 manipulator model is used to run all the experiments on the ROS platform. The simulation experiments include the acquisition of demonstration joint angle data, a simulation experiment of DMP-RRT-Connect, and trajectory generation for obstacle avoidance in narrow space.

6.1 Acquisition of Demonstration Joint Angle Data
In this paper, we use the mouse to drag the interactive marker at the end of the manipulator in the rviz interface to perform the entire demonstrated action; the process is shown in Fig. 9. We set the joint angles of the initial and final demonstration states of the UR3 manipulator to (1.56201, −0.87403, 0.75898, −1.45006, 0.477, −0.00696) and (1.56201, −0.87403, 0.75898, −1.45006, 0.477, −0.00696); the recorded joint angles are in radians. The UR3 manipulator is then dragged from the initial state to the goal state to complete the whole action on a plane.
Fig. 9. The whole demonstrated action of UR3 mechanical manipulator
6.2 Experiment of DMP-RRT-Connect
To verify the effectiveness of DMP-RRT-Connect, we compare it with RRT-Connect and DMP. We add a cubic obstacle with a length, width, and height of 0.1 m to the original taught trajectory. Figure 10 shows the different performances of the above methods on path planning. Clearly, DMP cannot avoid the obstacle and the trajectory generated by RRT-Connect is too long, while with DMP-RRT-Connect we not only obtain a similar trajectory but also make every link avoid the obstacles. This facilitates further study of obstacle avoidance trajectories in tight spaces.
Fig. 10. Path planning performance of (a) RRT-Connect, (b) DMP, and (c) DMP-RRT-Connect.
6.3 Trajectory Generation of Obstacle Avoidance in Narrow Space
Finally, we use the UR3 manipulator to complete an obstacle avoidance action in a narrow space. One of the experimental scenarios built is shown in Fig. 11. A semi-enclosed small-space model with openings on both sides is placed at position (0.04, 0.73, 1.22) in the scene, and several small objects are also placed in this small-space model.
Fig. 11. The experiment scenario
After the trajectory generated by the DMP-RRT-Connect model is applied to the UR3 manipulator, the obtained obstacle avoidance effect is shown in Fig. 12. Every link of the UR3 manipulator successfully avoids collisions with the environment during the whole process. The top two pictures of Fig. 13 show the change in each joint angle of the manipulator when obstacle avoidance is not considered, and the bottom two pictures show the change when it is considered. From Fig. 13(c) and (d), we can see that, after processing by the DMP-RRT-Connect model, the corrected joint angles fluctuate more than the original joint angles because obstacle avoidance is taken into account.
Fig. 12. The performance of obstacle avoidance in wiping task
Fig. 13. The change in each joint. (a and b represent the change in the first three joint angles and the change in the last three joint angles in the original joint angles, respectively. c and d represent the change in the first three joint angles and the change in the last three joint angles in the modified joint angles, respectively.)
7 Conclusion
In this paper, we have proposed a DMP-RRT-Connect model to solve the problem of manipulator trajectory planning in tight space. DMP learns the motion characteristics of the demonstrated joint angle data to reproduce and generalize
the original trajectory. For the trajectory data generated by DMP, we use the FCL package to detect whether each link of the manipulator collides with obstacles in the narrow space. When a collision is detected, RRT-Connect corrects the trajectory data, and trajectories that take the obstacle avoidance of all links into account are obtained in tight space. Simulation on the ROS platform shows that DMP-RRT-Connect enables the UR3 manipulator to generate trajectories in narrow spaces successfully; moreover, it avoids collisions between all links of the UR3 manipulator and the obstacles in the tight space. In this work, we use DMP to learn the characteristics of joint angles. However, when generalizing to different poses in Cartesian space, trajectories with high similarity to the original trajectory may not be obtained. In the future, to meet the requirements of different tasks in different locations, we will use DMP-RRT-Connect to learn the characteristics of the posture information, with the RRT-Connect algorithm correcting the trajectories generated by DMP.
Acknowledgment. The work was jointly supported by Beijing Natural Science Foundation (4212933), Scientific Research Project of Beijing Educational Committee (KM202110005023) and National Natural Science Foundation of China (62273012, 62003010).
References
1. Cao, K., Cheng, Q., Gao, S., Chen, Y., Chen, C.: Improved PRM for path planning in narrow passages. In: 2019 IEEE International Conference on Mechatronics and Automation (ICMA), pp. 45–50. IEEE (2019)
2. Chen, G., Luo, N., Liu, D., Zhao, Z., Liang, C.: Path planning for manipulators based on an improved probabilistic roadmap method. Robot. Comput.-Integr. Manuf. 72, 102196 (2021)
3. Feng, C., Wu, H.: Accelerated RRT* by local directional visibility. arXiv preprint arXiv:2207.08283 (2022)
4. Han, X., Wang, T.T., Liu, B.: Path planning for robotic manipulator in narrow space with search algorithm. In: 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM) (2017)
5. Hoffmann, H., Pastor, P., Park, D.H., Schaal, S.: Biologically-inspired dynamical systems for movement generation: automatic real-time goal adaptation and obstacle avoidance. In: IEEE International Conference on Robotics and Automation, ICRA 2009 (2009)
6. Kardan, I., Akbarzadeh, A., Mohammadi Mousavi, A.: Real-time velocity scaling and obstacle avoidance for industrial robots using fuzzy dynamic movement primitives and virtual impedances. Ind. Robot 45(1), 110–126 (2018)
7. Schaal, S.: Dynamic movement primitives - a framework for motor control in humans and humanoid robotics. In: Kimura, H., Tsuchiya, K., Ishiguro, A., Witte, H. (eds.) Adaptive Motion of Animals and Machines, pp. 261–280. Springer, Tokyo (2006). https://doi.org/10.1007/4-431-31381-8_23
8. Lai, Y., Jiang, Q., Wang, H.-S.: Adaptive trajectory planning study of robotic arm in confined space. Electron. Compon. Inf. Technol. (2021)
9. Mi, K., Zheng, J., Wang, Y., Jianhua, H.: A multi-heuristic A* algorithm based on stagnation detection for path planning of manipulators in cluttered environments. IEEE Access 7, 135870–135881 (2019)
10. Mingjing, S., Qixin, C., Xiuchang, H., Xiang, L., Xiaoxiao, Z.: Fast and stable planning algorithm for narrow space of robotic arm. Mech. Des. Res. 35(06), 67 (2019)
11. Pan, J., Chitta, S., Manocha, D.: FCL: a general purpose library for collision and proximity queries. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2012)
12. Park, D.H., Hoffmann, H., Pastor, P., Schaal, S.: Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In: 8th IEEE-RAS International Conference on Humanoid Robots 2008 (2008)
13. You, Z., Asfour, T.: Task-oriented generalization of dynamic movement primitive. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2017)
14. Yuqiang, J., Xiang, Q., Andong, L., Wenan, Z.: DMP-RRT based robot arm trajectory learning and obstacle avoidance method. Syst. Sci. Math. 42(2), 13 (2022)
15. Zhang, W., Cheng, H., Hao, L., Li, X., Gao, X.: An obstacle avoidance algorithm for robot manipulators based on decision-making force. Robot. Comput.-Integr. Manuf. 71, 102114 (2021)
Manipulating Adaptive Analysis and Performance Test of a Negative-Pressure Actuated Adhesive Gripper

Jiejiang Su(B), Huimin Liu, Jing Cui, and Zhongyi Chu

Beijing University of Technology, Beijing 101100, China
[email protected]
Abstract. Flexible printed circuit boards (FPCBs) are widely used in the electronic information industry. Because of their varied shapes and porous structure, traditional vacuum suction suffers from problems such as leakage. Dry adhesive technology has strong potential to replace existing techniques since it provides stable handling. To this end, this paper proposes a negative pressure-driven adhesion gripper based on an annular wedge-shaped microstructure. The deformable chamber is designed to be cylindrical, which provides shear loads for the annular wedge-shaped dry adhesive. Adhesion switching via shear loads and pick-and-place operations for FPCBs can thus be realized. During these operations, the contact state is critical for adhesion performance. To achieve better contact and stability, this paper focuses on the plane adaptation technology of the negative pressure-actuated adhesion gripper: a corrugated connector is modeled and mounted in the gripper. A series of experiments demonstrates that adhesion becomes more stable and the adhesion force increases by more than 20% with the designed connector. The negative pressure-actuated adhesive gripper designed in this paper can achieve reliable picking and release of flat, flexible, and thin objects such as flexible printed circuit boards.

Keywords: Adhesive gripper · Flexible structure · Leveling technology · Controllable adhesion
1 Introduction

With the development of the precision industry (including semiconductors, displays, solar cells, etc.), electronic products continue to develop in the direction of small volume, high density, multiple levels, and high flexibility. Therefore, there is growing demand for the reliable handling of porous, fragile, and soft targets in industry. Traditional operation methods, including rigid grasping and vacuum adhesion [1–4], easily cause damage or vacuum leakage. In contrast, dry adhesion based on van der Waals forces works in a vacuum, does not damage the object's surface, and leaves no residue, which makes the reliable handling of target objects possible. However, present dry adhesives are mainly non-directional [5–8]: they require a large preload to achieve adhesion and are not easy to detach.
Directional dry adhesives use asymmetric microstructures whose adhesion can be switched by regulating the contact area with directional lateral shearing [9–15]. Thus, they require little force to attach and detach easily, making them very suitable for the non-destructive picking and releasing of thin, soft, and fragile objects. The most representative is the wedge microstructure, and several grippers of this kind have been demonstrated. Jiang et al. assembled a robotic device using tendons to transmit force and generate shear load in the adhesives [16]. Although tendon mechanisms distribute the forces well, they can loosen over time and are difficult to calibrate; moreover, in the loaded state, the tendon between rigid tiles causes a moment, and the resulting peeling force causes the adhesive to detach. Tao fabricated wedge-structured dry adhesive surfaces using an ultraprecision diamond cutting method and assembled a robotic device in which a driven sliding block generates tangential displacement to load the adhesives [17]. However, the flatness of the symmetrically distributed adhesive pads on the same plane is difficult to guarantee, causing loading failure of part of the structure. To address many of these challenges, we introduce negative pressure to actuate an annular wedge-shaped microstructure by designing the deformation chamber with a cylindrical shape. Under negative-pressure actuation, the contraction deformation provides a stable load for the annular wedge-shaped microstructures, allowing a single dry adhesive pad to engage or disengage. However, due to assembly errors of the gripper, the simple cylindrical chamber design gives the adhesive gripper poor plane adaptability, which cannot ensure intimate contact between the gripper and the substrate surface. This paper focuses on the plane adaptation of a negative-pressure actuated adhesive gripper by designing corrugated connectors so that the annular wedge-shaped microstructures can fully contact the adhered object, improving the stability of the adhesive gripper.
2 Negative Pressure-Actuation Adhesion Gripper and Its Working Principle

2.1 The Working Principle of the Operator

A negative-pressure-actuated adhesive gripper, as shown in Fig. 1, includes an aluminum alloy joint, a deformation chamber, a deformation layer, and annular wedge-shaped microstructures integrated at the bottom. The cylindrical deformation chamber is connected to the tube through the aluminum alloy joint, the bottom of the deformation chamber carries the deformation layer, and the annular wedge-shaped microstructures are distributed in an annular array around the center of the circle. Compared with the conventional parallel array of wedge-shaped microstructures, the annular wedge-shaped microstructures can be used independently as an adhesion unit, since they can be loaded uniformly and symmetrically without needing to be used in pairs. The gripper works as follows: (1) At atmospheric pressure, the deformation chamber is under pre-pressure, only the tips of the wedges contact the surface, and the van der Waals adhesion is negligible. (2) Under negative pressure, the deformation chamber contracts, actuating the wedges to bend over; the contact area increases dramatically, increasing the adhesion, as shown in Fig. 2.
Fig. 1. Annular wedge-shaped microstructures with negative-pressure actuation gripper
(3) After the negative pressure is removed, the elastic deformation of the deformation chamber and the deformation layer recovers by itself; only the tips contact the adhered object again, and the adhesion becomes negligible.
Fig. 2. Schematic diagram of negative-pressure actuation chamber operation
2.2 Planar Adaptation Analysis of Adhesive Gripper

It has been shown that the normal force generated by the wedge-shaped microstructures is proportional to the contact area, meaning a fully engaged adhesive can support large loads. From the microscopic point of view, the length of a single wedge-shaped microstructure is about 80 µm, and the height difference of the microstructure before and after loading is about 35 µm, as shown in Fig. 3a. Assuming the diameter of the deformation chamber is 20 mm, when there is an angle of 0.1° between the microstructures at
the bottom of the deformation chamber and the adhered object, as shown in Fig. 3b, trigonometry shows that a height difference of about 35 µm arises between the two sides of the driving chamber. This means that for the annular wedge-shaped microstructure to be in full contact with the adhered object, very high flatness between the two surfaces is required; in practice, this requirement is difficult to meet because of assembly errors and the limited flatness of the adhered object. Although the deformation chamber is made of silicone rubber, a super-elastic material that can ensure full contact between the annular wedge-shaped microstructures and the adhered object under a pre-pressure of more than 10 N, the pre-pressure disappears when the adhered object is about to be lifted; the adhesion force generated by the annular wedge-shaped microstructures is then neither symmetric nor uniform, producing a peeling moment that leads to adhesion failure. Therefore, a leveling device is needed to ensure that the annular wedge-shaped microstructures are in full contact with the adhered object.
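As a quick check of these numbers (the symbol $\Delta h$ is introduced here only for illustration): a misalignment of $\theta = 0.1^\circ$ across a chamber of diameter $D = 20\,\mathrm{mm}$ gives

$\Delta h = D \tan\theta = 20\,\mathrm{mm} \times \tan 0.1^\circ \approx 20\,\mathrm{mm} \times 1.75 \times 10^{-3} \approx 35\,\mu\mathrm{m}$,

which matches the 35 µm engagement window of the microwedges.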
Fig. 3. (a) With microwedges, there is a 35 µm window between no engagement with the surface (left) and full engagement (right). (b) This corresponds to 0.1° of misalignment over a 20 mm patch.
3 Design of Negative Pressure-Actuation Adhesive Gripper Leveling Device

3.1 Leveling Device Design

The leveling device designed in this paper is shown in Fig. 4a. The original deformation chamber is improved by changing its connection to the rigid pneumatic joint from a direct one to a transition through a corrugated structure and a spring buffer support. The adaptive corrugated structure is elastic and deforms elastically under very small pre-pressure, which helps the annular wedge-shaped microstructures at the bottom of the deformation chamber passively adapt to tilt angles in different directions, as shown in Fig. 4b.
In the negative pressure state, a simple bellows structure would shrink and deform, separating the deformation chamber from the adhered object and causing adhesion failure. For this reason, the design further introduces a spring buffer bracket and changes the fixed connection into a floating connection, so that the compression of the spring buffer bracket can compensate for the shrinkage of the adaptive bellows structure when negative pressure is applied, as shown in Fig. 4c. The spring buffer bracket is selected according to the magnitude of the pre-pressure and the deformation of the corrugated structure under different negative pressures: it is sufficient that the compression of the spring can compensate for the shrinkage of the bellows and that the pre-pressure remains within the required range after this compensation. In addition, since the device can float up and down, the required control accuracy of the operating device on the pre-pressure is relaxed, improving the robustness of the operation and avoiding excessive pre-pressure that could damage the adhered object.
Fig. 4. Schematic diagram of leveling device (a) initial state, (b) preload applied on top of the deformation chamber, (c) negative pressure applied on the adhesive gripper.
3.2 Simulation Analysis and Design of the Corrugated Structure

Since the spring buffer support is a rigid support that does not affect the loading of the deformation chamber, this section uses Abaqus simulations to select parameters by studying the shear load that the deformation chamber can provide, after integrating the adaptive bellows structure, under the simultaneous action of pre-pressure and pressure difference. The deformation chamber and bellows structure are mainly made of PlatSil 73-40 silicone rubber, so uniaxial tensile experiments were performed on this material, and the experimental data were then fitted with a polynomial model. The comparison shows that the polynomial model accurately describes the material's tensile properties, and it is used in the subsequent simulations to analyze the deformation chamber and the bellows structure.
The experimentally obtained stress-strain data are substituted into the polynomial model as follows:

W = \sum_{i+j=1}^{N} C_{ij}\,(I_1 - 3)^i (I_2 - 3)^j + \sum_{K=1}^{N} \frac{1}{D_K}\,(J - 1)^{2K}    (1)

where C_{ij} and D_K are material parameters, I_1 and I_2 are the strain invariants, and J is the volume ratio.
Fitting was performed and the model parameters are shown in Table 1.

Table 1. Polynomial model coefficients

Principal equation          Material constant   Value
Polynomial model (N = 2)    C10                 −0.0881 MPa
                            C01                 0.3911 MPa
                            C20                 −0.0799 MPa
                            C11                 0.40483447 MPa
                            C02                 0.224646 MPa
                            D1                  0 MPa⁻¹
                            D2                  0 MPa⁻¹
The simulation software cannot directly simulate the van der Waals force between the annular wedge-shaped microstructures and the adhered object, so the model is simplified by removing the annular wedge-shaped microstructures at the bottom of the deformation chamber; the friction shear stress between the bottom of the deformation chamber and the adhered object is then computed to judge whether the loading force available to the annular wedge-shaped microstructures is adequate. To quantify the normal adhesion force generated by the annular wedge-shaped microstructures at the bottom of the deformation chamber, the relationship between the shear load and the normal adhesion force of a linear array of wedge-shaped microstructures was measured, and multiple sets of experimental data were fitted to obtain the relationship in Fig. 5. The friction shear stress at the bottom of the deformation chamber is then substituted into the fitted curve to obtain the local normal adhesion force, and finally the normal adhesion force of the whole annular wedge-shaped microstructure is obtained by summing over all grid points. In the structural design of the bellows, the number of layers greatly influences the leveling performance and the stability after picking, so this paper first simulates the influence of the number of bellows layers on the adhesion performance. The effect of single-layer (SL), double-layer (DL), and triple-layer (ML) bellows structures on the friction force at the bottom of the actuator is simulated and analyzed, as shown in Fig. 6.
Fig. 5. Relationship between shear load and normal adhesion force
Fig. 6. Stress diagram of corrugated tube structure with different layers of bellows
Fig. 7. Relationship between the number of bellows layers and the friction shear stress
The friction shear stress distribution at the adhered interface of the deformation chamber, shown in Fig. 7, is substituted into the above relationship between shear load and normal adhesion force for integration, and the normal adhesion force generated by the annular wedge-shaped microstructures at the bottom of the deformation chamber integrated with each bellows structure is obtained, as shown in Fig. 8. The results show that the deformation chamber with the integrated double-layer bellows generates the largest adhesion force, so the double-layer structure is preferred in the later simulation experiments.
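The summation described above is straightforward to automate. The following is a minimal Python sketch of that post-processing step, assuming the interfacial shear stresses exported from the simulation and the fitted shear-to-adhesion curve of Fig. 5; the calibration values, fit order, and function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Assumed calibration samples of the Fig. 5 shear-load vs. normal-adhesion
# relationship (placeholder values, not the measured data).
shear_kpa = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
adhesion_kpa = np.array([0.0, 0.8, 2.1, 3.6, 5.0])
fit = np.polyfit(shear_kpa, adhesion_kpa, deg=2)  # fitted curve of Fig. 5

def total_normal_adhesion(tau, area):
    """Sum the fitted local adhesion stress over all interface grid points.

    tau  : friction shear stress at each grid point (simulation field output)
    area : surface area associated with each grid point
    """
    sigma = np.polyval(fit, tau)      # local normal adhesion stress
    return float(np.sum(sigma * area))  # integrate over the interface
```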
Fig. 8. Normal adhesion force generated by annular wedge-shaped microstructures at the bottom of the deformation chamber
In the simulation experiments, the diameter of the deformation chamber is 20 mm and the wall thickness is 5 mm. To investigate the effect of different double-layer corrugation sizes on the adhesion performance of the deformation chamber, bellows with diameters of 12 mm (DL12), 15 mm (DL15), and 20 mm (DL20) were integrated into the deformation chamber for simulation, as shown in Fig. 9; the simulation results are shown in Fig. 10.
Fig. 9. Schematic diagram of bellows connections with different diameters
Fig. 10. Parametric study of the effect of the diameter on the friction shear stress distribution at the adhered interface
The friction shear stress distribution at the bottom of the deformation chamber shown in Fig. 10 is substituted into the above relationship between shear load and normal adhesion force for integral summation, and the normal adhesion force generated by the annular wedge-shaped microstructures at the bottom of the deformation chamber is calculated separately for each integrated bellows diameter, as shown in Fig. 11. The results in Fig. 11 show that, for a 20 mm diameter deformation chamber, the double-layer bellows structure with a 15 mm diameter gives the best adhesion performance.
Fig. 11. Effect of bellows with different diameters on adhesion force
4 Negative Pressure Actuation Adhesion Gripper Performance Test

4.1 Experimental System Composition

To test the adhesion performance of the negative-pressure-actuated adhesion gripper, the test system shown in Fig. 12 was built. The system consists of a motion module, a weight module, a sensing module, and a leveling module. The motion module mainly consists of a z-axis displacement platform, which actuates the adhesive gripper to pick up and release the adhered object under negative pressure. The weight module is a weight box used to test the maximum adhesion force of the gripper. The sensing module is a column pressure sensor used to detect the pre-pressure and the mass of the adhered object. The leveling module is a triangle leveling instrument used to ensure parallelism between the end of the adhesive gripper and the adhered object.
Fig. 12. Diagram of the experimental setup for adhesion force measurement
4.2 Adhesion Performance Test

To investigate the adhesion performance of the negative-pressure-driven adhesive gripper after integrating the bellows structure, a deformation chamber with a diameter of 20 mm and a height of 8 mm was selected and integrated with double-layer corrugated structures of 15 mm and 20 mm diameter, and three sets of samples were made for testing. Before each experiment, the bottom of the deformation chamber was brought into full contact with the surface of the weight box using the triangle leveler, and negative pressure was applied after a pre-pressure of 2.5 N. The measured adhesion forces are shown in Fig. 13. The adhesion forces of DL15 and DL20 at −40 kPa
calculated by simulation are 165 g and 125 g, while the actual measurements for DL15 and DL20 are 130 g and 90 g; the simulation thus correctly reflects the relative adhesion strength of the deformation chambers.
Fig. 13. Measured weight-lifting test results
To compare the stability of the adhesion performance of the negative-pressure-actuated adhesive gripper before and after integrating the bellows structure, a deformation chamber with a diameter of 20 mm and a height of 8 mm was first tested 20 times at a negative pressure of −40 kPa. The same deformation chamber was then integrated with double-layer bellows structures of 15 mm and 20 mm diameter and tested 20 more times following the same procedure; the results are shown in Fig. 14. The variance values in Table 2 show that integrating the leveling device significantly improves the stability of the adhesion performance of the deformation chamber, and the adhesion force of the DL15 model is improved by nearly 30% compared with the deformation chamber without the leveling device.
Fig. 14. Comparison of the adhesion performance of the adhesive gripper before and after the integrated bellows structure
Table 2. Adhesion value analysis results

            None (g)   DL15 (g)   DL20 (g)
Mean        93.5       123.5      99.5
Variance    242.75     72.75      54.75
To verify the plane adaptation ability of the negative-pressure-actuated adhesion gripper, a deformation chamber with a diameter of 20 mm and a height of 8 mm was tested at −40 kPa. First, the triangle leveling instrument was used to bring the bottom of the deformation chamber into full contact with the weight box, and its adhesion force was measured. The angle between the deformation chamber and the adhered object was then adjusted to 1° and 2°, and the mass of the lifted weight was recorded to test its adaptability; the results are shown in Fig. 15. As Fig. 15 shows, without the leveling device even a small tilt angle causes a sharp decline in the adhesion force of the pressure-actuated adhesion gripper, or even adhesion failure. The leveling device improves the plane adaptation ability of the gripper, and the loss of adhesion force is kept within 10% when adapting to angles of 2° or less.
Fig. 15. Adaptability to angles with integrated bellows structure
4.3 Pick-and-Release Operation Experiment

To verify the adhesion performance of the deformation chamber with the leveling device added, the negative-pressure-actuated adhesion gripper was integrated onto the end of a SCARA arm (YAMAHA YK400XE) as a transfer system. The gripper is operated by the SCARA arm and picks up objects directly at an average speed of 60 mm/s. To verify the leveling performance of the leveling device, the adhered object was placed on a table with an inclination of 1° or 2° for pickup. As shown in Figs. 16(a) to (d), the gripper can steadily pick up thin, soft, and fragile objects such as FPCBs, glass, PCBs, and wafers.
This proves that the leveling device helps the negative-pressure-actuated adhesive gripper adapt to the tilt angle and ensures that the annular wedge-shaped microstructures at the bottom of the deformation chamber are in full contact with the adhered object, thus ensuring stable adhesive performance and relaxing the environmental requirements of the gripper.
Fig. 16. Grasping demonstrations of the gripper for flat objects such as (a) FPCBs, (b) glass plates, (c) PCB, and (d) wafer.
5 Conclusion

To improve the stability of the negative-pressure-actuated adhesive gripper, a passive leveling device is designed in this paper to help the gripper adapt to the angle between itself and the adhered object, which ensures the contact area between its bottom annular wedge-shaped microstructures and the adhered object. The device exploits the deformation property of the bellows structure, which deforms elastically at low pre-pressure and thereby helps the deformation chamber adapt to the angle of the adhered object. Because the bellows structure and the deformation chamber contract to different degrees under negative pressure, a spring buffer support is introduced: the spring compression built up in the early stage compensates for this contraction in the later stage, so the gripper maintains full contact with the adhered object under both pre-pressure and negative pressure. The planar adaptation analysis in this paper demonstrates the importance of the leveling device for the negative-pressure-actuated adhesive gripper and provides a novel solution for keeping the annular wedge-shaped microstructures in full contact with the adhered object. The leveling device stabilizes the adhesion force by securing the contact area of the annular wedge-shaped microstructures at the bottom of the deformation chamber, and the adhesion force of the same deformation chamber increases by about 30% with the leveling device integrated. Moreover, the leveling device further improves the plane adaptability of the gripper: when adapting to angles of 2° or less, the loss of adhesion force stays below 10%, which greatly relaxes the operating conditions of the negative-pressure-actuated adhesion gripper.

Acknowledgments. This project was supported in part by NSFC under Grants (51975021, U1913206).
References

1. Borisov, I.I., Borisov, O.I., Gromov, V.S., Vlasov, S.M., Kolyubin, S.A.: Versatile gripper as key part for smart factory. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 476–481 (2018). https://doi.org/10.1109/ICPHYS.2018.8390751
2. Laliberté, T., Birglen, L., Gosselin, C.: Underactuation in robotic grasping hands. Japan. J. Mach. Intell. Robot. Control 4, 77–87 (2002)
3. Sandoval, J., Jadhav, S., Quan, H., Deheyn, D., Tolley, M.: Reversible adhesion to rough surfaces both in and out of the water, inspired by the clingfish suction disc. Bioinspir. Biomim. 14, 066016 (2019). https://doi.org/10.1088/1748-3190/ab47d1
4. Lien, T., Davis, P.G.G.: A novel gripper for limp materials based on lateral Coanda ejectors. CIRP Ann. Manuf. Technol. 57, 33–36 (2008). https://doi.org/10.1016/j.cirp.2008.03.119
5. Bae, W.-G., Kim, D., Suh, K.-Y.: Instantly switchable adhesion of bridged fibrillar adhesive via gecko-inspired detachment mechanism and its application to a transportation system. Nanoscale 5, 11876–11884 (2013). https://doi.org/10.1039/c3nr02008h
6. Yuan, L., Wang, Z., Li, Y., Wu, T.: Reusable dry adhesives based on ethylene vinyl acetate copolymer with strong adhesion. J. Appl. Polym. Sci. 136, 47296 (2018)
7. Sarikaya, R., et al.: Bioinspired multifunctional adhesive system for next generation bio-additively designed dental restorations. J. Mech. Behav. Biomed. Mater. 113, 104135 (2021). https://doi.org/10.1016/j.jmbbm.2020.104135
8. Paretkar, D., Kamperman, M., Schneider, A., Martina, D., Creton, C., Arzt, E.: Bioinspired pressure actuated adhesive system. Mater. Sci. Eng. C 31, 1152–1159 (2011). https://doi.org/10.1016/j.msec.2010.10.004
9. Parness, A., et al.: A microfabricated wedge-shaped adhesive array displaying gecko-like dynamic adhesion, directionality, and long lifetime. J. R. Soc. Interface 6, 1223–1232 (2009). https://doi.org/10.1098/rsif.2009.0048
10. Day, P., Eason, E., Esparza, N., Christensen, D., Cutkosky, M.: Microwedge machining for the manufacture of directional dry adhesives. J. Micro Nano-Manuf. 1, 011001 (2013). https://doi.org/10.1115/1.4023161
11. Zhou, T., Ruan, B., Che, J., Li, H., Chen, X., Jiang, Z.: Gecko-inspired biomimetic surfaces with annular wedge structures fabricated by ultraprecision machining and replica molding. ACS Omega 1, 011001 (2021). https://doi.org/10.1021/acsomega.0c05804
12. Wang, W., Xie, Z.: Fabrication of a biomimetic controllable adhesive surface by ultraprecision multistep and layered scribing and casting molding. Sci. China Technol. Sci. 64(8), 1814–1826 (2021). https://doi.org/10.1007/s11431-020-1801-9
13. Jiang, H., et al.: Scaling controllable adhesives to grapple floating objects in space. In: Proc. IEEE Int. Conf. Robot. Autom. 2015, pp. 2828–2835 (2015). https://doi.org/10.1109/ICRA.2015.7139584
14. Hawkes, E., Jiang, H., Cutkosky, M.: Three-dimensional dynamic surface grasping with dry adhesion. Int. J. Robot. Res. 35, 943–958 (2015). https://doi.org/10.1177/0278364915584645
15. Chu, Z., Wang, C., Hai, X., Deng, J., Cui, J., Sun, L.: Analysis and measurement of adhesive behavior for gecko-inspired synthetic microwedge structure. Adv. Mater. Interfaces 6, 1900283 (2019). https://doi.org/10.1002/admi.201900283
16. Jiang, H., et al.: A robotic device using gecko-inspired adhesives can grasp and manipulate large objects in microgravity. Sci. Robot. 2, eaan4545 (2017). https://doi.org/10.1126/scirobotics.aan4545
17. Tao, D., et al.: Controllable anisotropic dry adhesion in vacuum: gecko inspired wedged surface fabricated with ultraprecision diamond cutting. Adv. Funct. Mater. 27, 1606576 (2017). https://doi.org/10.1002/adfm.201606576
Depth Control of a Biomimetic Manta Robot via Reinforcement Learning Daili Zhang1,2,3 , Guang Pan1,2,3 , Yonghui Cao1,2,3 , Qiaogao Huang1,2,3 , and Yong Cao1,2,3(B) 1 School of Marine Science and Technology, Northwestern Polytechnical University,
Xi’an 710072, China [email protected], {panguang,caoyonghui,huangqiaogao, cao_yong}@nwpu.edu.cn 2 Unmanned Vehicle Innovation Center, Ningbo Institute of NPU, Ningbo 315103, China 3 Key Laboratory of Unmanned Underwater Vehicle Technology of Ministry of Industry and Information Technology, Xi’an 710072, China
Abstract. This paper proposes a model-free depth control method for a biomimetic manta robot based on reinforcement learning. Unlike traditional control methods, the reinforcement learning method does not need a mathematical model of the controlled object; it learns the control law autonomously from training data. Based on the classical Q-learning algorithm, the state space, action space, and reward function for the depth control of the bionic manta robot are designed. The state-action function is trained offline using an experience replay mechanism and a random sampling strategy. Finally, the trained function is transplanted to the biomimetic manta robot prototype to form a controller. The effectiveness of the proposed control method is verified by experiments.

Keywords: Depth control · Autonomous underwater vehicle · Biomimetic manta robot · Reinforcement learning
1 Introduction

Autonomous underwater vehicles (AUVs) can perform tasks such as underwater environmental monitoring, scientific investigation, underwater archaeology, and resource development in near and far seas, and they have good scientific and engineering application prospects [1, 2]. The bionic autonomous underwater robot is a new type of bionic AUV (BAUV) with better environmental affinity, higher maneuverability, and stronger adaptability to complex environments than traditional propeller-driven AUVs [3, 4]. At present, there are still many challenges in the control of bionic AUVs. Stable depth control is a necessary condition for their application and one of the basic problems of underwater vehicle control. Model-based AUV depth control, such as model predictive control (MPC) and sliding mode control (SMC), is relatively mature [5–7]. However, owing to the uncertainty of complex underwater environmental disturbances and, compared with rigid AUVs, BAUVs have
active and passive deformation characteristics in underwater motion, so their underwater motion models are strongly coupled and strongly nonlinear, and it is difficult to establish a dynamic system model [8, 9]. It is therefore difficult to achieve depth control with classical and modern control methods that require such a model. The fuzzy control method has been successfully applied to bionic underwater robots [10]. Based on T-S fuzzy control, Cao et al. carried out depth and heading control of a manta-ray-like robotic fish, designing fuzzy control rules based on human experience [11]; the shortcoming of fuzzy control, however, is its reliance on manually designed rules and human experience. In 2013, Mnih et al. [12, 13] first proposed a deep reinforcement learning algorithm by combining deep learning and reinforcement learning and successfully applied it to Atari games, a milestone in the practical application of reinforcement learning. In 2016, DeepMind's AlphaGo, which combined deep reinforcement learning with tree search, defeated the human world champion at Go [14]. Researchers are now also keen to use reinforcement learning algorithms to solve AUV control problems. Wu et al. [15] used the NNDPG algorithm to solve the depth control problem of AUVs, established an MDP model of the problem, and verified the feasibility of the algorithm through simulation. Aiming at the difficulty of building a BAUV motion model, this paper uses a model-free RL algorithm to solve the depth control problem of a BAUV, which also avoids fuzzy control's reliance on manually designed rules. The effectiveness of the method is verified by experiments.
2 The Bionic Manta Robot Design

2.1 The Structure and Shape Design of the Bionic Manta Robot
Fig. 1. The three-dimension model and prototype of the robot
To achieve a bionic shape, the prototype is designed to imitate the shape and structure of a real bullnose ray. The prototype shown in Fig. 1 mainly includes the
main body and the pectoral and caudal fins. The main cabin is mainly made of nylon, which is lightweight, strong, and low-cost compared with ABS resin. The cabin is a hermetic design with end-face seals and contains the electronic systems that are not waterproof. To easily realize the bionic motion of the creature, the pectoral and caudal fins are mainly composed of a deformable carbon fiber skeleton with silicone skin; the carbon fiber provides the stiffness and flexibility required by the skeleton, yielding more thrust while reducing drag in the water. The three-dimensional model and the real prototype are shown in Fig. 1. To realize the quasi-sinusoidal motion of the pectoral fins along the chord, the pectoral fin skeleton consists of three fin rays, which are driven by the servos in quasi-sinusoidal motions with different phases.

2.2 The Control System Design of the Biomimetic Manta Robot

We show the schematic diagram of the control system of the biomimetic manta robot in Fig. 2; it is mainly composed of a control system, a sensor system, a communication system, a storage system, a power supply system, and a drive system. The control system uses an STM32ZET6 chip as the MCU to realize control and information processing. The sensor system mainly includes a pressure sensor for measuring depth, an SBG sensor for measuring attitude, and a ranging sensor for obstacle avoidance. The communication system enables the console to interact with the robotic fish through a radio station: it receives commands from the console and forwards them to the MCU, and feeds the attitude of the robotic fish back to the console. The storage system is mainly composed of a memory card and peripheral circuits, which store important system information. The power supply is an 8.4 V, 25 Ah rechargeable lithium battery; a single full charge lasts 6 h. The drive system consists of six pectoral fin servos and one caudal fin servo. The pectoral fin servos drive the pectoral fins of the robotic fish so that it swims freely like a real manta ray, and the caudal fin servo controls the pitching motion of the robot.
Fig. 2. The hardware system of the bionic manta robot
3 The Depth Control Algorithm

3.1 The Depth Control Problem

The purpose of the depth control of the manta-like robot is to swim along a desired depth profile z(t) = f(x, t), where t denotes time and x denotes the direction of travel. To describe the motion of the manta-ray robot in six degrees of freedom (DOF), we use six independent coordinates to represent position and pose. The six DOF motions of the prototype are surge, sway, and heave, which refer to longitudinal, lateral, and vertical displacements, and roll, pitch, and yaw, which describe rotations about the longitudinal, lateral, and vertical axes. The general motion of the manta-like robotic fish can be described by the following vectors:

\eta = [x, y, z, \phi, \theta, \psi]^T, \quad \upsilon = [u, v, w, p, q, r]^T, \quad \tau = [X, Y, Z, K, M, N]^T

where η is the earth-fixed position and attitude vector, υ is the body-fixed linear and angular velocity vector, and τ denotes the forces and moments acting on the robot. The body-fixed coordinate system and the earth-fixed inertial reference system are depicted in Fig. 3.
Fig. 3. The bionic manta robot coordinate
We only consider the depth control of the bionic manta robotic fish in the longitudinal plane, so we focus only on motion in the x-z plane and ignore the variables in the other dimensions. In addition, we set the amplitude, frequency, and phase difference of the pectoral fins to fixed values, so the force generated in the propulsion direction can be treated as constant. When the thrust of the pectoral fins balances the drag in the water, Newton's law implies that the robotic fish maintains a constant speed U. The remaining coordinates can then be expressed as \chi = [z, \theta, w, q]^T, and the dynamic equation of the robotic fish motion simplifies to

\dot{\chi} = A\chi + Bu    (1)
3.2 The Reinforcement Learning-Based Control Algorithm

If we had the motion model of the manta-like robotic fish, we could use traditional control methods to achieve its depth control. However, the hydrodynamic parameters of the bionic soft body are very difficult to obtain through experiments or numerical calculation. This section introduces the reinforcement learning model of Q-learning and uses it to solve the depth control problem of the manta-like robotic fish. The advantage of the reinforcement learning method is that it needs no kinematic model of the controlled system and realizes control in a data-driven way. The control structure block diagram is shown in Fig. 4: the input of the controller is the deviation between the desired depth and the feedback depth, and the controller outputs the control signal for the controlled object according to the result of offline training through reinforcement learning.
Fig. 4. The block diagram of the control structure
Fig. 5. The Markov decision process
The Markov decision process (MDP) is a formal expression of sequential decision-making, and the theoretical derivation of reinforcement learning is carried out under the MDP model. Therefore, using reinforcement learning to achieve depth control of the manta-like robot requires establishing an MDP model. The MDP
can be represented as the interaction process between the agent and the environment shown in Fig. 5. The agent selects an action a_t in state s_t; after the environment receives the action, the state changes to s_{t+1} and the reward r_{t+1} is given. As time t tends to infinity, the cumulative reward is

G_t = \sum_{t=t_0}^{\infty} \gamma^{t-t_0} r_t    (2)
where γ denotes the discount rate, with values ranging from 0 to 1. The goal of reinforcement learning is to find a policy function that maximizes the cumulative reward. Assuming that our environment E is deterministic, all the equations proposed here are also deterministic; the interaction process between the agent and the environment is a finite MDP, and the purpose of the agent is to interact with E and select a series of actions that maximizes future rewards, that is,

G_t = \sum_{t=t_0}^{T} \gamma^{t-t_0} r_t    (3)
where T represents the termination time step. We define the optimal state-action function Q^*(s', a'), meaning the maximum reward obtainable by choosing action a' in state s':

Q^*(s', a') = \max_{\mu} \mathbb{E}\left[G_t \mid s_t = s', a_t = a', \mu_\theta(a_t \mid s_t)\right]    (4)
The optimal action-value function obeys the Bellman equation:

Q^*(s, a) = \mathbb{E}_{s' \sim \varepsilon}\left[r + \gamma \max_{a'} Q^*(s', a') \mid s, a\right]    (5)
If the optimal value function Q^*(s', a') of the next time step s' is known for all actions a', the optimal strategy is to choose the a' that maximizes the expected value r + \gamma Q^*(s', a'); given a state, we can then easily construct a policy that maximizes the reward:

\mu^*(s) = \arg\max_{a} Q^*(s, a)    (6)
However, we do not know everything about the world, so we do not have access to Q^*. We use

Q_{target}(s, a) = r_t + \gamma \max_{a'} Q_t(s', a' \mid s, a)    (7)

to approach Q^*, where the right-hand side means that in state s' the action-selection strategy is an off-policy one: the action with the maximum estimated reward is selected. The update rule for training is:

Q_{i+1}(s, a) = Q_i(s, a) + \alpha\left[Q_{target}(s, a) - Q_i(s, a)\right]    (8)

To maximize the cumulative reward, the goal is to continuously update Q so that Q_{target}(s, a) approaches the real action-value function Q^*.
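For concreteness, the tabular update of Eqs. (7) and (8) can be sketched in a few lines of Python. The state/action counts and the values of α and γ match those reported later in Sect. 3.3.3, while the function and variable names are our own illustrative assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS = 41, 41          # discrete sizes from Sect. 3.3.3
Q = np.zeros((N_STATES, N_ACTIONS))   # tabular state-action function
GAMMA, ALPHA = 0.9, 0.1               # discount factor and update rate

def q_update(s, a, r, s_next):
    """One training step: the Eq. (7) target, then the Eq. (8) update."""
    q_target = r + GAMMA * np.max(Q[s_next])   # Eq. (7)
    Q[s, a] += ALPHA * (q_target - Q[s, a])    # Eq. (8)
```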
3.3 Algorithm Implementation

3.3.1 Discrete State Space and Action Space

Design the state space S = {s_1, s_2, ..., s_t}, where s_t = (\Delta z_t, \theta_t, w_t, q_t) and \Delta z is the difference between the current depth and the target depth:

\Delta z_t = z_t - z_{ref}    (9)
where z_t is the current depth fed back by the sensor and z_{ref} is the reference target depth; θ is the pitch angle of the robotic fish; and w and q are the descent speed and the pitch-angle rate of the control object, respectively, computed as

w_t = \frac{z_t - z_{t-1}}{T_s}    (10)

q_t = \frac{\theta_t - \theta_{t-1}}{T_s}    (11)

where z_{t-1} is the depth of the control object at the previous moment and T_s is the sampling time.
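A minimal sketch of assembling the state from successive sensor readings, following Eqs. (9)–(11); the bin edges and helper names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def build_state(z, z_prev, theta, theta_prev, z_ref, Ts):
    """Compute (dz, theta, w, q) from two consecutive samples."""
    dz = z - z_ref                    # Eq. (9): depth deviation
    w = (z - z_prev) / Ts             # Eq. (10): descent speed
    q = (theta - theta_prev) / Ts     # Eq. (11): pitch-angle rate
    return dz, theta, w, q

def discretize(value, low, high, n_bins=41):
    """Map a continuous variable to one of n_bins indices (41 in the paper)."""
    clipped = np.clip(value, low, high)
    return int((clipped - low) / (high - low) * (n_bins - 1))
```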
Create the action space A = {a_1, a_2, ..., a_n}, where a represents an element of the action space and n is the number of actions the control object can take. The reward function space R = {r_1, r_2, ..., r_t, ...} is built, where r_t is computed as

r_t(s_t, a_t) = c_1\left(1 - \Delta z_t / z_{ref}\right)^2 + c_2\left(1 - 2\theta_t/\pi\right)^2 + c_3\left(1 - w_t/w_{max}\right)^2 + c_4\left(1 - q_t/q_{max}\right)^2    (12)

where c_1–c_4 are reward weight coefficients, w_{max} is the maximum descent speed, and q_{max} is the maximum rate of change of the pitch angle.

3.3.2 Experience Replay Mechanism

The lack of training data is one of the great challenges of applying reinforcement learning in robotics. Because training data are time-consuming to obtain, we use an experience replay memory to train the state-action value function. It stores the observed states of the agent, the actions performed, the rewards received, and the state transitions, allowing this data to be reused many times. This greatly improves sample efficiency and mitigates the problem of small sample size. In addition, to further address the shortage of sample data, we not only collect part of the dataset ourselves but also reuse data from previous experiments, which can serve as training data after processing; this greatly saves test sampling time.
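A compact replay-memory sketch consistent with this description; the capacity and interface are illustrative assumptions (the paper's random sampling uses a batch size of 1, per Sect. 3.3.3).

```python
import random
from collections import deque

class ReplayMemory:
    """Stores transitions (s_t, a_t, r_t, s_{t+1}) for reuse during training."""

    def __init__(self, capacity=10000):       # capacity is an assumption
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=1):
        """Random sampling breaks the temporal correlation of the data."""
        return random.sample(self.buffer, batch_size)
```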
3.3.3 Training

The hardware used for training is an Intel i7-10700F, and the software environment is Python. The training parameters are set as follows: discount factor γ = 0.9, learning rate LR = 0.9, number of training episodes EPISODE = 10000, number of steps per episode STEP = 10000, and random-sampling batch size BATCHSIZE = 1. The MDP model parameters are: 41 discrete states, 41 discrete actions, and reward coefficients c_1 = 1, c_2 = c_3 = c_4 = 0. The depth change curve of the training data is presented in Fig. 6.
Fig. 6. The training datasets
The overall procedure of the algorithm is shown in Table 1 and is divided into four parts: first, we prepare the dataset; second, we process the raw data and calculate the rewards before storing them in the memory pool; third, we train the state-action function on the training dataset; finally, we test the results on the prototype.
Table 1. The steps of the depth control algorithm

1. Training data collection:
   1) Set the prototype parameters
   2) Collect data through experiments and store them on the memory card
   3) Preprocess the raw data and store it as data0
2. Process the data in data0 and store them in D:
   1) Extract the states s_t and s_{t+1} from data0
   2) Extract the action a_t from data0
   3) Calculate the reward r_t of each state-action pair
   4) Store (s_t, a_t, r_t, s_{t+1}) in the memory pool D
3. Train:
   1) Initialize the training parameters: Q(s_t, a_t) = 0, γ = 0.9, α = 0.1
   2) Set the maximum episode number and loop over the following steps:
   3) Randomly sample batch data from D
   4) Calculate Q_target(s, a) according to Eq. (7)
   5) Update the value of Q(s_t, a_t) according to Eq. (8)
4. Deployment:
   1) Deploy the program on the MCU
   2) Test the algorithm through experiments
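Strung together, steps 2 and 3 of Table 1 reduce to a loop like the following, reusing the q_update() and ReplayMemory sketches above. The loop bounds follow Sect. 3.3.3; everything else is an assumption for illustration.

```python
# Offline training loop (sketch). `memory` is a ReplayMemory already filled
# with preprocessed transitions from data0 (Table 1, step 2), e.g. via
# memory.push(s, a, r, s_next) for every processed transition.
memory = ReplayMemory()

for episode in range(10000):          # EPISODE from Sect. 3.3.3
    for step in range(10000):         # STEP from Sect. 3.3.3
        if not memory.buffer:
            break                     # nothing to sample yet
        s, a, r, s_next = memory.sample(batch_size=1)[0]
        q_update(s, a, r, s_next)     # Eqs. (7)-(8)
```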
4 Experiment

The algorithm for the depth control problem has been implemented; to verify its effectiveness, the manta-like vehicle must be tested in a pool. The test environment shown in Fig. 7 is a pool with length, width, and depth of 7 m × 3 m × 1.8 m. The key parameters of the robot are shown in Table 2: the spread length and chord length of the robot are 0.9 m and 0.6 m, and the maximum thickness is 0.12 m. The key experimental parameters are set as follows: the pectoral fin amplitude is set to 50°, the flapping frequency is 0.4 Hz, and the phase difference between the fin rays is −40°. The same parameters are used in the training and test experiments. The console sends the task command to the prototype through the radio; the prototype receives the command and compares the desired depth with the current depth fed back by the pressure sensor. The robot determines its current state and chooses the optimal tail-angle action: by adjusting the angle of the tail fin, the robotic fish changes its pitch angle and thereby controls its depth. Two fixed-depth tests were run: the first sets the target depth to 0.55 m, and the second to 0.35 m. The test results are shown in Fig. 8.
Fig. 7. The experiment pool and robot fish
Table 2. Parameters of the robot fish

Terms             Parameter value   Unit
Spread length     0.9               m
Chord length      0.6               m
Maximum height    0.12              m
Weight            8.6               kg
Fig. 8. The depth control experiment results
5 Conclusion

To solve the depth control problem of a manta-like robot without a mathematical model, we model the depth control problem as an MDP with discrete action and state spaces. Based on the Q-learning algorithm, we design the reward function for the depth control problem, train the state-action value function on the training dataset, and obtain the depth control law from the training results. Finally, the effectiveness of the proposed method is verified by a pool depth-control experiment. The experimental results show that the proposed reinforcement learning algorithm performs well in depth control. In future work, we will apply the same method to the heading control of the manta-like robot and realize three-dimensional control.
References

1. González-García, J., Gómez-Espinosa, A., Cuan-Urquizo, E., García-Valdovinos, L.G., Salgado-Jiménez, T., Escobedo Cabello, J.A.: Autonomous underwater vehicles: localization, navigation, and communication for collaborative missions. Appl. Sci. 10, 1256 (2020)
2. Wynn, R.B., et al.: Autonomous Underwater Vehicles (AUVs): their past, present and future contributions to the advancement of marine geoscience. Mar. Geol. 352, 451–468 (2014)
3. Scaradozzi, D., Palmieri, G., Costa, D., Pinelli, A.: BCF swimming locomotion for autonomous underwater robots: a review and a novel solution to improve control and efficiency. Ocean Eng. 130, 437–453 (2017)
4. Ahmed, F., et al.: Decade of bio-inspired soft robots: a review. Smart Mater. Struct. 31, 073002 (2022)
5. Chemori, A., Kuusmik, K., Salumae, T., Kruusmaa, M.: Depth control of the biomimetic U-CAT turtle-like AUV with experiments in real operating conditions. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4750–4755. IEEE, Stockholm, Sweden (2016)
6. Yao, F., Yang, C., Liu, X., Zhang, M.: Experimental evaluation on depth control using improved model predictive control for autonomous underwater vehicle (AUVs). Sensors 18, 2321 (2018)
7. Tran, H.N., Nhut Pham, T.N., Choi, S.H.: Robust depth control of a hybrid autonomous underwater vehicle with propeller torque's effect and model uncertainty. Ocean Eng. 220, 108257 (2021)
8. Duraisamy, P., Kumar Sidharthan, R., Nagarajan Santhanakrishnan, M.: Design, modeling, and control of biomimetic fish robot: a review. J. Bionic Eng. 16(6), 967–993 (2019). https://doi.org/10.1007/s42235-019-0111-7
9. Chen, L., Qiao, T., Bi, S., Ren, X., Cai, Y.: Modeling and simulation research on soft pectoral fin of a bionic robot fish inspired by manta ray. Jixie Gongcheng Xuebao 56, 182–190 (2020)
10. Cao, Z., Shen, F., Zhou, C., Gu, N., Nahavandi, S., Xu, D.: Heading control for a robotic dolphin based on a self-tuning fuzzy strategy. Int. J. Adv. Rob. Syst. 13, 28 (2016)
11. Cao, Y., Xie, Y., He, Y., Pan, G., Huang, Q., Cao, Y.: Bioinspired central pattern generator and T-S fuzzy neural network-based control of a robotic manta for depth and heading tracking. JMSE 10, 758 (2022)
12. Mnih, V., et al.: Playing Atari with deep reinforcement learning. http://arxiv.org/abs/1312.5602 (2013)
13. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
14. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)
15. Wu, H., Song, S., You, K., Wu, C.: Depth control of model-free AUVs via reinforcement learning. IEEE Trans. Syst. Man Cybern. Syst. 49, 2499–2510 (2019)
Inchworm-Gecko Inspired Robot with Adhesion State Detection and CPG Control Zijian Zhang1 , Zhongyi Chu1(B) , Bolun Zhang2 , and Jing Cui2 1 School of Instrumental Science and Opto-Electronics Engineering, Beihang University,
Beijing 100191, China [email protected] 2 School of Mechanical Engineering and Applied Electronics, Beijing University of Technology, Beijing 101100, China
Abstract. Although many climbing robots have been developed, working stably on complex surfaces in narrow spaces remains a challenge. In this work, we present a miniature inchworm-gecko inspired robot (IGIR) that can climb and sense its adhesion state on different kinds of surfaces, including flat and porous ones. Inspired by the inchworm's locomotion and the gecko's adhesion mechanism, we design the overall system of IGIR using servos and connecting rods as the motion module, negative-pressure-driven gecko adhesion feet as the adhesive module, and micro three-axis force sensors as the sensing module. To address the problem of stable adhesion perception for climbing robots, the adhesion state detection of IGIR is realized through the sensing module and a detection strategy. A novel CPG control model is established to realize the periodic gait of IGIR, and adaptive adjustment is achieved by combining it with the adhesion state detection information. Finally, a prototype of IGIR is built, and experiments on it verify the feasibility of the proposed adhesion state detection and CPG control model. The maximum moving speed of IGIR is 24 cm/min. Overall, this paper demonstrates the combination of adhesion detection and gait planning for a climbing robot and provides a new approach to research on stable climbing-robot locomotion.

Keywords: Inchworm-gecko inspired robot · Adhesion state detection · CPG control model
1 Introduction

Due to their small size, light weight, load-carrying capacity, and flexible locomotion, miniature climbing robots have broad prospects for the inspection and maintenance of narrow spaces. The attachment and locomotion abilities of a climbing robot are the key issues. Typical adhesion methods include vacuum [1, 2], electromagnetic [3], hook-and-claw [4], and van der Waals force [5] adhesion; among these, van der Waals adhesion places no special requirements on the target surface material and adapts well to porous or non-ferromagnetic surfaces. ACROBOT [6] has used it to inspect equipment on the International Space Station. Its foot end uses
two oppositely directed adhesive pads and can climb slopes and vertical planes; however, its adhesive drive mechanism is relatively large. In terms of locomotion mode, climbing robots usually adopt wheeled, tracked, or legged structures [7, 8]. Among them, legged robots have good obstacle-crossing ability and strong adaptability to terrain [9]. Furthermore, the stable locomotion of a climbing robot depends on accurate perception of the attachment state [5, 10, 11] and on correction schemes derived from the perception results. To detect the adhesion state of van der Waals-based climbing robots, researchers have proposed several solutions. Common methods include measuring the adhesion force and the adhesion area; examples include an ATI multi-axis force sensor on a precision mechanics test platform for measuring the maximum adhesion force [12] and frustrated total internal reflection (FTIR) [13] for concretely evaluating the contact area. Despite their good performance, including accurate force and intuitive area measurement, these systems are too large and complex to sense the adhesion state while the climbing robot is working on a wall. In this paper, we propose a novel inchworm-gecko inspired robot. IGIR combines the locomotion mode of the inchworm with negative-pressure-driven gecko adhesion feet. Its miniature body can work in narrow spaces and cross obstacles flexibly. The drive-adhesion integrated feet of IGIR are designed to adhere to the surface, and the soft material of the feet can protect the target's surface. To achieve stable adhesion, three-axis force sensors obtain the adhesion state in time and provide feedback for motion control. The gait control is implemented by a central pattern generator (CPG). The rest of the article is organized as follows: Sect. 2 describes the system components of IGIR and its locomotion process, Sect. 3 explains the adhesion state detection method, Sect. 4 explains the CPG control model, prototype experiments are performed in Sect. 5, and the conclusion is given in Sect. 6.
2 Design of Inchworm-Gecko Inspired Robot

To solve the problem of non-destructive testing in the narrow spaces of an aero-engine, an inchworm-gecko inspired climbing robot is designed, combining the flexible wall climbing and small footprint of the inchworm structure with the adaptability of the negative-pressure-driven gecko adhesion feet to surfaces with holes.

2.1 Mechanical Structure

In nature, the inchworm (see Fig. 1(a)) relies only on the bending and stretching of its body to attach to and move across leaf surfaces. The simple motion form and small size of the inchworm provide a reference for the miniature climbing robot's structure. The inchworm-inspired structure can be abstracted as a three-degrees-of-freedom mechanism (see Fig. 1(b)), realizing free motion of the robot in one direction. The main structure of IGIR is designed with an inchworm-type body and gecko-inspired feet. As shown in Fig. 2, three servos are used as rotating joints, and the front and rear ends are negative-pressure-driven gecko adhesion feet. Force sensors [14] are installed between the rotating joints and the feet for adhesion state detection.
Fig. 1. Bionics design of IGIR's body: (a) inchworm locomotion; (b) inchworm-inspired structure
Fig. 2. Design drawing of IGIR
2.2 Negative-Pressure-Driven Gecko Adhesion Foot

The negative-pressure-driven gecko adhesion foot is composed of an elastic membrane outside and solid granules inside (see Fig. 3) [15]. The annular wedge-shaped setae are integrated on the bottom of the foot. The attaching and detaching process of the foot is shown schematically in Fig. 4. The five toes of a gecko's paw contract inwards when loaded, and the annular wedge-shaped setae mimic the omnidirectional bristles on the gecko's toes during walking. In the preload stage, the foot, at low stiffness, adapts to the surface. In the load stage, the foot is switched to high stiffness by negative-pressure actuation: the deforming zone contracts inwards, applying a shear load to the annular wedge-shaped setae, and the adhesion is turned on [16]. In the release stage, the foot returns to low stiffness when the negative pressure is removed [17]; the setae separate from the surface from outside to inside, and the adhesion is turned off.
Fig. 3. Structure of negative-pressure-driven gecko adhesion feet
Fig. 4. Working principle of negative-pressure-driven gecko adhesion feet
2.3 Work Process of IGIR

According to the characteristics of the inchworm's climbing motion, IGIR's work process can be divided into three stages (see Fig. 5): moving, detection and adjustment, and task execution. First, IGIR moves one step into the bending state. Then, the adhesion state of the hindfoot is detected. If the adhesion is successful, IGIR moves another step and returns to the straight state; otherwise, the preload or the drop position of the hindfoot is adjusted for a second loading. Finally, IGIR executes the engine inspection task in the straight state.
Fig. 5. Working process of IGIR
3 Adhesion State Detection Method

To make IGIR move stably and efficiently in actual work, it is necessary to accurately perceive the adhesion state of the feet to provide feedback for adjustment during motion control.
3.1 Principle of Adhesion Detection

The adhesive force is mainly composed of the van der Waals force (F_vdW) generated by the annular wedge-shaped setae loaded along the radial direction [18] and the negative-pressure suction force (F_nps) resulting from the increased volume of the deforming zone (see Fig. 6). These forces can be calculated by formulas (1) and (2):

F_{vdW} = \frac{4}{3}\, N t \left(4\pi E R \gamma^{2}\right)^{1/3}    (1)

F_{nps} = \pi r^{2} (P_0 - P_s)    (2)
where N is the number of setae in contact with the surface, R is the radius of the peel zone, t is the width of the peel zone, E and γ are material-related constants, r is the radius of the deforming zone, P_0 is the atmospheric pressure, and P_s is the pressure in the deforming zone. According to the formulas above, the adhesion force is related to N, which is controlled by the preload, and to R and P_s, which are controlled by the negative-pressure drive. A three-axis force sensor is used to detect the adhesion state between the foot and the surface. After the foot is loaded, a detection force opposite to the adhesion force is applied to the foot through the servos (see Fig. 7). The detection force, i.e., the output of the sensor, includes the adhesion force and the component of the foot's gravity along the z-axis of the sensor. Since the gravity is known, the adhesion force can be calculated by formula (3):

F_{ade} = F_{vdW} + F_{nps} = F_{det} - F_g    (3)
Combining this with the stable climbing conditions below, the adhesion state can be obtained.
Fig. 6. Force analysis of the adhesion feet: (a) process for switching adhesion on and off; (b) adhesive force composition
Fig. 7. Force sensor between the joint and the foot
3.2 Stable Climbing Conditions

The adhesion force that the foot must provide to resist the external forces and moments while IGIR moves stably is the basis for adhesion state perception and adhesion force regulation. Because the motion process of IGIR is complex, the positions where IGIR is prone to instability are selected for mechanical analysis. When IGIR moves in the horizontal plane in the straight state, the distance between IGIR's center of gravity and the adhering foot is longest and the gravity moment is maximal. The force and moment balance for this state is:

\sum F_y = F_{n1} + F_{n2} - F_{a1} - F_{a2} - G = 0
\sum M_{z,B} = (F_{a1} - F_{n1})\, d_1 - G\left(l_1 - \frac{d_1}{2}\right) = 0    (4)

where G = mg is the weight of IGIR, F_{ni} is the reaction force of the wall, F_{ai} is the adhesion force of the foot, M_{z,B} is the z-axis component of the moment about B, d_1 is the distance between a pair of feet, and l_1 is the distance between the center of gravity and the joint. From the moment equilibrium, the adhesion force must satisfy

F_{n1} = F_{a1} - \frac{G\left(l_1 - d_1/2\right)}{d_1} \geq 0, \quad F_{a1} \geq \frac{G\left(l_1 - d_1/2\right)}{d_1}    (5)
When IGIR moves on an inclined plane in the bending state, the distance between IGIR's center of gravity and the adhering foot is longest and the gravity moment is maximal. The force and moment balance for this state is:

\sum F_x = F_{a1} + F_{a2} + G\cos\theta - F_{n1} - F_{n2} = 0
\sum F_y = f_1 + f_2 - G\sin\theta = 0
\sum M_{z,B} = G\sin\theta\, l_{g1} + G\cos\theta\, l_{g2} - (F_{a1} - F_{n1})\, d_1 = 0    (6)

where θ is the inclination angle of the wall, f_i is the shear adhesion force of the feet, and l_{g1} and l_{g2} are the distances from the center of gravity to the wall and to the adhering foot, respectively. From the moment equilibrium, the adhesion force must satisfy (see Fig. 8)

F_{n1} = F_{a1} - \frac{G\sin\theta\, l_{g1} + G\cos\theta\, l_{g2}}{d_1} \geq 0, \quad F_{a1} \geq \frac{G\sin\theta\, l_{g1} + G\cos\theta\, l_{g2}}{d_1}    (7)

Fig. 8. Positions prone to instability of IGIR: (a) horizontal plane; (b) inclined plane
Through the above analysis, the stable adhesion force requirements for IGIR moving on different surfaces are obtained and serve as the criteria for adhesion state detection.
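The comparison between the measured adhesion force of Eq. (3) and the thresholds of Eqs. (5) and (7) is a one-line check in code. The sketch below assumes the inclined-plane condition of Eq. (7); the function name and any geometry values passed in are illustrative, not the authors' implementation.

```python
import math

def adhesion_ok(F_det, F_g, G, l_g1, l_g2, d1, theta):
    """Return True if the foot's adhesion satisfies the Eq. (7) stability bound.

    F_det : detection force read from the three-axis force sensor
    F_g   : gravity component of the foot along the sensor z-axis
    """
    F_ade = F_det - F_g                                  # Eq. (3)
    F_req = (G * math.sin(theta) * l_g1
             + G * math.cos(theta) * l_g2) / d1          # Eq. (7) threshold
    return F_ade >= F_req
```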
4 CPG-Based Gait Generation

For the periodic motion of IGIR, a novel CPG-based gait generation method is designed. The Hopf oscillator is applied in the CPG control, and the step frequency and step length of IGIR can be controlled by adjusting the oscillator parameters.

4.1 CPG Model

The Hopf oscillator is chosen as the neuron model of the CPG network because of its fast convergence and robustness to disturbances [19]. The CPG network adopts a chain
The CPG network adopts a chain-structured model, and each oscillator controls the foot loading or a joint angle of IGIR, as shown in Fig. 9. The Hopf oscillator model with coupling terms can be defined as:

dxi/dt = α(μ − ri²)·xi − ω·yi + Σj wij·(cosφij·xj − sinφij·yj − xi)
dyi/dt = α(μ − ri²)·yi + ω·xi + Σj wij·(sinφij·xj + cosφij·yj − yi)    (8)

where (xi, yi) is the state vector of the i-th oscillator, α denotes a constant governing the speed of convergence, μ is the amplitude of the steady-state oscillation, ri² = xi² + yi², ω is the frequency of the oscillator, which determines the step frequency of IGIR, and wij and φij represent the coupling weight and phase difference between the i-th and j-th neurons, respectively.
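To make the oscillator model concrete, the following minimal sketch (plain NumPy; the function and variable names are ours, forward-Euler integration and nearest-neighbour chain coupling are simplifying choices) integrates Eq. (8) for a chain of five oscillators with the parameter values used in Sect. 4.1 (α = 50, μ = 1, ω = 2π, w = 10):

import numpy as np

def simulate_cpg(phases, alpha=50.0, mu=1.0, omega=2*np.pi,
                 w=10.0, dt=1e-3, T=5.0):
    # Chain of coupled Hopf oscillators, Eq. (8); oscillator i is coupled
    # to its chain neighbours j with weight w and phase bias
    # phi_ij = phases[j] - phases[i].
    n = len(phases)
    x, y = np.cos(phases), np.sin(phases)    # start on the limit cycle
    out = []
    for _ in range(int(T / dt)):
        dx, dy = np.zeros(n), np.zeros(n)
        for i in range(n):
            r2 = x[i]**2 + y[i]**2
            dx[i] = alpha*(mu - r2)*x[i] - omega*y[i]
            dy[i] = alpha*(mu - r2)*y[i] + omega*x[i]
            for j in (i - 1, i + 1):         # chain neighbours only
                if 0 <= j < n:
                    phi = phases[j] - phases[i]
                    dx[i] += w*(np.cos(phi)*x[j] - np.sin(phi)*y[j] - x[i])
                    dy[i] += w*(np.sin(phi)*x[j] + np.cos(phi)*y[j] - y[i])
        x, y = x + dt*dx, y + dt*dy          # forward Euler step
        out.append(x.copy())
    return np.array(out)                     # x-outputs of OSC1..OSC5

signals = simulate_cpg(phases=[0, 0, np.pi, 2*np.pi, np.pi])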
Fig. 9. Chain CPG network structure of IGIR
The coupling terms of the CPG network keep a fixed phase relationship between the oscillators. To achieve the inchworm gait, the initial phases of the oscillators (φ1, φ2, φ3, φ4, φ5) are set to (0, 0, π, 2π, π). The other parameters (α, μ, ω, w) are set to (50, 1, 2π, 10). The waveform diagrams of the oscillators are shown in Fig. 10(c). The raw outputs of the CPG network are then processed into the control signals of IGIR's feet and joints (see Fig. 10(d)) as follows:

Foot1 = 1 (adhesion) if xOSC1 ≥ 0, 0 (release) if xOSC1 < 0
Joint1 = 30° × |xOSC2|
Joint2 = −60° × |xOSC3|    (9)
Joint3 = 30° × |xOSC4|
Foot2 = 1 (adhesion) if xOSC5 ≥ 0, 0 (release) if xOSC5 < 0

To reduce friction between the feet of IGIR and the surface during locomotion, an adjustment angle is superimposed on the commanded joint angle (a mapping sketch is given below).
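A direct transcription of the mapping (9) into code (a sketch only; the helper name and output convention are ours):

def cpg_to_commands(x):
    # x: the five oscillator outputs [xOSC1, ..., xOSC5], Eq. (9)
    foot1 = 1 if x[0] >= 0 else 0        # 1 = adhesion, 0 = release
    joint1 = 30.0 * abs(x[1])            # joint angles in degrees
    joint2 = -60.0 * abs(x[2])
    joint3 = 30.0 * abs(x[3])
    foot2 = 1 if x[4] >= 0 else 0
    return foot1, joint1, joint2, joint3, foot2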
Fig. 10. Simulation of the CPG model of IGIR: (a) control variables of IGIR; (b) the process of moving one step; (c) waveform diagrams of OSC1 to OSC5; (d) control signals of IGIR's feet and joints
When IGIR moves from the straight state to the bending state, joint1 receives an angle increment so that foot2 can leave the surface to complete the translation. When IGIR moves from the bending state to the straight state, joint3 receives an angle increment so that foot1 can leave the surface to complete the translation (see Fig. 11(b)). The frequency of the adjustment angle is twice the original frequency. A Hopf oscillator (OSC6) with an initial phase of π/2 is created and multiplied by OSC1, yielding the adjustment angle signal through the double-angle formula. In this way, the phase of the adjustment angle is locked to the phase of the joint angle.
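The frequency doubling follows directly from the product-to-double-angle identity. Taking the steady-state outputs as xOSC1 = sin(ωt) and xOSC6 = sin(ωt + π/2) = cos(ωt) (idealized sinusoidal forms assumed here for illustration),

xOSC1 · xOSC6 = sin(ωt)·cos(ωt) = (1/2)·sin(2ωt),

which oscillates at twice the gait frequency while remaining rigidly phase-locked to the joint-angle oscillation.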
Fig. 11. Optimization of the gait planning of IGIR: (a) adjustment angle added to the joint angle; (b) optimization results of the gait
4.2 CPG Control with Adhesion State Feedback

The chain CPG model constructed in Sect. 4.1 provides a method for generating the basic gait of IGIR. Further, to give IGIR the ability to adapt to the environment and to ensure stable adhesion during locomotion, the perception of the adhesion state described in Sect. 3 is introduced as feedback to adjust the CPG and foot-loading parameters in real time. The control structure is shown in Fig. 12. Taking advantage of the force sensors on IGIR, the joint force data are gathered at each step and converted into foot adhesion state information by comparing the adhesion force with the stable climbing conditions. Once the adhesion state is unstable, the CPG parameters are adjusted to bring the feet back to stable adhesion.
Fig. 12. Feedback control structure of IGIR
80
Z. Zhang et al.
According to Sect. 3.1, the adhesion state is related to the negative pressure and to the contact state, which is influenced by the preload and the contact angle between the foot and the surface. The preload, i.e., the force acting on the foot, can be expressed as

Fp = Fs + Fi = M/L + Iω/(L·δt)    (10)
where Fs is the force produced by the servo whose output torque is M, Fi is the force introduced by the impulse acting on the foot, I is the moment of inertia of the moving parts, ω is the joint angular velocity, L is the distance between the rotating joint and the unattached foot, and δt is the impulse time. From (10), the preload is controlled by the servo's output torque M and the joint angular velocity ω. The output of the servo is proportional to the angle it needs to rotate, and the joint angular velocity is related to the frequency of the CPG network. When the adhesion state of IGIR's foot is unstable, the target angle of the servo and the CPG network's frequency can be increased to provide more torque and rotational speed, respectively, so that the foot makes full contact with the surface. In addition, applying a higher negative pressure can also increase the adhesion force of the foot. Meanwhile, M and ω have upper bounds of adjustment because they are supplied through the other adhering foot. In summary, IGIR is able to detect and adjust the adhesion state during locomotion based on the force sensors and the CPG network (Fig. 13).
Fig. 13. Adhesion state control of IGIR’s foot
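This feedback rule can be summarized in a short sketch (our own Python, not the robot firmware; the 3° and 0.3 s increments follow the values reported in Sect. 5, while the frequency scaling and its bound are illustrative assumptions):

def adjust_on_failure(ctrl):
    # ctrl: dict of controller set-points for the current step.
    # Mirrors the corrective action used in the experiments (Sect. 5):
    # enlarge the servo target angles and prolong the negative-pressure
    # drive, both of which raise the preload/adhesion force, Eq. (10).
    ctrl["joint1_deg"] += 3.0          # larger target angle -> more torque
    ctrl["joint3_deg"] += 3.0
    ctrl["vacuum_time_s"] += 0.3       # longer negative-pressure drive
    ctrl["cpg_omega"] = min(ctrl["cpg_omega"] * 1.1,
                            ctrl["omega_max"])  # bounded speed increase
    return ctrl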
5 Experiment Results

In this paper, an IGIR prototype was built, as shown in Fig. 14. The length and width of IGIR are 155 mm and 45 mm. In the performance tests, the prototype achieved a forward speed of 24 cm/min and climbed a slope of 25°.
Fig. 14. Inchworm-gecko inspired robot prototype
As shown in Fig. 15, through CPG network gait planning, IGIR can move stably on the plane (see Fig. 15(a)–(e)) and slope (see Fig. 15(f)).
Fig. 15. Locomotion experiment of IGIR with CPG network and adhesion detection
The feasibility of the adhesion state detection method is then tested. First, the negative pressure loading is completed at the stage where the two feet exchange adhesion states (see Fig. 15(c)). Then the adhesion state detection (described in Sect. 3.1) is carried out. Once the measured force reaches a threshold slightly higher than the minimum adhesion force required by the current stable climbing condition, the adhesion is considered successful and the subsequent operations are carried out. The z-axis force output of the sensor for four adhesion state detections is shown in Fig. 16. Stage I: the hind foot is not adhered; Stage II: hind foot lifting and translation; Stage III: the hind foot touches the wall and the negative pressure drive is applied; Stage IV: adhesion state detection with the "pull-up" strategy of the hind foot; Stage V: the hind foot is in a static adhesion state; Stage VI: front foot lifting and translation.
Fig. 16. Z-axis force of the sensor in adhesion state detection experiments

Table 1. Analysis of adhesion state detection

Experiment No          1        2        3        4
Gravity of the foot/N  −0.70    −1.19    −1.28    −1.14
Adhesion force/N       −0.69    −1.27    −1.31    −1.41
Adhesion state         Failure  Success  Success  Success
Based on formula (4) and the real prototype, l1 is 45 mm, d1 is 36 mm and G is 1.5 N, and the value of Fa1 is calculated as 1.2 N. According to the analysis in Table 1, the adhesion of the foot in Experiment 1 fails. Based on the feedback method described in Sect. 4.2, the joint angles θ1 and θ3 are increased by 3° and the negative pressure drive is prolonged by 0.3 s to increase the adhesion force. The maximum tensile (z-axis) forces of Experiments 2–4 are then greater than the stable climbing threshold Fa1, and these cases are judged as successful adhesion. In the subsequent locomotion, the z-axis forces remain stable at 2 N and above and no foot detachment occurs, which proves that the gait execution is successful.
6 Conclusion

Based on the characteristics of gecko adhesion and inchworm locomotion, this paper realizes the IGIR prototype with a CPG control model and the underlying control
system of IGIR implemented on an STM32. Through static analysis, the stable climbing conditions of IGIR are obtained. On this basis, combined with a three-axis force sensor, adhesion state detection is realized. In terms of motion control, a novel CPG network suited to the inchworm locomotion gait is proposed. Using the adhesion state feedback, the frequency and amplitude of the CPG network are adjusted to adapt to the environment and avoid adhesion failure. The experimental results show that IGIR is capable of planar and slope motion and can detect the adhesion state in real time. It therefore provides a new robot platform for the inspection of narrow spaces such as aero-engines, and a reference for wall-climbing robots that require attachment detection.

Acknowledgment. This project was supported in part by NSFC under Grant 51975021 and Grant U1913206, and in part by the Ministry of Science and Technology of China under Grant 2018AAA0102900, the New Generation of Artificial Intelligence Technology Innovation 2030 Major Project.
References 1. Zhang, Y., Ge, L., Zou, J., Xu, H., Gu, G.: A multimodal soft crawling-climbing robot with the controllable horizontal plane to slope transition. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3343–3348. IEEE (2019) 2. Zhang, Y., Yang, D., Yan, P., Zhou, P., Zou, J., Gu, G.: Inchworm inspired multimodal soft robots with crawling, climbing, and transitioning locomotion. IEEE Trans. Rob. 38(3), 1806– 1819 (2022) 3. Khan, M.B., et al.: icrawl: an inchworm-inspired crawling robot. IEEE Access 8, 200655– 200668 (2020) 4. Lam, T.L., Xu, Y.: Climbing strategy for a flexible tree climbing robot—treebot. IEEE Trans. Rob. 27(6), 1107–1117 (2011) 5. Kim, S., Spenko, M., Trujillo, S., Heyneman, B., Santos, D., Cutkosky, M.R.: Smooth vertical surface climbing with directional adhesion. IEEE Trans. Rob. 24(1), 65–74 (2008) 6. Kalouche, S., Wiltsie, N., Su, H. J., Parness, A.: Inchworm style gecko adhesive climbing robot. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2319–2324. IEEE (2014) 7. Leon-Rodriguez, H., Hussain, S., Sattar, T. A compact wall-climbing and surface adaptation robot for non-destructive testing. In: 2012 12th International Conference on Control, Automation and Systems, pp. 404–409. IEEE (2012) 8. Menon, C., Murphy, M., Sitti, M.: Gecko inspired surface climbing robots. In 2004 IEEE International Conference on Robotics and Biomimetics, pp. 431–436. IEEE (2004) 9. Guan, Y., et al.: A modular biped wall-climbing robot with high mobility and manipulating function. IEEE/ASME Trans. Mechatron. 18(6), 1787–1798 (2012) 10. Xiao, S., Man-Tian, L.I., Rong-Xi, L.I., Sun, L.N.: Sensing system on foot in micro biped climbing robot. Transducer Microsyst. Technol. 26(12), 117–120 (2007) 11. Zhu, H., Guan, Y., Wu, W., Zhang, L., Zhou, X., Zhang, H.: Autonomous pose detection and alignment of suction modules of a biped wall-climbing robot. IEEE/ASME Trans. Mechatron. 20(2), 653–662 (2014) 12. Soto, D.R.: Force Space Studies of Elastomeric Anisotropic Fibrillar Adhesives. Stanford University (2010)
13. Hawkes, E.W., Jiang, H., Cutkosky, M.R.: Three-dimensional dynamic surface grasping with dry adhesion. The Int. J. Robot. Res. 35(8), 943–958 (2016) 14. Cui, J., Feng, K., Wang, Y., Chu, Z., Hu, Z.: A tactile sensor and its preparation method and a force and/of moment measuring device. China Patent No. CN113607307A. China National Intellectual Property Administration (2021) 15. Cui, J., Li, M., Chu, Z., Wang, J., Liu, H.: A variable stiffness end effecter based on the microwedge adhesive. China Patent No. CN112864078A. China National Intellectual Property Administration (2021) 16. Zhou, T., Ruan, B., Che, J., Li, H., Chen, X., Jiang, Z.: Gecko-inspired biomimetic surfaces with annular wedge structures fabricated by ultraprecision machining and replica molding. ACS Omega 6(10), 6757–6765 (2021) 17. Li, L., Liu, Z., Zhou, M., Li, X., Meng, Y., Tian, Y.: Flexible adhesion control by modulating backing stiffness based on jamming of granular materials. Smart Mater. Struct. 28(11), 115023 (2019) 18. Chu, Z., Wang, C., Hai, X., Deng, J., Cui, J., Sun, L.: Analysis and measurement of adhesive behavior for gecko-inspired synthetic microwedge structure. Adv. Mater. Interfaces 6(12), 1900283 (2019) 19. Li, G., Li, W., Zhang, H., Zhang, J.: Integration of sensory feedback into CPG model for locomotion control of caterpillar-like robot. In: 2015 IEEE International Conference on Industrial Technology (ICIT), pp. 303–308. IEEE (2015)
Coherence Matrix Based Early Infantile Epileptic Encephalopathy Analysis with ResNet Yaohui Chen1 , Xiaonan Cui1 , Runze Zheng1 , Yuanmeng Feng1 , Tiejia Jiang3 , Feng Gao3 , Danping Wang1,4 , and Jiuwen Cao1,2(B) 1
Machine Learning and I-Health International Cooperation Base of Zhejiang Province, Hangzhou Dianzi University, Hangzhou, China {xncui,runzewuyu,jwcao}@hdu.edu.cn
2 Research Center for Intelligent Sensing, Zhejiang Lab, Hangzhou, China
3 Department of Neurology, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou 310003, China {jiangyouze,epilepsy}@zju.edu.cn
4 Plateforme d'Etude de la Sensorimotricité (PES), BioMedTech Facilities, Université Paris Cité, 75270 Paris, France [email protected]

Abstract. EIEE syndrome, known as early infantile epileptic encephalopathy, is considered to be the earliest-onset form of age-dependent epileptic encephalopathy. The main manifestations are tonic-spasmodic seizures in early infancy, accompanied by burst-suppression electroencephalogram (EEG) patterns and severe psychomotor disturbances, with structural brain lesions in some cases. Specific to EIEE syndrome, this paper presents a comprehensive analysis of EEG features at three different periods: pre-seizure, seizure and post-seizure. Coherence features are extracted to characterize the EEG signals in EIEE syndrome, and the Kruskal-Wallis H test and Gradient-weighted Class Activation Mapping (Grad-CAM) are used to investigate and visualize the significance of features in different frequency bands for distinguishing the three stages. The study found that activity synchrony between temporal and central regions decreased significantly in the γ band during seizures, and that the coherence feature in the γ band combined with a ResNet18-based seizure detection model achieved an accuracy of 91.86%. It is believed that changes in the γ band can be considered a biomarker of seizure cycle changes in EIEE syndrome.
Keywords: EIEE syndrome · Seizure detection · Coherence · Grad-CAM
This work was supported by the National Natural Science Foundation of China (U1909209), the National Key Research and Development Program of China (2021YFE0100100, 2021YFE0205400), the Natural Science Key Foundation of Zhejiang Province (LZ22F030002), and the Research Funding of Education of Zhejiang Province (GK228810299201).
1 Introduction
EIEE syndrome, also known as Ohtahara syndrome, usually occurs in the first few months of life and is characterized by frequent tonic spasms with a suppression-burst (S-B) pattern in the electroencephalogram (EEG) that appears continuously in both waking and sleeping states [1]. The S-B pattern is characterized by bursts of 1–3 s duration alternating with suppression phases of 2–5 s, where the bursts consist of 150–350 μV high-voltage slow waves and multifocal spikes, and the burst-burst interval is typically 5–10 s [2]. Some cases of this syndrome have structural brain lesions, and it is speculated that asymmetry in the S-B pattern reflects underlying cortical lesions [3]. The onset of the disease is difficult to control and the prognosis is poor; survivors often progress to West or Lennox-Gastaut syndrome. The EEG, which records the electrical activity of the cerebral cortex, is the most effective method for the clinical diagnosis of epilepsy. EEG-based early warning of seizures has been extensively studied, and many reliable algorithms have demonstrated good performance in seizure prediction. Rukhsar et al. [4] extracted temporal features of EEG to monitor preictal states in epilepsy patients using multivariate statistical process control (MSPC). Bandarabadi et al. [5] proposed a relative bivariate method based on spectral power features for seizure prediction and developed a novel feature selection method to find the most discriminative features. Gadhoumi et al. [6] showed that preictal and interictal states are separable based on wavelet analysis of high-frequency EEG activity, and that their discriminability varies with the frequency band. Tzallas et al. [7–9] extracted features from the power spectral density (PSD) and then applied artificial neural networks (ANN) to automatically detect seizures. George et al. [10] converted EEG signals into 2D images and proposed a seizure prediction system based on a ResNet50 convolutional neural network. Zhang et al. [11] used a CNN model to identify the Pearson correlation matrix measuring the relationship between EEG channels during the preictal and interictal periods. Yang et al. [12] proposed a dual self-attention residual network (RDANet) combined with spectrograms for seizure prediction; RDANet adaptively fuses local and global features of EEG signals through a spectral attention module and a channel attention module to capture the interdependence between channels. In [13–15], entropy, which describes the irregularity and complexity of EEG signals, is used for automatic seizure detection. Although fruitful results in seizure detection have been achieved in the past, there are few EEG analyses specific to the different stages of EIEE syndrome, and the current understanding of EIEE syndrome is mostly limited to pathological studies. To further explore the evolution of EEG signals in EIEE syndrome during seizures, this paper attempts to find the relationship between EEG features and EIEE syndrome, and discusses the role of EEG changes in seizure detection. Figure 1 shows the main structure of the analysis method, which mainly includes EEG feature extraction, feature significance analysis based on statistical tests, and a classification model. Compared with previous studies, our main contributions include:
1) An analytical framework for seizure detection in EIEE syndrome is proposed. Combined with the coherence feature, the evolution mechanism of seizures is analyzed by statistical methods, and it is concluded that the high-frequency (γ) coherence feature can serve as a preictal biomarker. 2) Feature combinations of the coherence matrices in different frequency bands are selected to train the ResNet18-based EIEE syndrome seizure detection model. Gradient-weighted Class Activation Mapping (Grad-CAM) visualization is used to explain and validate which frequency band is most likely to be a biomarker of seizure cycle changes in EIEE syndrome. The optimized model obtains 91.86% accuracy.
Fig. 1. The proposed EEG based EIEE syndrome analysis and seizure detection.
2 Database and EEG Preprocessing
This article analyzes the EEG signals of 7 children with EIEE syndrome (2 males and 5 females, with an average age of less than 2 months), which were obtained by the Children's Hospital, Zhejiang University School of Medicine, with the consent of the patients' legal guardians. The EEG recordings are based on the international 10–20 system with a sampling frequency of 1000 Hz, and 21 channels are selected for data analysis. Three periods of EEG are used in the experiments: seizure, pre-seizure (10 min before the seizure) and post-seizure (10 min after the seizure); when less than 10 min of data are available before or after a seizure, 5 min are taken. Table 1 shows the gender, the duration of each period and the number of seizures for each patient. The EEG of each state is segmented into frames of length 4 s with an overlap rate of 50%. Finally, the numbers of pre-seizure, seizure and post-seizure samples used for analysis are 3886, 1068 and 3886.
Since the EEG is a random signal with strong background noise and is heavily contaminated by power-line interference during acquisition, preprocessing is required: a 50 Hz notch filter is applied to eliminate the power-line interference, and a 1–70 Hz band-pass filter is then used to remove ultra-low and high frequency interference.

Table 1. CHZU EIEE syndrome database

Patient  Gender  Pre (s)  Seizures (s)  Post (s)  Seizure numbers
P1       F       1200     18            1200      2
P2       M       300      380           300       1
P3       F       600      111           600       1
P4       F       3000     1156          3000      5
P5       M       1200     146           1200      2
P6       F       600      30            600       1
P7       F       900      335           900       2

3 Methods
We extract coherence features from the filtered EEG signals and perform statistical analysis. Finally, the features are used as the input of a ResNet18 model to classify the three periods.

3.1 Feature Extraction
For the coherence feature, we first divide the EEG signals into the 5 frequency bands commonly used in clinical medicine: Delta (1–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz) and Gamma (>30 Hz). The coherence feature evaluates the linear relationship between two signals at a given frequency [16]. Following [17], the coherence is calculated as

Cxy(f) = |Gxy(f)|² / (Gxx(f)·Gyy(f))    (1)

where Gxy(f) is the cross-spectral density of signals X and Y, and Gxx(f) and Gyy(f) are their auto-spectral densities. Coherence ranges from 0 to 1; two signals are synchronized in the time series if the coherence is 1. Figure 2 shows the average coherence matrices of the different sub-bands in the three periods of EIEE syndrome. It can be seen that the 21 electrodes have the highest similarity in the delta band, and the number of similar channels gradually decreases as the frequency band increases. A regular pattern is evident in the β and γ bands: the connection strength between these leads first weakens and then strengthens as the onset approaches, while the θ and α bands show the opposite trend and the δ band does not change. In addition, low coherence can be found between the Cz electrode and the other electrodes.
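As an illustration, the band-averaged coherence matrix can be computed with SciPy's magnitude-squared coherence, which estimates formula (1) via Welch cross- and auto-spectra (array shapes, band limits and names in this sketch are our assumptions):

import numpy as np
from scipy.signal import coherence

def coherence_matrix(eeg, fs=1000, band=(13, 30)):
    # eeg: array of shape (n_channels, n_samples), one 4-s frame
    n_ch = eeg.shape[0]
    C = np.eye(n_ch)
    for i in range(n_ch):
        for j in range(i + 1, n_ch):
            f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=fs)
            mask = (f >= band[0]) & (f <= band[1])
            C[i, j] = C[j, i] = cxy[mask].mean()  # band-average coherence
    return C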
Fig. 2. Average coherence matrices of different sub-bands in 3 periods of EIEE syndrome. The redder the dot, the higher the coherence. 21 electrodes (From left to right and top to bottom are Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, A1, A2, Fz, Cz and Pz) are selected for data analysis.
Accumulating evidence suggests that neuropsychiatric disorders are closely related to abnormal topological changes in brain networks [18]. To observe these abnormal topological changes more clearly, the corresponding results are presented as binarized coherence matrices in Fig. 3. It can be found that: 1) in the different frequency bands, the Fp1, Fp2, F3, F4, C3, C4, P3 and P4 electrodes show synchronization variability with the other electrodes, suggesting altered synchronization of activity between the frontal and central regions and other regions; notably, these changes are not obvious in the δ and θ bands of the pre-seizure and post-seizure periods, but are obvious in the α, β and γ bands. 2) Variability of the connections between the T3, T4, T5 and T6 electrodes and the other electrodes appears in the β and γ bands, indicating changes in temporal brain activity in these two bands.
Fig. 3. Brain network connectivity patterns of the 5 rhythms in the 3 different states, presented as mean coherence matrices thresholded at 1.2 times the average value. Blue represents connections between two leads that did not pass the threshold, yellow represents connections that passed it, and the areas marked in red are the connections that show differences across the states. (Color figure online)
3.2 Statistical Analysis
To further study the differences of the connection matrices in different periods, we perform a statistical analysis of the coherence between electrodes. Specifically, the Kruskal-Wallis H test is used to obtain the corresponding p-value at each element of the coherence matrix, from which a p-value matrix is constructed. The Kruskal-Wallis H test is a nonparametric test that requires no knowledge of the distribution or population parameters of the original data. In detail, the rank of each data point is calculated in ascending order, and the statistic H is then computed to perform a chi-square test with the given significance level and degrees of freedom. H is defined as:

H = 12/(N(N+1)) · Σ_{i=1}^{m} (Ri²/ni) − 3(N+1)    (2)

N = Σ_{i=1}^{m} ni    (3)
where m is the number of simple random samples, Ri is the rank sum of each group, and ni is the number of samples in each group. In total, 15 p-value matrices are derived for the EIEE analysis: for each of the 14 seizure onsets, a p-value matrix on the differences among the pre-seizure/seizure/post-seizure periods is calculated, and an additional p-value matrix is calculated over all seizures of all subjects. To quantify the significantly different points among the three periods, the probability that the p-value is less than 0.001 is further counted over these p-value matrices. Algorithm 1 summarizes the calculation of the probability matrix. Finally, we visualize the probability matrix as shown in Fig. 4; the redder the dots, the greater the differences between the three periods. The coherence value distribution maps corresponding to the points with a value of 1 in the probability matrix are shown in Fig. 5. It can be seen that: 1) the differences between the three periods become larger as the frequency band increases; 2) in the γ band, the coherence between the T5 and O1 electrodes and between the T5 and Pz electrodes differs markedly, indicating altered synchronization of activity between the temporal and central regions and between the temporal and occipital regions. Toda et al. [19] suggested that pathological high-frequency activity is related to the pathophysiology of epileptic encephalopathy in early infancy, including EIEE syndrome, and speculated that pathological high-frequency activity that persists during seizures may be detrimental to cognitive function in the developing brain. From this we conclude that the coherence feature of the γ band is more favorable for classifying the three periods and may provide some hints on the pathophysiology of EIEE syndrome.
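A per-element version of this test is straightforward with SciPy (a sketch; the array names and shapes are our assumptions):

import numpy as np
from scipy.stats import kruskal

def p_value_matrix(pre, sez, post):
    # pre/sez/post: coherence matrices per frame, shape (n_frames, 21, 21)
    P = np.ones((21, 21))
    for i in range(21):
        for j in range(i + 1, 21):
            # Test whether the three periods share the same distribution
            # of coherence values at element (i, j).
            _, p = kruskal(pre[:, i, j], sez[:, i, j], post[:, i, j])
            P[i, j] = P[j, i] = p
    return P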
Fig. 4. Probability matrix.
Fig. 5. Coherence value distribution maps corresponding to the points with a value of 1 in the probability matrix. The probability matrix of the α band has one such point, while the β band has 4 and the γ band has 10.
Algorithm 1. Probability Matrix Algorithm
Input: p-value matrices X^k ∈ R^{21×21}, k = 1, ..., K
Output: probability matrix A
1: for each element position (i, j) do
2:   n ← 0
3:   for k = 1 to K do
4:     if X^k_{ij} ≤ 0.001 then
5:       n ← n + 1
6:     end if
7:   end for
8:   A_{ij} ← n/K
9: end for
10: return A
3.3 Classification
In this section, the extracted feature matrices are fed into a ResNet18 network for training, and it is verified that the coherence feature of the γ band yields the best classification performance. Compared with traditional machine learning methods, deep networks have good learning ability on complex data and have been widely studied in epilepsy analysis [20–27]. He et al. [28] proposed ResNet to ease the training of deep neural networks; the residual learning framework alleviates the problem of vanishing gradients and facilitates information flow across layers. The efficient and lightweight ResNet18 is adopted as the classifier, and the coherence matrix of each frequency band is separately fed into the network for training. Since the dimension of the coherence matrix is 21×21 while the native input size of ResNet18 is 224×224, we remove the pooling layers of the ResNet18 model and adjust the numbers of convolution kernels, as shown in Fig. 6. The model is mainly composed of three parts: 4 residual modules, 1 convolutional layer and 1 fully connected layer; each residual module is constructed from 4 convolutional layers, and the numbers of convolutional kernels of the 4 residual modules are 32, 64, 128 and 256, respectively.
Fig. 6. The adjusted ResNet18 model structure.
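A self-contained PyTorch sketch consistent with this description follows; the exact kernel sizes and strides are our assumptions, with Fig. 6 giving the authoritative layout:

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.short = (nn.Sequential() if stride == 1 and c_in == c_out else
                      nn.Sequential(nn.Conv2d(c_in, c_out, 1, stride,
                                              bias=False),
                                    nn.BatchNorm2d(c_out)))
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.short(x))

class CoherenceResNet18(nn.Module):
    # Stem conv + 4 residual stages (2 blocks each = 4 conv layers per
    # stage), widths 32/64/128/256, no pooling layer, 21x21 input.
    def __init__(self, n_classes=3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 3, 1, 1, bias=False),
                                  nn.BatchNorm2d(32), nn.ReLU())
        widths, layers, c = [32, 64, 128, 256], [], 32
        for i, w in enumerate(widths):
            layers += [BasicBlock(c, w, stride=1 if i == 0 else 2),
                       BasicBlock(w, w)]
            c = w
        self.stages = nn.Sequential(*layers)
        self.head = nn.Linear(256, n_classes)
    def forward(self, x):                     # x: (batch, 1, 21, 21)
        z = self.stages(self.stem(x))
        return self.head(z.mean(dim=(2, 3)))  # global average + FC

logits = CoherenceResNet18()(torch.randn(8, 1, 21, 21))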
In addition, we train the ResNet18 model on the 5 concatenated features (in the order δ–γ from left to right), and then use Grad-CAM [29] to explain which features the network is most sensitive to. Grad-CAM uses the gradient information flowing into the last convolutional layer of a neural network to understand the importance of each neuron for the decision of interest. Specifically, for the feature maps A^k ∈ R^{u×v} produced by the last convolutional layer, where k indexes the feature maps, the score y^c of category c is differentiated with respect to A^k, and these gradients are passed through global average pooling to obtain the neuron importance weights α_k^c:

α_k^c = (1/Z) · Σ_i Σ_j ∂y^c/∂A^k_{ij}    (4)

where α_k^c captures the importance of feature map k for target class c and Z is the number of pixels in the feature map. A weighted combination of the forward activation maps followed by a ReLU then gives the heatmap:

L^c_{Grad-CAM} = ReLU(Σ_k α_k^c · A^k)    (5)

Applying the ReLU to the linear combination of feature maps keeps only the features that have a positive influence on the target category, i.e., pixels whose intensity increase raises the score y^c.
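A minimal PyTorch realization of (4)-(5) using forward and backward hooks (helper names are ours):

import torch

def grad_cam(model, layer, x, target_class):
    # x: one input of shape (1, C, H, W); `layer`: the last conv layer.
    model.eval()
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(x)[0, target_class]     # y^c
    model.zero_grad()
    score.backward()                      # gradients via backprop
    h1.remove(); h2.remove()
    alpha = grads["g"].mean(dim=(2, 3), keepdim=True)   # Eq. (4)
    cam = torch.relu((alpha * acts["a"]).sum(dim=1))    # Eq. (5)
    return cam / cam.max()                # normalized heatmap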
4 Results and Discussions
The coherence matrices are used for deep residual network model learning. The ratio of training, testing and validation data is set to 6:2:2. The learning rate is initialized to 0.001 and Adam is applied as the network optimizer. The network parameters are optimized with the cross-entropy loss.

4.1 Model Prediction Analysis
Table 2 shows the classification accuracy for these features; the accuracy improves as the frequency band increases. Figure 7 shows the confusion matrices of the classification results for each band. Clearly, the coherence feature of the γ band distinguishes the three periods effectively, the β band is second, and the remaining 3 bands are comparatively unsatisfactory. It can be concluded that the differences between the three periods grow as the frequency band increases. The ROC curves of the 3 categories based on the γ band are shown in Fig. 8.
The area under the curve (AUC) for the seizure period is the largest, with a value of 0.99. The larger the AUC value, the more likely the algorithm is to rank positive samples ahead of negative samples, enabling better classification. In conclusion, there are indeed physiological differences among the pre-seizure, seizure and post-seizure periods of EIEE syndrome; it is feasible to use EEG signals to distinguish these three categories, and the ResNet18-based classification model performs well.

Table 2. Classification performance using different features (accuracy, %).

            Coherence
Classifier  δ      θ      α      β      γ
ResNet18    78.59  78.84  79.33  85.37  90.21
In addition, we visualize the γ-band feature maps learned by the conv1 layer and the residual blocks, as shown in Fig. 9. It can be seen that after multi-layer convolution, some information in the coherence matrix is amplified to distinguish the different periods, and the connections between the temporal and central regions in the γ band show great variability.

4.2 Grad-CAM Analysis
For a given input, the network predicts a category for a reason; with Grad-CAM we can see which areas of the input contribute most to the prediction of that category. We train the classification model on the 5 concatenated features, which yields a classification accuracy of 91.86%, and then put each feature matrix into the trained model to obtain its Grad-CAM. The most typical Grad-CAM map is shown in Fig. 10(a); the redder the color, the higher the value. Without loss of generality, Fig. 10(b) shows the average value of the Grad-CAM in each frequency band over all samples. To better explore the sensitivity of the network to the corresponding features, only the elements of the Grad-CAM map whose value is greater than 0.9·Gmax are used in the calculation, where Gmax is the largest value in the Grad-CAM map. The result shows that the model is most sensitive to the coherence feature of the γ band, which is consistent with the statistical analysis and indicates that the γ-band coherence feature best distinguishes the three periods. This may provide a useful reference for the significant pathological changes of the γ band during seizures.
Fig. 7. Confusion matrix.
Fig. 8. ROC curve of 3 categories.
Fig. 9. Feature maps of conv1 layer and different residual blocks.
Fig. 10. Typical Grad-CAM maps in different periods (a) and the average value of Grad-CAM in each frequency band (b); the values are normalized.
5 Conclusions
To characterize the EEG signals in the different stages of EIEE syndrome and to detect seizures effectively, a comprehensive analysis of coherence features in the pre-seizure/seizure/post-seizure periods of EIEE syndrome is performed, and ResNet18 is used to validate the analysis results. In addition, the significance of the coherence features for seizure detection in EIEE syndrome is demonstrated visually by the Kruskal-Wallis H test and Grad-CAM. The experiments show that: 1) the ResNet18 model performs well for seizure detection in EIEE syndrome, obtaining an accuracy of 91.86%; 2) the differences between the three periods grow as the frequency band increases, and the coherence feature of the γ band can be considered a biomarker of seizure cycle changes in EIEE syndrome.

Ethical Standards. This study was approved by the Second Affiliated Hospital of Zhejiang University and registered in the Chinese Clinical Trial Registry (ChiCTR1900020726). All patients gave their informed consent prior to their inclusion in the study.
References 1. Epilepsy, A.: Proposal for revised classification of epilepsies and epileptic syndromes. In: The Treatment of Epilepsy: Principles & Practice, p. 354 (2006) 2. Ohtahara, S., Yamatogi, Y.: Epileptic encephalopathies in early infancy with suppression-burst. J. Clin. Neurophysiol. 20(6), 398–407 (2003) 3. Yamatogi, Y., Ohtahara, S.: Early-infantile epileptic encephalopathy with suppression-bursts, ohtahara syndrome; its overview referring to our 16 cases. Brain Develop. 24(1), 13–23 (2002)
4. Rukhsar, S., Khan, Y.U., Farooq, O., Sarfraz, M., Khan, A.T.: Patient-specific epileptic seizure prediction in long-term scalp EEG signal using multivariate statistical process control. IRBM 40(6), 320–331 (2019) 5. Bandarabadi, M., Teixeira, C.A., Rasekhi, J., Dourado, A.: Epileptic seizure prediction using relative spectral power features. Clin. Neurophysiol. 126(2), 237–248 (2015) 6. Gadhoumi, K., Lina, J.-M., Gotman, J.: Discriminating preictal and interictal states in patients with temporal lobe epilepsy using wavelet analysis of intracerebral eeg. Clin. Neurophysiol. 123(10), 1906–1916 (2012) 7. Tzallas, A.T., Tsipouras, M.G., Fotiadis, D.I.: Automatic seizure detection based on time-frequency analysis and artificial neural networks. Comput. Intell. Neurosci. (2007) 8. Tzallas, A.T., Tsipouras, M.G., Fotiadis, D.I.: The use of time-frequency distributions for epileptic seizure detection in EEG recordings. In: 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3–6. IEEE (2007) 9. Tzallas, A.T., Tsipouras, M.G., Fotiadis, D.I.: Epileptic seizure detection in EEGs using time-frequency analysis. IEEE Trans. Inf. Technol. Biomed. 13(5), 703–710 (2009) 10. George, F., et al.: Epileptic seizure prediction using EEG images. In: 2020 International Conference on Communication and Signal Processing (ICCSP), pp. 1595– 1598. IEEE (2020) 11. Zhang, S., Chen, D., Ranjan, R., Ke, H., Tang, Y., Zomaya, A.Y.: A lightweight solution to epileptic seizure prediction based on EEG synchronization measurement. J. Supercomput. 77(4), 3914–3932 (2021) 12. Yang, X., Zhao, J., Sun, Q., Jianbo, L., Ma, X.: An effective dual self-attention residual network for seizure prediction. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 1604–1613 (2021) 13. Acharya, U.R., Molinari, F., Sree, S.V., Chattopadhyay, S., Ng, K.-H., Suri, J.S.: Automated diagnosis of epileptic EEG using entropies. Biomed. Signal Process. Control 7(4), 401–408 (2012) 14. Kannathal, N., Choo, M.L., Acharya, U.R., Sadasivan, P.K.: Entropies for detection of epilepsy in EEG. Comput. Methods Programs Biomed. 80(3), 187–194 (2005) 15. Pravin Kumar, S., Sriraam, N., Benakop, P.G., Jinaga, B.C.: Entropies based detection of epileptic seizures with artificial neural network classifiers. Expert Syst. Appl. 37(4), 3284–3291 (2010) 16. Zheng, R., et al.: Scalp EEG functional connection and brain network in infants with west syndrome. Neural Netw. 153, 76–86 (2022) 17. Cao, J., et al.: Using interictal seizure-free EEG data to recognise patients with epilepsy based on machine learning of brain functional connectivity. Biomed. Signal Process. Control 67, 102554 (2021) 18. Sha, Z., Wager, T.D., Mechelli, A., He, Y.: Common dysfunction of large-scale neurocognitive networks across psychiatric disorders. Biol. Psychiatry 85(5), 379– 388 (2019) 19. Toda, Y., et al.: High-frequency EEG activity in epileptic encephalopathy with suppression-burst. Brain Develop. 37(2), 230–236 (2015) 20. Feng, Y., et al.: 3D residual-attention-deep-network-based childhood epilepsy syndrome classification. Knowl.-Based Syst. 248, 108856 (2022)
21. Dinghan, H., Cao, J., Lai, X., Wang, Y., Wang, S., Ding, Y.: Epileptic state classification by fusing hand-crafted and deep learning EEG features. IEEE Trans. Circuits Syst. II Express Briefs 68(4), 1542–1546 (2021) 22. Wang, Z., Duanpo, W., Dong, F., Cao, J., Jiang, T., Liu, J.: A novel spike detection algorithm based on multi-channel of BECT EEG signals. IEEE Trans. Circuits Syst. II Express Briefs 67(12), 3592–3596 (2020) 23. Dinghan, H., Cao, J., Lai, X., Liu, J., Wang, S., Ding, Y.: Epileptic signal classification based on synthetic minority oversampling and blending algorithm. IEEE Trans. Cogn. Develop. Syst. 13(2), 368–382 (2021) 24. Zhendi, X., Wang, T., Cao, J., Bao, Z., Jiang, T., Gao, F.: BECT spike detection based on novel EEG sequence features and LSTM algorithms. IEEE Trans. Neural Syst. Rehabil. Eng. 29, 1734–1743 (2021) 25. Cao, J., Dinghan, H., Wang, Y., Wang, J., Lei, B.: Epileptic classification with deep-transfer-learning-based feature fusion algorithm. IEEE Trans. Cogn. Develop. Syst. 14(2), 684–695 (2022) 26. Cao, J., Zhu, J., Wenbin, H., Kummert, A.: Epileptic signal classification with deep EEG features by stacked CNNs. IEEE Trans. Cogn. Develop. Syst. 12(4), 709–722 (2020) 27. Cao, J., et al.: Unsupervised eye blink artifact detection from EEG with gaussian mixture model. IEEE J. Biomed. Health Inform. 25(8), 2895–2905 (2021) 28. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 29. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Gradcam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618– 626 (2017)
Constrained Canonical Correlation Analysis for fMRI Analysis Utilizing Experimental Paradigm Information Ming Li1(B) , Yan Zhang1,2 , Pengfei Tang1 , and Dewen Hu1 1 College of Intelligence Science and Technology, National University of Defense Technology,
Changsha, China [email protected] 2 Xi’an Satellite Control Center, Xi’an, China
Abstract. Data-driven methods have been successfully used in functional magnetic resonance imaging (fMRI) data analysis without any explicit prior information. However, in the analysis of task fMRI, incorporating prior information about the paradigm would be useful to improve the power of detecting the desired activations. In this paper, we incorporate the experimental paradigm information into the canonical correlation analysis (CCA) model and propose a temporally constrained CCA approach. Compared with noise and artifact signals, the BOLD response signal from activated regions changes only after the stimulus begins. By incorporating the difference between before and after the stimulus, the spatial patterns that respond well to stimulus occurrence become manifest, and the activations can be detected more accurately. Comparisons on simulated data indicate that incorporating prior information about the paradigm improves the accuracy of CCA in activation detection. The proposed method obtained more accurate and robust results than conventional CCA and showed improved power in activation detection over purely data-driven methods. Keywords: Data driven · Paradigm · Canonical correlation analysis
1 Introduction

Data-driven methods have been successfully applied to functional magnetic resonance imaging (fMRI) studies [1–4]. Widely used data-driven methods include principal component analysis (PCA) [5], independent component analysis (ICA) [6, 7] and canonical correlation analysis (CCA) [8, 9], etc. PCA and ICA try to maximize the variance and the independence, respectively, which are statistical features of the components; neither of them characterizes the spatial modes or temporal response patterns, while CCA places slightly more constraint on spatiotemporal features, i.e., the smoothness of neural signals. However, none of them incorporates explicit characteristics of the underlying sources of interest. Utilizing explicit prior information, such as the experimental paradigm, would be significantly helpful in task fMRI analysis. Calhoun et al. utilized the paradigm information to apply temporal constraints in spatial ICA, and improved the robustness of
"blind" ICA in the presence of noise [10, 11]. Tahir Rasheed et al. incorporated temporal prior information into ICA to apply a closeness constraint and improved the effectiveness of the method [12]. Wang et al. incorporated a closeness constraint between the output and a predicted reference signal into ICA and demonstrated improved accuracy [13]. However, in these algorithms such constraints may be too tight: the real response may be inconsistent with the predicted response, because the HRF model varies across subjects, tasks and brain regions. If the HRF model used does not match the real response curve, the constraints may be incorrect and lead to poor performance. In our study, we assume that there is a significant difference in BOLD level between before and after the stimulus. We incorporate this BOLD-level difference as a constraint into the conventional CCA model to detect task-related regions, and present the resulting difference-constrained CCA method (dCCA). To validate and evaluate the proposed method, experiments were conducted. Results on simulated data show that, compared with conventional methods, our method detects activations more accurately.
2 Theory

Considering t zero-mean observed signals x1, x2, ..., xt, each of them can be modeled as a linear combination of r zero-mean uncorrelated source signals s1, s2, ..., sr:

xi = s1·a1i + s2·a2i + ··· + sr·ari = Σ_{j=1}^{r} sj·aji    (1)
where aji (j = 1, ..., r; i = 1, ..., t) is the mixing coefficient. Let x = (x1, x2, ..., xt) represent the observed signals of a subject, s = (s1, s2, ..., sr) the source signals, and A the matrix with elements aji; Eq. (1) can then be written as

x = s · A    (2)

In most cases, a source signal s can be modeled as a linear combination of the observed signals,

s = xᵀw    (3)

where w is the demixing vector corresponding to s, with ‖w‖ = 1. Our algorithm assumes that the amplitude of a task-related time series exhibits a significant mean difference between before and after the stimulus. Moreover, because real sources are smooth, a source signal s should also fulfill the maximal autocorrelation hypothesis (the CCA hypothesis). We therefore obtain the following optimization problem:

max_w Obj = θ·γ + (1 − θ)·g    (4)
where γ and g denote the autocorrelation term and the BOLD amplitude mean-difference term between before and after the stimulus, respectively, and θ is a real number in the interval (0, 1).
According to the theory of CCA, the autocorrelation term γ can be quantified by the autocorrelation of s:

γ = E[s·s^(Δ)] / √(E[s²]·E[s^(Δ)·s^(Δ)])    (5)

where E(·) denotes the expectation and s^(Δ) is the shifted version of s; the superscript (Δ) denotes a shift of step Δ, which can be set to 1 for simplicity. Because s^(Δ) and s are composed of the same sample points, they have the same variance, i.e., E[s²] = E[s^(Δ)·s^(Δ)]. Then we have

γ = E[s·s^(Δ)] / E[s²]    (6)

The difference term g denotes the BOLD mean difference between before and after the stimulus. It should have the following features. First, g must be positively correlated with the BOLD-level mean difference between before and after the stimulus: the more significant the mean difference, the larger the value of g. Second, g should be written as an explicit formula of the demixing vector w, so that it is convenient to find the demixing vector that drives (4) to its extrema. Finally, the value of g should be normalized to the interval [0,1] to make it comparable to γ, so that the proportion of g can be adjusted simply by changing the weight θ. In the following, we develop the mathematical description of the difference term along these three criteria. According to the model represented by (1), each individual source signal s is multiplied by its time fluctuation, which is reflected by the row of mixing coefficients corresponding to s in A. Therefore, we can define ν to describe the mean difference of amplitude between before and after the stimulus as

ν = (1/(h−k) · Σ_{m=k+1}^{h} am − 1/k · Σ_{m=1}^{k} am)²    (7)

where k and h are the onset time and end time of the stimulus curve; in this way the experimental paradigm is incorporated as prior information. Let α represent a row of A; then ν can be written as

ν = (βαᵀ)²    (8)

where β is a difference vector between before and after the stimulus, with the first k entries equal to −1/k and the remaining h−k entries equal to 1/(h−k):

β = (−1/k, ..., −1/k, 1/(h−k), ..., 1/(h−k))    (9)
From (1) and (2), the mixing coefficient vector α can be obtained from x and s, i.e., α = sᵀ·x. Then we have

ν = (βxᵀs)²    (10)
The original data are dimensionally reduced by principal component analysis (PCA) before the dCCA analysis, so the source signals are demixed from the dimensionally reduced data. Let y = (y1, y2, ..., yK)ᵀ be the reduced data of x, where yi (1 ≤ i ≤ K) is one of the K (K ≤ r) components retained after reduction and yi has been normalized, i.e., E[yi²] = 1. Equation (3) should then be rewritten as

s = yᵀw    (11)
It should be noted that the x in (10) cannot be replaced by y, because the difference term is defined on the observed data instead of the reduced data. Inserting (11) into (10), the mean difference can be represented by

ν = wᵀ·yxᵀβᵀ·βxyᵀ·w    (12)

According to PCA theory, the retained principal components can be represented as yi = φiᵀx/|λi|, where λi² and φi are the i-th eigenvalue and eigenvector of E[xxᵀ] [4]. Then we obtain

y = diag(1/|λ1|, 1/|λ2|, ..., 1/|λK|)·(φ1, φ2, ..., φK)ᵀ·x = Λ⁻¹Φᵀx    (13)

where Φ = (φ1, φ2, ..., φK) and Λ = diag(|λ1|, |λ2|, ..., |λK|). The diagonal elements of Λ are sorted in descending order and λ1² is the maximum eigenvalue. Inserting (13) into (12), ν can be reduced to

ν = wᵀ·Λ⁻¹Φᵀ·xxᵀβᵀ·βxxᵀ·ΦΛ⁻¹·w = wᵀΨ²w    (14)

where Ψ² denotes the matrix (βxyᵀ)ᵀ(βxyᵀ) = Λ⁻¹Φᵀxxᵀβᵀβxxᵀ ΦΛ⁻¹. According to (10) and (11),

ν = βxyᵀ·wwᵀ·yxᵀβᵀ    (15)

and under the constraints ‖w‖ = 1 and E[yi²] = 1, the maximal value ν̃ of ν reduces to

ν̃ = βxxᵀβᵀ = (βx)²    (16)

Then ν can be normalized to the interval [0,1], and the difference term g can be defined as the normalized ν:

g = ν/ν̃ = wᵀΨ²w / (βx)²    (17)

Inserting (3), (6) and (17) into (4), we obtain the objective function

Obj = wᵀ·(θ·E[y·(y^(Δ))ᵀ] + (1 − θ)·Ψ²/(βx)²)·w = wᵀUw    (18)

where U = θ·E[y·(y^(Δ))ᵀ] + (1 − θ)·Ψ²/(βx)².
determined that the w cause Obj approach its extrema are the eigenvectors of U + UT .
Note that (7) and (9) are established under the condition that only one stimulus exists in the experimental paradigm. When N stimuli are included in the paradigm, with start and end moments k1, ..., kN and h1, ..., hN respectively, we obtain a generalized form of β:

β = (−1/k1, ..., −1/k1, −1/(k2−h1), ..., −1/(k2−h1), ..., −1/(kN−hN−1), ..., −1/(kN−hN−1),
     1/(h1−k1), ..., 1/(h1−k1), 1/(h2−k2), ..., 1/(h2−k2), ..., 1/(hN−kN), ..., 1/(hN−kN))    (19)

where each negative entry is repeated k1 times for the first block and ki−hi−1 times for the following blocks, and each positive entry 1/(hi−ki) is repeated hi−ki times.
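Putting the pieces together, a compact numerical sketch of the dCCA solver reads as follows (single-stimulus β; the time-by-voxel data layout, the variable names and the θ value are our assumptions):

import numpy as np

def dcca(x, k, h, K=20, theta=0.52):
    # x: (t, V) data matrix (t time points, V voxels); k/h: stimulus
    # onset/end indices; K: number of retained PCs; theta: weight in (4).
    t, V = x.shape
    beta = np.r_[np.full(k, -1.0 / k),
                 np.full(h - k, 1.0 / (h - k))]          # Eq. (9)
    # PCA reduction/whitening, Eq. (13): y = Lambda^{-1} Phi^T x.
    lam2, phi = np.linalg.eigh(x @ x.T / V)              # eig of E[xx^T]
    order = np.argsort(lam2)[::-1][:K]
    y = (phi[:, order] / np.sqrt(np.maximum(lam2[order], 1e-12))).T @ x
    # Autocorrelation term E[y y_shifted^T] (shift of one sample), Eq. (6).
    C = y[:, :-1] @ y[:, 1:].T / (V - 1)
    bx = beta @ x                                        # beta times x
    d = bx @ y.T                                         # beta x y^T
    U = theta * C + (1 - theta) * np.outer(d, d) / (bx @ bx)   # Eq. (18)
    # Extrema of w^T U w under ||w|| = 1: eigenvectors of U + U^T.
    _, vecs = np.linalg.eigh(U + U.T)
    w = vecs[:, -1]                                      # top eigenvector
    return y.T @ w                                       # estimated source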
3 Materials and Methods

An experiment on simulated data was designed to assess the validity of the proposed method and compare its performance with standard CCA. Multiple fMRI-like datasets were generated via the SimTB toolbox (http://mialab.mrn.org/software/simtb/index.html) [14]. Simulated data were generated under a linear mixture model using fMRI-like source images (140 × 140 pixels) and associated 260-point time courses (see Fig. 1). Rician noise was added to the generated data with a specified contrast-to-noise ratio (CNR) [15]. The repetition time (TR) was 2 s/sample.

3.1 Experiment I

To validate and evaluate the proposed method, 1000 simulated datasets with a CNR of 0.1 were generated using 27 fMRI-like source images, including one task-related source. Among these datasets, variability of the sources in spatial location and shape was generated by assigning translations (mean of 0, standard deviation (SD) of 10 pixels), rotations (mean of 0, SD of 5°) and spread (mean of 2, SD of 0.3) to the sources of each dataset. Temporal variation of the time courses also exists among the datasets and was generated by assigning unique events at each TR. The activation and corresponding time course of one dataset are shown in Fig. 1 as an example.

3.2 Experiment II

A parameter that must be decided in the proposed method is the weight θ, which adjusts the proportion of the difference term in the optimization objective. We therefore designed this experiment to explore the effect of θ on the performance of the proposed method. The 1000 simulated datasets generated in Experiment I (CNR = 0.1) were used, and the weight θ was varied from 0 to 1 with a step of 0.01. According to Eq. (4), when θ = 0 the proposed algorithm separates sources solely according to the difference term, which is obviously not enough for satisfactory performance; when θ = 1 the method becomes standard CCA and incorporates no prior information, which is insufficient to achieve the best performance
Fig. 1. Simulated activations and their associated time courses. Example of the simulated activation are shown in left and the corresponding response are on right. The red curves are the experimental paradigms corresponding to each time course.
either. We therefore hypothesized that, as θ increases from zero, the accuracy of the estimated components will at first rise thanks to the inclusion of the CCA term, but as θ continues to increase the performance will eventually fall because less and less prior information is available.
4 Results and Discussion

To compare the performance of the proposed method and standard CCA, we computed the correlation between the ground-truth (GT) signal and the task-related time course estimated by each method. The correlation coefficients obtained by the two methods on the 1000 datasets are displayed in Fig. 2(a). As shown in Fig. 2(a), almost all points lie above the line y = x, indicating that the task-related components estimated by the proposed method correspond better with the GT time series on most datasets. The results suggest that the involvement of the difference term is helpful for the estimation of task-related components, and that the accuracy of the obtained activation can be improved under the guidance of the paradigm information. Figure 2(b) shows the distribution of the correlations from the
two methods. As can be seen, the blue bars concentrate in the interval 0.7–0.9, while the orange bars are less concentrated and mainly distributed in the range 0.5–0.8. The results suggest not only that the proposed method achieves higher overall accuracy, but also that its performance is more stable across datasets, which is reasonable because, under the guidance of temporal information, time courses showing higher correspondence with the paradigm are selected and the results become more stable. From the results in Fig. 2, the proposed method obtained more accurate and stable results than standard CCA, suggesting that experimental paradigm information can aid the estimation of task-related components if it is effectively incorporated into the analysis, and that both the accuracy and the robustness of "blind" CCA in activation detection can be improved.
Fig. 2. Comparison between the proposed method (dCCA) and the traditional method (CCA). (a) The distribution of the Pearson correlation coefficient between the ground-truth (GT) time course and the task-related time course estimated by the proposed method (dCCA) and CCA. The horizontal axis indicates the correlation of CCA, and the vertical axis represents the value of the proposed method; the dotted line y = x is drawn as a reference. (b) The distribution of the calculated correlations from the proposed method (in blue) and CCA (in orange). The horizontal axis indicates the correlation value and the vertical axis the number of datasets.
To compare the two methods under different noise conditions, we computed the average correlation between the estimated time course and the stimulus curve under varying CNRs; the results are displayed in Fig. 3. As the CNR increases, the performance of both methods rises at first and finally levels off, but at all CNR levels the correlation obtained by the proposed method is higher than that of blind CCA. Besides, the standard deviation is relatively lower for the proposed method, nearly half the value of CCA when the CNR is above 0.1. The proposed method thus estimates the task-related components much better than CCA at all CNR levels. These results suggest that, under the guidance of experimental paradigm information, both the accuracy and the robustness of the data-driven method can be improved in high- and low-noise situations alike. To explore the influence of the parameter θ, we computed the correlation between the obtained task-related time courses and the GT time series at different θ values and plotted the average correlation as θ varies from 0 to 1 (Fig. 4). As shown in Fig. 4,
Fig. 3. The comparison between the proposed method and standard CCA under varying CNRs. The blue curve indicates the temporal Pearson correlation coefficient obtained by the proposed method, and the orange curve indicates the correlation obtained by standard CCA. The horizontal axis indicates the CNR value, and the vertical axis indicates the correlation. Error bars represent ±SD.
with the increase of θ , the accuracy of estimated time courses rises in the beginning, and then decreases after 0.52 (just as expected). It’s reasonable because when θ approaches to zero, the difference term plays a decisive role in optimization objection, thus the method concentrates on extracting signals whose amplitude changes significantly after stimulus. Apparently, noise signals or artifacts can meet such a simple requirement and be selected as the results. Therefore, greater involvement of autocorrelation term will definitely help to distinguish meaningful signals and improve the accuracy. When the proption of the autocorrelation term becomes comparable to the difference term, the overall performance can reach its top. As for the datasets in this experiment, the θ value that makes the method achieve highest accuracy is 0.52. When θ is higher than 0.52, however, the accuracy of estimated time course experiences a decrease. It is likely that the proportion of the autocorrelation term is considerably higher than the difference term, meaning that the prior information incorporated into analysis is less. This downward trend proves our hypothesis again, that the prior information is helpful to the detection of task-related components and can improve the accuracy of the method. In addition, it can be seen that the standard deviation rises slightly with the increase of θ , suggesting that greater incorporation of prior information can improve the robustness of the method to a larger extend.
Fig. 4. The performance curve of the proposed method at various θ. The horizontal axis indicates the θ value varying from 0 to 1 with a step of 0.01, and the vertical axis indicates the average Pearson correlation coefficient obtained from 1000 datasets at each θ. The gray shading represents ±SD.
5 Conclusion
In this study, we proposed the dCCA approach, which incorporates experimental paradigm information into the standard CCA model. Comparisons with conventional methods indicate that incorporating prior information improves the accuracy of the data-driven method, and the proposed method achieves better performance in activation detection for fMRI data analysis.
Algorithm
BFAct: Out-of-Distribution Detection with Butterworth Filter Rectified Activations

Haojia Kong and Haoan Li
Beijing Institute of Technology, Beijing, China, [email protected]
Waseda University, Tokyo, Japan, [email protected]
https://haojiak.github.io/
Abstract. Out-of-Distribution (OOD) Detection has drawn a lot of attention recently because it is an essential building block for safely deploying neural network models in real-life applications. The challenge in this field is that modern neural networks tend to produce overconfident predictions on OOD data, which works against the principles of OOD detection techniques. To overcome this challenge, we propose Butterworth Filter rectified Activations (BFAct), a technique that rectifies activations and drastically alleviates overconfident predictions on OOD data. Our work is motivated by an analysis of a neural network's internal activations and proves to be a surprisingly effective post hoc method for the OOD Detection task. The advantage of using the Butterworth filter is that its passband has a smooth and monotonically decreasing frequency response, which helps correct abnormal activations back toward the normal distribution. BFAct not only generalizes across various neural network architectures but is also compatible with various OOD score functions. Our main experiments are evaluated on a large-scale OOD Detection benchmark based on ImageNet-1k, which is close to practical application scenarios; we also conduct experiments on CIFAR-10 and CIFAR-100. The results illustrate that our method outperforms the state of the art on both large-scale and common benchmarks.

Keywords: Out-of-distribution detection · Neural network · Activations · Butterworth filter

1 Introduction
It is established that machine learning classifiers display outstanding performance when the input data is consistent with the training data. However, real-world systems often encounter out-of-distribution (OOD) inputs: samples whose distributions or categories were not exposed to the network during the training phase.
(H. Kong and H. Li contributed equally to this work.)
Research shows that a machine learning model can incorrectly classify test samples from unknown classes or different semantic distributions as one of the known categories with high confidence [24]. Classifiers that fail to indicate when they are likely mistaken can see limited adoption and cause serious accidents. For instance, in June 2020 a Tesla under autopilot control failed to identify a crashed truck lying on its side in the middle of a highway and hit it without any braking, causing the death of the driver. This incident could have been avoided had the autopilot system been trained with OOD detection in mind: the system could then warn the driver about an unanticipated situation, like a flipped truck in the middle of the road, and slow down. OOD detection considers the problem of distinguishing in- and out-of-distribution samples in the testing phase [3,4,10,11]. In the OOD detection setting, test samples come from a distribution with a semantic shift from the in-distribution (ID): P(xin) ≠ P(xout).
Fig. 1. Subgraph (a) shows the values of the 2048 activation units on the penultimate layer before applying BFAct, and subgraph (b) shows the rectified activations after applying BFAct. Subgraph (c) shows the distribution of OOD Detection uncertainty scores without BFAct, while subgraph (d) shows the distribution with BFAct. ID data is shown in blue and OOD data in orange. (Color figure online)
This paper approaches the OOD detection problem by rectifying the abnormal unit activations of OOD data while preserving those of the ID data. The filter is built on the principles of the Butterworth filter [29], which originated in the field of analogue signal processing. The Butterworth filter acts as a low-pass filter, leaving low values untouched while greatly attenuating higher values, as shown in Fig. 1. The main advantage of the Butterworth filter is its maximally flat magnitude response in the passband; this smooth filtering characteristic lets the filter correct activation values to be closer to the original normal distribution. In addition, our models are based on pre-trained neural network architectures, which is closer to industrial application
scenarios. By choosing this method, we would like to make discoveries in a field that few have yet explored [15,18,19,22]. Both empirical and theoretical analyses are provided to illustrate the superiority of BFAct. Experimentally, state-of-the-art performance is established on both the large-scale ImageNet-1k benchmark and the common CIFAR OOD Detection benchmarks. Theoretically, we prove that BFAct enlarges the gap between ID and OOD data, which helps OOD estimators better separate ID and OOD samples. In summary, the key results and contributions are:
– BFAct (Butterworth Filter rectified Activations), a post hoc approach for improving OOD detection performance, is introduced. The method is highly compatible with various uncertainty score functions and can be applied to modern neural architectures without retraining.
– BFAct is tested in a setting where ImageNet-1k is used as the ID dataset and four datasets with non-overlapping categories w.r.t. ImageNet-1k are used as OOD datasets. The evaluations span a diverse range of domains and are close to real scenarios. BFAct achieves an average reduction in FPR95 of 4.07% compared to the best-performing methods. BFAct is also evaluated on the CIFAR-100 and CIFAR-10 benchmarks routinely used in the literature.
– We analyze empirically how abnormally high activations on OOD data harm detection, and theoretically how effectively BFAct rectifies activations and separates the ID and OOD distributions. The paper also provides comprehensive ablation studies on different design choices, such as network architectures, parameter settings, and applying BFAct on different layers.
2 Method
Consider the problem of distinguishing in-distribution (ID) and out-of-distribution (OOD) images on pre-trained neural networks. The OOD Detection process has two stages: it begins by encoding the properties of the images with a CNN model and ends with a binary classification using a score function. Figure 2 displays the general workflow in more detail.

2.1 Model and Strategy
To start with, an introduction to the OOD Detection process is as follows. Let Din and Dout denote two distinct data domains, let X be the image space, and let Y be the output space. Let f : X → R^|Y| represent a neural network trained on a dataset drawn from Din that outputs a logit vector. At test time, samples from Din and Dout are fed to the model sequentially, and a decision function then detects whether each sample comes from Din or Dout:

g(x; f) = { 1, if x ∼ Din;  0, if x ∼ Dout }   (1)
Fig. 2. Applying BFAct within an energy-based OOD Detection framework. The Butterworth filter is applied on the penultimate layer of the CNN model to rectify abnormal activations; this correction of the activations helps the energy function detect OOD samples.
The effectiveness of OOD Detection mainly depends on the design of the decision function g(x; f). A well-founded decision function should be able to separate the distributions of ID and OOD data; an inefficient one, on the contrary, would make predictions essentially at random. The key idea in this paper is to apply an appropriate low-pass filter as a modification to the unit activations. Amongst the available options, the Butterworth filter is chosen for its flat frequency response in the passband, an optimal characteristic for handling activations. BFAct takes the form:

BFAct(x) = 1 / (1 + (x / threshold)^{2N})   (2)

Let the activations of the penultimate layer be defined as z := (z1, z2, ..., zn). BFAct proceeds as:

ẑ = z · BFAct(z)   (3)

and the output of the CNN model after BFAct is given as:

f^{BFAct}(ẑ; θ) = Wẑ + b   (4)

where W is the parameter matrix and b is the bias vector.
N and threshold in Eq. 2 are two adjustable parameters. N can be tuned for better OOD performance and model stability, and the rectification threshold can be selected per model so as to preserve the activations of ID data while rectifying those of OOD data. Further details can be found in Sect. 3.
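For concreteness, a minimal PyTorch sketch of Eqs. 2-4 and the energy score they feed into is given below. The function names and the default N = 2 are ours, the threshold is assumed to be precomputed (e.g., the 95th percentile of ID activations, see Sect. 3), and this is a sketch rather than the released implementation.

```python
import torch

def bfact_weight(z: torch.Tensor, threshold: float, n: int = 2) -> torch.Tensor:
    # Eq. 2: Butterworth low-pass weight, applied elementwise to activations.
    return 1.0 / (1.0 + (z / threshold) ** (2 * n))

def bfact_logits(z: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor,
                 threshold: float, n: int = 2) -> torch.Tensor:
    z_hat = z * bfact_weight(z, threshold, n)  # Eq. 3: rectified activations
    return z_hat @ weight.T + bias             # Eq. 4: linear head on z_hat

def energy(logits: torch.Tensor) -> torch.Tensor:
    # Energy score of Liu et al. [20]: E(x) = -logsumexp(f(x));
    # OOD inputs tend to receive higher energy.
    return -torch.logsumexp(logits, dim=-1)
```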
3 Experiments and Results
Two types of experiments with BFAct are conducted, presented in Sects. 3.2 and 3.3. In Sect. 3.2, the large-scale dataset ImageNet-1k [7] is evaluated to assess the performance of the method in practical applications. Furthermore, the classical scaled-down image datasets CIFAR-10 and CIFAR-100 are included in Sect. 3.3 to compare the performance of BFAct with other broadly used models. The code is publicly available at https://github.com/Lorilandly/BFAct.

3.1 Evaluation Metrics
The following metrics are measured:
– The False Positive Rate at 95% true positive rate (FPR95). FPR = FP / (FP + TN), where FP and TN denote false positives and true negatives, respectively. In the OOD Detection task, FPR95 can be interpreted as the probability that an OOD example is misclassified as ID data when the true positive rate on ID data is 95%.
– The Area Under the Receiver Operating Characteristic curve (AUROC) [6]. The ROC curve summarizes the performance of an OOD detection method.
– The Area Under the Precision-Recall curve (AUPR) [21]. An essential measure when there is a class imbalance between OOD and ID data.
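The sketch below shows one way these metrics could be computed from per-sample OOD scores with scikit-learn. The convention that higher scores mean "more ID-like", and all function and variable names, are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def fpr_at_95_tpr(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    # Keep the 95% highest-scoring ID samples; count OOD samples that slip in.
    thresh = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= thresh))

def ood_metrics(id_scores: np.ndarray, ood_scores: np.ndarray) -> dict:
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return {"FPR95": fpr_at_95_tpr(id_scores, ood_scores),
            "AUROC": roc_auc_score(labels, scores),
            "AUPR": average_precision_score(labels, scores)}
```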
3.2 ImageNet-1k Classification Experiments
In-distribution Datasets. The ImageNet-1k dataset [7] is used as the ID data, following the large-scale OOD evaluation setup of Huang and Li. This dataset is realistic (with higher-resolution images) and has been recognized as challenging for current OOD detection models due to its large scale (1,000 categories).
Out-of-Distribution Datasets. Our OOD test datasets are subsets of iNaturalist [32], SUN, Places 365 [38], and Textures [5], selected so that no categories overlap with ImageNet-1k. All OOD data are preprocessed and resized to 224 × 224.
Training Details. Experiments use two kinds of CNN models as backbone architectures: the main model, a pre-trained ResNet-50 [9], and MobileNet-v2 [28], a lightweight model suitable for settings where computing power is limited, e.g., mobile and IoT applications. OOD Detection then proceeds in two stages. First, the parameters of the energy function and the threshold are computed on the ID dataset; in practice, the threshold is fixed to the 95th percentile of activations, estimated entirely on the ID data. Second, the model is tested on the OOD datasets and the metric values are reported. The main results are demonstrated in Table 1.

Table 1. Main results (each cell: FPR95↓ / AUROC↑).

Model      OOD dataset   Mahalanobis    ODIN           ReAct          MSP            Energy         Ours
ResNet50   iNaturalist   97.00 / 52.65  47.66 / 89.66  20.38 / 96.22  54.99 / 87.74  55.72 / 89.95  16.31 / 96.93
           SUN           98.50 / 42.41  60.15 / 84.59  24.20 / 94.20  70.83 / 80.86  59.26 / 85.89  21.28 / 94.99
           Places        98.40 / 41.79  67.89 / 81.78  33.85 / 91.58  73.99 / 79.76  64.92 / 82.86  30.24 / 92.51
           Textures      55.80 / 85.01  50.23 / 85.62  47.30 / 89.80  68.00 / 79.61  53.72 / 85.99  44.93 / 91.25
           Average       87.43 / 55.47  56.48 / 85.41  31.43 / 92.95  66.95 / 81.99  58.41 / 86.17  28.19 / 93.92
MobileNet  iNaturalist   62.11 / 81.00  55.39 / 87.62  42.93 / 92.95  64.29 / 85.32  59.50 / 88.91  40.08 / 93.08
           SUN           47.82 / 86.33  54.07 / 85.88  52.68 / 87.21  77.02 / 77.10  62.65 / 84.50  51.55 / 87.47
           Places        52.09 / 83.63  57.36 / 84.71  59.81 / 84.04  79.23 / 76.27  69.37 / 81.19  33.42 / 83.99
           Textures      92.38 / 33.06  49.96 / 85.03  40.30 / 90.96  73.51 / 77.30  58.05 / 85.03  33.42 / 92.89
           Average       63.60 / 71.01  54.20 / 85.81  48.93 / 88.74  73.51 / 79.00  62.39 / 84.91  46.09 / 89.36
Table 1 demonstrates the outcomes of BFAct combined with ResNet50 [9] and MobileNet [28]. Noticeably, BFAct performs better on almost all of the metrics. On the ImageNet benchmark, BFAct beats some of the best-established methods: it reduces FPR by 28.29% compared to ODIN and raises the AUROC by 7.75% compared to Energy. ReAct has a solid performance, yet it still falls short of BFAct by a small margin due to the difference in the choice of rectification function. ReAct uses the minimum function

ẑ = min(z, threshold)   (5)
whereas BFAct with the functional form of Eq. 2 is more suitable, as the weights are adjusted automatically based on the activations and its continuous nature introduces fewer side effects.
Results on Varying Filter Parameters. In the BFAct function, N and threshold are adjustable parameters that can significantly affect the accuracy of the final results. The issue these parameters address is how abnormal activations should be eliminated without mangling the normal activations. threshold determines the value above which activations are attenuated; it is set to the p-th percentile of activations estimated on the ID data. Accounting for the characteristics of the Butterworth filter, the 95th percentile is a reasonable value that fits most use cases: set at an appropriate level, the threshold greatly reduces the chance of distorting the activations of ID samples while leaving reasonable headroom for OOD detection to take effect. To further fine-tune accuracy, N controls the sharpness of the transition at the threshold. Activations far above the threshold are most likely abnormal, but those only slightly above might still be normal; a smaller N therefore punishes values slightly above the threshold only slightly, while a larger N brings a much sharper transition. One drawback is that N needs to be tuned manually per dataset, since different datasets can have slightly different preferences for this parameter.

Table 2. Effect of Butterworth filter parameter N.

         N = 1   N = 2   N = 3   N = 5   N = 10
FPR95    37.52   28.19   29.49   38.43   49.40
AUROC    91.86   93.94   93.67   91.37   87.45
AUPR     98.28   98.76   98.73   98.29   97.46
Acc      75.74   74.55   73.17   70.15   65.48
The effect of N in the Butterworth filter is characterized on ImageNet-1k, recording OOD Detection performance with N = 1, 2, 3, 5, 10. From the results displayed in Table 2, the optimal value for N can be found: as N grows beyond 3, the performance of BFAct decreases progressively, and at N = 2 the model achieves the best overall result. Hence, N = 2 is used uniformly for all ImageNet-1k experiments.
Results on Applying BFAct on Different Layers. We report the performance of BFAct applied on different layers. A pre-trained ResNet-50 combined with the Energy method is used on ImageNet-1k for OOD Detection. ResNet-50 contains four residual blocks, and Layer 1 through Layer 4 denote the output layers of these four blocks, respectively. Table 3 provides the results as the position at which BFAct is added varies from Layer 1 to Layer 4.

Table 3. Ablation experiment: applying BFAct on different layers.

         Layer1   Layer2   Layer3   Layer4   Without
FPR95    67.25    61.94    72.26    28.19    57.74
AUROC    81.31    85.83    74.31    93.92    86.97
AUPR     96.57    96.90    94.67    98.75    97.13
Acc      68.75    73.25    71.72    74.55    74.51
It is conspicuous that administering BFAct on the penultimate layer yields the maximum effect. In the shallower layers, the difference between OOD and ID activations is subtle; conversely, deep layers are proven to capture semantic-level features that widen the difference between ID and OOD data, thereby bringing out the effectiveness of BFAct.

3.3 CIFAR-10 and CIFAR-100 Classification Experiments
Datasets. As for the in-distribution datasets, CIFAR-10 and CIFAR-100 [14] are used. As for the out-of-distribution datasets, two regular benchmark datasets, Textures [5] and SVHN [23], are evaluated. OOD images are cropped to 32 × 32 at test time.
Training Details and Ablation Experiment. A standard ResNet-18 [8] model is trained for 100 epochs on the ID data of CIFAR-10 and CIFAR-100. The predefined learning-rate schedule starts at 0.1 and decays by a factor of 10 at epochs 50, 75, and 90. BFAct is compatible with several OOD scoring functions; we aggregate the results of BFAct applied on the three ID datasets combined with Energy [20], Softmax (MSP) [10], and ODIN [18] separately. Table 4 shows that BFAct improves OOD Detection on each ID dataset.
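As an illustration, the training schedule just described might look as follows in PyTorch. The optimizer hyperparameters other than the learning-rate schedule (momentum, weight decay) are assumptions, not taken from the paper.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # num_classes=100 for CIFAR-100
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # assumed values
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 75, 90], gamma=0.1)

for epoch in range(100):
    # ... one pass over the CIFAR training set goes here ...
    scheduler.step()
```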
4 Theoretical Analysis
This analysis is based on the ImageNet-1k experiment of Sect. 3.2. The goal of this section is to provide rigorous support for the claim that BFAct improves the performance of the OOD Detection estimator by increasing the separation between the ID and OOD activation distributions.
Table 4. Ablation experiments with multiple scoring functions (each cell: FPR95↓ / AUROC↑ / AUPR↑ / Acc↑).

Scoring   Setting     ImageNet-1k                    CIFAR-100                      CIFAR-10
Energy    W/o BFAct   57.74 / 86.97 / 97.13 / 76.15  80.45 / 78.87 / 83.67 / 74.62  56.05 / 87.27 / 87.66 / 94.54
          W BFAct     28.23 / 93.93 / 98.75 / 74.60  68.83 / 84.26 / 86.35 / 73.06  55.93 / 87.31 / 87.77 / 94.33
Softmax   W/o BFAct   72.01 / 81.50 / 96.05 / 67.49  80.18 / 77.24 / 91.91 / 75.12  85.44 / 88.46 / 90.65 / 94.35
          W BFAct     52.52 / 88.63 / 97.59 / 74.56  75.05 / 80.88 / 84.18 / 73.06  64.92 / 87.90 / 89.48 / 94.34
ODIN      W/o BFAct   50.80 / 87.57 / 97.19 / 76.13  72.78 / 77.07 / 80.49 / 74.62  66.34 / 73.64 / 71.49 / 94.34
          W BFAct     40.91 / 90.69 / 97.99 / 74.56  65.44 / 77.06 / 77.01 / 73.06  64.88 / 77.14 / 75.83 / 94.33
In Sect. 2.1, z is defined as the activations on the penultimate layer. After applying ReLU, the activations have no negative components. Figure 1(c) illustrates that the distributions of ID and OOD data are positively skewed, i.e., they have a long tail extending to the right. Thus, the distributions of zin and zout are modeled with the epsilon-skew-normal (ESN) distribution:

$$z \sim ESN(\mu, \sigma, \varepsilon), \quad \text{with} \quad q(z) = \begin{cases} \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{z-\mu}{\sigma(1+\varepsilon)}\right), & \text{if } z < \mu \\[6pt] \dfrac{1}{\sigma}\,\phi\!\left(\dfrac{z-\mu}{\sigma(1-\varepsilon)}\right), & \text{if } z \ge \mu \end{cases} \qquad (6)$$

Assume μin < μout, σin > σout, and εin < εout. With these three conditions, Ein[zi − ẑi] and Eout[zi − ẑi] can be located, shown as the blue zone and red zone respectively in Fig. 3. It is clear that the value of Eout[zi − ẑi] is much higher than that of Ein[zi − ẑi]; in other words, D(ẑi) is larger than D(zi):

$$D(\hat{z}_i) - D(z_i) = E_{out}[z_i - \hat{z}_i] - E_{in}[z_i - \hat{z}_i] > 0 \qquad (9)$$

Therefore, BFAct helps the OOD estimator perform better by increasing the distance between the activation estimates of OOD and ID data.
5 Discussion
In Table 1, the performance of BFAct is compared with alternative OOD Detection methods on a variety of metrics. To assess the effectiveness of the algorithms, the chosen OOD Detection methods are post hoc methods applied to pre-trained networks, namely Mahalanobis distance [17], ODIN [18], ReAct [30], Maximum Softmax Probability [10], and the energy score [20]. Other than against ReAct, which uses a similar underlying approach, BFAct achieved a >30% decrease
in FPR95 and a >5% increase in AUROC. The improvement is more significant considering that BFAct adds little overhead to the original network. In contrast, methods such as Mahalanobis require training a separate binary classifier, creating a significant bottleneck in running speed as the sample space scales and making them a less viable option for huge datasets. ReAct [30] is a similar approach that aims to improve the accuracy of energy-based OOD Detection by processing the activations at the penultimate layer. However, unlike the low-pass filter used in this research, ReAct uses a cap to limit the abnormal activations. Compared to ReAct, BFAct achieves an average increase in accuracy of 3.14% on ResNet50, with an increase of as much as 2.84% on MobileNet, and there is further room for improvement since the parameters can be tuned per dataset. The question remains where this difference originates. The exact reason for the improvement in performance is hard to pin down, as it would require an in-depth examination of how the network is trained and evaluated; however, a conceivable explanation follows from the data collected across multiple test suites. From the activation data, it is apparent that some indices are more likely to trigger abnormal activations than others, and that abnormal activations occur only at specific positions. The implication is that the indices at which abnormal activations take place should be disregarded by the classifier to minimize noise. ReAct limits the maximum value that activations can take, effectively controlling the undesirable impact of abnormal activations on the result; however, it still allows the abnormal activations to carry part of their weight to the classifier, leaving a portion of accuracy on the table. Our method aims not merely to reduce the influence of abnormal activations, but to eliminate that influence altogether. This is a promising approach because the abnormal activations are artifacts of the network processing unaccounted-for data; they carry no useful information about the input and should therefore be discarded to maximize accuracy. Upon encountering activation values noticeably higher than the threshold, BFAct drops them to near zero, so they carry no weight in the energy classifier. As a result, the method shows a convincing improvement in accuracy.
6 Related Work

6.1 Out-of-Distribution Uncertainty Estimation
Detecting out-of-distribution samples has a long history, and a survey by Yang et al. [36] provides comprehensive and detailed coverage of the topic. Here we highlight several representative works. At the outset, Nguyen et al. [24] revealed that neural networks produce overconfident predictions on out-of-distribution samples. Research on OOD Detection subsequently flourished: works have attempted to improve OOD uncertainty estimation by proposing the Mahalanobis distance-based confidence score [17], the ODIN score [12,18], and the gradient-based GradNorm score [13]. The energy score method was proposed by Liu et al. [20], which was shown to overcome the disadvantages of the softmax
confidence score both empirically and theoretically. Sun et al. [30] observed that OOD data can trigger distinctively high unit activations, which leads to the overconfidence problem. Inspired by this idea, we propose a post hoc measure applied to the penultimate layer of the CNN model, which rectifies the abnormal activations and improves the effectiveness of OOD Detection.

6.2 Energy-Based Learning
Energy-based machine learning models date back to Boltzmann machines [1,27], and energy-based learning [16,25,26] provides a unified framework for both probabilistic and non-probabilistic approaches to learning. Furthermore, it was demonstrated that energy values can be used to differentiate between real and generated images [37]. Xie et al. [33–35] first showed that a generative random field model can be derived from discriminative neural networks from an energy-based perspective. Energy-based methods are also used in structured prediction [2,31]. Liu et al. [20] was the first work to propose using the energy score for OOD uncertainty estimation; their method outperforms on common benchmarks and is widely used by others. In this paper, we combine our post hoc BFAct method with the energy score function and achieve state-of-the-art results on both a large-scale dataset and common benchmarks.
7 Conclusion
OOD Detection is essential for safely deploying machine learning in a variety of business settings. Post hoc OOD Detection is a branch of the field that can be easily integrated into existing neural network models; however, one of the difficulties it faces is the irregular activations caused by OOD samples. To address this issue, this paper presents BFAct, an activation rectification method that provides a concrete performance advantage among post hoc OOD methods. By applying the Butterworth filter at the penultimate layer of the neural network, our method truncates the irregular activation patterns caused by OOD test data, significantly improving the performance of the energy-based OOD classifier. Empirically, in our ImageNet benchmark test, applying BFAct reduces FPR by a notable 29.51%, and the other metrics improve to various degrees. Compared with ReAct, one of the best-performing OOD methods, our technique offers a 4.07% advantage in FPR and around a 3-5% improvement across the board. The study of OOD Detection will remain an important topic in the near future as AI technologies become ever more ubiquitous, and there is an urgent need to explore new possibilities and patch the shortcomings of the technology. Though the field has yet to see a revolutionary change in how OOD Detection is performed, our study is a firm and steady evolution on the path forward. Looking ahead, we would like to inspire more work and attract researchers to explore the internal mechanisms of OOD Detection.
A Derivations for Theoretical Analysis
We use a piecewise linear function to approximate BFAct(x):

$$h(x) = \begin{cases} 1, & \text{if } 0 < x \le \mathrm{threshold} \\[4pt] \dfrac{1}{\sqrt{2}}\left(1 - \dfrac{N(x - \mathrm{threshold})}{2}\right), & \text{if } \mathrm{threshold} < x \le \mathrm{threshold} + \dfrac{2}{N} \\[4pt] 0, & \text{if } x > \mathrm{threshold} + \dfrac{2}{N} \end{cases} \qquad (10)$$

where $\frac{1}{\sqrt{2}}\left(1 - \frac{N(x-\mathrm{threshold})}{2}\right)$ is the Taylor polynomial of degree n = 1 of BFAct(x) around x = threshold. Then

$$\begin{aligned} E[z - \hat{z}] &= \int_0^{+\infty} z\, p(z)\, dz - \int_0^{+\infty} z \cdot \mathrm{BFAct}(z)\, p(z)\, dz \\ &\approx \int_{\mathrm{threshold}}^{\mathrm{threshold}+\frac{2}{N}} \left[1 - \frac{1}{\sqrt{2}}\left(1 - \frac{N(x - \mathrm{threshold})}{2}\right)\right] z\, p(z)\, dz + \int_{\mathrm{threshold}+\frac{2}{N}}^{+\infty} z\, p(z)\, dz \\ &= -\sigma(1-\varepsilon)^2\, \Phi\!\left(\frac{\mathrm{threshold} + \frac{2}{N} - \mu}{\sigma(1-\varepsilon)}\right) - \sigma(1-\varepsilon)^2\, \Phi\!\left(\frac{\mathrm{threshold} - \mu}{\sigma(1-\varepsilon)}\right) + \sigma(1-\varepsilon)^2\, \phi\!\left(\frac{\mathrm{threshold} + \frac{2}{N} - \mu}{\sigma(1-\varepsilon)}\right) \end{aligned} \qquad (11)$$
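As a quick numerical aside (not part of the paper's derivation), the piecewise approximation h(x) of Eq. 10 can be checked against the exact BFAct(x) on a grid; the threshold and N values below are arbitrary illustrative choices.

```python
import numpy as np

def bfact(x, threshold, n):
    # Exact filter of Eq. 2.
    return 1.0 / (1.0 + (x / threshold) ** (2 * n))

def h(x, threshold, n):
    # Piecewise approximation of Eq. 10.
    linear = (1 / np.sqrt(2)) * (1 - n * (x - threshold) / 2)
    return np.where(x <= threshold, 1.0,
                    np.where(x <= threshold + 2 / n, linear, 0.0))

x = np.linspace(0.01, 3.0, 1000)
print("max |BFAct - h| =", np.abs(bfact(x, 1.0, 2) - h(x, 1.0, 2)).max())
```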
References
1. Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: A learning algorithm for Boltzmann machines. Cogn. Sci. 9(1), 147–169 (1985)
2. Belanger, D., McCallum, A.: Structured prediction energy networks. In: International Conference on Machine Learning, pp. 983–992. PMLR (2016)
3. Bevandić, P., Krešo, I., Oršić, M., Šegvić, S.: Discriminative out-of-distribution detection for semantic segmentation. arXiv preprint arXiv:1808.07703 (2018)
4. Chen, J., Li, Y., Wu, X., Liang, Y., Jha, S.: Informative outlier matters: robustifying out-of-distribution detection using outlier mining (2020)
5. Chow, C.: On optimum recognition error and reject tradeoff. IEEE Trans. Inf. Theory 16(1), 41–46 (1970)
6. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240 (2006)
7. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
9. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38
10. Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)
11. Hendrycks, D., Mazeika, M., Dietterich, T.: Deep anomaly detection with outlier exposure. arXiv preprint arXiv:1812.04606 (2018)
12. Hsu, Y.C., Shen, Y., Jin, H., Kira, Z.: Generalized ODIN: detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10951–10960 (2020)
13. Huang, R., Geng, A., Li, Y.: On the importance of gradients for detecting distributional shifts in the wild. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
14. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
15. Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
16. LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.: A tutorial on energy-based learning. In: Predicting Structured Data, vol. 1 (2006)
17. Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
18. Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690 (2017)
19. Lin, Z., Roy, S.D., Li, Y.: MOOD: multi-level out-of-distribution detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15313–15323 (2021)
20. Liu, W., Wang, X., Owens, J., Li, Y.: Energy-based out-of-distribution detection. Adv. Neural. Inf. Process. Syst. 33, 21464–21475 (2020)
21. Manning, C., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
22. Mohseni, S., Pitale, M., Yadawa, J., Wang, Z.: Self-supervised learning for generalizable out-of-distribution detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 5216–5223 (2020)
23. Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning (2011)
24. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 427–436 (2015)
25. Ranzato, M., Poultney, C., Chopra, S., Cun, Y.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, vol. 19 (2006)
26. Ranzato, M., Boureau, Y.L., Chopra, S., LeCun, Y.: A unified energy-based framework for unsupervised learning. In: Artificial Intelligence and Statistics, pp. 371–379. PMLR (2007)
27. Salakhutdinov, R., Larochelle, H.: Efficient learning of deep Boltzmann machines. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 693–700. JMLR Workshop and Conference Proceedings (2010)
28. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
29. Selesnick, I.W., Burrus, C.S.: Generalized digital Butterworth filter design. IEEE Trans. Signal Process. 46(6), 1688–1694 (1998)
30. Sun, Y., Guo, C., Li, Y.: ReAct: out-of-distribution detection with rectified activations. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
31. Tu, L., Gimpel, K.: Learning approximate inference networks for structured prediction. arXiv preprint arXiv:1803.03376 (2018)
32. Van Horn, G., et al.: The iNaturalist species classification and detection dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8769–8778 (2018)
33. Xie, J., Lu, Y., Gao, R., Zhu, S.C., Wu, Y.N.: Cooperative training of descriptor and generator networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(1), 27–45 (2018)
34. Xie, J., Zheng, Z., Gao, R., Wang, W., Zhu, S.C., Wu, Y.N.: Learning descriptor networks for 3D shape synthesis and analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8629–8638 (2018)
35. Xie, J., Zhu, S.C., Wu, Y.N.: Learning energy-based spatial-temporal generative convnets for dynamic patterns. IEEE Trans. Pattern Anal. Mach. Intell. 43(2), 516–531 (2019)
36. Yang, J., Zhou, K., Li, Y., Liu, Z.: Generalized out-of-distribution detection: a survey. arXiv preprint arXiv:2110.11334 (2021)
37. Zhao, J., Mathieu, M., LeCun, Y.: Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016)
38. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2017)
NKB-S: Network Intrusion Detection Based on SMOTE Sample Generation

Yuhan Suo, Rui Wang, Senchun Chai, Runqi Chai, and Mengwei Su
School of Automation, Beijing Institute of Technology, Beijing 100081, People's Republic of China
{yuhan.suo,chaisc97,r.chai,3220180645}@bit.edu.cn
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
[email protected]

Abstract. This paper studies the problem of sample generation for imbalanced intrusion datasets. The NKB-SMOTE algorithm is proposed on the basis of the SMOTE algorithm, combining the K-means algorithm with a mixture of oversampling and undersampling methods. Synthetic Minority Oversampling Technique (SMOTE) sample generation is performed on the minority class samples in each boundary cluster, the Tomek links method is used to undersample the majority class samples in each boundary cluster, and the NearMiss-2 method is used to undersample the overall data. Multi-classification experiments on the UNSW-NB15 dataset show that, compared with the traditional SMOTE algorithm, the proposed NKB-SMOTE algorithm improves the quality of generated samples and alleviates the fuzzy class boundary problem. Finally, a real-world experiment also verifies the effectiveness of the NKB-SMOTE-based intrusion detection model in practical scenarios.

Keywords: Network security · Intrusion detection · Machine learning · Sample generation · NKB-SMOTE

1 Introduction
With the development of science and technology, the Internet is constantly integrating with our work, study, and life, and has become an indispensable part of human society [1]. However, new cybersecurity threats keep emerging, such as the 2016 cyberattack on an energy company in Ukraine that caused widespread power outages in Kyiv, and the ransomware that spread rapidly around the world in 2017, causing huge losses to people's information security and property security [2,3]. Therefore, ensuring network security has become a necessity, and industrial system intrusion detection is an important part of network security protection. (This work was supported by the Basic Research Program under Grant JCKY******* B029.)
At present, research on intrusion detection mainly covers four aspects: statistical analysis, data mining, feature engineering, and machine learning. Intrusion detection based on statistical analysis determines whether the system is under attack by analyzing network behavior with statistical means, including regularized entropy based on mixed standard deviation [4], multivariate models based on attribute-feature correlation [5], and Markov-process-based methods [6]. Intrusion detection based on data mining looks for regularities in large volumes of correlated data [7]; the most common approach is the intrusion detection model based on the K-means clustering algorithm [8], and classification analyses such as mapping data feature attributes into different types have been shown to improve detection. Intrusion detection based on feature engineering mainly includes feature dimensionality reduction, which extracts shared feature structure to reduce the data dimension [9], and feature selection, which picks the most effective features from a larger set [10]; both can improve detection efficiency to a certain extent. Intrusion detection based on machine learning has emerged in recent years [11,12] and achieves higher detection accuracy than traditional methods. Most intrusion detection algorithms are thus data-driven. However, because industrial intrusion datasets are usually imbalanced, the classification of minority class samples is often poor; in real systems the harm caused by minority class attacks is often greater, so improving their detection remains an open problem.
In order to improve the classification of minority data in imbalanced datasets, researchers have carried out extensive work, mainly by balancing the dataset through resampling before training the model. Resampling methods include undersampling, oversampling, and mixed sampling. Undersampling balances the dataset by reducing the majority class samples, oversampling balances it by increasing the minority class samples, and mixed sampling combines the characteristics of both. The simplest undersampling algorithm is simple random undersampling, which randomly deletes majority class samples but may lose important information. Bo et al. proposed an undersampling algorithm based on majority class classification, which eliminates the randomness of sample deletion [13]; Lin et al. cluster the majority class samples before deleting samples [14]; and Padmaja et al. perform undersampling based on a filtering method, improving the adaptivity of sampling [15]. Oversampling may lead to overfitting through uninformative replication of minority class samples. To avoid overfitting, Kang et al. [16] proposed an oversampling method based on K-nearest neighbors, and Chawla et al. [17] proposed the random-oversampling-based SMOTE algorithm, one of the classic oversampling algorithms. Several scholars have since proposed improvements to SMOTE, greatly enhancing the effect of the original algorithm [18–21].
Mixed sampling is mainly a combination
of the above two sampling methods. Seiffert et al. proposed a mixed sampling method combining random undersampling and random oversampling [22], and showed experimentally that mixed sampling classifies better than single-method sampling. Li et al. proposed a hybrid method combining distance-based undersampling with an improved SMOTE algorithm to improve the accuracy of data classification [23]. Research on processing imbalanced intrusion datasets helps improve intrusion detection, especially for minority class samples, and is of great practical importance for industrial system security. Therefore, this paper investigates the sample generation problem for imbalanced intrusion datasets; the innovations are as follows.
– This paper improves the SMOTE algorithm to address the fuzzy class boundary and uneven sample distribution problems of the original algorithm.
– The proposed NKB-SMOTE algorithm not only improves the quality of the newly generated samples but also balances the data faster.
– Classification experiments on an existing dataset and on a real industrial control system dataset illustrate the effectiveness of the proposed algorithm relative to the traditional algorithm.
The second section introduces the relevant background. The third section proposes the NKB-SMOTE sample generation algorithm based on SMOTE technology. Finally, machine learning multi-classification experiments on the UNSW-NB15 dataset and on data obtained from real scenarios further verify the effectiveness of the algorithm proposed in this paper.
2 Related Knowledge

2.1 Intrusion Detection System
Intrusion Detection (ID) [24]: by monitoring the working network system, intrusion attempts can be detected, intrusion behaviors stopped, and intrusion results discovered, ensuring network system security. By recording network flow information and analyzing its characteristics, attack behavior and abnormal operation can be identified. Essentially, ID is a classification operation on feature data such as network flows and system operation logs.
Intrusion Detection System (IDS) [25]: a defense system that protects network data by classifying and identifying network flow data in real time and proactively issuing alarms and taking defensive measures. The basic structure of an IDS is shown in Fig. 1; its main components are as follows.
– Event generator: obtains the network data stream and provides current data stream events to the other parts of the IDS;
– Event analyzer: analyzes the network data flow and passes the analysis results to the responder;
Fig. 1. The schematic diagram of the basic structure of the intrusion detection system.
– Responder: reacts to the event analysis result;
– Event database: stores intermediate and final data.
Intrusion detection systems can be divided into three categories according to the data source:
– Host-based IDS: its data sources are mainly log files (application logs, router logs, system logs, etc.), port usage records, and host audit information. Detection relies on user usage records but is often delayed: the intrusion is detected and an alarm given only after the host has been attacked.
– Network-based IDS: its data source is mainly the network data packets in the monitored network. The IDS obtains packets from the network card (wired or wireless) connected to the network device and detects attack behavior in the network through feature-matching analysis.
– IDS based on mixed data sources: its data sources include those of the first two types, i.e., audit records from hosts and data packets in the monitored network. Such an IDS is usually distributed and can simultaneously discover attacks inside the host and abnormal behavior in the monitored network.

2.2 SMOTE Technology
SMOTE stands for Synthetic Minority Oversampling Technique. Unlike the random oversampling algorithm, SMOTE is not limited to simply copying minority class samples: it inserts new samples between minority class sample points and their adjacent minority class sample
points, which avoids the occurrence of model overfitting problems that may be caused by random oversampling techniques, as shown in Fig. 2.
Fig. 2. The schematic diagram of new sample points generated by SMOTE technology.
Assume that the number of minority class samples in an imbalanced dataset is T and that the desired number of additional minority class samples is NT, where N is the oversampling multiple (usually an integer). For each minority class sample point xi, i ∈ {1, 2, ..., T}, the k nearest minority class sample points (k ≥ N) are found by Euclidean distance. Then, N sample points (xnear1, xnear2, ..., xnearN) are randomly selected from them, and the minority class sample point xi together with each xnear is substituted into the interpolation formula (1). In this way, the interpolated sample point xnew is obtained, a newly generated minority class sample point. Performing the above operation on all T minority class sample points yields NT new minority class sample points, achieving the purpose of balancing the dataset.

xnew = xi + RAND(0, 1) × (xnear − xi),   (1)

where RAND(0, 1) is a number randomly generated in the interval [0, 1].
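A minimal sketch of this interpolation procedure is given below; the helper name, the use of scikit-learn's NearestNeighbors, and the default k = 5 are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min: np.ndarray, n_new_per_point: int, k: int = 5,
          seed: int = 0) -> np.ndarray:
    """Generate n_new_per_point synthetic samples per minority point (Eq. 1).

    Requires n_new_per_point <= k; X_min is a (T, d) minority-class array."""
    rng = np.random.default_rng(seed)
    # k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    new_points = []
    for i, x_i in enumerate(X_min):
        # Randomly pick N of the k nearest minority neighbours.
        for j in rng.choice(idx[i][1:], size=n_new_per_point, replace=False):
            new_points.append(x_i + rng.random() * (X_min[j] - x_i))
    return np.asarray(new_points)
```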
2.3 Evaluation Metrics
It is well known that one of the important metrics of intrusion detection performance is accuracy, the ratio of correctly classified samples to the total number of samples; its drawback is that it does not fully reflect model quality on imbalanced data. The precision metric was therefore proposed: precision is the proportion of correctly classified positive samples among all samples predicted positive, and it is inversely related to the false positive rate. This paper therefore adopts precision as one of the evaluation metrics.
The AUC value is usually used to evaluate a binary classifier; geometrically, it is the area under the ROC curve. Figure 3 shows the ROC curve: its vertical axis is the true positive rate, i.e., the ratio of correctly predicted positive samples to the actual number of positive samples, and its horizontal axis is the false positive rate, i.e., the ratio of negative samples predicted positive to the actual number of negative samples. Clearly, the closer the AUC is to 1 and the closer the ROC curve is to the upper-left corner, the better the classifier; this makes AUC an intuitive evaluation indicator.
Fig. 3. The ROC curve diagram.
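As a small worked example, the ROC curve and AUC described above can be computed with scikit-learn; the labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])            # 1 = positive class
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])
fpr, tpr, _ = roc_curve(y_true, y_score)                # points of the ROC curve
print("AUC =", roc_auc_score(y_true, y_score))
```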
2.4 Statement of the Problem
SMOTE analyzes the minority class samples and artificially synthesizes new samples to add to the dataset, alleviating the overfitting caused by simply copying sample points, but it also has some defects:
– Quality of synthetic samples: if either the root sample or the auxiliary sample is a noise sample, the new sample will most likely fall in the majority class area;
– Fuzzy class boundary problem: synthesizing new samples from the minority class samples at the class boundary places the interpolated samples in the overlap region of the two classes, further blurring the boundary between them;
– Minority class distribution problem: regions of the minority class that were dense remain relatively dense after SMOTE, while sparsely distributed regions remain relatively sparse.
To avoid the above-mentioned problems of SMOTE, this paper studies a new SMOTE-based sample generation algorithm that improves the quality of sample generation, alleviates the fuzzy class boundary problem, and balances the data faster.
3 Intrusion Detection Model Based on SMOTE Sample Generation
To avoid the above-mentioned problems of SMOTE technology, this section proposes the NKB-SMOTE algorithm, which not only combines the K-means algorithm with the SMOTE algorithm but also uses a mixture of oversampling and undersampling.

3.1 The Theoretical Basis of the NKB-SMOTE Algorithm
In the original SMOTE technique, sample generation by interpolation requires setting the number k of adjacent minority class sample points, a somewhat blind choice whose optimal value can only be found through repeated testing; such tuning is time-consuming and still cannot guarantee optimality. Instead, after K-means clustering of the minority class samples, each cluster center is used as the core point of the interpolation formula, and SMOTE samples are generated in combination with the other minority class sample points in the cluster; at the same time, the majority class samples within the cluster are undersampled via Tomek links. This not only removes the blindness of choosing the number of neighbors k but also alleviates the fuzzy class boundary problem. In the NKB-SMOTE algorithm, when the K-means clustering algorithm is used to cluster the minority class samples of the dataset, the cluster centers must be restricted to minority class sample points. After clustering, k clusters are obtained, and each cluster is categorized following the SMOTE-style treatment of minority class samples. As shown in Fig. 4, according to the numbers of minority and majority class samples in a cluster, clusters are divided into safe clusters S, noise clusters N, and boundary clusters B = (B1, B2, ..., Bm), where m is the number of boundary clusters. For each boundary cluster, with the cluster center as the core point, new sample points are generated by interpolation formula (1) in combination with the other minority class sample points in the cluster, and the majority class samples in the cluster are then undersampled by the Tomek links algorithm. As shown in Fig. 5, the newly generated sample points always lie within the circle whose radius is the segment connecting the cluster center and each minority class sample point, that is, always inside the minority class region, which avoids the marginalization of newly generated sample points. In addition, the Tomek links algorithm is used to undersample the majority class sample points in the cluster, so that
Fig. 4. The schematic diagram of clustering with minority class samples as cluster centers.
the boundary between the majority class and the minority class becomes clearer, which helps improve classification accuracy. Finally, while minority class samples are generated within the boundary clusters and their majority class samples are undersampled by the Tomek links algorithm, the majority class samples of the entire dataset are undersampled by the NearMiss-2 algorithm. NearMiss-2 handles imbalanced classification better than Tomek links and can further speed up balancing the data.
Fig. 5. The schematic illustration of oversampling and undersampling within boundary clusters.
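The two undersampling building blocks named above are available in the imbalanced-learn library; the sketch below applies them globally to a synthetic dataset for brevity, whereas NKB-SMOTE applies Tomek links only inside the boundary clusters.

```python
from sklearn.datasets import make_classification
from imblearn.under_sampling import NearMiss, TomekLinks

# Synthetic imbalanced data standing in for an intrusion dataset.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Remove majority-class members of Tomek-link pairs to sharpen the boundary.
X_tl, y_tl = TomekLinks().fit_resample(X, y)

# NearMiss-2: keep the majority samples closest on average to the farthest
# minority samples.
X_bal, y_bal = NearMiss(version=2).fit_resample(X_tl, y_tl)
```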
It can be seen from the above description and schematic diagrams that the NKB-SMOTE algorithm avoids selecting the number of adjacent samples k through the K-means clustering method, pre-judges each cluster, and uses a mixture of oversampling and undersampling in the boundary clusters. The generated sample points all lie within their clusters, which effectively alleviates the fuzzy class boundary problem of the traditional SMOTE algorithm, and Tomek links undersampling makes the class boundaries clearer. Moreover, each minority class sample point in a boundary cluster is used only once, unlike the repeated use of minority
class samples in the traditional SMOTE algorithm, which effectively avoids generating duplicate or meaningless data. Finally, combining the global NearMiss-2 undersampling of the dataset's majority class samples lets the dataset reach balance faster.

3.2 The Research Content of the NKB-SMOTE Algorithm
The focus of the NKB-SMOTE algorithm is cluster classification and mixed sampling within the boundary clusters. Suppose the given dataset is X = {x1, x2, ..., xn}, where n is the number of samples. The set of minority class samples is Xmin = {x1, x2, ..., xp}, where p is the number of minority class samples. The set of majority class samples is Xmax = {xp+1, xp+2, ..., xn}, so n − p is the number of majority class samples. K-means clustering is performed on the dataset X with points from the minority class set Xmin as the cluster centers. After clustering, the set of clusters C = (C1, C2, ..., Ck) is obtained, where k is the number of clusters. Each cluster may contain several minority class samples and several majority class samples, and the type of each cluster is judged from the quantitative relationship between them.

1. Safe cluster judgment: After clustering, count the numbers of minority class and majority class samples in each cluster Ci, i = 1, 2, ..., k. As shown in formula (2), if the number of minority class samples is at least the number of majority class samples (including the case where the cluster contains only minority class samples), the cluster is judged to be a safe cluster:

|Ci ∩ Xmin| ≥ |Ci ∩ Xmax|.  (2)

2. Noise cluster judgment: As shown in formula (3), if the number of minority class samples in Ci is 1 (i.e., apart from the cluster center, all samples are majority class), the cluster center is a noise point and the cluster is judged to be a noise cluster:

|Ci ∩ Xmin| = 1.  (3)
3. Boundary cluster judgment: As shown in formula (4), if the number of minority class samples is greater than 1 and less than the number of majority class samples, the cluster center is a boundary point and the cluster is
judged to be a boundary cluster Bj ∈ (B1, B2, ..., Bm), 0 < j ≤ m, where m is the number of boundary clusters:

1 < |Ci ∩ Xmin| < |Ci ∩ Xmax|.  (4)
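The three judgments can be summarized in a few lines; the sketch below is illustrative (the function name and counting interface are assumptions, not from the paper):

```python
def classify_cluster(n_min, n_maj):
    """Classify a cluster by formulas (2)-(4), where n_min = |Ci ∩ Xmin|
    (including the cluster center) and n_maj = |Ci ∩ Xmax|."""
    if n_min >= n_maj:
        return "safe"        # formula (2)
    if n_min == 1:
        return "noise"       # formula (3)
    return "boundary"        # formula (4): 1 < n_min < n_maj
```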
After the cluster classification is completed, the minority class samples and majority class samples in the boundary clusters B = (B1, B2, ..., Bm) are oversampled and undersampled, respectively. In the original SMOTE algorithm, each minority class sample point serves as a core point and is combined with its k-nearest neighbor samples to generate new samples by the interpolation formula. In the NKB-SMOTE algorithm, once the boundary clusters are obtained, the cluster center of each boundary cluster is the core point, and samples are generated in combination with the other minority class sample points in the cluster. The interpolation formula of the NKB-SMOTE algorithm is shown in formula (5):

x_new = c_i + RAND(0, 1) ∗ (x_j − c_i).  (5)
where x_new is a newly generated sample, c_i (i = 1, 2, ..., m) is the cluster center of the i-th boundary cluster, RAND(0, 1) is a random number between 0 and 1, and x_j (j = 1, 2, ..., t) ranges over the original minority class sample points in the cluster other than the cluster center, t being their number. After the clusters are classified, the Tomek links algorithm is also required to undersample the majority class samples in the boundary clusters B = (B1, B2, ..., Bm):

1. Calculate the distance d(x_min, x_max) between each minority class sample in the boundary cluster and every majority class sample in the cluster.
2. For each pair of a minority class sample x_min and a majority class sample x_max, check whether there exists a sample y in the cluster (of either class) that satisfies either of the following formulas (6)-(7):

d(x_min, y) < d(x_min, x_max),  (6)
d(y, x_max) < d(x_min, x_max).  (7)
3. If no such sample y exists, the pair (x_min, x_max) is a Tomek links pair, and the majority class sample in the pair is deleted.
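A minimal NumPy sketch of the two boundary-cluster operations just described, interpolation around the cluster center (formula (5)) and Tomek links removal (formulas (6)-(7)). The function names and the brute-force distance search are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def oversample_boundary_cluster(center, minority, rng=None):
    """Formula (5): one synthetic sample per minority point in the cluster,
    interpolated between the cluster center c_i and that point x_j."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.random((len(minority), 1))          # RAND(0, 1) per point
    return center + lam * (minority - center)

def tomek_majority_indices(minority, majority):
    """Formulas (6)-(7): (x_min, x_max) is a Tomek links pair when no other
    sample y in the cluster is strictly closer to either of them than they
    are to each other; the majority member of each pair is marked."""
    cluster = np.vstack([minority, majority])
    n_min = len(minority)
    marked = set()
    for i, x_min in enumerate(minority):
        for j, x_max in enumerate(majority):
            d = np.linalg.norm(x_min - x_max)
            broken = any(
                k != i and k != n_min + j and
                (np.linalg.norm(x_min - y) < d or np.linalg.norm(y - x_max) < d)
                for k, y in enumerate(cluster)
            )
            if not broken:
                marked.add(j)
    return sorted(marked)
```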
3.3 Detailed Description of the NKB-SMOTE Algorithm
The specific operation steps of the NKB-SMOTE algorithm are shown as follows: Suppose the given dataset is X, where the minority class sample set is Xmin and the majority class sample set is Xmax .
1. Set the initial number of cluster centers k of the K-means algorithm according to the given dataset, randomly select k samples from the minority class samples as the initial cluster centers, and perform the K-means clustering operation on the dataset. Note that when updating a cluster center, the minority class sample point closest to the cluster mean is selected as the new center.
2. After clustering, count the number of minority class samples and majority class samples in each cluster, and determine the category of each cluster (safe cluster, noise cluster, or boundary cluster) according to formulas (2), (3) and (4).
3. Oversample the minority class samples of each boundary cluster: with the cluster center as the core, combine it with the other original minority class samples in the cluster and generate new sample points using interpolation formula (5).
4. Undersample the majority class samples of each boundary cluster: use the Tomek links algorithm to calculate the distance between each minority class sample and all majority class samples, select the Tomek links sample pairs according to formulas (6)-(7), and delete the majority class sample in each pair.
5. Undersample the majority class samples of the entire dataset: use the NearMiss-2 algorithm to select those majority class samples with the smallest average distance to their three farthest minority class samples (a sketch of this selection is given after this list).
6. Determine whether the dataset is close to balance; if not, iterate steps 3-5 until it is, and the algorithm ends.
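The NearMiss-2 selection in step 5 can be sketched as follows; `n_keep` is an illustrative parameter, and imbalanced-learn's `NearMiss(version=2)` offers a production implementation:

```python
import numpy as np

def nearmiss2_select(majority, minority, n_keep):
    """Keep the n_keep majority samples whose average distance to their
    three farthest minority samples is smallest (NearMiss-2)."""
    # pairwise Euclidean distances, one row per majority sample
    d = np.linalg.norm(majority[:, None, :] - minority[None, :, :], axis=2)
    mean_far3 = np.sort(d, axis=1)[:, -3:].mean(axis=1)
    keep = np.argsort(mean_far3)[:n_keep]
    return majority[keep]
```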
4 Experimental Results and Evaluation
In order to verify the effectiveness of the NKB-SMOTE algorithm in network intrusion detection, this paper uses the NKB-SMOTE algorithm to generate samples from the UNSW-NB15 dataset and then trains five common machine learning algorithms for intrusion detection. By comparing the classification accuracy under different sample generation algorithms, the performance of intrusion detection based on the NKB-SMOTE algorithm is illustrated. Finally, actual data acquisition and intrusion detection experiments are carried out on a sewage treatment industrial control system, which illustrates the detection performance of the proposed algorithm in a real scene.

4.1 Data Preprocessing
The UNSW-NB15 dataset was collected by the Australian Cyber Security Centre laboratory in 2015 from real network environments, recording real modern normal activities and modern synthesized attack behaviors [26, 27]. Therefore, the UNSW-NB15 dataset reflects the characteristics of modern network traffic well. The dataset contains 9 attack categories: Fuzzers,
Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. In this dataset, normal traffic (Normal) accounts for the largest proportion, and the attack types Generic and Exploits account for most of the intrusion data, so the data distribution is unbalanced. Since the UNSW-NB15 dataset represents traffic records by sequences of traffic features, the features are divided by representation into character, binary and numerical types, and by continuity into discrete and continuous types. Machine learning algorithms require purely numerical data, so the feature data must be preprocessed before training the classification algorithms. Preprocessing includes digitizing character data and standardizing continuous data to ensure readability and eliminate singular samples.

1. Numericalization: Some features in the UNSW-NB15 dataset are useless for intrusion detection, so attributes such as IP address and port number are removed first; each record in the dataset is then equivalent to a vector of 42 feature values and 2 label values. The One-Hot method is used to convert the character data into numerical data, after which each record is encoded as 200-dimensional data.
2. Standardization: The value ranges of different attributes differ greatly, so the data is standardized, i.e., scaled so that its mean is 0 and its variance is 1:

Z_{ik} = (x_{ik} − x̄_k) / S_k,  (8)

where Z_{ik} is the k-th eigenvalue of the i-th record after z-score normalization, x_{ik} is the k-th eigenvalue of the i-th record, x̄_k = (1/n) Σ_{i=1}^{n} x_{ik} is the mean of the k-th eigenvalue, and S_k = sqrt( (1/n) Σ_{i=1}^{n} (x_{ik} − x̄_k)² ) is the standard deviation of the k-th eigenvalue. The data is then normalized to the interval [0, 1] by formula (9):

Z′_{ik} = (Z_{ik} − Z_min) / (Z_max − Z_min),  (9)
where Z′_{ik} is the k-th eigenvalue of the i-th record after normalization, and Z_min and Z_max are the minimum and maximum values of the k-th eigenvalue of the data, respectively.
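The two scaling steps (8)-(9) amount to a few NumPy lines; a minimal sketch, assuming the character features have already been one-hot encoded into X:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Column-wise z-score (formula (8)) followed by min-max scaling to
    [0, 1] (formula (9)); eps guards against constant columns."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + eps)        # formula (8)
    Zmin, Zmax = Z.min(axis=0), Z.max(axis=0)
    return (Z - Zmin) / (Zmax - Zmin + eps)                 # formula (9)
```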
4.2 Experimental Results
On the basis of the data preprocessing, a multi-classification intrusion detection experiment for industrial control systems was carried out. To illustrate the effectiveness of the proposed algorithm, five machine learning algorithms perform intrusion detection on the original dataset and on the datasets processed by different sample generation algorithms, as
shown in Table 1, which lists the multi-classification accuracy of the five machine learning algorithms under different sample generation algorithms for the categories Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. It can be seen that when no sample generation algorithm is used, the five algorithms classify some minority-category samples poorly. After a sample generation algorithm is applied, for minority attack types such as 'Analysis', 'Backdoors', 'Shellcode' and 'Worms', the traditional SMOTE algorithm can significantly improve the classification effect of some classifiers, for example the 'Analysis' type under the decision tree and random forest algorithms, and the 'Backdoors' and 'Worms' types under the decision tree and random forest algorithms. The NKB-SMOTE algorithm improves the classification effect of all classifiers on minority class attack samples, and the improvement is more significant and stable. For majority attack types such as 'DoS', 'Exploits' and 'Generic', the traditional SMOTE algorithm cannot significantly improve the classification effect, which sometimes even becomes slightly worse, for example the 'DoS' type under the multilayer perceptron algorithm and the 'Exploits' type under the Naive Bayes algorithm. The NKB-SMOTE algorithm improves the classification effect not only for the minority class attack samples but also for the majority class attack types. As can be seen from the comparison of AUC values in Fig. 6, the average AUC of the traditional SMOTE algorithm is higher than that on the original dataset, but the improvement is not obvious, and the AUC of individual attack types even decreases. The average AUC of the NKB-SMOTE algorithm is significantly improved: compared with the original dataset it increases by more than 10% and is basically above 0.9, that is, the detection authenticity is very high.

Fig. 6. Comparison of the average AUC of each algorithm under different sample generation algorithms.

4.3 Practical Application of the NKB-SMOTE Sample Generation Algorithm
In the previous section, the proposed algorithm was applied to the UNSW-NB15 dataset. To make the network traffic more realistic, this section simulates four common network attacks on a typical sewage treatment industrial control system and captures the real network traffic data. The data is then transformed, through feature extraction, into a format that can be used for intrusion detection classification. Finally, intrusion detection experiments with different methods are carried out on the newly obtained dataset. The sewage treatment industrial control system is simple and easy to understand, and is a typical control system in the
process control industry, such as water treatment, chemical industry, and petroleum processing. Therefore, it can be used as a general test platform for the information security of industrial control systems. The system adopts a dual-capacity water tank control mode, which is a typical nonlinear, time-delayed object system. Figure 7(a) and Fig. 7(b) show the equipment diagram and the control cabinet of the sewage treatment industrial control system, respectively. By connecting attack equipment to the system, this paper simulates different types of attacks against the PLC controller and uses the network packet capture tool Wireshark to sniff and collect network traffic. CICFlowMeter is then used to extract features from the traffic data, which are saved as a CSV file. Statistics on the data types show that DoS attacks account for the largest proportion, while the U2R and R2L attack categories account for small proportions; therefore, this dataset is an imbalanced dataset.
Table 1. Multi-classification accuracy of each machine learning algorithm under different sample generation algorithms (accuracy, %).

Types           Algorithm   LR     DT     RF     SVM    MLP
Fuzzers         None        93.11  98.00  89.67  87.28  94.15
                SMOTE       89.49  96.21  89.70  92.03  92.27
                NKB-SMOTE   94.13  95.38  92.31  93.33  94.27
Analysis        None         0      0      0     10.89   0
                SMOTE       50.42  26.32  79.54  67.55  100
                NKB-SMOTE   69.27  58.38  86.23  82.85  90.25
Backdoors       None         0      0      0     20.60   0
                SMOTE       43.36  54.33  83.02  74.33  95.02
                NKB-SMOTE   58.92  66.30  85.09  91.27  84.43
DoS             None        33.33  33.90  33.79  30.33  33.25
                SMOTE       34.62  57.39  33.25  30.87  40.22
                NKB-SMOTE   47.62  62.84  37.53  36.26  46.36
Exploits        None        79.00  79.83  76.77  73.73  69.19
                SMOTE       80.31  74.27  74.27  73.09  66.02
                NKB-SMOTE   87.63  84.33  80.07  81.47  79.62
Generic         None        97.02  96.34  98.20  97.01  99.00
                SMOTE       97.41  97.37  100    97.87  99.05
                NKB-SMOTE   97.45  97.78  100    98.03  100
Reconnaissance  None        82.29  70.32  76.27  89.29  90.11
                SMOTE       84.84  79.30  83.38  83.36  91.27
                NKB-SMOTE   88.34  83.43  85.02  91.25  95.27
Shellcode       None         0      0     50.04  42.72  80.20
                SMOTE       19.81  38.43  29.62  42.63  71.75
                NKB-SMOTE   21.98  43.73  78.37  77.36  84.28
Worms           None         0      0      0     30.00  50.67
                SMOTE       55.30  74.40  70.42  65.47  67.98
                NKB-SMOTE   57.17  100    88.42  95.87  95.36

Fig. 7. Sewage treatment industrial control system: (a) equipment diagram; (b) control cabinet.
Based on the above dataset, we conduct multi-classification experiments on the real data. Five machine learning algorithms are used to perform intrusion detection on the original dataset and on the dataset processed by different sample generation algorithms. Table 2 shows the multi-classification accuracy of the five machine learning algorithms for the categories DoS, Probe, U2R, and R2L. It can be seen that after using the NKB-SMOTE sample generation algorithm, the intrusion detection accuracy of almost all attack types is significantly improved, which shows the effectiveness of the NKB-SMOTE algorithm on real network data collected from the sewage treatment industrial control system and further demonstrates the effectiveness of intrusion detection based on NKB-SMOTE sample generation in real scenarios.

Table 2. Multi-classification accuracy of each machine learning algorithm under different sample generation algorithms (accuracy, %).

Types   Algorithm    LR     DT     RF     SVM    MLP
DoS     None          1.00  75.12  95.82   1.00  76.74
        NKB-SMOTE    98.12  93.72  95.47  97.69  93.27
Probe   None         76.67  91.47  92.02  97.04  84.18
        NKB-SMOTE    88.25  95.72  95.73  96.88  91.54
U2R     None         80.96  28.30  31.43  62.53  39.79
        NKB-SMOTE    87.76  90.62  90.47  85.34  87.24
R2L     None         38.13   0      0     42.24   4.18
        NKB-SMOTE    70.40  82.26  84.29  80.37  81.32
In summary, the multi-classification performance on the network dataset of the wastewater treatment industrial control system is significantly improved after sampling with the NKB-SMOTE algorithm, which proves that the NKB-SMOTE algorithm is effective for the real network data collected from the system and further proves the effectiveness of the NKB-SMOTE-based intrusion detection algorithm in practical scenarios.
5 Conclusion
This paper studies the application of the SMOTE sample generation algorithm in intrusion detection. Firstly, the content and defects of the traditional SMOTE algorithm are introduced. Then, the NKB-SMOTE algorithm is proposed, and its theoretical basis, research content and algorithm flow are described. Finally, in order to verify the application effect of the NKB-SMOTE algorithm in intrusion detection and compare it with
the traditional SMOTE algorithm, five commonly used machine learning algorithms are used to conduct multi-classification experiments based on the UNSW-NB15 dataset. According to the experimental results, the classification effects of the traditional SMOTE algorithm and the NKB-SMOTE algorithm on minority class samples and majority class samples in multi-classification scenarios are analyzed. In the future, the application of new machine learning algorithms in intrusion detection will be considered, and different classification algorithms will be combined in intrusion detection through ensemble learning.
References

1. Index, E.: Global. Nature 522(7556), S1–27 (2015)
2. Ali, S., Al Balushi, T., Nadir, Z., Hussain, O.K.: Cyber Security for Cyber Physical Systems, vol. 768, pp. 11–33. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75880-0
3. Zhang, J., Pan, L., Han, Q.L., Chen, C., Wen, S., Xiang, Y.: Deep learning based attack detection for cyber-physical system cybersecurity: a survey. IEEE/CAA J. Autom. Sinica 9(3), 377–391 (2021)
4. Ashfaq, A.B., Javed, M., Khayam, S.A., Radha, H.: An information-theoretic combining method for multi-classifier anomaly detection systems. In: 2010 IEEE International Conference on Communications, pp. 1–5 (2010)
5. Ye, N., Emran, S.M., Chen, Q., Vilbert, S.: Multivariate statistical analysis of audit trails for host-based intrusion detection. IEEE Trans. Comput. 51(7), 810–820 (2002)
6. Hajji, M., et al.: Multivariate feature extraction based supervised machine learning for fault detection and diagnosis in photovoltaic systems. Eur. J. Control 59, 313–321 (2021)
7. Yang, Y., Xu, X., Wang, L., Zhong, W., Yan, C., Qi, L.: Fast anomaly detection based on data stream in network intrusion detection system. In: ACM Turing Award Celebration Conference - China (ACM TURC 2021), pp. 87–91 (2021)
8. Tan, L., Li, C., Xia, J., Cao, J.: Application of self-organizing feature map neural network based on K-means clustering in network intrusion detection. Comput. Mater. Continua 61(1), 275–288 (2019)
9. Luo, F., Zou, Z., Liu, J., Lin, Z.: Dimensionality reduction and classification of hyperspectral image via multistructure unified discriminative embedding. IEEE Trans. Geosci. Remote Sens. 60, 1–16 (2021)
10. Kushwaha, P., Buckchash, H., Raman, B.: Anomaly based intrusion detection using filter based feature selection on KDD-CUP 99. In: TENCON 2017 - 2017 IEEE Region 10 Conference, pp. 839–844 (2017)
11. Jamalipour, A., Murali, S.: A taxonomy of machine learning based intrusion detection systems for the internet of things: a survey. IEEE Internet Things J. 9, 9444–9466 (2021)
12. Tang, T.A., Mhamdi, L., McLernon, D., Zaidi, S.A.R., Ghogho, M.: Deep learning approach for network intrusion detection in software defined networking. In: 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), pp. 258–263 (2016)
13. Sun, B., Chen, H., Wang, J., Xie, H.: Evolutionary under-sampling based bagging ensemble method for imbalanced data classification. Front. Comp. Sci. 12(2), 331–350 (2018). https://doi.org/10.1007/s11704-016-5306-z
14. Lin, W.C., Tsai, C.F., Hu, Y.H., Jhang, J.S.: Clustering-based undersampling in class-imbalanced data. Inf. Sci. 409, 17–26 (2017)
15. Padmaja, T.M., Krishna, P.R., Bapi, R.S.: Majority filter-based minority prediction (MFMP): an approach for unbalanced datasets. In: TENCON 2008 - 2008 IEEE Region 10 Conference, pp. 1–6 (2008)
16. Kang, Q., Chen, X., Li, S., Zhou, M.: A noise-filtered under-sampling scheme for imbalanced classification. IEEE Trans. Cybern. 47(12), 4263–4274 (2016)
17. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
18. Li, J., Zhu, Q., Wu, Q., Fan, Z.: A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors. Inf. Sci. 565, 438–455 (2021)
19. Dolo, K.M., Mnkandla, E.: Modifying the SMOTE and safe-level SMOTE oversampling method to improve performance. In: Woungang, I., Dhurandher, S.K. (eds.) 4th International Conference on Wireless, Intelligent and Distributed Environment for Communication. LNDECT, vol. 94, pp. 47–59. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-89776-5_4
20. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–1328 (2008)
21. Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005). https://doi.org/10.1007/11538059_91
22. Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J.: Hybrid sampling for imbalanced data. Integr. Comput.-Aided Eng. 16(3), 193–210 (2009)
23. Li, H., Zou, P., Wang, X., Xia, R.: A new combination sampling method for imbalanced data. In: Sun, Z., Deng, Z. (eds.) Proceedings of 2013 Chinese Intelligent Automation Conference. LNEE, vol. 256, pp. 547–554. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38466-0_61
24. Dina, A.S., Manivannan, D.: Intrusion detection based on machine learning techniques in computer networks. Internet Things 16, 100462 (2021)
25. Ahmad, Z., Shahid Khan, A., Wai Shiang, C., Abdullah, J., Ahmad, F.: Network intrusion detection system: a systematic study of machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 32(1), e4150 (2021)
26. Moustafa, N., Slay, J.: The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Secur. J. Glob. Perspect. 25(1–3), 18–31 (2016)
27. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS), pp. 1–6 (2015)
Mastering “Gongzhu” with Self-play Deep Reinforcement Learning

Licheng Wu, Qifei Wu, Hongming Zhong, and Xiali Li(B)

School of Information Engineering, Minzu University of China, Beijing, China
[email protected]
Abstract. “Gongzhu” is a card game popular in Chinese communities at home and abroad, and it is a game of incomplete information. The game process is highly reversible, with complex state and action spaces. This paper proposes an algorithm that combines the Monte-Carlo (MC) method with deep neural networks, called the Deep Monte-Carlo (DMC) algorithm. Different from the traditional MC algorithm, this algorithm uses a Deep Q-Network (DQN) instead of a Q-table to update the Q-value and uses a distributed parallel training framework to build the model, which can effectively solve the problems of computational complexity and limited resources. After 24 h of training on a server with 1 GPU, the “Gongzhu” agent played 10,000 games against an agent that uses a Convolutional Neural Network (CNN) to fit the strategies of human players. The “Gongzhu” agent achieved a 72.6% winning rate, with an average of 63 points per game. The experimental results show that the model has better performance. Keywords: Artificial intelligence · Gongzhu · Monte-Carlo · Deep reinforcement learning · Long short-term memory
1 Introduction

The study of complete information games is already very mature. Especially in Go, from the original “Deep Blue” [1] to the present AlphaGo [2], AlphaZero [3] and MuZero [4], agents have reached a high level and can beat top professional human players. Research on incomplete information games is gradually becoming a hotspot due to the difficulty of its algorithms. Common incomplete information games include Texas Hold'em [5–11], “Doudizhu” [12–16] and Mahjong [17]. For Texas Hold'em, Zhang Meng designed a game-solving framework in 2022 that can quickly adapt to the opponent's strategy, consisting of an offline training stage and an online gaming stage; the performance of the constructed agent was greatly improved [18]. Zhou Qibin reduced the exploitability of the strategy by considering the possible range of other players' hands; in heads-up no-limit Texas Hold'em, the constructed agent DecisionHoldem [19] publicly defeated the agents Slumbot [20] and Openstack [21]. For “Doudizhu”, Li Shuqin proposed a strategy based on the Alpha-Beta pruning algorithm in 2022; in the WeChat applet of “Doudizhu”, the constructed agent played several two-player endgame tests and won all of them [22]. Guan
Yang proposed a technique called perfect information distillation, which allows the agent to use global information to guide policy training; the constructed agent PerfectDou [23] beats the existing “Doudizhu” agents. For Mahjong, Wang Mingyan proposed an algorithm combining knowledge with Monte-Carlo (MC) search to build an opponent model that predicts hidden information in 2021; the constructed agent KFTREE [24] won a silver medal in the Mahjong competition at the 2019 Computer Olympiad. Gao Shijing proposed an algorithm combining deep learning and XGBoost, and the resulting hybrid model has a higher winning rate than a model using either algorithm alone [25]. Research on “Gongzhu”, however, is still in its infancy: the algorithms for other incomplete information games require strong hardware resources and cannot be applied to “Gongzhu” directly. At present, “Gongzhu” research mainly adopts supervised learning to fit the playing strategies of human players, which is computationally complex and highly dependent on expert knowledge. In this paper, a Deep Monte-Carlo (DMC) algorithm is proposed, which uses a Deep Q-Network (DQN) instead of a Q-table to update the Q-value. The constructed model is trained with distributed parallelism and can generate a large amount of training data every second, which greatly improves training efficiency and effectively solves the problems of complex computation and limited resources. After the model is trained on a single-GPU server for 24 h, its performance is stronger than that of the “Gongzhu” Convolutional Neural Network (CNN) model, and the constructed agent exhibits coherent card-playing strategies.
2 “Gongzhu” Background

“Gongzhu” is played with a deck of 52 cards, with the big and small jokers removed. There are 4 players in the game, and each player randomly receives 13 cards. Under the rules of “Gongzhu”, all cards are divided into two categories: “scored cards” and “non-scored cards”. Table 1 shows the basic scores of the “scored cards”.

Table 1. Scored cards and their corresponding scores.

Cards   Hearts A   Hearts K   Hearts Q   Hearts J   Hearts (10–5)   Hearts (4–2)   Spades Q   Diamonds J
Score   −50        −40        −30        −20        −10             0              −100       +100
The “Gongzhu” game is divided into two stages: showing cards and playing cards. In the showing stage, 4 cards can be shown, namely Clubs 10, Diamonds J, Spades Q, and Hearts A. Each player can choose to show or not to show them according to the initial hand, so there are 16 possible showing actions. Whether a card has been shown changes the scores of the corresponding scored cards. In the playing stage, a player must play a card of the same suit as the first player of the current round. If that is not possible, the player can choose to
discard a card of a different suit from the lead, so there are 52 possible playing actions. The actions of the showing stage can determine the playing strategy, and the whole game process is highly reversible. Please refer to the “China Huapai Competition Rules (Trial)” [26] for the detailed rules of “Gongzhu”.
3 Representation of Cards

3.1 The Representation of Showing Cards

In the showing stage, a total of 4 cards can be shown, and a 1 × 4 one-hot matrix of the form [x0 x1 x2 x3] is used to encode the states and actions of shown cards. The matrix positions represent Hearts A, Spades Q, Clubs 10 and Diamonds J in turn; if a card is shown by the player, the corresponding element is set to 1, otherwise it is set to 0.

3.2 The Representation of Playing Cards

In the playing stage, each player can only play one card per round. A 1 × 52 one-hot matrix of the form [x0 x1 ... x50 x51] is used to encode the states and actions of played cards: the element at a card's position is 1 if the card is played and 0 otherwise. The position of each card in the 1 × 52 one-hot matrix is shown in Table 2.

Table 2. The position of all cards in the 1 × 52 one-hot matrix.

Cards            Hearts (A–K)   Spades (A–K)   Clubs (A–K)   Diamonds (A–K)
Matrix position  0–12           13–25          26–38         39–51
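The mapping in Table 2, together with the 1 × 4 showing-stage vector of Sect. 3.1, can be sketched as follows (the names are illustrative):

```python
import numpy as np

SUITS = ("Hearts", "Spades", "Clubs", "Diamonds")        # Table 2 order
RANKS = ("A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K")
SHOWABLE = (("Hearts", "A"), ("Spades", "Q"), ("Clubs", "10"), ("Diamonds", "J"))

def card_index(suit, rank):
    """Position of a card in the 1 x 52 one-hot matrix (Table 2)."""
    return SUITS.index(suit) * 13 + RANKS.index(rank)

def one_hot_cards(cards):
    """1 x 52 encoding used in the playing stage."""
    v = np.zeros(52, dtype=np.int8)
    for suit, rank in cards:
        v[card_index(suit, rank)] = 1
    return v

def one_hot_show(shown):
    """1 x 4 encoding of the showing stage, in the order
    Hearts A, Spades Q, Clubs 10, Diamonds J."""
    return np.array([int(c in shown) for c in SHOWABLE], dtype=np.int8)
```

For example, `one_hot_cards([("Hearts", "A"), ("Diamonds", "J")])` sets positions 0 and 49 to 1.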
4 Deep Monte-Carlo Algorithm

4.1 Improved Monte-Carlo Algorithm

The traditional MC algorithm and the Q-learning algorithm use Q(s, a) to determine the policy π, that is, the agent selects the action with the maximum expected return according to Q(s, a). Q(s, a) is evaluated by averaging the returns of all episode visits to (s, a). The update function of the Q-value is shown in (1), where R is the reward, γ is the discount factor, and ρ is the learning rate:

Q(s, a) ← Q(s, a) + ρ [ R + γ max_{a′} Q(s′, a′) − Q(s, a) ].  (1)
The evaluation and update process of the Q-value can be naturally combined with a neural network to obtain an improved MC algorithm, namely the DMC. Specifically, the update of Q(s, a) is done using a neural network and the Mean Square Error (MSE) loss instead of a Q-table. The pseudo-code of the DMC algorithm is shown in Table 3.
Table 3. Pseudo-code of the DMC algorithm.

Deep Monte-Carlo Algorithm
Input: state-action pairs (s, a), learning rate ρ, discount factor γ
Initialize Q(s, a)
for iteration = 1, 2, 3, ... do              // for each episode
    Initialize s
    for t = 1, 2, 3, ..., T do               // for each step of the episode
        Choose a from s using a policy derived from Q (e.g., ε-greedy)
        Take action a, observe R and the next state
        Update Q(s, a) toward the episode return (using a neural network and MSE)
    until s is terminal
end
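A minimal PyTorch sketch of the MSE-based update in Table 3; `q_net(states, actions)` and the batching interface are assumptions for illustration:

```python
import torch

def dmc_update(q_net, optimizer, states, actions, returns):
    """One DMC learning step: regress Q(s, a) onto the Monte-Carlo
    return of the episode using the MSE loss."""
    q_values = q_net(states, actions)                 # predicted Q(s, a)
    loss = torch.nn.functional.mse_loss(q_values, returns)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```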
Due to its high variance, the traditional MC algorithm is very inefficient when dealing with incomplete information games. The DMC algorithm has been well verified in “Doudizhu” [16], and it also suits the “Gongzhu” game. First, “Gongzhu” is an episodic task, so there is no need to handle incomplete episodes. Second, the DMC algorithm can be easily parallelized to generate more data samples per second, which improves training efficiency and mitigates the high-variance problem. Third, the “Gongzhu” agent passes through many game states and actions without reward; this slows down Q-learning and makes the model difficult to converge, whereas the DMC algorithm reduces the impact of such long stretches without feedback [16]. Therefore, it is feasible to apply the DMC algorithm to the “Gongzhu” game.

4.2 The Representation of States and Actions

In this experiment, the action is the card played by the “Gongzhu” agent. The state consists of the hand cards of the 3 players other than the agent, the cards collected by the 4 players, the cards shown by the 4 players, the cards played by the 3 players other than the agent, the cards played in the current round, and the historical card
information of the game in this round. The historical card information is sequential data, and the rest is non-sequential data. Each player's shown cards are represented by a 1 × 4 one-hot matrix, and each player's remaining state features are represented by 1 × 52 one-hot matrices. The historical card information covers 13 rounds of 4 players, i.e., a 13 × 208 one-hot matrix, where the card information of rounds not yet played is filled with 0. The characteristic parameters of the “Gongzhu” agent are shown in Table 4.

Table 4. Characteristic parameters of the “Gongzhu” agent.

Characteristic                                             Matrix size
Action  The card played by the agent                       1 × 52
State   Hand cards of the 3 players other than the agent   3 × 52
        Cards collected by the 4 players                   4 × 52
        Cards shown by the 4 players                       4 × 52
        Cards played by the players other than the agent   3 × 52
        Cards already played in the current round          1 × 52
        Historical card information of this game           13 × 208
4.3 Network Structure Design

The input of the “Gongzhu” deep network is the concatenated representation of states and actions, and the output is Q(s, a). Non-sequential data is encoded as one-hot matrices, flattened and concatenated across the different features, while sequential data is processed by a Long Short-Term Memory (LSTM) network. The DQN consists of an LSTM and a Multi-layer Perceptron (MLP) with 6 hidden layers of dimension 512, and it replaces the Q-table of the traditional MC algorithm for computing and updating the Q-value. The structure of the “Gongzhu” Q-network is shown in Fig. 1.
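A sketch of the described architecture in PyTorch: an LSTM over the 13 × 208 history, concatenated with the flattened non-sequential features (16 × 52 = 832 values per Table 4), followed by an MLP with 6 hidden layers of width 512 that outputs a scalar Q(s, a). The class name, the LSTM hidden size of 128 and other details are assumptions:

```python
import torch
import torch.nn as nn

class GongzhuQNet(nn.Module):
    def __init__(self, seq_dim=208, flat_dim=16 * 52, lstm_hidden=128, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(seq_dim, lstm_hidden, batch_first=True)
        layers, in_dim = [], lstm_hidden + flat_dim
        for _ in range(6):                       # 6 hidden layers of width 512
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(hidden, 1))      # scalar Q(s, a)
        self.mlp = nn.Sequential(*layers)

    def forward(self, history, flat):
        # history: (batch, 13, 208) one-hot rounds; flat: (batch, 832)
        _, (h, _) = self.lstm(history)
        return self.mlp(torch.cat([h[-1], flat], dim=1))
```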
5 Experimental Process

5.1 Parallel Training

Parallel training allows the model to perform multiple iterations and updates per unit time, which greatly improves training efficiency and effectively works around limited resources. The experimental processes are divided into two categories: the Actor process and the Learner process. The Actor process generates training data, and the Learner process trains and updates the network. Figure 2 shows the distributed parallel training framework of the “Gongzhu” deep neural network. The parallel training is divided into 1 Actor process and 1 Learner process, where the Actor process contains the actor processes of the 4 players. The Learner process stores 4 Q-networks for the 4 actor processes, which are called the global networks of the players.
Fig. 1. “Gongzhu” Q-network structure.
Fig. 2. “Gongzhu” distributed parallel training framework.
Each global Q-network is updated with the MSE loss on the data provided by the corresponding actor process, so as to approximate the target value. At the same time, each actor process also stores 4 Q-networks, called the local Q-networks of the players. Each local Q-network is periodically synchronized with the corresponding global Q-network,
and the Actor repeatedly samples action trajectories from the game and computes Q(s, a) for each state-action pair. Communication between the Learner process and the Actor process is carried out through buffers, which are divided into local buffers and shared buffers. The two kinds of buffers store a total of 9 types of information: whether the game is over, the game results of a batch of episodes, the target values of a batch of episodes, the encoding matrix without the action in the playing stage, the encoding matrix of the action in the playing stage, the historical playing information of a game, the encoding matrix without the action in the showing stage, the encoding matrix of the action in the showing stage, and the target values of the showing stage.

5.2 Training Parameters

The parameters of the distributed parallel computing framework of the “Gongzhu” DMC algorithm are shown in Table 5, and the training code and test files can be found at https://github.com/Zhm0715/GZero.

Table 5. “Gongzhu” distributed parallel computing parameters.

Parameter name                       Parameter value
Actor process number                 1
Actor number (per Actor process)     4
Learner process number               1
Exp-epsilon                          0.11
Learner batch_size                   32
Unroll_length (time dimension)       100
Number_buffers (shared memory)       50
Num_threads (Learner)                4
Max_grad_norm                        40
6 Experimental Analysis

6.1 Evaluation Indicators

The Actor process trains a self-play agent for each position in the game (the first player of the first round is player A, followed counterclockwise by players B, C and D). The curves of the loss value of each position's agent over training epochs are shown in Fig. 3. The loss of each agent fluctuates considerably, but overall it decreases as the epoch increases and finally stabilizes, so the model reaches convergence.
Fig. 3. The change curve of Loss value with the epoch.
The outcome of the game is determined by the players' final scores, so the score is particularly important. To simplify the calculation, the episode reward in the experiment is the original score divided by 100. The curves of the episode reward of each position's agent over epochs are shown in Fig. 4. As can be seen from Fig. 4, the reward of the agent at position A changes smoothly in the early stage of training but decreases with the epoch in the later stage. The rewards of the agents at positions B, C, and D fluctuate to varying degrees in the early stages of training and increase with the epoch in the later stages. This shows that, as training progresses, the agents at positions B, C, and D perform better than the agent at position A.
Fig. 4. The curve of the reward returned by episode with the epoch.
6.2 Simulated Game

After training is completed, the model is saved, and the agent of the “Gongzhu” DMC model plays 10,000 simulated games against agents of the “Gongzhu” CNN model, which was fitted on 20,000 game records of real human players. The results of the simulated games and the returned episode rewards are shown in Table 6.

Table 6. Simulated game results.

Position   Agent   Winning rate (%)   Reward
North      DMC     72.6                0.63
West       CNN     14.9               −0.25
South      CNN     22.7                0.16
East       CNN     13.2                0.22
In Table 6, the North position is the DMC-model agent, while the West, South, and East positions are CNN-model agents. The DMC agent achieves a winning rate of 72.6% in the simulated games, far higher than that of the CNN agents. The
episode reward of the DMC agent is 0.63, that is, it obtains 63 points per game on average and is the winner, while the episode rewards of the CNN agents in the other three positions are all far lower. From the above analysis, it can be seen that the performance of the DMC model in actual play is much better than that of the CNN model.
7 Conclusion

In this paper, the MC algorithm is combined with deep neural networks to construct a self-play “Gongzhu” artificial intelligence model based on distributed parallel computing. Experimental training and simulated games show that the “Gongzhu” agent based on the DMC algorithm outperforms the CNN-based “Gongzhu” agent and has coherent strategies for showing and playing cards. However, in the early stage of training, the small amount of low-quality data generated by self-play makes it hard to improve the overall model, which easily falls into local optima. To address these problems, future work will first use a large number of real game records of master human players for supervised learning to improve the quality of early data, and then use self-play reinforcement learning to explore new strategies, so as to further improve the agent's performance.
References

1. Holmes, R.J., et al.: Efficient, deep-blue organic electrophosphorescence by guest charge trapping. Appl. Phys. Lett. 83(18), 3818–3820 (2003)
2. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
3. Silver, D., et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)
4. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020)
5. Blair, A., Saffidine, A.: AI surpasses humans at six-player poker. Science 365(6456), 864–865 (2019)
6. Moravčík, M., et al.: DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337), 508–513 (2017)
7. Brown, N., Sandholm, T.: Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359(6374), 418–424 (2018)
8. Jiang, Q., Li, K., Du, B.: DeltaDou: expert-level DouDizhu AI through self-play. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 1265–1271. Morgan Kaufmann, San Francisco (2019)
9. Bowling, M., et al.: Heads-up limit hold'em poker is solved. Science 347(6218), 145–149 (2015)
10. Brown, N., Sandholm, T., Amos, B.: Depth-limited solving for imperfect-information games. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18), pp. 7663–7674. Curran Associates, Red Hook, NY (2018)
11. Li, Y., et al.: A decision model for Texas Hold'em game. Software Guide 20(05), 16–19 (2021). (in Chinese)
12. Peng, Q., et al.: Monte Carlo tree search for "Doudizhu" based on hand splitting. J. Nanjing Norm. Univ. (Natural Science Edition) 42(03), 107–114 (2019). (in Chinese)
13. Xu, F., et al.: Doudizhu strategy based on convolutional neural networks. Computer and Modernization (11), 28–32. (in Chinese)
14. Li, S., Li, S., Ding, M., Meng, K.: Research on fight the landlords' single card guessing based on deep learning. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11141, pp. 363–372. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01424-7_36
15. You, Y., et al.: Combinational Q-learning for Dou Di Zhu. arXiv preprint arXiv:1901.08925 (2019)
16. Zha, D., et al.: Mastering DouDizhu with self-play deep reinforcement learning. In: Proceedings of the 38th International Conference on Machine Learning (ICML), pp. 12333–12344. ACM, New York, NY (2021)
17. Li, J., et al.: Suphx: mastering mahjong with deep reinforcement learning. arXiv preprint arXiv:2003.13590 (2020)
18. Zhang, M., et al.: An opponent modeling and strategy integration framework for Texas Hold'em AI. Acta Autom. Sin. 48(4), 1004–1017 (2022). (in Chinese)
19. Zhou, Q., et al.: DecisionHoldem: safe depth-limited solving with diverse opponents for imperfect-information games. arXiv preprint arXiv:2201.11580 (2022)
20. Jackson, E.: Slumbot. https://www.slumbot.com/ (2017). Last accessed 04 Feb 2020
21. Li, K., et al.: OpenHoldem: an open toolkit for large-scale imperfect-information game research. arXiv preprint arXiv:2012.06168 (2020)
22. Guo, R., et al.: Research on game strategy in two-on-one game endgame mode. Intell. Comput. Appl. 12(04), 151–158 (2022). (in Chinese)
23. Yang, G., et al.: PerfectDou: dominating DouDizhu with perfect information distillation. arXiv preprint arXiv:2203.16406 (2022)
24. Wang, M., et al.: An efficient AI-based method to play the Mahjong game with the knowledge and game-tree searching strategy. ICGA Journal 43(1), 2–25 (2021)
25. Gao, S., Li, S.: Bloody Mahjong playing strategy based on the integration of deep learning and XGBoost. CAAI Trans. Intell. Technol. 7(1), 95–106 (2022)
26. China Huapai Competition Rules Compilation Team: China Huapai Competition Rules (Trial). People's Sports Publishing House, Beijing (2009). (in Chinese)
Improved Vanishing Gradient Problem for Deep Multi-layer Neural Networks

Di Wang1, Xia Liu1, and Jingqiu Zhang2(B)

1 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
{diwang,liuxia}@bupt.edu.cn
2 College of Applied Arts and Science, Beijing Union University, Beijing 100191, China
[email protected]
Abstract. Deep learning technologies have been broadly utilized in theoretical research and in practical applications of intelligent robots. Among the numerous paradigms, the BP neural network attracts wide attention as an accurate and flexible tool. However, there remains an open question centering on gradient disappearance when the back-propagation strategy is used in multi-layer BP neural networks, and the situation deteriorates sharply when sigmoid transfer functions are employed. To fill this research gap, this study explores a new solution: the relative magnitude of the gradient descent step is estimated and neutralized via a newly developed function with increasing properties. As a result, the undesired gradient disappearance problem is alleviated while the traditional merits of the gradient descent method are preserved. The validity is verified by an actual case study of subway passenger flow, and the simulation results show a superior convergence speed compared with the original algorithm. Keywords: Multi-layer neural network · Vanishing gradients · Planned events · Passenger flow forecast
1 Introduction

With the promotion of the Industry 4.0 standard, great advances have been made in deep learning, big data and other technologies, which has directly given birth to a giant leap of intelligent robots. Intelligent robots not only increase production efficiency in industrial applications, but also help or replace users in completing various tasks in daily life. External hardware equipment and the internal decision-making system are the cornerstones of an intelligent robot, and the comprehensive assessment of iterative optimization usually plays a crucial role in the decision-making system [1]. Therefore, it is of great practical significance to study the iterative model of robot task strategy.
This work was funded by the National Natural Science Foundation of China under Grant 41771187. None of the material for this article has been previously published.
Till now, many deep learning algorithms based on neural networks have been developed. The linear neural network has a relatively simple structure suitable for solving linear approximation problems. The RBF neural network is a feed-forward mechanism that randomly adjusts the weights to approximate the system output, but it has a complex structure and a large computational burden. In contrast, the BP neural network has the following advantages: (1) its theory is mature, its structure is relatively simple, and its computational cost is low; (2) for complex nonlinear relations, the whole system is adjusted through constant error feedback, so it has high generalization ability and fault tolerance [2–6]. In particular, the complex nonlinear dynamics of the intelligent robots under investigation allow the BP neural network to give full play to its strengths in establishing a high-precision multi-layer network model. Nevertheless, an inescapable fact is that gradient disappearance/explosion occurs as the number of layers increases [7, 8]. Several efforts have been made to settle this problem. Concretely, some scholars alleviate gradient vanishing by changing the structure of the whole network. For example, many variants of recurrent neural networks (RNN) add gating units to capture long-term dependencies, such as the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks [9, 10]. However, a gating unit can save some values to prevent the gradient from vanishing, but it cannot keep all the information. From the perspective of the activation function, some scholars suggested replacing the activation function with ReLU [11]. ReLU provides a straight line with a fixed slope of 1 in the positive interval to avoid the vanishing gradient problem. However, ReLU prohibits learning in the negative interval, which limits data-fitting performance, and it brings a bias shift problem as well: ReLU has a non-zero mean output that acts as a bias for the next layer, which causes oscillation and impedes learning [12]. Other scholars have improved the activation function by constructing new ones. For example, based on the sigmoid [13] or tanh [14] function, positive and negative effective thresholds are defined, and beyond the thresholds the original function is replaced by a linear function whose slope equals the derivative at the threshold. In these two methods, the derivative no longer decreases at the two ends of the activation function, which may cause a gradient explosion problem. Another approach enlarges the derivative range of the original activation function to slow down the vanishing gradient by adding a linear function with slope β to the sigmoid function [15]; this approach does not take into account the effect of the number of layers on the update formula. Still other work improves the update formula itself. The oriented stochastic loss descent (OSLD) method [16] updates a randomly initialized parameter iteratively in the opposite direction of the sign of its partial derivative by a small positive random number, scaled by a tuned ratio of the model loss. This formula has the same effect as the original iterative formula; it uses a small random positive number to scale the overall loss function, which solves the gradient disappearance problem but may magnify the responsibility of a particular weight, causing the updates to fluctuate. With regard to algorithmic improvements, S. Al-Abri et al.
proposed a derivative-free optimization algorithm, which uses a new exponential mapping to deal with gradient disappearance and explosion and to minimize the objective function.
The experimental results show that even under noisy disturbances, a good approximate solution can still be found [17]. However, this algorithm needs different function mappings for different practical problems to achieve good results, so it is not suitable for all problems. Ibrahim Karabayir et al. proposed the Evolutionary Gradient Direction Optimizer (EVGO), which updates the weights of the DNN based on a gradient, introduces a new hyperplane, and finds a suitable search space and dataset within that space [18]. This method locks the range of the gradient descent direction and finds the optimum within the search space, which greatly alleviates the problem of gradient disappearance; however, its computation is more complicated, the search space determines the direction of the solution, and local optima may occur. Based on the iterative formula, this paper considers that gradient disappearance is caused by the depth of the layers, and proposes a new iterative update method: a function that increases with the number of layers is used to amplify the update value so that updating continues. This method not only retains the advantage that the original gradient decreases when approaching the optimal value and increases when moving away from it, but also largely solves the gradient disappearance problem. Based on the above methods, this article takes past planned events in Beijing as an entry point, analyzes the changes in the passenger flow of the stations surrounding each event, summarizes the patterns, and establishes a passenger flow propagation model, which can be applied to predict the increase in passenger flow at major stations around a future planned event.
2 Problem Statement

With the continuous improvement of people's living standards, more and more planned events are held in cities. The passenger flow of the subway stations near an event site is related to the traffic convenience of the stations and the number of subway lines connecting to them. These relationships are hidden in historical data from previous years, so they can be obtained by directly analyzing the historical data, which facilitates establishing a passenger flow propagation model. This paper mainly analyzes the passenger flow data of a past event and, by comparison with the passenger flow under normal circumstances, analyzes the influence of the event on the passenger flow of the surrounding stations, so as to establish the passenger flow propagation model. The analysis takes the "Mayday concert" in Beijing on August 26, 2018 as an example. The event was held at the National Stadium near Anli Road Station, so Anli Road Station is regarded as the primary station. Figure 1 shows the main stations around the concert venue, together with the surrounding subway lines and important stations.
Fig. 1. Event location and surrounding stations
Based on the statistics of the passenger flow of Anli Road Station for the whole day of August 26, 2018, the distribution of the station's passenger flow over one day is obtained (see Fig. 2).
Fig. 2. Arrival and departure passenger flow curve at Anli Road Station on August 26
In Fig. 2, the blue line is the curve of the passenger flow departing from the station, and the yellow line is the curve of the passenger flow arriving at the station. By observing the graph, the station's daily entry and exit passenger flows can each be approximated as normal curves. The two time periods in which the passenger flow increases significantly correspond to the beginning and the end of the concert. Therefore, when counting passenger flow growth, we only consider the passenger flow arriving between 17:00 and 19:00 and departing between 22:00 and 23:00. First, all transaction records of Metro Unicom users in Beijing from July to August 2018 are collected and counted: for the 10 subway stations around the National Stadium, the flow of people arriving between 17:00 and 19:00 and departing between 22:00 and 23:00 is calculated for every Sunday (see Figs. 3 and 4). A sketch of this counting step is given below.
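For illustration, the following is a minimal sketch of the counting step, assuming the transaction records are loaded into a pandas DataFrame; the column names ("station", "direction", "time") and schema are assumptions for illustration, not the paper's actual data format.

```python
# A minimal sketch of the station-level counting step, assuming hypothetical
# columns "station", "direction" ("in"/"out"), and a datetime column "time".
import pandas as pd

def window_flow(df: pd.DataFrame, station: str, direction: str,
                start: str, end: str) -> pd.Series:
    """Count transactions per day at one station within a daily time window."""
    mask = (
        (df["station"] == station)
        & (df["direction"] == direction)
        & (df["time"].dt.strftime("%H:%M").between(start, end))
    )
    sub = df.loc[mask]
    return sub.groupby(sub["time"].dt.date).size()

# Example: arrivals at Anli Road between 17:00 and 19:00, and departures
# between 22:00 and 23:00, restricted to Sundays as in the text.
# sundays = df[df["time"].dt.dayofweek == 6]
# arrivals = window_flow(sundays, "Anli Road", "in", "17:00", "19:00")
# departures = window_flow(sundays, "Anli Road", "out", "22:00", "23:00")
```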
Fig. 3. The flow of passengers arriving at the five stations every Sunday from 17:00 to 19:00
In order to weaken the influence of other planned activities on the data of this analysis and make the results more accurate, records with an obviously larger increase than usual are regarded as abnormal data and deleted. From the above statistics, we can see that the Mayday concert has a significant impact on only three of the surrounding subway stations: Anli Road, Olympic Park, and Beitucheng. The remaining seven stations show no significant increase, or even a downward trend. Analyzing the reasons, these stations are the stations immediately before and after the affected core station. Since most of the subway traffic in this time period is destined for the core station, a considerable part of the passengers who would originally have disembarked at nearby stations are forced to travel in other time periods or by other means. As a result, the passenger flow at the surrounding stations shows a downward trend. The departure flow has relatively little impact on the surrounding stations: at the end of the concert, because of the late hour, most people choose to take a taxi back to their residence or simply stay nearby.
Fig. 4. Passenger flow departing from five stations every Sunday from 22:00–23:00
3 Proposed Algorithm

An improved multi-layer BP neural network prediction model is proposed by data reduction. The BP neural network trains on the sample data and continuously corrects the weights and thresholds of the network, so that the error decreases along the gradient direction and the global minimum of the error function is approached until the output reaches the desired value. The BP neural network is composed of an input layer, a hidden layer, and an output layer; the hidden layer can have one or more layers. The Sigmoid function is used as the activation function:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

Taking the three-layer BP neural network as an example, the hidden-layer output $H_j$ is

$$H_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + a_j\right). \tag{2}$$
The output-layer output $O_k$ is

$$O_k = \sum_{j=1}^{l} H_j w_{jk} + b_k. \tag{3}$$
The error function is passed in the reverse direction:

$$E = \sum_i \frac{(t_i - o_i)^2}{2}, \tag{4}$$
where $t_i$ is the expected output and $o_i$ is the actual output of the network. Gradient descent is used:

$$w_i = w_i - \eta \times \frac{\partial E(w_i)}{\partial w_i}. \tag{5}$$

The iterative update formula of an N-layer BP neural network is

$$\begin{cases} w_N = w_N + \eta O_N e_N \\ w_{N-1} = w_{N-1} + \eta e_N w_N f'(H_{N-1}) O_{N-2} \\ \quad\vdots \\ w_1 = w_1 + \eta e_N w_N f'(H_{N-1}) \cdots w_2 f'(H_1) O_0 \end{cases} \tag{6}$$
The network weights and thresholds are adjusted continuously until the error function finally reaches the global minimum. As the output of the Sigmoid activation function approaches zero or one, its gradient approaches zero (see Fig. 5):
Fig. 5. Sigmoid function
When the gradient is propagated downward in the back propagation algorithm, one factor (the derivative of the Sigmoid with respect to the linear combination value of each layer) is multiplied in for each layer. Since $f(x) \in (0, 1)$,

$$f'(x) = f(x)(1 - f(x)) = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right). \tag{7}$$
Therefore $f'(x) \in (0, 0.25]$. In the weight update formula, the partial gradient of the error function with respect to the weight is a key multiplier. According to the chain rule, when multiple gradients less than one are multiplied together, the update value is severely attenuated, the gradient vanishes, and the deeper parameters are no longer updated. In response to this problem, a new iterative formula is proposed according to the characteristics of the vanishing gradient problem as the number of layers increases:

$$w_i = w_i - \eta \times f(n) \times \frac{\partial E(w_i)}{\partial w_i}, \tag{8}$$
where $n$ is the index of the layer currently updated in the reverse direction, and $f(n) > 1$ is monotonically increasing. The new iterative update formula associates the number of layers with the update values, and uses a function that increases as the layers deepen to enlarge the update value, so as to slow down the vanishing of the gradient. The following functions are proposed:

$$f_1(n) = n^2,\quad f_2(n) = n,\quad f_3(n) = a^n,\quad f_4(n) = \log n,\quad f_5(n) = n^n \tag{9}$$
When $n \to +\infty$, $f_4(n) < f_2(n) < f_1(n) < f_3(n) < f_5(n)$; obviously $f_5(n) = n^n$ is the largest, and it can alleviate the vanishing of the gradient to the greatest extent. However, when $n \to +\infty$, $g_5(x, n) = n^n f'(x)^n \to +\infty$, which may cause the gradient explosion problem. This is further explained by plotting the five iterative update values:

$$g_1(x, n) = n^2 f'(x)^n \tag{10}$$

$$g_2(x, n) = n f'(x)^n \tag{11}$$

$$g_3(x, n) = e^n f'(x)^n \tag{12}$$

$$g_4(x, n) = \log n \cdot f'(x)^n \tag{13}$$

$$g_5(x, n) = n^n f'(x)^n \tag{14}$$
It can be seen from Fig. 6 that $g_3$ obviously reduces the speed at which the original function converges to zero: the original update value tends to zero by about the 4th layer, while after the $e^n$ term is added it only slowly approaches zero after the 10th layer. Because $f'(x) \in (0, 0.25]$, avoiding the gradient explosion problem requires $a f'(x) < 1$, so $a \in (0, 4)$. At the same time, to minimize the vanishing of the gradient, the value of $a$ should satisfy $a \to 4^-$. The constructed function is as follows:

$$g_6(x, n) = \left(4 - \frac{1}{n}\right)^n f'(x)^n \tag{15}$$
Fig. 6. Constructor function

Let $f'(x) = 0.25$ and take the limit:

$$\lim_{n \to \infty} g_6(n) = e^{-1/4} < 1.$$

As the number of layers tends to infinity, the update value of $g_6$ converges to a small fixed constant, which resolves the vanishing gradient problem. For visual analysis, the $g_3$ and $g_6$ function curves are drawn as follows:
Fig. 7. Constructor function
As can be readily seen from Fig. 7, $g_3 > g_6$ for a 5-layer neural network. While $g_6$ solves the vanishing gradient problem, its derivative value is obviously reduced, which causes the update value of each layer to become almost constant. To avoid the update values flattening out in this way, the specific exponential function must be chosen in combination with the number of layers of the neural network required by the model. Differentiating the constructed functions with respect to the number of layers $n$:
$$g' = \left(\frac{a}{4}\right)^n \ln\left(\frac{a}{4}\right) \tag{16}$$

For $g_6$:

$$g_6' = \left(1 - \frac{1}{4n}\right)^n \ln\left(1 - \frac{1}{4n}\right) \tag{17}$$
This paper chooses a 6-layer BP neural network, that is, $n = 6$. Substituting into the above formulas and solving the inequality $g' \geq g_6'$ yields $a \in (1, 2.62]$. In order to alleviate the vanishing gradient while preventing the update values from flattening out, the maximum value of the admissible range is taken, which gives the iterative formula

$$w_i = w_i - \eta \times 2.62^n \times \frac{\partial E(w_i)}{\partial w_i}. \tag{18}$$
A general rule is thus drawn. When the number of neural network layers is $n$, take $a$ as the independent variable. To keep the weights updating effectively, substitute into $g' \geq g_6'$ and solve the inequality to obtain the solution set $a \in (1, a_1]$. Since a larger value of $a$ alleviates the vanishing gradient more strongly, take $a = a_1$. The iterative update formula is then

$$w_i = w_i - \eta \times a_1^n \times \frac{\partial E(w_i)}{\partial w_i}. \tag{19}$$
Combined with the proposed improvement, the improved multi-layer BP neural network is trained with the actual total passenger flow growth of the main surrounding stations as the input and the individual passenger flow growth of each major station around an event as the output. Using the trained neural network, the total passenger flow growth of the main stations around an event is then fed in as input to predict the passenger flow growth of each single station, thereby achieving a higher-precision prediction of the growth of passenger traffic at the core station and surrounding stations when a planned event occurs in the future.
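A minimal sketch of the improved update rule (19) is given below, assuming the per-layer gradients are already available from back-propagation; the helper name and list layout are illustrative, not the authors' implementation.

```python
# Sketch of the layer-indexed update of eq. (19): during back-propagation,
# the gradient of layer n (counted from the output, n = 1, ..., N) is
# amplified by a1**n before the weight step. a1 = 2.62 follows the 6-layer
# setting derived in the text; the gradient computation itself is abstract.
import numpy as np

def improved_update(weights, grads, eta=0.01, a1=2.62):
    """weights/grads: lists ordered from the layer nearest the output
    (n = 1) to the deepest layer (n = len(weights))."""
    for n, (w, g) in enumerate(zip(weights, grads), start=1):
        w -= eta * (a1 ** n) * g   # deeper layers get a larger boost
    return weights
```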
4 Experimental Analysis

Based on the Beijing subway passenger flow data from July to August 2018, the experimental data are selected according to the dates of the events. For example, the Mayday concert was held at the National Stadium on August 26. In order to study the impact of this event on the surrounding stations and the scope of its radiation, the increase in the passenger flow of each station on the day of the event compared to the normal passenger flow is regarded as the degree of influence: the greater the increase, the greater the influence. The affected surrounding stations, selected in order from near to far, are Anli Road, Olympic Park, Beitucheng, Datun Road East, and Anzhenmen, five stations in total. The number of people entering and leaving these five stations during the beginning and ending periods of the concert on the day of the planned event is counted.
Similarly, the planned events from July to August 2018 include Meng Tingwei's concert at the Beijing Exhibition Hall Theater on July 8, Xue Zhiqian's concert at the Beijing Artificial Stadium on July 14, Xu Wei's concert at the Beijing Exhibition Hall Theater on August 4, Zhang Jie's concert at the National Stadium on August 11, Teresa Teng's concert at the Beijing Theater on August 17, the "Echo" concert at the Artificial Stadium on August 17, the iQIYI Scream Night concert at the Cadillac Center on August 18, the TFboys concert at the Artificial Stadium on August 24, and the Mayday concerts at the National Stadium on August 24, 25, and 26. Based on the characteristics of the above 11 sets of data, 1000 sets of data are randomly generated using a normal distribution. The sum of the passenger flow growth of the surrounding main stations is taken as the input of the neural network, and the passenger flow growth of each individual surrounding station is taken as the output, so as to build a multi-layer gray BP neural network model (see Fig. 8). A sketch of the synthetic data generation step is given below.
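The following sketch illustrates the synthetic-sample step, assuming per-station means and standard deviations estimated from the 11 observed events; the numeric values are placeholders, not the paper's data.

```python
# Sketch of the synthetic-sample generation, with hypothetical per-station
# statistics (placeholders, not the paper's actual estimates).
import numpy as np

rng = np.random.default_rng(0)
stations = ["Anli Road", "Olympic Park", "Beitucheng", "Datun Road East", "Anzhenmen"]
mu = np.array([2100.0, 320.0, 200.0, 5.0, 3.0])    # hypothetical means
sigma = np.array([150.0, 40.0, 30.0, 2.0, 2.0])    # hypothetical stds

Y = rng.normal(mu, sigma, size=(1000, len(stations)))  # outputs: per-station growth
X = Y.sum(axis=1, keepdims=True)                       # input: total growth
```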
Fig. 8. BP neural network model
MATLAB R2016 is used to build and test the neural network model (see Fig. 9). The input layer of the model has one neuron; there are four hidden layers with 5, 40, 20, and 5 neurons, respectively; and the output layer has 5 neurons.
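For reference, an equivalent architecture can be sketched in Keras as follows (the paper itself used MATLAB R2016); the optimizer and learning rate here are assumptions.

```python
# Sketch of the described architecture: 1 input neuron, hidden layers of
# 5, 40, 20, 5 sigmoid units, and a 5-neuron output layer.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(1,)),
    keras.layers.Dense(5, activation="sigmoid"),
    keras.layers.Dense(40, activation="sigmoid"),
    keras.layers.Dense(20, activation="sigmoid"),
    keras.layers.Dense(5, activation="sigmoid"),
    keras.layers.Dense(5),  # linear output for regression
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
```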
Fig. 9. BP neural network
It can be seen from Fig. 10 that the goodness of fit of the training result on the test set is 0.867 and the training time is 0.07 s; the training effect is not good enough.
Fig. 10. Fitting effect of BP neural network model
With the improved multi-layer BP neural network proposed in Sect. 3, the iterative update value of each layer is multiplied by the layer-dependent factor 2.62^n. Training again, the results are shown in Fig. 11. It can be seen that the goodness of fit of the improved BP neural network on the test set rises to 0.961 with a training time of 0.01 s; the training effect is much better, which shows the effectiveness of the improved method. The trained neural network model is then used to predict one of the planned events, such as the Mayday concert on August 24. Based on the number of tickets sold and the proportion of Unicom users, the total passenger flow is predicted to increase by 3000. The prediction results obtained by the neural network model are shown in Fig. 12: the red line is the result of the neural network prediction, and the blue line is the actual passenger flow growth. The prediction effect is relatively good. The specific forecast data are shown in Table 1.
Fig. 11. Improved BP neural network model fitting effect
Fig. 12. Comparison of prediction results
Table 1. Activity passenger flow growth predicted.

Actual passenger flow growth | Forecast passenger flow growth
2130                         | 2108
318                          | 285
200                          | 192
1                            | 2
−2                           | 0
5 Conclusion

The traditional BP neural network suffers from the vanishing gradient problem under multi-layer iteration, which causes the weight updates to become slow or even stall completely. To overcome the vanishing problem, a new weight update formula is derived using the number of layers as a variable, which effectively improves the convergence speed of the weight updates of the multi-layer BP neural network. The improved multi-layer BP neural network model is verified on actual data, and the results show that the model converges markedly faster. When applied to the field of intelligent robots, this achievement can effectively improve the decision-making efficiency of robots.
References

1. Chen, Y., Wang, Y.C., Lan, S.L., Wang, L.H., Shen, W.M., Huang, G.Q.: Cloud-edge-device collaboration mechanisms of deep learning models for smart robots in mass personalization. Robot. Comput. Integr. Manuf. 77, 102351 (2022)
2. Li, C.-B., Wang, K.-C.: A new grey forecasting model based on BP neural network and Markov chain. J. Cent. South Univ. Technol. 14(5), 713–718 (2007). https://doi.org/10.1007/s11771-007-0136-7
3. Zhang, Y.G., Chen, B., Pan, G.F., Zhao, Y.: A novel hybrid model based on VMD-WT and PCA-BP-RBF neural network for short-term wind speed forecasting. Energy Convers. Manag. 195, 180–197 (2019)
4. Saha, T.K., Pal, S., Sarkar, R.: Prediction of wetland area and depth using linear regression model and artificial neural network based cellular automata. Ecol. Inform. 62, 101272 (2021)
5. Shao, Y.X., et al.: Prediction of 3-month treatment outcome of IgG4-DS based on BP artificial neural network. Oral Dis. 27(4), 934–941 (2021)
6. Gurcan, C., Negash, B., Nathan, H.: Improved grey system models for predicting traffic parameters. Expert Syst. Appl. 177, 114972 (2021)
7. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feed-forward neural networks. J. Mach. Learn. Res. 9, 249–256 (2010)
8. Zhang, S.R., Wang, B.T., Li, X.E., Chen, H.: Research and application of improved gas concentration prediction model based on grey theory and BP neural network in digital mine. Procedia CIRP 56, 471–475 (2016)
9. Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 6(2), 107–116 (1998)
10. Apaydin, H., Feizi, H., Sattari, M.T., Colak, M.S., Shamshirband, S., Chau, K.W.: Comparative analysis of recurrent neural network architectures for reservoir inflow forecasting. Water 12(5), 1500 (2020)
11. Hahnloser, R.L.T.: On the piecewise analysis of networks of linear threshold neurons. Neural Netw. 11(4), 691–697 (1998)
12. He, J.C., Li, L., Xu, J.C.: ReLU deep neural networks from the hierarchical basis perspective. Comput. Math. Appl. 120, 105–114 (2022)
13. Qin, Y., Wang, X., Zou, J.Q.: The optimized deep belief networks with improved logistic Sigmoid units and their application in fault diagnosis for planetary gearboxes of wind turbines. IEEE Trans. Industr. Electron. 66(5), 3814–3824 (2018)
14. Wang, X., Qin, Y., Wang, Y., Xiang, S., Chen, H.Z.: ReLTanh: an activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing 363, 88–98 (2019)
15. Roodschild, M., Sardiñas, J.G., Will, A.: A new approach for the vanishing gradient problem on sigmoid activation. Prog. Artif. Intell. 9(4), 351–360 (2020). https://doi.org/10.1007/s13748-020-00218-y
16. Abuqaddom, I., Mahafzah, B.A., Faris, H.: Oriented stochastic loss descent algorithm to train very deep multi-layer neural networks without vanishing gradients. Knowl.-Based Syst. 230, 107391 (2021)
17. Al-Abri, S., Lin, T.X., Tao, M., Zhang, F.M.: A derivative-free optimization method with application to functions with exploding and vanishing gradients. IEEE Control Syst. Lett. 5(2), 587–592 (2021)
18. Karabayir, I., Akbilgic, O., Tas, N.: A novel learning algorithm to optimize deep neural networks: evolved gradient direction optimizer (EVGO). IEEE Trans. Neural Netw. Learn. Syst. 32(2), 685–694 (2020)
Incremental Quaternion Random Neural Networks

Xiaonan Cui1, Tianlei Wang1, Hao Chen1, Baiying Lei2, Pierre-Paul Vidal1,3, and Jiuwen Cao1(B)

1 Machine Learning and I-health International Cooperation Base of Zhejiang Province and Artificial Intelligence Institute, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
{xncui,jwcao}@hdu.edu.cn
2 National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen 518060, China
[email protected]
3 Plateforme Sensorimotricité and COGNAC-G (COGNition and ACtion Group), Université Paris Descartes, 75270 Paris, France
[email protected]

Abstract. Quaternion, as a hypercomplex number with three imaginary elements, is effective in characterizing three- and four-dimensional vector signals. Quaternion neural networks with randomly generated quaternions as the hidden node parameters have become attractive for their good learning capability and generalization performance. In this paper, a novel incremental quaternion random neural network trained by extreme learning machine (IQ-ELM) is proposed. To fully exploit the second-order Q-properness statistic of quaternion random variables, the augmented quaternion vector is further applied in IQ-ELM (IAQ-ELM) for hypercomplex data learning. The network is constructed by gradually adding hidden neurons one by one, where the output weight is optimized by minimizing the residual error based on the fundamentals of the generalized HR calculus (GHR) of quaternion variable functions. Experiments on multidimensional chaotic system regression, aircraft trajectory tracking, and face and image recognition are conducted to show the effectiveness of IQ-ELM and IAQ-ELM. Comparisons to two popular RNNs, the Schmidt NN (SNN) and the random vector functional-link net (RVFL), are also provided to show the feasibility and superiority of using quaternions in RNNs for incremental learning.

Keywords: Quaternion algebra · Quaternion split activation function · Incremental ELM · Generalized HR calculus · Augmented quaternion
This work was supported by the National Natural Science Foundation of China (U1909209), the National Key Research and Development Program of China (2021YFE0100100, 2021YFE0205400), the Natural Science Key Foundation of Zhejiang Province (LZ22F030002), and the Research Funding of Education of Zhejiang Province (GK228810299201).
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 174–191, 2023. https://doi.org/10.1007/978-981-99-0617-8_13
1 Introduction
Quaternion, as a hypercomplex number with three mutually orthogonal imaginary components, usually provides a convenient and elegant representation for three- (3D) or four-dimensional (4D) vector signals, and has inherent advantages owing to its good representation capability in cross-channel correlation encoding and in rotation and orientation modeling. The quaternion has been effectively applied to statistical signal processing and machine learning, including EEG analysis [1], the quaternion least mean square algorithm [2], the quaternion nonlinear adaptive filter [3], the quaternion Kalman filter [4], the quaternion support vector machine [5], quaternion neural networks [6–9], etc. Recently, neural networks with randomly generated parameters (RNNs) have become attractive, with representative algorithms including the Schmidt NN (SNN) [10], the random vector functional-link net (RVFL) [11,12], the extreme learning machine (ELM) [13–18], etc. Particularly, the quaternion random neural network trained by ELM (QELM) has been studied for its compact data representation capability, and the quaternion mean square error (QMSE) is constructed to optimize the output weight using the least-squares (LS) algorithm [19,20]. Although some achievements on quaternion RNNs have been reported, theoretical support for the network parameter optimization is lacking; in particular, the scalar LS algorithm is directly adopted in the existing QELM and augmented QELM. To address this issue, a novel incremental quaternion RNN is developed in this paper, where the ELM and incremental ELM (I-ELM) are used as the backbone to develop the incremental quaternion ELM (IQ-ELM), based on the principles of quaternion linear algebra and the fundamentals of the generalized HR calculus (GHR) of quaternion variable functions. The hidden-layer nodes of IQ-ELM are added gradually, one by one, such that the residual error between the network output and the target is minimized. At each iteration, the added neuron is equipped with randomly generated hidden parameters, and the optimal output weight is derived by minimizing the residual error through the quaternion derivative obtained by the GHR operator [21–23]. It is noted that the main objective is to explore the feasibility and superiority of using quaternions in RNNs for incremental learning, and the proposed algorithm is generally applicable to other RNNs, such as the SNN and RVFL. Further, the augmented quaternion signal has been developed to explore the second-order statistics of quaternion random variables [24,25] through the widely linear vector, i.e., augmenting the input vector with its quaternion conjugate. The quaternion widely linear filter and the QELM using the augmented quaternion vector are proposed in [26] and [27], respectively. However, how the augmented quaternion statistics would benefit the I-ELM has not been well investigated. To fully capture the second-order statistics of the quaternion signals in the proposed IQ-ELM, the augmented model (IAQ-ELM) is further developed in this paper by incorporating the quaternion involutions as inputs. The effectiveness of the proposed IQ-ELM/IAQ-ELM in hypercomplex signal regression and classification is verified by experiments on multidimensional chaotic system regression, aircraft trajectory tracking, and face and image recognition. The contributions of this paper
are two-fold: 1) Two incremental learning algorithms for quaternion random neural networks, IQ-ELM and IAQ-ELM, are developed, and extensive experiments on chaotic system prediction, aircraft trajectory tracking, and face and image recognition are provided to demonstrate the superiority of IQ-ELM/IAQ-ELM. 2) The optimal output weight of the additive node at each iteration is obtained based on the quaternion derivative of the GHR operator, and theoretical support for the output weight optimization is also provided. Throughout the paper, μ is a unit quaternion; "(·)*", "(·)^T", and "(·)^H" represent the conjugate, transposition, and Hermitian (i.e., conjugate transposition) of a quaternion, respectively; and "R" and "H" denote the real-valued domain and the quaternion domain, respectively.
2 Preliminaries

2.1 Incremental ELM

The RNN with incrementally added neurons trained by ELM has been well established in the real (I-ELM) [13] and complex (IC-ELM) [28] domains. Particularly, for I-ELM, the network model with n hidden nodes is

$$f_n(x) = \sum_{i=1}^{n} \beta_i g_i(x), \quad x \in \mathbb{R}^d, \; \beta_i \in \mathbb{R} \tag{1}$$

where $g_i(x) = g(w_i \cdot x + b_i)$ is the output of the i-th hidden node, and $w_i$ and $b_i$ are the randomly generated input weight and bias. The hidden neurons are added one by one in I-ELM, aiming to minimize the residual error $e_n(x) = f(x) - f_n(x)$, where $f(x)$ is the target model and $(w, b)$ of the newly added neuron are randomly generated. I-ELM can be extended straightforwardly to the complex domain with a similar solution for the output weight; more details can be found in [28].

2.2 Quaternion Algebra
Quaternion was proposed by Hamilton to represent hypercomplex numbers with three imaginary components as [29]

$$q = a + b\imath + c\jmath + d\kappa = S_q + V_q \tag{2}$$

where $a, b, c, d \in \mathbb{R}$, and $\imath, \jmath, \kappa$ are imaginary units satisfying $\imath^2 = \jmath^2 = \kappa^2 = \imath\jmath\kappa = -1$. $S_q = a$ and $V_q = b\imath + c\jmath + d\kappa$ are the real (scalar) and imaginary parts of a quaternion, respectively. The product of two quaternions is defined as

$$pq = S_p S_q - V_p \cdot V_q + S_p V_q + S_q V_p + V_p \times V_q \tag{3}$$

where $\cdot$ and $\times$ are the inner and cross products of the imaginary parts. It is also observed that the quaternion product is noncommutative, i.e., $pq \neq qp$ in general. Meanwhile, a quaternion has the following properties:
• The conjugate of q: $q^* = a - b\imath - c\jmath - d\kappa$
• The modulus of q: $|q| = \sqrt{qq^*} = \sqrt{q^*q}$
• The inverse of q: $q^{-1} = \frac{q^*}{|q|^2}$ $(q \neq 0)$
• The quaternion rotation of q: $q^\mu = \mu q \mu^{-1}$
The quaternion involution is defined through rotation with a pure unit quaternion as [30]

$$q^\imath = -\imath q\imath = a + b\imath - c\jmath - d\kappa, \quad q^\jmath = -\jmath q\jmath = a - b\imath + c\jmath - d\kappa, \quad q^\kappa = -\kappa q\kappa = a - b\imath - c\jmath + d\kappa.$$

For quaternion matrices, we have the following properties:

• $A + B = B + A$, $\lambda(A + B) = \lambda A + \lambda B$ $(\lambda \in \mathbb{R})$
• $(AB)C = A(BC)$, $(A + B)^* = A^* + B^*$
• $(AB)^* = (B^H A^H)^T$, $(AB)^H = B^H A^H$
• If A and B are invertible, then $(AB)^{-1} = B^{-1}A^{-1}$
• If A is invertible, then $(A^*)^{-1} = (A^{-1})^*$.
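These operations are straightforward to verify numerically. Below is a compact illustrative sketch, with a quaternion stored as an array [a, b, c, d]; it is not part of the paper's implementation.

```python
# Hamilton product, conjugate, inverse, and the three involutions.
import numpy as np

def qmul(p, q):
    """Hamilton product pq (noncommutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def qinv(q):
    return qconj(q) / np.dot(q, q)          # q* / |q|^2

def involution(q, axis):                     # axis: 1 = i, 2 = j, 3 = k
    mu = np.zeros(4); mu[axis] = 1.0
    return qmul(qmul(-mu, q), mu)            # q^mu = -mu q mu for pure unit mu

p, q = np.array([1., 2., 3., 4.]), np.array([0.5, -1., 0., 2.])
assert not np.allclose(qmul(p, q), qmul(q, p))        # pq != qp in general
assert np.allclose(qmul(q, qinv(q)), [1, 0, 0, 0])    # q q^{-1} = 1
assert np.allclose(involution(q, 1), q * [1, 1, -1, -1])  # q^i flips j, k parts
```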
2.3 GHR

A complete quaternion-domain calculus is the fundamental support for analyzing quaternion optimization. One frequent approach is the pseudo-derivative method, but the cross-channel structural information is broken in this method. In contrast, an elegant generalized HR calculus (GHR), which provides full derivatives with respect to a quaternion variable, its three involutions, and their associated conjugates, has been developed in [21–23]. The unified theoretical fundamentals of the GHR form the basis of the optimization for the proposed IQ-ELM and IAQ-ELM. In addition, it should be pointed out that the mean-square-error-based optimization on the quaternion and augmented quaternion variables is a real-valued function, resulting in the same left and right GHR derivatives; thus, only the left GHR derivative is considered in the following.

Lemma 1. Let $f(q): \mathbb{H} \to \mathbb{R}$ be real-differentiable. The following derivatives hold:

$$\frac{\partial_r f}{\partial q^{\mu}} = \frac{\partial f}{\partial q^{\mu}}, \qquad \frac{\partial_r f}{\partial q^{\mu^*}} = \frac{\partial f}{\partial q^{\mu^*}},$$

$$\frac{\partial f^{\nu}}{\partial q^{\nu\mu}} = \left(\frac{\partial f}{\partial q^{\mu}}\right)^{\nu}, \qquad \frac{\partial f^{\nu}}{\partial q^{\nu\mu^*}} = \left(\frac{\partial f}{\partial q^{\mu^*}}\right)^{\nu},$$

$$\left(\frac{\partial f}{\partial q^{\mu}}\right)^{*} = \frac{\partial f}{\partial q^{\mu^*}}, \qquad \left(\frac{\partial f}{\partial q^{\mu^*}}\right)^{*} = \frac{\partial f}{\partial q^{\mu}},$$

where $\frac{\partial f}{\partial q}$ and $\frac{\partial_r f}{\partial q}$ are the left and right HR derivatives with respect to $q$, respectively, and $\nu$ is a quaternion.
Lemma 2. Let $f(q): \mathbb{H} \to \mathbb{H}$ be real-differentiable. The following derivatives hold:

$$\frac{\partial(\alpha f\beta)}{\partial q^{\mu}} = \alpha\frac{\partial f}{\partial q^{\beta\mu}}\beta, \qquad \frac{\partial(\alpha f\beta)}{\partial q^{\mu^*}} = \alpha\frac{\partial f}{\partial q^{\beta\mu^*}}\beta,$$

$$\frac{\partial_r(\alpha f\beta)}{\partial q^{\mu}} = \alpha\frac{\partial_r f}{\partial q^{\alpha^{-1}\mu}}\beta, \qquad \frac{\partial_r(\alpha f\beta)}{\partial q^{\mu^*}} = \alpha\frac{\partial_r f}{\partial q^{\alpha^{-1}\mu^*}}\beta,$$

where $\alpha$ and $\beta$ are quaternions. Specifically, the following lemma on the derivatives of quaternion functions from the GHR will be adopted in the optimization of IQ-ELM and IAQ-ELM in this paper [31]:

Lemma 3. Let $f(q), g(q): \mathbb{H} \to \mathbb{H}$ be real-differentiable, and in particular $\mu = 1$; then the following product rules hold:

$$\frac{\partial(fg)}{\partial q} = f\frac{\partial g}{\partial q} + \frac{\partial f}{\partial q^{g}}\,g, \qquad \frac{\partial(fg)}{\partial q^{*}} = f\frac{\partial g}{\partial q^{*}} + \frac{\partial f}{\partial q^{g^*}}\,g.$$

With Lemmas 1 and 2, it is straightforward to obtain the following derivatives for the two specific functions $f(q) = q$ and $f(q) = q^*$ with $\mu = 1$:

$$\frac{\partial(\alpha q\beta)}{\partial q} = \alpha\frac{\partial q}{\partial q^{\beta}}\beta = \alpha S_\beta, \qquad \frac{\partial(\alpha q\beta)}{\partial q^{*}} = \alpha\frac{\partial q}{\partial q^{\beta^*}}\beta = -\frac{1}{2}\alpha\beta^{*},$$

$$\frac{\partial(\alpha q^{*}\beta)}{\partial q} = \alpha\frac{\partial q^{*}}{\partial q^{\beta}}\beta = -\frac{1}{2}\alpha\beta^{*}, \qquad \frac{\partial(\alpha q^{*}\beta)}{\partial q^{*}} = \alpha\frac{\partial q^{*}}{\partial q^{\beta^*}}\beta = \alpha S_\beta. \tag{4}$$

For full and detailed derivations and fundamentals of the GHR calculus, one can refer to [31].

2.4 Second-Order Statistics of Quaternion
Recent advances in quaternion statistics revealed that the covariance matrix $C_{xx} = E[xx^H]$ is not adequate to fully capture the second-order statistics of a quaternion vector. Instead, the augmented covariance matrix built on the augmented quaternion vector can completely characterize the second-order statistics of $x$:

$$C_{x^a x^a} = E[x^a x^{aH}] = \begin{bmatrix} C_{xx} & C_{x\imath} & C_{x\jmath} & C_{x\kappa} \\ C_{x\imath}^H & C_{x^\imath x^\imath} & C_{x^\imath x^\jmath} & C_{x^\imath x^\kappa} \\ C_{x\jmath}^H & C_{x^\jmath x^\imath} & C_{x^\jmath x^\jmath} & C_{x^\jmath x^\kappa} \\ C_{x\kappa}^H & C_{x^\kappa x^\imath} & C_{x^\kappa x^\jmath} & C_{x^\kappa x^\kappa} \end{bmatrix},$$

where $C_{x^a x^a}$ comprises the covariance matrix $C_{xx}$, the three pseudo-covariance matrices $C_{x\imath} = E[xx^{\imath H}]$, $C_{x\jmath} = E[xx^{\jmath H}]$ and $C_{x\kappa} = E[xx^{\kappa H}]$, as well as their involutions and Hermitian transposes. If the quaternion vector $x$ is Q-proper, the pseudo-covariance matrices become 0.
3 Proposed IQ-ELM and IAQ-ELM

3.1 IQ-ELM

With the above GHR, the IQ-ELM with optimal output weights for the additive neurons is first given, as shown in Fig. 1. The model of IQ-ELM with the n-th additive hidden neuron can be expressed as

$$f_n(q_j) = \sum_{i=1}^{n} \beta_i \sigma(w_i q_j + b_i), \tag{5}$$
where $\beta_i, b_i \in \mathbb{H}$, $q_j$ and $w_i \in \mathbb{H}^m$, and $j = 1, \ldots, N$ denotes the sample index. The input quaternion vector $q_j$ of the network offers an effective and compact representation. As shown in [6], the split-type activation function $\sigma(\cdot)$ used in quaternion neural networks satisfies the universal approximation capability property. Particularly, $\sigma(\cdot)$ is defined as

$$\sigma(q) = \sigma \circ R(q) + \imath\,\sigma \circ I(q) + \jmath\,\sigma \circ J(q) + \kappa\,\sigma \circ K(q) \tag{6}$$
where ◦ is the composition operator, and R(q) = a, I(q) = b, J(q) = c, K(q) = d, implying that the output of the split type activation function is a composition of the outputs of the activation function on the four independent components of the quaternion variable.
Fig. 1. Quaternion neural network with incremental node
The quaternion neural network model (5) can be further rewritten as $f_n(q_j) = f_{n-1}(q_j) + \beta_n \sigma_n(q_j)$, where $\sigma_n(q_j) = \sigma(w_n \cdot q_j + b_n)$ is the n-th additive neuron, $w_n$ and $b_n$ are the randomly generated quaternion input weight vector and bias, and $\beta_n$ is the output weight of the n-th neuron, to be optimized. Assume $f(q): \mathbb{H}^m \to \mathbb{H}$ is the target function to be approximated. With the n-th added neuron, the objective becomes finding the optimal $\beta_n$ that minimizes the residual error between the target $f$ and the model (5). That is,

$$\min_{\beta_n} \|e_n\|^2 \quad \text{s.t.} \quad e_n = f - f_n \tag{7}$$
where $\|e_n\|^2 = e_n^H e_n$ with $e_n = f(q) - f_n(q)$, which can be further expressed as $e_n = f - (f_{n-1} + \beta_n\sigma_n)$. By incrementally constructing the network, the objective (7) can be expressed in terms of the residual error sequences as

$$\|e_n\|^2 = \|e_{n-1} - \beta_n\sigma_n\|^2, \tag{8}$$
where $e_{n-1}$ is the residual error of the previous $n-1$ nodes. With randomly generated hidden node parameters that are fixed during the learning stage, (7) becomes a convex optimization problem whose objective function is defined in the quaternion domain. With the GHR and the Lemmas presented in Sect. 2.3, we have

Proposition 1. Given any split-type continuous quaternion function or any split-type bounded quaternion piecewise continuous function $\sigma: \mathbb{H} \to \mathbb{H}$ with randomly generated hidden parameters, for any target quaternion continuous function $f$ and the n-th added quaternion hidden neuron, $\lim_{n\to\infty} \|f - f_n\| = 0$ holds with probability one if

$$\beta_n = \frac{e_{n-1}\sigma_n^{*}}{\|\sigma_n\|^{2}}, \tag{9}$$

where $\sigma_n^*$ is the conjugate of the n-th additive neuron. The detailed proof of (9) is given in Appendix A.1. The optimal output weight can be readily derived by letting $\frac{\partial\|e_n\|^2}{\partial\beta_n^*} = 0$, because the derivatives are conjugated when $\|e_n\|^2$ is real-valued [22]. The above derivation can be extended to the multiple-training-samples case. For a training dataset with N distinct samples $\{(q_j, t_j) \mid q_j \in \mathbb{H}^m, t_j \in \mathbb{H}, j = 1, 2, \ldots, N\}$, the error vector of all samples for the network with the n-th added neuron can be expressed as $e_n = [e_n(q_1), \ldots, e_n(q_N)]^T$ with $e_n(q_j) = t_j - f_n(q_j) = e_{n-1}(q_j) - \beta_n\sigma_n(q_j)$, $j = 1, 2, \ldots, N$. As in Proposition 1, the derivative of the objective $\|e_n\|^2$ can be calculated as
$$\frac{\partial\|e_n\|^{2}}{\partial\beta_n} = \frac{\partial(e_n^{H}e_n)}{\partial\beta_n} = \frac{\partial\left(\sum_{j=1}^{N}e_n^{*}(q_j)\,e_n(q_j)\right)}{\partial\beta_n} = -\frac{1}{2}\sum_{j=1}^{N}e_{n-1}(q_j)\,\sigma_n^{*}(q_j) + \frac{1}{2}\beta_n\sum_{j=1}^{N}\sigma_n(q_j)\,\sigma_n^{*}(q_j)$$

The output weight $\beta_n$ can thus be optimized by

$$\beta_n = \frac{\sum_{j=1}^{N}e_{n-1}(q_j)\,\sigma_n^{*}(q_j)}{\sum_{j=1}^{N}\sigma_n(q_j)\,\sigma_n^{*}(q_j)} = \frac{e_{n-1}^{T}h^{H}}{h^{H}h} \tag{10}$$
181
where h = [σn (q1 ), ..., σn (qN )] is the hidden node output vector of the newly added n-th neuron, and en−1 = [en−1 (q1 ), ... , en−1 (qN )]T is the residual error vector of the network with the existing n − 1 hidden nodes. With the optimal output weight (10), it is readily to have that ||en−1 ||2 ≥ ||en ||2 , indicating that the resultant neural network by adding the additive quaternion node with the associated output weight obtained by (10) is convergent. A brief summary on the proposed IQ-ELM is given in Algorithm 1.
Algorithm 1: IQ-ELM
Algorithm 1: IQ-ELM

Input: a training dataset q ∈ H^{m×N}, t ∈ H, the split-type quaternion activation function σ(·), the maximum number of hidden nodes L_max, and the tolerance ε.
Let L = 0 and set e = t, where t = [t_1, ..., t_N]^T;
while (L < L_max) and (‖e‖ > ε) do
  1. Increase the hidden node count: L = L + 1;
  2. Randomly assign (w_L, b_L) for the new neuron based on a pseudo-uniform distribution;
  3. Calculate the new hidden node output h_L = σ(w_L q + b_L) and the output weight β_L by (10);
  4. Update the residual error e = e − β_L h_L.
end
Output: input weights W = [w_1, ..., w_L]^T, hidden node biases b = [b_1, ..., b_L]^T, and output weights β = [β_1, ..., β_L]^T.
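A minimal NumPy sketch of Algorithm 1 is given below for scalar quaternion targets, with quaternions stored as [..., 4] arrays; the uniform initialization range and data layout are assumptions, and this is an illustrative sketch rather than the authors' implementation.

```python
# Sketch of IQ-ELM: incremental hidden nodes with the closed-form output
# weight of eq. (10). qmul/qconj are a vectorized Hamilton product and
# conjugate so the sketch is self-contained.
import numpy as np

def qmul(p, q):
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([a1*a2 - b1*b2 - c1*c2 - d1*d2,
                     a1*b2 + b1*a2 + c1*d2 - d1*c2,
                     a1*c2 - b1*d2 + c1*a2 + d1*b2,
                     a1*d2 + b1*c2 - c1*b2 + d1*a2], axis=-1)

def qconj(q):
    return q * np.array([1., -1., -1., -1.])

def split_sigmoid(q):
    return 1.0 / (1.0 + np.exp(-q))   # split type: applied per component

def iq_elm(Q, T, L_max=150, seed=0):
    """Q: (N, m, 4) quaternion inputs; T: (N, 4) quaternion targets."""
    rng = np.random.default_rng(seed)
    N, m, _ = Q.shape
    e = T.copy()                       # residual, initialised to the targets
    params = []
    for _ in range(L_max):
        w = rng.uniform(-1, 1, (m, 4))            # random quaternion weights
        b = rng.uniform(-1, 1, 4)                 # random quaternion bias
        z = qmul(w[None, :, :], Q).sum(axis=1) + b  # w q + b per sample
        h = split_sigmoid(z)                      # (N, 4) hidden outputs
        # eq. (10): beta = sum_j e_{n-1}(q_j) sigma_n*(q_j) / sum_j |sigma_n(q_j)|^2
        beta = qmul(e, qconj(h)).sum(axis=0) / (h * h).sum()
        e = e - qmul(beta, h)                     # update residual
        params.append((w, b, beta))
    return params
```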
3.2 IAQ-ELM

To fully exploit the second-order statistics of quaternion variables for both second-order circular and noncircular signals, the augmented quaternion statistic comprising the three quaternion involutions has been developed in [20,24]. In this section, we extend IQ-ELM to a network with the augmented quaternion vector input $q^A$, built on the basis $\{q, q^\imath, q^\jmath, q^\kappa\}$. $q^A$ can be expressed using the real-valued quadrivariate vectors of the four counterparts $\{q_a, q_b, q_c, q_d\}$ by the following transformation:

$$q^A = \begin{bmatrix} q \\ q^\imath \\ q^\jmath \\ q^\kappa \end{bmatrix} = \begin{bmatrix} I & \imath I & \jmath I & \kappa I \\ I & \imath I & -\jmath I & -\kappa I \\ I & -\imath I & \jmath I & -\kappa I \\ I & -\imath I & -\jmath I & \kappa I \end{bmatrix} \begin{bmatrix} q_a \\ q_b \\ q_c \\ q_d \end{bmatrix} \tag{11}$$

where $I$ is the real-valued identity matrix. In the proposed IAQ-ELM, the input weight vector of the added hidden neuron is also extended to the associated augmented quaternion vector $w^A$ comprising $w, w^\imath, w^\jmath, w^\kappa$. The augmented input weight is generated by randomly assigning the four counterparts $\{w_a, w_b, w_c, w_d\}$ of its real-valued quadrivariate vectors independently, similarly as in (11). The network model of IAQ-ELM
with the n-th added hidden neuron can be expressed as

$$f_n(q) = \sum_{i=1}^{n} \beta_i \sigma(w_i^A q^A + b_i), \tag{12}$$
where $q \in \mathbb{H}^{m\times N}$ denotes the input quaternion vector. To optimize the output weight of the n-th additive hidden neuron, Proposition 1 and (10) can be directly applied to IAQ-ELM, leading to the optimal output weight

$$\beta_n = \frac{e_{n-1}\,\sigma^H(w_n^A q^A + b_n)}{\sigma^H(w_n^A q^A + b_n)\,\sigma(w_n^A q^A + b_n)}, \tag{13}$$
where $w_n$ and $b_n$ are the quaternion input weight vector and hidden node bias of the n-th node, respectively, and $e_{n-1}$ is the residual error of the previous $n-1$ nodes. The proposed IAQ-ELM is summarized in Algorithm 2.

Algorithm 2: IAQ-ELM

Input: a training dataset q ∈ H^{m×N}, t ∈ H, the split-type quaternion activation function σ(·), the maximum number of hidden nodes L_max, and the tolerance ε.
Let L = 0 and set e = t, where t = [t_1, ..., t_N]^T;
Extend the input q to the augmented structure q_j^A ∈ H^{4m×N} by (11), j = 1, ..., N;
while (L < L_max) and (‖e‖ > ε) do
  1. Increase the hidden node count: L = L + 1;
  2. Randomly generate the input weight w_L ∈ H^{1×m} and hidden bias b_L ∈ H based on a pseudo-uniform distribution, and extend them to the augmented structure w_L^A ∈ H^{1×4m};
  3. Calculate the hidden layer output σ(w_L^A q^A + b_L) and the residual error e;
  4. Solve the optimal output weight by (13);
  5. Update the residual error e = e − β_L σ(w_L^A q^A + b_L).
end
Output: the augmented input weights W^A = [w_1^A, ..., w_L^A]^T, hidden node biases b = [b_1, ..., b_L]^T, and output weights β = [β_1, ..., β_L]^T.
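The augmentation in (11) reduces, componentwise, to sign flips on the imaginary parts, as the short illustrative sketch below shows (same [..., 4] layout as before).

```python
# Sketch of the augmented input construction of eq. (11): each quaternion
# sample is stacked with its three involutions, so an m-dimensional input
# becomes 4m-dimensional.
import numpy as np

def augment(Q):
    """Q: (N, m, 4) -> (N, 4m, 4) augmented quaternion vectors."""
    signs = np.array([[1,  1,  1,  1],    # q
                      [1,  1, -1, -1],    # q^i : flips j and k parts
                      [1, -1,  1, -1],    # q^j : flips i and k parts
                      [1, -1, -1,  1]])   # q^k : flips i and j parts
    return np.concatenate([Q * s for s in signs], axis=1)
```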
4 Experiments
Experiments on the 3D/4D chaotic system prediction, aircraft trajectory tracking, color face and image recognition are provided in this section to show the effectiveness of the proposed IQ-ELM/IAQ-ELM. All these experiments involve the data processing with 3D or 4D signals, where extracting the cross channel correlation information is crucial for model regression and data classification. Instead of using the concatenated long vector representation, quaternion is considered the most suitable for the 3D/4D vector signal representation.
To demonstrate the superiority of using quaternions, the real-domain I-ELM [13], which directly models the concatenated vector of all channels, is adopted for performance comparison. For the regression experiments, the quaternion root mean squared error (RMSE) is applied as the evaluation metric for IQ-ELM/IAQ-ELM:

$$\text{RMSE}_q = \sqrt{\frac{1}{4N}\sum_{i=1}^{N}(t_i - y_i)^{*}(t_i - y_i)}$$

where $N$ is the total number of samples, and $t_i$ and $y_i$ are the target and the network output, respectively. For I-ELM, the real-domain RMSE is employed for the performance evaluation. For the classification experiments, the maximum number of hidden nodes is set to $L_{max} = 150$ for IQ-ELM/IAQ-ELM, while for I-ELM it is set to 200. The accuracy tolerance is set to $\varepsilon = 0$ to ensure that the nodes are increased one by one up to $L_{max}$. The comparisons are therefore fair in that all algorithms have a similar total number of model parameters (TNP). Since the complexity of the quaternion and scalar variable based neural networks cannot be simply determined by the number of hidden nodes, the TNP, which counts the number of randomly fixed and trainable parameters in the network, has been introduced for complexity characterization [19]. For I-ELM and IQ-ELM/IAQ-ELM, the TNPs are respectively defined as

$$\text{TNP}_{\text{I-ELM}} = l(d + m + 1), \quad \text{TNP}_{\text{IQ-ELM/IAQ-ELM}} = 4l(d + m + 1), \tag{14}$$
where $d$, $l$ and $m$ represent the numbers of neurons in the input layer, the hidden layer, and the output layer, respectively. For the chaotic system prediction and aircraft trajectory tracking, the maximum number of hidden nodes is chosen as $L_{max} = 50$ and $L_{max} = 67$ for IQ-ELM/IAQ-ELM and I-ELM, respectively, and the accuracy tolerance is still $\varepsilon = 0$. For each experiment, the average classification accuracy/RMSE and standard deviation (std) over 5 independent trials are obtained for the performance comparisons. Besides the accuracy, the computational complexity is also compared in terms of the model training time. The quaternion least mean squares algorithm (QLMS) [2] and the quaternion neural network trained by the backpropagation algorithm (QBP) [32] are also compared. For QLMS, unlike the original version presented in [2], the GHR operator provided in [22] is employed to replace the original HR operator in the optimization. The gradient descent algorithm is applied to QLMS and QBP; the network structure and the number of hidden layer nodes (L = 150) are the same as in IQ-ELM, and the learning rate is fixed to 0.0001. All experiments are carried out on a PC with an Intel Core i7-4710MQ 2.50 GHz CPU and 8 GB RAM.
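The two evaluation quantities are simple to compute; the sketch below assumes quaternion targets and outputs stored as (N, 4) arrays and is illustrative only.

```python
# Quaternion RMSE and the TNP of eq. (14).
import numpy as np

def rmse_q(t, y):
    """t, y: (N, 4) arrays; (t - y)* (t - y) equals |t - y|^2 per sample."""
    d = t - y
    return np.sqrt((d * d).sum() / (4 * len(t)))

def tnp(d, l, m, quaternion=False):
    """Total number of model parameters: l(d + m + 1), times 4 for quaternions."""
    return (4 if quaternion else 1) * l * (d + m + 1)
```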
4.1 Chaotic System
Chaotic systems are very sensitive to initial conditions, which makes chaotic system estimation a popular research topic [2,3]. In this section, the
prediction of four representative 3D and two 4D chaotic systems is studied, where the fourth-order Runge-Kutta method is applied to generate the chaotic time series. For each sample, a pure quaternion is used to represent the three channel signals with its three imaginary components. The one-step-ahead prediction model is tested, and 300 samples are generated for each chaotic system. For each system, the first 100 samples are used for training while the rest are applied for testing. The step size of the samples used for building the prediction model is set to 4.

1. 3D Lorenz sequence: The chaotic Lorenz sequence is generated by a system of three ordinary differential equations, also known as the Lorenz equations:

$$\dot{x} = \alpha(y - x), \quad \dot{y} = x(\rho - z) - y, \quad \dot{z} = xy - \beta z$$

The equations describe the rate of change of three quantities with respect to time, and they characterize the properties of a 2D fluid layer uniformly warmed from below and cooled from above. The system exhibits chaotic behavior when α = 10, ρ = 28 and β = 8/3. In this experiment, the initial states are set to x(0) = 5, y(0) = 5, z(0) = 20.

2. 3D Rössler system: The Rössler system [33] is a system of three nonlinear ordinary differential equations that exhibits chaotic dynamics associated with the fractal properties of the Rössler attractor:

$$\dot{x} = -y - z, \quad \dot{y} = x + ay, \quad \dot{z} = b + z(x - c)$$

In this experiment, we study the chaotic system prediction by setting the parameters as $[a, b, c]^T = [0.2, 28/13, 7]^T$.

3. 3D Chua's circuit: Chua's circuit was invented by L. Chua [34]. It is a simple electronic circuit that exhibits classic chaotic behavior, meaning that the system produces an oscillating waveform that, unlike an ordinary electronic oscillator, never "repeats":

$$\dot{x} = \alpha[y - f(x)], \quad \dot{y} = x - y + z, \quad \dot{z} = -\beta y$$

where $f(x) = m_1 x + 0.5(m_0 - m_1)(|x + 1| - |x - 1|)$, α = 7, β = 10, m_0 = −0.2, m_1 = 0.4.

4. 3D Chen's equation: Chen's equation [35] is expressed as

$$\dot{x} = a(y - x), \quad \dot{y} = (c - a)x - xz + cy, \quad \dot{z} = xy - bz$$

where a, b, and c are positive real numbers satisfying c < a < 2c. When a = 35, b = 3, c = 28, the system has a unique chaotic attractor.

5. 4D Saito's circuit system: The noncircular Saito's circuit chaotic system is expressed as [3]

$$\begin{bmatrix} \dot{x}_1 \\ \dot{y}_1 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -\alpha_1 & -\alpha_1\beta_1 \end{bmatrix} \begin{bmatrix} x_1 - \eta\rho_1 h(z) \\ y_1 - \eta\frac{\rho_1}{\beta_1} h(z) \end{bmatrix}, \quad \begin{bmatrix} \dot{x}_2 \\ \dot{y}_2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ -\alpha_2 & -\alpha_2\beta_2 \end{bmatrix} \begin{bmatrix} x_2 - \eta\rho_2 h(z) \\ y_2 - \eta\frac{\rho_2}{\beta_2} h(z) \end{bmatrix}$$
where the normalized hysteresis $h(z)$ and the variables $z$, $\rho_1$ and $\rho_2$ are given as

$$h(z) = \begin{cases} 1, & z \geq -1 \\ -1, & z \leq 1 \end{cases}, \quad z = x_1 + x_2, \quad \rho_1 = \frac{\beta_1}{1 - \beta_1}, \quad \rho_2 = \frac{\beta_2}{1 - \beta_2}$$

Normally, the parameters used in the Saito chaotic system are η = 1.3, α_1 = 7.5, α_2 = 15, β_1 = 0.16 and β_2 = 0.097.

6. 4D Lü system: The Lü chaotic system has been widely used in chaotic control and synchronization [36], and is expressed as

$$\dot{x}_1 = a(x_2 - x_1) + x_4, \quad \dot{x}_2 = -x_1 x_3 + c x_2, \quad \dot{x}_3 = x_1 x_2 - b x_3, \quad \dot{x}_4 = x_1 x_3 + r x_4$$

where a, b, c and r are parameters; when a = 36, b = 3, c = 20, and −0.35 ≤ r ≤ 1.3, the system has a hyperchaotic attractor. In this experiment, we choose r = 1.3 and set the initial state to $[x_1(0), x_2(0), x_3(0), x_4(0)]^T = [3, 4, 7, 6]^T$.
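A sketch of the series-generation step for the Lorenz system is shown below; the integration step dt is an assumption, since it is not stated in the text, and the other systems are handled the same way with their own right-hand sides.

```python
# Fourth-order Runge-Kutta integration of the Lorenz system with the
# stated parameters and initial state.
import numpy as np

def lorenz(s, alpha=10.0, rho=28.0, beta=8.0/3.0):
    x, y, z = s
    return np.array([alpha * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, s, dt):
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

s = np.array([5.0, 5.0, 20.0])      # initial state from the text
series = []
for _ in range(300):                # 300 samples as in the paper
    s = rk4(lorenz, s, dt=0.01)     # dt is an assumption
    series.append(s)
series = np.array(series)           # each row maps to one pure quaternion
```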
The average RMSE presented in Table 1 shows that the proposed quaternion and augmented quaternion representation based random neural networks (IQ-ELM and IAQ-ELM) are more efficient than QLMS, QBP, and I-ELM with the signals concatenated across channels. In particular, IQ-ELM and IAQ-ELM achieve the lowest RMSE on all chaotic systems.

Table 1. RMSE on 3D/4D chaotic systems (RMSE ± std; training time in seconds)

Dataset         | QBP                     | QLMS                    | I-ELM                   | IQ-ELM                  | IAQ-ELM
Lorenz sequence | 0.3242 ± 0.0010, 8.0965 | 0.0696 ± 0.0170, 3.8532 | 0.0775 ± 0.0050, 0.4992 | 0.0668 ± 0.0030, 1.1388 | 0.0632 ± 0.0055, 1.1388
Rössler system  | 0.3164 ± 0.0022, 1.5444 | 0.0535 ± 0.0180, 2.2932 | 0.0827 ± 0.0182, 0.1404 | 0.0523 ± 0.0148, 0.8112 | 0.0480 ± 0.0038, 0.4524
Chua's circuit  | 0.6060 ± 0.0025, 1.6536 | 0.0692 ± 0.0040, 1.1700 | 0.0779 ± 0.0104, 0.0780 | 0.0613 ± 0.0025, 0.5148 | 0.0813 ± 0.0060, 0.8268
Chen's equation | 0.2711 ± 0.0048, 1.6848 | 0.0279 ± 0.0334, 1.2168 | 0.0378 ± 0.0115, 0.0624 | 0.0272 ± 0.0064, 0.5148 | 0.0363 ± 0.0066, 0.4524
Saito's circuit | 0.1029 ± 0.0070, 1.6068 | 0.0374 ± 0.0500, 0.6240 | 0.0343 ± 0.0105, 0.1872 | 0.0272 ± 0.0044, 1.0452 | 0.0269 ± 0.0084, 0.6396
Lü system       | 0.4305 ± 0.0090, 1.7160 | 0.3242 ± 0.0640, 0.7020 | 0.1656 ± 0.0667, 0.0936 | 0.0876 ± 0.0192, 0.4680 | 0.0663 ± 0.0170, 0.5304
4.2 Aircraft Trajectory Tracking
Aircraft trajectory tracking (ATT) is simulated using a linear system through the position coordinates (x, y, z) in three directions and their associated speeds with respect to time $(\dot{x}, \dot{y}, \dot{z})$ [4]. The motion state of an aircraft can be expressed as

$$X(k) = [x, \dot{x}, y, \dot{y}, z, \dot{z}]^T$$

The linear system of an aircraft trajectory is given as

$$X(k+1) = F(k)X(k)$$

In general, the measurement is simulated as

$$Z(k) = H(k)X(k) + \sqrt{200}\,G(k)$$

where $F(k)$ and $H(k)$ are the state transition and observation matrices, respectively, with

$$F(k) = \begin{bmatrix} 1 & T & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & T & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & T \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad H(k) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$$

$G(k) \in \mathbb{R}^{3\times 1}$ denotes the measurement noise, obeying a Gaussian distribution, and $T$ is the sampling time.
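A minimal sketch of this simulation is given below; the initial state is an assumption, since it is not specified in the text.

```python
# Constant-velocity state transition F, position-only observation H, and
# Gaussian measurement noise scaled by sqrt(200), with T = 1 s and 300
# samples as in the text.
import numpy as np

T = 1.0
F = np.kron(np.eye(3), np.array([[1.0, T], [0.0, 1.0]]))  # 6x6 block diagonal
H = np.zeros((3, 6)); H[0, 0] = H[1, 2] = H[2, 4] = 1.0    # pick x, y, z

rng = np.random.default_rng(0)
X = np.array([0., 1., 0., 1., 0., 1.])  # initial [x, xdot, y, ydot, z, zdot] (assumed)
Z = []
for _ in range(300):
    X = F @ X
    Z.append(H @ X + np.sqrt(200) * rng.standard_normal(3))
Z = np.array(Z)                          # measurements to be tracked
```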
Table 2. RMSE on aircraft trajectory tracking

Dataset | Algorithm | RMSE   | std    | Training time (s)
ATT     | QBP       | 0.3014 | 0.0021 | 2.0748
ATT     | QLMS      | 0.1512 | 0.0237 | 0.6708
ATT     | I-ELM     | 0.2690 | 0.0200 | 0.1716
ATT     | IQ-ELM    | 0.2272 | 0.0060 | 0.8424
ATT     | IAQ-ELM   | 0.2106 | 0.0128 | 0.4992
Fig. 2. The iterative error on aircraft tracking
Let T = 1 s in this experiment; 300 samples are generated for performance testing. The average RMSE presented in Table 2 further validates the superiority of the proposed IAQ-ELM/IQ-ELM. Figure 2 shows the comparison of the iterative training RMSE on the aircraft trajectory tracking task. The proposed IAQ-ELM performs consistently better than IQ-ELM and I-ELM.

4.3 Face Recognition and Image Classification
Two benchmark color face datasets, Faces96 and Grimace [37], are tested in this section. The two datasets include 3,040 images from 152 individuals and 360 images from 18 individuals, respectively. For each image sample, the pure quaternion with its three imaginary parts is adopted as the perfect representation of the R, G, B colors of each pixel. For each dataset, it has been randomly divided into training and testing datasets with the ratio of 1:1 from each subject. Besides, the BSDS300 dataset including 300 grayscale and color segmentation images is also tested. The dataset is divided into the training and testing datasets with the ratio of 2:1 in this experiment. Particularly, we choose the data of two categories, animals and scenes, for performance evaluation, where the color image with size of 481 × 321 × 3 is represented using the pure quaternion matrix with the size of 481 × 321. The accuracy shown in Table 3 demonstrates the superiority of the proposed IQ-ELM/IAQ-ELM over the original I-ELM. Apparently, using the concatenated R, G, B vector in I-ELM breaks the structure and correlation information among different channels. It is also noted that the augmented quaternion vector outperforms the basic quaternion representation, as highlighted using the bold font. Further, the iterative classification accuracy on the Grimace dataset depicted in Fig. 3 shows that IAQ-ELM/IQ-ELM consistently outperforms I-ELM.
Fig. 3. The iterative classification accuracy on Grimace.
For all experiments, the network training times are also listed for comparison. The average training times shown in Tables 1, 2 and 3 reveal that for all chaotic system predictions and aircraft trajectory tracking, IQ-ELM/IAQ-ELM generally learns faster than QLMS and QBP, but slower than I-ELM. On the one hand, IQ-ELM/IAQ-ELM inherits the fast training advantage of ELM; on the other hand, the extension to quaternions slightly increases the computational burden in model training. For the face and image recognition tasks, IQ-ELM/IAQ-ELM spends a comparable training time to QLMS and QBP.

Table 3. Accuracy on face recognition and image classification datasets (%) (accuracy ± std; training time in seconds)

Dataset  | I-ELM                  | IQ-ELM                 | IAQ-ELM
Face96   | 71.60 ± 1.40, 106.7203 | 92.76 ± 0.68, 825.1985 | 92.99 ± 0.54, 2197.6998
Grimace  | 96.11 ± 1.51, 17.0041  | 98.89 ± 0.42, 78.9989  | 99.44 ± 0.47, 361.3295
BSDS300  | 66.52 ± 8.94, 16.9105  | 71.74 ± 5.37, 124.8320 | 72.73 ± 0.03, 540.4343

5 Conclusions
We developed an incremental learning scheme for the quaternion RNN by proposing two novel algorithms, IQ-ELM and IAQ-ELM. The optimization of the output weight is established based on the GHR of quaternion functions. The proposed IQ-ELM/IAQ-ELM algorithms achieve better regression and classification performance than the real-valued I-ELM. Compared with QBP and QLMS, the proposed IQ-ELM/IAQ-ELM obtain better performance with a lower computational complexity in model training. In the future, research will focus on enhancing the efficiency of the quaternion-based algorithms and exploring the superiority of quaternions in other RNN algorithms. The current preliminary idea is to develop an efficient quaternion matrix decomposition algorithm to make the derivation of the pseudo-inverse of a high-dimensional quaternion matrix more feasible and efficient.

Acknowledgment. This work was supported by the National Natural Science Foundation of China (U1909209), the National Key Research and Development Program of China (2021YFE0100100, 2021YFE0205400) and the Open Research Projects of Zhejiang Lab (2021MC0AB04).
A Appendices

A.1 Proof of Proposition 1
With the well-established GHR of quaternion variable functions and the Lemmas presented in Sect. 2.3, the optimization of (7) can be solved by finding the gradient of the objective with respect to $\beta_n$. It is observed that the norm of the residual error $\|e_n\|^2$ can be equivalently calculated as $e_n e_n^*$ or $e_n^* e_n$, where $e_n^*$ is the conjugate of $e_n$. With the left and right derivatives and the product rule of the quaternion derivative, one readily finds that the same result is obtained on either $e_n e_n^*$ or $e_n^* e_n$. We take the derivative of $e_n^* e_n$ as an example in this paper. With Lemma 3, it can be expressed as

$$\frac{\partial\|e_n\|^{2}}{\partial\beta_n} = e_n^{*}\frac{\partial e_n}{\partial\beta_n} + \frac{\partial e_n^{*}}{\partial\beta_n^{e_n}}e_n = e_n^{*}\frac{\partial(e_{n-1} - \beta_n\sigma_n)}{\partial\beta_n} + \frac{\partial(e_{n-1} - \beta_n\sigma_n)^{*}}{\partial\beta_n^{e_n}}e_n$$

Since $e_{n-1}$ and $\beta_n$ are not related, $\frac{\partial e_{n-1}}{\partial\beta_n} = 0$. With (4), we have

$$\begin{aligned}
\frac{\partial\|e_n\|^{2}}{\partial\beta_n} &= e_n^{*}\frac{\partial(e_{n-1} - \beta_n\sigma_n)}{\partial\beta_n} + \frac{\partial(-\sigma_n^{*}\beta_n^{*})}{\partial\beta_n^{e_n}}e_n \\
&= -e_n^{*}S(\sigma_n) + \frac{1}{2}\sigma_n^{*}e_n^{*} \\
&= -e_n^{*}S(\sigma_n) + \frac{1}{2}S(\sigma_n)e_n^{*} - \frac{1}{2}V(\sigma_n)e_n^{*} \\
&= -\frac{1}{2}\sigma_n\left(e_{n-1}^{*} - \sigma_n^{*}\beta_n^{*}\right).
\end{aligned} \tag{15}$$

Here, $e_{n-1}$ denotes the residual error of the previous $n-1$ nodes, $\sigma_n$ represents the n-th hidden node output, and $S(\sigma_n)$ and $V(\sigma_n)$ are the real (scalar) and imaginary parts of $\sigma_n$, respectively. When $\frac{\partial\|e_n\|^{2}}{\partial\beta_n} = 0$, the objective (7) reaches its minimum, so we have

$$-\frac{1}{2}\sigma_n\left(e_{n-1}^{*} - \sigma_n^{*}\beta_n^{*}\right) = 0 \;\Longrightarrow\; \sigma_n e_{n-1}^{*} = \sigma_n\sigma_n^{*}\beta_n^{*} \;\Longrightarrow\; \beta_n^{*} = \frac{\sigma_n e_{n-1}^{*}}{\|\sigma_n\|^{2}}$$

If and only if $\beta_n = \frac{e_{n-1}\sigma_n^{*}}{\|\sigma_n\|^{2}}$ does the objective (7) reach its minimum. That finishes the proof.
References

1. Javidi, S., Took, C.C., Mandic, D.P.: Fast independent component analysis algorithm for quaternion valued signals. IEEE Trans. Neural Netw. 22(12), 1967–1978 (2011)
2. Took, C.C., Mandic, D.P.: The quaternion LMS algorithm for adaptive filtering of hypercomplex processes. IEEE Trans. Signal Process. 57(4), 1316–1327 (2009)
3. Ujang, B.C., Took, C.C., Mandic, D.P.: Quaternion-valued nonlinear adaptive filtering. IEEE Trans. Neural Netw. 22(8), 1193–1206 (2011)
4. Jahanchahi, C., Mandic, D.P.: A class of quaternion Kalman filters. IEEE Trans. Signal Process. 58(7), 3895–3901 (2010)
5. Tobar, F.A., Mandic, D.P.: Quaternion reproducing kernel Hilbert spaces: existence and uniqueness conditions. IEEE Trans. Inf. Theory 60(9), 5736–5749 (2014)
6. Arena, P., Fortuna, L., Re, R., et al.: Multilayer perceptions to approximate quaternion valued function. Int. J. Neural Syst. 6(4), 435–446 (1995)
7. Greenblatt, A.B., Agaian, S.S.: Introducing quaternion multi-valued neural networks with numerical examples. Inf. Sci. 423, 326–342 (2017)
8. Gaudet, C., Maida, A.: Deep quaternion networks. CoRR, arxiv.org/abs/1712.04604 (2018)
9. Zhu, X., Xu, Y., Xu, H., Chen, C.: Quaternion convolutional neural networks. CoRR, arxiv.org/abs/1903.00658 (2019)
10. Schmidt, W., Kraaijveld, M., Duin, R.: Feedforward neural networks with random weights. In: Proceedings of 11th IAPR International Conference on Pattern Recognition, vol. II, pp. 1–4 (1992)
11. Pao, Y., Park, G., Sobajic, D.: Learning and generalization characteristics of random vector functional-link net. Neurocomputing 6, 163–180 (1994)
12. Zhang, Y., Wu, J., Cai, Z., Philip, B., Yu, S.: An unsupervised parameter learning model for RVFL neural network. Neural Netw. 112, 85–97 (2019)
13. Huang, G., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17(4), 879–892 (2006)
14. Cao, J., Zhang, K., Yong, H., Lai, X., Chen, B., Lin, Z.: Extreme learning machine with affine transformation inputs in an activation function. IEEE Trans. Neural Netw. Learn. Syst. 30(7), 2093–2107 (2019)
15. Yang, L., Song, S., Li, S., Chen, Y., Huang, G.: Graph embedding-based dimension reduction with extreme learning machine. IEEE Trans. Syst. Man Cybern. Syst. (2019). https://doi.org/10.1109/TSMC.2019.2931003
16. Chen, H., Wang, T., Cao, J., Vidal, P.-P., Yang, Y.: Dynamic quaternion extreme learning machine. IEEE Trans. Circ. Syst. II Express Briefs 68(8), 3012–3016 (2021)
17. Cao, J., Dai, H., Lei, B., Yin, C., Zeng, H., Kummert, A.: Maximum correntropy criterion-based hierarchical one-class classification. IEEE Trans. Neural Netw. Learn. Syst. 32(8), 3748–3754 (2021)
18. Lai, X., Cao, J., Lin, Z.: An accelerated maximally split ADMM for a class of generalized ridge regression. IEEE Trans. Neural Netw. Learn. Syst. (2021). https://doi.org/10.1109/TNNLS.2021.3104840
19. Minemoto, T., Isokawa, T., Nishimura, H., Siew, C.K.: Feed forward neural network with random quaternionic neurons. Signal Process. 136, 59–68 (2017)
20. Zhang, H., Wang, Y., Xu, D., Wang, J., Xu, L.: The augmented complex-valued extreme learning machine. Neurocomputing 311(15), 363–372 (2018)
21. Xu, D., Xia, Y., Mandic, D.P.: Optimization in quaternion dynamic systems: gradient, Hessian, and learning algorithms. IEEE Trans. Neural Netw. Learn. Syst. 27(2), 249–261 (2016)
22. Xu, D., Mandic, D.P.: The theory of quaternion matrix derivatives. IEEE Trans. Signal Process. 63(6), 1543–1556 (2015)
23. Mandic, D.P., Jahanchahi, C., Took, C.C.: A quaternion gradient operator and its applications. IEEE Signal Process. Lett. 18(1), 47–50 (2011)
24. Took, C.C., Mandic, D.P.: Augmented second-order statistics of quaternion random signals. Signal Process. 91(2), 214–224 (2011)
25. Via, J., Ramirez, D., Santamaria, I.: Properness and widely linear processing of quaternion random vectors. IEEE Trans. Inf. Theory 56(7), 3502–3515 (2010)
26. Xiang, M., Kanna, S., Mandic, D.P.: Performance analysis of quaternion-valued adaptive filters in nonstationary environments. IEEE Trans. Signal Process. 66(6), 1566–1579 (2018)
27. Zhang, H., Lv, H.: Augmented quaternion extreme learning machine. IEEE Access 7, 90842–90850 (2019)
28. Huang, G., Li, M., Chen, L., Siew, C.K.: Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing 71(4–6), 576–583 (2008)
29. Sudbery, A.: Quaternionic analysis. Math. Proc. Camb. Philos. Soc. 85(2), 199–225 (1979)
30. Ell, T.A., Sangwine, S.J.: Quaternion involutions and anti-involutions. Comput. Math. Appl. 53(1), 137–143 (2007)
31. Xu, D., Jahanchahi, C., Took, C.C., Mandic, D.P.: Enabling quaternion derivatives: the generalized HR calculus. Roy. Soc. Open Sci. 2(8), 1–24 (2015)
32. Rumelhart, D.E., McClelland, J.L.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, p. 567. MIT Press (1986)
33. Rössler, O.E.: An equation for continuous chaos. Phys. Lett. A 57(5), 397–398 (1976)
34. Matsumoto, T.: A chaotic attractor from Chua's circuit. IEEE Trans. Circuits Syst. 31(12), 1055–1058 (1984)
35. Chen, G., Ueta, T.: Yet another chaotic attractor. Int. J. Bifurc. Chaos 9(7), 1465–1466 (1999)
36. Chen, A., Lu, J., Lü, J., Yu, S.: Generating hyperchaotic Lü attractor via state feedback control. Stat. Mech. Appl. 364, 103–110 (2006)
37. Spacek, L.: Description of the collection of facial images (2011). http://cswww.essex.ac.uk/mv/allfaces/index.html
Application
Question Answering on Agricultural Knowledge Graph Based on Multi-label Text Classification

Pengxuan Zhu1,2, Yuan Yuan1, Lei Chen1(B), and Huarui Wu3

1 Institute of Intelligent Machines, HFIPS, Chinese Academy of Sciences, Hefei 230031, China
[email protected]
2 University of Science and Technology of China, Hefei 230026, China
3 Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
Abstract. Traditional search engines retrieve relevant web pages based on keywords in the entered questions, while sometimes the required information may not be included in these keyword-based retrieved web pages. Compared to search engines, a question answering system can provide more accurate answers. However, traditional question answering systems can only provide answers to users by matching the input question against questions in question answering pairs, and the number of such pairs remains somewhat limited. As a result, the user's requirements cannot be met well. In contrast, knowledge graphs can store information such as entities and their relationships in a structured pattern. Therefore, the knowledge graph is highly scalable, as the data is stored in a structured form. Besides, the relationships between entities and the knowledge graph structure allow the desired answer to be found quickly. Moreover, the process of relation classification can be regarded as an operation of text classification. Therefore, this study proposes a new approach to knowledge graph-based question answering systems that requires a named entity recognition method and a multi-label text classification method to search for the answers. The recognized entity name and question type are turned into a Cypher query that searches for the answer in the knowledge graph. In this paper, three models, i.e., TextCNN, bi-LSTM, and bi-LSTM + Att, are used to examine the effectiveness of multi-label text classification, demonstrating our method's feasibility. Among these three models, TextCNN worked best, attaining an F1 score of 0.88. Keywords: Agricultural information presentation and metrics · Knowledge graph · Question answering system · Multi-label text classification
1 Introduction

In recent years, several information technologies have been employed in daily life and production. In addition to industrial applications, these advanced information technologies are also applied in some agricultural fields to help solve problems encountered in the production process [1, 2]. The field of agriculture is an open environment with multi-source and heterogeneous information [3–7]. Therefore, having access to valid agricultural information is critical to improving production efficiency.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 195–208, 2023. https://doi.org/10.1007/978-981-99-0617-8_14
At the same time, with the continuous development of big agricultural data, people have higher requirements for precision and intelligence in information retrieval. Traditional search engines use document sorting and keyword matching as their leading technologies [8]. In addition, various technical difficulties prevent them from meeting current requirements and have made the shortcomings of the existing methods increasingly apparent [9]. In particular, current methods cannot understand the semantic requirements behind a user's search through keyword matching, so the search results are not sufficiently accurate. Meanwhile, traditional search engines return results as long lists of alternative answers, making it difficult for users to accurately and quickly locate the required information and potentially requiring a second search. In 2011, Etzioni indicated that the question answering system is the basic form of the next-generation search engine [10], pointing to future research directions for accurate and intelligent information retrieval.

The question answering system supports questions in natural human language sentences and returns the exact answer the user needs. However, there are two main problems with a traditional question answering system [11]. First, it is designed primarily for daily problems and cannot suggest practical answers to specialist agricultural questions. Second, traditional question answering systems require many question and answer pairs to match the answers: by computing the similarity between the input question and the questions in the question answering pairs, the answer corresponding to the most similar question is selected and returned. In addition, these pairs are usually compiled by experts in their respective fields, requiring significant time and labor costs and limiting the number of question answering pairs.

To solve the above problems, the concept of a knowledge graph was proposed by Google in 2012, and a new information retrieval technology was created by its emergence [12]. The basic unit of a knowledge graph is a triple, <entity1, relation, entity2>, where entity1 and entity2 exist in the real world, and the relation describes the relationship between these two entities. As an advanced technology, knowledge graphs support many industry-specific information applications. In the medical field, medical knowledge graphs are utilized for drug analysis and disease diagnosis [13, 14]. They can also be applied in the financial field to avoid financial fraud (https://www.markrweber.com/graph-deep-learning). Furthermore, knowledge graphs also play a crucial role in other fields such as recommendation systems and information retrieval [15].

Various question answering systems based on knowledge graphs or knowledge bases have been proposed in recent years. Bao et al. [16] proposed a new method to solve multi-constraint questions based on a basic query graph, constraint detection, constraint binding, search space generation, and feature ranking. Huang et al. [17] proposed to answer a question by recovering its head, predicate, and tail entity representations in the knowledge graph embedding space. Zhang et al. [18] proposed a new model architecture and an end-to-end variational learning algorithm to address noise and multi-hop reasoning. The core of a question answering system is locating accurate answers quickly. However, typical question answering systems are only applied to answer common questions in daily life.
Therefore, creating a specific question answering system in the agricultural field is necessary to help farmers or farm managers improve their production efficiency. Cao et al. [19] used document topics to generalize the search terms, while Kawamura et al. [20]
proposed to link documents in an ontology by applying a knowledge base. Gaikwad et al. [21] employed TF-IDF and its variants with other scores to select candidate documents.

In this study, we collected public data from the Internet and stored it in a Neo4j database after structuring it. Under this design, the range of entities that the question answering system can answer about is the set of entity nodes stored in the current database. Therefore, the system only needs to determine whether the question contains entities within the scope of the database, which significantly reduces the cost of entity matching. Besides, according to the structure of the knowledge graph, entity2 can be found through entity1 and the relation in a triple, and a question answering system can be built on this basis. To identify the entities and their corresponding relation for a given question, a named entity recognition method and a text classification method must be used. This strategy avoids using an external parser and allows better searching of the data in the knowledge graph to answer questions.

Text classification is a classic information processing problem in natural language processing. Traditional text classification is mainly based on expert rules and feature engineering [22, 23]. However, traditional feature-based text classification does not always perform as expected and requires much time to create features manually. In recent years, deep learning models have provided remarkable results in computer vision [24]; they can automatically learn data features and achieve better results on different tasks. Owing to the good generalization ability of deep learning models, they can be trained on corpora or datasets in different fields to obtain better results. In 2014, CNN and pre-trained word vectors [25] were used for text classification tasks, and the model achieved excellent performance by using neural networks to extract more efficient text features automatically. Then, a text classification method based on a recurrent structure to capture contextual information [26] was proposed; when learning word representations, the recurrent structure can capture text information and reduce noise better than traditional window-based neural networks. In 2016, bi-LSTM based on the attention mechanism [27] was used for relation classification tasks and achieved good performance. Then, in 2017, the Google team [28] proposed the Transformer network structure based on the attention mechanism, proving that the effect of neural networks can be improved by relying on the attention mechanism.

In a triple structure like <entity1, relation, entity2>, entity2 can be found through entity1 and the relation between the entities. First, entity1 can be identified using the named entity recognition method with the stored knowledge graph entities. After that, the process of identifying relation types can be regarded as text classification. In addition, entity1 can be linked to entity2 through different relationships, so this process can also be considered multi-label text classification. In this study, a multi-label text classification method was used to identify the relation in the knowledge graph. Because the three models TextCNN, bi-LSTM, and bi-LSTM + Att are efficient and relatively simple, they can be applied for multi-label text classification experiments on agriculture-related question texts. According to the experimental results, these models work well for multi-label classification tasks.
Finally, the above results can be converted into a Cypher query for searching answers in the Neo4j database; a minimal sketch of this lookup step follows the contribution list below. The contributions of this study are as follows:
(i) We constructed a knowledge graph representation of unstructured public agricultural data from the Internet.
(ii) Two different methods were proposed for creating agriculture-related multi-label text classification datasets.
(iii) A question answering system was constructed based on multi-label text classification, the agricultural knowledge graph, and named entity recognition.
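To make the pipeline concrete, below is a minimal Python sketch of the answer-lookup step, using the official `neo4j` driver. The label-to-relation mapping `RELATION_OF_LABEL`, the entity dictionary, the connection settings, and the node property names are all illustrative placeholders rather than the implementation actually used in this study.

```python
from neo4j import GraphDatabase

# Hypothetical mapping from predicted classification labels to relationship
# types stored in the knowledge graph (illustrative entries only).
RELATION_OF_LABEL = {3: "fruit_origin", 7: "prevention_method"}

# Entity names saved while building the knowledge graph (illustrative subset).
ENTITY_DICT = {"watermelon", "apple", "rice"}

def find_entity(question: str):
    """Return the longest dictionary entity contained in the question text."""
    hits = [e for e in ENTITY_DICT if e in question]
    return max(hits, key=len) if hits else None

def answer(question: str, predicted_labels):
    entity = find_entity(question)
    if entity is None:
        return []
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    answers = []
    with driver.session() as session:
        for label in predicted_labels:
            rel = RELATION_OF_LABEL.get(label)
            if rel is None:
                continue
            # Relationship types cannot be query parameters in Cypher, so the
            # type is interpolated from the controlled mapping defined above.
            query = (f"MATCH (e1 {{name: $name}})-[:{rel}]->(e2) "
                     f"RETURN e2.description AS answer")
            answers += [r["answer"] for r in session.run(query, name=entity)]
    driver.close()
    return answers
```

The dictionary lookup reflects the design choice described above: because all answerable entities are already stored in the database, entity recognition reduces to a containment check rather than an open-vocabulary parser.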
2 Materials and Methods

From a macro perspective, this system consists of the question answering system and the agricultural knowledge graph. In this study, creating the agricultural knowledge graph relies on structured data, while the core of the question answering system relies on the established knowledge graph, named entity recognition, and multi-label text classification. The framework of the whole system is shown in Fig. 1.
Fig. 1. The complete framework of this system.
2.1 Materials

There is very little agricultural encyclopedia data on the Internet, and even the available data lack corresponding structured data. After continuous searching and screening, two public websites (http://shuju.aweb.com.cn/breed/breed-1-1.shtml and http://tupu.zgny.com.cn/) containing appropriate agricultural information were finally selected as data resources. Different sites have different data formats and attributes, so different agricultural data types were selected from these websites according to the completeness of the data and the data requirements. Ultimately, 11 types of data were selected as data resources: aquatic products, Chinese herbal medicine, commercial crops, crop diseases and insect pests, edible mushrooms, fruits, pesticides, food crops, flowers and trees, veterinary medicine, and vegetables. Different types of data have different attributes. For example, the crop diseases and insect pests type includes symptoms, pathogens, damage characteristics, morphological characteristics, prevention methods, and hosts, while some other data types do not include these attributes.
A web crawler was used to collect these public data from the Internet, and the unstructured raw data were stored as text files according to their types and titles. For a given case, the attributes may include the place of origin, cultivation methods, reproduction methods, and habits. However, these attributes and their descriptions are contained in long paragraphs of typically unstructured text, so structured processing with some manually written rules is required to extract attributes by attribute name. Due to the diversity of Chinese descriptions, the prompt words of the attributes themselves are also diverse, and the attribute descriptions contain special symbols. Therefore, our data preprocessing strategy consists of two steps, as shown in Fig. 2:

(i) Removing the special symbols in the text.
(ii) Extracting the corresponding information based on the prompt words in the text.
Fig. 2. The data preprocessing process.
A random sample of the data can be used to check which types of special symbols appear in the text, and appropriate rules can then be written to remove them. For the prompt words of the attributes themselves, a dictionary of prompt words for the various attributes was compiled, and each attribute was extracted separately from the long text. Moreover, the triple is the basic unit of the knowledge graph, and the standard triple format is <entity1, relation, entity2>. In the agricultural knowledge graph built by this system, entity1 is defined as the specific name of one of the above 11 types of data, the relation defines the attribute, and entity2 is the corresponding description of entity1 under that relation (attribute). For instance, if entity1 is watermelon and entity2 is the place where the watermelon is produced, the relation between entity1 and entity2 is the fruit origin. Therefore, according to the structure of the triple data, each series of preprocessed data is stored as a node in a JSON (JavaScript Object Notation) format file in terms of its name, type, data attributes, and other information. Through these processing steps, the data is standardized and structured; in total, more than 8,400 structured agricultural records of all types were obtained. The agricultural knowledge graph can then be created from these structured data in the following two steps (a sketch follows the list):

(i) Creating nodes and their attributes.
(ii) Creating the relationships between nodes.
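As a hedged illustration of these two steps, the following sketch loads the JSON records and writes nodes and relations with the `neo4j` Python driver. The record schema, node labels, and connection settings are assumptions for demonstration, not the authors' exact code.

```python
import json
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def load_records(path):
    # Each record is assumed to look like
    # {"name": "watermelon", "type": "fruit",
    #  "attributes": {"fruit_origin": "...", "features": "..."}}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_graph(records):
    with driver.session() as session:
        for rec in records:
            # Step (i): create the entity node with its attributes.
            session.run("MERGE (e:Entity {name: $name}) SET e.type = $type",
                        name=rec["name"], type=rec["type"])
            # Step (ii): create one relation per attribute, pointing at a
            # description node that plays the role of entity2 in the triple.
            for rel, desc in rec["attributes"].items():
                session.run(
                    # Relationship types cannot be parameterized, so `rel` is
                    # interpolated; it must come from a controlled vocabulary.
                    f"MATCH (e1:Entity {{name: $name}}) "
                    f"MERGE (e2:Description {{description: $desc}}) "
                    f"MERGE (e1)-[:{rel}]->(e2)",
                    name=rec["name"], desc=desc)

build_graph(load_records("agri_entities.json"))
```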
Fig. 3. A part of the completed agricultural knowledge graph. The apple is a subclass of fruit. The red parts are the types of apples, and the orange parts are the different types of fruit, such as banana, grape, and peach. (Since the data source is Chinese, the data stored in the knowledge graph is likewise in Chinese; therefore, the part of the agricultural knowledge graph shown in Fig. 3 can only be represented in Chinese.)
Figure 3 shows part of the agricultural knowledge graph: the apples under the fruit type, including different apple varieties.
Fig. 4. The process of generating the dataset. This image contains four expressions of the same question in the Chinese context, and the translation of the question and question label is given in the picture. (Our agricultural question answering system is a Chinese question answering system, so the examples show some of the ways of asking about an attribute in Chinese. All four methods in the figure ask the same question, as illustrated by the Question in Fig. 4. Pinyin annotations are provided for all Chinese questions.)
Another part of completing the question answering task is multi-label text classification. Usually, deep learning models require a considerable amount of training data for optimization. However, there is no ready-made agricultural dataset for training a multi-label text classification model. With this in mind, a series of various agricultural named entities, such as fruits, mushrooms, and flowers, were saved while creating the agricultural knowledge graph. As mentioned above, different types of entities have different attributes, but all of them are questioned in similar ways in Chinese expressions, and a question list created for questioning these attributes helps solve this problem. As shown in Fig. 4, taking watermelon as the example entity and the watermelon origin as the example question type, multiple expressions of the question can be obtained by combining entities with different questioning styles. At the same time, complex question expressions were considered, such as asking for two attributes of a single entity in one question: combinations and permutations were used to choose two different attributes from the whole attribute set of each entity type, together with its questioning methods, to obtain a multi-label question type. In the datasets, some different types of data share the same attributes, and 47 attributes remain after de-duplication. A list of different questioning methods was designed for the diverse attributes. Based on these attributes, two different methods of multi-label question text combination were proposed.

The first method does not distinguish entity types. In this study, the 11 data types were used; 50% of the entities were extracted to generate the training set and 40% to generate the test set, with no entity appearing in both. Based on the method in Fig. 4, single-label text can be obtained by splicing an entity with the questioning approach of the corresponding attribute. After that, one of the 47 types of labels was first selected at random; to obtain the second label, a random number between 1 and 5 was subtracted from the serial number corresponding to the first label. Each entity was used to generate 50 different multi-label combinations; a sketch of this procedure follows.
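The following is a small sketch of this first generation method, assuming a hypothetical `templates_by_label` dictionary of Chinese question templates with an `{entity}` slot; it mirrors the random first label and the offset-based second label described above.

```python
import random

def make_questions(entity, templates_by_label, n_samples=50, n_labels=47):
    """Generate multi-label question texts for one entity (method 1)."""
    samples = []
    for _ in range(n_samples):
        first = random.randrange(n_labels)
        # Second label: subtract a random offset of 1..5 from the first
        # label's serial number; wrapping modulo n_labels keeps the result
        # in range (boundary handling is an assumption, not stated above).
        second = (first - random.randint(1, 5)) % n_labels
        text = (random.choice(templates_by_label[first]).format(entity=entity)
                + random.choice(templates_by_label[second]).format(entity=entity))
        samples.append((text, sorted({first, second})))
    return samples
```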
[Figure content: attribute boxes for the fruit type, including variety source, units, features, productions, and origin.]
Fig. 5. Schematic diagram of sorting the attributes included in each type (taking the fruit type as an example)
The second method distinguishes entity types. In this question generation method, we expanded the questioning methods of the first method: two different attributes were drawn from the attributes contained in each entity type and arranged to generate a multi-label question. The generation method is shown in Fig. 5. For the questioning methods of the various attributes in each type, these questioning
methods were integrated through connectives, following Chinese questioning habits, to obtain a multi-label text. Since the training set requires more questions than the test set, 10% of the entities were extracted to generate the training set and 30% to generate the test set, maintaining the data balance in each data type. Accordingly, different questioning methods corresponding to different relationships, and thus question labels, can be generated.

2.2 Method

In this study, multi-label text classification and a named entity recognition method were used to create the question answering system. First, while creating the agricultural knowledge graph, the names of all entities were stored in a text file. Therefore, obtaining the named entity of a question only requires checking whether the question text contains entities within this scope. After that, a multi-label text classification method was applied to determine the attributes corresponding to the entities in the question text. The question entity and question type can then be turned into Cypher code to search for the corresponding answers in the agricultural knowledge graph.

As mentioned in Sect. 2.1, the dataset was generated by manual rules, and the agricultural field still lacks appropriate pre-trained Chinese word vectors for the multi-label problem. Thus, character-level text classification was chosen for this task: no pre-trained word vector was used in this experiment, and a 300-dimensional character vector was used for character-level embedding. For all models, the maximum length of the input text was limited to 32, and all characters were represented by 300-dimensional vectors. Let x_i be the i-th word in the sentence. Hence, a sentence consisting of n (n = 32) words is represented as in Eq. (1), where ⊕ is the concatenation operator:

x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n   (1)
Three models, TextCNN, bi-LSTM, and bi-LSTM + Att, were used to solve the multi-label text classification. The CNN [25] model applied in this experiment is a basic CNN that contains only a convolutional layer with a ReLU activation function, a fully connected layer with a sigmoid activation function, and a max-pooling layer. It uses multiple convolution kernels of different sizes to convolve the input text.

The RNN model used in this experiment is bi-LSTM. LSTM units were proposed in [29]. The core idea of LSTM is an adaptive gating mechanism that determines how much of the previous content a unit retains; data features are then extracted from the retained content and the current input. An LSTM [30] includes an input gate, a forget gate, and an output gate. In Eqs. (2) to (6) below, i, f, o, c, and h are, respectively, the input gate, forget gate, output gate, cell vector, and hidden vector; the subscript t indicates the state at time t; σ is the logistic sigmoid function and tanh is the activation function. The detailed calculation process is as follows:

i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)   (2)
f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)   (3)
c_t = i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + W_{cc} c_{t-1} + b_c) + f_t c_{t-1}   (4)
o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)   (5)
h_t = o_t \tanh(c_t)   (6)
where W_{xi}, W_{hi}, and W_{ci} in Eq. (2) are the weight matrices corresponding to the current input x_t, the previous hidden-layer output h_{t-1}, and the previous cell state c_{t-1}, respectively, and b is the bias vector. Each weight matrix in the remaining equations corresponds to the vector behind it, with an analogous meaning. Since bi-LSTM was utilized in this experiment, the output for the i-th word contains both the left and right sequences, as shown in Eq. (7):

h_i = \overrightarrow{h_i} \oplus \overleftarrow{h_i}   (7)

The application of the attention mechanism to neural networks has obtained good results in speech recognition, machine translation, and other fields [31, 32]. At the same time, bi-LSTM + Att has also achieved good results in some classification-related tasks [27]. This study applied the attention mechanism to the multi-label text classification task. Let H (H ∈ R^{d_w × N}) be a matrix consisting of the output vectors [h_1, h_2, ..., h_N] produced by the bi-LSTM layer, where N (N = 32) is the sentence length and d_w (d_w = 300) is the word-vector dimension. The representation s of the input sentence is computed through Eq. (8), where ω is a weight vector optimized during training; the ultimate sentence representation h^* is calculated by Eq. (9). A minimal code sketch of this classifier follows.

s = H \, \mathrm{softmax}(\omega^{T} \tanh(H))^{T}   (8)
h^{*} = \tanh(s)   (9)
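The sketch below shows the bi-LSTM + Att classifier described by Eqs. (7)-(9) in PyTorch. This is purely illustrative: the paper's experiments used a different framework, the multi-label sigmoid head is an assumption consistent with Sect. 2.2, and the hidden size follows d_w = 300 from Eq. (8) rather than Table 4.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Illustrative bi-LSTM + attention model for multi-label questions."""

    def __init__(self, vocab_size, emb_dim=300, hidden=150, n_labels=47):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 2 * hidden = 300 so that each h_i keeps the word dimension d_w.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.omega = nn.Parameter(torch.randn(2 * hidden))   # ω in Eq. (8)
        self.fc = nn.Linear(2 * hidden, n_labels)

    def forward(self, x):                        # x: (batch, 32) char ids
        H, _ = self.lstm(self.embed(x))          # Eq. (7): both directions
        scores = torch.tanh(H) @ self.omega      # ω^T tanh(H), per position
        alpha = torch.softmax(scores, dim=1)     # attention weights
        s = (alpha.unsqueeze(-1) * H).sum(1)     # Eq. (8): weighted sum of H
        h_star = torch.tanh(s)                   # Eq. (9)
        return torch.sigmoid(self.fc(h_star))    # multi-label probabilities
```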
3 Results

The two datasets mentioned in Sect. 2.1 were used to train the multi-label text classification model. The entities used in the training set and the test set were different. A 2080Ti GPU was used to train the models, and Recall, Precision, and F1-score were utilized as evaluation metrics.

3.1 Experiment Setting

Data. In this experiment, only the dataset generated above was used. Both the training and test datasets contain single-label and multi-label text. According to the different generation methods, there are two different types of datasets. The training and test sets are shown in Table 1.
Table 1. Details of the dataset. "No distinction" represents the data obtained by the first method.

| Dataset type                  | Label type     | Data size | Total   |
| Training set (No distinction) | Single label   | 521 151   | 920 640 |
|                               | Multiple label | 399 489   |         |
| Test set (No distinction)     | Single label   | 416 724   | 736 611 |
|                               | Multiple label | 319 887   |         |
| Training set                  | Single label   | 118 077   | 600 711 |
|                               | Multiple label | 482 634   |         |
| Test set                      | Single label   | 54 333    | 346 512 |
|                               | Multiple label | 292 179   |         |
Hyper-Parameter Settings. The Adam optimizer was used in the update procedure. For the dropout operation, hidden unit activations were randomly set to 0 with a probability of 0.5 during training. More detailed parameters of the three models are given in Tables 2, 3 and 4. In addition, training ended early if the loss value on the test set did not continue to decrease; a minimal sketch of this loop follows.
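The following training-loop sketch (PyTorch, for illustration only) matches the settings above: Adam optimization, dropout applied inside the model, and early stopping on the test-set loss. The `patience` value and the binary cross-entropy loss are assumed details not stated in the paper.

```python
import torch

def train(model, train_loader, test_loader, epochs=100, patience=5):
    """Train with Adam and stop early when test loss stops decreasing."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.BCELoss()       # multi-label targets in {0, 1}
    best, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y.float()).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            test_loss = sum(loss_fn(model(x), y.float()).item()
                            for x, y in test_loader)
        if test_loss < best:
            best, stale = test_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break      # end training early, as described above
```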
Table 2. Parameters used in the TextCNN model.

| Filter size | Feature maps | Word dimension | Batch size | Question length | Dropout probability | Learning rate |
| 2, 3, 4, 5  | 128          | 300            | 32         | 32              | 0.5                 | 1 × 10^-3     |
Table 3. Parameters used in the bi-LSTM model.

| Layers | Hidden size | Word dimension | Batch size | Question length | Dropout | Learning rate |
| 2      | 128         | 300            | 32         | 32              | 0.5     | 1 × 10^-1     |
Table 4. Parameters used in the bi-LSTM + Att model.

| Layers | Hidden size 1 | Hidden size 2 | Word dimension | Batch size | Question length | Dropout | Learning rate        |
| 2      | 128           | 64            | 300            | 32         | 32              | 0.5     | 1 × 10^-3, 1 × 10^-2 |
3.2 Experimental Results

TextCNN, bi-LSTM, and bi-LSTM + Att were used to train on the datasets that distinguish entity types; TextCNN and bi-LSTM + Att were used to train on the datasets that do not distinguish entity types. The final results are shown in Table 5.

Table 5. Experimental results based on different models and data generation methods.

| Model                          | Recall      | Precision   | F1-score    |
| TextCNN (No distinction)       | 0.50 ± 0.02 | 0.80 ± 0.03 | 0.59 ± 0.03 |
| bi-LSTM + Att (No distinction) | 0.60 ± 0.04 | 0.61 ± 0.06 | 0.60 ± 0.05 |
| TextCNN                        | 0.80 ± 0.03 | 0.96 ± 0.01 | 0.88 ± 0.02 |
| bi-LSTM                        | 0.56 ± 0.06 | 0.62 ± 0.10 | 0.60 ± 0.07 |
| bi-LSTM + Att                  | 0.86 ± 0.04 | 0.85 ± 0.06 | 0.87 ± 0.03 |
It can be observed that TextCNN and bi-LSTM + Att achieve better classification results on the two datasets; therefore, either can be used as the final text classification model. Moreover, the models train better on the dataset obtained by permuting and combining the attributes of each entity type than on the dataset generated randomly without distinguishing types.
4 Discussion

In this study, we proposed a method for constructing an agricultural question answering system based on a knowledge graph. The architecture of the question answering system relies mainly on a multi-label text classification method using deep learning models. Two different methods were used to generate the datasets for training an agricultural multi-label text classification model. From the experimental results, the training result on the dataset that distinguishes entity types is significantly better than on the dataset that does not. In addition, the dataset that does not distinguish entity types contains significantly more training samples than the dataset that does; however, the dataset that distinguishes entity types contains more multi-label text data, and far fewer single-label samples, than the one that does not.

The maximum text length was set to 32 in the experiments. For single-label text, the shortest training text is only 3 characters long, which may cause data sparsity problems in the representation of the text data; the shortest multi-label training text is 7 characters long. The average length of the multi-label text is approximately 15.3 characters, and that of the single-label text approximately 9.3 characters. From the perspective of feature extraction, multi-label text can better reduce the impact of sparse data. The datasets generated by the above two methods are divided into three intervals, 0–10, 11–20, and 21–32, depending on the length of the question text; the statistical results are shown in Fig. 6.
[Bar charts of the number of questions in each question-text length interval (0–10, 11–21, 21–32), for single- and multi-label questions: (a) distinguishing entity types; (b) not distinguishing entity types.]
Fig. 6. Comparison of text lengths in datasets based on different generation methods.
It can be seen that with the method that distinguishes entity types, the text lengths in the dataset are mainly concentrated in the 11–21 interval, and most of the text is multi-label. In contrast, the text lengths in the dataset generated by the method that does not differentiate entity types are mainly concentrated in the 0–10 interval, and most of these questions are single-label. Therefore, the dataset generated without distinguishing entity types contains more single-label and fewer multi-label data; this severe data sparsity problem leads to poor training results.
5 Conclusion

The contributions of this study include the following aspects:

(i) An agricultural knowledge graph was created based on public data from the Internet.
(ii) Two methods were used for generating agricultural multi-label text classification datasets.
(iii) A question answering system was constructed based on the agricultural knowledge graph.
In creating the agricultural knowledge graph, a large amount of structured data was constructed. These structured data can be used to create knowledge graphs and applied as effective data resources in other agricultural applications. The question answering system based on the agricultural knowledge graph can identify whether the entity in a question is referred to by an alias, so it can answer questions from users who use aliases for agricultural entities. At the same time, considering the good scalability of the knowledge graph, we plan to continue collecting more and better data to further expand the agricultural knowledge graph in future work; this will expand the range of questions the system can answer and improve the accuracy of the answers. Furthermore, we will search for a better way to generate datasets to avoid the data sparsity problem and train the model more effectively, and we will also consider other, better question answering methods for the knowledge graph-based question answering system.

Acknowledgment. The authors would like to thank the anonymous reviewers for their helpful reviews. The work is supported by the National Natural Science Foundation of China (Grant No. 32071901, 31871521) and the database in the National Basic Science Data Center (No. NBSDC-DB20).
References

1. Barbedo, J.: Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification. Comput. Electron. Agric. 153, 46–53 (2018)
2. Zhang, P., Yang, L., Li, D.: EfficientNet-B4-Ranger: a novel method for greenhouse cucumber disease recognition under natural complex environment. Comput. Electron. Agric. 176, 105652 (2020)
3. Jiang, S., Angarita, R., Chiky, R., Cormier, S., Rousseaux, F.: Towards the integration of agricultural data from heterogeneous sources: perspectives for the French agricultural context using semantic technologies. In: Proc. International Conference on Advanced Information Systems Engineering, pp. 89–94 (2020)
4. Coble, K., Mishra, A., Ferrell, S., Griffin, T.: Big data in agriculture: a challenge for the future. Appl. Econ. Perspect. Policy 40(1), 79–96 (2018)
5. Kamilaris, A., Kartakoullis, A., Prenafeta-Boldú, F.: A review on the practice of big data analysis in agriculture. Comput. Electron. Agric. 143, 23–37 (2017)
6. Kasinathan, T., Singaraju, D., Uyyala, S.: Insect classification and detection in field crops using modern machine learning techniques. Inf. Process. Agric. 8(3), 446–457 (2021)
7. Kurmi, Y., Gangwar, S.: A leaf image localization-based algorithm for different crops disease classification. Inf. Process. Agric. (2021). (available online)
8. Voorhees, E.: Natural language processing and information retrieval. Information Extraction, 32–48 (1999)
9. Lemos, J., Joshi, A.: Search engine optimization to enhance user interaction. In: Proc. International Conference on IoT in Social, Mobile, Analytics and Cloud, pp. 398–402 (2017)
10. Etzioni, O.: Search needs a shake-up. Nature 476(7358), 25–26 (2011)
11. Mori, T., Sato, M., Ishioroshi, M.: Answering any class of Japanese non-factoid question by using the web and example Q&A pairs from a social Q&A website. In: Proc. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 59–65 (2008)
12. Mao, X., Li, X.: A survey on question and answering systems. J. Front. Comput. Sci. Technol. 6(3), 193–207 (2012)
13. Gong, F., Wang, M., Wang, H., Wang, S., Liu, M.: SMR: medical knowledge graph embedding for safe medicine recommendation. Big Data Res. 23, 100174 (2021)
14. Li, L., et al.: Real-world data medical knowledge graph: construction and applications. Artif. Intell. Med. 103, 101817 (2020)
15. Huang, G., Yuan, M., Li, C., Wei, Y.: Personalized knowledge recommendation based on knowledge graph in petroleum exploration and development. Int. J. Pattern Recognit. Artif. Intell. 34(10), 2059033 (2020)
16. Bao, J., Duan, N., Yan, Z., Zhou, M., Zhao, T.: Constraint-based question answering with knowledge graph. In: Proc. International Conference on Computational Linguistics: Technical Papers, pp. 2503–2514 (2016)
17. Huang, X., Zhang, J., Li, D., Li, P.: Knowledge graph embedding based question answering. In: Proc. Twelfth ACM International Conference on Web Search and Data Mining, pp. 105–113 (2019)
18. Zhang, Y., Dai, H., Kozareva, Z., Smola, A., Song, L.: Variational reasoning for question answering with knowledge graph. In: Proc. Thirty-Second AAAI Conference on Artificial Intelligence, pp. 6069–6076 (2018)
19. Cao, L., Zhang, X., San, X., Chen, G.: Latent semantic index applied in question-answering system about agriculture technology. In: Proc. Advanced Materials Research, pp. 4785–4788 (2014)
20. Kawamura, T., Ohsuga, A.: Question-answering for agricultural open data. Transactions on Large-Scale Data- and Knowledge-Centered Systems XVI, pp. 15–28 (2014)
21. Gaikwad, S., Asodekar, R., Gadia, S., Attar, V.: AGRI-QAS question-answering system for agriculture domain. In: Proc. International Conference on Advances in Computing, Communications and Informatics, pp. 1474–1478 (2015)
22. Zhou, F., Zhang, F., Yang, B., Yu, X.: Research on short text classification algorithm based on statistics and rules. In: Proc. Third International Symposium on Electronic Commerce and Security, pp. 3–7 (2010)
23. Haralambous, Y., Lenca, P.: Text classification using association rules, dependency pruning and hyperonymization. arXiv preprint arXiv:1407.7357 (2014)
24. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: Proc. Neural Information Processing Systems (2012)
25. Kim, Y.: Convolutional neural networks for sentence classification. In: Proc. Empirical Methods in Natural Language Processing, pp. 1746–1751 (2014)
26. Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: Proc. Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 2267–2273 (2015)
27. Zhou, P., et al.: Attention-based bidirectional long short-term memory networks for relation classification. In: Proc. 54th Annual Meeting of the Association for Computational Linguistics, pp. 207–212 (2016)
28. Vaswani, A., et al.: Attention is all you need. In: Proc. Neural Information Processing Systems (2017)
29. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
30. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
31. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
32. Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based models for speech recognition. In: Proc. Neural Information Processing Systems, pp. 577–585 (2015)
Dairy Cow Individual Identification System Based on Deep Learning Zhijun Li, Huai Zhang, Yufang Chen, Ying Wang, Jiacheng Zhang, Lingfeng Hu, Lichen Shu, and Lei Yang(B) Nanjing Tsingzhan Artificial Intelligence Research Institute Co., Ltd., Nanjing 211100, China [email protected]
Abstract. Personal record keeping, behaviour tracking, accurate feeding, disease prevention and control, and food traceability all require the identification of dairy cows. This work proposes an individual identification method that combines Mask R-CNN and ResNet101 to accurately identify individual cows in milking parlours. A facial image dataset of 265 Holstein cows in various positions was created using the milking hall's webcam. A Mask R-CNN instance segmentation model based on a feature pyramid network was trained to separate the cows' faces from the background. The ResNet101 individual classification network was then trained using the above segmentation results as input and individual cow numbers as output, and the two techniques were combined into the individual cow recognition model. According to the experimental findings, the Mask R-CNN model achieves an average precision of 96.37% on the image test set. The accuracy of the ResNet101-based individual classification network was 99.61% on the training set and 98.75% on the validation set, surpassing that of VGG16, GoogLeNet, ResNet34, and other networks. The individual recognition model suggested in this study achieved a test accuracy of 97.58%, outperforming the combinations of the YOLO series models with ResNet101 as well as the combinations of Mask R-CNN with VGG16, GoogLeNet, and ResNet34. This research offers precision dairy farming an excellent technical basis for individual recognition. Keywords: Deep learning · Computer vision · Individual cow identification · Mask R-CNN · ResNet101
1 Introduction

The basic trend in dairy cow breeding is large-scale, standardized, intelligent, and precise management [1]. Although the scale and standardization of breeding have improved quickly, most dairy farms in China are still in the early stages of intelligent and refined management, and the overall level of automation and informatization remains low [2]. The individual identification of dairy cows is the basis for dairy product traceability, particularly in milking parlours, and it is also the most crucial component of genetic improvement and performance recording systems for dairy cows [3].

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 209–221, 2023. https://doi.org/10.1007/978-981-99-0617-8_15

Radio Frequency Identification (RFID) technology is the most widely used technique
for identifying individual cows, although it has issues, including high input costs and easily damaged equipment [4]. Researchers at home and abroad are interested in biometrics as a fundamental topic in deep learning, and the adoption of deep learning and other computer technologies in animal husbandry has sparked a lot of research [5–7]. Wang Xi [8] proposed an individual cow recognition method based on cow activity data and deep learning, using a bidirectional long short-term memory (Bi-LSTM) model with a self-attention mechanism to capture the dynamic activity of the cows; although there were few cows in the sample, the test-set identification accuracy was 91.8%. Convolutional neural networks were employed by He Dongjian et al. [9] to extract the features of a cow's back and trunk, but they were best suited for identifying cows with prominent body features. Bello et al. [10] proposed stacking a denoising autoencoder and a deep belief network to learn and represent the texture features of the bovine nose, with a recognition accuracy of 98.99%; however, accuracy fell when the bovine nose was soiled. In contrast to other body parts, the cow's face has striking structural and textural characteristics, such as its eyes, nose, and mouth, making it helpful in distinguishing between different cow identities. To assess and compare the effectiveness of several deep learning approaches for detecting cows' faces, Yao Liyao et al. [11] built a dataset of more than 10,000 cows under various conditions; however, that study detected bovine faces without identifying specific individuals. Xing Yongxin et al. [12] modified the SSD algorithm by fusing features of several feature maps to address the difficulty of detecting overlapping objects, achieving an average accuracy of 96.40%.

This study created a face image dataset of 265 Holstein cows in a milking parlour environment and used the Mask R-CNN instance segmentation model and the ResNet101 image classification model to study the effects of various object detection and image classification models on the outcomes. This work aims to develop a model for identifying individual cows: the model predicts the individual cow number and is intended to provide technical guidance for intelligent and precise breeding in a contemporary pasture setting.
2 Image Acquisition of Dairy Cows

The data in this study were collected from Holstein dairy cows at the dairy cow breeding base (37°52′ N, 114°24′ E), No. 178, Guoyuan Road, Yuanshi County, Shijiazhuang City, Hebei Province. Figure 1 depicts the milking process; the daily milking times were 4:00–6:00, 11:00–13:00, and 18:00–20:00. A batch of 16 cows entered the milking area randomly through each of two openings, with 16 pieces of milking equipment at each entrance. When the cows finished milking, they all exited evenly through the exit while the crew operated between the two rows of milking machines.
Fig. 1. Dairy cow milking process
A schematic illustration of the in-place face image acquisition of dairy cows in the milking parlour is depicted in Fig. 2. The image acquisition tool was a Nikon camera with a horizontal field angle of 49.1°, an aperture of F 2.0, a focal length of 6 mm, and a 1920 × 1080 imaging resolution. A bracket held the camera on the wall directly across from the cows, 2.5–3 m from the cows' faces, to record footage.
Fig. 2. Face image acquisition of dairy cows
The videos were recorded at 15 frames per second. Frames that were blurry, heavily occluded, or dimly illuminated were eliminated, yielding a face image dataset of 265 Holstein cows with a total of 5492 images, of which 60% were used for training, 20% for validation, and 20% for testing.
3 Individual Identification Methods of Dairy Cows

3.1 Overall Structure

The proposed individual cow recognition model consists of two components, a face segmentation model and an individual classification model, as shown in Fig. 3. The segmentation model uses Mask R-CNN to segment the input image, producing precise images of the cows' faces from the predicted masks. The individual classification model then takes this output as its input and fits the different cows' IDs using the ResNet101 model. The two models were trained independently. After training, a cow image can be fed into the fused individual recognition model, which outputs the associated individual number.
Fig. 3. Structure diagram of the individual identification model for dairy cows
3.2 Facial Segmentation Model

In cow face recognition, a target detection network can locate the cow face, but it can only obtain an image of the cow face together with its background. A semantic segmentation network can segment the cow face at the pixel level, but it cannot determine the face's position. It is therefore challenging to complete both the detection and segmentation tasks with a single-task network model. When using a combined detection-and-segmentation model, pixels on the edge of the cow face may be misjudged, or the detection box may only partially frame the cow face due to the detection network's low accuracy, leading to incomplete segmentation results [13]. In this study, cow face segmentation was performed using the Mask R-CNN technique to address the above issues.

Mask R-CNN was proposed by He et al. [14]. By adding a Mask branch to the Faster
R-CNN network, semantic segmentation can be accomplished while object detection is performed. Mask R-CNN uses the same two-stage detection method as Faster R-CNN and is primarily composed of three modules: Faster R-CNN, RoIAlign, and fully convolutional networks. In the first stage, a Region Proposal Network generates candidate regions, and feature extraction is accomplished using the ResNet-FPN architecture. In the second stage, the resampling method of Faster R-CNN, RoIPooling, is replaced with the RoIAlign method. For each candidate, Mask R-CNN predicts a segmentation mask, a bounding-box regression, and a category. The loss function L of Mask R-CNN is expressed as:

L = L_{cls} + L_{loc} + L_{mask}   (1)
where L_{cls} is the classification loss, L_{loc} is the bounding-box regression loss, and L_{mask} is the mask loss. To predict cow faces of various sizes, multi-scale face prediction was performed, and the Feature Pyramid Network (FPN) was employed for multi-scale feature learning. FPN uses top-down and bottom-up bidirectional multi-scale prediction, which incorporates features from all levels so that the features simultaneously carry strong semantic and strong spatial information [15]. The schematic diagram of the FPN network structure is shown in Fig. 4.
Fig. 4. Schematic diagram of the Feature Pyramid Network
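For illustration, the hedged sketch below runs a COCO-pre-trained Mask R-CNN to obtain cow-face masks. It uses torchvision's `maskrcnn_resnet50_fpn` as a stand-in for the authors' actual framework and backbone, and the score and mask thresholds are assumptions.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# A Mask R-CNN pre-trained on COCO; in the paper this model would be
# fine-tuned on the annotated cow-face dataset before use.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def segment_cow_faces(image, score_thresh=0.5):
    """image: float tensor (3, H, W) in [0, 1]. Returns binary face masks."""
    out = model([image])[0]
    keep = out["scores"] > score_thresh
    # Soft masks (N, 1, H, W) are thresholded to binary pixel masks.
    return (out["masks"][keep, 0] > 0.5)
```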
3.3 Individual Classification Model

After the Mask R-CNN model segments the exact shape of the cow face, each segmented image must be identified independently. This recognition task can be treated as classification: grouping photographs of the same cow into one class and images of different cattle into different classes. The ResNet algorithm was employed in this work to classify photos of dairy cow faces. He et al. [16] presented ResNet as a solution to the information loss caused by network degradation during information transmission: convolution in deep networks is combined with a short-circuit (shortcut) mechanism through the residual learning unit. With a top-5 error rate of 3.6% on ImageNet, ResNet took first place in the 2015 ImageNet classification challenge. ResNet can be divided into ResNet18, ResNet34, ResNet50, and ResNet101 according to the number of network layers; the individual classification model in this study is ResNet101.
ResNet101’s network structure is depicted in Fig. 5, including the convolutional, pooling, and fully connected layers. The convolution layer is in charge of the input for extracting image features. The pooling layer reduces model-fitting probability after convolutional image feature dimension reduction, and the connection layer is in charge of the classification effect throughout the convolutional neural network. To a certain extent, it can map the distributed characteristics of online learning to sample tag space and maintain model complexity.
Fig. 5. The ResNet 101 network structure diagram
The main structure of ResNet101 consists of basic blocks of different groups, which are shown in Fig. 6.
Fig. 6. The ResNet base block
ResNet introduces a shortcut connection in the basic block to learn the residual function. In mathematical notation, the basic structure of a ResNet basic block is defined by:

y_l = F(x_l, w_l) + h(x_l)   (2)
x_{l+1} = f(y_l)   (3)

where x_l is the input of the l-th residual unit, w_l = \{w_{l,k} \mid 1 \le k \le K\} is the set of weights of the l-th residual unit, F denotes the computation of the residual unit (excluding the ReLU part), h(x_l) = x_l denotes the shortcut path, and f denotes the activation function. To facilitate analysis and simplify the problem, ignore the activation function so that x_{l+1} = f(y_l) = y_l. Then, for an arbitrary layer L:

x_{l+1} = F(x_l, w_l) + x_l   (4)
x_L = \sum_{i=l}^{L-1} F(x_i, w_i) + x_l = \sum_{i=0}^{L-1} F(x_i, w_i) + x_0   (5)

During backpropagation, we have:

\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, w_i) \right)   (6)
From Eq. (6), it is clear that information can pass directly through the residual units during forward propagation, and that the gradient is unlikely to vanish during backpropagation, which provides the theoretical underpinning for the residual network's efficient information extraction. A sketch of such a residual unit follows.
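The sketch below shows a basic residual unit in PyTorch, matching Eqs. (2)-(3) with an identity shortcut h(x) = x. The specific convolution layout inside F is illustrative, not the exact bottleneck block used in ResNet101.

```python
import torch
import torch.nn as nn

class BasicResidualUnit(nn.Module):
    """Residual unit of Eqs. (2)-(3) with an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.F = nn.Sequential(                       # residual function F
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)             # activation f

    def forward(self, x):
        y = self.F(x) + x          # Eq. (2) with h(x) = x
        return self.relu(y)        # Eq. (3): x_{l+1} = f(y_l)
```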
4 Experimental Results

4.1 Experimental Environment

The experiments ran on the Ubuntu 18.04 operating system with an Intel(R) Core(TM) i9-7960X processor at 2.8 GHz, an NVIDIA RTX A6000 GPU with 48 GB of memory, and 128 GB of RAM. PaddlePaddle 2.3.0 was used for model training and testing, with Python 3.7.10 as the programming language and version. A visualization tool was used to monitor each data index in real time.

4.2 Evaluation Indicators

In this study, mean average precision (mAP) and accuracy were used as the evaluation indexes of the segmentation and classification models, respectively. The average precision (AP) of a category is the area enclosed by its precision-recall curve and the coordinate axes; the mAP is obtained by calculating the AP of all categories and taking the mean. The accuracy rate is the ratio of the number of correctly classified samples to the total number of samples. These metrics are given in Eqs. (7) to (11):

P = \frac{TP}{TP + FP}   (7)
R = \frac{TP}{TP + FN}   (8)
AP = \int_{0}^{1} P(R) \, dR   (9)
mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i   (10)
Accuracy = \frac{TP + TN}{TP + FP + TN + FN}   (11)

where TP is the number of correctly predicted positive samples, FN is the number of wrongly predicted negative samples, FP is the number of wrongly predicted positive samples, TN is the number of correctly predicted negative samples, AP_i is the prediction accuracy of class i, and N is the number of categories. A small numerical sketch of these metrics follows.
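The sketch below computes Eqs. (7)-(11) numerically; the trapezoidal approximation of the integral in Eq. (9) is an assumption about how the area under the P-R curve is computed, not the paper's stated procedure.

```python
import numpy as np

def precision_recall_accuracy(tp, fp, tn, fn):
    """Eqs. (7), (8), and (11) for one class."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return p, r, acc

def average_precision(precisions, recalls):
    """Eq. (9), approximated as the area under the P-R curve,
    given precision/recall pairs sorted by increasing recall."""
    return float(np.trapz(precisions, recalls))

def mean_average_precision(aps):
    """Eq. (10): mean of the per-class APs."""
    return sum(aps) / len(aps)
```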
4.3 Performance Analysis of the Segmentation Model

Using a transfer-learning approach, the Mask R-CNN instance segmentation model was fine-tuned and retrained from a model pre-trained on the COCO dataset. Training ran for 3000 rounds in total, with the learning rate decreased in stages at rounds 2000 and 2500, reaching 0.0001. Figure 7 displays the loss value curves and mAP during the training procedure.
Fig. 7. Curves of mAP and loss values of Mask R-CNN during training
After training, the performance of the model was assessed by comparing its predictions with the annotated data; the resulting P-R curve is presented in Fig. 8. According to the figure, the average precision (AP) is 96.37%, and because only a single category was annotated in this study, the average precision AP and the mean average precision mAP are both 96.37%. Moreover, according to the P-R curve, the break-even point is close to the (1.0, 1.0) point, so the Mask R-CNN model fits the sample data well.
Fig. 8. The precision-recall curve
Figure 9 illustrates the segmentation effect on part of the images in the test set. As can be seen, the Mask R-CNN model can accurately discriminate between the cows' faces and the background, producing careful segmentation of the cows' faces. The model's segmentation of the cows' ear tags could be more precise, with a few pixels wrongly classified as background. Nevertheless, the Mask R-CNN model still performs satisfactorily in detecting cow faces, indicating that it is appropriate for segmenting cow faces from their backgrounds and has strong detection performance for cow face features.
Fig. 9. Partial segmentation effect of cow face instance
4.4 Performance Analysis of the Classification Models

We trained the classification network using the single cow-face images output by the Mask R-CNN model from Sect. 4.3 together with their corresponding individual identification numbers. The VGG16, GoogLeNet, ResNet34, and ResNet101 models were used for comparison experiments. One hundred epochs were trained starting from models pre-trained on the ImageNet dataset, and the experimental findings are displayed in Table 1. Under a fixed learning rate of lr = 0.01, comparing the floating-point operations of the different models shows that the VGG16 model has the largest amount of computation, 15.3 GFLOPs, followed by ResNet101, ResNet34, and GoogLeNet. In terms of accuracy, the ResNet models have clear advantages over VGG16 and GoogLeNet, while the ResNet101-based individual cow classification network performed best, with the highest training and validation accuracies of 99.61% and 98.75%, respectively.

Table 1. Performance comparison of different models (best results in bold).

| Model     | Initial learning rate | Floating-point operations | Training accuracy /% | Validation accuracy /% |
| VGG16     | 0.01 | 15.3 × 10^9 | 90.28 | 88.72 |
| GoogLeNet | 0.01 | 1.5 × 10^9  | 92.36 | 90.41 |
| ResNet34  | 0.01 | 3.6 × 10^9  | 95.53 | 94.17 |
| ResNet101 | 0.01 | 7.6 × 10^9  | 99.61 | 98.75 |
Figure 10 compares the training accuracy and loss values of the various models. The ResNet101-based training has the lowest loss value, the fastest convergence speed, the highest accuracy, and the best performance, followed by the ResNet34 and GoogLeNet networks; the VGG16 network, on the other hand, has the largest loss value and the slowest convergence pace. According to the findings above, the ResNet networks are better suited than GoogLeNet and VGG16 for processing the experimental data. Furthermore, when the network structure is deepened, the residual structure can further enhance model accuracy and speed up model convergence.
(a) Training accuracy curves of different models; (b) training loss curves of different models
Fig. 10. Training accuracy and loss curves of different models
4.5 Performance Analysis of the Fusion Model

The test sample data in this study consisted of cow photos and their corresponding numbers from the test set, with accuracy employed as the overall evaluation index of individual identification. To verify the effectiveness of this research, the YOLOv3, YOLOv4, and YOLOX target detection models were chosen as alternative cow-face extraction components, and their fusions with ResNet101 were used as comparison models. Table 2 displays the test results.

Table 2. Test accuracies of the different model fusions.

| Model                   | Test accuracy /% |
| YOLOv3 [17] + ResNet101 | 92.38 |
| YOLOv4 [18] + ResNet101 | 94.35 |
| YOLOX [19] + ResNet101  | 95.27 |
| Mask R-CNN + VGG16      | 87.90 |
| Mask R-CNN + GoogLeNet  | 90.15 |
| Mask R-CNN + ResNet34   | 94.66 |
| Mask R-CNN + ResNet101  | 97.58 |
As seen in the table, the Mask R-CNN + ResNet101 model has the highest accuracy (97.58%). The fusions of YOLOv3, YOLOv4, and YOLOX with ResNet101 all showed lower accuracy on the test set than the Mask R-CNN + ResNet101 model, with a maximum difference of 5.20%. The cause is that a target detection model outputs instance categories and bounding boxes, so the images entered into the classification network are rectangular frames that include background. The instance segmentation model, by contrast, generates the instance's bounding box, category, and mask, so the images
entered into the classification network are cow-face images with the background removed, as illustrated in Fig. 11. Using the instance segmentation model therefore increases classification accuracy by reducing the background's impact on the classification results.
Fig. 11. Cow-face outputs from different models: (a) output of the target detection model; (b) output of the instance segmentation model.
Additionally, the facial recognition model fusing Mask R-CNN with ResNet101 performed better than the fusions of Mask R-CNN with ResNet34, GoogLeNet, and VGG16, in that order. This shows that the residual network can enhance the model's accuracy as its structure is deepened. The test results demonstrate the model's suitability for individual cow identification in a milking parlour.
5 Conclusion

This study proposed a deep learning-based individual recognition model that combines the Mask R-CNN instance segmentation model and the ResNet101 image classification model, trained and tested on the milking-hall dairy cow dataset. The model's goal was to address the individual recognition problem of dairy cows. The conclusions are as follows:
(1) The Mask R-CNN model, which is based on the feature pyramid network, can segment the cow-face and background regions and achieves good recognition accuracy in milking-hall cow image segmentation, with an average accuracy of 96.37%.
(2) The individual cow-face images output by the Mask R-CNN model were used as input, and the individual cow IDs as output, to train the ResNet101 classification model. The accuracy on the training set and validation set is 99.61% and 98.75%, respectively, demonstrating that the ResNet101 model can increase classification accuracy while retaining network depth.
(3) The trained Mask R-CNN model and ResNet101 model were combined to create the individual cow recognition model. With a recognition accuracy of 97.58%, the Mask R-CNN + ResNet101 model outperforms the combinations of the YOLO-series models with ResNet101, and is also superior to the combinations of Mask R-CNN with VGG16, GoogLeNet, and ResNet34. This was determined by comparing and analyzing the outcomes of the various model combinations. The results demonstrate that the individual cow recognition model proposed in this study reliably recognises cow images taken in milking halls.
Although the method in this study improves recognition accuracy, it also increases the computational cost of the network proportionally; future research can reduce the model's computation and develop an efficient, lightweight network structure.
References
1. Matthews, S.G., et al.: Early detection of health and welfare compromises through automated detection of behavioural changes in pigs. Vet. J. 217, 43–51 (2016)
2. He, D., Liu, D., Zhao, K.: Review of perceiving animal information and behavior in precision livestock farming. Trans. Chin. Soc. Agric. Mach. 47(5), 231–244 (2016)
3. Liu, J.: Individual Identification of Dairy Cows Based on Deep Learning. Northwest A & F University (2020)
4. Li, H.: A Study on the Proportion Schema for Dairy Cattle Feeding Based on IOT Techniques. Shanghai Jiao Tong University (2014)
5. Kumar, S., et al.: Real-time recognition of cattle using animal biometrics. J. Real-Time Image Process. 13, 505–526 (2016)
6. Ren, X., et al.: Dairy cattle's behavior recognition method based on support vector machine classification model. Trans. Chin. Soc. Agric. Mach. 50, 290–296 (2019)
7. Xie, Q., et al.: Individual pig face recognition combined with attention mechanism. Trans. Chin. Soc. Agric. Eng. 38(7), 180–188 (2022)
8. Wang, X.: Research on Individual Identification Method of Dairy Cows Based on Activity Data and Deep Learning. Inner Mongolia University (2021)
9. He, D., et al.: Individual identification of dairy cows based on improved YOLO v3. Trans. Chin. Soc. Agric. Mach. 51(4), 250–260 (2020)
10. Bello, R., Talib, A.Z.H., Mohamed, A.S.A.B.: Deep learning-based architectures for recognition of cow using cow nose image pattern. Gazi Univ. J. Sci. 33(3), 831–844 (2020)
11. Yao, L., et al.: Comparison of cow face detection algorithms based on deep network model. J. Jiangsu Univ. Nat. Sci. Ed. 40(2), 197–202 (2019)
12. Xing, Y., et al.: Individual cow recognition based on convolution neural network and transfer learning. Laser Optoelectron. Prog. 58(16), 503–511 (2021)
13. He, R., et al.: Identification and counting of silkworms in factory farm using improved Mask R-CNN model. Smart Agric. 4(2), 163–173 (2022)
14. He, K., et al.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017, pp. 2980–2988. IEEE Press, New York (2017)
15. Lin, T.-Y., et al.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
16. He, K., et al.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–12. IEEE Computer Society (2016)
17. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
18. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. Comput. Vis. Pattern Recognit. 17(9), 198–215 (2020)
19. Ge, Z., et al.: YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Automatic Packaging System Based on Machine Vision

Chunfang Liu(B), Jiali Fang, and Pan Yu

Faculty of Information and Technology, Beijing University of Technology, Beijing, China
{cfliu1985,panyu}@bjut.edu.cn, [email protected]
Abstract. In industrial production lines, putting several small packages into large packages according to required arrangement rules still demands a lot of manual work. To solve this problem, we build an automatic robot grasping system based on a robotic arm, an industrial camera, and a uniform-speed conveyor, which can automatically put small packages into large packages. We use the YOLO algorithm to identify the position, category, and number of small packages. The number of packages identified is used to plan the degree of gripper closure. The trajectory of the small packages can be predicted from the speed of the conveyor and the position of the small packages in the robot coordinate system; we then fuse multiple predicted trajectories to form a complete and coherent robot-arm trajectory to grasp the small packages and put them into large packages. This system can save a large amount of labor and reflects the intelligence of the robot.
Keywords: Object detection · Industrial production · Robot grasping

1 Introduction
In recent years, with the development of robotics, robots have been used in several industrial areas, such as food sorting and automatic packaging. However, in most of these applications, the robot operates on only one object at a time. Many operations still need to be completed manually, and putting multiple small packages into large packages is one of the typical cases (see Fig. 1). Hence, it is valuable and meaningful to research the key technologies that allow robots to replace humans in this tedious labor. It is also challenging, since the multiple small packages move at high speed on the conveyor, which requires (1) accurate and fast visual detection and (2) high robot-arm speed. In this work, we address the problem that current robot grasping technologies only work on a single object, while the tedious operation of putting multiple small packages into a large package still requires much manual work. We design a system in which the robot arm grasps multiple small packages according to the required arrangement rules, suitable for industrial production lines. The
Fig. 1. Industrial packaging line
system consists of a 6-degree-of-freedom robot arm, a uniform-speed conveyor, a Basler industrial camera, and a PC. We use the YOLO algorithm to obtain the position, category, and number of small packages; the dataset used by the algorithm was built by ourselves, and its high identification accuracy verifies its suitability. The number of small packages identified by the YOLO algorithm is used to plan the degree of gripper closure, which is an important part of grasping multiple small packages. The system can predict the trajectory of the small packages online from their position in the robot coordinate system and the speed of the conveyor, and then fuses multiple predicted trajectories into the trajectory of the robot arm to complete the grasp [12]. Combining object detection and trajectory planning, this system solves the problem of grasping the required number of small packages and putting them into a large package on industrial production lines, a task that still requires a lot of manual labor. The main contributions of this paper are as follows: (1) We develop a detection model based on YOLOv5 that detects the category, position, and number of small packages. The results are used to plan the degree of gripper closure and to predict the trajectory of the small packages; the predicted trajectories are then fused into the trajectory of the robot arm so that it can grasp multiple small packages and put them into a large package. (2) An automatic packaging system is built based on the detection model and trajectory planning.
2 Related Work
Robotic grasping has been extensively studied and has evolved significantly over the past 20 years. [8] developed an algorithm that can grasp moving target objects off a conveyor. Later, with the rapid development of robots, [9,10,21] and several other studies achieved accurate grasping of target objects, but these technologies only work on a single object; relatively little research has been done on grasping several target objects on an industrial production line. For object
detection, with the advent of Faster R-CNN [17], the target object could be recognized; later, with the emergence of YOLO [16], the detection of target objects became faster. For path planning, the work in [1] involves selecting a moving object while also avoiding collisions with static obstacles. A framework for solving high-dimensional planning problems was provided in [4]. [7] presented the HBF planning algorithm, which can effectively build long manipulation plans in a cluttered environment. [18] proposed an asymptotically optimal operation planning method. A minimal-time algorithm for intercepting a target on a conveyor by a robot is proposed in [19], which introduces a time function for robot and object arrival. Cowley et al. [5] proposed a method based on ROS, 3D perception, and robotic-arm motion planning, which allows a general-purpose robot to perform pick-and-place operations on a moving work surface. A path planning technique was proposed in [2], and the algorithm was validated on a circular conveyor. A search-based kinematic motion planning algorithm that matches the velocity of an object throughout the grasping motion is presented in [15]. [3,14] used time-optimal algorithms with traditional robots. Su Tingting et al. [20] proposed dynamic object grasping based on the Ferrari method, which is accurate and fast in calculating the grasping position and very easy to use. [6] proposed a new dynamic grasping method for Delta robot grasping control, but the results are not very good and the grasping success rate is low. However, none of the above work deals with the design of an automatic packaging system, or with the real-time tracking and accurate grasping of multiple target objects on a conveyor in an industrial production line.
3 Method
We propose an automatic packaging system that identifies and grasps small packages and puts them into a large package, suitable for industrial production lines. The flowchart of the system is shown in Fig. 2. The next three subsections describe the methods used in the system: object detection, coordinate-system conversion, and robot-arm trajectory planning.

3.1 Object Detection
Object detection algorithms have made great breakthroughs in recent years and can be divided into two categories. One is two-stage, which first uses an RPN and then performs classification and regression on region proposals. The other category is one-stage; YOLO is one of them, and it can accurately obtain the coordinates and class probability of an object from a 2D image. YOLOv5 is an improved algorithm based on YOLOv4; the improvements include mosaic data augmentation on the input side, adaptive anchor-frame calculation, adaptive image scaling, and so on. These improvements greatly improve the speed and accuracy of object detection. The network predicts 4 coordinates for each bounding box: $t_x$, $t_y$, $t_w$, $t_h$. If the cell is offset from
Fig. 2. The flowchart of the developed system
the top left corner of the image by $(c_x, c_y)$ and the bounding box prior has width and height $p_w$, $p_h$, then the predictions correspond to:

$$b_x = 2\sigma(t_x) - 0.5 + c_x \qquad (1)$$
$$b_y = 2\sigma(t_y) - 0.5 + c_y \qquad (2)$$
$$b_w = p_w\,(2\sigma(t_w))^2 \qquad (3)$$
$$b_h = p_h\,(2\sigma(t_h))^2 \qquad (4)$$
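A minimal sketch of this box decoding, directly implementing Eqs. (1)–(4) in NumPy:

```python
# Sketch of the box decoding in Eqs. (1)-(4); sigma is the logistic function,
# (cx, cy) the cell offset, (pw, ph) the anchor-box prior.
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    bx = 2 * sigma(tx) - 0.5 + cx          # Eq. (1)
    by = 2 * sigma(ty) - 0.5 + cy          # Eq. (2)
    bw = pw * (2 * sigma(tw)) ** 2         # Eq. (3)
    bh = ph * (2 * sigma(th)) ** 2         # Eq. (4)
    return bx, by, bw, bh
```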
Considering the real-time requirements of the system, this work adopts the YOLOv5 algorithm to detect the category, position, and number of small packages. The dataset was built by ourselves (see Fig. 3): Fig. 3(a)(b) are pictures of different arrangements in a strong-light environment, while Fig. 3(c)(d) are pictures in a weak-light environment.
Fig. 3. Dataset
The results of detection are shown in Fig. 4; each small package can be accurately identified. We take the average of the small packages' center pixel coordinates as the coordinate of the group of packages. For example, the pixel coordinate of the first small package from left to right is $(x_1, y_1)$, that of the second is $(x_2, y_2)$, and that of the nth is $(x_n, y_n)$. The coordinate to be taken is then:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (5)$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad (6)$$

where n represents the number of small packages recognized.
Fig. 4. Identification results
The gripper used on the robot arm is a two-finger Robotiq gripper. The closure degree of the gripper is adjusted according to the required number of small packages, as shown in Fig. 5. The gripper is commanded by an input value from 0 to 255: fully open at 0 and completely closed at 255. After experiments, the gripper can grasp at most 6 small packages, so the number of small packages placed on the conveyor at a time cannot exceed six. We also found that the number of small packages is linearly related to the input value of the gripper, with the expression: q = α − 34(n − 1)
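A minimal sketch of the grasp-point averaging of Eqs. (5)–(6) and the gripper-closure rule above; `alpha` is the device-specific input for grasping a single package and is treated here as a calibrated constant:

```python
# Sketch of Eqs. (5)-(6) and the linear gripper-closure rule above.
import numpy as np

def grasp_point(centers):
    """centers: list of (x, y) pixel coordinates of the detected packages."""
    pts = np.asarray(centers, dtype=float)
    return pts.mean(axis=0)  # the averaged coordinate of Eqs. (5)-(6)

def gripper_input(n, alpha):
    """Closure command in [0, 255]; n packages, 1 <= n <= 6."""
    assert 1 <= n <= 6, "at most six packages can be grasped at once"
    return alpha - 34 * (n - 1)  # q = alpha - 34(n - 1)
```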
where k > 0 is the elasticity coefficient. The friction between the rigid body and the wall acts in the direction opposite to the relative motion in the normal plane, and its magnitude is proportional to the contact force N:

$$f = \mu N \qquad (14)$$

where $\mu \in (0, 1)$ is the coefficient of kinetic friction. We can then obtain the external force $F_W \in \mathbb{R}^3$ acting on the contact point in the hit coordinate system, which is composed of the contact force N and the friction f. We also assume that the external moment $\tau_W \in \mathbb{R}^3$ acting on the hit point is 0. The external hit force and moment acting on the CoM of the aircraft are then:

$$\begin{bmatrix} F_H \\ \tau_H \end{bmatrix} = \begin{bmatrix} R_H^T & 0 \\ S(d)\,R_H^T & I \end{bmatrix} \begin{bmatrix} F_W \\ \tau_W \end{bmatrix} \qquad (15)$$

where $R_H \in SO(3)$ is the rotation matrix of the aircraft relative to the hit coordinate system, based on the hit normal vector, and I here is the three-dimensional identity matrix. $F_H$ and $\tau_H$ are parts of $f_e$ and $\tau_e$, or $F_e$, in the dynamic equations of the aircraft, Eqs. 3 and 4 or Eq. 7. In the collision process, focusing on the mechanical response: as the deformation increases, the contact force acting on the aircraft increases from 0 to its maximum, and the aircraft is slowed down. The contact force then decreases from the maximum and the aircraft is pushed in the direction opposite to the wall, so the deformation decreases, which results in a smaller contact force until the aircraft is pushed out of contact. The parameters of both contact models can be modified. AeroBotSim also supports custom collision models based on different collision dynamic models through contact APIs.
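A minimal sketch of this contact model, assuming a spring-like normal force N = k·δ (the elastic law implied above), Coulomb friction opposing the tangential velocity, the wrench transform of Eq. (15), and — as an assumption — a wall normal along the z axis of the hit frame:

```python
# Sketch of the contact model above: spring-like normal force (N = k*delta
# assumed), Coulomb friction, and the Eq. (15) wrench transform to the CoM.
import numpy as np

def skew(v):  # S(v): 3-vector -> 3x3 skew-symmetric matrix
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def contact_wrench(k, mu, delta, v_tan, R_H, d):
    """delta: wall deformation (>= 0); v_tan: tangential velocity at the
    contact point; R_H: aircraft rotation w.r.t. the hit frame; d: offset
    from the CoM to the contact point."""
    N = k * delta                                    # contact force magnitude
    n_hat = np.array([0.0, 0.0, 1.0])                # assumed wall normal
    f = np.zeros(3)
    if np.linalg.norm(v_tan) > 1e-9:                 # friction, Eq. (14)
        f = -mu * N * v_tan / np.linalg.norm(v_tan)
    F_W = N * n_hat + f                              # external force; tau_W = 0
    F_H = R_H.T @ F_W                                # Eq. (15), top row
    tau_H = skew(d) @ R_H.T @ F_W                    # Eq. (15), bottom row
    return F_H, tau_H
```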
4 Experiments

4.1 Simulation of Baseline Trajectory Linearization Controller

In this part, we implement a baseline trajectory linearization controller (TLC) in a ROS node to validate the control loop of AeroBotSim. We use the same controller node in simulation and in the real flight experiment. The closed-loop system is shown in Fig. 5.
Fig. 5. Block diagram of the control system.
We adopt a Lyapunov-based control design. To design the control input U in Eq. 7, a non-negative Lyapunov potential is used, whose gradient is given by:

$$\nabla T = \begin{bmatrix} I & 0 \\ -S(d) & I \end{bmatrix} \begin{bmatrix} k_x R_0^T e_x \\ k_r \left( R_d^T R_0 - R_0^T R_d \right)^{\vee} \end{bmatrix} \in \mathbb{R}^6 \qquad (16)$$
where $e_x$ and $(R_d^T R_0 - R_0^T R_d)^{\vee}$ are the position error and attitude error, respectively. $k_x$ and $k_r$ are 3-dimensional positive diagonal parameter matrices for the position error and attitude error. The mapping $(\cdot)^{\vee}: \mathbb{R}^{3\times3} \to \mathbb{R}^3$ changes a 3D skew-symmetric matrix to a 3D vector and is the inverse of $S(\cdot)$. The control input U is then given by

$$U = M\dot{\xi}_d + C\xi_d - k_v e_\xi - \nabla T + G - F_e \qquad (17)$$

where $\xi_d$ is the desired linear- and angular-velocity vector corresponding to $\xi$ in Eq. 7, and $e_\xi = \xi - \xi_d$ with damping gain matrix $k_v$. The result of reproducing a real flight trajectory tracking run is shown in Fig. 7.¹ We recorded command messages and state messages during a real flight with our prototype aerial robot in a rosbag; the real flight experiment is shown in Fig. 6. We then replay the input commands from the rosbag directly into the ROS network and simulate with our simulator. Except for oscillations caused by disturbances in the real outdoor flight, simulation and real flight share the same trends whenever the commands change.
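A minimal sketch of the control law of Eqs. (16)–(17) in NumPy. M, C, G and the external wrench F_e come from the aircraft dynamics of Eq. 7; the exact placement of the blocks in Eq. (16) follows the reading given above and should be treated as an assumption:

```python
# Sketch of Eqs. (16)-(17); `vee` inverts S(.); all matrices are NumPy arrays.
import numpy as np

def vee(A):  # (.)^v : 3x3 skew-symmetric matrix -> 3-vector
    return np.array([A[2, 1], A[0, 2], A[1, 0]])

def control_input(M, C, G, F_e, k_x, k_r, k_v, d,
                  R0, Rd, e_x, xi, xi_d, xi_dot_d):
    S_d = np.array([[0.0, -d[2], d[1]],
                    [d[2], 0.0, -d[0]],
                    [-d[1], d[0], 0.0]])
    # Gradient of the Lyapunov potential, Eq. (16)
    err = np.concatenate([k_x @ (R0.T @ e_x),
                          k_r @ vee(Rd.T @ R0 - R0.T @ Rd)])
    J = np.block([[np.eye(3), np.zeros((3, 3))], [-S_d, np.eye(3)]])
    grad_T = J @ err
    # Control input, Eq. (17)
    return M @ xi_dot_d + C @ xi_d - k_v @ (xi - xi_d) - grad_T + G - F_e
```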
Fig. 6. Real outdoor flight experiment.
1 https://youtu.be/7Ju_dlErOJU.
Fig. 7. Comparison between real flight and simulation when executing the same trajectory commands.
4.2 SLAM Using RGB Images

AeroBotSim can set the locations, FOV angles, resolutions, and number of cameras, and sends image data to the ROS network through a socket. An example of images captured from AeroBotSim is shown in Fig. 8. We can obtain three images at 18 Hz and one image at 20 Hz in the office scene shown in Fig. 8.
Fig. 8. Three images captured simultaneously from child drones looking in different directions. Our test laptop has an i7-11800H @ 2.30 GHz processor and an NVIDIA GeForce RTX 3070 Laptop GPU.
To validate our image API, we subscribe to the simulated image messages and IMU sensor messages published in ROS and run a SLAM algorithm node, VINS-mono [24]. We then obtain the estimated position of a moving aircraft, shown in Fig. 9. The estimated position obtained from VINS-mono is basically consistent with the real trajectory in simulation.
Fig. 9. Comparison between result of VINS-mono and real trajectory.
4.3 Impedance Controller

In this part, we implement an impedance controller [25] for the multi-quadrotor aircraft to verify the contact interfaces. The impedance controller, which acts on the outer loop of the baseline pose controller, accepts the external force and moment as inputs and then revises the trajectory target to adapt to them. The impedance control system is shown in Fig. 10.
Fig. 10. The impedance control system.
The position admittance control law of the robot is given by

$$F_{ext} = M\left(\ddot{P}_r - \ddot{P}_d\right) + D\left(\dot{P}_r - \dot{P}_d\right) + K\left(P_r - P_d\right) \qquad (18)$$
where $F_{ext} \in \mathbb{R}^3$ is the external force acting on the CoM of the aircraft in the world frame, $P_d, P_r \in \mathbb{R}^3$ are the original trajectory target and the revised trajectory target, respectively, and M, D, K are positive-definite diagonal parameter matrices representing the mass, damping, and spring parameters. We test the simulator using a constrained multi-quadrotor aircraft with a rod, as shown in Fig. 4.² The tooltip is at the end of the rod, with a force/torque sensor, and the
Fig. 11. Simulated position and contact force while approaching, moving along, and halting against the wall.
2 https://youtu.be/JhFNrjos9v8.
position of the tooltip in the center-body-fixed frame is $x_t = (0, 0, -1)$. The wall is 3 m behind the starting point of the aircraft. First, we fly the aircraft backwards until it hits the wall, from 5 s to 7 s. Then, after holding still for a while, we move the aircraft along the wall from 17 s to 19 s. The results are shown in Fig. 11. When the tooltip hits the wall, we obtain a simulated contact force proportional to the deformation and a friction force proportional to the contact force. The admittance controller revises the target trajectory to adapt to the external force.
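A minimal sketch of a discrete-time implementation of Eq. (18): the filter integrates the error e = P_r − P_d driven by the measured external force. The explicit-Euler integration and the control period `dt` are implementation assumptions:

```python
# Sketch of a discrete-time admittance filter for Eq. (18); M, D, K are
# positive-definite diagonal 3x3 matrices, dt is the control period.
import numpy as np

class AdmittanceFilter:
    def __init__(self, M, D, K, dt):
        self.M, self.D, self.K, self.dt = M, D, K, dt
        self.e = np.zeros(3)       # Pr - Pd
        self.e_dot = np.zeros(3)   # d(Pr - Pd)/dt

    def update(self, F_ext, P_d):
        # Solve Eq. (18) for the error acceleration, then integrate it.
        e_ddot = np.linalg.solve(
            self.M, F_ext - self.D @ self.e_dot - self.K @ self.e)
        self.e_dot += e_ddot * self.dt
        self.e += self.e_dot * self.dt
        return P_d + self.e        # revised trajectory target Pr
```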
5 Conclusion

AeroBotSim offers high-fidelity scenes with realistic photos and contact simulation. AeroBotSim also supports multi-configuration aircraft, not limited to multirotors. It provides APIs to ROS so that existing and new algorithms can be used and tested, as well as APIs for image capture and contact-behavior simulation, which are necessary for vision-based algorithms and manipulation tasks. AeroBotSim provides a low-cost and safe test solution for complex aircraft and opens up more possibilities in control, vision, operation, etc. Also, due to the modular design, aircraft models are easy to duplicate so that tests can run in parallel in AeroBotSim, which helps in training learning-based algorithms. Besides, based on our modular design, we can integrate specific interfaces to popular physics simulators — such as Gazebo, RaiSim [26], NVIDIA PhysX, and Chaos [27] — to obtain more realistic dynamic responses, even non-rigid-body dynamics. In future work, we will integrate more types of sensors, such as depth cameras and point clouds, into AeroBotSim.
References
1. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191
2. Bagnell, J.A.D.: An Invitation to Imitation. Tech. rep., Carnegie Mellon University (Mar 2015)
3. Kober, J., Peters, J.: Reinforcement learning in robotics: a survey. In: Wiering, M., van Otterlo, M. (eds.) Reinforcement Learning: State-of-the-Art, pp. 579–610. Springer, Berlin, Heidelberg (2012)
4. Hwangbo, J., Lee, J., Hutter, M.: Per-contact iteration method for solving contact dynamics. IEEE Robot. Autom. Lett. 3(2), 895–902 (2018). https://doi.org/10.1109/LRA.2018.2792536
5. Guerra, W., Tal, E., Murali, V., Ryou, G., Karaman, S.: FlightGoggles: photo-realistic sensor simulation for perception-driven robotics using photogrammetry and virtual reality. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6941–6948 (2019)
6. Forster, C., Pizzoli, M., Scaramuzza, D.: SVO: fast semi-direct monocular visual odometry. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 15–22 (2014)
7. Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (2015). https://doi.org/10.1109/TRO.2015.2463671
8. Zhang, W., Ott, L., Tognon, M., Siegwart, R.: Learning variable impedance control for aerial sliding on uneven heterogeneous surfaces by proprioceptive and tactile sensing. arXiv e-prints arXiv:2206.14122 (2022)
9. Ruggiero, F., Lippiello, V., Ollero, A.: Aerial manipulation: a literature review. IEEE Robot. Autom. Lett. 3(3), 1957–1964 (2018). https://doi.org/10.1109/LRA.2018.2808541
10. Unreal Engine: https://www.unrealengine.com/. Last accessed 26 Aug 2022
11. ROS – Robot Operating System: https://www.ros.org. Last accessed 26 Aug 2022
12. Koenig, N., Howard, A.: Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), vol. 3, pp. 2149–2154 (2004)
13. Juliani, A., et al.: Unity: a general platform for intelligent agents. arXiv e-prints arXiv:1809.02627 (2018)
14. Kohlbrecher, S., Meyer, J., Graber, T., Petersen, K., Klingauf, U., von Stryk, O.: Hector open source modules for autonomous mapping and navigation with rescue robots, pp. 624–631. Springer, Berlin, Heidelberg (2014). RoboCup 2013: Robot World Cup XVII
15. Furrer, F., Burri, M., Achtelik, M., Siegwart, R.: RotorS—a modular Gazebo MAV simulator framework. In: Koubaa, A. (ed.) Robot Operating System (ROS): The Complete Reference (Volume 1), pp. 595–625. Springer International Publishing, Cham (2016)
16. Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Hutter, M., Siegwart, R. (eds.) Field and Service Robotics, pp. 621–635. Springer International Publishing, Cham (2018)
17. PhysX | GeForce: https://www.nvidia.cn/geforce/technologies/physx/. Last accessed 26 Aug 2022
18. Song, Y., Naji, S., Kaufmann, E., Loquercio, A., Scaramuzza, D.: Flightmare: a flexible quadrotor simulator (2021)
19. Dai, X., Ke, C., Quan, Q., Cai, K.Y.: RFlySim: automatic test platform for UAV autopilot systems with FPGA-based hardware-in-the-loop simulations. Aerosp. Sci. Technol. 114, 106727 (2021). https://doi.org/10.1016/j.ast.2021.106727
20. Unreal Engine 5 – Unreal Engine. https://www.unrealengine.com/unreal-engine-5. Last accessed 28 Aug 2022
21. Brandt, J., Deters, R., Ananda, G., Selig, M.: UIUC propeller database, University of Illinois at Urbana-Champaign. http://m-selig.ae.illinois.edu/props/propDB.html (2015)
22. Nguyen, H.N., Park, S., Park, J., Lee, D.: A novel robotic platform for aerial manipulation using quadrotors as rotating thrust generators. IEEE Trans. Robot. 34(2), 353–369 (2018). https://doi.org/10.1109/TRO.2018.2791604
23. Zilles, C., Salisbury, J.: A constraint-based god-object method for haptic display. In: Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots, vol. 3, pp. 146–151 (1995). https://doi.org/10.1109/IROS.1995.525876
24. Qin, T., Li, P., Shen, S.: VINS-Mono: a robust and versatile monocular visual-inertial state estimator. IEEE Trans. Robot. 34(4), 1004–1020 (2018). https://doi.org/10.1109/TRO.2018.2853729
25. Hogan, N.: Impedance control: an approach to manipulation. In: 1984 American Control Conference, pp. 304–313 (1984). https://doi.org/10.23919/ACC.1984.4788393
26. Hwangbo, J., Lee, J., Hutter, M.: Per-contact iteration method for solving contact dynamics. IEEE Robot. Autom. Lett. 3(2), 895–902 (2018). www.raisim.com
27. Chaos Physics Overview | Unreal Engine Documentation. https://docs.unrealengine.com/5.0/zh-CN/physics-in-unreal-engine/. Last accessed 26 Aug 2022
Trailer Tag Hitch: An Automatic Reverse Hanging System Using Fiducial Markers

Dongxi Lu1,2, Wei Yuan3(B), Chaochun Lian4, Yongchun Yao4, Yan Cai4, and Ming Yang1,2

1 Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
2 Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
3 University of Michigan - Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
[email protected]
4 SAIC-GM-Wuling Automobile Co., Ltd., Liuzhou, Guangxi 545007, China
Abstract. Unmanned tractor-trailer vehicles are widely used in factory transportation scenarios. However, the trailer hitching process is still manually operated, and automatic trailer hitching is a precondition of fully unmanned logistics. Existing auto-hitching methods directly detect trailer or coupler features, making them hard to generalize and deploy to various types of trailers and couplers. To address this problem, this paper proposes a trailer hitch system using fiducial markers. The system is divided into two modules: hitch coupler pose estimation and visual servoing. An algorithm based on AprilTag detection is used for hitch coupler pose estimation; the pose messages guide the tractor as it reverses to the trailer. An algorithm based on decoupled lateral-longitudinal control is used for visual servoing. The proposed system is evaluated on 4 different tractor-trailer vehicles with 254 tests under various conditions, which vary widely in initial position and orientation, light illumination, indoor/outdoor scenario, and fiducial marker damage. An overall success rate of 95% shows that the proposed system is robust to environment variation and different kinds of trailers.

Keywords: Trailer hitching system · Hitch pose estimation · Fiducial marker

1 Introduction
Tractor-trailer vehicles are widely used in transportation scenarios. The front tractor has an active steering wheel; the rear trailer has no active power and is towed by the tractor. The tractor and trailer are connected by the tractor tow pole and the trailer hitch coupler, see Fig. 1. Tractor-trailer vehicles are flexible, efficient, and easy to expand, so they have been widely used in scenarios such as factory logistics and airport baggage transportation. The
research of unmanned tractor-trailer vehicles is therefore important to promote the application of automatic vehicles. Although unmanned tractor-trailer vehicles have been widely used, the hitching process between the tractor tow pole and the trailer hitch coupler is still manually operated. An automatic trailer hitching system helps to ensure safety, improve transportation efficiency, and reduce labor costs.
Fig. 1. Tractor trailer vehicle.
This paper focuses on the automatic reverse hanging problem, or trailer hitching problem: the front tractor reverses back to the stationary trailer to dock the tractor tow pole to the trailer hitch coupler. There are two main aspects of a trailer hitching system: perception and control. The purpose of perception is to detect the accurate pose of the trailer hitch coupler w.r.t. the tractor tow pole; control aims to drive the tractor to the trailer precisely. This paper proposes Trailer Tag Hitch: an automatic reverse hanging system using fiducial markers. Visual fiducial markers are man-made landmarks for fast recognition and localization [7]. Current perception methods are limited to certain types of trailers; we aim to realise robust hitching on different types of trailers and couplers via fiducial markers. The contributions of this paper are: (1) We design a system using fiducial markers for trailer hitching. To the best of our knowledge, we are the first to utilize fiducial markers in the task of trailer hitching. (2) We develop a hitch coupler pose estimation method based on AprilTag, and a visual servo method based on decoupled lateral-longitudinal control. (3) We perform various experiments under different test conditions to validate the system. We conduct 254 experiments on 4 tractor-trailer vehicles, with a 95% success rate. Experiments show that the system is robust to distance, position, angle, indoor/outdoor scenario, light illumination, and various fiducial marker conditions. The proposed system can be easily deployed and transplanted to multiple kinds of trailers and couplers.
2 Related Works
Automatic reverse hanging has been a hot research topic in recent years. For detecting the trailer, the two main classes of methods are image-processing-based and deep-learning-based. Image-processing-based methods are reviewed first. Ramirez-Llanos et al. [9] developed an automated reversing-to-trailer method based on one
rear-view fish-eye camera. Discriminative Correlation Filter - Channel Spatial Reliability (DCF-CSR) is used to detect the hitch coupler; for visual servoing, a proportional controller guides the tractor as it reverses to the trailer. However, this trailer-assist system is still at a low level of automation: the operator needs to select a trailer bounding box on the human-machine interface. Liu et al. [6] designed a Time-of-Flight (ToF) camera based detection method. They detect the trailer vehicle in the 2D color image, and the 3D pose of the hitch is obtained using the depth image from the ToF camera. However, the ToF camera suffers from sunlight pollution, so it cannot work in outdoor or semi-outdoor environments; moreover, ToF cameras are more expensive than monocular cameras. Deep-learning-based methods use neural networks for trailer/coupler detection. Dahal et al. [3] designed a Convolutional Neural Network (CNN) that detects the trailer vehicle, its hitch coupler, and the towing ball from fish-eye camera images. Atoum et al. [2] developed a Multiplexer-CNN based method to detect the coupler using a monocular camera. Deep-learning-based methods require high computational resources, restricting their deployment to embedded devices; moreover, deep-learning methods are hard to debug, restricting their transferability.
Fig. 2. Different types of trailers and hitch couplers [1, 11].
The above-mentioned methods detect the trailer or hitch coupler directly from their color-space or 3D-space features. However, trailers and hitch couplers vary extensively in practical use, see Fig. 2. Those methods can lose the trailer or coupler features if the appearance or shape differs, so it is hard for such direct-detection methods to be generalized to multiple types of trailers and hitch couplers. To address this problem, this paper proposes a detection method based on fiducial markers. Owing to their accuracy and robustness, fiducial markers have been widely used in 3D pose estimation [4] and camera calibration. AprilTag [5,7,12] is an important fiducial marker series proposed at the University of Michigan. Pfrommer et al. [8] used AprilTag for Simultaneous Localization And Mapping, and Richardson et al. [10] used AprilTag for camera calibration. Despite the many applications of AprilTag fiducial markers, they have not been used for trailer hitch coupler pose estimation. This paper introduces fiducial marker detection into hitch coupler pose estimation, taking full advantage of the convenience, flexibility, and robustness of fiducial marker detection. The
proposed trailer hitch system can be easily deployed and transplanted to multiple kinds of trailers and couplers.
3 Method
The proposed trailer hitching system can be divided into two main modules: perception (pose estimation) and servoing, see Fig. 3.
Fig. 3. The framework of Trailer Tag Hitch system.
3.1 Hitch Coupler Pose Estimation
This module aims to get the pose transform from the tractor tow pole to the trailer hitch coupler. Their transform relation is shown in Fig. 4.
Fig. 4. Transform relationship among the tow pole (tp), camera (c), fiducial marker (m), and hitch coupler (hc).
AprilTag [12] detection is one major procedure of pose estimation. The main steps of the utilized AprilTag detector are shown in Fig. 5. Adaptive thresholding is used to binarize the image into black and white regions; regions without enough contrast are removed. The union-find algorithm segments the connected black and white components, and the border pixels of the same regions are clustered together. Each cluster is used to detect lines and corner points for fitting a quad. Valid quads are checked against the tag encodings. Finally, the tag detection and pose estimation are obtained.
Fig. 5. Intermediate steps of the AprilTag detector. (a) The original image can be colored or gray scaled. (b) The binarized image after adaptive thresholding. (c) The black and white regions with connected boundaries are segmented as different components. (d) The border pixels of the same black and white regions are clustered together. Border pixel clusters are used to fit quads. (e) Final valid tag detections. (Color figure online)
From the detector above, we can get the pose transform from the camera to the fiducial marker:

$${}^{m}_{c}T = \begin{bmatrix} {}^{m}_{c}R & {}^{m}_{c}p \\ 0 & 1 \end{bmatrix} \qquad (1)$$

where ${}^{m}_{c}R$ is the rotation matrix from the camera to the marker, and ${}^{m}_{c}p$ is the position vector from the camera to the marker. Besides the 3D pose estimate from the camera to the fiducial marker ${}^{m}_{c}T$, we also need two fixed pose transforms, namely the camera w.r.t. the tow pole ${}^{c}_{tp}T$, and the hitch coupler w.r.t. the marker ${}^{hc}_{m}T$, see Fig. 4. These two fixed transforms need to be calibrated only once. Finally, combining all three transforms, we can estimate the 3D pose from the tractor tow pole to the trailer hitch coupler:

$${}^{hc}_{tp}T = {}^{hc}_{m}T \; {}^{m}_{c}T \; {}^{c}_{tp}T \qquad (2)$$
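A minimal sketch of Eqs. (1)–(2) with 4 × 4 homogeneous matrices in NumPy; the AprilTag detector is assumed to return the rotation and translation of Eq. (1):

```python
# Sketch of Eqs. (1)-(2): build the camera-to-marker transform from the tag
# detection and chain it with the two calibrated fixed transforms.
import numpy as np

def homogeneous(R, p):  # 4x4 transform from rotation R and translation p
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(p).ravel()
    return T

def tow_pole_to_coupler(R_cm, p_cm, T_tp_c, T_m_hc):
    """R_cm, p_cm: camera -> marker pose from AprilTag detection (Eq. (1)).
    T_tp_c: tow pole -> camera, fixed, calibrated once.
    T_m_hc: marker -> hitch coupler, fixed, calibrated once."""
    T_c_m = homogeneous(R_cm, p_cm)
    # Eq. (2): chain tow pole -> camera -> marker -> hitch coupler
    return T_m_hc @ T_c_m @ T_tp_c
```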
Figure 6 shows the hitch coupler pose estimation result, with the camera frame, the AprilTag frame, and the trailer hitch coupler 3D bounding box.

3.2 Visual Servoing
In the previous pose estimation module, the pose transform from the tractor tow pole to the trailer hitch coupler is obtained. The 3D pose contains the error
Fig. 6. Hitch coupler pose estimation result: (a) camera image; (b) pose estimation result.
messages, including the longitudinal distance error, lateral distance error, and lateral angle error. Those errors are used for visual servoing. Figure 7 shows the block diagram of the control module. The first-layer controllers compute the desired speed, brake, and steer. The second-layer Proportional-Integral-Derivative (PID) controllers compute the acceleration, brake torque, and steering-wheel torque as inputs for the servo motors. This subsection focuses on the first-layer controllers.
Fig. 7. Block diagram of control module.
Lateral Steer Control. Pure proportional control is utilized in the lateral steer control for computing the steering-wheel angle, with the lateral distance error and lateral angle error as the two main inputs:

$$\delta_{desired} = k_{dist} \cdot err_{dist} + k_{angle} \cdot err_{angle} \qquad (3)$$

where $\delta_{desired}$ is the desired steer, $err_{dist}$ is the lateral distance error, $err_{angle}$ is the lateral angle error, and $k_{dist}$ and $k_{angle}$ are the proportional coefficients.
Longitudinal Speed Control. This part computes the desired speed and brake. For precise hitching, the vehicle slows down the reversing speed as it gets closer to the target:

$$v_{desired} = \begin{cases} v_{high}, & d_{high} \le err_{long} \\ v_{medium}, & d_{low} \le err_{long} < d_{high} \\ v_{low}, & d_{stop} \le err_{long} < d_{low} \\ 0, & err_{long} < d_{stop} \end{cases} \qquad (4)$$

where $v_{desired}$ is the desired speed, $err_{long}$ is the longitudinal distance error, $d_{high}$ and $d_{low}$ are the distance thresholds for high and low speed, respectively, and $d_{stop}$ is the distance threshold for stopping. Note that a large steer together with a relatively large speed could cause the tractor vehicle to swing rapidly. To address this problem, an additional constraint is added to the longitudinal speed control:

$$v_{desired} = \begin{cases} v_{low}, & |\delta_{desired}| > \delta_{large} \\ v_{desired}, & \text{otherwise} \end{cases} \qquad (5)$$

where $\delta_{large}$ is the large-steer-angle threshold.
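A minimal sketch of the two first-layer control laws, Eqs. (3)–(5); all gains, speeds, and thresholds are tuning parameters of the real system, and the dictionary layout here is only illustrative:

```python
# Sketch of the first-layer controllers, Eqs. (3)-(5).
def lateral_steer(err_dist, err_angle, k_dist, k_angle):
    return k_dist * err_dist + k_angle * err_angle       # Eq. (3)

def longitudinal_speed(err_long, steer, p):
    # p: dict of thresholds/speeds, e.g. d_high, d_low, d_stop, v_high, ...
    if err_long >= p["d_high"]:
        v = p["v_high"]                                   # Eq. (4)
    elif err_long >= p["d_low"]:
        v = p["v_medium"]
    elif err_long >= p["d_stop"]:
        v = p["v_low"]
    else:
        v = 0.0                                           # stop and hitch
    if abs(steer) > p["delta_large"]:                     # Eq. (5): avoid swinging
        v = p["v_low"] if v > p["v_low"] else v
    return v
```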
4 Experiment
We conduct multiple experiments under various test conditions to verify the effectiveness and robustness of the proposed system. A video of the auto trailer hitching experiments is available at the following link: https://www.bilibili.com/video/BV1Ye4y1b74v/.
Experiment Setup
Tractor. The tractor is an electrically operated tricycle, as is shown in Fig. 8. For motion control of the tractor, the tractor itself is equipped with servo motors for accurate steer, speed and brake control. A modular USB camera is installed at the back end of the tractor, see Fig. 8. The horizontal Field of View angle of the camera is 100 ◦ C. The camera has maximum 2592 × 1944 resolution. In the experiments, only 1280 × 720 resolution is used since that is enough for pose estimation. The camera intrinsic parameters are calibrated for AprilTag detection. An industrial personal computer is mounted on the tractor for electrical operation. The proposed system is implemented in C++ based on Robotic Operation System (ROS). The experiments were run on a 2.80 GHz Intel Core i7-6700T processor with 16 GB RAM. The perception node for hitch coupler pose estimation runs with more than 15 fps on the computer. The proposed method was implemented on four different tractors. We conduct tractor tow pole w.r.t. camera extrinsic parameter calibration for each of the tractors.
Trailer Tag Hitch: An Automatic Reverse Hanging System
295
Fig. 8. Tractor vehicle.
Trailer. The trailer has no active steering power, and is towed by the tractor, as is shown in Figs. 1, 2 and 11. The advantage of using AprilTag fiducial markers is their robustness to different kinds of trailer vehicles and hitch couplers. The experiments test the proposed method on different types of trailers to verify the robustness. The AprilTag fiducial marker is mounted at the front end of each trailer. The markers are printed on A4 papers, with a size of 10 cm. We conduct hitch coupler w.r.t. AprilTag extrinsic parameter calibration for each of the trailers. Test Conditions. To verify the effectiveness and robustness of the proposed method, 254 experiments on 4 different tractor-trailer vehicles are conducted at multiple scenarios. Variant test conditions are show in Fig. 9. The overall success rate reaches 95%.
Fig. 9. Multiple test conditions.
296
D. Lu et al.
4.2
Case Study
A typical trailer hitching process is described as follows. The tractor first adjust its rear to the trailer hitch coupler, and slowly reverse to the coupler. In the reversing process, the tractor manipulates steering angle according to the coupler pose w.r.t. the tow pole. Figure 10 shows error and control commands in one whole trailer hitching process. The main reversing step is from 40 s to 85 s. It can be shown from the figure that the lateral distance error and lateral angle continue to reduce, despite of the small fluctuation. The longitudinal distance error decreases smoothly as the tractor reverses. All three error terms converge to almost zero at the end of hitching. The speed command and steer command are bounded, and converge to zero as the hitching process go through. The speed and steer command feedback follows the control command well, respectively.
Fig. 10. Case study: error and control commands. Units: angle error (rad), distance error (m), speed command (cm/s), steer command (122.2 deg).
4.3
AprilTag Detection and Pose Estimation
In this subsection, we test the AprilTag detection and pose estimation method in different conditions, including light illumination, indoor/outdoor scenario, and fiducial marker damage. Figure 11 shows that the AprilTag can be detected in sunny, cloudy, outdoor, semi-outdoor and indoor light. Figure 12 shows that the detection is robust to possible marker conditions such as ripped off corner, scratches, dirt, and small size. In general, the maximum tag detection distance is about 7 m.
Trailer Tag Hitch: An Automatic Reverse Hanging System
(a) Outdoor sunny
(b) Semi-outdoor
(d) Cloudy
297
(c) Indoor
(e) Light shadow
Fig. 11. Different light illumination in outdoor, semi-outdoor and indoor scenarios. The typical outdoor illuminance is 5000 lx. Indoor illuminance is 200 lx. Semi-outdoor illuminance is 2500 lx.
(a) Normal
(b) Ripped off (c) Scratches corner
(d) Dirt
(e) Small size
Fig. 12. Different fiducial marker damages simulating common factory conditions.
The accuracy of pose estimation is also verified. We want to estimate the pose transform from the tractor tow pole to the trailer hitch coupler. Since the ground truth of pose transform is hard to measure, we decide to check the error at zero position. We manually move the tow pole and the hitch coupler at the same position, and check whether the pose estimation output is zero. In most cases, the output is within 2 cm. Robust AprilTag detection and pose estimation allow us to use the perception result in real vehicle field test. 4.4
Hitching Success Rate and Accuracy
To test the visual servoing in field, we design different initial pose setting of the tractor. The longitudinal distance, lateral distance and heading angle are varied, as is shown in Fig. 13. We focus on the success rate and final hitching accuracy in this subsection.
298
D. Lu et al. straight alignment
negative (20 deg)
lateral 0.68, 1.0 , 1.7m
positive (20 deg)
longitudinal 2, 3, 5m
right position (20 deg) center position
trailer
left position (20 deg)
Fig. 13. Different settings of longitudinal distance, lateral distance and heading angle.
We conduct 254 experiments under different test conditions on 4 tractortrailer vehicles. The differences between test conditions are show in Figs. 11, 12 and 13. Overall success rate of 95% verifies the feasibility of deploying the framework to factory real production environments. Tests over multiple conditions show that our proposed method is robust to variations including longitudinal distance, lateral position, orientation, scenario, light, and fiducial marker. 13 out of 254 tests fail to hitch the tractor tow pole to the trailer coupler. Major reasons are that the heading angle between tractor and trailer is over 30 ◦ C, causing that the camera loses sight of the edge part of AprilTag. Figure 14 shows the final error distribution of 64 manually-measured-error tests. It can be seen that most of the final errors are within 3 cm. This accuracy meets the requirement of trailer hitching. Moreover, most points distributes very near to the origin point, showing the effectiveness of the perception and servo method. Lateral and longitudinal error histogram is also shown in Fig. 14. The lateral error distribution leans a little to positive direction. The reason could be the composition of several calibration errors. The longitudinal error distribution is nearly a standard. The longitudinal error distribution has peek at around zero, showing that speed and brake control work well at the end of hitching.
Fig. 14. Final hitching lateral and longitudinal error.
Trailer Tag Hitch: An Automatic Reverse Hanging System
299
Table 1 shows the lateral and longitudinal error mean, root mean square (rms) and standard deviation over multiple test conditions. It shows that the overall lateral and longitudinal rms are both less than 2 cm. The longitudinal error grows as the longitudinal distance increases. The lateral error is best at center lateral position and aligned heading angle. Indoor scenario is inferior to outdoor and semi-outdoor on lateral rms. The reason could be inadequate light illumination in indoor scenarios. Scratched and small fiducial markers are difficult for lateral error perception. Table 1. Lateral and longitudinal error statistics over different test conditions. Lat is lateral; long is longitudinal; rms is root mean square; std is standard deviation. All units are cm. Overall
Long dist-2 m
Long dist-3 m
Long dist-5 m Lat pos-center Lat pos-left
Lat pos-right Angle-aligned Angle-positive Angle-negative
Lat rms (cm)↓ Lat mean (cm) Lat std (cm)↓ Long rms (cm)↓ Long mean (cm) Long std (cm)↓
1.904 1.000 1.620 1.285 0.043 1.284
2.043 1.181 1.667 1.109 −0.069 1.107
2.000 0.944 1.763 1.213 0.611 1.048
1.624 0.796 1.416 1.528 −0.185 1.516
1.967 0.778 1.807 1.027 0.259 0.994
Lat rms (cm)↓ Lat mean (cm) Lat std (cm)↓ Long rms (cm)↓ Long mean (cm) Long std (cm)↓
1.764 1.222 1.272 1.524 −0.019 1.524
1.972 1.000 1.700 1.255 −0.111 1.250
Scenario-outdoor Scenario-semi-outdoor Scenario-indoor Light-sunny
Light-cloudy
1.291 0.333 1.247 1.563 −1.111 1.100
1.333 −0.222 1.315 2.034 −1.278 1.583
1.592 0.917 1.435 1.306 0.144 1.298
2.468 1.741 1.750 1.139 0.259 1.109
1.585 0.917 1.368 1.169 0.178 1.155
1.876 0.964 1.609 1.203 0.250 1.176
1.929 1.058 1.613 1.369 −0.288 1.339
1.908 0.981 1.636 1.284 0.148 1.275
Light-indoor light Tag-normal
Tag-scratch
Tag-dirt
Tag-small
2.468 1.741 1.750 1.139 0.259 1.109
2.468 1.741 1.750 1.139 0.259 1.109
1.312 0.056 1.311 1.814 −1.194 1.366
2.809 2.778 0.416 0.577 0.111 0.567
0.832 0.192 0.810 1.118 0.731 0.846
Experiments verify that the proposed system is robust to initial position and orientation, light illumination, indoor/outdoor scenario, and fiducial marker damage. Also, it can be easily deployed and transplanted to multiple kinds of trailers and couplers.
5
Conclusion
This paper proposes Trailer Tag Hitch: an automatic reverse hanging system using fiducial markers. The system includes two main modules: pose estimation and visual servoing. For the hitch coupler pose estimation module, we build an algorithm based on AprilTag detection, which gives the pose from camera to AprilTag. We combine it with fixed transforms of tow pole w.r.t. camera and coupler w.r.t. AprilTag, to get the target pose estimation from the tractor tow pole to the trailer hitch coupler. For the visual servoing module, we decouple lateral steer control and longitudinal speed control. The control module drives the tractor to reverse to the trailer hitch, and stop at trailer hitching finish. The proposed system is evaluated by large scale experiments. A total of 254 experiments are conducted on 4 tractor-trailer vehicles. Test conditions vary from initial position and orientation, light illumination, indoor/outdoor scenario, and fiducial marker damage. The overall success rate reaches 95%, showing that the proposed method is robust, efficient, and ready to deploy to multiple kinds of trailers. It should be noted that detecting the fiducial marker rather than the trailer/coupler needs an extra calibration from the marker to the coupler. For future work, we plan to test other fiducial markers and visual servo method.
300
D. Lu et al.
Acknowledgement. This work was supported by National Natural Science Foundation of China (62203301/62173228/61873165).
References 1. Action trailers (2022). https://actiontrailers.ca. Accessed 13-Sept 2022 2. Atoum, Y., Roth, J., Bliss, M., Zhang, W., Liu, X.: Monocular video-based trailer coupler detection using multiplexer convolutional neural network. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5478–5486 (2017) 3. Dahal, A., et al.: DeepTrailerAssist: deep learning based trailer detection, tracking and articulation angle estimation on automotive rear-view camera. In: International Conference on Computer Vision Workshop, pp. 2339–2346 (2019). https:// doi.org/10.1109/ICCVW.2019.00287 4. Kalaitzakis, M., Carroll, S., Ambrosi, A., Whitehead, C., Vitzilaios, N.: Experimental comparison of fiducial markers for pose estimation. In: 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 781–789. IEEE (2020) 5. Krogius, M., Haggenmiller, A., Olson, E.: Flexible layouts for fiducial tags. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2019) 6. Liu, Y., Wang, C., Yuan, W., Yang, M.: Time-of-flight camera based trailer hitch detection for automatic reverse hanging system. In: Sun, F., Hu, D., Wermter, S., Yang, L., Liu, H., Fang, B. (eds.) ICCSIP 2021. CCIS, vol. 1515, pp. 439–450. Springer, Singapore (2022). https://doi.org/10.1007/978-981-16-9247-5 34 7. Olson, E.: AprilTag: a robust and flexible visual fiducial system. In: Proceedings - IEEE International Conference on Robotics and Automation, pp. 3400–3407 (2011). https://doi.org/10.1109/ICRA.2011.5979561 8. Pfrommer, B., Daniilidis, K.: Tagslam: Robust SLAM with fiducial markers. CoRR abs/1910.00679 (2019). https://arxiv.org/abs/1910.00679 9. Ramirez-Llanos, E., Yu, X., Berkemeier, M.: Trailer hitch assist: lightweight solutions for automatically reversing to a trailer. In: IEEE Intelligent Vehicles Symposium, Proceedings, pp. 781–788 (2020). https://doi.org/10.1109/IV47402.2020. 9304637 10. Richardson, A., Strom, J., Olson, E.: AprilCal: assisted and repeatable camera calibration. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2013) 11. Southern USA trailers (2022). https://cargotrailerguide.com/trailer-dealer/ southern-usa-trailers/. Accessed 13 Sept 2022 12. Wang, J., Olson, E.: Wang2016Iros. In: Proceedings of the EEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2–7 (2016). https:// april.eecs.umich.edu/media/pdfs/wang2016iros.pdf
Anatomical and Vision-Guided Path Generation Method for Nasopharyngeal Swabs Sampling Jing Luo1 , Wenbai Chen1 , Fuchun Sun2(B) , Junjie Ma2 , and Guocai Yao2 1 Beijing Information Science and Technology University, Beijing, China 2 Tsinghua University, Beijing, China
[email protected]
Abstract. Due to the global COVID-19 pandemic, there is a strong demand for pharyngeal swab sampling and nucleic acid testing. Research has shown that the positive rate of nasopharyngeal swabs is higher than that of oropharyngeal swabs. However, because of the high complexity and visual obscuring of the interior nasal cavity, it is impossible to obtain the sampling path information directly from the conventional imaging principle. Through the combination of anatomical geometry and spatial visual features, in this paper, we present a new approach to generate nasopharyngeal swabs sampling path. Firstly, this paper adopts an RGB-D camera to identify and locate the subject’s facial landmarks. Secondly, the mid-sagittal plane of the subject’s head is fitted according to these landmarks. At last, the path of the nasopharyngeal swab movement in the nasal cavity is determined by anatomical geometry features of the nose. In order to verify the validity of the method, the location accuracy of the facial landmarks and the fitting accuracy of mid-sagittal plane of the head are verified. Experiments demonstrate that this method provides a feasible solution with high efficiency, safety and accuracy. Besides, it can solve the problem that the nasopharyngeal robot cannot generate path based on traditional imaging principles. It also provides a key method for automatic and intelligent sampling of nasopharyngeal swabs, and it is of great clinical value to reduce the risk of cross-infection. Keywords: Nasopharyngeal swabs · Nucleic acid detection · Geometric anatomy · Path generation and visual navigation
1 Introduction Due to the global COVID-19 pandemic with the constant variability of virus strains and its high infectivity and insidious properties, there is an urgent and strong demand for nucleic acid detection [1, 2]. As one type of non-contact technology, a robot can be utilized in nucleic acid detection for non-contact diagnosis: 1) Reducing the risk of cross-infection. Healthcare workers who perform swabs are at high risk of infection due to aerosol from patients during the process of sampling, therefore, applications of a swab robot to avoid contact between healthcare workers with patients and reduce the risk of infection during sampling [3, 4]; 2) Improving the quality of swab samples. Under the background of nucleic acid screening on large scale, tremendous tasks and time limits © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 301–315, 2023. https://doi.org/10.1007/978-981-99-0617-8_21
302
J. Luo et al.
are both leading to prevent healthcare workers from maintaining standardized sampling over time. Therefore, a series of problems such as unstandardized sampling action and unqualified samples may lead to “false negatives” [5, 6]. As the research has shown, the samples collected by robots have higher validity and smaller standard deviation [7]. As a result, the standardized operation of the robot can improve the quality of samples effectively [8]. Moreover, oropharyngeal swab collection is prone to a high probability of false negative samples [9, 10], Nicole et al. of the Chinese University of Hong Kong published a Meta-analysis in 2021 in the top medical journal “The Lancet”, which found that the effectiveness of oropharyngeal swabs is only 68% of that of nasopharyngeal swabs [11]. It can be seen that the research on the relevant technologies of nasopharyngeal swabs sampling robot is an important carrier for epidemic prevention and control. Nasopharyngeal swabs sampling robot is a complex system, and the research on it involves a lot of technical difficulties, especially the research on the structure of the nasal cavity. The nasal cavity is a dome-shaped multi-channel structure with complex anatomical characteristics, it is divided into two separate cavities by the nasal septum [12], and the back of each side of the nasal cavity is divided into the superior meatus, middle meatus, and inferior meatus, so the existence of other meatuses greatly interferes with swab movement, increasing the difficulty of swab entering the inferior meatus. Moreover, concealed vision also increases the difficulty of swabs entering the inferior meatus. On the one hand, occlusion of the wing of the nose makes it impossible to obtain the internal structure information of the nasal cavity by using the external camera. On the other hand, restricted by the narrow structure of the nasal cavity and the prevention of cross-infection, the airborne camera that enter the nasal cavity with swabs is not allowed. Therefore, a deficiency of structural information in the nasal cavity makes it impossible to establish an accurate environmental model for swab movement. As a result, combining anatomy and vision guidance is a feasible solution to the problems above. In geometric anatomy, Paata Pruidze et al. [13] proposed a medical guideline for nasopharyngeal swabs sampling based on anatomical studies by simulating anatomical head and neck specimens and recording information about important anatomical parameters. The information on the angles and distances between anatomical landmarks during the collection of nasopharyngeal swabs was recorded. Ming Li et al. [14] proposed a method that imposes geometric constraints in pathological regions, which guides the robot using collaborative control and virtual fixtures derived from complex geometry. In the visual guidance method, Jingwen Cui et al. [15] proposed a method for identifying and locating crops that combines the YOLOv3 algorithm with point cloud images to obtain 3D coordinates of feature points of the target crops. Qiuping Lan et al. [16] proposed an extraction method of tunnel geometric feature lines from the 3D point cloud, which can correctly identify the trend of the tunnel and determine the normal direction of the cross-section. Combining the above scholars’ research results in anatomy and computer vision, a new method is presented to generate the nasopharyngeal swabs sampling path through the combination of anatomical geometry and spatial visual landmarks. 
The non-contact method in this paper solves the problem that information about the interior of the nasal cavity cannot be acquired through conventional imaging principles, which prevents nasopharyngeal swab sampling from being performed directly, and it provides a feasible solution for nasopharyngeal swab sampling path generation on robots in the future. The paper is divided into two sections: spatial information acquisition of facial landmarks and sampling path generation based on facial landmarks.
2 Spatial Information Acquisition of Facial Landmarks

In an anatomical study conducted by Paata Pruidze et al. [13] of the Medical University of Vienna in 2021, the swab positions during nasopharyngeal swab sampling were accurately measured on 157 head and neck specimens, and a geometric anatomical description method for the swab was eventually developed. In this paper, under the guidance of the swab position data and the geometric anatomical description method, seven basic facial landmarks were selected to generate the sampling path of the swab, namely the nasion point A, the nasal tip point B, the nostril points C1 and C2, the mouth corner points D1 and D2, and the subnasal point E, as shown in Fig. 1.
Fig. 1. Facial landmarks
Fig. 2. Block diagram of the localization of facial landmarks
The process of the localization of facial landmarks is shown in Fig. 2.
2.1 Get the 2D Coordinates of the Basic Landmarks

In order to obtain the spatial information of the application scenario, combining color images and depth images is indispensable, because color images carry only 2D information and depth information can only be obtained from depth images. However, due to the inconsistent fields of view and image resolutions of the RGB camera and the depth sensor, the complete spatial information of the scene can only be obtained after aligning the color images to the depth images [14, 15]. The steps are as follows:

Step 1: RGB images and depth images are obtained using the Azure Kinect DK camera.
Step 2: The Haar cascade classifier algorithm [16] from the OpenCV library is used for face recognition on the aligned image.
Step 3: 68 facial landmarks are detected by loading the shape_predictor_68_face_landmarks.dat model from the DLib library [17] after determining the anchor frame range of the face.
Step 4: The 2D coordinates of the five points A, B, D1, D2, and E are obtained by indexing the 68 landmarks [18, 19].

2.2 Get the 2D Coordinates of the Landmarks of the Nostrils

Since nose detection is an important part of face image analysis technology, the nose needs to be detected on top of the face detection in order to locate the two nostril center points. The steps to locate the centers of the nostrils are as follows:

Step 1: After determining the anchor frame range of the nose, the nose image is converted to grayscale, Gaussian-smoothed, region-segmented, and thresholded [20].
Step 2: The set of contour boundary points is found in the thresholded image.
Step 3: The 2D coordinates of the two nostril centroids C1 and C2 are obtained by calculating the center of mass, area, and center of gravity of the nose image [21].

2.3 Get the 3D Coordinates of All Landmarks

The depth_image_to_pointcloud function is called to generate a point cloud from the depth image [22], and the corresponding landmarks are indexed to get the spatial 3D coordinates of the seven landmarks. The detection results are shown in Fig. 3(a) and Fig. 3(b).
Fig. 3. (a). Face detection, nose detection, and nostril region segmentation. (b). Localization of facial landmarks
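To make the landmark pipeline of Sects. 2.1 and 2.2 concrete, the sketch below strings together the Haar cascade face detector, the DLib 68-point predictor, and the contour-moment nostril localization. It is a minimal illustration under stated assumptions: the grayscale frames are assumed to be already aligned to the depth image, the model file paths are placeholders, and the 68-point indices chosen for A, B, D1, D2, and E follow the common convention rather than the authors' exact mapping.

```python
import cv2
import dlib
import numpy as np

# Assumed model locations (placeholders for this sketch).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def basic_landmarks(gray):
    """Sect. 2.1: face box -> 68 landmarks -> A, B, D1, D2, E (2D pixels)."""
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    shape = predictor(gray, dlib.rectangle(x, y, x + w, y + h))
    pts = np.array([[p.x, p.y] for p in shape.parts()])
    # Index choice is an assumption: 27 nasion region, 30 nose tip,
    # 48/54 mouth corners, 33 subnasal point.
    return {"A": pts[27], "B": pts[30], "D1": pts[48], "D2": pts[54], "E": pts[33]}

def nostril_centroids(nose_gray):
    """Sect. 2.2: grayscale -> Gaussian blur -> threshold -> contour centroids."""
    blur = cv2.GaussianBlur(nose_gray, (5, 5), 0)
    _, th = cv2.threshold(blur, 60, 255, cv2.THRESH_BINARY_INV)
    # findContours returns 3 values in OpenCV 3.x and 2 in 4.x; [-2] covers both.
    contours = cv2.findContours(th, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    cents = []
    for c in sorted(contours, key=cv2.contourArea, reverse=True)[:2]:
        m = cv2.moments(c)
        if m["m00"] > 0:  # centroid from image moments
            cents.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    return cents  # 2D coordinates of C1, C2
```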
3 Sampling Path Generation Based on Facial Landmarks

In order to generate the sampling path from the obtained visual landmarks, this section is divided into two parts: the fitting of the mid-sagittal plane of the head and the path generation for swab sampling. The sampling path generation algorithm based on facial landmarks is shown in Fig. 4.
Fig. 4. Block diagram of sampling path generation based on facial landmarks
3.1 Fitting of the Mid-Sagittal Plane of the Head

Fitting an accurate equation of the mid-sagittal plane of the head [23] is an important cornerstone of sampling path generation. The coronal plane, the sagittal plane [24], and the transverse plane are shown in Fig. 5. A sagittal plane of the head is a plane that cuts the head longitudinally into left and right parts, and the mid-sagittal plane is the sagittal plane for which the left and right parts are equal.
Fig. 5. Schematic diagram of the anatomical surface of the head
Because the equation of the mid-sagittal plane determined by only three non-coincident points may lead to excessive error, the midpoint D of the symmetric points D1 and D2, which lies on the mid-sagittal plane, is introduced. Therefore, four more dispersed, non-coincident points A, B, D, and E are selected for solving the equation of the mid-sagittal plane of the head, as shown in Fig. 6.
Fig. 6. Landmarks on the mid-sagittal plane of the head
The plane equation of the mid-sagittal plane of the head can be assumed to be:

$$Ax + By + Cz + D = 0 \quad (C \neq 0) \tag{1}$$

Equation (1) can be transformed into:

$$z = -\frac{A}{C}x - \frac{B}{C}y - \frac{D}{C} \tag{2}$$

Assume that the parameters before the variables are:

$$p_0 = -\frac{A}{C}, \quad p_1 = -\frac{B}{C}, \quad p_2 = -\frac{D}{C} \tag{3}$$

Therefore, the plane equation of the mid-sagittal plane of the head is expressed as:

$$z = p_0 x + p_1 y + p_2 \tag{4}$$
Substituting the 3D coordinates $(x_i, y_i, z_i)\{i = 0, 1, \cdots, 3\}$ of the four points A, B, E, D into Eq. (4), the minimization of $S = \sum_{i=0}^{3}(p_0 x_i + p_1 y_i + p_2 - z_i)^2$ should satisfy:

$$\frac{\partial S}{\partial p_k} = 0, \quad k = 0, 1, 2 \tag{5}$$

That is:

$$\begin{cases} \sum_{i=0}^{3} 2(p_0 x_i + p_1 y_i + p_2 - z_i)x_i = 0 \\ \sum_{i=0}^{3} 2(p_0 x_i + p_1 y_i + p_2 - z_i)y_i = 0 \\ \sum_{i=0}^{3} 2(p_0 x_i + p_1 y_i + p_2 - z_i) = 0 \end{cases} \tag{6}$$

This can be written in matrix form:

$$\begin{pmatrix} \sum_{i=0}^{3} x_i^2 & \sum_{i=0}^{3} x_i y_i & \sum_{i=0}^{3} x_i \\ \sum_{i=0}^{3} x_i y_i & \sum_{i=0}^{3} y_i^2 & \sum_{i=0}^{3} y_i \\ \sum_{i=0}^{3} x_i & \sum_{i=0}^{3} y_i & 4 \end{pmatrix} \begin{pmatrix} p_0 \\ p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} \sum_{i=0}^{3} x_i z_i \\ \sum_{i=0}^{3} y_i z_i \\ \sum_{i=0}^{3} z_i \end{pmatrix} \tag{7}$$

The parameters $p_0, p_1, p_2$ of the plane equation can be obtained by matrix operations, giving the fitted mid-sagittal plane:

$$p_0 x + p_1 y - z + p_2 = 0 \tag{8}$$
The results are shown in Fig. 7.
Fig. 7. The fitted mid-sagittal plane of the head
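The normal-equation solve of Eqs. (5)-(7) is a one-line least-squares problem in practice. The following numpy sketch (function and variable names are ours) fits the plane from the four points; the example coordinates are taken from the first measurement row of Table 1, with D as the midpoint of D1 and D2.

```python
import numpy as np

def fit_midsagittal_plane(points):
    """Least-squares fit of z = p0*x + p1*y + p2 (Eqs. (5)-(7)).

    points: (4, 3) array with the 3D coordinates of A, B, D, E.
    Returns (p0, p1, p2) of the plane p0*x + p1*y - z + p2 = 0 (Eq. (8)).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    G = np.column_stack([x, y, np.ones_like(x)])  # rows of the normal equations
    p, *_ = np.linalg.lstsq(G, z, rcond=None)
    return p

A = [37.0, -66.0, 330.0]
B = [33.0, -37.0, 309.0]
D = [31.5, 4.5, 328.5]   # midpoint of D1 = (6, 3, 331) and D2 = (57, 6, 326)
E = [33.0, -23.0, 321.0]
p0, p1, p2 = fit_midsagittal_plane(np.array([A, B, D, E]))
```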
3.2 Path Generation for Swab Sampling

Since the mid-sagittal plane of the head is fitted by the least squares method, it is necessary to find the projection points A′ and E′ of point A and point E on the fitted mid-sagittal plane. Assume that the coordinates of the original points are $(x_j, y_j, z_j)\{j = 0, 1\}$ and the coordinates of their projection points on the fitted mid-sagittal plane are $(x_k, y_k, z_k)\{k = 0, 1\}$. Since the line between each point and its projection is perpendicular to the fitted mid-sagittal plane, the perpendicularity constraint gives:

$$\begin{cases} y_k = \frac{p_1}{p_0}(x_k - x_j) + y_j \\ z_k = \frac{-1}{p_0}(x_k - x_j) + z_j \end{cases} \tag{9}$$
Bringing $y_k$ and $z_k$ into Eq. (8) yields:

$$x_k = \frac{\left(p_1^2 + (-1)^2\right) x_j - p_0 (p_1 y_j - z_j + p_2)}{p_0^2 + p_1^2 + (-1)^2} \tag{10}$$

Bringing Eq. (10) into Eq. (9) yields:

$$\begin{cases} y_k = \dfrac{\left(p_0^2 + (-1)^2\right) y_j - p_1 (p_0 x_j - z_j + p_2)}{p_0^2 + p_1^2 + (-1)^2} \\ z_k = \dfrac{\left(p_0^2 + p_1^2\right) z_j + (p_0 x_j + p_1 y_j + p_2)}{p_0^2 + p_1^2 + (-1)^2} \end{cases} \tag{11}$$
The coordinates of the projection points $A'(x_a, y_a, z_a)$ and $E'(x_e, y_e, z_e)$ can be obtained from the above. Referring to Fig. 8, after obtaining the points $A'$ and $E'$, the linear equation $l_1$ of the nasion-subnasal baseline located on the mid-sagittal plane $N$ can be obtained:

$$l_1: \frac{x - x_a}{x_e - x_a} = \frac{y - y_a}{y_e - y_a} = \frac{z - z_a}{z_e - z_a} \tag{12}$$
Fig. 8. The illustration of plane M and plane N.
Project $l_1$ onto the plane $M$ to get $l_1'$ and its direction vector through the point $C_1(x_c, y_c, z_c)$:

$$l_1': \frac{x - x_c}{x_e - x_a} = \frac{y - y_c}{y_e - y_a} = \frac{z - z_c}{z_e - z_a} \tag{13}$$

$$\vec{s}_1 = (x_e - x_a,\ y_e - y_a,\ z_e - z_a) \tag{14}$$

Referring to Fig. 9(a), the swab collection side is placed at the known center point $C_1$ of one of the nostrils so that the acute angle $\alpha$ between the line $l_2$ where the swab is located and the line $l_1'$ is 35.4°. Since the x-component remains unchanged from $\vec{s}_1$ to $\vec{s}_2$, the y- and z-components of $\vec{s}_2$ can be obtained by left-multiplying $\vec{s}_1$ by the rotation matrix:

$$\begin{pmatrix} q \\ r \end{pmatrix} = \begin{pmatrix} \sin\alpha & \cos\alpha \\ -\cos\alpha & \sin\alpha \end{pmatrix} \begin{pmatrix} y_e - y_a \\ z_e - z_a \end{pmatrix} \tag{15}$$
Fig. 9. Swab collection procedures
The linear equation of $l_2$ can be obtained:

$$l_2: \frac{x - x_c}{x_e - x_a} = \frac{y - y_c}{q} = \frac{z - z_c}{r} \tag{16}$$
After $\vec{s}_2$ is obtained, the angle $\alpha$ is kept constant and the swab collection side is advanced by 10 mm along the direction of $\vec{s}_2$ so that it reaches point $O_1$. The point $O_1$ is obtained by normalizing $\vec{s}_2$ into a unit vector, multiplying its xyz components by 10, and superimposing the coordinates of point $C_1$, which gives $O_1(x_c + s_{2x},\ y_c + s_{2y},\ z_c + s_{2z})$. Referring to Fig. 9(b), after the swab collection side reaches point $O_1$ it is fixed there, and the end of the swab rotates upward along the plane $M$ with point $O_1$ as the center of the circle until the wing of the nose is lifted slightly; the swab is then slowly advanced in this direction, and the optimal advancement length of the swab is about 94 mm [25]. Referring to Fig. 9(c), constraining the angle $\alpha$ within [76.3°, 82.9°] allows the swab collection side to reach the upper limit point $F_1$ and the lower limit point $F_2$ of the posterior wall of the nasopharynx, respectively. The coordinates of points $F_1$ and $F_2$ are obtained in the same way as point $O_1$. The spatial collection trajectory is shown in Fig. 10.
Fig. 10. Spatial sampling path
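The whole geometric construction of Sect. 3.2 (project A and E onto the fitted plane, rotate the baseline direction by α in plane M, and step along the rotated direction) fits in a few lines of numpy. The sketch below is ours; it assumes the plane parameters (p0, p1, p2) from Sect. 3.1 and takes α in degrees and the advance length in millimetres.

```python
import numpy as np

def project_to_plane(pt, p0, p1, p2):
    """Orthogonal projection onto p0*x + p1*y - z + p2 = 0 (Eqs. (10)-(11))."""
    n = np.array([p0, p1, -1.0])                 # plane normal
    d = (np.dot(n, pt) + p2) / np.dot(n, n)      # signed offset along n
    return pt - d * n

def swab_point(A, E, C1, plane, alpha_deg=35.4, length_mm=10.0):
    """Rotate s1 by alpha in its y-z components (Eq. (15)) and advance from C1."""
    Ap = project_to_plane(np.asarray(A, float), *plane)
    Ep = project_to_plane(np.asarray(E, float), *plane)
    s1 = Ep - Ap                                  # direction of l1' (Eq. (14))
    a = np.radians(alpha_deg)
    R = np.array([[np.sin(a), np.cos(a)],
                  [-np.cos(a), np.sin(a)]])
    q, r = R @ s1[1:]                             # rotated y/z; x kept unchanged
    s2 = np.array([s1[0], q, r])
    s2 /= np.linalg.norm(s2)                      # unit direction of l2
    return np.asarray(C1, float) + length_mm * s2

# O1 is obtained with length 10 mm; F1/F2 follow with alpha in [76.3, 82.9] deg.
```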
4 Experiments

To verify the effectiveness of the proposed method, the localization accuracy of the landmarks and the fitting accuracy of the mid-sagittal plane of the head, which together form the bridge to anatomy, are evaluated in this research.

4.1 Hardware and Software Configurations

The hardware used in the experiments was a personal computer and an Azure Kinect DK camera. The PC is equipped with an Intel i7-10750H CPU, 16.0 GB of RAM, and an NVIDIA GTX 1650 Ti discrete graphics card, while the Azure Kinect DK camera is equipped with a 1-megapixel depth camera, a 12-megapixel full-HD RGB camera, and advanced AI sensors. In this experiment, the depth camera operates in the NFOV_UNBINNED mode, and the RGB camera runs at 1280 × 720 resolution. The software uses PyCharm 2021.2.2 and MATLAB 2018a as integrated development environments, with DLib 19.24 and OpenCV 3.4.15 as computer vision libraries.

4.2 Experimental Design of Localization Accuracy of Facial Landmarks

As a rigid body in space, the human head has the property that the distance between any two of its given points remains constant during motion [26]; the localization accuracy of the critical points can be evaluated according to this principle. The experimental subject performed small-amplitude translational motion facing the camera, and 15 sets of 3D landmark coordinates were collected, as shown in Table 1. The Euclidean distances between point D1 and point D2, point A and point B, point B and point E, point A and point E, point D1 and point A, and point D2 and point A were selected, and the coefficients of variation were obtained for these six observations.

Table 1. 3D coordinates of landmarks.

| Num | A (mm) | B (mm) | E (mm) | D1 (mm) | D2 (mm) |
|-----|--------|--------|--------|---------|---------|
| 1 | (37, −66, 330) | (33, −37, 309) | (33, −23, 321) | (6, 3, 331) | (57, 6, 326) |
| 2 | (35, −63, 330) | (32, −36, 307) | (32, −21, 319) | (4, 4, 329) | (55, 8, 325) |
| 3 | (36, −64, 330) | (32, −36, 307) | (32, −21, 319) | (4, 5, 330) | (56, 9, 325) |
| 4 | (36, −65, 330) | (32, −37, 308) | (32, −24, 320) | (5, 3, 331) | (56, 6, 326) |
| 5 | (36, −66, 330) | (32, −37, 308) | (32, −23, 320) | (6, 3, 330) | (56, 7, 326) |
| 6 | (36, −66, 330) | (32, −36, 307) | (33, −23, 320) | (6, 3, 329) | (56, 7, 325) |
| 7 | (37, −66, 330) | (33, −36, 307) | (33, −23, 319) | (7, 3, 329) | (56, 7, 325) |
| 8 | (37, −65, 330) | (32, −35, 306) | (32, −21, 319) | (5, 4, 330) | (57, 8, 325) |
| 9 | (36, −65, 330) | (31, −33, 305) | (31, −20, 317) | (5, 3, 327) | (56, 7, 324) |
| 10 | (36, −65, 330) | (32, −34, 305) | (31, −21, 318) | (6, 3, 327) | (56, 7, 324) |
| 11 | (37, −65, 330) | (34, −38, 307) | (34, −24, 319) | (6, 2, 328) | (56, 6, 324) |
| 12 | (35, −64, 330) | (30, −33, 305) | (31, −20, 317) | (4, 3, 329) | (54, 7, 324) |
| 13 | (36, −69, 330) | (33, −42, 308) | (32, −26, 318) | (4, 1, 329) | (55, 5, 323) |
| 14 | (37, −68, 330) | (34, −40, 307) | (32, −24, 318) | (5, 2, 328) | (56, 6, 324) |
| 15 | (35, −68, 330) | (32, −40, 307) | (31, −24, 318) | (4, 2, 328) | (54, 6, 323) |
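The rigid-body evaluation of Sect. 4.2 reduces to computing per-frame Euclidean distances and their coefficient of variation. A minimal numpy sketch (the arrays shown hold only the first three measurements of D1 and D2 from Table 1):

```python
import numpy as np

def distance_cv(P, Q):
    """Mean, standard deviation, and CV of the distance between two landmarks.

    P, Q: (N, 3) arrays of repeated 3D measurements in mm. For a rigid head
    the true distance is constant, so a small CV means stable localization.
    """
    d = np.linalg.norm(P - Q, axis=1)
    return d.mean(), d.std(), d.std() / d.mean()

D1 = np.array([[6, 3, 331], [4, 4, 329], [4, 5, 330]], dtype=float)
D2 = np.array([[57, 6, 326], [55, 8, 325], [56, 9, 325]], dtype=float)
mean_d, std_d, cv = distance_cv(D1, D2)
```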
4.3 Experimental Design for Fitting Accuracy in the Mid-Sagittal Plane of the Head

The distances from the four fitting points A, B, D, E to the fitted mid-sagittal plane of the head are calculated. Assuming that the coordinates of a fitting point are $(x_i, y_i, z_i)\{i = 0, 1, 2, 3;\ t = A, B, D, E\}$, the distance from the fitting point to the mid-sagittal plane can be expressed as:

$$d_t = \frac{|p_0 x_i + p_1 y_i - z_i + p_2|}{\sqrt{p_0^2 + p_1^2 + (-1)^2}} \tag{17}$$
4.4 Experimental Results and Analysis

The experimental results for the localization accuracy of facial landmarks are shown in Fig. 11 and Table 2. The coefficient of variation of each observation is basically stable within 5%, and the fluctuation of the six distance observations is basically stable within the interval [1.83, 6.09] mm, which satisfies the localization accuracy requirements for facial landmarks and shows strong robustness.
Fig. 11. Waveform diagram of observations
Table 2. Standard deviation, mean and coefficient of variation of the observations

| Indicators | ρD1D2 (mm) | ρAB (mm) | ρBE (mm) | ρAE (mm) | ρAD1 (mm) | ρAD2 (mm) |
|------------|------------|----------|----------|----------|-----------|-----------|
| ρ | 50.906572 | 36.115027 | 18.616563 | 44.423752 | 75.379127 | 75.225948 |
| σ | 0.7751602 | 1.8078061 | 0.6677365 | 1.3099524 | 1.2083430 | 1.2168721 |
| CV | 0.0152271 | 0.0500568 | 0.0358678 | 0.0294876 | 0.0160302 | 0.0161762 |
Table 3. Distance from the landmarks to the fitted mid-sagittal plane of the head

| Num | dA (mm) | dB (mm) | dD (mm) | dE (mm) |
|-----|---------|---------|---------|---------|
| 1 | 0.005925089733 | 0.00899735848 | 0.0015361343772 | 0.013386313842 |
| 2 | 0.047418635209 | 0.02212869643 | 0.0031612423498 | 0.072708573992 |
| 3 | 0.011309128782 | 0.04215220727 | 0.0005140513082 | 0.030329027188 |
| 4 | 0.073859637198 | 0.27410132026 | 0.0246198790660 | 0.175621804004 |
| 5 | 0.027798286654 | 0.11000179147 | 0.0055596573309 | 0.076643847490 |
| 6 | 0.025327647053 | 0.09904919115 | 0.0063319117633 | 0.067389632338 |
| 7 | 0.014344307938 | 0.06317012534 | 0.0035860769845 | 0.045239740420 |
| 8 | 0.028551315159 | 0.11039841861 | 0.0071378287898 | 0.074709274667 |
| 9 | 0.047387372789 | 0.17770264796 | 0.0118468431974 | 0.142162118369 |
| 10 | 0.047404300455 | 0.19823616554 | 0.0093372106958 | 0.160169075782 |
| Avg | 0.032932572097 | 0.11059379225 | 0.0073630835863 | 0.085835940809 |
| Max | 0.073859637198 | 0.27410132026 | 0.0246198790660 | 0.175621804004 |
| Min | 0.005925089733 | 0.00899735848 | 0.0005140513082 | 0.013386313842 |
The experimental results of the fitting accuracy in the mid-sagittal plane of the head are shown in Table 3. It can be seen that the distance errors of the ten groups of measurements are within 0.28 mm, indicating that the method has a high fitting accuracy, which can lay a solid foundation for the subsequent work on clinical swabs sampling and provide a feasible solution.
5 Conclusions

In this paper, we propose an anatomical and vision-guided path generation method for nasopharyngeal swab sampling. A computer vision approach is used to locate landmarks on the subject's face, combining the visual guidance of an external camera with anatomical structure features to solve the problem that a nasopharyngeal swab robot cannot directly generate a path based on traditional imaging principles. The experimental results demonstrate that the proposed method provides a feasible path generation scheme for nasopharyngeal swab sampling. However, some issues still need to be explored further, such as optimizing the algorithm to improve the localization accuracy and detection speed of facial landmarks, and integrating the method with the nasopharyngeal swab sampling robot in future work. In the future, we will introduce torque sensors and encoders for force and position sensing to improve the safety of the sampling process, and combine more clinical experience to improve the accuracy of the algorithm so that it can be applied to practical scenarios as soon as possible.

Acknowledgements. This project was supported by the National Natural Science Foundation of China (Fund No. 62276028), Tsinghua University Guoqiang Institute (Fund No. 2020GQ0006) and the China Postdoctoral Science Foundation (Fund No. 2021M701890).
References

1. Ye, Q.: Recent advances and clinical application in point-of-care testing of SARS-CoV-2. J. Med. Virol. 94(5), 1866–1875 (2022)
2. Araf, Y.: Omicron variant of SARS-CoV-2: genomics, transmissibility, and responses to current COVID-19 vaccines. J. Med. Virol. 94(5), 1825–1832 (2022)
3. Li, S.: Clinical application of intelligent oropharyngeal-swab robot: implication for COVID-19 pandemic. Eur. Respir. J. 56(2) (2020)
4. Yang, G.: Combating COVID-19—the role of robotics in managing public health and infectious diseases. Sci. Robot. 5(40), eabb5589 (2020)
5. Rong, Y.: Clinical characteristics and risk factors of mild-to-moderate COVID-19 patients with false-negative SARS-CoV-2 nucleic acid. J. Med. Virol. 93(1), 448–455 (2021)
6. Zedtwitz-Liebenstein, K.: Correctly performed nasal swabs. Infection 49(4), 763–764 (2021). https://doi.org/10.1007/s15010-021-01607-8
7. Li, C.: Cause analysis and control points of false negative nucleic acid test for severe acute respiratory syndrome coronavirus. J. Xiamen Univ. Nat. Sci. 59(3), 310–316 (2020)
8. Ma, J.: A portable robot system for oropharyngeal swabs sampling. Scientia Sinica Inform. (2022)
9. Li, Y.: Clinical characteristics, cause analysis and infectivity of COVID-19 nucleic acid re-positive patients: a literature review. J. Med. Virol. 93(3), 1288–1295 (2021)
10. Wang, X.: Comparison of nasopharyngeal and oropharyngeal swabs for SARS-CoV-2 detection in 353 patients who received tests with both specimens simultaneously. Int. J. Infect. Dis. 94, 107–109 (2020)
11. Tsang, N.: Diagnostic performance of different sampling approaches for SARS-CoV-2 RT-PCR testing: a systematic review and meta-analysis. Lancet Infect. Dis. 21(9), 1233–1245 (2021)
12. Ashikawa, R.: Surgical anatomy of the nasal cavity and paranasal sinuses. Auris Nasus Larynx 9(2), 75–79 (1982)
13. Pruidze, P.: Performing nasopharyngeal swabs—guidelines based on an anatomical study. Clin. Anat. 34(6), 969–975 (2021)
14. Ming, L.: Spatial motion constraints in medical robot using virtual fixtures generated by anatomy. In: IEEE International Conference on Robotics and Automation, vol. 2, pp. 1270–1275 (2004)
15. Cui, J.: Extraction and research of crop feature points based on computer vision. Sensors 19(11), 2553 (2019)
16. Qiuping, L.: Automatic extraction of geometric feature lines in the tunnel from 3D point cloud. Eng. Surv. Mapp. 24(10), 1–4 (2015)
17. Walid, D.: Coarse to fine global RGB-D frames registration for precise indoor 3D model reconstruction. In: International Conference on Localization and GNSS, pp. 1–5 (2017)
18. Gaurav, G.: On RGB-D face recognition using Kinect. In: IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems, pp. 1–6 (2013)
19. Paul, V.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. I (2001)
20. Vahid, K.: One millisecond face alignment with an ensemble of regression trees. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–1874 (2014)
21. Heng, Y.: Face sketch landmarks localization in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 21, pp. 1321–1325 (2012)
22. Shome, A.: Driver drowsiness detection system using DLib. In: Proceedings of the 2nd International Conference on Advance Computing and Innovative Technologies in Engineering, pp. 193–197 (2022)
23. Luca, C.: Nostril detection for robust mouth tracking. In: IET Irish Signals and Systems Conference, pp. 239–244 (2010)
24. Peng, B.: Three-dimensional measurement of nostril positions based on stereo-camera. In: Advanced Materials Research, pp. 1470–1474 (2013)
25. Hong, H.: Research on 3D point-cloud registration technology based on Kinect V2 sensor. In: Chinese Control and Decision Conference, pp. 1264–1268 (2018)
26. Gateno, J.: The primal sagittal plane of the head: a new concept. Int. J. Oral Maxillofac. Surg. 45(3), 399–405 (2016)
27. Yanxi, L.: Robust midsagittal plane extraction from normal and pathological 3-D neuroradiology images. IEEE Trans. Med. Imaging 20(3), 175–192 (2001)
28. Callesen, R.: Optimal insertion depth for nasal mid-turbinate and nasopharyngeal swabs. Diagnostics 11, 1257 (2021)
29. Kordkiewski, J.: Mechanics of a rigid body. The University of Melbourne, Department of Mechanical and Manufacturing Engineering, Australia (2008)
A Hierarchical Model for Dynamic Simulation of the Fault in Satellite Operations Danni Nian(B) , Jiawei Wang, and Sibo Zhang Beijing Institute of Spacecraft System Engineering, Beijing 100094, China [email protected]
Abstract. Since static simulation methods cannot display the time-sequence variation and influence range of faults in satellite operations, this paper develops a dynamic fault simulation method based on a hierarchical model. This method adopts dynamic simulation propulsion to describe the time-sequence variation process of a fault via fault models at different levels. The effectiveness of this method has been verified by the fault simulation of a satellite energy security mode. The results show that our approach can not only effectively improve the precision of fault simulation and provide support for fault treatment, but also provide a reference for the design of satellite fault simulation systems.

Keywords: Satellite fault simulation · Hierarchical modeling · Dynamic simulation propulsion
1 Introduction

With the increasing number of on-orbit satellites and the increasing complexity of their applications, the number of faults increases and fault phenomena become more complex. Therefore, the simulation of complex satellite faults faces great challenges. A satellite on-orbit fault will cause performance degradation and function loss and bring huge losses to users, so it is necessary to handle faults quickly and accurately. Therefore, refined fault simulation has become essential in constructing satellite simulation systems.

At present, static simulation is the main approach for satellite fault simulation at home and abroad. The static simulation method sets satellite telemetry changes and simulates the instantaneous fault state according to a predefined plan, which can simulate the variation process of a fault in satellite operations [1]. However, due to the lack of simulation of the correlated time-sequence changes between satellite parts, static simulation is not conducive to real-time fault treatment. In this paper, a dynamic simulation method of satellite faults based on a hierarchical model is developed. This method can realize the continuous simulation of multi-layer complex faults, assist users in identifying the vulnerable links in the fault process, and simulate fault treatment in real time [2]. Meanwhile, this method can dynamically simulate the fault state and influence range in the time-sequence dimension and provide decision-making data support for users.
2 Dynamic Fault Simulation Approach

A satellite consists of multiple components, which are interrelated and have definite functions. When the satellite system fails, the state of each component changes. During the simulation, not only the current fault state of the component must be simulated, but the relevant telemetry changes must also be simulated according to the "influence relationship" between the components. According to whether the on-orbit fault mechanism is clear or not, faults can be divided into four classes; typical faults are shown in Table 1:

1) Faults that have occurred on orbit with clear conclusions;
2) Faults that have occurred on other satellites of the same type with clear conclusions;
3) Faults for which an instruction sequence for recovery can be designed;
4) Faults that can be simulated by modifying inputs (including software on the satellite and external excitation signals, etc.).
Table 1. Typical satellite faults

| No. | Fault type | Fault name | Simulation mode |
|-----|------------|------------|-----------------|
| 1 | Faults that have occurred on orbit | Short circuit fault of on-orbit bus; thermistor works abnormally; output current of solar array reduces; fault of minimum energy security mode | Dynamic simulation |
| 2 | Faults that have occurred on other satellites of the same type and have clear conclusions | Overvoltage fault of bus; abnormal startup of reaction wheel | Dynamic simulation |
| 3 | Faults for which an instruction sequence for recovery can be designed | Abnormal attitude control in normal mode; fault of 10 N thruster | Static simulation |
| 4 | Faults that can be simulated by modifying inputs | Abnormal uplink locking of spread spectrum transponder; loss of digital sun sensor SP | Static simulation |
2.1 Comparison of Two Simulation Methods

At present, the static simulation method is adopted for fault simulation. This method initially sets the fault state according to an existing fault plan or a fault report with clear causes, so as to realize a single, linear fault state. Starting from a fault example of the system, this method finds the telemetry state table related to the fault, lists the state of each component, and implements the fault state table in the simulation process. The static simulation method is targeted, the simulated phenomenon is single, and the dynamic change process of the satellite state after the fault is not simulated.

A satellite is a whole composed of multiple components, which are interrelated and have definite functions. In simulation applications, users pay more attention to the time-sequence change process of the satellite state and the influence range of the fault, so as to provide a decision-making basis for rapid and accurate fault treatment. The comparison of the two fault simulation methods is shown in Fig. 1.
Fig. 1. Comparison of fault simulation methods
1) The static fault simulation generally constructs the fault model at the component level. During the simulation, repeated single simulations are carried out without considering the fault evolution process; that is, the detailed simulation of continuous change at the subsystem and system levels is not realized. Taking a component-level fault as the starting point, the hierarchical dynamic fault simulation method establishes the dynamic correlation and transition relationships between fault states at both the subsystem and system levels, and advances the simulation step by step;
2) This paper focuses on the correlated time-domain characteristics and the dynamic transition simulation method between fault states at different levels. There is a series of events in the process of fault state change, and these events are composed of the components and states involved in the fault, which can be expressed by a satellite telemetry matrix. Besides, these events are connected according to a certain correlation, which is expressed by a transition matrix. The dynamic fault simulation method simulates the time-domain characteristics of fault states at different levels and grades.
2.2 Dynamic Fault Simulation Process

After defining the simulated satellite fault phenomenon, the dynamic fault simulation proceeds as shown in Fig. 2:

1) Taking the defined fault plan as input, the component-level fault simulation model is established, which is a matrix composed of component telemetry parameters;
2) The telemetry correlation model within the component is established to judge whether the fault affects other components of the subsystem. If it does, the dynamic correlation matrix is constructed;
3) According to the subsystem composition relationship, the subsystem fault state matrix is established, composed of the component fault matrix and the correlation matrix;
4) According to the system composition relationship, the entire system fault state matrix is established, composed of the subsystem fault matrix and the correlation matrix. The simulation is propelled through operations on the state matrix until the end of the simulation task, as sketched after Fig. 2.
Fig. 2. Dynamic fault simulation process
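A minimal sketch of the matrix-based propulsion described above (shapes, values, and the coupling matrix are hypothetical; in the real system the correlation matrices come from pre-configured correlation knowledge):

```python
import numpy as np

# Component-level fault state matrix: telemetry parameters of one component.
m_A = np.array([[0.0, 5.0, 22.5],
                [1.0, 3.3, 15.0]])      # hypothetical telemetry values

# Pre-configured component-to-subsystem correlation (transition) matrix.
T_sub = np.array([[0.9, 0.1],
                  [0.2, 0.8]])          # hypothetical coupling strengths

# Subsystem-level fault state matrix from propagating the component state.
m_B = T_sub @ m_A

# The same pattern repeats from subsystem to system level with another
# pre-configured incidence matrix, advancing the simulation step by step.
```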
3 Research on Key Technologies

3.1 Hierarchical Fault Modeling

At present, most existing satellite fault simulation models are built at the single-layer component level; they do not consider the time-sequence change process of the state after a fault and cannot completely and truly reflect the fault information. There are electrical, mechanical, and thermal connections between satellite subsystem components, along with information interactions, so faulty components will affect the normal operation of other equipment in the subsystem. Generally, a fault first appears on a single piece of equipment, which is called a component-level fault. This kind of fault usually does not bring serious consequences to the satellite. However, if the fault is not removed in time, it will spread and cause multiple faults in the subsystem, which may lead to the fault of the whole satellite. The satellite composition can be divided into the component level, subsystem level, and system level. According to the state and influence range when a fault occurs, the sequence of fault evolution can likewise be divided into component-level, subsystem-level, and system-level faults. According to the time-domain evolution process of the simulated fault, hierarchical fault models are established respectively. At the same time, the development path of the fault sequence is analyzed, and the sequence transition matrices of fault evolution at different levels are established to form the dynamic fault simulation method. Each level of fault corresponds to a fault state model, as shown in Fig. 3:
Fig. 3. Hierarchical fault modeling approach
1) Establishment of the component-level fault state model:
• First, the telemetry states of the satellite components in the normal state and the fault state are established, and the eigenvalues that describe the working state of the components are extracted to form the eigenvector for fault simulation;
• Second, the fault state configuration is set according to the eigenvector composed of the component telemetry matrix.
2) Establishment of the subsystem-level fault state model:
• The subsystem-level model describes the correlation characteristics among the components in the subsystem. The simulation transition matrix is constructed by pre-configuring the component correlation features to form the simulation propulsion process during the transition;
• The current subsystem fault state model is obtained by multiplying the component fault state matrix by the transition matrix.
3) Establishment of the system-level fault state model:
• The system is composed of subsystems, and the incidence matrix of the system is pre-configured according to empirical knowledge;
• According to the simulation task proposed by the user, the incidence values in the system incidence matrix can be modified. The final satellite fault state model is obtained by multiplying the fault state matrix by the transition matrix.
4) Hierarchical simulation propulsion process:
• When the component-level fault state model is established, it must be determined whether the current time parameters meet the conditions for entering subsystem-level fault simulation. If they do, the simulation enters the subsystem-level fault simulation layer; otherwise, it remains at the single level;
• Likewise, it must be determined whether the current time parameters meet the conditions for entering system-level fault simulation. If they do, the simulation enters the system-level fault simulation layer and feeds the system transition matrix back to the fault state simulation matrix to form the final simulation.

3.2 Dynamic Simulation Propulsion

In the process of satellite fault simulation, the propulsion simulation at different levels is realized by a differential dynamics model, and the time-sequence dynamic process corresponds to the differential dynamics equations. The propulsion simulation relationship between fault states at different levels is expressed in the form of one-to-one, one-to-many, or many-to-many mappings between telemetry nodes. Therefore, after determining the relationships between levels, the fault simulation process can be regarded as the superposition of three levels of simulation processes.
Assume that there is a dynamic correlation between component simulation state A and subsystem simulation state B. In the fault propulsion simulation from A to B, the telemetry points in A are connected with the relevant telemetry points in B, and the resulting state matrix is recorded as C. The sequence dynamic fault simulation process is regarded as the superposition of the simulation processes of state matrices A, B, and C. The dynamic fault simulation method is shown in Fig. 4:
Fig. 4. Dynamic simulation propulsion method
1) Firstly, the simulation state matrices A and B are modeled, and the state change functions of the telemetry nodes of fault state matrix A and fault state matrix B are constructed. A fault state matrix consists of the current telemetry states and their correlations, and each node corresponds to a telemetry parameter. The component-level fault state A is expressed as [1]:

$$m_{A,i}(t) = F_{A,i}(q_{A,1}(t), q_{A,2}(t), \cdots, q_{A,n}(t)) \tag{1}$$

Fault state B is expressed as [1]:

$$m_{B,i}(t) = F_{B,i}(q_{B,1}(t), q_{B,2}(t), \cdots, q_{B,n}(t)) \tag{2}$$

where $m_{S,i}(t), S \in \{A, B\}$ represents the state function of telemetry node $i$ in the fault state matrix $S$, $q_{S,j}(t)$ represents factor $j$ affecting that telemetry node in the fault state matrix $S$, and $F_{S,i}(\cdot)$ is the mapping function that maps the set of interacting influencing factors in the fault simulation process to the state of the telemetry node.

2) During the propulsion of the sequence simulation, the simulation model is established through the interaction relationship functions of the telemetry nodes between simulation state A and simulation state B, and the simulation propulsion process between different levels is constructed by using the properties of the solutions of differential equations [3].
The transition process from simulation state A to simulation state B is expressed as:

$$\begin{cases} \dfrac{dm_{A,i}(t)}{dt} = D_A(m_{A,j}(t), m_{B,j}(t)) \\ \dfrac{dm_{B,i}(t)}{dt} = D_B(m_{A,j}(t), m_{B,j}(t)) \end{cases} \tag{3}$$

3) In the fault propulsion model based on interrelated state matrices, the relationship between nodes in different levels of the state matrix is generally simplified to one-to-one. That is, when a telemetry node in simulation state A fails, the corresponding telemetry node in B fails with probability e. The parameters of the differential equations are used to reflect the one-to-one or other relationships and their impact in the simulation propulsion process.

4) For a single component, the sum of all component-level fault state matrices of the satellite is defined as $S_A(t) = \sum_i F_{A,i}(t)$, and the sum of all subsystem-level fault states is defined as $S_B(t) = \sum_i F_{B,i}(t)$. The interaction relationship between the two levels is reflected by the probability cross-term coefficients e, where $e_{11}, e_{22}$ represent the derived probabilities within a simulation level, and $e_{12}, e_{21}$ represent the derived probabilities between simulation levels. The simulation process of the satellite from a component-level fault to a subsystem-level fault is expressed as [3]:

$$\begin{cases} \dfrac{dS_A(t)}{dt} = S_A(t)[e_{11} + e_{12} S_B(t)] \\ \dfrac{dS_B(t)}{dt} = S_B(t)[e_{22} + e_{21} S_A(t)] \end{cases} \tag{4}$$
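Equation (4) is a small coupled ODE system, so the level-wise fault sums can be advanced with any explicit integrator. A forward-Euler sketch (all coefficients and initial values below are hypothetical):

```python
import numpy as np

def propagate_fault(S_A0, S_B0, e, dt=0.1, steps=100):
    """Forward-Euler integration of Eq. (4).

    e = (e11, e12, e21, e22): intra-level and cross-level derivation
    probabilities. Returns trajectories of the fault sums S_A(t), S_B(t).
    """
    e11, e12, e21, e22 = e
    S_A, S_B = [S_A0], [S_B0]
    for _ in range(steps):
        a, b = S_A[-1], S_B[-1]
        S_A.append(a + dt * a * (e11 + e12 * b))   # dS_A/dt = S_A(e11 + e12*S_B)
        S_B.append(b + dt * b * (e22 + e21 * a))   # dS_B/dt = S_B(e22 + e21*S_A)
    return np.array(S_A), np.array(S_B)

SA, SB = propagate_fault(S_A0=0.1, S_B0=0.01, e=(0.05, 0.4, 0.6, 0.02))
```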
4 Case Study

Taking the fault of a satellite entering the minimum energy security mode as an example, this paper uses the dynamic fault simulation method to display its occurrence, evolution, and treatment process. After the satellite enters the observation and control radian, the V-T curve of the power supply subsystem is switched from curve 3 to curve 2. Due to the large discharge current of the group A battery and the action of the T-V control circuit during curve switching, the battery changes from one-stage charging to two-stage charging. Finally, the battery charging current stays below 5 A for a duration of 90 min, which triggers the autonomous emergency management condition, and the satellite enters the minimum energy mode. The fault evolution relationship is shown in Fig. 5.

After the satellite charging controller fails, the battery charging current is abnormal. At the same time, the fault propagates to the data management subsystem, and the satellite enters the minimum energy mode. The fault is the consequence of a series of successive events caused by the fault of the component-level charging controller, which leads to the fault of the power subsystem and finally causes the satellite to enter the minimum energy mode. The dynamic simulation sequence relationship of the whole fault process is shown in Fig. 6:
Fig. 5. Evolution relationship of a satellite fault
Fig. 6. Sequence of fault simulation process
1) Simulation time $t_0$: the satellite is in the earth-shadow period, and the T-V curve adjustment of the group A and group B batteries is implemented according to the instruction. After the instruction is sent, the telemetry display of the T-V curve state is normal and consistent with the interpretation criteria, so the instruction is sent and implemented correctly. During this process, the satellite load is in operation. The current state is the initial state at the beginning of the simulation. We set the state matrix

$$m_A(t_0) = \begin{bmatrix} 0 & \cdots & 22.5 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 15 \end{bmatrix}$$

according to the initial values of each telemetry parameter of the charging controller;

2) Simulation time $t_1$: after the satellite leaves the observation and control radian and the T-V curve of the group A battery is switched, the satellite enters the illumination area from the earth-shadow area. The charging state of battery A turns to the stop-charging or two-stage charging state, and one-stage charging cannot be restored, which is the fault source. At the beginning of the fault, it only appears in the charging controller. We set the component-level fault state matrix

$$m_A(t_1) = \begin{bmatrix} 0 & \cdots & 22.5 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 15 \end{bmatrix}$$

according to the fault phenomenon;

3) Simulation time $t_2$: the fault of the charging controller causes the charging current of the battery to be less than 5 A, and the duration reaches 90 min. During this period, the fault of the charging controller has affected other components of the power subsystem, and telemetry parameters such as charging current and battery voltage change abnormally. The simulation correlation function can be expressed as

$$F_{B,i} = \begin{bmatrix} Q_{in} = 6.6 \cdot t \\ V_{N2} = U + (30 - P_{2N}) \cdot 1.6 \cdot 27 \times 10^{-3} \\ Q_{out} = I_{N3} \cdot T = \dfrac{P_{2N}}{V_{N2}} \cdot T \end{bmatrix}$$

according to the relevant formulas of the satellite power subsystem, and the power-subsystem-level fault state matrix is calculated as $m_B(t_2) = F_{B,i} \cdot m_A(t_1)$;

4) Simulation time $t_3$: the influence range of the component fault has expanded, so that the satellite state changes, the load stops working, the satellite maintains the minimum energy mode, and the satellite payload, GPS receiver, and other equipment are in the shutdown state. According to the formulas above, the state values in the simulation propulsion process are calculated to form the simulation-derived process.
5 Conclusions

In terms of engineering implementation, a dynamic fault propulsion simulation method based on hierarchical modeling is developed in this paper. The satellite fault process is divided into different levels: component fault, subsystem fault, and system fault. The simulation models corresponding to different time domains are designed, and the multi-time-domain evolution of the fault is realized by differential dynamic equations. This method helps improve the accuracy and fidelity of the simulation, is convenient for engineering implementation, and also provides a reference for fault treatment. The method in this paper only analyzes and models typical satellite faults and does not cover all fault types; therefore, much modeling work is still needed to simulate the fault evolution process of satellites on all platforms. In addition, the simulation propulsion method in this paper simplifies the correlation between satellite components; in the future, modeling and analysis should be carried out for the real, complex correlation between components to continuously improve the authenticity of the simulation.
References

1. Cui, T., Li, S., Zhu, B.: Construction space fault network and recognition network structure characteristic. Appl. Res. Comput. 36(8), 2400–2403 (2019)
2. Numfu, M., Riel, A., Noël, F.: Virtual reality based digital chain for maintenance training. Proc. CIRP 84, 1069–1074 (2019)
3. Wang, J., Kang, J., Zhou, K.: Fault recovery evolution technique based on FPGA. Comput. Eng. Sci. 40(12), 2120–2125 (2018)
4. Yao, R., Huang, S., Sun, K., et al.: A multi-timescale quasi-dynamic model for simulation of cascading outages. IEEE Trans. Power Syst. 31(4), 3189–3201 (2016)
5. Sun, Y., Liu, G.: Construction of underwater platform operational deduction system based on Web service and multi-Agent technology. Command Control Simul. 38(5), 90–95 (2016)
6. Cui, T., Ma, Y.: Discrete space fault tree construction and failure probability space distribution determination. Syst. Eng.-Theory Pract. 36(4), 1081–1088 (2016)
7. Li, B.C., Bo, M., Jie, C.: Maintenance and management of marine communication and navigation equipment based on virtual reality. Proc. Comput. Sci. 139, 221–226 (2018)
8. Liu, Z., Zhao, Q., Zhu, H., et al.: The virtual simulation system for training and demonstrating the design of the head-end of spent nuclear fuel reprocessing. Ann. Nucl. Energy 108, 310–315 (2017)
9. Liu, M., Zhu, Q., Zhu, J., et al.: The multi-level visualization task model for multi-modal spatio-temporal data. Acta Geodaetica Cartograph. Sinica 47(8), 1098–1104 (2018). https://doi.org/10.11947/J.AGCS.2018.20180104
Manipulation and Control
Design and Implementation of Autonomous Navigation System Based on Tracked Mobile Robot Hui Li, Junhan Cui, Yifan Ma, Jiawei Tan, Xiaolei Cao, Chunlong Yin, and Zhihong Jiang(B) School of Mechatronical Engineering, Beijing Advanced Innovation Center for Intelligence Robots and System, Beijing Institute of Technology, Beijing 100081, China [email protected]
Abstract. In this paper, we introduce an autonomous exploration and rescue robot system based on a tracked mobile robot platform equipped with a 7 degree-of-freedom (DoF) manipulator, which realizes autonomous navigation in indoor environments and autonomous stair climbing for safe and efficient search and rescue tasks. In Sect. 2, the hardware design of the robot system is presented, which allows flexible movement and high passability to complete obstacle crossing and stair climbing. In Sects. 3 and 4, the indoor navigation algorithm and the stair detection algorithm of the robot system are presented, respectively. The ROS-based system uses the Cartographer algorithm for map construction, the ROS navigation stack for autonomous navigation and obstacle avoidance, and a depth camera for stair detection. The stair-climbing process of the four-flipper tracked mobile robot is designed. The robot system is experimentally verified in Sect. 5.

Keywords: Tracked robot · SLAM · Path planning · Navigation · Autonomous stair climbing

1 Introduction
Mobile rescue robots play an extremely important role in search and rescue efforts for building fires and similar situations. When the building structure and fire conditions are unclear, firefighters who rashly enter a burning building for rescue may face huge unknown risks. In response to this situation, we have designed a robot that can autonomously search and rescue in an indoor environment. The robot can perform SLAM mapping, autonomous navigation, and movement in such indoor environments. In addition, it has the capability of autonomously climbing stairs to achieve autonomous exploration and rescue across multiple floors.

Supported by the National Natural Science Foundation of China (61733001, U22B2079, 61873039, 62103054, U1913211, U2013602, 62273049).
At present, indoor navigation methods for mobile robots are mostly based on wheeled robots [2], but wheeled robots have many limitations: they cannot complete some obstacle-crossing operations in indoor environments and cannot climb stairs to explore multiple floors. Much research on stair-climbing robots does not involve indoor navigation methods either. For example, the articles [1,5] propose autonomous stair climbing methods based on tracked mobile robots. In addition, some stair detection methods have not been applied to mobile robots, such as the stair detection algorithms proposed in [9,10] for disabled people carrying wearable devices.

In view of the existing problems, we use a tracked mobile robot with four flippers to carry out indoor rescue and exploration tasks. The tracked robot with high-power motors can move quickly in an indoor environment, and due to its differential configuration, it can turn around and change direction in a narrow space, so it has high dexterity and can complete obstacle avoidance tasks in narrow spaces. At the same time, because the tracked robot has flippers at the front and rear, it also has high passability while satisfying this dexterity and can complete tasks such as climbing stairs and crossing various obstacles.

Since the four-flipper tracked mobile robot has excellent performance, this paper proposes a method of autonomous mobile navigation and autonomous stair climbing based on it. The proposed method realizes the robot's autonomous navigation, obstacle avoidance, and autonomous stair climbing in the indoor environment, so that the robot can complete detection and rescue tasks in various complex environments.

This paper assumes that the robot is located in a multi-story indoor environment, and the task objectives are as follows: (i) detect the floor and generate a floor map; (ii) carry out rescue if a situation is found during the detection process, otherwise enter the stairwell and carry out detection on the next floor; (iii) detect stairs, complete the task of autonomous stair climbing, and enter the next floor for exploration.

In the second section, the hardware components of the whole robot system and the experimental environment are described. In the third and fourth sections, the autonomous navigation algorithm and the autonomous stair-climbing method of the robot are introduced. In the fifth section, the process and results of the whole experiment are presented.
2 Robot System Design
The four-flipper tracked mobile robot designed in this paper is used for rescue in simulated fire buildings. The robot system consists of a four-flipper tracked robot platform, a 7-DoF redundant manipulator, and sensors. The overall structure of the robot is shown in Fig. 1. The robot system is driven by a vehicle-mounted 48 V lithium battery, weighs 180 kg, and adopts a configuration scheme combining a four-flipper tracked robot platform with a humanoid manipulator. In the four-flipper tracked platform designed in this paper, two main tracks provide power, and four flippers
Fig. 1. The overall appearance of the robot.
are used as auxiliaries, which give the tracked platform high passability and stability. The tracked platform is 986 mm long, 731 mm wide, and 306 mm high, and the flipper length is 623 mm. The left and right tracks are driven by two 2000 W motors, and the flippers are driven by two 400 W motors. The industrial computer of the robot system is an Nvidia Jetson AGX Xavier, running Ubuntu 18.04 and ROS Melodic. The entire robot system is developed based on ROS to implement algorithms such as SLAM mapping, autonomous navigation, and stair detection. The hardware framework and communication method of the robot system are shown in Fig. 2. The sensors used in the navigation and detection algorithms are a Velodyne VLP-16 lidar, an Xsens MTI-630 IMU, and a Microsoft Kinect v2 depth camera. The lidar and IMU are used for SLAM mapping, mobile navigation, and obstacle avoidance; the IMU is also used for state feedback when the robot climbs stairs; and the depth camera is used for stair detection and recognition.
Fig. 2. The hardware framework and communication method of the robot system.
3 Mapping and Navigation Algorithm

In this paper, the Cartographer algorithm developed by Google is adopted to complete the SLAM mapping of the robot in the indoor environment. Based on the ROS system, the Cartographer localization and the move_base framework [2] are used to realize the mobile navigation of the robot in the indoor environment.

3.1 Cartographer Algorithm
The Google Cartographer algorithm adopts the theoretical framework of SLAM based on graph optimization. It is mainly composed of Local SLAM and Global SLAM. The Local SLAM is responsible for scan-to-submap matching and submap insertion, the Global SLAM is responsible for optimizing the pose estimation, and the branch-and-bound method is used to complete global loop closure detection. The mathematical model of the Cartographer algorithm is expressed as:

$$x_k = f(x_{k-1}, u_k) + w_k \tag{1}$$

$$z_{k,j} = h(y_j, x_k) + v_{k,j} \tag{2}$$
Equation (1) is called the motion equation: the current position $x_k$ is calculated from the position $x_{k-1}$ at the previous moment and the sensor data $u_k$. Equation (2) is called the observation equation, which represents the observation data $z_{k,j}$ generated when the landmark $y_j$ is observed at the position $x_k$.

The Cartographer algorithm introduces the concept of a submap in the Local SLAM part. When the Local SLAM module receives a new frame of lidar data, it uses odometry and IMU data to calculate the trajectory and obtain a robot pose estimate. This pose estimate is then used as the initial value for matching against the newly established submap, completing the update of the pose estimate. The filtered frame data is inserted into the best position of the submap to form a new submap. When a new laser data frame arrives, its coordinates must be transformed from the laser frame to the submap frame before insertion:

$$T_\xi p = \begin{pmatrix} \cos\xi_\theta & -\sin\xi_\theta \\ \sin\xi_\theta & \cos\xi_\theta \end{pmatrix} p + \begin{pmatrix} \xi_x \\ \xi_y \end{pmatrix} \tag{3}$$

In the front end of the Cartographer algorithm, scan matching is based on the scan-to-map correlation method [8], modeled by maximum likelihood estimation:

$$x_k^* = \arg\max_{x_k} \{p(z_k|x_k, y_k)\, p(x_k|x_{k-1}, u_k)\} \tag{4}$$
Since the laser points of each frame are independent of each other, they can be considered uncorrelated; then:

$$p(z_k|x_k, y_k) = \prod_i p(z_k^i|x_k, y_k) \propto \sum_i \log p(z_k^i|x_k, y_k) \tag{5}$$
Equation (5) splits the probability of an entire frame of observation data into the probability of each point of the current frame. Before calculating this probability, the data frame and the map need to be discretized. The process for updating an occupancy grid map is as follows:

$$\text{odds}(x) = \frac{p(x)}{1 - p(x)} \tag{6}$$

$$M_{\text{new}}(x) = \text{clamp}\left[\text{odds}^{-1}\left(\text{odds}(M_{\text{old}}(x)) \cdot \text{odds}(P_{\text{hit}})\right)\right] \tag{7}$$
When no more data frames can be inserted, the submap is complete. However, maps generated by the Local SLAM module alone accumulate a large error over time. The Global SLAM module is responsible for optimizing the pose estimation [7], and the branch-and-bound method is used to complete global loop closure detection [6].
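Equations (6)-(7) amount to a multiplicative odds update per grid cell. A minimal sketch follows; the clamp bounds and hit probability here are assumptions of this sketch, not Cartographer's actual defaults:

```python
def odds(p):
    return p / (1.0 - p)

def inv_odds(o):
    return o / (1.0 + o)

def update_cell(p_old, p_hit=0.55, p_min=0.1, p_max=0.9):
    """Occupancy update of Eqs. (6)-(7): multiply odds, invert, clamp."""
    p_new = inv_odds(odds(p_old) * odds(p_hit))
    return min(max(p_new, p_min), p_max)

# A cell at probability 0.5 observed as a hit moves to 0.55.
p = update_cell(0.5)
```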
3.2 Trajectory Planning Algorithm
The whole path planning process is divided into two parts. First, a general path is planned by the global path planner; then the local path planner divides it into several small segments and performs local path planning. The advantage is that obstacles saved on the map can be avoided during global planning, while new obstacles (including dynamic obstacles) can be avoided during local path planning. Global path planning and local path planning cooperate to complete the navigation together. The Dijkstra algorithm and the A* algorithm are used for global path planning, and the Dynamic Window Approach (DWA) is used for local path planning [4].

Global Path Planner. The global path planner mainly receives Cartographer localization information, global map information, and navigation end-point information. The Dijkstra algorithm and the A* algorithm are mainly used for shortest path planning in 2D grid maps, and the global path planning algorithm used in this paper is the A* algorithm.

The Dijkstra algorithm is a typical breadth-first shortest path search algorithm, which is used to calculate the shortest path from a specific vertex to all other vertices in a graph or network. The main feature of this algorithm is that it starts from one vertex and expands layer by layer until the expansion covers all vertices. The Dijkstra algorithm is essentially a greedy algorithm, and the key is to obtain a locally optimal solution at each step in the hope of reaching a globally optimal solution. The Dijkstra algorithm calculates the shortest path from the starting vertex to every vertex in the graph; such a search is blind and consumes a lot of computing power and time. To remedy the blindness of the search direction in the Dijkstra algorithm, the A* algorithm introduces a heuristic function. The A* algorithm calculates the priority of each vertex by Eq. (8):

$$f(n) = g(n) + h(n) \tag{8}$$
In the heuristic function of the A* algorithm, $f(n)$ is the overall priority of node $n$; when selecting the next node to traverse, the node with the highest overall priority (least cost) is selected. $g(n)$ is the cost of the node from the starting point, and $h(n)$ is the estimated cost from node $n$ to the end point. The heuristic function $h(n)$ determines the behavior of the A* algorithm. When $h(n) = 0$, the A* algorithm degenerates into the Dijkstra algorithm; when $h(n)$ is much greater than $g(n)$, the A* algorithm becomes a greedy best-first search, and the output cannot be guaranteed to be the shortest path; when $h(n)$ is always less than or equal to the true cost from node $n$ to the end point, the A* algorithm is guaranteed to find the shortest path. The navigation map used in this paper is a 2D grid map generated by the Cartographer algorithm,
Algorithm 1: A* algorithm
Input: Start, End. Output: Path.
1  initialize Start and End;
2  while True do
3      if open list == NULL then
4          search failed, return;
5      end
6      take the node with the smallest g(n) + h(n) from the open list;
7      add the node to the closed list;
8      if current node == End then
9          path found, return;
10     end
11     traverse the neighbors of the current node that are not in the closed list;
12     if neighbor in open list then
13         update g(n) of the neighbor;
14     else
15         calculate g(n) of the neighbor; add it to the open list;
16     end
17 end
The Euclidean distance from the current point to the target point can therefore be used as the heuristic function h(n):

h(n) = \sqrt{(p_2.x - p_1.x)^2 + (p_2.y - p_1.y)^2}   (9)

Under the same grid map, the Dijkstra algorithm and the A* algorithm are used for path planning, as shown in Fig. 3 and Table 1.

Table 1. Dijkstra and A* algorithm comparison

                   | Trajectory length           | Planning time
                   | Point 1 | Point 2 | Point 3 | Point 1  | Point 2  | Point 3
Dijkstra algorithm | 192.0   | 220.4   | 255.9   | 22.309 s | 30.545 s | 33.493 s
A* algorithm       | 192.0   | 220.4   | 255.9   | 2.065 s  | 8.908 s  | 13.629 s
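To make Algorithm 1 concrete, the sketch below implements A* on a 2D occupancy grid with the Euclidean heuristic of Eq. (9). It is a minimal illustration under stated assumptions (grid cells encoded 0 = free, 1 = obstacle; unit cost per 4-connected step), not the authors' implementation.

```python
import heapq, itertools, math

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid, following Algorithm 1 with
    the Euclidean heuristic of Eq. (9)."""
    h = lambda p: math.hypot(goal[0] - p[0], goal[1] - p[1])
    tie = itertools.count()               # tie-breaker for equal f values
    open_list = [(h(start), next(tie), start, None)]
    g_cost, parent = {start: 0.0}, {}
    while open_list:                      # empty open list => search failed
        f, _, node, prev = heapq.heappop(open_list)
        if node in parent:
            continue                      # already in the closed list
        parent[node] = prev
        if node == goal:                  # walk back to recover the path
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        x, y = node
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nb[0] < len(grid) and 0 <= nb[1] < len(grid[0])
                    and grid[nb[0]][nb[1]] == 0 and nb not in parent):
                ng = g_cost[node] + 1.0   # unit cost per grid step
                if ng < g_cost.get(nb, float("inf")):
                    g_cost[nb] = ng
                    heapq.heappush(open_list, (ng + h(nb), next(tie), nb, node))
    return None
```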
The simulation shows that although the paths generated by the two planners are not exactly the same, their lengths are identical, and both are shortest paths. In terms of planning time, that of the A* algorithm is much
Fig. 3. Dijkstra algorithm and A* algorithm comparison.
lower than that of the Dijkstra algorithm. It can therefore be concluded that the A* algorithm greatly shortens the planning time without degrading the path planning performance.

Local Path Planner. The local path planner receives the occupancy grid map generated by the mapping algorithm and the global path generated by the global path planner, completes the local path planning based on DWA, and outputs the velocity commands for the tracked mobile platform. The DWA algorithm is a typical action-sampling algorithm that can avoid new and dynamic obstacles, solving the robot's local obstacle avoidance problem. DWA considers the motion constraints on the robot's velocity and acceleration: it forms a velocity space composed of linear and angular velocity, and samples this space to simulate candidate
trajectories, scores each simulated trajectory with a predetermined cost function, and selects the trajectory with the highest score as the output. The velocity command is then derived from the selected trajectory. Choosing the velocity space involves three main constraints:

– Both the linear and angular velocity of the robot are bounded by maximum and minimum values.
– The dynamic performance of the motors is limited, as is the motor power that drives the robot.
– Braking safety distances impose further restrictions.

The kinematic model of the tracked mobile platform adopts a differential-drive model, i.e., the platform can only move forward, move backward, and rotate. The kinematic model is shown in Eq. (10):

x(t_n) = x(t_0) + \int_{t_0}^{t_n} v(t)\cos(\theta(t))\,dt
y(t_n) = y(t_0) + \int_{t_0}^{t_n} v(t)\sin(\theta(t))\,dt   (10)
\theta(t_n) = \theta(t_0) + \int_{t_0}^{t_n} \omega(t)\,dt

Under control over several discrete time steps \Delta t, the state of the tracked mobile platform evolves as:

x(t_n) = x(t_0) + \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (v(t_i) + \dot{v}_i\Delta t)\cos\big(\theta(t_i) + \omega(t_i)\Delta t + \tfrac{1}{2}\dot{\omega}_i(\Delta t)^2\big)\,dt
y(t_n) = y(t_0) + \sum_{i=0}^{n-1}\int_{t_i}^{t_{i+1}} (v(t_i) + \dot{v}_i\Delta t)\sin\big(\theta(t_i) + \omega(t_i)\Delta t + \tfrac{1}{2}\dot{\omega}_i(\Delta t)^2\big)\,dt   (11)

At each time step \Delta t, the linear and angular velocities are sampled to generate the simulated trajectories, as shown in Fig. 4:

v(t_i) \in \{v_{min}, v_{min}+\Delta v, \cdots, v_{max}\}, \quad v_{min,max} = v(t_{i-1}) \mp \dot{v}_{max}\Delta t   (12)
\omega(t_i) \in \{\omega_{min}, \omega_{min}+\Delta\omega, \cdots, \omega_{max}\}, \quad \omega_{min,max} = \omega(t_{i-1}) \mp \dot{\omega}_{max}\Delta t

The generated trajectories are evaluated and scored with the evaluation function:

G(v, \omega) = \sigma(\alpha \cdot heading(v, \omega) + \beta \cdot dist(v, \omega) + \gamma \cdot velocity(v, \omega))   (13)
In (13), heading(v, ω) is the azimuth evaluation term, which evaluates the gap between the end orientation of the trajectory and the direction toward the target; in general, choosing the trajectory with the smaller azimuth error makes the robot travel as straight as possible. dist(v, ω) is the trajectory distance evaluation term, representing the distance between the trajectory and the nearest obstacle; a higher value indicates a safer trajectory. velocity(v, ω) is the speed evaluation term. α,
Fig. 4. Simulated trajectory after velocity and angular velocity sampling.
β, and γ are the weighting coefficients of the three evaluation terms. The physical meaning is that the mobile platform is expected to avoid obstacles, face the target, and drive at high speed. The DWA algorithm flow is summarized in Algorithm 2; the simulation results are shown in Fig. 5.
Algorithm 2: DWA algorithm
Input: Start, End. Output: Speed control.
1  initialization (maximum and minimum velocity of the mobile platform, evaluation function weights, etc.);
2  while True do
3      if arrived then
4          target point reached, return;
5      end
6      calculate the velocity range of the current sample (dynamic window);
7      traverse all (v, ω) and simulate trajectories;
8      score each trajectory with the evaluation function;
9      select the optimal (v, ω) and send it to the mobile platform;
10     move;
11 end
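The following sketch condenses Algorithm 2 into a single DWA cycle: it builds the dynamic window of Eq. (12), rolls each (v, ω) pair out with the model of Eq. (10), and scores the rollouts with Eq. (13). All configuration keys, limits, and weights here are illustrative assumptions, not the authors' parameters.

```python
import math
import numpy as np

def dwa_step(state, v_prev, w_prev, goal, obstacles, cfg):
    """One DWA cycle: sample (v, w) from the dynamic window, simulate
    each rollout, and return the best-scoring velocity command."""
    best, best_score = (0.0, 0.0), -np.inf
    v_lo = max(cfg["v_min"], v_prev - cfg["a_max"] * cfg["dt"])
    v_hi = min(cfg["v_max"], v_prev + cfg["a_max"] * cfg["dt"])
    w_lo = max(cfg["w_min"], w_prev - cfg["aw_max"] * cfg["dt"])
    w_hi = min(cfg["w_max"], w_prev + cfg["aw_max"] * cfg["dt"])
    for v in np.arange(v_lo, v_hi, cfg["dv"]):
        for w in np.arange(w_lo, w_hi, cfg["dw"]):
            x, y, th = state
            traj = []
            for _ in range(int(cfg["horizon"] / cfg["dt"])):
                x += v * math.cos(th) * cfg["dt"]   # Eq. (10), discretized
                y += v * math.sin(th) * cfg["dt"]
                th += w * cfg["dt"]
                traj.append((x, y))
            # heading term: prefer end poses that face the goal
            err = math.atan2(goal[1] - y, goal[0] - x) - th
            err = (err + math.pi) % (2 * math.pi) - math.pi
            heading = math.pi - abs(err)
            # dist term: clearance to the nearest obstacle along the rollout
            dist = (min(math.hypot(px - ox, py - oy)
                        for px, py in traj for ox, oy in obstacles)
                    if obstacles else float("inf"))
            score = cfg["alpha"]*heading + cfg["beta"]*dist + cfg["gamma"]*v
            if dist > cfg["r_robot"] and score > best_score:
                best, best_score = (v, w), score
    return best     # velocity command sent to the tracked platform
```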
Fig. 5. DWA algorithm simulation.
4 Autonomous Stair Climbing Algorithm
In this paper, we assume a scenario in which the robot needs to move to the next floor after exploring one floor. Arriving near the stairs, once the depth camera carried by the robot detects them, the robot checks the stair data output by the stair detection algorithm. If the length, width, and height of the stairs meet the conditions for climbing, the robot navigates to a point directly in front of the first step, assumes the climbing preparatory posture, and then begins to climb the stairs autonomously. This section introduces the stair detection algorithm and the robot's climbing strategy.

4.1 Stair Detection Algorithm
The stair detection algorithm is based on the point clouds captured by the Kinect depth camera, and again uses ROS as the framework, processing the point cloud data with the Point Cloud Library (PCL). The point clouds captured by the Kinect camera are very large, so to avoid slowing down the algorithm, pre-processing is necessary: voxel filtering and statistical filtering are applied to compress the data, reduce the computational load, and smooth the surface by eliminating outlier points. After preprocessing, the stair detection algorithm is carried out. First, the point cloud is segmented into all the planar regions of the scene; these planar regions are then classified, and the regions identified as "steps" are modeled. Specifically, the algorithm proceeds as follows (Fig. 6):
Fig. 6. Point cloud processing methods: (a) normal estimation, (b) region growth, (c) planar test, (d) plane clustering, (e) plane classification, (f) stair modeling.
Normal Estimation. The normal estimation [11] is based on a local surface fitting method. For any point p_i in the point cloud, its neighborhood p_j ⊂ Nbhd(p_i) is obtained by a k-nearest-neighbor search (k is set to 16 in this paper), and the centroid of the neighborhood is calculated:

o_i = \frac{1}{k}\sum_{j=1}^{k} p_j   (14)
After obtaining the centroid of the neighborhood, the covariance matrix is constructed as:

cov = \frac{1}{k}\sum_{p_j \subset Nbhd(p_i)} (p_j - o_i)(p_j - o_i)^T   (15)

SVD decomposition is performed on the covariance matrix, and the eigenvector corresponding to the smallest eigenvalue is the normal n_i of the point.
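A compact sketch of Eqs. (14)–(15) plus the decomposition step is given below. It uses a SciPy k-d tree for the neighbor search and is only an illustration of the method, not the PCL-based code the paper actually uses; for a symmetric covariance matrix, an eigen-decomposition yields the same normal as SVD.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=16):
    """Per-point normals via local surface fitting: for each point,
    take its k nearest neighbours, build the covariance of the
    neighbourhood about its centroid (Eqs. (14)-(15)), and take the
    eigenvector of the smallest eigenvalue as the normal n_i."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nb in enumerate(idx):
        q = points[nb] - points[nb].mean(axis=0)   # centre on o_i
        cov = q.T @ q / k                           # Eq. (15)
        w, v = np.linalg.eigh(cov)                  # ascending eigenvalues
        normals[i] = v[:, 0]                        # smallest -> normal n_i
    return normals
```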
Region Growth. Regions are obtained with the region growing method, which starts with a seed (the point with minimum curvature) and expands the region toward neighboring points whose normals form a small angle with the seed's normal and whose curvature values are similar. The neighboring points that satisfy the normal and curvature thresholds become the new seeds, and this process repeats until the region can no longer expand. A new initial seed is then selected among the remaining points, and the procedure restarts until the remaining regions are smaller than a given threshold. This method divides the point cloud into several regions, which is convenient for the subsequent detection and classification of each region.

Planar Test. The detected objects are steps, floors, and walls, all of which are flat, so a planar test is required for the regions produced by the region growing method. Due to the way the region growing algorithm works, most regions are highly flat, but it cannot be ruled out that a region is a surface of small curvature. The RANSAC algorithm [3] is used to perform the plane test by finding the largest plane in each region; if the number of outlier points is below a threshold, the region is considered a plane and the fitted plane equation is obtained.

Plane Clustering. Plane clustering is performed according to the positional relationship of the region planes. If the difference between the plane normal vectors of two regions is small, their vertical distance is small, and their boundaries are adjacent, they are merged into one planar region. At the same time, according to the direction of the plane normal vector, only the planes horizontal or perpendicular to the ground are retained; other planes are regarded as obstacles and excluded from the subsequent plane classification.

Plane Classification. The remaining regions are all horizontal or vertical planes, which are divided into different categories according to the normal vector and relative position of each plane. Planes close to 0° to the ground are preliminarily labeled "step", and planes close to 90° to the ground are preliminarily labeled "wall". These regions are then further divided by their positional relations: areas with a plane height of approximately zero are classified as floor, and areas with positive or negative heights are classified as steps. This paper sets the step height range to a minimum of Hmin = 13 cm and a maximum of Hmax = 18 cm, with an allowable measurement error Htol = 3 cm, so a plane more than Hmin − Htol = 10 cm above the ground is regarded as a candidate step plane. Once step candidate planes exist, connections must be established between the steps, which eliminates non-step planes and also organizes the steps into a hierarchy. The strategy of this paper is to analyze each step from the bottom up (a code sketch of the height screening follows this list):

– First, the first step is screened out: a plane qualifies if its height lies between Hmin − Htol = 10 cm and Hmax + Htol = 21 cm. If several planes satisfy this condition, connectivity of the plane to the upper and lower layers is additionally required.
– Once the first step is in place, the algorithm selects the remaining step candidates by height and tests connectivity and height conditions to determine whether each belongs to the current layer or to a new layer.
– When all candidates have been checked, if the number of levels is greater than 1, the algorithm starts modeling the stairs and outputs the stair parameters (length, width, and height).

Stair Modeling. This paper adapts the method of [10] to model the stairs. The method makes full use of the geometric relationship between the planes to constrain the length, width, and height of the stairs, uses the PCL library to draw and display the stair model, and outputs the stair parameters at the same time.
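The height screening referenced in the list above can be sketched as follows. The plane representation and the helper function are hypothetical simplifications introduced for this example, and the inter-layer connectivity tests are omitted for brevity.

```python
H_MIN, H_MAX, H_TOL = 0.13, 0.18, 0.03   # step height range from the paper

def classify_planes(planes):
    """Split horizontal planes into floor and step candidates.

    `planes` is a hypothetical list of (height_above_ground_m,
    is_horizontal) tuples produced by the plane clustering stage."""
    floors, steps = [], []
    for h, horizontal in planes:
        if not horizontal:
            continue                      # near-vertical planes -> walls
        if abs(h) < H_MIN - H_TOL:
            floors.append(h)              # height ~ 0 -> floor
        else:
            steps.append(h)               # candidate step plane
    # first step: height between 10 cm and H_MAX + H_TOL = 21 cm
    first = [h for h in steps if H_MIN - H_TOL <= h <= H_MAX + H_TOL]
    return floors, sorted(steps), first
```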
4.2 Stair Climbing Strategy
Through the stair detection algorithm of the previous section, the length, width, and height of the stairs (L_stair, W_stair, H_stair) and the coordinates of the center point of the first step in the depth camera coordinate system (P_stair^Camera) can be obtained. If Eq. (16) is satisfied, the robot can begin to climb the stairs (Fig. 7).
Fig. 7. Simulation scene graph.
\theta_{stair} = H_{stair}/W_{stair} < \theta_{max} = 40°
L_{stair} > 1.5\,W_{robot}   (16)
H_{stair} < H_{max} = 50\ cm
If the detected stairs meet the climbable criteria, a navigation point P_nav^Robot is published; the robot navigates to this target point and enters the ready-to-climb state. The navigation point is determined by:

P_nav^{Camera} = P_stair^{Camera} + d\,n, \quad P_nav^{Robot} = T_{Camera}^{Robot}\,P_nav^{Camera}   (17)
where P_nav^Robot is the coordinate of the navigation point in the robot coordinate system; P_nav^Camera is the coordinate of the navigation point in the camera coordinate system; P_stair^Camera is the coordinate of the first step in the camera coordinate system; n is the unit normal vector of the vertical plane of the first step in the camera coordinate system; T_Camera^Robot is the transformation matrix from the camera coordinate system to the robot coordinate system; and d is the distance directly in front of the stairs at which the navigation point is placed. After the robot reaches the preparatory climbing point, it begins to climb the stairs; the process is shown in Fig. 8.
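A small sketch of Eq. (17) is given below; the stand-off distance d = 0.6 m and the example input values are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def navigation_point(p_stair_cam, n_cam, T_robot_cam, d=0.6):
    """Compute the climbing preparation point of Eq. (17): offset the
    first-step centre by d along the step's outward normal, then map
    the result from the camera frame to the robot frame."""
    p_nav_cam = np.asarray(p_stair_cam) + d * np.asarray(n_cam)
    p_h = np.append(p_nav_cam, 1.0)          # homogeneous coordinates
    return (T_robot_cam @ p_h)[:3]           # P_nav in the robot frame

# Example with an identity camera-to-robot transform:
T = np.eye(4)
print(navigation_point([1.8, 0.0, 0.2], [-1.0, 0.0, 0.0], T))
```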
Fig. 8. Preparation and ascent phases.
As shown in Fig. 8(a), the robot navigates to the front of the stairs and adjusts the inclination angle of the flippers to the inclination angle of the stairs. The robot advances until it reaches the state of Fig. 8(c), where the whole robot clings to the steps; at this point the robot's inclination equals the stair angle and exceeds the threshold angle, so the front flippers return to the horizontal position. As shown in Fig. 8(d), the robot's flippers are placed horizontally, increasing the robot's effective length on the stairs and improving climbing stability. During the landing phase, control of the front flippers lets the robot land smoothly after crossing the last step. The strategy adopted in this paper is to control the front flipper motor with a current loop during the landing stage, and the
Fig. 9. Landing phase.
advantage is that the front flippers can sag naturally, moving the robot's center of gravity as far forward as possible before landing; at the moment of landing, the front flippers stay close to the ground to cushion the impact, and they can rotate slowly to let the robot land smoothly. As shown in Fig. 9(a), when the stair detection algorithm can no longer detect the stairs, the robot is about to enter the landing state, and the front flippers enter damping mode and begin to press against the ground. Equation (18) gives the expected current of the front flipper motor, chosen so that the flippers just overcome the friction of the reducer and rotate, keeping them in contact with the ground. After crossing the equilibrium point, the damping of the front flippers effectively reduces the robot's falling speed.

I_{ref} = \frac{(T_f + \Delta T) - D\dot{q}}{K_T N}   (18)
where K_T is the motor torque constant, N is the reduction ratio of the reducer, T_f is the friction torque of the flipper, ΔT is the margin by which the applied torque exceeds the friction torque, q̇ is the angular velocity of the flipper, and D is the damping coefficient.
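Equation (18) is a one-line computation; the sketch below shows it with placeholder motor parameters, since the paper does not report its constants.

```python
def landing_current(q_dot, K_T=0.05, N=50.0, T_f=0.8, dT=0.1, D=0.5):
    """Reference current of Eq. (18) for the front-flipper motor during
    landing. All numeric parameters are illustrative placeholders."""
    return ((T_f + dT) - D * q_dot) / (K_T * N)

# Faster flipper rotation (larger q_dot) lowers the commanded current,
# which is exactly the damping behaviour described above.
print(landing_current(0.0), landing_current(1.0))  # 0.36, 0.16
```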
5 Experiment
The experimental part of this paper mainly simulates the detection process of the robot on one floor (SLAM mapping and navigation) and the process of climbing the stairs to the next floor after discovering stairs.
5.1 SLAM Mapping and Navigation Experiment
The experimental scene is in a circular corridor outside the laboratory, where the robot will perform 2D SLAM mapping based on cartographer. The experimental results are shown in Fig. 10:
Fig. 10. Cartographer map and actual scene.
Table 2. Cartographer measurement data

Index | Map value/m | Measured value/m | Absolute error/m | Relative error/%
1     | 6.01        | 6.18             | 0.17             | 2.75
2     | 9.98        | 10.15            | 0.17             | 1.67
3     | 10.04       | 10.15            | 0.11             | 1.08
4     | 10.10       | 10.15            | 0.05             | 0.49
5     | 10.00       | 10.15            | 0.15             | 1.48
6     | 13.80       | 13.93            | 0.13             | 0.93
7     | 10.38       | 10.63            | 0.25             | 2.35
8     | 10.38       | 10.50            | 0.12             | 1.14
9     | 14.65       | 14.86            | 0.21             | 1.41
10    | 2.00        | 1.95             | 0.05             | 2.56
From Table 2, the average relative error of the robot system's 2D map in the indoor environment is 1.57% and the maximum relative error is 2.75%, with wall and obstacle boundaries sharply demarcated. After the SLAM mapping is completed, unmanned navigation is carried out indoors; the experiment consists of fixed-point
navigation and obstacle avoidance tests. In the navigation and obstacle avoidance experiment, the robot avoided both the obstacles originally on the map and newly introduced obstacles, smoothly reached the designated target point, and completed the navigation and obstacle avoidance task in the indoor environment (Fig. 11).
Fig. 11. Navigation and obstacle avoidance experiment diagram.
In this experiment, the robot completed the SLAM mapping of the indoor corridor, planned its trajectory, and autonomously navigated to the target point, which verifies the robot's capability for unmanned autonomous exploration in indoor environments.
5.2 Stair Detection Experiment
After completing the exploration of the current floor, the robot moves toward the staircase. While approaching, the stair detection algorithm detects the stairs and computes their length, width, and height, as well as the coordinates of the stairs' center point in the camera coordinate system (Fig. 12).
Fig. 12. Stair detection experiment.

Table 3. Stair detection data

                          | Length/m | Width/m | Height/m | Angle/°
Algorithm detection value | 1.645    | 0.287   | 0.161    | 29.29
Measured value            | 1.600    | 0.270   | 0.150    | 29.05
Absolute error            | 0.045    | 0.017   | 0.011    | 0.24
Relative error/%          | 2.81     | 6.29    | 7.33     | 0.83
Table 3 shows that the absolute error between the detected and actual stair parameters is within 5 cm and the relative error within 8%; the error of the calculated stair inclination is 0.24°, a relative error of only 0.83%, indicating that the stair detection algorithm in this paper has high accuracy.

5.3 Stair Climbing Experiment
The stair parameters obtained by the stair detection algorithm satisfy the condition of Eq. (16), i.e., the condition for climbing, so the robot can climb the stairs autonomously. The robot navigates to the climbing preparation point and starts to climb. The experimental process is shown in Figs. 13 and 14. The experiments show that the proposed autonomous stair climbing algorithm for the four-flipper tracked mobile robot effectively accomplishes autonomous stair climbing (Fig. 15).
Fig. 13. Preparation and ascent phases.
Fig. 14. Landing phase.
Fig. 15. Angle of robot.
6 Summary
In this paper, a tracked rescue robot system is designed for building rescue scenarios; it can perform unmanned autonomous detection in indoor environments and can detect stairs to accomplish autonomous stair climbing. Based on the cartographer algorithm, SLAM mapping in indoor environments is realized using lidar and an IMU, and the autonomous navigation function of the tracked mobile robot is realized on ROS. The stair detection algorithm of [10] is improved and applied to stair detection by a mobile robot while driving, achieving higher accuracy and stability. Finally, combining the stair detection algorithm with the robot navigation algorithm, a stair climbing strategy for the four-flipper tracked robot platform is proposed, which enables the tracked mobile robot to climb stairs autonomously. The robot system designed in this paper provides a new tool for search and rescue in building fires and helps protect people's lives and property.
References
1. Chen, B., Wu, J., Wang, F., Yang, D., Zhang, W.: Motion planning for autonomous climbing stairs for flipper robot. In: 2020 IEEE International Conference on Real-time Computing and Robotics (RCAR), pp. 531–538 (2020). https://doi.org/10.1109/RCAR49640.2020.9303039
2. Deng, Y., Shan, Y., Gong, Z., Chen, L.: Large-scale navigation method for autonomous mobile robot based on fusion of GPS and lidar SLAM. In: 2018 Chinese Automation Congress (CAC), pp. 3145–3148 (2018). https://doi.org/10.1109/CAC.2018.8623646
3. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
4. Fox, D., Burgard, W., Thrun, S.: The dynamic window approach to collision avoidance. IEEE Robot. Autom. Mag. 4(1), 23–33 (1997)
5. Guo, J., Shi, J., Zhu, W., Wang, J.: Approach to autonomous stair climbing for tracked robot. In: 2017 IEEE International Conference on Unmanned Systems (ICUS), pp. 182–186 (2017). https://doi.org/10.1109/ICUS.2017.8278337
6. Hess, W., Kohler, D., Rapp, H., Andor, D.: Real-time loop closure in 2D lidar SLAM. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1271–1278 (2016). https://doi.org/10.1109/ICRA.2016.7487258
7. Konolige, K., Grisetti, G., Kümmerle, R., Burgard, W., Limketkai, B., Vincent, R.: Efficient sparse pose adjustment for 2D mapping. In: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 22–29 (2010). https://doi.org/10.1109/IROS.2010.5649043
8. Olson, E.B.: Real-time correlative scan matching. In: 2009 IEEE International Conference on Robotics and Automation, pp. 4387–4393 (2009). https://doi.org/10.1109/ROBOT.2009.5152375
9. Perez-Yus, A., Gutiérrez-Gómez, D., Lopez-Nicolas, G., Guerrero, J.: Stairs detection with odometry-aided traversal from a wearable RGB-D camera. Comput. Vis. Image Underst. 154, 192–205 (2017)
10. Pérez-Yus, A., López-Nicolás, G., Guerrero, J.J.: Detection and modelling of staircases using a wearable depth sensor. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8927, pp. 449–463. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16199-0_32
11. Qin, X.-J., Hu, Z.-T., Zheng, H.-B., Zhang, M.-Y.: Surface reconstruction from unorganized point clouds based on edge growing. Adv. Manuf. 7(3), 343–352 (2019). https://doi.org/10.1007/s40436-019-00262-5
Towards Flying Carpet: Dynamics Modeling, and Differential-Flatness-Based Control and Planning

Jiali Sun1, Yushu Yu1(B), and Bin Xu2
1 School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
{1120193057,yushu.yu}@bit.edu.cn
2 School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China
[email protected]
Abstract. Aerial systems assembled from multiple modules have the potential to perform tasks that are difficult for traditional aerial vehicles. In this work, we propose a novel aerial system, Carpet, composed of a matrix of quadrotor modules in which each column of quadrotor modules connects to the neighbouring column via revolute joints. By adjusting the joint angles, Carpet can work in a flight mode and a terrestrial mode; this is the first time a UAV assembly with this configuration has been analyzed. The terrestrial mode can potentially increase the energy efficiency of aerial systems, while in flight mode Carpet can fold up or expand to adapt to different tasks and environments, e.g., curling into a U shape makes it more compact and agile when passing through narrow channels. The dynamics of Carpet is investigated: it is found that the entire Carpet in flight mode can produce 5D force/torque, and that the joints connecting columns need no extra actuators. The differential flatness of Carpet is analyzed, and the motion controller and trajectory generator are designed based on it. The proposed trajectory planning method is able to guide Carpet through narrow corridors. Numerical simulations are presented, illustrating the feasibility of the proposed Carpet.

Keywords: Reconfigurable aerial systems · Differential flatness control · Minimum-snap trajectory planning
1 Introduction

1.1 Motivation and Background
Aerial vehicles have been applied in many areas, e.g., detection [1], mapping [2], and manipulation [3–6]. The control, perception, and planning techniques of traditional aerial vehicles have made unprecedented progress in recent years. However, there still exist mission profiles that are challenging for current aerial vehicles: for example, traditional aerial vehicles find it difficult to pass through narrow corridors while also carrying a large payload, and their energy efficiency is poor compared to ground vehicles. This is probably because the design procedure involves certain trade-offs between payload and endurance or size.

In recent years, modular aerial vehicles have been widely adopted to compose novel multi-functional aerial robots. Robots assembled from modules have the potential to increase maneuverability and manipulation ability, and thus enrich the mission profiles of traditional aerial vehicles. Lee et al. proved that a load can be transported in SE(3) by at least three quadrotors via cables [7]. Following this principle, a wide range of aerial manipulators composed of multiple aerial vehicle modules has been developed. Six et al. proposed a configuration composed of three aerial vehicles and three rigid links that connect to the load [8]. Lee et al. proposed an aerial manipulator actuated by the thrust of multiple aerial vehicles [9]. Franchi et al. developed the fly-crane, composed of three aerial vehicles towed to a platform by means of six cables [10]. These works significantly advance the manipulation ability and payload integration of aerial vehicles; however, a gap remains in terms of passing through narrow corridors and energy efficiency. Some researchers have developed aerial vehicles inspired by reconfigurable mechanisms, e.g., [11–15]; a reconfigurable assembly of aerial vehicle modules can pass through challenging environments by transforming its configuration. Previous work also discusses letting aerial vehicles operate in a terrestrial mode to enhance energy efficiency [16,17].

This work presents a novel flying Magic Carpet ("Carpet") composed of multiple vehicle units, as shown in Fig. 1. Each vehicle unit in Carpet is connected to its neighbouring units via joints. Because of this structure, Carpet can curl up to different degrees within a certain range, and by altering these degrees it can work in different modes, i.e., a flying mode and a terrestrial mode. Carpet can pass narrow corridors by folding itself in flight mode, while in terrestrial mode it can roll on the ground. The configuration of Carpet also allows the payload to be scaled by increasing the number of units.

(Supported by the National Natural Science Foundation of China under Grant 62173037, the National Key R&D Program of China, and the State Key Laboratory of Robotics and Systems (HIT).)
Fig. 1. Conception of Carpet in different modes.
1.2 Contributions
This paper contributes the modeling, control, and planning of Carpet. In particular, the dynamics of the novel system in flight mode and terrestrial mode is presented in detail. When modeling the dynamics, each aerial vehicle unit is considered a link; the Newton–Euler iteration and the Lagrange approach are adopted in the modeling process. We show that Carpet in flight mode, as a whole, can be actuated by 2D forces and 3D torques; compared to a traditional quadrotor, Carpet can therefore adjust its position in 3 directions and its attitude in 2 directions simultaneously. We also show that the joint motion can be actuated by the rotor forces provided by the multi-rotor units. Furthermore, the differential flatness of the system in flight mode is analyzed, and a model-based feedback linearization controller is designed for both flight and terrestrial modes. The proposed Carpet can roll up by rotating the units around the joints to pass through narrow corridors; the trajectory of such motion is planned in the flat output space, leading to motion primitives represented by polynomials. To the best of the authors' knowledge, this is the first time the systematic modeling, control, and planning of the proposed system Carpet is investigated. Carpet has the potential to outperform traditional aerial vehicles or quadrotor module assemblies in passing through narrow corridors, payload expandability, and energy efficiency.

This paper consists of five sections. The configuration and dynamics of Carpet in flight and terrestrial modes are presented in Sect. 2. The differential flatness, the flatness-based controller, and the trajectory generation method are investigated in Sect. 3. Numerical simulation results of Carpet under the proposed control and planning approach are shown in Sect. 4.
2 Modeling

2.1 Configuration of Magic Carpet
Carpet is essentially a matrix of rotors. In our design, several quadrotors in a column form a unit, and multiple units in a row form the Carpet shown in Fig. 2. Each unit can be regarded as one multi-rotor vehicle, no matter how many rotors it contains. Two neighboring units are connected by hinges, indicated by the grey pads in the figure, which allow Carpet to curl up to different degrees within a certain range. In the flight mode shown in Fig. 3, the units can rotate around the hinges, which serve as revolute joints. The joint angles can be adjusted to adapt to different tasks; e.g., Carpet can curl into a U shape, which makes it more compact and agile when passing through narrow channels. In the terrestrial mode shown in Fig. 4, Carpet curls much further, so that its two ends touch and are locked by a special mechanism; Carpet then takes the form of a pentagonal prism. To make the rolling motion
Fig. 2. The top view of Carpet.
Fig. 3. Configuration of Carpet in flight mode, showing the multi-rotor units and revolute joints.
Fig. 4. Configuration of Carpet in terrestrial mode, showing the multi-rotor units and locked joints.
smoother and easier, additional supporting arcs are added to each unit, so the pentagonal prism can roll like a cylinder. From the configuration of Carpet, each unit is regarded as one multi-rotor vehicle, even though one unit actually contains multiple quadrotors. In modeling Carpet, the input of each multi-rotor unit is therefore a 4D wrench: a 1D net thrust and a 3D torque.

2.2 Dynamic Modeling in Flight Mode
For brevity, we use a Carpet with three units to show the general modelling process; the process is the same for a Carpet with more units. In flight mode, the state variables must be chosen so that the system configuration can be described intuitively. The coordinate systems and unit numbers are shown in Fig. 5, where {E} denotes the earth frame and {B} the body frame. We express the system configuration with the pose of the middle unit

h_B = \begin{bmatrix} ^E_B R & p \\ 0 & 1 \end{bmatrix} \in SE(3)

and the joint angles \beta = [\beta_1, \beta_2]^T \in R^2, where p = (x, y, z)^T \in R^3 is the position of the middle unit and ^E_B R \in SO(3) its attitude. Z–Y–X Euler angles \Phi = [\phi, \theta, \psi]^T are used as a supplement to the rotation matrix ^E_B R. The configuration of the three-unit Carpet is therefore q = [p; \Phi; \beta] \in R^8.
Fig. 5. The coordinate frames of Carpet (three-units) in flight mode.
By observation, Carpet resembles a floating-base manipulator with each unit being a link, so the Newton–Euler iteration approach is adopted to derive the dynamics of the system. In the Newton–Euler iteration for an n-link manipulator, the typical procedure is to iterate forward from link 1 to link n to compute velocities and accelerations, and then backward from link n to link 1 to compute joint torques. In our case, where the body frame is set at the middle unit (also taken as the floating base), the iteration is carried out bidirectionally from the middle toward the ends. No matter how many
units are used, the Newton–Euler approach can always be applied, which justifies simplifying the number of units.

Remark 1. One unit is a rigid connection of multiple quadrotors and is therefore equivalent to a multi-rotor. Hence, in the modelling it makes sense to simplify the several quadrotors in one column into a single multi-rotor unit whose input is equivalent to a 1D net thrust and a 3D torque.

Figure 6 shows how external forces and torques act on Carpet. Each link in the system is regarded as a multi-rotor aerial vehicle actuated by a net thrust and a 3D torque produced by its rotors. Here T_i ∈ R denotes the thrust, τ_i ∈ R^3 the synthesized torque on unit i, and G_i the gravitational force (i = 0, 1, 2).
Fig. 6. Force analysis of Carpet (three-units) in the flight mode.
In order to apply the Newton–Euler iteration approach for floating-base manipulators, the actuation forces and torques T_i and τ_i are transformed into a base wrench (F_e, M_e) and joint torques n_e = (n_{1e}, n_{2e}), as shown in Fig. 7.
Fig. 7. Equivalent forces of Carpet (three-units) in the flight mode.
The equivalent forces are calculated according to

f_{2e} = T_2 z_2, \quad f_{1e} = -T_0 z_0, \quad F_e = T_1 z_1 - f_{1e} + f_{2e}   (1)

The equivalent torques are

n_{2e} = \tau_2 + {}^2P_{C2} \times T_2 z_2
n_{1e} = -\tau_0 + {}^0P_{C0} \times T_0 z_0   (2)
M_e = \tau_1 - n_{1e} + n_{2e} + {}^3P_{C3} \times f_{3e} + {}^2P_{C2} \times f_{2e}
The dynamic equation of the system can thus be expressed as

M_f(q)\ddot{q} + C_f(q, \dot{q}) + G_f(q) = \tau_f   (3)

where M_f(q) ∈ R^{8×8}, C_f(q, q̇) ∈ R^8, and G_f(q) ∈ R^8 are the mass matrix, the centrifugal and Coriolis vector, and the gravity vector, respectively; they can be obtained by the Newton–Euler approach, and the mass matrix M_f(q) is symmetric positive definite. The equivalent input of the entire system τ_f can be written as

\tau_f = \begin{bmatrix} F_e \\ M_e \\ n_{1e} \\ n_{2e} \end{bmatrix} =
\begin{bmatrix}
0 \\
T_0\sin\beta_1 - T_2\sin\beta_2 \\
T_1 + T_0\cos\beta_1 + T_2\cos\beta_2 \\
\tau_{0x} + \tau_{1x} + \tau_{2x} - aT_0 + aT_2 - aT_0\cos\beta_1 + aT_2\cos\beta_2 \\
\tau_{1y} + \tau_{0y}\cos\beta_1 + \tau_{2y}\cos\beta_2 + \tau_{0z}\sin\beta_1 - \tau_{2z}\sin\beta_2 \\
\tau_{1z} + \tau_{0z}\cos\beta_1 + \tau_{2z}\cos\beta_2 - \tau_{0y}\sin\beta_1 + \tau_{2y}\sin\beta_2 \\
aT_0 - \tau_{0x} \\
\tau_{2x} + aT_2
\end{bmatrix}
:= H_f(\beta)\begin{bmatrix} T \\ \tau_0 \\ \tau_1 \\ \tau_2 \end{bmatrix}   (4)

where T = [T_0, T_1, T_2]^T ∈ R^3 and τ_i = [τ_{ix}, τ_{iy}, τ_{iz}]^T (i = 0, 1, 2) constitute the input, and H_f(β) ∈ R^{8×12} is the input allocation matrix.

Remark 2. From the dynamics equations (3) and (4), the 3-unit Carpet in flight mode is not a fully actuated system: its configuration space is 8-dimensional while it can generate only 7 independent equivalent input forces. The translational motion is coupled with the rotational motion of Carpet; however, compared to a traditional multi-rotor aerial vehicle, the coupling is weaker. Moreover, the system is controllable because it is differentially flat, as we show later. No additional actuators are needed for the joints: the net thrust and torque of each unit can provide the equivalent joint torque, though this raises the problem of input boundedness, which is beyond the scope of this paper.
Remark 3. The rank of the allocation matrix H(β) is 7 if β_1 ≠ 0 and β_2 ≠ 0. When β_1 = 0 and β_2 = 0, the rank of H(β) reduces to 6; in this case, the behaviour of Carpet is similar to that of a traditional multi-rotor UAV. We can therefore say that the configurations with β = 0 are singular configurations of Carpet.

2.3 Dynamic Modeling in Terrestrial Mode
In terrestrial mode, the ends of the side units are joined together to form a prism. To facilitate the rolling motion, additional arcs are added to form a ring-like structure, as demonstrated in Fig. 8, which also shows the coordinate systems: {E} denotes the earth frame, {C} the intermediate frame, and {B} the body frame with its origin at the geometric center of the system.
Fig. 8. Coordinate frames in terrestrial mode.
The forces and torques acting on the system are generated by the rotors of the multi-rotor units, where F denotes the resultant force along the y_C direction, τ_φ the torque around the x_B axis, and τ_ψ the resultant torque around e_3. For simplicity, we disregard friction and aerodynamic forces acting on the system. To reveal the relationship between the equivalent input and the real input, as in (4), we adopt the force and torque notation of Fig. 8. The equivalent force and torque in frame {C} are

F_C = {}^C_B R({}^2_1 R^T T_2 + {}^0_1 R^T T_0 + T_1)
\tau_C = {}^C_B R({}^2_1 R^T \tau_2 + {}^0_1 R^T \tau_0 + \tau_1)   (5)
where {}^C_B R = R_x(\phi), T_i = [0, 0, T_i]^T and \tau_i = [\tau_{ix}, \tau_{iy}, \tau_{iz}]^T (i = 0, 1, 2). Rewriting (5):

[0;\ F;\ F_z - mg;\ \tau_\phi;\ \tau_{y_C};\ \tau_\psi] = A_1(\phi, \beta)[T;\ \tau_0;\ \tau_1;\ \tau_2]   (6)

Note that the equivalent torque around the y_C axis, τ_{y_C}, is likely to cause rollovers of the cylinder, which we assume will not happen in the following analysis; the weight of the system should therefore be designed carefully to counteract the effects of τ_{y_C}. It is also assumed that F_z − mg < 0 in terrestrial mode, which is reasonable since it can be imposed as a constraint in the control allocation. In this way, only F, τ_φ, and τ_ψ affect the motion in terrestrial mode, and (6) can be further simplified to

[F;\ \tau_\phi;\ \tau_\psi] = A_2(\phi, \beta)[T;\ \tau_0;\ \tau_1;\ \tau_2]   (7)

The angular velocity of the system in the body frame is \omega_B = ({}^C_B R)^T \omega_C
where \omega_C = [\dot{\phi}, 0, \dot{\psi}]^T and [\phi, \psi]^T is the orientation expressed in {C}. The kinetic energy is thereby

K_T = \tfrac{1}{2} m(\dot{x}^2 + \dot{y}^2) + \tfrac{1}{2}\omega_B^T I \omega_B   (8)
where m is the system mass, I is the inertia tensor in {B}, and [x, y]^T is the position of the center expressed in {E}. We assume that Carpet rolls on the ground without slipping. Let the radius of the ring be r; then \dot{x} and \dot{y} are subject to the constraints

\dot{x} = r\dot{\phi}\sin\psi
\dot{y} = -r\dot{\phi}\cos\psi   (9)

The Lagrange equation of the system is therefore

\frac{d}{dt}\Big(\frac{\partial L}{\partial \dot{q}}\Big) - \frac{\partial L}{\partial q} = \tau_q + \lambda_1 a_{1q} + \lambda_2 a_{2q}   (10)

where q = (x, y, \phi, \psi)^T, \lambda_1 and \lambda_2 denote the Lagrange multipliers, L = K_T because of the symmetric structure of Carpet, and a_{1q}, a_{2q} come from the constraint equation; from (9), a_{1q} = [1, 0, -r\sin\psi, 0]^T and a_{2q} = [0, 1, r\cos\psi, 0]^T.
Writing (10) out gives

m\ddot{x} = -F\sin\psi + \lambda_1, \quad m\ddot{y} = F\cos\psi + \lambda_2   (11)

\frac{d}{dt}\Big(\frac{\partial L}{\partial \dot{\Phi}}\Big) - \frac{\partial L}{\partial \Phi} = \tau_\Phi + \lambda_1\begin{bmatrix} -r\sin\psi \\ 0 \end{bmatrix} + \lambda_2\begin{bmatrix} r\cos\psi \\ 0 \end{bmatrix}   (12)

where \Phi = [\phi, \psi]^T and \tau_\Phi = [\tau_\phi, \tau_\psi]^T. Taking the time derivatives of (9) yields

\ddot{x} = r(\ddot{\phi}\sin\psi + \dot{\phi}\dot{\psi}\cos\psi), \quad \ddot{y} = -r(\ddot{\phi}\cos\psi - \dot{\phi}\dot{\psi}\sin\psi)   (13)

Substituting (13) into (11) solves for \lambda:

\lambda_1 = F\sin\psi + mr(\ddot{\phi}\sin\psi + \dot{\phi}\dot{\psi}\cos\psi)
\lambda_2 = -F\cos\psi - mr(\ddot{\phi}\cos\psi - \dot{\phi}\dot{\psi}\sin\psi)   (14)

From (14) and (12), we obtain

\begin{bmatrix} I_x + mr^2 & 0 \\ 0 & I_z\cos^2\phi + I_y\sin^2\phi \end{bmatrix}\begin{bmatrix} \ddot{\phi} \\ \ddot{\psi} \end{bmatrix} + \begin{bmatrix} -\tfrac{1}{2}(I_y - I_z)\dot{\psi}^2\sin 2\phi \\ (I_y - I_z)\dot{\phi}\dot{\psi}\sin 2\phi \end{bmatrix} = \begin{bmatrix} \tau_\phi - Fr \\ \tau_\psi \end{bmatrix}   (15)
or, to reveal the structure,

M_t(\Phi)\ddot{\Phi} + C_t(\Phi, \dot{\Phi}) = \tau_t   (16)
where M_t(\Phi) ∈ R^{2×2} is the symmetric positive definite mass matrix, C_t(\Phi, \dot{\Phi}) ∈ R^2 is the centrifugal and Coriolis vector, and \tau_t is the equivalent input given by

\tau_t = \begin{bmatrix} -r & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} F \\ \tau_\phi \\ \tau_\psi \end{bmatrix}   (17)

Note that because the structure of Carpet is assumed symmetric, gravity has no effect on the motion in terrestrial mode; hence there is no gravity term in (16). From (7) and (17) we can rewrite \tau_t as

\tau_t = H_t(\phi, \beta)[T;\ \tau_0;\ \tau_1;\ \tau_2]   (18)

where H_t(\phi, \beta) ∈ R^{3×12} is the input allocation matrix for terrestrial mode.
361
The EOM (16) is expressed in coordinate Φ. As the system in terrestrial mode is a nonholonomic system, it is difficult to express the EOM directly in coordinate (x, y). However, by taking the nonholonomic constraint equation (9) as kinematic equation, then (9) and (16) together can reveal the evolution of the position. In the simulation, according to (9) and (13), the other two coordinates [x, y]T can be updated by numerical solution from (16).
3
Control and Planning from Differential Flatness
3.1
Differential Flatness in Flight Mode
Differential flatness is a property of some dynamic systems that can be used to simplify trajectory generation process by reducing the number of variables that need to be planned. Moreover, the differential flatness of a nonlinear system implies the controllability of the system. We say a system is differentially flat if its states and inputs can be expressed as a function of the system’s selected outputs and their derivatives [18]. Taking a general quadrotor as an example, its full state is given by [x, y, z, φ, θ, ψ, x, ˙ y, ˙ z, ˙ p, q, r]T ∈ R12 which represents the position, Euler angles, linear velocity, and angular velocity. However, it has been proved that with carefully chosen flat outputs, e.g, σ = [x, y, z, ψ]T ∈ R4 which represents the position and yaw angle, only four variables need to be planned instead of twelve [19]. Here in our cases, differential flatness is considered for the flight mode of Carpet, where we choose σ = [x, y, z, θ, ψ, β1 , β2 ]T as flat outputs and the remaining variable φ and its derivatives will be expressed by σ, σ, ˙ σ ¨. As Z − Y − X Euler angles are adopted in this work, E B R is expressed with Euler angles (φ, θ, ψ) as, E BR
= Rz (ψ)Ry (θ)Rx (φ)
(19)
According to Newton’s equations of motion, we have m¨ p = −mge3 + E B RFB
(20)
where FB is the resultant force on the middle unit excluding the gravity force. Rewrite (20) in Euler angles as Rx (φ)FB = (Rz (ψ)Ry (θ))T (m¨ p + mge3 ) where
(21)
⎤ ⎡ ⎤ ⎡ ⎤⎡ FB,x 1 0 0 FB,x Rx (φ)FB = ⎣0 cos φ − sin φ⎦ ⎣FB,y ⎦ = ⎣cos φFB,y ⎦ 0 sin φFB,y 0 sin φ cos φ
Since the right hand side of (21) is known from the flat outputs σ, let ⎡ ⎤ Lx L = (Rz (ψ)Ry (θ))T (m¨ p + mge3 ) = ⎣Ly ⎦ Lz
(22)
362
J. Sun et al.
then φ can be expressed as φ = arctan 2(Lz , Ly )
(23)
along with other flat outputs, the joint forces can be derived. 3.2
Motion Control Design
In this section, we design the controller for Carpet to follow desired trajectories. For both flight and terrestrial modes, feedback linearization control is adopted, which is stated as following. For a system whose dynamic equation possesses the following structure, ˙ Θ˙ − G(Θ)) ¨ = M (Θ)−1 (τ − C(Θ, Θ) Θ
(24)
The feedback linearization control input is designed as, ¨ d − Kv e˙ − Kp e) + C(Θ, Θ) ˙ Θ˙ + G(Θ) τ = M (Θ)(Θ
(25)
where Θd = [pd , Φd , βd ]T is the desired trajectory vector of configuration, e = Θ − Θd is the error between the desired trajectory and actual trajectory, Kp and Kv are positive-definite constant matrices, respectively. Substituting (24) into (25) yields e¨ + Kv e˙ + Kp e = 0
(26)
By properly setting the positive gains of Kp and Kv , the errors will asymptotically approach zero as time goes to infinity. Specifically, the controller for flight mode is designed based on the dynamics as qd − Kv e˙ − Kp e) + Cf (q, q) ˙ + Gf (q) (27) τf = Mf (q)(¨ where q = [x, y, z, φ, θ, ψ, β1 , β2 ]T , e = q − qd . As Carpet in flight mode is an under-actuated system, it cannot track arbitrary reference trajectory qd . In order to apply control law (27) here, qd should satisfy dynamic constraints. A common way to deal with the under-actuated property is the cascaded controller containing an outer loop sub-controller and an inner loop sub-controller [7,20]. In this paper, aiming to simplify the controller design procedure, we do not design the cascaded controller. As will be seen in the next sub-section, one can design the reference trajectory in the flat output space, i.e., σd (t). Then the reference trajectory in the configuration space, i.e., qd (t) in (27), can be obtained by following the procedures in the proof of differential flatness. It is trivial to conclude that the derived qd (t) satisfies the dynamic constraints decided by the system dynamics. The net thrust and the torque of each unit can thus be calculated as, ⎡ ⎤ T ⎢τ1 ⎥ ⎢ ⎥ = H + (β)τf (28) f ⎣τ2 ⎦ τ3
Towards Flying Carpet
363
Similarly, the controller for terrestrial mode is designed as, ˙ + Gt (Φ) τt = Mt (Φ)(Φ¨d − Kv e˙ − Kp e) + Vt (Φ, Φ)
(29)
where Φ = [φ, ψ]T , e = Φ − Φd . Then, the net thrust and the torque of each unit in terrestrial mode are allocated as ⎡ ⎤ T ⎢τ1 ⎥ ⎢ ⎥ = Ht+ (φ, β)τt (30) ⎣τ2 ⎦ τ3 To apply feedback linearization control scheme, it must be assumed that all the parameters of the model are precisely known. Even if it is almost impossible in real life practice, the proposed controller can serve as a base on which more complicated control law can be further investigated. 3.3
Minimum Snap Trajectory Planning
When designing and optimizing trajectories for configuration variable q (the generalized position), certain index of system performance should be minimized while satisfying some constraints. The index we choose in this article is snap, the forth derivative of position. Trajectories are expressed with the form of polynomials qi (t) = ai0 + ai1 t + ai2 t2 + ... + ain tn
(31)
where qi means the i-th element of q, ai = [ai0 , ai1 , ai2 , ..., ain ]T is the corresponding coefficient vector. In this article, the duration of the trajectory will be limited to a given period T and will be designed to pass preset via points at certain instants. Therefore, the process of minimizing snap can be treated as an optimization problem as T (4) (qi (t))2 dt min ai
s.t.
0
qi (tj ) = qij , k
j = 0, ..., m
(32)
d qi |t=tj = 0 or f ree, j = 0, ..., m; k = 1, ..., 4 dtk The objective function is the integral of the square of the snap over time. After integration, it will actually become a function of the polynomial coefficient ai with each term being quadratic. The constraints are the conditions for the planned trajectories to pass the via points, where qij is the configuration at time tj . Hence, the problem (32) can be further formulated as a quadratic program (or QP) min aTi Qi ai (33) s.t. Aai = bi
364
J. Sun et al.
By solving the QP, we can obtain the desired coefficients for each polynomial that describes the trajectory. In general situations, it is not enough to use only one polynomial to describe a trajectory. Normally, a trajectory will be divided into multiple sections with each being described by one polynomial. For each section, the coefficients are calculated based on the same minimum snap method. It is worth noting that for flight mode only the trajectory of flat output, i.e., [x, y, z, θ, ψ, β1 , β2 ]T , needs to be planned according to the differential flatness of the system. The trajectory of the remaining configuration coordinate φ is generated using the method mentioned in Sect. 3.1. For terrestrial mode, only the trajectory [φ(t), ψ(t)]T is planned, and [x(t), y(t)]T of the system can be determined by the constraint relationship explained in Sect. 2.3.
4
Simulation Results
To verify the dynamic models and controllers constructed in previous sections, simulations are performed on MATLAB. The function ode113 is chosen to numerically solve the differential equations derived from the dynamics models. The simulation results for both the flight mode and terrestrial mode are presented in this section. The QP in the trajectory planning algorithm is solved by the MATLAB function quadprog. 4.1
Simulation Results of Flight Mode
In the simulation, the via points are shown in Table 1. There are two narrow corridors in 2nd and 3rd keyframes. Carpet needs to fold to pass the corridors, otherwise, it cannot pass through 2nd and 3rd keyframes. Carpet will fold first to pass the corridor and stretch to its unfolded state when it gets out, with the altitude decreasing first and then increasing. In this simulation case it is seen that Carpet is able to fold when it needs to pass through a narrow corridor, which is hard to complete for some other aerial systems assembled from multiple modules [21]. Table 1. Via points of flight mode State Time xd (m) yd (m) zd (m) θd (rad) ψd (rad) β1 d (rad) β2 d (rad) 0
0
π/12
7
π/12
π/12
2π/5
2π/5
7
π/12
π/6
2π/5
2π/5
0
π/3
π/12
π/12
1
0
2
2
2
Ts /3
8
4
3
2Ts /3 14
6
4
Ts
8
9
20
9
π/12
The simulation results are shown in Figs. 9, 10, 11 and 12. As shown in Fig. 9, four keyframes of Carpet corresponding to the via points are drawn along
Towards Flying Carpet
365
with the real path. At keyframe 2 and keyframe 3, Carpet is folded so that it can pass the segments of cylindrical channel which is almost impossible to go through when Carpet is in the stretching state. Figure 10 and Fig. 11 display the position and attitude evolution of the middle unit. Figure 14 shows the joint position evolution. It is seen that the planned trajectory smoothly connects the four via keyframes. The actual trajectory converges to the planned trajectory in the presence of initial errors.
Fig. 9. Carpet is flying along a planned path from keyframe 1 to keyframe 4. The color red, green and blue represent different units of Carpet. The narrow channels are indicated by the grey cylinders, which Carpet is able to pass when it folds up. (Color figure online)
4.2
Simulation Results of Terrestrial Mode
In terrestrial mode, Carpet will fold into a prism shape and will roll like a cylinder with the attached supporting arcs. The motion of the cylinder is enabled by the thrust force and torques produced by each multi-rotor unit. As seen in previous analysis, we only need to plan and track the angles φ (rotation) and ψ (deflection) to determine the system’s position and orientation. Hence we design the trajectory via points as listed in Table 2, which creates a turning scenario in Fig. 13. The reference and actual trajectory of φ (rotation) and ψ (deflection) is depicted in Fig. 14. The convergence of the controller in terrestrial mode can also be concluded from Fig. 14.
366
J. Sun et al.
x(m)
20 10 0
0
1
2
3
4
5
3
4
5
3
4
5
y(m)
Time(s) 8 6 4 2
0
1
2
z(m)
Time(s) 8 6 0
1
2 Time(s)
(rad)
Fig. 10. Tracking profile of the position of the middle unit. The red dashed lines represent desired trajectories, and the blue solid lines are the real trajectories. (Color figure online)
2 Time(s)
Fig. 11. Tracking profile of the orientation of the middle unit, flight mode.
2 Time(s)
Fig. 12. Tracking profile of the joint angles of Carpet, flight mode. Table 2. Trajectory via points of terrestrial mode State Time φd (rad) ψd (rad) 1
0
0
2
Ts /3
π
3
2Ts /3 3π
π/2
4
Ts
π
6π
0 π/6
Fig. 13. Rotational motion of the cylinder on the ground. The arrows indicate the moving direction of cylinder at each state.
Fig. 14. Tracking profile of the orientation of the cylinder.
5 Conclusions
In this paper, we proposed a novel aerial system and, to the best of our knowledge, presented the first systematic analysis of this configuration. Carpet is composed of multiple modules, enabling it to work in both flight mode and terrestrial mode. Because of the multiple modules, in flight mode Carpet as a whole is actuated by 5D force/torque; in terrestrial mode, Carpet is capable of adjusting the yaw and pitch angles simultaneously. The proposed system Carpet therefore has advantages in agility and energy efficiency. Modeling and controller design were performed for both modes, and by planning the motion of the joint angles, the 3D position, and the 2D attitude, Carpet is able to pass through narrow corridors, as demonstrated via numerical simulation. Owing to these properties, Carpet has the potential to handle detection or manipulation tasks in challenging environments. Future work includes the investigation of the mode-switching strategy and the implementation of the proposed methods on a real-world Carpet prototype.
References
1. Cao, Z., Fu, C., Ye, J., Li, B., Li, Y.: HiFT: hierarchical feature transformer for aerial tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15457–15466 (2021)
2. Zhang, Z., Scaramuzza, D.: Fisher information field: an efficient and differentiable map for perception-aware planning. CoRR, vol. abs/2008.03324 (2020). https://arxiv.org/abs/2008.03324
3. Yu, Y., Li, P., Gong, P.: Finite-time geometric control for underactuated aerial manipulators with unknown disturbances. Int. J. Robust Nonlinear Control 30(13), 5040–5061 (2020). https://onlinelibrary.wiley.com/doi/abs/10.1002/rnc.5041
4. Hamaza, S., Georgilas, I., Heredia, G., Ollero, A., Richardson, T.: Design, modeling, and control of an aerial manipulator for placement and retrieval of sensors in the environment. J. Field Robot. 37(7), 1224–1245 (2020). https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21963
5. Cardona, G., Tellez-Castro, D., Mojica-Nava, E.: Cooperative transportation of a cable-suspended load by multiple quadrotors. IFAC-PapersOnLine 52(20), 145–150 (2019). 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems NECSYS 2019. https://www.sciencedirect.com/science/article/pii/S2405896319319998
6. Yu, Y., Lippiello, V.: 6D pose task trajectory tracking for a class of 3D aerial manipulator from differential flatness. IEEE Access 7, 52257–52265 (2019)
7. Lee, T.: Geometric control of quadrotor UAVs transporting a cable-suspended rigid body. IEEE Trans. Control Syst. Technol. 26(1), 255–264 (2018)
8. Six, D., Briot, S., Chriette, A., Martinet, P.: The kinematics, dynamics and control of a flying parallel robot with three quadrotors. IEEE Robot. Autom. Lett. 3(1), 559–566 (2018)
9. Nguyen, H., Park, S., Park, J., Lee, D.: A novel robotic platform for aerial manipulation using quadrotors as rotating thrust generators. IEEE Trans. Robot. 34(2), 353–369 (2018)
10. Sanalitro, D., Tognon, M., Cano, A.J., Cort, J., Franchi, A.: Indirect force control of a cable-suspended aerial multi-robot manipulator. IEEE Robot. Autom. Lett. 7(3), 6726–6733 (2022)
11. Nguyen, H., Dang, T., Alexis, K.: The reconfigurable aerial robotic chain: modeling and control. In: IEEE International Conference on Robotics and Automation (ICRA) 2020, pp. 5328–5334 (2020)
12. Zhao, M., Nagato, K., Okada, K., Inaba, M., Nakao, M.: Forceful valve manipulation with arbitrary direction by articulated aerial robot equipped with thrust vectoring apparatus. IEEE Robot. Autom. Lett. 7(2), 4893–4900 (2022)
13. Anzai, T., Zhao, M., Murooka, M., Shi, F., Okada, K., Inaba, M.: Design, modeling and control of fully actuated 2D transformable aerial robot with 1 DoF thrust vectorable link module. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019, pp. 2820–2826 (2019)
14. Saldaña, D., Gabrich, B., Li, G., Yim, M., Kumar, V.: ModQuad: the flying modular structure that self-assembles in midair. In: IEEE International Conference on Robotics and Automation (ICRA) 2018, pp. 691–698 (2018)
15. Fabris, A., Kleber, K., Falanga, D., Scaramuzza, D.: Geometry-aware compensation scheme for morphing drones. In: IEEE International Conference on Robotics and Automation (ICRA) 2021, pp. 592–598 (2021)
16. Kalantari, A., Spenko, M.: Modeling and performance assessment of the HyTAQ, a hybrid terrestrial/aerial quadrotor. IEEE Trans. Robot. 30(5), 1278–1285 (2014)
17. Kalantari, A., Spenko, M.: Design and experimental validation of HyTAQ, a hybrid terrestrial and aerial quadrotor. In: 2013 IEEE International Conference on Robotics and Automation, pp. 4445–4450 (2013)
18. Murray, R.M., Rathinam, M., Sluis, W.: Differential flatness of mechanical control systems: a catalog of prototype systems. In: ASME International Mechanical Engineering Congress and Exposition. Citeseer (1995)
19. Mellinger, D., Kumar, V.: Minimum snap trajectory generation and control for quadrotors. In: IEEE International Conference on Robotics and Automation 2011, pp. 2520–2525 (2011)
20. Yu, Y., Ding, X.: A global tracking controller for underactuated aerial vehicles: design, analysis, and experimental tests on quadrotor. IEEE/ASME Trans. Mechatron. 21(5), 2499–2511 (2016)
21. Yu, Y., Shi, C., Shan, D., Lippiello, V., Yang, Y.: A hierarchical control scheme for multiple aerial vehicle transportation systems with uncertainties and state/input constraints. Appl. Math. Model. 109, 651–678 (2022). https://www.sciencedirect.com/science/article/pii/S0307904X22002268
Region Clustering for Mobile Robot Autonomous Exploration in Unknown Environment

Haoping Zheng1,2, Liwei Zhang1,2(B), and Meng Chen1

1 Shanghai Key Laboratory of Spacecraft Mechanism, Shanghai, China
2 School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350116, China
[email protected]
Abstract. At present, most traditional algorithms for mobile robot autonomous exploration in unknown environments use frontiers as guides and adopt greedy strategies to determine the next exploration target. New frontiers are generated as new regions of the map are revealed, and the cycle repeats until the exploration of the unknown environment is complete. In this paper, we propose a region-clustering-based approach for autonomous exploration of mobile robots in complex environments. Our approach incorporates the concept of region clustering at the global level of a current advanced hierarchical exploration framework. The robot is more inclined to finish exploring a certain region of the map first, thereby minimizing repetitive visits to already explored regions. Autonomous exploration experiments were carried out on our college campus. The experimental results show that the average exploration trajectory length was reduced by 14.03% and the average exploration time by 16.15%.

Keywords: Regional clustering · Hierarchical frameworks · Autonomous exploration
1 Introduction

Mobile robots explore the environment autonomously; the main task is to guide the robot into unknown regions and complete map construction covering those regions. Over the years, autonomous exploration by robots has been widely used in disaster relief [1], underwater surveys [2], Mars exploration [3], and underground tunnel exploration [4, 5]. Autonomous exploration refers to the construction of a spatial map of the robot's surroundings from information acquired by sensors, without prior maps or external human intervention [6–8]. In the process of autonomous exploration, a reasonable and effective exploration strategy is the key factor determining the outcome. The most common strategy is to drive the robot toward an unknown region to obtain more information about the unknown environment. In this type of approach, the
mobile robot uses sensors to sense its surroundings, finds a new environmental region, and moves toward that region. These operations are repeated until the full environment has been traversed. In this paper, we propose a region-clustering autonomous exploration method built on a current advanced hierarchical exploration framework [14]. In the global planning stage, the method divides the unexplored regions into regional clusters based on the Iterative Self-Organizing Data Analysis Technique (ISODATA), which can dynamically adjust the number of clusters according to the actual layout of the regions, and it augments the Euclidean-distance path evaluation with actual planned paths so that the clustering better reflects reality. This biases the robot's autonomous exploration toward map-level region exploration. We also add a new evaluation criterion to the environmental analysis of an exploration region, assessing the traversal complexity of the region alongside the path length cost and information gain, so that the exploration cost is evaluated more comprehensively and the robot's behavior better matches a realistic exploration strategy. This paper is structured as follows. Section 2 reviews existing work on mobile robot autonomous exploration. Section 3 introduces our proposed region-clustering-based autonomous exploration method, which has three main parts: an autonomous exploration framework based on exploration region clustering, an exploration region partitioning strategy, and an exploration region evaluation function. In Sect. 4, we conduct experiments to verify the effectiveness of our strategy. Finally, the paper is concluded in Sect. 5.
2 Related Work

There are two main types of mainstream exploration algorithms: greedy exploration algorithms based on unknown regions and autonomous exploration algorithms based on hierarchical structures. We expand on them in the following sections.

2.1 Greedy Approach to Unknown Regions

Among the traditional methods of autonomous exploration, Yamauchi [9] first proposed frontier-based exploration, defining the boundary between the known and unknown areas of the map as the frontier and setting frontiers as exploration targets. The robot navigates to the target point using the environmental information acquired by its sensors; the known map keeps expanding, the frontier moves outward to produce new frontier points, and the cycle repeats until the unknown environment is explored. However, the algorithm treats all boundaries as the same type, which makes the robot navigate to the nearest frontier, so its efficiency is poor. Most traditional exploration algorithms use greedy strategies to determine the exploration target, and for this class of algorithms the detection and extraction of frontiers is crucial. Simmons [10] evaluated the information benefit of a frontier point in two parts, the expected information gain at the target and the exploration cost, to determine the optimal target. Umari [11] proposed a new search strategy based on multiple Rapidly-exploring Random Trees (RRT) [12]. The RRT algorithm is biased toward unexplored regions and extends naturally to high-dimensional spaces.
In this work, local and global trees are used to detect the frontier: the local search tree is reset when it reaches the frontier, while the global search tree is not reset, as a way to speed up the search. At narrow entrances the local tree may fail to reach the frontier and thus is not reset, which wastes computational cost. In addition, since the RRT expands randomly in free space, regions not spanned by the RRT are ignored. These methods therefore tend to miss regions, especially those with small openings, and produce incomplete coverage.

2.2 Hierarchical Approach to Autonomous Exploration

Zhu [13] extended the algorithm to multiple RRTs and named it the Dual-Stage Viewpoint Planner (DSVP), dividing the whole exploration process into two stages. The first stage is Exploration, in which the robot keeps collecting unknown information and which truly matches the concept of exploration. In this stage, an RRT randomly samples observation viewpoints in a local region around the robot to continuously expand the exploration region and the set of global observation viewpoints. The second stage is Relocation, entered when the robot is no longer surrounded by unknown regions and needs to be redirected to a more distant unknown region to continue exploring; in this stage the robot collects no unknown information and travels only through known regions. However, because such algorithms optimize short-term goals, the robot becomes short-sighted and ignores long-term goals, and such a strategy makes the robot prone to retrace paths it has already taken, making exploration inefficient. Therefore, Cao [14] proposed an algorithm called the TARE planner, in which the final exploration path executed by the robot is optimized at two levels: (1) at the global scale, a rough path is computed to guide the robot in the general direction of travel; (2) at the local scale, the algorithm looks for a route that allows the robot's sensors to fully cover the local exploration region. This idea makes the robot's driving path globally guided and more purposeful. Although the global guidance gives exploration a clearer purpose, a large amount of repeated traversal of already explored regions remains, which lowers exploration efficiency.
3 Region Clustering for Mobile Robot Autonomous Exploration

To address the above problems, in this paper we propose a region-clustering-based algorithm for autonomous exploration of mobile robots in complex environments. The main contributions of this paper are as follows.
1. We propose a region exploration method based on a current advanced hierarchical exploration framework, which biases the robot toward finishing the exploration of a given map region during autonomous exploration, thereby reducing repeated visits to explored regions;
2. We propose a new region division method based on the ISODATA algorithm, which incorporates a passability judgment within region clusters to divide
the region clusters, making the robot more inclined to finish exploring the target region first.
3. We propose a region-based exploration target screening function at the global level, which combines path cost and information gain and plans the exploration sequence for the regions by calculating ground undulation changes within each region.

In this paper, we propose an effective framework for robots to explore unknown environments autonomously. It mainly contains a region detection module, a region clustering module, and a path planning module, as shown in Fig. 1. The region detection module evaluates the robot state and the acquired information and passes the region division results to the region cluster division module, which obtains the region clusters by our proposed method. The path planning module sequentially plans the exploration areas using our proposed area evaluation method and finally obtains the exploration path. The robot is guided by the global path, and local path planning produces a fine path satisfying motion control to guide the robot to the target point, updating the map along the way.
Fig. 1. Framework diagram for mobile robot autonomous exploration.
3.1 An Autonomous Exploration Framework Based on Exploring Regional Clustering

The current advanced hierarchical planner, proposed by Cao [14] and named the TARE planner, divides the autonomous exploration task into two levels of exploration path optimization, with the two levels planning paths at different resolutions. At the global level, the region to be explored is delineated using low-resolution information, and the robot plans a rough global path based on path length cost, making the autonomous exploration task globally purposeful. At the local level, fine-grained paths are planned using high-resolution information in the vicinity of the robot to meet its motion control requirements. The authors use the Travelling Salesman Problem (TSP) at both levels to optimize the exploration strategy. The TARE planner avoids repeated visits (the backtracking problem) more effectively and yields higher exploration efficiency than earlier greedy planners. However, since the regions to be explored are sequenced using only a path length cost in the traveling salesman formulation, the robot does not focus on completing local map exploration first relative to the hierarchy of the entire large environmental map.
In this paper, we improve the global path optimization strategy of this hierarchical autonomous exploration framework by integrating low-resolution information through region clustering. The robot's global exploration focuses on completing a local region of the map first and cyclically completes the exploration task in this way. Compared with the current advanced autonomous exploration frameworks, our approach focuses more on the construction of local maps in large and complex environments. The first level uses high-resolution information in the vicinity of the robot, called the local planning region (the yellow dashed box in Fig. 2). Within the local planning region, viewpoints are generated by random sampling; the viewpoints are filtered, and the TSP is solved for the viewpoint visiting sequence. Finally, a feasible path satisfying the robot's kinematics is generated for the robot control module. The path generation method and the kinematic constraints used in this layer follow the method in [14]. In the second layer, low-resolution map information is used to identify the regions to be explored, and the regions are integrated by our proposed region delimitation strategy, which eventually generates intra-cluster paths (the green solid line in Fig. 2) and inter-cluster paths (the blue solid line in Fig. 2). The region partitioning strategy makes the robot focus on completing local region exploration first during autonomous exploration, guides the robot to complete region exploration in an orderly manner, and reduces the probability of repeated visits.
Fig. 2. A framework for autonomous exploration based on regional clustering. We regionally integrate the unexplored areas (blue squares) that meet certain conditions into the same regional cluster, as shown in the blue dashed box in this figure. (Color figure online)
We define the autonomous exploration workspace as A, where A ∈ R^3, and divide A into the unexplored area A_g and the explored area A_ed, as shown in Algorithm 1. We denote the decomposition of the workspace as Eq. (1):

A = A_g ∪ A_ed    (1)
Define v as a viewpoint: a point in 3D space from which a frontier can be seen. Define a frontier as the junction space between a known region and an unknown region, denoted f. Multiple f may be visible from a single v; the robot is said to cover these frontiers and obtain the corresponding map information. The information at v contains
the position and pose of the robot in this space, i.e., v = [p_robot, q_robot]. We define the local planning area as A_l. Within A_l, the first-level planner generates a set of viewpoints V and, by optimizing a cost function, generates an approximately optimal local path, denoted T_local. The local path must meet the control requirements of the robot's kinematics layer. T_local consists of a viewpoint set and a connecting-edge set, defined as V_local ⊆ V, V_local = {v_1, v_2, …, v_n}, with the edge set Edge = {Edge_{1,2}, Edge_{2,3}, …, Edge_{n−1,n}}; a local path is then T_local = {V_local, Edge}. In the global planning region, the second-layer planner takes the unexplored area, defined as A_g = A \ A_ed, and performs region integration using our proposed region partitioning method, shown in Algorithm 2. This yields the region clusters C_g composed of A_g. Solving a TSP [15] gives the region-cluster access sequence S_e, expressed as S_e = {S_e0, S_e1, S_e2, …, S_e(n−1)}, where n is the number of regional clusters. The global path T_global is obtained by solving the regional cluster access sequence S_e and computing the shortest paths between clusters. Traversing the A_g within each C_g for region evaluation, using our proposed region evaluation method, we optimally solve for the order of region visits within each cluster and obtain the intra-cluster path, denoted T_intercluster. Finally, fixed rules connect T_local, T_intercluster, and the global path T_global to obtain the robot's autonomous exploration path. The autonomous exploration method we use is described as follows (a minimal sketch of the cluster-sequencing step is given below). Description: Given an A_l and a number of A_g, the A_g are divided and integrated using Algorithm 2, and the shortest path between the robot's starting point and the C_g is found. Traverse each S_e, evaluate the A_g belonging to it, and find the corresponding T_intercluster, which is essentially a partial A_g access sequence; after completing the evaluation and access-sequence optimization within all regional clusters, access sequences for all A_g are obtained. Based on the robot's location and the access sequences, the near-optimal autonomous exploration path T is found, composed of T_local, T_global, and T_intercluster.
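As a concrete illustration of the cluster-sequencing step, the sketch below orders cluster centers with a greedy nearest-neighbor heuristic. The paper solves this step as a TSP with the solver cited in [15]; the simplified ordering and the toy inputs here are our own assumptions.

```python
import numpy as np

def cluster_visit_sequence(start, centers):
    """Order region-cluster centers for the global path T_global.

    Greedy nearest-neighbor TSP approximation: from the robot's current
    position, repeatedly visit the closest unvisited cluster center.
    This only illustrates the sequencing step solved as a TSP in the paper.
    """
    unvisited = list(range(len(centers)))
    sequence, current = [], np.asarray(start, dtype=float)
    while unvisited:
        dists = [np.linalg.norm(centers[i] - current) for i in unvisited]
        nxt = unvisited.pop(int(np.argmin(dists)))
        sequence.append(nxt)
        current = centers[nxt]
    return sequence  # S_e: cluster-center indices in visiting order

# Example: robot at the origin, three cluster centers in the plane.
centers = np.array([[4.0, 1.0], [1.0, 2.0], [6.0, 5.0]])
print(cluster_visit_sequence([0.0, 0.0], centers))  # -> [1, 0, 2]
```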
3.2 Unexplored Regional Divisions

Our method clusters the unexplored areas based on the ISODATA idea and adds a clustering step in the global planning layer, so that the global path is planned from coarse to fine: the sequence of region clusters to visit is planned first, and the sequence of explored regions is then planned within that sequence, which reduces the probability of repeated visits during autonomous exploration. The area is classified using an approach similar to [14]: the area outside the local planning area (the yellow dashed box in Fig. 2) is defined as the global exploration area, and within it the working area is divided into several sub-areas, each of which stores data on both covered and uncovered surfaces surveyed by the robot during autonomous exploration. As in that paper, a "surface" is defined as the generalized boundary between free space and non-free space, and information about a surface can be perceived by the robot from multiple viewpoints. The status of a sub-area is defined according to the amounts of these two types of data: if the sub-area does not contain any covered or uncovered surfaces, it is defined as "unexplored", and if there are only covered surfaces in the sub-area, it is defined as "explored". We integrate the unexplored areas in the global planning region during each iteration and divide them into a number of region clusters. The number of clusters is obtained by iterating the algorithm; note that an expected target number of region clusters must be specified in advance. This value is chosen based on the size and complexity of the explored scene, but the final value is obtained by evaluating the algorithm's objective function. Figure 3 shows the final result of partitioning the unexplored areas; in the figure, x denotes an unexplored area within the global region. The first subscript i denotes the ordinal number of the area, and
the second subscript j indicates that the area belongs to the j-th cluster center K_j after the Algorithm 2 computation; i.e., x_{i,j} indicates that the unexplored area with ordinal number i belongs to the cluster center with ordinal number j. Within the same cluster, L_{x_{i,j}} denotes the Euclidean distance between area x_i and cluster center K_j, and similarly L_{K_i,K_j} denotes the Euclidean distance between cluster centers K_i and K_j. According to our method, region partitioning must satisfy the constraints shown in Eq. (2) and Eq. (3):

∀ L_{K_i,K_j}  s.t.  L_{K_i,K_j} ≥ K_{d min}    (2)

∀ L_{x_{i,j}}  s.t.  L_{x_{i,j}} ≤ T_o K_{d max}    (3)
Equation (2) means that when selecting cluster centers, the distance between any two cluster centers must be greater than the set threshold; this reduces the probability that cluster centers lie too close to each other and thus improves the accuracy of region classification. Equation (3) means that areas belonging to the same cluster cannot be too far from the center, i.e., the unexplored areas should stay concentrated. For two unexplored areas whose Euclidean distance satisfies the above constraint, we then search for an existing planned path between them. If there is no planned path, the two areas are directly grouped into the same region cluster; if there is one, the constraint is evaluated from the actual path length and the Euclidean distance according to Eq. (4):

P(x_i, x_j) = path_length_{x_i,x_j} / ED_{x_i,x_j}    (4)

s.t.  P(x_i, x_j) ≤ σ    (5)
Define the planned path length as path_length and the Euclidean distance as ED, with the two subscripts indicating the area serial numbers. If there is a planned path between two areas, dividing them into the same cluster additionally requires satisfying the constraint shown in Eq. (5), as illustrated in Fig. 4. In an autonomous exploration environment, there are many situations that violate this constraint; for example, two unexplored areas may be located in two indoor rooms separated by a wall. A planned path between them exists, but the robot must go around the wall when actually exploring, as shown in Fig. 5. In this case we assign the area to another region cluster that still satisfies the above constraints, as shown in Fig. 5(a); otherwise a new region cluster with this area as its cluster center is created, as shown in Fig. 5(b).
Fig. 3. Result of the unexplored-area division.
Fig. 4. The path-length ratio satisfies Eq. (5), so the two areas are classified into the same cluster.
Fig. 5. The Euclidean distance between two unexplored regions and the actual planned path length do not satisfy Eq. (5); in this case we perform one of the operations shown: (a) finding a nearby cluster center that satisfies the conditions, or (b) splitting off a new cluster class.
As shown in Fig. 5(a) and (b), we define this operation in Algorithm 2 as the Split_new() function, which uses the function IsMeet() to determine whether the situation of Eq. (5) exists. For the area X_{1,1}, ED_{x1,k2} satisfies the intra-cluster constraints, but a planned path exists between the two areas and the ratio of planned path to Euclidean distance violates the constraint of Eq. (5). We therefore search the remaining cluster centers, calculate the distance to each, and find the cluster center nearest to this area. If this distance satisfies the other constraints, the area is reassigned to that cluster; an example is shown in Fig. 5(a), i.e., the cluster center K_1 that satisfies these constraints is found. If no other region cluster satisfies the conditions, the operation shown in Fig. 5(b) is performed, which we call "Split": the area becomes a new cluster center and opens a new cluster region. Note that this cluster center is treated specially, e.g., relaxing the ED_{k1,k2} distance constraint shown in Fig. 5(b), so that the new region cluster can be opened successfully. A minimal sketch of this assignment-or-split step is given below.
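The following sketch illustrates the assignment-or-split logic of Eqs. (3)–(5) and the Split_new()/IsMeet() behavior described above. It is a minimal reconstruction, not the authors' Algorithm 2: the planner-path query is represented by a caller-supplied function, the inter-center constraint of Eq. (2) is assumed to be handled when centers are created, and all names are illustrative.

```python
import numpy as np

def assign_area(x, clusters, plan_path_length, To_Kd_max, sigma):
    """Assign an unexplored area x to a region cluster, or split.

    `clusters` maps cluster id -> center position. `plan_path_length`
    is a caller-supplied function returning the planned path length
    between two points, or None when no path is known (a stand-in for
    the environment-specific planner query).
    """
    x = np.asarray(x, dtype=float)
    # Candidate centers sorted by Euclidean distance.
    order = sorted(clusters, key=lambda j: np.linalg.norm(x - clusters[j]))
    for j in order:
        ed = np.linalg.norm(x - clusters[j])
        if ed > To_Kd_max:
            break  # Eq. (3) violated; remaining centers are even farther
        path = plan_path_length(x, clusters[j])
        if path is None or path / ed <= sigma:   # Eqs. (4)-(5)
            return j                              # join cluster j
    # No cluster satisfies the constraints: "Split" into a new cluster.
    new_id = max(clusters) + 1 if clusters else 0
    clusters[new_id] = x
    return new_id
```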
3.3 Area Evaluation

Our evaluation of the cost of exploring an area is not limited to the path length and the information gain within the region. Considering that unexplored areas belonging to the same region cluster are in close proximity, and that map information for some areas may already be acquired while the robot explores nearby areas, we introduce a new evaluation criterion to improve exploration efficiency. We analyze the ground point clouds of these areas and use the degree of undulation of the ground point cloud as one of the priority indicators for exploring an area. Define the set of points belonging to an unexplored area as P = {p_1, p_2, …, p_n}, where n is the number of points in this area. Ground point cloud segmentation is performed first to obtain the point set used to evaluate the pavement condition, denoted P_ground; we use the ground segmentation method of [16]. Note that we only run this evaluation when the number of ground points reaches a certain threshold, since otherwise the evaluation error would be large. Assuming that the sampled surface of the point
cloud is smooth everywhere, the local neighborhood of any point can be well fitted with a plane. We use least squares to fit the local plane of a point, denoting the local plane as f. The degree of undulation of the point's neighborhood is estimated by estimating the normal vector of f and computing the covariance matrix and its eigenvalues, as shown in Eqs. (6)–(9):

f(n, d) = argmin_{(n,d)} Σ_{i=1}^{k} (n · p_i − d)^2    (6)

M = (1/k) Σ_{i=1}^{k} (p_i − p̄)(p_i − p̄)^T    (7)

s.t.  λ_0 ≤ λ_1 ≤ λ_2    (8)

δ = λ_0 / (λ_0 + λ_1 + λ_2)    (9)
where n is the normal vector of f, p̄ is the centroid of the neighborhood point set obtained by nearest-neighbor search around each point in the evaluation area (a KD-tree (K-Dimension Tree) lookup is used for the least-squares fitting of the neighborhood plane), p_i denotes the i-th point in the neighborhood, and d denotes the distance from f to the coordinate origin. Fitting Eq. (6) yields the local least-squares plane of the estimated area; the normal vector of the plane fitted from the k nearest points is taken as the normal vector of the current point. The normal vector of f can be obtained by Principal Component Analysis (PCA): f passes through the center of mass p̄ of the neighborhood, the covariance matrix M of Eq. (7) is eigen-decomposed, and the eigenvector corresponding to the smallest eigenvalue is the normal vector of f. If the eigenvalues of the covariance matrix satisfy the condition of Eq. (8), the surface curvature of the point is δ, calculated as in Eq. (9); the smaller this value is, the greater the undulation change of the neighborhood. The surface curvature is computed for each ground point in the region and added to the surface curvature set δ_set, and the regional evaluation index Cost_tra is obtained using Eq. (10), where m denotes the number of elements in δ_set, as an approximation of the pavement complexity of the entire unexplored area. The smaller the value of Cost_tra, the greater the variation of the road surface in the area and the greater the cost for the robot to traverse it.

Cost_tra = (1/m) Σ_{x ∈ δ_set} x    (10)
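A minimal numerical sketch of Eqs. (6)–(10) follows, assuming the ground points have already been segmented as in [16]; the neighborhood size k and the SciPy KD-tree are implementation choices of ours, not taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def traversal_cost(ground_points, k=10):
    """Approximate Cost_tra of an area from its ground point cloud.

    For each point, a local plane is fit to its k nearest neighbors via
    PCA of the neighborhood covariance (Eq. 7); the smallest-eigenvalue
    ratio gives the surface curvature delta (Eqs. 8-9), and Cost_tra is
    the mean over the area (Eq. 10). Assumes at least k points.
    """
    pts = np.asarray(ground_points, dtype=float)
    tree = cKDTree(pts)
    deltas = []
    for p in pts:
        _, idx = tree.query(p, k=k)
        nbrs = pts[idx]
        centered = nbrs - nbrs.mean(axis=0)
        M = centered.T @ centered / k               # covariance, Eq. (7)
        eigvals = np.sort(np.linalg.eigvalsh(M))    # lambda0 <= lambda1 <= lambda2
        deltas.append(eigvals[0] / max(eigvals.sum(), 1e-12))  # Eq. (9)
    return float(np.mean(deltas))                   # Eq. (10)
```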
Algorithm 3 is only one part of the integrated evaluation of the global exploration sequence; the integrated evaluation function also contains two other parts: the path length and the information gain of the area. The path length, one of the navigation costs of autonomous exploration, is obtained by searching for a planned path from the starting area to the target area; if no planned path exists between the two areas, the Euclidean distance is used as the navigation cost instead. The information gain part is
evaluated using the number of frontiers observable in the area. The integrated evaluation method is shown in Eqs. (11)–(12):

Cost_length(A_s, A_t) = Pathlength(A_s, A_t) if path(A_s, A_t) exists; Euclideandistance(A_s, A_t) otherwise    (11)

E(A_s, A_t) = (λ_2 · Cost_tra(A_t) + λ_3 · e^{λ_4 · Cost_length(A_s, A_t)}) / (λ_1 · Gain(A_t))    (12)
where A_s and A_t represent the starting area and the target area, respectively. Equation (11) gives the navigation length cost of the robot from A_s to A_t: if a planned path between the two areas can be found in the established map, the navigation length cost is the planned path length Pathlength(A_s, A_t); otherwise the Euclidean distance Euclideandistance(A_s, A_t) is used. Gain(A_t) computes the information gain of the robot in A_t. Equation (12) then yields the comprehensive evaluation value of exploring from A_s to A_t. It is worth mentioning that the E obtained here is a component of M_cost in Algorithm 1, and λ_1 … λ_4 are custom constants that adjust the weight of each component to suit various types of exploration environments (a minimal sketch of this evaluation follows). During autonomous exploration, the frontier update, the regional segmentation clustering module, and the area evaluation module are updated at a constant frequency. At the beginning of each iteration, the next optimal exploration area is obtained from the access sequence computed in the previous iteration, and the robot is guided toward this A_g. Finally, the robot determines its exploration trajectory under the guidance of the global and local levels, cyclically completing the autonomous exploration task.
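The sketch below evaluates Eqs. (11)–(12) for a candidate target area; the default λ weights are placeholders, since the paper leaves them as user-tuned constants.

```python
import math

def navigation_cost(path_length, euclidean_dist):
    """Cost_length of Eq. (11): planned path length if a path exists
    (path_length is not None), Euclidean distance otherwise."""
    return path_length if path_length is not None else euclidean_dist

def area_score(gain, cost_tra, cost_length, lambdas=(1.0, 1.0, 1.0, 0.1)):
    """Integrated evaluation E(A_s, A_t) of Eq. (12); the lambda
    weights here are placeholder values."""
    l1, l2, l3, l4 = lambdas
    return (l2 * cost_tra + l3 * math.exp(l4 * cost_length)) / (l1 * gain)
```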
4 Experiments

4.1 Experimental Environment Setup

To verify the performance and efficiency of the autonomous exploration method, we conducted experiments in both a simulation environment and a real environment, and compared the results with other autonomous exploration methods. The mobile robot platform is a self-built MR500 with a differential-drive wheel model, as shown in Fig. 6, carrying a Pandora 40-line LiDAR. The computational platform is a ThinkPad T470p laptop with an i7-7700HQ CPU and 32 GB RAM, running Ubuntu 18.04 and the Melodic distribution of the Robot Operating System (ROS). The robot trajectory tracking and navigation module uses the method proposed in [17], while map building and robot localization use the LOAM algorithm [18]. The simulation environment is built in Gazebo on ROS; it considers common application scenarios for autonomous robot exploration and is designed as a map scene based on our school's laboratory building, as shown in Fig. 7. The real experiments were also carried out in this laboratory building.
Fig. 6. Experimental platform: (a) and (b) show the MR500 experimental platform and the Pandora LiDAR, respectively.
Fig. 7. Experimental building simulation environment: (a) and (b) show different perspectives of the simulation environment in Gazebo, (c) is the actual exterior of our laboratory building, and (d) is the location of our laboratory building on campus.
4.2 Simulation Experiments

To quantitatively evaluate the efficiency of autonomous exploration, we use the exploration time consumed by the robot and the length of the path traveled as evaluation indexes. The time consumed to explore the unknown environment reflects the solving speed of the algorithm, while the length of the exploration path reflects the optimization quality of the exploration algorithm; comparing both therefore gives a fuller picture of an autonomous exploration algorithm's performance. We conducted a total of 100 sets of simulation experiments comparing the newly proposed region-clustering autonomous exploration method with previous exploration methods. We established a simulation environment of our college building, with an area of about 58 m × 58 m and 25 rooms, and ran the simulations there. To intuitively show the exploration behavior of the two methods, we select one set of trajectory data from the experimental results for comparison, shown in Fig. 8 below. From the number of overlapping trajectories and the trajectory length through the rooms, our method reduces the number of repeated visits and shortens the exploration path. To compare the two strategies more precisely, we also recorded the exploration time, exploration path length, and explored volume for the robot to cover the entire environment, with all data averaged. The exploration evaluation indexes are listed in Table 1 and Table 2.
Fig. 8. Experimental results in the simulated environment: (a) and (b) show the exploration results in the experimental building and a typical indoor environment, respectively, showing that our method avoids a large number of repeated visits.
Table 1. Exploration data from our laboratory building

Method | Average path length | Average exploration time | Average exploration volume
Ours | 536.8 m | 321.4 s | 6083.7 m³
TARE | 624.4 m | 383.8 s | 6102.2 m³
Table 2. Exploration data in a typical indoor environment

Method | Average path length | Average exploration time | Average exploration volume
Ours | 1136.6 m | 665.5 s | 5485.7 m³
TARE | 1227.3 m | 773.6 s | 5610.5 m³
The comparison of the experimental results shows that, in the simulation environment, the exploration trajectory length and exploration time required by our method to explore the same environment are significantly smaller than those of the other methods, while the average exploration volume differs little. In the laboratory-building environment, the average exploration trajectory length is reduced by 14.03% and the average exploration time by 16.15%; in the typical indoor environment, they are shortened by 7.3% and 13.9%, respectively.

4.3 Real Experiments

We conducted several experiments with the proposed method in the real environment of our school's laboratory building. In this real environment, the autonomous exploration system can effectively explore the unknown environment and build a corresponding
map. The established map and exploration trajectory are shown in Fig. 9. Due to the deviation between the real and simulated environments, and the uncertainty of localization, the final mapping quality is moderate. Therefore, in the real experiments the robot is set to stop moving after exploration is completed, rather than returning to the starting point. The figure shows that the robot rarely makes repeated visits during autonomous exploration, which matches the autonomous exploration idea we designed and improves the overall exploration efficiency.
Fig. 9. Results in the real environment: (a) shows the resulting path of our method, (b) is the result of other methods, (c) is the point cloud map, and (d) is a part of the real scene.
Table 3. Real environment experimental data

Method | Average path length | Average exploration time | Average exploration volume
Ours | 134.34 m | 197.69 s | 8153.3 m³
TARE | 152.71 m | 233.51 s | 8237.9 m³
At the same time, to demonstrate the advancement of the method proposed in this paper, several sets of real experiments were carried out in the same environment using the current advanced autonomous exploration method, and one set of experimental trajectory diagrams was selected as a reference, as shown in Fig. 9; the final data are shown in Table 3. From the comparison of the final experimental data, in the experimental environment of the college building, the method in this paper requires a shorter exploration trajectory and less exploration time than other methods, and the average exploration
amount is similar. Compared with other methods, the average exploration trajectory length is reduced by 12.03%, and the average exploration time is reduced by 15.33%.
5 Conclusion

In this paper, a novel autonomous exploration method based on region clustering is proposed. Although some previous autonomous exploration methods introduce global task guidance, they solve and plan the path length only at the whole global level, so the near-optimal paths the robot obtains in some environments carry a high probability of repeated visits, which hurts exploration efficiency. In contrast, we apply an ISODATA-based region integration and division to the unexplored regions of the global area. The division must satisfy the constraints we specify, and it makes these regions better match the realistic notion of exploration, biasing the autonomous exploration task toward completing local regions of the environment map, such as a single room on a floor. Second, we propose a regional assessment method that judges autonomous exploration paths more comprehensively and is more in line with human exploration thinking, reducing the difficulty of obtaining map information, which is another aspect of autonomous exploration efficiency. Finally, we combine the region division information, exploration cost, and information gain to complete the exploration sequence planning and path planning. The simulation and real-world results show that the proposed method reduces the probability of repeatedly visiting regions during autonomous exploration and significantly shortens the autonomous exploration paths, thus improving exploration efficiency.

Acknowledgment. This work is supported by the Open Project of Shanghai Key Laboratory of Spacecraft Mechanism (Project No. 18DZ2272200).
References

1. Delmerico, J., Mueggler, E., Nitsch, J., Scaramuzza, D.: Active autonomous aerial exploration for ground robot path planning. IEEE Robot. Autom. Lett. 2(2), 664–671 (2017)
2. Bennett, A.A., Leonard, J.J.: A behavior-based approach to adaptive feature detection and following with autonomous underwater vehicles. IEEE J. Oceanic Eng. 25(2), 213–226 (2000)
3. Thrun, S., Burgard, W., Fox, D.: Exploration in Probabilistic Robotics, pp. 569–574. MIT Press, Cambridge (2005)
4. Dang, T., Khattak, S., Mascarich, F., et al.: Explore locally, plan globally: a path planning framework for autonomous robotic exploration in subterranean environments. In: 19th International Conference on Advanced Robotics (ICAR), pp. 9–16 (2019)
5. Khattak, S., Nguyen, H., Mascarich, F., Dang, T., Alexis, K.: Complementary multi-modal sensor fusion for resilient robot pose estimation in subterranean environments. In: 2020 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 1024–1029 (2020)
6. Craye, C., Filliat, D., Goudou, J.F.: RL-IAC: an exploration policy for online saliency learning on an autonomous mobile robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4877–4884 (2016)
7. Jadidi, M.G., Miro, J.V., Dissanayake, G.: Mutual information-based exploration on continuous occupancy maps. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6086–6092 (2015)
8. Bai, S., Chen, F., Englot, B.: Toward autonomous mapping and exploration for mobile robots through deep supervised learning. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 2379–2384 (2017)
9. Yamauchi, B.: A frontier-based approach for autonomous exploration. In: IEEE International Symposium on Computational Intelligence in Robotics and Automation (2002)
10. Simmons, R.G., Apfelbaum, D., Burgard, W.: Coordination for multi-robot exploration and mapping. In: AAAI/IAAI, pp. 852–858 (2000)
11. Umari, H., Mukhopadhyay, S.: Autonomous robotic exploration based on multiple rapidly-exploring randomized trees. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1396–1402 (2017)
12. LaValle, S.M.: Rapidly-exploring random trees: a new tool for path planning. Dept. Comput. Sci. 98(11) (1998)
13. Zhu, H., Cao, C., Scherer, S., Zhang, J., Wang, W.: DSVP: dual-stage viewpoint planner for rapid exploration by dynamic expansion. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7623–7630 (2021)
14. Cao, C., Zhu, H., Choset, H., Zhang, J.: TARE: a hierarchical framework for efficiently exploring complex 3D environments. In: Robotics: Science and Systems (RSS), vol. 10, no. 3 (2021)
15. Papadimitriou, C.H.: The complexity of the Lin–Kernighan heuristic for the traveling salesman problem. SIAM J. Comput. 21(3), 450–465 (1992)
16. Shan, T., Englot, B.: LeGO-LOAM: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4758–4765 (2018)
17. Cao, C., et al.: Autonomous exploration development environment and the planning algorithms. In: IEEE International Conference on Robotics and Automation (ICRA), Philadelphia, PA, pp. 8921–8928 (2022)
18. Zhang, J., Singh, S.: LOAM: lidar odometry and mapping in real-time. In: Robotics: Science and Systems, Berkeley, CA, vol. 2, no. 9, pp. 1–9 (2014)
Human Intention Understanding and Trajectory Planning Based on Multi-modal Data

Chunfang Liu(B), Xiaoyue Cao, and Xiaoli Li

Faculty of Information and Technology, Beijing University of Technology, Beijing, China
[email protected], [email protected]
Abstract. In Human-Robot Interaction tasks, humans are at the center of the whole cooperation process. The robot continuously observes the cooperator's behavior through multi-modal information, infers the human's operational intention, and plans its interaction trajectory in advance. Therefore, understanding human operational intention and predicting human trajectories play an important role in the interaction between robot and human. This paper presents a framework for human intention reasoning and trajectory prediction based on multi-modal information. It consists of two parts. First, the robot infers the human's operational intention by combining four kinds of information: the human behavior recognized by a GCN-LSTM model, the object type obtained by YOLO, the relationships among verbs, objects of interest, and targets extracted from sentences by dependency parsing, and the pose of the hand-held object obtained by GG-CNN. Then, after understanding the human's intention, and given the partial trajectory the human is currently executing, KMP is used to predict the human's subsequent trajectory, preparing for the trajectory planning of robots in future human-robot interaction tasks. We verify the framework on several data sets, and the test results show the effectiveness of the proposed framework.
Keywords: Human-robot interaction · Multi-modal information · Trajectory prediction

1 Introduction
Human-Robot Collaboration (HRC) [1–3] means that robots and humans cooperate to complete complex tasks in a shared workspace; Human-Robot Interaction (HRI) is one of the most important parts of HRC. In HRI mode, robots act as human assistants that can take over repetitive, tiring, and even dangerous work, while humans engage in the dexterous tasks they are best at. Hence, the cooperation between humans and robots
can give full play to their respective advantages, and it has broad application space in industrial fields and service industries. Therefore, HRI has become a hot research topic in the robotics industry [4–6]. In HRI, the first research focus is to infer the human's operational intention and predict the human's movement trajectory, so that the robot can plan its own route in advance and interact proactively. To let robots quickly infer human operational intention and future movement positions, Mainprice et al. combined a Gaussian mixture model with Gaussian mixture regression to predict the target position a human will reach and estimate the task space the human will occupy [7]. Ferrer et al. put forward a probabilistic framework consisting of a prediction algorithm, a behavior estimator, and an intention predictor, in which the intention predictor estimates the target position the human wants to approach [8]. Rehder et al. used a single neural network for both task identification and prediction [9]. Liu et al. proposed a deep learning system combining a convolutional neural network and an LSTM with visual signals to accurately predict human motion [10]. However, most researchers use relatively uni-modal information when reasoning about human operational intention. In our work, we effectively integrate multi-modal data to analyze human operational intention, including the category and pose of the hand-held object, human skeleton data, and human language. Then, Kernelized Movement Primitives (KMP) are used to predict the human trajectory. The overall structure of this paper is shown in Fig. 1.
Fig. 1. Framework of human intention reasoning and trajectory prediction based on multi-modal information.
In this paper, the main contributions of our work are as follows: (1) using multi-modal information to predict human operational intention in HRI; (2) combining Dynamic Time Warping (DTW) with KMP. Specifically, the types of human hand-held objects are identified by You Only Look Once (YOLO), the pose of the hand-held object is analyzed using the Generative Grasping CNN (GG-CNN) so that the robot can select a proper grasping strategy, and the manipulated objects and target positions are resolved from natural language. Human 3D skeleton data is classified by a Spatial Temporal-Graph Convolutional Network-Long Short Term Memory (ST-GCN-LSTM) model to identify human behaviors. After the above four kinds of information are fused, the
intention of human operation can be analyzed more accurately, preparing for human trajectory prediction. On the other hand, to predict the human trajectory, DTW matches the task type from a few observed trajectory points, and KMP generalizes the trajectory parameters of the matched task, taking the observation points as the desired via-points of KMP. A minimal DTW sketch follows.
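As a minimal sketch of the task-matching step mentioned above, the following standard DTW distance can compare an observed partial trajectory against stored demonstrations; the implementation details are ours, since the paper does not specify its DTW variant.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two trajectories
    (arrays of shape [T, D]). Standard O(len(a)*len(b)) dynamic
    programming with Euclidean point-to-point cost."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The demonstration with the smallest DTW distance to the observed
# partial trajectory is taken as the matched task type.
```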
2 Related Work
When a robot interacts with a human, it first needs to analyze the human's intention from a variety of information, and then extract suitable human-taught trajectory features through imitation learning, so that the robot can plan a path according to the trajectory shape.

Human Operation Intention. Regarding the robot's ability to understand human behavior, our research group proposed a multi-task model for recognizing human intentions, which consists of two sub-tasks: human action recognition and hand-held object recognition. Our experiments show that the Spatial Temporal-Graph Convolutional Network-Long Short Term Memory (ST-GCN-LSTM) model [11] is clearly superior to a single Graph Convolutional Network (GCN) [12] model or a single Long Short Term Memory (LSTM) [13] model for human motion recognition. At the same time, we use the You Only Look Once (YOLO) model to detect human hand-held items. Farrajota et al. [14] exploit the relationships between body joints, using a series of residual auto-encoders to generate multiple predictions that are then combined into a probability map of the motion trajectory. Chen et al. [15] developed a framework called Dirichlet process active region, which addresses the problems of modeling human motion with Gaussian Process Regression, thus improving pedestrian motion prediction accuracy. Luo et al. [16] proposed an unsupervised online learning algorithm with a two-layer Gaussian mixture model framework; this unsupervised framework can build models at runtime, adapting to new people and new action styles.

Skills Learning. Imitation learning [17] can extract motion characteristics from a small number of teaching samples. Because of this advantage, more and more researchers use imitation learning algorithms to realize robot skill learning. For example, the Dynamical Movement Primitives (DMPs) [18] proposed by Ijspeert et al. generate the desired trajectory through a dynamical system with asymptotic stability and a forcing function that controls the trajectory shape. In [19], a hidden Markov model is used to model the motion trajectory, in which identified key points indicate transitions between states; in a new situation, the trajectory is regenerated via interpolation of these key points. In [20], a Gaussian mixture model is used to model the teaching trajectory, and Gaussian mixture regression generates a new trajectory. Probabilistic model methods have strong robustness, the
ability to deal with motion noise, and the ability to encode high-dimensional motion. Because of the uncertainty of human motion in human-robot interaction, i.e., because human motion is changeable, the generation of robot motion must depend on the trend of human motion to keep HRI dexterous, which in turn makes the robot's motion uncertain. For this kind of motion uncertainty, a dynamical-system representation learned from a single demonstration cannot be applied. Therefore, Paraschos et al. put forward Probabilistic Movement Primitives (ProMPs) [21] based on Gaussian distributions, which use maximum likelihood estimation to estimate the probability distribution of motion trajectory parameters and then use Gaussian conditioning on motion observations to adapt the motion to new tasks. DMPs cannot meet adjustment requirements on desired-point position and velocity, and ProMPs cannot be extended to the case of high-dimensional input. Therefore, Huang et al. put forward Kernelized Movement Primitives (KMP) [22], which avoid defining basis functions by using the kernel trick, thus solving the problem of high-dimensional input when learning a probabilistic model of taught trajectories. The structure of this paper is as follows. Section 3 introduces the prediction of human intention based on multi-modal information. Section 4 explains how to predict the human trajectory from the intention and part of the human movement. In Sect. 5, we evaluate the proposed model through experiments. The last Sect. 6 summarizes the research results and gives a conclusion.
3 A Speculative Model of Human Operation Intention

3.1 Categories and Postures of Hand-Held Objects
We use two kinds of neural networks to detect the category and the pose of hand-held objects, respectively. GG-CNN is a lightweight framework that can detect grasps in real time; it predicts a grasp pixel by pixel instead of sampling and classifying individual grasp candidates, so it achieves recognition and grasping faster. However, GG-CNN cannot choose which object to grasp. To make up for this shortcoming, we introduce YOLO for object detection in advance to identify the object we want to grasp. The YOLO algorithm was originally proposed by Joseph et al. as a method that treats target detection as a regression problem: with a single convolutional neural network structure, it directly predicts bounding boxes and category labels from the input image. Consider the case where the object held by the human is a "cup". (1) First task: the human's task is to "hand the cup to the robot", and the task the robot receives through natural language is "put the cup on the table". The robot knows through the camera that the cup is in the hand and, from part of the hand's movement trajectory, plans its own movement direction (toward the hand) in advance, realizing interaction with the human. (2) Second task: the human's task is to "let the robot pour milk into the cup in his hand", and the task that the robot receives through
language is "find the milk and pour it into the cup". At this point, the robot's first step is to find the milk and grasp it; the second is to find the cup and perform the pouring action. Obviously, it is difficult to predict the human's intention when the hand-held object types are the same. Therefore, we mainly fuse the hand-held object category information with gesture recognition, language recognition, and skeleton-based human motion recognition, and infer the human's operational intention more accurately.
3.2 The Forecast of Human Behavior
Human movement can be described by the movement of some major joint points [23]. Therefore, for the human motion recognition task, we use the Kinect depth camera to obtain the main joints of the human body. The skeleton data of each frame (time step) gives the coordinate positions of 25 skeleton points of the human body, and each time series consists of several frames. Human action recognition determines the action type after segmenting the sequence in the time domain. Human motion data is a typical spatio-temporal sequence: it depends both on the motion of human joints in three-dimensional space and on the continuity of motion across joints. Considering these two aspects, we effectively fuse the Long Short Term Memory (LSTM) network and the Spatio-Temporal Graph Convolutional Network (ST-GCN), so that the fused network, the Spatial Temporal-Graph Convolutional Network-Long Short Term Memory (ST-GCN-LSTM) [11], has a powerful ability to describe sequential data and to capture the high-level spatial structure in each frame for recognizing human behavior. LSTM and GCN describe the human skeleton sequence in different forms, so they show different strengths in classifying human behavior. We fuse the outputs of LSTM and ST-GCN as:

O = θ O_LSTM + O_ST-GCN    (1)
In Eq. (1), O_LSTM and O_ST-GCN respectively denote the output representations of the LSTM and ST-GCN models for the human skeleton data, and θ is an empirical parameter obtained in the experiments; a minimal fusion sketch follows.
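A minimal sketch of the late fusion in Eq. (1) is given below; the value of θ shown is a placeholder, since the paper determines it empirically.

```python
import numpy as np

def fuse_scores(o_lstm, o_stgcn, theta=0.5):
    """Late fusion of Eq. (1): O = theta * O_LSTM + O_ST-GCN.
    Both inputs are per-class score vectors from the two branches;
    theta=0.5 is a placeholder for the empirically tuned weight."""
    return theta * np.asarray(o_lstm) + np.asarray(o_stgcn)

# The predicted action class is the arg-max of the fused scores.
scores = fuse_scores([0.2, 0.7, 0.1], [0.1, 0.8, 0.1])
print(int(np.argmax(scores)))  # -> 1
```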
3.3 Imperative Statement Parsing
There are many interaction modalities between humans and robots, such as gestures, natural language, and key devices. Among them, natural language is the simplest for humans. For robots, however, it is necessary to autonomously transform natural language into action sequences they can understand; robots must therefore extract the key task information from human natural-language instructions. In this part, the syntactic analysis tool CoreNLP [24] is used to analyze the imperative sentences given by humans, extracting the relationships between verbs, objects of interest, and targets in the sentence. CoreNLP
394
C. Liu et al.
contains rich annotators that process natural language in different ways. First, the pos annotator [25] tags the parts of speech of the words in a sentence, and then the parse annotator [26,27] analyzes the dependencies among words. The analysis produced by CoreNLP is shown in Fig. 2, including the results of part-of-speech analysis, basic dependency analysis, and enhanced dependency analysis; a dependency-extraction sketch follows the figure.
Fig. 2. Example of CoreNLP analysis results.
4 Human Trajectory Prediction
A set of demonstrated training data based on time and trajectory is formulated as D = {{t_{n,h}, ξ_{n,h}}_{n=1}^{N}}_{h=1}^{H}, where t_{n,h} represents time and ξ_{n,h} ∈ R³ denotes the n-th position point of the h-th teaching trajectory. N and H correspond to the trajectory length and the number of demonstrations, respectively. Let the output of the parametric trajectory be the basis functions multiplied by the weight coefficients, as in Eq. 2, where Θ(s) is the basis-function matrix:

D_p = ξ(s) = Θ^T(s)ω    (2)
We assume that the weight vector ω follows a normal distribution, namely

ω ∼ N(μ_ω, Σ_ω)    (3)

where both the mean μ_ω and the covariance Σ_ω are unknown. Hence, the parametric trajectory satisfies

ξ(s) ∼ N(Θ^T(s)μ_ω, Θ^T(s)Σ_ω Θ(s))    (4)
In trajectory reproduction, the ultimate goal of both ProMPs and KMP is to solve for the probability distribution of the trajectory parameter ω, but their solution processes differ.
4.1 Kernelized Movement Primitives
ProMPs need the basis functions used to fit the trajectory to be specified in advance. For high-dimensional inputs, however, a large number of basis functions is often needed, so it is difficult to apply ProMPs to learning trajectories with multi-dimensional inputs. To restore the teaching trajectory as faithfully as possible, the similarity between the parametric trajectory of Eq. 2 and the probabilistic reference trajectory D_r = {t_n, μ̂_n, Σ̂_n}_{n=1}^{N} should be as high as possible. More specifically, the ultimate goal of KMP is to make the parametric trajectory distribution in Eq. 6 match the reference trajectory distribution in Eq. 7. We minimize the Kullback-Leibler (KL) divergence, which measures the distance between the two probability distributions and ensures minimal information loss during imitation learning, to derive the optimal mean μ_ω and covariance Σ_ω of the parametric trajectory. The objective function is

J_ini(μ_ω, Σ_ω) = Σ_{n=1}^{N} D_KL(P_p(ξ|t_n) || P_r(ξ|t_n))    (5)
P_p(ξ|t_n) and P_r(ξ|t_n) represent the probability distributions of the parametric trajectory and the reference trajectory for input t_n, respectively:

P_p(ξ|t_n) = N(ξ | Θ^T(t_n)μ_ω, Θ^T(t_n)Σ_ω Θ(t_n))    (6)

P_r(ξ|t_n) = N(ξ | μ̂_n, Σ̂_n)    (7)
One of the highlights of KMP is that a kernel function is used to compute the predicted output, which avoids an explicit mapping of the basis functions for a new input s*, thus solving the problem of high-dimensional input computation. The kernel matrix is defined as

k(s_i, s_j) = Θ^T(s_i)Θ(s_j)    (8)

Then, for any trajectory, the KMP prediction for a new input s* is

ξ(s*) = k*(K + λΣ)^{-1} U    (9)

Σ = blockdiag(Σ̂_1, Σ̂_2, ..., Σ̂_N)    (10)

U = [μ̂_1, μ̂_2, ..., μ̂_N]^T    (11)

where μ̂_n and Σ̂_n refer to the means and covariances of D_r (Fig. 3).
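A one-dimensional sketch of the KMP prediction of Eqs. 9-11 is given below. It is illustrative only, assuming a scalar output and a Gaussian kernel; the multi-dimensional case of the paper replaces the scalar variances with the block-diagonal covariances of Eq. 10.

    import numpy as np

    def rbf(a, b, h=10.0):
        return np.exp(-h * (a - b) ** 2)

    def kmp_predict(s_star, T, mu_hat, var_hat, lam=1.0):
        # Eq. 9: xi(s*) = k* (K + lambda * Sigma)^-1 U, scalar-output case.
        N = len(T)
        K = np.array([[rbf(T[i], T[j]) for j in range(N)] for i in range(N)])
        Sigma = np.diag(var_hat)            # Eq. 10, scalar blocks
        U = np.asarray(mu_hat)              # Eq. 11, stacked reference means
        k_star = np.array([rbf(s_star, t) for t in T])
        return k_star @ np.linalg.solve(K + lam * Sigma, U)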
Fig. 3. Types and poses of objects.
5 Experiments
We first examine the extraction of key task parameters in multi-modal language-vision interaction. For the task of “putting the apple in the bowl and the banana on the plate”, the instruction “Put the apple in the bowl, put the banana in the plate.” consists of two simple clauses. The task scene, the recognition results, and the task sequence obtained by the key-parameter extraction method proposed in this paper are shown in Fig. 4; the task sequence is extracted by direct semantic analysis and keyword analysis. We feed the output of the vision sensor mounted on the AUBO robot to the trained networks, using YOLO trained on RGB images and GGCNN trained on depth images, to evaluate object detection and grasping. The test set contains the same three objects: wooden sticks, wooden boards, and cans. In ten rounds of experiments, objects were placed at random locations in the workspace. Out of 50 target-detection attempts, 38 were successful, a detection success rate of 76%; for the grasping task, 32 of 50 attempts were successful, a success rate of 64%. For trajectory prediction, we first collect human teaching trajectories as the input of the KMP algorithm (the gray lines in Fig. 5), then learn the trajectory parameters and visualize them (the red line in Fig. 5). Finally, using the points observed by the vision sensor as desired points, the KMP algorithm predicts the upcoming trajectory.
Fig. 4. Task scene and key-information extraction results.
Fig. 5. Trajectory predicted by KMP.
6 Conclusion
In this paper, we propose a framework for human intention reasoning and trajectory prediction based on multi-modal information. First, human intentions are analyzed from different angles and key information is extracted from the multi-modal data, so that the robot can understand human behavior. Second, from the observed part of the human trajectory, the likely remainder of the trajectory is predicted, preparing the robot to plan its own trajectory in advance and making human-robot interaction more natural and active. Specifically, the multi-modal information consists of four parts: the object categories identified by YOLO, the object poses analyzed by GGCNN, the target positions and objects of interest extracted by CoreNLP from natural language, and the human behaviors classified by ST-GCN-LSTM. Through this multi-angle information, the robot can understand the human intention more accurately. Then, combining the partial human trajectory aligned by DTW with KMP, the subsequent human trajectory is predicted. Our next work will focus on robot trajectory prediction in human-robot interaction scenarios: collecting teaching trajectories during interaction, learning trajectory characteristics with imitation learning, and applying the intention reasoning and trajectory prediction of this paper to human-robot interaction scenes.

Acknowledgment. The work was jointly supported by Beijing Natural Science Foundation (4212933), Scientific Research Project of Beijing Educational Committee (KM202110005023) and National Natural Science Foundation of China (62273012).
References
1. Shu, T., Ryoo, M.S., Zhu, S.-C.: Learning social affordance for human-robot interaction. arXiv preprint arXiv:1604.03692 (2016)
2. Shu, T., et al.: Learning social affordance grammar from videos: transferring human interactions to human-robot interactions. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1669–1676. IEEE (2017)
3. Maeda, G., Neumann, G., Ewerton, M., Lioutikov, R., Peters, J.: A probabilistic framework for semi-autonomous robots based on interaction primitives with phase estimation. In: Bicchi, A., Burgard, W. (eds.) Robotics Research. SPAR, vol. 3, pp. 253–268. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-60916-4_15
4. Amor, H.B., et al.: Interaction primitives for human-robot cooperation tasks. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 2831–2837. IEEE, Hong Kong (2014)
5. Ewerton, M., et al.: Learning multiple collaborative tasks with a mixture of interaction primitives. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 1535–1542. IEEE, Seattle (2015)
6. Huang, Y., et al.: Toward orientation learning and adaptation in Cartesian space. IEEE Trans. Robot. 37(1), 82–98 (2021)
7. Mainprice, J., Berenson, D.: Human-robot collaborative manipulation planning using early prediction of human motion. In: 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 299–306. IEEE (2013)
8. Ferrer, G., Sanfeliu, A.: Behavior estimation for a complete framework for human motion prediction in crowded environments. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 5940–5945. IEEE (2014)
9. Rehder, E., et al.: Pedestrian prediction by planning using deep neural networks. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 5903–5908. IEEE (2018)
10. Liu, Z., et al.: Deep learning-based human motion prediction considering context awareness for human-robot collaboration in manufacturing. Proc. CIRP 83, 272–278 (2019)
11. Liu, C., et al.: Robot recognizing humans intention and interacting with humans based on a multi-task model combining ST-GCN-LSTM model and YOLO model. Neurocomputing 430, 174–184 (2021)
12. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. Adv. Neural Inf. Process. Syst. 29 (2016)
13. Malhotra, P., et al.: Long short term memory networks for anomaly detection in time series. In: Proceedings, vol. 89, pp. 89–94 (2015)
14. Farrajota, M., Rodrigues, J.M.F., du Buf, J.M.H.: Human action recognition in videos with articulated pose information by deep networks. Pattern Anal. Appl. 22(4), 1307–1318 (2019)
15. Chen, Y., et al.: Predictive modeling of pedestrian motion patterns with Bayesian nonparametrics. In: AIAA Guidance, Navigation, and Control Conference, p. 1861 (2016)
16. Luo, R., Hayne, R., Berenson, D.: Unsupervised early prediction of human reaching for human-robot collaboration in shared workspaces. Auton. Robot. 42(3), 631–648 (2018)
17. Schaal, S.: Is imitation learning the route to humanoid robots? Trends Cogn. Sci. 3(6), 233–242 (1999)
18. Ijspeert, A.J., et al.: Dynamical movement primitives: learning attractor models for motor behaviors. Neural Comput. 25(2), 328–373 (2013)
19. Vakanski, A., et al.: Trajectory learning for robot programming by demonstration using hidden Markov model and dynamic time warping. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42(4), 1039–1052 (2012)
20. Khansari-Zadeh, S.M., Billard, A.: Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Trans. Robot. 27(5), 943–957 (2011)
21. Paraschos, A., Daniel, C., Peters, J.R., Neumann, G.: Probabilistic movement primitives. Adv. Neural Inf. Process. Syst. 26 (2013)
22. Huang, Y., et al.: Kernelized movement primitives. Int. J. Robot. Res. 38(7), 833–852 (2019)
23. Johansson, G.: Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14(2), 201–211 (1973)
24. Manning, C.D., et al.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
25. Toutanova, K., et al.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pp. 252–259 (2003)
26. De Marneffe, M.-C., MacCartney, B., Manning, C.D., et al.: Generating typed dependency parses from phrase structure parses. In: LREC, vol. 6, pp. 449–454 (2006)
27. Klein, D., Manning, C.D.: Fast exact inference with a factored model for natural language parsing. Adv. Neural Inf. Process. Syst. 15 (2002)
Robotic Arm Movement Primitives Assembly Planning Method Based on BT and DMP
Meng Liu1, Wenbo Zhu1(B), Lufeng Luo1, Qinghua Lu1, Weichang Yeh2, Yunzhi Zhang1, and Qingwu Shi1
1 School of Mechatronics Engineering and Automation, Foshan University, Foshan 528000, China
[email protected]
2 National Tsing Hua University, Hsinchu, Taiwan, Republic of China
Abstract. To realize skill migration and generalization for robotic arms in industrial production, we extract 7 basic movement primitives from industrial tasks, parameterize each primitive to build a movement-primitive library, and define the connection modes between primitives. A robotic-arm Behavior Tree (BT) is then constructed according to the execution logic of the task. When faced with a new task, Dynamic Movement Primitives (DMP) are used to generalize the movement primitives according to the target pose; when faced with an unknown environment, the action is controlled by selecting a specific BT. Finally, the effectiveness of the framework is verified through experiments. Keywords: Robotic arm · Behavior tree · Movement primitives · Dynamic movement primitives
1 Introduction
Robotic arms are widely used in specific fields such as cell-phone parts assembly [24, 25] and fruit picking [26, 27], which has greatly helped humans improve productivity and efficiency. There is now a desire to extend robots from these specific application areas to non-specific ones, so that they can serve humans in a wider range of applications. This requires robots with strong capabilities for generalizing perceived environments and demonstrated actions, together with more accurate force-position control systems. Industrial robotic arms can already replace humans in most production-line work [12], and their accuracy in some operations has exceeded that of humans [15], yet the most widely applied control method is still traditional programming. In recent years, many scholars have studied programming-free control of robotic arms, aiming to reduce their dependence on programmers [10, 19, 21]. Making industrial robots perceive the external environment and learn skills independently is therefore the way to advance their development [22]. The 3C industry, at the intersection of manufacturing and the information industry, currently has a low level of automation due to its many product types, small part sizes, and
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 400–412, 2023. https://doi.org/10.1007/978-981-99-0617-8_27
tedious assembly process. Moreover, due to rapid product renewal, there is an urgent need for intelligent technology that achieves autonomous production and lets the intelligent systems on the production line operate in a flexible, general-purpose way. Researchers have used kinematics terms such as trajectory segmentation and movement primitives. When human arms complete a task, they often subconsciously connect a series of discrete low-level actions (such as elbow flexion and extension, shoulder rotation, and finger force control) in series in a specific way [23]. On an industrial production line, the complex skills of a robotic arm are likewise composed of combinations of several lower-level skills. Through a hierarchical task-skill-movement-primitive analysis, we enumerate common basic skill primitives, such as push, pull, lift, press, and rotate, and build a basic movement-primitive library [5]. When faced with new environments and tasks, skills can then be quickly recombined to adapt to the new operating conditions. In recent years, service-oriented robots have become a research hotspot. How can robots evolve from a single task setting to a complex environment? This requires not only the ability to fuse high-dimensional information about the world [16] but also the ability to make rapid independent decisions. In this paper, we propose a robotic-arm autonomous learning system based on DMP [20, 29] and BT [30], which discretizes continuous robotic-arm actions into movement primitives. DMP generalizes the movement primitives so that they work accurately in a new environment [3]. We also introduce a BT, which not only models the task as a BT but also lets the robotic arm execute a new task [14]. Based on actual task requirements, this paper studies how a robotic arm can learn skills autonomously. The main elements of this research are as follows: (1) Summarize different industrial scenario categories and build BTs for similar task logic. (2) Segment skill primitives from robot-arm trajectories and construct a relational skill-primitive library. (3) Extract movement primitives from the library according to their execution order; the generalized primitives serve as the leaf nodes of the BT.
2 Related Work
At present, many scholars have contributed to trajectory planning and autonomous learning for robotic arms. One line of work is based on bionics: anthropomorphic arms, where robot motion is planned by analyzing the characteristics of human motion. The other line concerns trajectory segmentation, where a continuous trajectory is decomposed into sub-trajectories to achieve effective control of complex motion.
2.1 Motion Planning of Anthropomorphic Arm
Wei et al. [1] extracted the basic motion primitives of human arm motion and implemented an anthropomorphic-arm motion-planning framework based
on action primitives, comparing it with the actual human arm on the NAO platform; the approach improves accuracy while simplifying complex movements. Fang et al. [2] established a library of motion primitives for anthropomorphic arms and defined the connection grammar between primitives. Averta et al. [7] conducted experiments on 30 tasks from human daily life, fitted the joint trajectories with functions, and extracted the principal components of the human arm joint-trajectory functions through principal component analysis, effectively reducing the set of basic elements in complex motion.
2.2 Trajectory Segmentation and Generalization
Zhang et al. [4] segmented the complete action and then generalized it with DMP, which greatly reduced the difficulty of modeling. Calinon et al. [11] treat the connection of movement primitives as a multi-classification problem, learn the parameters of the DMP from manually segmented demonstrations, and improve the primitive reorganization method with the observed primitive connection order. Jankowski et al. [9] proposed a key-position demonstration mechanism that provides only a series of teaching points and recovers the missing information through linear optimal control, improving the planning efficiency of the robotic arm and the smoothness of the trajectory. Song et al. [28] proposed an unsupervised trajectory segmentation method: by setting a threshold on the parameter change of the trajectory, a representative trajectory is segmented at threshold key points, reducing the number of key points when the trajectory is reproduced. Lioutikov et al. [13] proposed a segmentation algorithm for unlabeled trajectory data, used to segment trajectories and form a primitive library. However, action primitives segmented from robot-arm trajectory data are deterministic and can only be used in structured working environments, and current manipulator planning algorithms are inefficient and adapt poorly to new task scenarios.
3 Robotic Arm Movement Primitives
3.1 Extraction of Movement Primitives
The trajectory of robot motion is usually a series of high-dimensional data; modularizing these data in task-based motion planning is a common dimensionality-reduction approach for robot motion planning. Task execution for a robotic arm involves two levels, kinematics and dynamics; in this paper, most movement primitives are discussed at the kinematic level. At present, most trajectory planning for robotic arms is based on Cartesian space, because Cartesian space gives the state of the end-effector (EE) more intuitively, which benefits human-machine interaction and obstacle-avoidance operations. In the Cartesian space of the end of the robotic arm, there are the following 6 kinds of motion information: the displacement vector T of the geometric center of the EE; the clamping force of the EE, or the elastic force F output by the arm; the opening/closing extent θ of the EE's clamping jaws and the opening/closing speed V; the rotation torque Tor of the EE; and
Fig. 1. Several motion properties of the robotic arm and EE
the end-coordinate reset C caused by the EE replacement operation (Fig. 1). These 6 motion attributes can be combined to form basic movement primitives. From the set of common operations for industrial and service robots, 7 movement primitives are extracted in the end-effector Cartesian space, namely: translation, push/pull/lift/insert, press, home, close/open EE, rotation, and EE replacement. The definition and properties of each movement primitive are shown in Table 1.

Table 1. The properties and functions corresponding to each movement primitive

Movement primitives     Attributes   Encapsulated functions
Translation             T            ee_trans()
Push/pull/lift/insert   T, F         ee_act_on()
Press                   F            ee_press()
Home                    T, Tor       home()
Close/open EE           V, F, θ      ee_clopen()
Rotation                Tor          ee_rot()
EE replacement          C            –
3.2 Connection Pattern of Movement Primitives
The order in which movement primitives are connected and arranged is not arbitrary; for example, closing and opening the EE cannot be performed at the same time, because they are a mutually exclusive pair of actions. After defining the movement primitives,
the next step is to define the connection syntax of the movement primitives according to the task requirements of the industrial application scenario. In general, the robotic arm does not simply execute one discrete movement primitive after another during its work; it selects the execution mode of the actions in the movement-primitive library according to actual needs. The logical execution of movement primitives can be summarized as parallel, sequential, and gradual (asymptotic). For example, the open-jaw and close-jaw primitives cannot be executed in parallel (Fig. 2).
Fig. 2. Different ways to connect movement primitives
If movement primitive i and movement primitive j are asymptotically connected, let m_i(t) and m_j(t) be the motion parameters of the i-th and j-th movement primitives, t_i^s and t_j^s their start times, and t_i^e and t_j^e their end times, respectively. Then:

m_ij(t) = m_i(t),                        t < t_j^s
          λ·m_i(t) + (1 − λ)·m_j(t),     t_j^s ≤ t ≤ t_i^e    (1)
          m_j(t),                        t > t_i^e

λ = 0.5·cos((t − t_j^s)/(t_i^e − t_j^s) · π) + 0.5    (2)

3.3 DMP-Based Generalization of Movement Primitives
DMP is a trajectory-imitation method that converts a complex motion into a nonlinear description [6], producing a simulated trajectory similar to the taught one. Thanks to its strong nonlinearity and high real-time performance, it is widely used for robotic-arm trajectory generation. The leaf nodes (movement primitives) of the BT are generalized with DMP when the environmental parameters of the task change. A DMP can be represented by a spring-damper model as follows:

ÿ = α(β(g − y) − ẏ)    (3)
where g is the target vector and y denotes the current EE center coordinate or joint angle, so that ẏ and ÿ denote the velocity and acceleration, respectively, and α, β are system parameters that keep the system well behaved. Equation (3) can be viewed as a second-order system in which the robotic arm simply moves from (y(0), ẏ(0)) to (g, 0), but it cannot shape the motion in between. To address this, we add a nonlinear forcing term f to Eq. (3). We also control the speed of the trajectory through a temporal scaling term τ on the velocity ẏ:

τ²ÿ = α(β(g − y) − τẏ) + f    (4)
The forcing term f determines the shape of the motion trajectory, so f is suitably written as a normalized linear superposition of multiple nonlinear basis functions:

f(t) = ( Σ_{i}^{N} ψ_i(t)ω_i ) / ( Σ_{i}^{N} ψ_i(t) )    (5)
where ψ_i = exp(−h_i(x − c_i)²) is a radial basis function, N is the number of basis functions, ω_i is the weight coefficient of basis function i, c_i is a constant between 0 and 1, and h_i and c_i are the variance and mean of the Gaussian distribution. Because the Gaussian radial basis functions are nonlinear, the DMP system is also a nonlinear system. We then solve for the weights ω_i from the demonstration samples. Assuming a set of demonstration samples D = (y, ẏ, ÿ), we can compute from Eq. (4) the demonstrated forcing term f_d and finally minimize the distance between f_d and the target forcing term fitted by the Gaussian radial basis functions:

ω_i = argmin Σ_t ( f_d(t) − f_target(x(t)) )²    (6)

where x is a function of time. The final result is

ω_i = (s^T ψ_i f_d) / (s^T ψ_i s)    (7)

where s = (x_{t_0}, ..., x_{t_N})^T · (g − y_0) and ψ_i = diag(ψ_i(t_0), ..., ψ_i(t_N)). DMP has a good ability to generate generalized trajectories: once the model parameters have been obtained, reproducing or generalizing a trajectory only requires setting different goal parameters to obtain similar trajectory curves [8] (Fig. 3; a compact sketch follows Fig. 3).
Fig. 3. Trajectory of DMP generalization
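The sketch below illustrates the DMP fitting and generalization of Eqs. (4)-(7) for a single degree of freedom. It is a simplified reading of the equations, with τ = 1 during fitting and conventional choices for α, β and the exponential canonical phase x(t), none of which are values reported in this paper.

    import numpy as np

    def fit_dmp(y, dt, n_basis=20, alpha=25.0, beta=6.25):
        # Recover the demonstrated forcing term from Eq. (4), then solve Eq. (7).
        g, y0 = y[-1], y[0]
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        f_d = ydd - alpha * (beta * (g - y) - yd)
        x = np.exp(-2.0 * np.linspace(0.0, 1.0, len(y)))   # canonical phase
        c = np.linspace(1.0, x[-1], n_basis)               # basis centers c_i
        h = np.full(n_basis, float(n_basis) ** 1.5)        # basis widths h_i
        s = x * (g - y0)                                   # the s vector of Eq. (7)
        w = np.empty(n_basis)
        for i in range(n_basis):
            psi = np.exp(-h[i] * (x - c[i]) ** 2)
            w[i] = (s * psi * f_d).sum() / ((s * psi * s).sum() + 1e-10)
        return w, c, h

    def rollout_dmp(y0, g, w, c, h, dt=0.005, tau=1.0, alpha=25.0, beta=6.25):
        # Integrate Eq. (4) toward a (possibly new) goal g to generalize.
        y, yd, x, traj = y0, 0.0, 1.0, []
        while x > 1e-3:
            psi = np.exp(-h * (x - c) ** 2)
            f = (psi * w).sum() / (psi.sum() + 1e-10) * x * (g - y0)  # Eq. (5)
            ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau ** 2
            yd += ydd * dt
            y += yd * dt
            x -= 2.0 * x * dt / tau
            traj.append(y)
        return np.array(traj)

Setting a new goal g in rollout_dmp reproduces the behavior of Fig. 3: trajectories of similar shape that converge to different end points.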
4 Assembly Planning Method Based on BT
The BT is a directed tree and a powerful tool for modularizing complex systems. The root node is its entrance; other internal nodes are called control nodes, and leaf nodes are called execution nodes. Each leaf node represents a movement primitive of the robot arm. In robot task control, the parent nodes of the BT include sequential nodes, parallel nodes, and selection nodes. The children of a sequential node are executed from left to right, and the sequential node returns true only if all children return true; the children of a parallel node are executed concurrently, and the parallel node returns false if any child returns false; the children of a selection node are executed from left to right, and the selection node returns true as soon as one child returns true. When the robot arm executes a task, it starts from the root node (a sequential node) and executes each child of the root in turn, as shown in Fig. 4, traversing the leaf nodes from left to right (Fig. 5). A minimal sketch of these node semantics follows Fig. 4.
Fig. 4. BT with only one sequential node
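The node semantics described above can be captured in a few lines of Python. This sketch covers only the sequential and selection nodes (a parallel node would tick its children concurrently, e.g. with threads) and treats each leaf as a callable movement primitive that returns True on success.

    class Action:
        # Leaf node wrapping one movement primitive, e.g. ee_trans().
        def __init__(self, primitive):
            self.primitive = primitive
        def tick(self):
            return self.primitive()

    class Sequence:
        # True only if every child succeeds; stops at the first failure.
        def __init__(self, children):
            self.children = children
        def tick(self):
            return all(child.tick() for child in self.children)

    class Selector:
        # True as soon as one child succeeds, trying children left to right.
        def __init__(self, children):
            self.children = children
        def tick(self):
            return any(child.tick() for child in self.children)

For instance, the pick&place tree of Fig. 7 corresponds to Sequence([Action(move), Action(open_ee), Action(clamp), Action(translate), Action(release)]), where the callables are hypothetical wrappers of the primitives in Table 1.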
Generally, a BT represents one task. If the BT needs to be expanded when a new task scenario is recognized, adding subtrees allows the BT to adapt to the changing scenario during execution. For example, when the target pose differs from the demonstrated target pose, a selection node that identifies the target pose can be added to the BT. In industrial production lines, tasks of the same class share the same execution logic, i.e., the same BT. We build a BT library by establishing the BTs of the execution logic of various tasks (e.g., single-arm pick&place, wipe, dual-arm cooperation, etc.) [17, 18]. When
Fig. 5. BT with sequential and selection nodes
executing on an industrial site, the task category is set manually and the corresponding BT is extracted from the BT library; the selected BT is then pruned according to the actual task requirements, the parameters of the operation target are recognized by the camera, and the primitive trajectories are generalized using DMP (Fig. 6).
Fig. 6. Select the BT according to the task scenario and cull the tree according to the specific needs
5 Experimental Verification
5.1 Model and Task Building
We conducted the experiments on a 6-degree-of-freedom UR5e robotic arm with a 60 cm × 60 cm rectangular working platform; the camera is mounted above the platform. The clamping device is a two-finger gripper with a maximum jaw opening of 86 mm.
In the pick&place task environment, 50 sets of actions were demonstrated to obtain expert trajectory data. The demonstrated trajectories are segmented into the basic movement primitives: moving, opening the jaw, clamping, and releasing. DMP generalizes the motion trajectories in the data, setting target points according to the different target positions recognized by the camera to obtain new trajectories. The execution sequence of this task is “move”, “open EE”, “clamp”, “translation”, and “release”. The basic BT selected from the BT library according to this execution logic is as follows (Fig. 7):
Fig. 7. BT for pick&place tasks
During the experiments, the utility of the method was first tested on a single pick task by removing the place subtree from the BT and changing the position of the grasp target before each pick, to test the generalization effect of DMP on trajectories under BT control. The place node was then added back to the BT to test the method on the entire task (Fig. 8).
Fig. 8. Grasp cylinders and rectangles in different positions and poses
In the pick&place task, the two taught trajectories, pick and place, are reproduced by DMP. We define the experimental error as the mean squared distance between the point P_t actually reached by the robot arm and the object position P:

E = (1/m) Σ_i^m (1/n) Σ_j^n (P_j^t − P_j)²    (8)

The error is allowed within a certain range; it is related to the rotation angle and the jaw opening of the EE. Since the EE can rotate through a full 360°, only the jaw opening needs to be considered: the maximum jaw opening determines the maximum error limit, and the larger the jaw opening, the larger the permissible error.
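A direct reading of Eq. (8) as code, with hypothetical array arguments holding the reached points and the corresponding object positions:

    import numpy as np

    def grasp_error(P_reached, P_object):
        # Eq. (8): mean over trials of the mean squared point-to-point distance.
        P_reached = np.asarray(P_reached, dtype=float)   # shape (m, n, dim)
        P_object = np.asarray(P_object, dtype=float)
        return np.mean(np.sum((P_reached - P_object) ** 2, axis=-1))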
Fig. 9. (a) are the parameters of the position and pose of a single grasping object. (b) are the distribution of the geometric centers of all grasping objects
Under the pick and pick&place tasks, the grasping results obtained by setting different error limits are shown in Fig. 10. The grasping success rate of a single pick is greater than that of pick&place, which indicates that the more complex the task is, the lower the success rate will be.
410
M. Liu et al.
Fig. 10. The effect of error limits on the success rate of pick tasks and pick&place tasks
Then we count the grasping success rate of cylindrical object and rectangular object under pick&place task, and the results are shown in Fig. 11. It is found that the grasping success rate of cylindrical object is greater than that of rectangular object, which is because only the EE needs to move to its geometric center when grasping the former, while the latter needs to move to the suitable position for grasping.
Fig. 11. The effect of error limits on the same task under different objects
6 Conclusion and Prospect
In this paper, the control concept of the BT is introduced to split the continuous motion of the robot arm into discrete movement primitives, giving the motion better modularity and portability, and DMP is introduced to generalize the primitive trajectories, with good results. However, the BT still has to be selected manually for different kinds of tasks. Since the tasks of agents on different industrial production lines differ greatly, making agents perceive tasks autonomously, and thus achieving truly unmanned production and intelligent manufacturing, remains a great challenge. Our next work will focus on the perception and control of industrial assembly robots, so that robots can understand which task and state they are currently in and make autonomous decisions based on the differences between tasks and real-time states.

Acknowledgment. This work was supported by the “New Generation Artificial Intelligence” Key Special Project of the Guangdong Key Area Research and Development Program, “Multiple Degrees of Freedom Intelligent Body Complex Skill Autonomous Learning, Key Components and 3C Manufacturing Demonstration Application” (2021B010410002), the Guangdong Provincial Key Area R&D Program (2020B0404030001), and the National Natural Science Foundation of China Youth Project “Study on Adaptive Problems and Update Mechanisms of Online Learning of Visual Ash Data Stream” (62106048).
References
1. Wei, Y., Zhao, J.: Designing human-like behaviors for anthropomorphic arm in humanoid robot NAO. Robotica 38(7), 1205–1226 (2020). https://doi.org/10.1017/S026357471900136X
2. Fang, C., Ding, X., Zhou, C., et al.: A^2ML: a general human-inspired motion language for anthropomorphic arms based on movement primitives. Robot. Auton. Syst. 11 (2018)
3. Zhou, P., Zhao, X., Tao, B., Ding, H.: Combination of dynamical movement primitives with trajectory segmentation and node mapping for robot machining motion learning. IEEE/ASME Trans. Mechatron. (2022). https://doi.org/10.1109/TMECH.2022.3196036
4. Zhang, Y., Yang, C.: Automatic regrouping of trajectories based on classification and regression tree. Int. J. Modell. Identif. Control 35(3), 217–225 (2021)
5. Lioutikov, R., et al.: Learning movement primitive libraries through probabilistic segmentation. Int. J. Robot. Res. 36(8), 879–894 (2017)
6. Kober, J., et al.: Learning movement primitive attractor goals and sequential skills from kinesthetic demonstrations. Robot. Auton. Syst. 74, 97–107 (2015)
7. Averta, G., et al.: Exploiting upper-limb functional principal components for human-like motion generation of anthropomorphic robots. J. NeuroEng. Rehabil. 17(1), 1–15 (2020)
8. Maeda, G., et al.: Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks. Auton. Robot. 41, 593–612 (2017)
9. Jankowski, J., Racca, M., Calinon, S.: From key positions to optimal basis functions for probabilistic adaptive control. IEEE Robot. Autom. Lett. 7(2), 3242–3249 (2022). https://doi.org/10.1109/LRA.2022.3146614
10. Kim, S., Coninx, A., Doncieux, S.: From exploration to control: learning object manipulation skills through novelty search and local adaptation. North-Holland (2021)
11. Calinon, S., Guenter, F., Billard, A.: On learning, representing, and generalizing a task in a humanoid robot. IEEE Trans. Syst. Man Cybern. Part B 37(2), 286–298 (2007)
12. Pantano, M., Eiband, T., Lee, D.: Capability-based frameworks for industrial robot skills: a survey (2022)
13. Lioutikov, R., et al.: Learning attribute grammars for movement primitive sequencing. Int. J. Robot. Res. 39(8) (2019)
14. Scheide, E., Best, G., Hollinger, G.A.: BT learning for robotic task planning through Monte Carlo DAG search over a formal grammar. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE (2021)
15. Hou, M., et al.: A multi-behavior planning framework for robot guide (2022)
16. Bai, F., et al.: Hierarchical policy for non-prehensile multi-object rearrangement with deep reinforcement learning and Monte Carlo tree search (2021)
17. Gillini, G., et al.: A dual-arm mobile robot system performing assistive tasks operated via P300-based brain computer interface. Industr. Rob. 49(1), 11–20 (2022)
18. Stepanova, K., et al.: Automatic self-contained calibration of an industrial dual-arm robot with cameras using self-contact, planar constraints, and self-observation. Elsevier BV (2022)
19. Beik-Mohammadi, H., et al.: Model mediated teleoperation with a hand-arm exoskeleton in long time delays using reinforcement learning. In: The 29th IEEE International Conference on Robot & Human Interactive Communication. IEEE (2020)
20. Koskinopoulou, M., Maniadakis, M., Trahanias, P.: Kinesthetic guidance utilizing DMP synchronization and assistive virtual fixtures for progressive automation. Robotica 38(10), 1824–1841 (2020)
21. Niekum, S., et al.: Learning and generalization of complex tasks from unstructured demonstrations. In: IEEE/RSJ International Conference on Intelligent Robots & Systems. IEEE (2012)
22. Cao, J., et al.: Generalize robot learning from demonstration to variant scenarios with evolutionary policy gradient. Front. Neurorobot. 14, 21 (2020)
23. Gong, S., et al.: Task motion planning for anthropomorphic arms based on human arm movement primitives. Industr. Rob. Int. J. Robot. Res. Appl. 47(5), 669–681 (2020)
24. Chang, W.C., et al.: Automatic robot assembly with eye-in-hand stereo vision. In: World Congress on Intelligent Control & Automation, pp. 914–919 (2011)
25. Ma, Y., et al.: Automatic precision robot assembly system with microscopic vision and force sensor. Ann. Am. Thoracic Soc. 16(3) (2019)
26. Lin, H.I., Chen, Y.Y., Chen, Y.Y.: Robot vision to recognize both object and rotation for robot pick-and-place operation. In: 2015 International Conference on Advanced Robotics and Intelligent Systems (ARIS). IEEE (2015)
27. Vrochidou, E., et al.: An autonomous grape-harvester robot: integrated system architecture. Electronics 10(9), 1056 (2021)
28. Song, C., et al.: Robot complex motion learning based on unsupervised trajectory segmentation and movement primitives. ISA Trans. 97, 325–335 (2020)
29. Schaal, S., et al.: Control, planning, learning, and imitation with dynamic movement primitives (2003)
30. Grunske, L., Lindsay, P., Yatapanage, N., Winter, K.: An automated failure mode and effect analysis based on high-level design specification with behavior trees. In: Romijn, J., Smith, G., van de Pol, J. (eds.) IFM 2005. LNCS, vol. 3771, pp. 129–149. Springer, Heidelberg (2005). https://doi.org/10.1007/11589976_9
Center-of-Mass-Based Regrasping of Unknown Objects Using Reinforcement Learning and Tactile Sensing
Renpeng Wang1, Yu Xie1(B), Xinya Zhang1, Jiangtao Xiao1, Houde Liu2, and Wei Zhou1
1 School of Aerospace Engineering, Xiamen University, Xiamen 361000, China
[email protected]
2 Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
Abstract. The Center of Mass (CoM) is considered the most ideal position for robot grasping: grasping far from the CoM is likely to cause the object to deviate from the expected pose. To robustly grasp unknown objects in unstructured and changing environments, it is necessary to rapidly predict and correct imminent failed grasps in time and to guide the robot toward a stable grasping pose. This paper is the first to incorporate tactile sensing into reinforcement learning (RL) for the robot CoM-based regrasping problem. The regrasping agent is developed and automatically optimized in a simulated environment without explicit knowledge of the objects, and the tactile information, i.e., slip and tilt, is integrated into the reward function. In 440 regrasping tests on 8 new random objects in the PyBullet simulation and 14 household objects in the real world, the average number of regrasps was 2.04 and 2.35, respectively. In comparative tests against binary search and a heuristic step-size adjustment strategy, our method achieves the highest average regrasping efficiency. Keywords: Robot grasping · Tactile sensing · Reinforcement learning
1 Introduction and Related Work
Stable grasping is an essential prerequisite for robots to transfer or handle objects, and how to achieve stable grasping of unknown objects has attracted considerable interest in recent years. Since the gravitational torques cancel each other out, the object's Center of Mass (CoM) is a relatively balanced, optimal grasping position. The force required to grasp an object at the CoM is therefore smaller than at other positions, which avoids deformation and damage caused by a large grasping force [1]. CoM-based grasping also improves grasp quality while effectively reducing the joint load of the robot, providing greater safety and stability for subsequent tasks [2]. In fact, humans also pay more attention to the position of an object's CoM during grasping [3]. In some analytics-based grasping strategies, computer vision techniques are used to calculate the CoM position of an object of uniform density, generate grasping
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 413–424, 2023. https://doi.org/10.1007/978-981-99-0617-8_28
configurations, and predict grasp quality based on the CoM position [4–6]. In general, however, the intrinsic properties of objects (such as mass, stiffness, and density) are difficult to obtain directly through computer vision. When the CoM of the object is uncertain (e.g., the hammer in Fig. 1(b)) or even changes dynamically, vision-based open-loop grasping is likely to fail. The current mainstream solution is to re-plan a stable grasping pose close to the object's CoM [2, 7, 8].
Fig. 1. CoM-based robot regrasping process. (a) A regrasping case in the simulated environment. (b) A real-world regrasping test on a hammer. (c) Tekscan’s tactile images for four grasping states (translational slip, anti-clockwise slip, clockwise slip, and stable grasp).
Research on the grasping force/load of multi-fingered hands and suction-cup robots has shown that force/torque sensors can be used to predict the CoM position of an object [2, 9], and that a grasp-pose adjustment strategy can replace increasing the grasping force when the CoM of the object changes dynamically [1]. In CoM-based grasping with a parallel-jaw gripper, the orientation of the object changes significantly under an unbalanced force system, and tactile-based slip detection has been proven to predict such unstable grasps [7, 8]. Even when slips in different directions are detected, finding the CoM of an unknown object is still a big challenge for a robot. Feng et al. [7] designed a supervised-learning-based regrasping planner that takes multi-modal signals such as slip and force/torque sensing as input to predict the optimal grasping position near
the CoM; however, a large amount of labeled ground-truth data must be collected. Others have used a heuristic method to calculate the regrasping step size [8], but their method does not take full advantage of the feedback that tactile sensing provides and thus lacks adaptability. Robots need to acquire regrasping skills in a more convenient way and make full use of new and extreme situations to autonomously supplement and improve previous experience. In model-free Reinforcement Learning (RL), the agent autonomously learns to complete a specific task from the rewards and penalties obtained by interacting with the environment, without collecting large numbers of ground-truth labels or any prior knowledge of the object. It has shown extraordinary performance in robot grasping [10–12], dexterous manipulation [13, 14], and regrasping [15]. Combining tactile sensing with RL improves the robustness of robot grasping: Merzić et al. [12] proved that an RL-based control strategy with contact-force feedback provides more robustness for grasping unknown objects than open-loop grasping systems, and Chebotar et al. [15] extracted spatio-temporal tactile features to predict potential failed grasps and trained an RL controller to adjust the grasping pose of a multi-fingered hand [18]. The difference is that their grasp position on the object is almost unchanged, only the grasp pose is fine-tuned, whereas we are more concerned with finding the best grasp position on the object.
Fig. 2. (a) The Tekscan tactile sensor. (b) Real-world experiment setup.
In this paper, we use a Tekscan flexible thin-film matrix pressure sensor (Tekscan, South Boston, MA) [16] (see Fig. 2) to provide tactile sensing for the robot. The Tekscan sensor consists of a polyester film on the surface and pressure-sensitive conductors distributed in a grid inside; a data-collection card restores the contour and magnitude of the pressure distribution as a two-dimensional image. We train the robot's regrasping agent in simulation with the Proximal Policy Optimization (PPO) [17] algorithm, using distilled tactile features as the supervision signal. When Tekscan detects a rotational slip, the regrasping agent steers the robot toward the CoM of the object. Experiments in the real world show that our method helps robots rapidly achieve stable grasps of previously unseen objects (see Fig. 1). A comparison of our work with recent related work is shown in Table 1.
Table 1. A comparison of related work
Work   Sensor type                       Method                  Mass distribution of objects   Total number of test objects
[1]    3D force                          Friction analysis       Uneven                         5
[2]    Vision & force/torque             Force/torque analysis   Even                           20
[4]    Vision                            Heuristic strategy      Even                           5
[5]    Vision                            Inertia analysis        Even                           Clutter
[6]    Vision                            Physics analysis        Uneven                         5
[7]    Vision & tactile & force/torque   Learning-based          Even/Uneven/Dynamic            10
[8]    Vision & tactile                  Heuristic strategy      Even/Dynamic                   24
Ours   Vision & tactile                  Learning-based          Even/Uneven/Dynamic            31
The main contributions of our work are as follows: for the CoM-based regrasping problem, we integrate RL and tactile sensing and develop a simulation training environment, and we show how the trained agent is deployed across 280 real-world regrasping tests. The work provides a new approach for service robots and special robots to grasp unknown objects in unstructured environments.
2 Methodology
The general operation flow of our proposed framework is as follows. First, we use a visual method to determine the initial pose of the two-finger gripper, i.e., an antipodal grasp near the object's geometric center, and then lift. Sensing with its internal pressure-sensitive elements, the sensor feeds back a series of dynamic contact-contour images; we then detect translational slip and adjust the initial grasping force to an appropriate value using an adaptive grasping-force controller [18]. When rotation of the object is detected, the robot performs grasp adjustments until the object reaches a stable state. Figure 3 shows an overview of the proposed system.
2.1 Reinforcement Learning for Regrasping
Fig. 3. Overview of the proposed system.
Before the final stable grasp, the robot takes observations in a trial grasp, i.e., it slightly lifts the object to evaluate whether the pose and force are suitable. When the trial-grasp position is not at the CoM and the friction provided by the normal grasping force is insufficient, the object tilts and slips to different degrees, which hints that the current grasp is not stable. During the lifting process in Fig. 3, we continuously perceive the tilt angle of the hammer through the tactile data. When the tilt angle exceeds the threshold we set (0.1 rad), the robot stops lifting and we compute the rotational angular velocity ω of the hammer and the lifting height h of the end effector at the current time step. In this trial-grasp mode, the state space is

s = {ω, h}
(1)
The action a_t is defined as a scaling factor of 0.4L (L is the object's length measured by vision). The step size of each regrasp is therefore

regrasp_step = a_t · 0.4 · L
(2)
The main task of the agent is to achieve stable grasping, but rewarding only the main task often leads to the problem of sparse rewards. To give the agent more guidance, we add auxiliary reward terms to the reward function:

R_t = (1 − β_1)[α_0·Δω + α_1·Δh + α_2·P − α_3·(C − 1)] + α_4·β_0·R_s − α_5·β_1·R_f
(3)
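A minimal sketch of the reward of Eq. (3). The weight values and the magnitudes of R_s and R_f below are placeholders, since the paper reports only that the α_i were found by grid search.

    def regrasp_reward(d_omega, d_h, P, C, stable, failed,
                       alpha=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
                       Rs=10.0, Rf=10.0):
        # Eq. (3): the shaping terms are masked out on failure (beta_1 = 1).
        b0, b1 = float(stable), float(failed)
        shaping = (alpha[0] * d_omega + alpha[1] * d_h
                   + alpha[2] * P - alpha[3] * (C - 1))
        return (1 - b1) * shaping + alpha[4] * b0 * Rs - alpha[5] * b1 * Rf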
where ω is the difference between the rotational angular velocity at the current moment and the previous moment, and h is the difference between the end effector’s lift height at the previous moment and the current moment. Generally, there is a certain law between ω and h. When approaching the CoM from one side, at the same tilt angles, h required to be lifted by the end effector decreases, while ω increases accordingly (this is true
418
R. Wang et al.
when one end of the object is always on the ground, but the opposite is true when it is very close to the CoM). Therefore, when the lifting height is larger (over 30 mm), the signs of these two differences are opposite. P represents the movement size of the end effector. It is more difficult to find the CoM of the object from a farther position than it is to find it near the CoM, so extra rewards need to be given. To avoid the Reward Hacking problem in which the agent repeatedly gets local rewards while ignoring the initial target, we add a penalty term C that increases with the number of grasp attempts. Rs and Rf denote the reward for successfully achieving a stable grasp and the penalty for failure. We determined the corresponding weight coefficients αi (i ∈ {0, 1, . . . 5}) of each reward item through grid search. β0 and β1 are the flags of event occurrence, β0 , β1 = 1 if the event occurs, otherwise β0 , β1 = 0. The CoM-based regrasping problem can be described as a finite-horizon discounted Markov decision process (MDP). The goal of reinforcement learning is to find an optimal policy given an MDP that maximizes the reward that the agent can obtain. The policy function πθ (a|s) can be described as the probability pθ (at |st ) of the agent taking a certain action a in the current state s: πθ (a|s) = pθ (at = a|st = s),
(4)
where θ represents the parameter of the policy. During the exploration process of the agent, a set Dk = {τi } of state-action trajectories will be generated. According to the parameters θ , the probability pθ (τ ) of occurrence of a certain trajectory τi will be calculated. Using this probability for weighting, and the expected reward Rθ is given by Rθ = R(τ )pθ (τ ) = Eτ ∼pθ (τ ) [R(τ )]. (5) τ
The robot searches for the CoM along the main axis of the object, so action space in this paper is continuous, therefore, here we use Proximal Policy Optimization (PPO) [17] based on off-policy for parameter optimization, the pipeline of PPO is shown in Fig. 3. The objective function of PPO optimization is θ (θ ) ≈ JPPO2 k
pθ (at |st ) pθ (at |st ) At , clip( , 1 − ε, 1 + ε)At ). pθ k (at |st ) pθ k (at |st )
(st ,at )
min(
(6)
This is the formula for the PPO-Clip, where pθ k (at |st ) represents the old policy that interacts with the environment, and ε = 0.2 is a hyperparameter used by PPO-Clip to ensure importance sampling. In addition, we also calculate the Kullback-Leibler (KL) divergence, but it is not included in the objective function. When the current KL divergence is 1.5 times greater than the target value (0.01), the parameter update of the policy is stopped. At is an advantage estimate calculated based on the current state value function Vφ (s):
At = δt + (γ λ)δt+1 + · · · + · · · + (γ λ)T −t+1 δT −1 ,
(7)
where δt = Rt + γVφ (st+1 ) − Vφ (st ).
(8)
Center-of-Mass-Based Regrasping of Unknown Objects
419
γ = 0.99 and λ = 0.97 denote the discount factor and Generalized Advantage Estimation (GAE) [19] parameter. The state value function Vφ (s) evaluates the current state st , and its parameter φ is updated with mean-squared error: T 1 2 (Vφ (st ) − Rt ) , t=0 |Dk |T
(9)
where Rt = Rt + γRt+1 + · · · + · · · + γT −t+1 RT −1 .
(10)
φk+1 = argmin φ
τ ∈Dk
T is the number of steps of the trajectory, and Rt represents a return, which is defined as the total sum of discounted rewards from the current time step onwards. The interaction of the policy function πθ (a|s) and the state value function Vφ (s) forms the Actor-Critic framework. The Actor uses policy function to generate actions, and the Critic uses state value functions to evaluate the Actor’s performance. We represent them using a multilayer perceptron with two hidden layers of 4 and 2 nodes, followed by tanh as the activation function. The state s at the current time step is fed into the Actor and returns a Gaussian distribution over action a. The final output a is sampled from the Gaussian distribution, and calculate the log probability of a to optimize the Actor’s policy function using Eq. (6). 2.2 Simulated Learning Environment In this paper, we deploy a simulation environment for regrasping policy learning in PyBullet [20], a fast and easy-to-use Python module developed based on the Bullet physics engine for physics simulation, robot simulation, etc. We import a 6 DOF robotic arm UR5 (by Universal Robots) in the PyBullet simulation environment. A simplified two-finger gripper is installed at the end of the robotic arm as the end effector (see Fig. 4). We design an enhanced idealized object model for training and testing, the training object model is shown in Fig. 4, the overall length is 320 mm. The red cylinder is only 5 mm in length, its mass is randomly selected from 1 to 1.2 kg, and the mass of the rest is close to 0. By changing the position of the red cylinder, 18 different mass distributions are emulated (9 for testing), and the friction parameters of each model are also different. The steps in a grasping episode are as follows: (1) Randomly load an object in the training set, and reset the robot. (2) The robot randomly selects an initial grasping position (avoiding the CoM position) within the graspable range of the object. (3) Trial grasp with suitable gripping force. The gripping force is required to maintain at least one constant contact point pair with the object (The gripping force is adjusted by the opening size of the gripper). Then the robot lifts the object vertically at a speed of about 0.3 m/s, with a maximum lifting height of 40 mm. (4) Collect the current state, then release the object back to the experimental platform, and the Actor outputs action according to the current state. (5) The robot moves to the new grasping position, if the next grasp position exceeds the graspable range, we make it grasp at the border.
(6) Repeat steps (3)–(5) until a termination condition is reached.
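The episode protocol of steps (1)-(6) can be summarized by the loop below. Every environment method name here (load_random_object, trial_grasp, and so on) is a hypothetical wrapper around the corresponding PyBullet calls, not an API of PyBullet itself.

    def run_episode(env, agent, max_attempts=5):
        env.reset_robot()                                # step (1)
        obj = env.load_random_object()
        pos = env.random_initial_position(obj)           # step (2), off the CoM
        for attempt in range(max_attempts):
            omega, h = env.trial_grasp(pos)              # steps (3)-(4)
            if env.is_stable():
                return True                              # stable grasp (beta_0 = 1)
            action = agent.act((omega, h))
            step = action * 0.4 * obj.length             # Eq. (2)
            pos = env.clip_to_graspable(pos + step)      # step (5)
        return False                                     # failure (beta_1 = 1)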
Fig. 4. The UR5 robot and training objects in the simulation.
To save training time, the gripper's opening size and the graspable range of each object are preset. An episode terminates when any of the following occurs: 1. the maximum number of trial grasps (5; β_1 = 1 in this case) is exceeded; 2. a stable grasp is achieved; 3. an invalid trial grasp occurs because the robot moved to a wrong position. We read the object pose through the interface functions provided by PyBullet to emulate real tactile sensing. In the simulation environment, the criterion for a stable grasp (β_0 = 1) is therefore: when the robot lifts the object to the maximum height, the rotation angle of the object does not exceed the threshold (0.1 rad) and the z coordinate of the object is greater than its initial value; the robot holds the object for 5 s to judge.

2.3 Simulation to Reality
We transfer the control policy learned in simulation to the real environment and replay the skill using real sensor information. In the real grasping scene, when in contact with an object, the Tekscan sensor conveys the local motion information of the object. We detect slip and measure the object's rotation angle by extracting the geometric principal axis of the pressure distribution in the tactile image; the process is shown in Fig. 5, and a simplified moment-based implementation is sketched after its caption. While the robot lifts an object, if the detected offset of the center of the principal axis exceeds 5 pixels, the grasping-force tuning method of [18] adaptively increases the grasping force to eliminate the translational slip. If the rotation angle of the principal axis exceeds the threshold, the current state is computed. The remaining steps mirror the simulation: the trained agent guides the robot's next actions until the movement of the principal axis stays within the threshold and can be maintained at the maximum lifting height for 5 s. The process is shown in Fig. 1(b).
Fig. 5. The principal axis extraction pipeline from tactile images (resolution: 44 × 44).
3 Evaluation and Discussion

We implement PPO with Python 3.6.5 and PyTorch 1.8.2. The agent is trained on an Nvidia GeForce 3060 GPU with 12 GB of graphics memory. Two Adam optimizers with learning rates of 3e-4 and 1e-3 are used to update the parameters of the Actor and Critic networks, respectively. The standard deviation of the Actor's Gaussian distribution is e^(-0.5) during training, and we reduce it to e^(-4) after training. We train for 200 epochs, each consisting of 30 steps. The average reward per episode during training is shown in Fig. 6.
Fig. 6. Training curve tracking the agent’s average reward per episode.
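A minimal PyTorch sketch of the Actor and Critic described in Sect. 2.1 (two hidden layers of 4 and 2 nodes with tanh, a Gaussian action head) with the optimizer settings above. The state and action dimensions are assumptions for illustration; the PPO clipped-objective update loop is omitted.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 1   # assumed dimensions for illustration

class MLP(nn.Module):
    """Two hidden layers of 4 and 2 nodes with tanh, as in Sect. 2.1."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 4), nn.Tanh(),
            nn.Linear(4, 2), nn.Tanh(),
            nn.Linear(2, out_dim))
    def forward(self, x):
        return self.net(x)

actor = MLP(STATE_DIM, ACTION_DIM)          # outputs the Gaussian mean
critic = MLP(STATE_DIM, 1)                  # state value V(s)
log_std = torch.full((ACTION_DIM,), -0.5)   # std = e^-0.5 during training

opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

s = torch.randn(1, STATE_DIM)               # current state
dist = torch.distributions.Normal(actor(s), log_std.exp())
a = dist.sample()                           # sampled action
log_prob = dist.log_prob(a).sum(-1)         # used in the PPO objective, Eq. (6)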
3.1 Simulation Evaluation and Comparison

We first evaluate the performance of our model in the simulation environment. 8 new random objects unseen during training are used as test objects; they differ in size, shape, surface texture and mass distribution. The robot starts each episode from an unstable grasping pose. The initial step size in an episode is still 0.4L, but it is shortened to 0.8 times the previous step after each trial grasp; this prevents the robot from repeatedly jumping between the two sides of the CoM. 20 regrasping tests are performed on each object, and the results are shown in Table 2.
Table 2. Simulation test results
Object ID   Average Regrasps
#020        3.40
#090        1.80
#426        2.50
#576        2.65
#604        1.20
#705        2.35
#797        1.45
#998        1.00
It can be seen from the results that our method achieves successful regrasping within about two attempts for most objects; the average number of regrasps is 2.04. This shows that our method can be applied to unknown objects and that its efficiency is only weakly affected by object shape and mass distribution. In the simulation, we compare our method with the binary search method and the heuristic method in [8]. The objects used for comparison are 9 enhanced idealized models unseen during training; the initial grasp position is again selected near the object's geometric center, and results are averaged over 20 tests. The results are shown in Fig. 7. The average number of regrasps of our method is lower than that of both baselines for all objects. Both baseline methods exploit only the feedback of the object's rotation direction, so they are less adaptable, which increases the number of regrasps for some objects (e.g. #05, #07). In contrast, our method extracts more of the effective information provided by tactile sensing and autonomously learns the underlying relationship between observations and CoM positions, improving the adaptability and robustness of the robot's CoM-based regrasping.
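A small sketch of the regrasp position update used in these tests: the step starts at 0.4L and shrinks by a factor of 0.8 after each trial, while the rotation-direction feedback decides which side to move toward (this is also the only signal the baselines use). The sign convention and the placeholder rotation value are assumptions.

def next_grasp_position(pos, rotation_dir, step, lo, hi):
    """Move the grasp toward the CoM side indicated by the object's
    rotation direction; clamp to the graspable range (see step (5))."""
    pos = pos + step if rotation_dir > 0 else pos - step
    return min(max(pos, lo), hi)

L = 0.32                   # object length (m)
pos, step = 0.5 * L, 0.4 * L
for trial in range(5):     # at most 5 trial grasps per episode
    rotation_dir = +1      # placeholder: sign of the measured rotation
    pos = next_grasp_position(pos, rotation_dir, step, 0.0, L)
    step *= 0.8            # shorten the step to avoid oscillating about the CoM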
Fig. 7. Comparison with baseline methods
3.2 Real-World Evaluation We conduct real-world experiments using a 6-DOF UR5 robot equipped with a CTM two-finger modular gripper (CTM2F110 from ChangingTek, China). A Tekscan tactile
sensor (#5027, pressure-sensitive range 0–345 kPa) with foamed silicone as its base is attached to one finger of the gripper. All experimental equipment is shown in Fig. 2(b). We test the regrasping agent on 14 household objects (with different shapes, stiffness and mass distributions, none of which appear in simulation). The test mode is the same as in the simulation. All results are presented in Table 3. For most objects, our method achieves stable grasping within one to two regrasps. Objects with a dynamic CoM (liquid inside, such as No. (9) Cola and No. (10) 80% Detergent) require slightly more regrasps, generally about 2–4; since the CoM changes dynamically, it must be located very precisely to stop it from drifting. The mass of the object also strongly affects the number of regrasps, e.g. objects (11) to (14), which are more prone to rotate when the grasp position is off the CoM and the friction moment is insufficient. Over a total of 280 grasping experiments, the average number of regrasps is 2.35. The experimental results demonstrate that our method generalizes the learned policy to unknown objects in the real world. Its generalization ability does not depend on large-scale labeled data: at the cost of a few additional regrasps, it uses only the tactile sensing modality to overcome the success-rate limitation of [7].

Table 3. Real-world test results
Object Number      (1)    (2)    (3)    (4)    (5)    (6)    (7)
Average Regrasps   1.30   1.80   1.40   1.95   2.05   2.10   2.30

Object Number      (8)    (9)    (10)   (11)   (12)   (13)   (14)
Average Regrasps   2.65   2.70   2.75   3.50   2.50   2.75   3.20
4 Conclusion and Future Work

In this paper, we use RL to help a robot learn CoM-based regrasping from dynamic tactile sensing, without relying on any prior knowledge of the object (e.g. friction coefficient, stiffness, mass distribution). The RL policy is trained purely in simulation. For optimization problems with efficiency requirements, the agent is sensitive to the choice of state and to reward shaping, so additional constraints and incentives must be added to the reward function. Since the input of our policy consists of distilled physical parameters, it can be easily transferred to the real world. The effectiveness of the method is demonstrated through comparison with baseline methods and through real-world tests on unknown objects. In the future, we will train the robot to grasp objects with complex shapes and uncertain CoM in the simulated environment. We will also consider integrating the adaptive control of grasping force more tightly into our system.
References
1. Kaboli, M., Yao, K., Cheng, G.: Tactile-based manipulation of deformable objects with dynamic center of mass. In: 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids), pp. 752–757. IEEE (2016)
2. Kanoulas, D., Lee, J., Caldwell, D.G., et al.: Center-of-mass-based grasp pose adaptation using 3D range and force/torque sensing. Int. J. Humanoid Rob. 15(04), 1850013 (2018)
3. Desanghere, L., Marotta, J.J.: The influence of object shape and center of mass on grasp and gaze. Front. Psychol. 6, 1537 (2015)
4. Kamon, I., Flash, T., Edelman, S.: Learning to grasp using visual information. In: Proceedings of IEEE International Conference on Robotics and Automation, vol. 3, pp. 2470–2476. IEEE (1996)
5. Lopez-Damian, E., Sidobre, D., Alami, R.: A grasp planner based on inertial properties. In: Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pp. 754–759. IEEE (2005)
6. Dogar, M.R., Hsiao, K., Ciocarlie, M., et al.: Physics-based grasp planning through clutter (2012)
7. Feng, Q., Chen, Z., Deng, J., et al.: Center-of-mass-based robust grasp planning for unknown objects using tactile-visual sensors. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 610–617. IEEE (2020)
8. Kolamuri, R., Si, Z., Zhang, Y., et al.: Improving grasp stability with rotation measurement from tactile sensing. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6809–6816. IEEE (2021)
9. Veres, M., Cabral, I., Moussa, M.: Incorporating object intrinsic features within deep grasp affordance prediction. IEEE Robot. Autom. Lett. 5(4), 6009–6016 (2020)
10. Kalashnikov, D., Irpan, A., Pastor, P., et al.: Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on Robot Learning, pp. 651–673. PMLR (2018)
11. Zeng, A., Song, S., Welker, S., et al.: Learning synergies between pushing and grasping with self-supervised deep reinforcement learning. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4238–4245. IEEE (2018)
12. Merzić, H., Bogdanović, M., Kappler, D., et al.: Leveraging contact forces for learning to grasp. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 3615–3621. IEEE (2019)
13. Akkaya, I., Andrychowicz, M., Chociej, M., et al.: Solving Rubik's cube with a robot hand. arXiv preprint arXiv:1910.07113 (2019)
14. Dong, S., Jha, D.K., Romeres, D., et al.: Tactile-RL for insertion: generalization to objects of unknown geometry. IEEE (2021)
15. Chebotar, Y., Hausman, K., Su, Z., et al.: Self-supervised regrasping using spatio-temporal tactile features and reinforcement learning. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1960–1966. IEEE (2016)
16. I-Scan System. http://www.tekscan.com/products-solutions/systems/i-scan-system. Accessed 21 Aug 2022
17. Schulman, J., Wolski, F., Dhariwal, P., et al.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
18. Dai, J., Xie, Y., Wu, D., et al.: A robotic dynamic tactile sensing system based on electronic skin. In: 2021 IEEE 16th International Conference on Nano/Micro Engineered and Molecular Systems (NEMS), pp. 1655–1659. IEEE (2021)
19. Schulman, J., Moritz, P., Levine, S., et al.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)
20. PyBullet Homepage. http://pybullet.org. Accessed 21 Aug 2022
Alongshore Circumnavigating Control of a Manta Robot Based on Fuzzy Control and an Obstacle Avoidance Strategy Beida Yang1,2,3 , Yong Cao1,2,3(B) , Yu Xie1,2,3 , Yonghui Cao1,2,3 , and Guang Pan1,2,3 1 School of Marine Science and Technology, Northwestern Polytechnical University,
Xi’an 710072, China {yangbeida,xieyu}@mail.nwpu.edu.cn, {cao_yong,caoyonghui, panguang}@nwpu.edu.cn 2 Unmanned Vehicle Innovation Center, Ningbo Institute of NPU, Ningbo 315103, China 3 Key Laboratory of Unmanned Underwater Vehicle Technology of Ministry of Industry and Information Technology, Xi’an 710072, China
Abstract. Aiming at civil tasks such as autonomous swimming displays in aquarium pools and special tasks such as intelligent autonomous patrol in specific water areas, this paper proposes a task-driven control method for a manta robot to circumnavigate along the shore. We equip the robotic fish with a complete information sensing network and realize its driving and yaw control based on a CPG phase-oscillator network. We generate an offline look-up table using a fuzzy control method and realize closed-loop heading control by querying this table. Based on the requirements of the circumnavigation task, we propose a real-time obstacle avoidance strategy that exploits the infrared range sensor information. Finally, we build an experimental pool platform to conduct underwater alongshore circumnavigating experiments with the robotic fish, and the experimental results prove the effectiveness of the overall scheme.

Keywords: Robotic manta · Fuzzy control · Obstacle avoidance strategy · Alongshore circumnavigating control
1 Introduction

In recent years, with the promotion of the ocean power strategy and the development of marine engineering technology, a variety of underwater robots suitable for marine operations have emerged. Traditional unmanned underwater vehicles (UUVs) and autonomous underwater vehicles (AUVs) often use propellers as thrusters, which have a negative impact on the ocean environment due to their high noise and low bio-affinity. With intensive research on bionic principles and design, bionic underwater robots have therefore become a research hotspot in recent years owing to their high efficiency, good maneuverability, large dive depth, low motion resistance, and high bio-affinity [1, 2].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 425–436, 2023. https://doi.org/10.1007/978-981-99-0617-8_29
Bionic robots play an increasingly important role in marine engineering. In specific underwater application environments that are inconvenient for people to enter, such as unknown waters, anti-terrorist explosive disposal and underwater exploration, bionic underwater robots can replace humans in acquiring environmental information, searching for and rescuing personnel, and performing intelligent tracking patrols. Robots usually carry different kinds of sensors to obtain external information, including the size, location and distance of obstacles. They feed the sensed information back to the control and decision system through the communication system and execute corresponding measures to complete the predefined tasks. Li et al. implemented underwater path planning and obstacle avoidance of a robotic fish based on a line-of-sight navigation algorithm and a fuzzy control algorithm [3]. Sang et al. proposed a bionic robotic fish integrating multiple infrared sensors combined with an intelligent obstacle avoidance control method based on rule-based reasoning [4]. Wang et al. designed multiple ultrasonic ranging modules for parallel ranging and introduced fuzzy inference algorithms for snake robots, realizing a novel parallel ultrasonic obstacle avoidance scheme [5]. Zhuang et al. proposed an improved fuzzy obstacle avoidance control algorithm with an elegant degradation strategy, enabling a wall-climbing robot to cross obstacles and reach the target point despite perturbed failures of the detection sensors [6]. In this paper, for the specific task of controlling a robotic fish to circumnavigate alongshore in a pool, a design scheme for a manta robot with integrated multi-sensors is presented. The first section of this paper introduces the design scheme and main technical parameters of the integrated multi-sensor robotic fish; the second section introduces the driving control method of the manta robot based on CPG and fuzzy control; the third section introduces an obstacle avoidance and turning strategy based on infrared distance measurement; the fourth section presents the alongshore circumnavigation experiment in the pool to verify the feasibility of the overall scheme; finally, conclusions and an outlook are given.
2 Design of a Manta Robot with Multi-sensors Integrated

2.1 Hardware Structure and Technical Parameters

Based on the appearance of the real manta ray, we designed this manta-inspired robot, which mainly includes a control decision module (main control chip and peripheral devices), a security module (control chip and peripheral circuit), a power module (lithium battery and wireless switch), a driver module, a communication module (digital transmission radio), and an information sensing module [7]. The three-dimensional structure and the final appearance of the robot are shown in Fig. 1. The drive module consists of two parts, the pectoral fins and the tail fin, which comprise 6 servos and 1 servo respectively, together with the related drive circuits and flexible structural parts. We adopt a compact waterproof high-torque (50 kg) servo; its no-load speed is 0.15 s/60°, rated working voltage 7.4 V, no-load current 1 A, and load current 4.5 A. The control decision module processes the control parameters received from the communication module, generates PWM waves with corresponding pulse widths,
and controls the 7 servos of the drive module to drive the fins, realizing highly bionic posture control of the manta robot.
Fig. 1. Structure (A) and appearance (B) of the manta robot.
The technical parameters of the manta robot are shown in Table 1.

Table 1. Related technical parameters of the manta robot

Items                    Specification
Dimension (L × W × H)    600 mm × 900 mm × 120 mm
Mass                     8.50 kg
Actuator mode            DC servomotors
Micro-controller         STM32F103ZET6
Battery                  7.4 VDC 1500 mAh Ni-H
Max heading speed        1.5 BL (body length)/s
Yaw rate                 60°/s
Min. turning radius      0.5 BL
Working hours            4 h
2.2 Design of Information Sensing Module

The information sensing module of this manta robot mainly includes a pressure sensor, an attitude sensor and infrared distance sensors. The pressure sensor provides depth information, while the attitude sensor provides the three attitude angles (heading, pitch and roll) as well as acceleration and angular velocity along three axes. The main control board processes the received data and adjusts the attitude of the robotic fish according to the solved results. These two sensors are the basis for stable attitude control of the robotic fish. The pressure sensor selected for this manta robot is the Bar02 from Blue Robotics, with IIC communication, a 0–10 m range, 0.16 mm water depth resolution, and ±4 cm maximum error. The attitude sensor is the Ellipse2-N from SBG Systems (France), which communicates over RS232.
To realize autonomous obstacle avoidance and turning while swimming alongshore, the robot needs information about obstacles in the water environment. Four mutually non-interfering infrared distance measurement modules form a distance sensing network that provides the manta robot with the distances of surrounding obstacles in real time, supporting the obstacle avoidance strategy.
Fig. 2. Schematic of the phase-based distance measurement method.
This manta robot uses JRT Meter Technology's M8 laser distance measurement module, which measures 25 × 49 × 13 mm and weighs about 10 g. It has a measurement accuracy better than 1 mm and a maximum measurement distance of 40 m. As shown in Fig. 2, this module adopts the phase-based distance measurement method: a modulated signal modulates the light intensity of the emitted light wave, the round-trip time from the module to the obstacle is measured indirectly by solving the phase difference between the emitted and received waves, and the obstacle distance is then calculated [8]. Assume that L is the distance between the sensor and the obstacle, c is the speed of light, T is the period of the modulated signal, and φ is the phase difference between the emitted and received waves. We can then use Eq. (1) to calculate the distance L between the sensor and the obstacle:

L = (1/2) · c · Δt = φ · c · T / (4π)    (1)
This module is tiny, with sufficient range and high accuracy to meet the needs of this manta robot for its obstacle avoidance strategy and alongshore circumnavigation.
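A minimal sketch of the distance computation in Eq. (1); the modulation frequency and phase value below are illustrative, not the M8's actual parameters.

import math

C = 299_792_458.0          # speed of light (m/s)

def phase_distance(phi, f_mod):
    """Distance from the measured phase difference phi (rad) of a
    modulation signal with frequency f_mod (Hz): L = phi*c*T/(4*pi)."""
    T = 1.0 / f_mod
    return phi * C * T / (4.0 * math.pi)

# Example: a 10 MHz modulation and a phase shift of 0.5 rad.
print(phase_distance(0.5, 10e6))   # ~ 1.19 m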
3 Driving Control of Manta Robot Based on CPG and Fuzzy Control

3.1 CPG Control Network Construction

The CPG, as a kind of neural network, can realize self-excited oscillation through mutual inhibition between neurons, generating stable periodic signals and thus producing rhythmic motion control of limbs or torso-related parts. CPGs are widely used in robotic fish control because of their high applicability to the control of multi-degree-of-freedom joint robots. The phase oscillator model is one kind of CPG oscillator model; it is based on sinusoidal control with a simple form and few state variables, and can
control both the output amplitude and the phase difference between cells individually, with smooth regulation, good stability and robustness, making it suitable for a variety of gait control and conversion tasks. Observation of real manta rays in the oceanarium and analysis of manta swimming videos show that the trajectory of the outer edge of the pectoral fin is a sinusoidal wave with clearly regular frequency, amplitude and phase difference. Therefore, in this paper, we choose the phase oscillator model as the controller of the manta robot. This model is given in Eq. (2) and includes a phase equation, an amplitude equation and an output equation [9]:

φ̇i = 2π·vi + Σj ωij · sin(φj − φi − Δϕij)
r̈i = ai · ((ai/4) · (Ri − ri) − ṙi)                  (2)
θi = ri · (1 + cos(φi))

where φi denotes the phase of cell i; vi denotes the intrinsic frequency; ωij denotes the coupling weight from cell j to cell i; Δϕij denotes the desired phase difference; ri denotes the amplitude; ai denotes a positive constant controlling the speed of amplitude convergence; Ri denotes the desired amplitude; and θi denotes the output value. Based on the arrangement of the 7 drive servos of this manta robot, a 7-unit CPG topology network was designed and constructed, as shown in Fig. 5. This topology adopts a minimal connection form with units 1 and 4 as the dominant units, using the fewest connections that still establish a direct or indirect connection between each independent CPG unit and the others.
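A minimal sketch integrating Eq. (2) with explicit Euler steps; the number of units, the all-to-all coupling topology and the gains below are illustrative assumptions, not the paper's 7-unit network.

import numpy as np

def cpg_step(phi, r, dr, v, R, omega, dphi, a, dt):
    """One Euler step of the phase-oscillator CPG in Eq. (2)."""
    n = len(phi)
    phi_dot = 2 * np.pi * v + np.array([
        sum(omega[i, j] * np.sin(phi[j] - phi[i] - dphi[i, j])
            for j in range(n)) for i in range(n)])
    ddr = a * (a / 4 * (R - r) - dr)        # amplitude dynamics
    phi, dr = phi + phi_dot * dt, dr + ddr * dt
    r = r + dr * dt
    theta = r * (1 + np.cos(phi))           # output values (servo commands)
    return phi, r, dr, theta

n, dt = 4, 0.01                             # 4 units for illustration
phi, r, dr = np.zeros(n), np.zeros(n), np.zeros(n)
v, R, a = np.full(n, 0.5), np.ones(n), 10.0
omega = np.ones((n, n)) - np.eye(n)         # assumed all-to-all coupling
dphi = np.zeros((n, n))                     # in-phase desired differences
for _ in range(1000):
    phi, r, dr, theta = cpg_step(phi, r, dr, v, R, omega, dphi, a, dt)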
3.2 Yaw Control Based on Asymmetric Phase Difference

The yaw attitude of the manta robot can be achieved in two ways: asymmetric amplitude or asymmetric phase difference of the pectoral fins on the two sides. The asymmetric amplitude method adopts unequal desired amplitudes on the two sides, so that they produce unequal propulsive forces and hence a steering torque. Its advantage is a small steering overshoot; its disadvantages are a slow yaw response and a large roll angle of the prototype, so it suits accurate adjustment of small heading differences. The asymmetric phase difference method uses unequal desired phase differences on the two sides to generate propulsive forces of unequal magnitude, in the same or opposite directions, forming a large steering torque for fast yaw. Its advantages are a fast steering response and a small turning radius; its disadvantage is that it easily produces a large overshoot, making it suitable for conditions requiring fast, large steering [10]. In this paper, the asymmetric phase difference method is selected for yaw control according to the circumnavigation and obstacle avoidance requirements.

3.3 Closed-Loop Control of Heading Based on Fuzzy Control

Closed-loop control of the robotic fish, as the basis for implementing circumnavigation along the shore, requires sensors to provide real-time attitude information of the fish itself; quantitative control of the desired attitude is then achieved by establishing a closed loop. In a water environment full of complexity and uncertainty, the hydrodynamic model of a robotic fish is difficult to establish precisely. Fuzzy control, as a rule-based control, directly uses language-based control rules to fuzzify precise variables and then applies the operator's empirical knowledge through fuzzy reasoning to control complex systems. In addition, fuzzy control is robust and adapts well to nonlinear systems, so we use it to achieve closed-loop heading control. The fuzzy query table is calculated with the MATLAB fuzzy control toolbox; the specific steps are: (1) divide the heading deviation and deviation change rate into 7 classes, namely Negative Big (NB), Negative Medium (NM), Negative Small (NS), Zero (Z), Positive Small (PS), Positive Medium (PM), Positive Big (PB); (2) determine the fuzzy membership functions of the input and output variables; the triangular membership function is used in this paper (Fig. 3-A); (3) determine the fuzzy control rules and create the rule table (Fig. 3-B); (4) establish a Simulink simulation model; (5) establish a MATLAB system test model; after 169 iterations of calculation, the fuzzy control query table matrix is finally obtained (Fig. 3-C).
Fig. 3. Charts and tables related to fuzzy control of heading. (A) Triangular membership function for fuzzy control. (B) Rule table for fuzzy control of heading. (C) Query table for fuzzy control of heading.
As shown in Fig. 4, the complete fuzzy control strategy is: (1) the SBG sensor feeds back the actual heading and the communication module receives the desired heading; their difference is input to the heading fuzzy controller as the deviation e, and the derivative of the deviation as ec; (2) the processed e and ec are used by the fuzzy controller for the table look-up operation; (3) the value queried from the offline table is converted to the heading control parameters of the actuators by scaling factors; (4) the heading control parameters are input to the CPG network, which generates the PWM signals for the servos; (5) the above process is repeated, continuously adjusting the control parameters until stable heading control of the robotic fish is achieved.
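A minimal sketch of steps (2)–(3): quantize e and ec onto the 13-level discrete universe (the 169 = 13 × 13 table entries), look up the offline table, and scale the result back to actuator units. The table values and scaling factors below are placeholders; the real table comes from the MATLAB fuzzy toolbox.

import numpy as np

# Placeholder 13x13 offline query table; rows index e, columns index ec,
# both over the discrete universe {-6, ..., 6}.
TABLE = np.zeros((13, 13))

def quantize(x, scale):
    """Map a physical value onto the discrete universe {-6, ..., 6}."""
    return int(np.clip(round(x * scale), -6, 6))

def fuzzy_heading_increment(e, ec, ke=0.1, kec=0.05, ku=1.0):
    """Table look-up: deviation e and its rate ec -> control increment."""
    i = quantize(e, ke) + 6          # shift to array index 0..12
    j = quantize(ec, kec) + 6
    return ku * TABLE[i, j]          # scale back to actuator units

du = fuzzy_heading_increment(e=25.0, ec=-3.0)   # example call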
Fig. 4. Structure of the closed-loop control of heading based on fuzzy control.
4 An Obstacle Avoidance Strategy for Manta Robot

When performing the task of circumnavigating alongshore in a specific water area, the robotic fish keeps a fixed course in an open environment (without obstacles); when an obstacle appears ahead, it must sense the obstacle in real time and avoid it according to the strategy, then continue the circumnavigation task along the new pool shore. As described above, this manta robot achieves good yaw and heading-keeping performance through its information sensing network and CPG network. Together with the environmental information provided by the four infrared range sensor modules, we developed a planar real-time obstacle avoidance and turning strategy. The obstacle avoidance task can be described as follows: the robotic fish circulates along the shore of a specific water area while the infrared distance measurement modules provide, in real time, the distances of environmental obstacles on its left, right and front sides; this information is processed according to the obstacle avoidance strategy logic so that the fish yaws to avoid obstacles and continues circulating along the new pool shore. The obstacle avoidance and steering strategy of the robotic fish is: when there is no obstacle ahead, the fish swims forward on the set heading; when the distance measurement module reports an obstacle ahead, the fish yaws to the right by a certain angle and updates its heading to swim along the new pool shore.
Table 2. The obstacle avoidance rule
Judgment conditions                             Robot condition        Motion description
min(Dfl, Dfr) < Fs, or Dl < Ls, or Dr < Rs      Hazardous condition    Brake and float up
Fs ≤ min(Dfl, Dfr) ≤ Fv                         Steering condition     Turn right and update course
min(Dfl, Dfr) > Fv                              Security condition     Swim forward on determined course
When the obstacle distance ahead is less than a certain dangerous distance, the fish brakes urgently and floats up. Similarly, when the distance measurement module reports an obstacle on the left or right side closer than a certain dangerous distance, the robotic fish also brakes immediately to avoid collision. Assuming the critical distance at which the front side performs effective obstacle avoidance is Fv; the dangerous distances at which the front, left and right sides trigger braking are Fs, Ls and Rs; the two front range measurements are Dfl and Dfr; and the left and right range measurements are Dl and Dr; the obstacle avoidance rule can then be expressed as in Table 2.
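A small sketch of the decision logic in Table 2; the threshold values in the example call are assumptions, not the paper's tuned parameters.

def avoidance_action(d_fl, d_fr, d_l, d_r, Fs, Fv, Ls, Rs):
    """Decision logic of Table 2; all distances in metres."""
    d_front = min(d_fl, d_fr)
    if d_front < Fs or d_l < Ls or d_r < Rs:
        return "brake_and_float_up"          # hazardous condition
    if Fs <= d_front <= Fv:
        return "turn_right_update_course"    # steering condition
    return "swim_forward"                    # security condition

# Example with assumed thresholds:
print(avoidance_action(1.2, 1.0, 2.0, 2.0, Fs=0.3, Fv=1.1, Ls=0.3, Rs=0.3))
# -> "turn_right_update_course"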
Fig. 5. The structure of complete circumnavigating control.
Combining the above obstacle avoidance and turning rules with the closed-loop heading-keeping and yaw actions of the robotic fish, the complete alongshore circumnavigating control can be described as in Fig. 5.
5 Experiments

To verify the effectiveness of the manta robot design and the alongshore circumnavigating control scheme, we built a rectangular pool as a test platform, with three opaque acrylic panels placed side by side as one wall. The pool is 5.5 m long, 3.6 m wide and 1 m deep. First, since closed-loop heading control is the basis of alongshore circumnavigation, we performed a closed-loop navigation experiment, setting a target course of 250° for the robotic fish. From the real-time heading values returned by the communication module to the host computer, we plot the curve in Fig. 7, which shows that the heading of the robotic fish is stable at about 250°. Owing to a certain drift of the SBG module, the heading error shown in Fig. 7 appears slightly larger than the actual situation; the actual heading error is about ±5°, and the navigation performance is good. Figure 6 shows pictures of this experiment.
Fig. 6. The pictures of tracking a specified heading (250◦ ).
Fig. 7. The curve of tracking a specified heading (250◦ ).
Then we conducted the circumnavigation experiment. Following Fig. 5, we started the robotic fish and set its initial heading and other control parameters, then recorded photos of the circling fish and its heading data. As shown in Fig. 8, the robotic fish successfully swam along the shore of the rectangular pool. When it encountered a pool shore ahead, it used the infrared distance sensors to sense the obstacle, turned right according to the control procedure, and updated its heading. Figure 9 also clearly records the changing heading values.
Fig. 8. The pictures of circumnavigating alongshore of a rectangular pool (the panels mark the first to fourth turns and shores of the 5.5 m × 3.6 m pool).
Fig. 9. The curve of heading when circumnavigating alongshore of a rectangular pool.
6 Conclusions

In this paper, we propose a design scheme for a manta robot and a control method for circumnavigating alongshore based on fuzzy control and an obstacle avoidance strategy. With this control method, we realize autonomous circular swimming of the robotic fish in a rectangular pool. Through this swimming mode, civil and special tasks such as autonomous swimming displays in aquarium pools and intelligent automatic patrol in specific waters can be completed. First, we designed a manta robot and built an information sensing system for real-time feedback of the robot's status and obstacle information. To capture the asymmetric flapping characteristics of real manta rays' pectoral fins, we improved the CPG phase-oscillator network to achieve driving and yaw control of the robotic fish. Finally, we proposed an obstacle avoidance and turning strategy based on the infrared distance sensor information and proved the effectiveness of the overall scheme by completing the circumnavigation task in a water pool. However, this paper only addresses circumnavigation in approximately rectangular water areas, and the obstacle avoidance and turning strategy is relatively simple. Completing the alongshore circumnavigation task in irregular water areas with unknown environment information is the focus of future research.
References
1. Wang, T., Yang, X., Liang, J.: A survey on bionic autonomous underwater vehicles propelled by median and/or paired fin mode. Robots 35(03), 352–362+384 (2013)
2. Zhang, Q., Xue, Z.: Research on obstacle avoidance of robofish based on fuzzy control. J. Qinghai Univ. 34(03), 78–83 (2016)
3. Li, Q., Gao, J., Xie, G., et al.: Obstacle avoidance algorithm of bionic robot fish based on fuzzy control. Ordnance Ind. Autom. 30(12), 65–69 (2011)
4. Sang, H.Q., Wang, S., Tan, M., et al.: Autonomous obstacle avoidance of biomimetic robot-fish based on infrared sensor. J. Syst. Simul. 06, 1400–1404 (2005)
5. Wang, J., Niu, F., Zhang, W.: Obstacle avoidance algorithm for snake like robot based on ultrasonic ranging. Electron. Qual. (08), 63–67 (2021)
6. Zhuang, Y., Teng, H., Xu, T., et al.: Obstacle avoidance control of wall-climbing robot based on degraded fuzzy algorithms. Sci. Technol. Eng. 20(19), 7729–7736 (2020)
7. Cao, Y., Bi, S., Cai, Y., Wang, Y.: Applying central pattern generators to control the robofish with oscillating pectoral fins. Ind. Robot Int. J. 42(5), 392–405 (2015)
8. Fu, Y., Cao, Z., Wang, S., et al.: Application of sensors in real-time obstacle avoidance of multi-joint robot system. Robot (01), 73–79 (2003)
9. Cao, Y., Xie, Y., He, Y., Pan, G., Huang, Q., Cao, Y.: Bioinspired central pattern generator and T-S fuzzy neural network-based control of a robotic manta for depth and heading tracking. J. Mar. Sci. Eng. 10, 758 (2022). https://doi.org/10.3390/jmse10060758
10. Hao, Y., Cao, Y., Cao, Y., Huang, Q., Pan, G.: Course control of a manta robot based on amplitude and phase differences. J. Mar. Sci. Eng. 10, 285 (2022). https://doi.org/10.3390/jmse10020285
Robot Calligraphy Based on Footprint Model and Brush Trajectory Extraction Guang Yan1
, Dongmei Guo1,2(B)
, and Huasong Min1(B)
1 Wuhan University of Science and Technology, Wuhan 430081, China
[email protected] 2 Anhui University of Science and Technology, Huainan 232001, China
[email protected]
Abstract. In this paper, a method based on a footprint model and brush trajectory extraction is proposed to enable robots to write calligraphy. The footprint model is important for robot calligraphy, especially for controlling stroke width. A footprint model is first proposed based on a binary linear regression algorithm; it establishes the relationship between the controllable parameters of the robot and the stroke width, making it suitable for robotic writing. Then, a skeleton-based stroke generation method is proposed to simulate the actual writing process: the writing trajectory of a stroke is extracted by a skeleton tracking algorithm, and the normal angle is computed to reconstruct the brush stroke from the footprint model and brush trajectory. A writing optimization method based on calligraphic rules is proposed to make the strokes conform to calligraphic characteristics. Finally, the Non-Uniform Rational B-Spline (NURBS) algorithm is used for robotic writing path planning so that actual writing can be performed. Our approach shows excellent performance in writing typical strokes and Chinese characters.

Keywords: Robot calligraphy · Footprint model · Stroke generation method · Skeleton tracking algorithm · Calligraphic rules
1 Introduction

Robot calligraphy combines calligraphic creation with robotic technology, which not only reproduces artistic charm but also promotes the transmission of calligraphic culture [1]. At the same time, the theory developed in robotic calligraphy research can serve as a reference for robotics applications in other fields. The soft and deformable nature of brush bristles means that calligraphy robots cannot write from trajectory information alone; the brush must be modeled and the mechanism of stroke formation studied [2]. Wong and Ip [3] constructed a 3D footprint model by abstracting the brush tip as an inverted cone and the contact area between brush and paper as an ellipse. However, the brush cannot actually penetrate the paper surface during writing, and modeling individual brush hairs is time-consuming, so real-time performance cannot be guaranteed. Lei Huang
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 F. Sun et al. (Eds.): ICCSIP 2022, CCIS 1787, pp. 437–451, 2023. https://doi.org/10.1007/978-981-99-0617-8_30
[4] proposed a brush model based on haptic feedback, but the model is mainly used for virtual brush devices and is not suitable for robotic calligraphy. Junsong Zhang [5] and Ziqiang Qian [6] used Bezier curves and trigonometric functions, respectively, to simulate a raindrop-shaped model. Joshi [7] designed a circular model in which the footprint is rendered along the midline of the stroke trajectory to generate a virtual stroke; however, ink diffuses when writing with a brush, so the brush footprint is not circular. The above footprint models are limited to the field of computer graphics and cannot be used for robot calligraphy, as the control parameters of the robot are not considered. How to capture a trajectory that represents the characteristics of the strokes and how to control the robot's writing are also research focuses. Xu et al. [8] extracted writing trajectories from video streams of calligraphers' writing; however, the trajectory information captured by a camera is not accurate enough, and writing trajectories differ between calligraphers. Sen Wang et al. [9] formulated calligraphic writing as a trajectory optimization problem: the stroke trajectory was used as an open-loop control trajectory for the robot and was optimized by a pseudo-spectral optimal control method. Hsien-I Lin [10] obtained a trajectory model of the brush experimentally and characterized it with a Bezier curve; the model was used to derive the relationship between the trajectories of the brush and the handle. Gan Lin et al. [11] proposed a calligraphy system combining a Long Short-Term Memory (LSTM) network and a Generative Adversarial Network (GAN): each generation cycle of the GAN contains multiple cycles of the LSTM network, which acquires trajectory sequences from the input stroke images; in each loop, the robot continues writing by following a new trajectory point. However, deep-learning-based methods require large datasets and iterative training to perform well and are difficult to generalize to new or complex characters. Against this background, our key contributions are as follows: (1) A number of footprints are collected to analyze the relationship between the images and the controllable parameters of the robot; a footprint model suitable for robotic writing is then proposed based on a linear regression algorithm. Compared with existing methods, the results demonstrate the advantage of the proposed model in terms of high similarity. (2) A fast stroke generation method is proposed to simulate the real writing process: the writing trajectory is extracted from images, and the normal angle of the footprint model combined with the trajectory is used to reconstruct brush strokes. (3) A writing optimization method based on calligraphic rules is proposed to make the writing process more consistent with calligraphic characteristics. A robotic calligraphy platform is built to validate our method, and the robot shows excellent writing performance.
2 Footprint Model

We focus on reproducing the effect of ink on paper during Chinese calligraphy rather than computing the deformation of the brush hairs. As shown in Fig. 1, the
actual footprint is collected by robotic demonstrations and the parameters of the footprint model are calculated using a linear regression algorithm.
Fig. 1. The footprint written by a real brush and the simulated effect of proposed footprint model.
2.1 Collecting the Footprint of Brush

In the experiment, the initial state is defined as follows: the brush is vertical and perpendicular to the paper, and the tip just touches the paper without pressure. The drop height d and the inclined angle α of the brush are the robot's parameters. The experimental procedure is: the robot is first set to the initial state and adjusted to the specified inclination angle; it then slowly descends to the specified height along the vertical direction and lifts the pen after 1 s. We set d = {11 mm, 13 mm, 15 mm, 17 mm, 20 mm} and α = {0°, 5°, 10°}, giving 15 different combinations of d and α. For each combination, at least 10 footprint images were collected using the robotic writing system and an ordinary camera. Figure 2 shows some samples of the collected footprint images.
Fig. 2. Samples of footprint. (a)-(o) shows the stroke images with the different combinations of the parameters d = {11 mm, 13 mm, 15 mm, 17 mm, 20 mm} and α = {0◦ , 5◦ , 10◦ } respectively.
440
G. Yan et al.
2.2 Proposed Model

Based on the trigonometric function method, the mathematical expression of the footprint model is:

x = a · (1 − sin t) · cos t
y = b · (sin t − 1)                    (1)

where a and b are the shape description parameters of the proposed model. The simulated effect of the footprint model is shown in Fig. 3.
Fig. 3. Simulated result. (a) The contour of the model. (b) Simulation of the model.
where O1 and O2 are the starting and ending points of the contour, and O3 and O4 are the upper and lower vertices, respectively. The parameters L and W denote the length and width of the stroke; L and W are mutually perpendicular. From the extreme values of Eq. (1), the linear proportionality between {a, b} and {L, W} can be described as Eq. (2):

L = 2b
W = (3√3/2) · a                        (2)
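A minimal sketch that samples the footprint contour of Eq. (1), recovering a and b from a measured length and width via Eq. (2); the example values come from the first row of Table 1.

import numpy as np

def footprint_contour(L, W, num=200):
    """Sample the footprint contour of Eq. (1), with a, b recovered
    from Eq. (2): L = 2b and W = (3*sqrt(3)/2) * a."""
    a = 2.0 * W / (3.0 * np.sqrt(3.0))
    b = L / 2.0
    t = np.linspace(0.0, 2.0 * np.pi, num)
    x = a * (1.0 - np.sin(t)) * np.cos(t)
    y = b * (np.sin(t) - 1.0)
    return x, y

x, y = footprint_contour(L=3.25, W=2.14)   # the d = 11 mm, alpha = 0 case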
2.3 Parameters Identification

Some of the experimental data are recorded in Table 1.

Table 1. Experimental dataset of footprint collection.

Drop height d (mm)   Inclined angle α (°)   Length L (cm)   Width W (cm)
11                   0                      3.25            2.14
11                   5                      3.40            2.20
11                   10                     3.60            2.30
17                   0                      4.57            2.80
17                   5                      4.82            2.88
17                   10                     4.98            3.00
20                   0                      5.23            3.12
20                   5                      5.46            3.22
20                   10                     5.62            3.28
Analyzing the data above, we conclude that the size of the footprint is positively correlated with the drop height d and the inclined angle α of the brush. To study how the drop height and inclined angle affect the shape of the footprint, a binary linear regression model is established:

L = β1 · d + β2 · α + ε1
W = γ1 · d + γ2 · α + ε2               (3)

where β1, β2, γ1, γ2 are regression coefficients and ε1, ε2 are constant terms. The parameters of the footprint model are computed by the least squares method, as shown in Eq. (4):

L = 0.236 · d + 0.045 · α + 0.535
W = 0.115 · d + 0.019 · α + 0.827      (4)

where L is the length of the footprint, W is the width of the footprint, d is the drop height of the brush, and α is the inclined angle of the brush; the angles in the formula are expressed in radians.
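A minimal least-squares fit of Eq. (3) on the subset of data shown in Table 1. Since this is only part of the full dataset and the fit here uses α in degrees, the coefficients will differ somewhat from those reported in Eq. (4).

import numpy as np

# Subset of Table 1: columns are d (mm), alpha (deg), L (cm), W (cm).
data = np.array([
    [11, 0, 3.25, 2.14], [11, 5, 3.40, 2.20], [11, 10, 3.60, 2.30],
    [17, 0, 4.57, 2.80], [17, 5, 4.82, 2.88], [17, 10, 4.98, 3.00],
    [20, 0, 5.23, 3.12], [20, 5, 5.46, 3.22], [20, 10, 5.62, 3.28]])

# Design matrix [d, alpha, 1] for the intercept term.
A = np.column_stack([data[:, 0], data[:, 1], np.ones(len(data))])
coef_L, *_ = np.linalg.lstsq(A, data[:, 2], rcond=None)   # beta1, beta2, eps1
coef_W, *_ = np.linalg.lstsq(A, data[:, 3], rcond=None)   # gamma1, gamma2, eps2
print(coef_L, coef_W)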
3 Stroke Generation Method

A skeleton-based stroke generation method is proposed to simulate the writing process. First, the writing trajectory is captured from the reference stroke image by a trajectory tracking algorithm, and the normal angle of the trajectory, which represents the actual writing direction, is calculated. Then, the width feature at each trajectory point is extracted, and the writing process is simulated by continuously superimposing footprints along the brush trajectory using the proposed footprint model. Finally, for the different calligraphy stages, a writing optimization method is introduced to fine-tune the trajectory information based on calligraphy rules and inclination information. Figure 4 shows the stroke generation process of our method.
Fig. 4. Stroke generation process.
3.1 Extraction of Writing Trajectory In this section, the skeleton tracking algorithm is used to extract the stroke writing trajectory. The entire procedure of the skeleton tracking algorithm is illustrated with detailed pseudocode in Algorithm 1 and Algorithm 2.
Algorithm 1 Extract stroke writing trajectory.
Input: A binary stroke image Im.
begin
  SP = []                       // Initialize SP to stack trajectory point coordinates
  δ = 10                        // δ is a pre-determined image size threshold
  Im0 = Skeletonize(Im)         // via Zhang-Suen algorithm
  (Im1, Im2) = Segmentate(Im0)
  repeat
    M = FindNotEmpty(Im1, Im2)
    for each M' in M            // M' is a non-empty matrix
      if width(M') < δ and height(M') < δ
        SP = [SP; ExtractSkeletonPoints(M')]   // (see Algorithm 2)
      else
        (Im1, Im2) = Segmentate(M')
      end if
    end for
  until M' cannot be segmented  // M' is a zero matrix
end
Output: The set of the stroke coordinates SP.
Algorithm 2 Extract skeleton points.
Input: Binary submatrix M'.
begin
  flag = false                  // Initialize a flag to false
  for x in border(M')           // Traverse each boundary pixel in M'
    if x = 1 and flag = false
      S = [S; coordinate(x)]
      flag = true
    end if
    if x = 0 and flag = true
      P = S(end)
      S(end) = []
      S = [S; midpoint(P, coordinate(x))]      // Save midpoint
      flag = false
    end if
  end for
end
Output: The coordinates of subimage S.
When a binary image is input, it is first converted into a 2D matrix of foreground pixels (1) and background pixels (0). The binary matrix is then segmented by rows or columns. The boundary pixels of each non-empty submatrix smaller than the threshold are traversed, and the pixels that satisfy the condition are stacked as in Algorithm 2. Finally, the binary image is output in real time as a set of vector polylines, i.e., a collection of 2D coordinates along the image skeleton. Figure 5 shows a comparison of the trajectory extraction results of five algorithms. The trajectories extracted by the skeleton tracking algorithm reflect the writing characteristics of the reference strokes better, and the trajectory is smoother than with the other algorithms.
Fig. 5. Comparison of stroke trajectory results. (a) Real stroke image. (b) Results of medial-axis algorithm. (c) Results of thin algorithm. (d) Results of skeletonize3d algorithm. (d) Results of Zhang Suen algorithm. (f) Results of skeleton tracking algorithm.
3.2 Stroke Generation

Compute the Normal Angle. The writing trajectory can be represented by a discrete path composed of skeletal feature points. The actual stroke is a vector along the direction of the trajectory, so the normal angle is proposed to represent the actual writing direction. As in Eq. (5), the feature points are saved into the array P:

P = [P1, P2, …, Pn]    (5)
where Pi = (Pix, Piy) denotes the 2D coordinates of each trajectory feature point, and the vector from Pi to Pi+1 denotes the writing direction at the feature point. The slope of the direction vector between adjacent feature points is calculated by Eq. (6), and the normal angle θ is then obtained from the slope as in Eq. (7):

gradienti = (P(i+1)y − Piy) / (P(i+1)x − Pix),  i = 1, 2, …, n − 1    (6)

θi = arctan(gradienti)    (7)
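A small sketch of Eqs. (6)–(7); using arctan2 rather than arctan is our own choice, made so that vertical segments (where Eq. (6) would divide by zero) are handled. The sample points are illustrative.

import numpy as np

def normal_angles(P):
    """Direction angle at each trajectory point, per Eqs. (6)-(7)."""
    d = np.diff(P, axis=0)                # P_{i+1} - P_i
    return np.arctan2(d[:, 1], d[:, 0])   # theta_i, i = 1..n-1

P = np.array([[0, 0], [1, 1], [2, 1], [2, 3]], float)
theta = normal_angles(P)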
Figure 6 shows the start of a stroke. After adding the normal angle, the starting phase of the stroke better matches the characteristics of the real one.
Fig. 6. (a) Schematic of trajectory feature point. (b) Normal angle at the start of a stroke.
Extract the Width of Stroke. Traverse the trajectory points to find the largest inscribed circle [12] corresponding to each trajectory point; the stroke width is taken as the diameter of this largest inscribed circle. The procedure for stroke-width extraction is as follows:

Step 1: Take the coordinates of the current trajectory point as the circle center, with R = 0.
Step 2: Draw the feature circle with the same center, then set R = R + 1.
Step 3: Check whether any point with pixel value 255 lies within the circle.
Step 4: If such a point exists, set R = R − 1 and W = 2R; otherwise, return to Step 2 and continue the loop.

As the stroke image is binarized, only the grayscale value of each pixel within the feature circle is checked. When a white point first appears inside the circle, the maximum inscribed circle of the trajectory point is considered found. Repeating the above steps yields the stroke width at each trajectory point.

Superposition of Footprints. As shown in Fig. 7, strokes are generated by superposition of the footprint model. For each stroke, the position coordinates, normal angle and width are already known, so a stroke can be generated by superimposing a sequence of brush footprints. After comparing various fitting curves, we choose a closed B-spline curve to connect the contour points into a simulated stroke image.
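A vectorized equivalent of the width-extraction procedure above: the largest radius without a background pixel equals (just under) the distance to the nearest background pixel. It assumes, as in the text, that ink pixels are 0 and background pixels are 255.

import numpy as np

def stroke_width(binary, px, py):
    """Diameter of the largest circle centred at (px, py) containing
    no background pixel (value 255), following Steps 1-4 above."""
    ys, xs = np.nonzero(binary == 255)            # background pixels
    if len(xs) == 0:
        h, w = binary.shape
        return 2 * max(h, w)                      # no background at all
    d2 = (xs - px) ** 2 + (ys - py) ** 2
    R = int(np.ceil(np.sqrt(d2.min()))) - 1       # largest safe integer radius
    return 2 * max(R, 0)

# Example: a horizontal ink band of thickness 9 in a white image.
img = np.full((50, 50), 255, dtype=np.uint8)
img[20:29, 5:45] = 0
print(stroke_width(img, px=25, py=24))            # ~ stroke thickness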
Fig. 7. Simulated results of the stroke horizontal. (a) Real stroke. (b) Stroke superimposition schematic. (c) Simulated stroke.
3.3 Writing Optimization Based on Calligraphic Rules

Chinese calligraphy has distinctive writing characteristics and brush-movement techniques, which are called calligraphic rules. If the calligraphic rules are not followed, the
imitated Chinese characters will be distorted in their details. To facilitate the subsequent discussion, the following six calligraphic terms are first defined:

(1) Qibi: The brush is moved straight down with the tip just touching the paper.
(2) Xingbi: The brush is moved at a certain height and inclination.
(3) Shoubi: The brush is moved vertically upwards with the tip leaving the paper.
(4) Anbi [10]: The brush is pressed downwards and then moved horizontally.
(5) Huifeng: The tip of the brush is hidden at the end of writing.
(6) Fangfeng: The tip of the brush is shown at the end of writing.
The calligraphy process is divided into three stages: Qibi, Xingbi and Shoubi. At the Qibi stage, the brush is slightly turned to form a fine point, and a point-like stroke is then formed by pressing the brush; during this process, the inclination angle α is set to 0° and the drop height d of the brush gradually increases. At the Xingbi stage, the brush is inclined to intensify the ink effect, so the inclined angle α is usually set to 5°; as the writing speed is fast, the drop height d remains unchanged. Shoubi is the end of a stroke, for which calligraphy offers two techniques: "Huifeng" and "Fangfeng". As shown in Fig. 8, the horizontal stroke simulates the "Huifeng" process, in which the direction and position of the brush change continuously, the drop height d decreases gradually, and the inclination angle α gradually decreases to 0°. The vertical stroke simulates the "Fangfeng" process; to make the stroke smoother, the number of trajectory points is reduced and the brush is lifted quickly, so the drop height d decreases rapidly and the inclination angle α is set to 0°.
Fig. 8. Simulation of horizontal and vertical strokes. (a) shows “Huifeng” process. (b) shows “Fangfeng” process.
4 Robot Calligraphy

4.1 Writing Path Planning

Robot writing path planning requires real-time execution of complex curved trajectories based on a small number of critical path points [13]. The NURBS algorithm is used for robot writing path planning, and it has the following advantages: (1) The NURBS
curve is a piecewise continuous curve with smooth connection properties. It allows local adjustment without affecting the overall trajectory, a property that facilitates our subsequent fine-tuning of the trajectory according to the calligraphy rules. (2) NURBS curves do not need a high order to describe complex curves; lower-order curves fit the motion path well with less computation. As shown in Fig. 9, the NURBS curve converts the skeleton feature points into the robot writing trajectory, and the fitted result closely matches the stroke trajectory.
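SciPy has no dedicated NURBS type, but a non-rational cubic B-spline (a NURBS curve with all weights equal to 1) is a close stand-in for a sketch. The skeleton points below are assumed sample data.

import numpy as np
from scipy import interpolate

# Skeleton feature points of a stroke (assumed sample data, image coords).
pts = np.array([[10, 50], [40, 48], [80, 47], [120, 49], [150, 52]], float)

# Fit a cubic B-spline through the points (s controls the smoothing).
tck, u = interpolate.splprep([pts[:, 0], pts[:, 1]], k=3, s=0.5)

# Resample densely to obtain a smooth writing path for the robot.
u_fine = np.linspace(0, 1, 100)
x_path, y_path = interpolate.splev(u_fine, tck)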
Fig. 9. NURBS fitting trajectory of “horizontal” stroke.
4.2 Experiment and Discussion

As shown in Fig. 10, a high-precision robot writing platform is built, comprising a 6-DOF robot arm and a computer. A wolf-hair brush about 48 mm long with a radius of about 6 mm is used for writing. The computer controls the robot to write calligraphy.
Fig. 10. Hardware architecture for experiments. (a) The robotic calligraphy system. (b) Calligraphy brush.
Comparison of the Footprint Models. The proposed footprint model is compared with some other typical models. In [7], Joshi used a circle model to fit the stroke.
Wong and Ip [3] designed an ellipse to generate the stroke graphic. Figure 11 shows the results of these models and the corresponding real images collected in our proposed dataset. Clearly, compared with the other models, our model shows the best performance in simulating the brush footprint.
Fig. 11. Comparison of proposed model and other models. (a) Actual brush footprint images. (b) shows the binary images of the real footprint images. (c) shows the results of circle model [7]. (d) shows the results of ellipse model [3]. (e) shows the results of proposed model.
Writing of Typical Strokes. Based on the footprint model, five typical strokes are tested in writing. Figure 12 shows the referenced strokes and the actual written results for calligraphic stroke “dian”, “heng”, “shu”, “pie”, and “na”.
Fig. 12. Five typical strokes written by the robotic arm. (a) Referenced strokes. (b) Writing effect before optimization. (c) Writing effect after optimization.
Writing of Chinese Characters. Chinese characters can be considered as an arrangement of different strokes. The strokes are arranged according to the calligraphic rule of “left to right, top to bottom”. As is shown in Fig. 13, the Chinese character “不” in Yan style and the Chinese character “去” in Kai style were imitated. The writing effects are of high quality, which proves the validity of our method.
Fig. 13. Different calligraphy styles of Chinese characters written by the robotic arm. (a) Referenced Characters. (b) Writing effect before optimization. (c) Writing effect after optimization.
4.3 Evaluation Metrics

As all the images used in this paper are binarized, cosine similarity (CSIM) [14] can be applied to evaluate both the footprint models and the robot's writing. The cosine similarity is defined as the cosine distance between pair-wise image vectors:

CSIM = (X · Y) / (‖X‖ · ‖Y‖)    (8)

where X = (Xi1,j1, Xi2,j2, …, Xin,jn) is the vector of the real image and Y = (Yi1,j1, Yi2,j2, …, Yin,jn) is the vector of the robot-written image. CSIM is used to evaluate each footprint model on the dataset; the best case, the worst case, and the average over all images are recorded. As shown in Table 2, the proposed model exhibits the highest similarity to the real brush footprint images, with a maximum similarity of 90.25% and an average of more than 88%.

Table 2. The performances (%) of other models and proposed model.

Model         Max CSIM   Min CSIM   Avg CSIM
Circle [7]    71.40      64.36      67.03
Ellipse [3]   86.43      81.62      84.15
Ours          90.25      85.56      88.64
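A minimal sketch of the CSIM metric of Eq. (8); the two random test images are placeholders.

import numpy as np

def csim(img_x, img_y):
    """Cosine similarity between two binarized images of equal size."""
    x = img_x.astype(float).ravel()
    y = img_y.astype(float).ravel()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

a = (np.random.rand(64, 64) > 0.5).astype(np.uint8)
b = (np.random.rand(64, 64) > 0.5).astype(np.uint8)
print(csim(a, b))   # value in [0, 1] for non-negative images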
Table 3. Evaluation results (%) of typical strokes.

                      "dian"   "heng"   "shu"   "pie"   "na"
Before optimization   72.32    84.36    83.37   78.68   80.13
After optimization    76.56    88.80    90.65   88.33   84.76

Table 4. Evaluation results (%) of Chinese characters.

                      Yan style character "不"   Kai style character "去"
Before optimization   80.13                      75.60
After optimization    85.32                      83.17
The evaluation results for the calligraphy written by the robot arm are shown in Table 3 and Table 4. For the different styles of strokes and Chinese characters, the robot performs well. Without optimization based on the calligraphy rules, the writing is distorted in its details; after optimization, the overall shape of the Chinese characters is noticeably more similar, and the starting and closing stages of the strokes are improved.
5 Conclusion

In this paper, a method based on a footprint model and brush trajectory extraction is proposed to enable robots to write calligraphy. A footprint model for robotic calligraphy is established by statistically analyzing brush footprints. A skeleton-based stroke generation method is proposed to simulate the actual writing process, and a writing optimization method based on calligraphic rules makes the strokes conform to calligraphic characteristics. The experiments show that the robot performs very well in writing calligraphy, which demonstrates the superiority of our method. There is still room for improvement: the brush tip twists and splays during writing, causing differences between the written result and the reference calligraphy. In the future, we will expand the stroke dataset and incorporate the footprint model into deep neural networks to teach and train robots to write Chinese calligraphy well.
Acknowledgments. This work is supported by the National Natural Science Foundation of China (grant No. 62073249) and the Major Project of Hubei Province Technology Innovation (grant No. 2019AAA071).
References
1. Guo, D., Min, H.: Survey of calligraphy robots. Control Decis. 37(7), 1665–1674 (2022). https://doi.org/10.13195/j.kzyjc.2021.0132
2. Hou, Z., Yang, G.: Research and prospect of virtual brush modeling. Appl. Res. Comput. 32(9), 2572–2577 (2015). https://doi.org/10.3969/j.issn.1001-3695.2015.09.003
3. Wong, H.T.F., Ip, H.H.S.: Virtual brush: a model-based synthesis of Chinese calligraphy. Comput. Graph. 24(1), 99–113 (2000). https://doi.org/10.1016/S0097-8493(99)00141-7
4. Huang, L., Hou, Z.: A novel virtual 3D brush model based on variable stiffness and haptic feedback. Math. Probl. Eng. (2020). https://doi.org/10.1155/2020/6942947
5. Zhang, J., Zhang, Y., Zhou, C.: Simulating the writing process from Chinese calligraphy image. J. Comput. Aided Des. Comput. Graph. 26(6), 963–972 (2014). https://doi.org/10.3969/j.issn.1003-9775.2014.06.014
6. Qian, Z.: Calligraphy parametric identification based on image processing. Ind. Control Comput. 29(9), 124–125 (2016). https://doi.org/10.3969/j.issn.1001-182X.2016.09.056
7. Joshi, A.: Efficient rendering of linear brush strokes. J. Comput. Graph. Tech. 7, 1–15 (2018). https://jcgt.org/published/0007/01/01/paper.pdf
8. Xu, P., Wang, L., et al.: Evaluating brush movements for Chinese calligraphy: a computer vision based approach. In: 27th International Joint Conference on Artificial Intelligence, pp. 1050–1056. AAAI Press (2018). https://doi.org/10.24963/ijcai.2018/146
9. Wang, S., Chen, J., Deng, X., et al.: Robot calligraphy using pseudospectral optimal control in conjunction with a novel dynamic brush model. In: 2020 IEEE International Conference on Intelligent Robots and Systems (IROS), pp. 6696–6703. IEEE (2020). https://doi.org/10.1109/IROS45743.2020.9341787
10. Lin, H.-I., Chen, X., Lin, T.-T.: Calligraphy brush trajectory control by a robotic arm. Appl. Sci. 10(23), 8694 (2020). https://doi.org/10.3390/app10238694
11. Gan, L., Fang, W., Chao, F., et al.: Towards a robotic Chinese calligraphy writing framework. In: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 493–498. IEEE (2018). https://doi.org/10.1109/ROBIO.2018.8665143
12. Xin, S., Liang, D., et al.: A study of robotic calligraphy copying based on style transfer technology. Machinery 56(7), 42–47 (2018). https://doi.org/10.3969/j.issn.1000-4998.2018.07.013
13. Li, J., Min, H., Zhou, H., Xu, H.: Robot brush-writing system of Chinese calligraphy characters. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds.) ICIRA 2019. LNCS (LNAI), vol. 11745, pp. 86–96. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-27529-7_8
14. Wu, R., Fang, W., Chao, F., et al.: Towards deep reinforcement learning based Chinese calligraphy robot. In: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 507–512. IEEE (2018). https://doi.org/10.1109/ROBIO.2018.8664813
Hierarchical Knowledge Representation of Complex Tasks Based on Dynamic Motion Primitives Shengyi Miao1 , Daming Zhong1 , Runqing Miao2 , Fuchun Sun1,3 , Zhenkun Wen4 , Haiming Huang1 , Xiaodong Zhang5 , and Na Wang1(B) 1 College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060,
China [email protected], [email protected] 2 College of Modern Post, Beijing University of Posts and Telecommunications, Beijing 100083, China [email protected] 3 Department of Computer Science and Technology, Tsinghua University, Beijing 100083, People’s Republic of China 4 College of Computer Science and Software, Shenzhen University, Shenzhen 518060, China 5 Beijing Key Laboratory of Intelligent Space Robotic Systems Technology and Application, Beijing 100083, China
Abstract. Current service robots without learning ability are not qualified for many complex tasks. Therefore, it is significant to decompose a complex task into repeatable execution units. In this paper, we propose a complex task representation method based on dynamic motion primitives and use a hierarchical knowledge graph to represent the analytic results of complex tasks. To realize the execution of complex robot manipulation tasks, we decompose semantic tasks into the minimum motion units that the robot can execute and combine the multi-modal information obtained by the sensors: posture, force, and robot joint parameters. We use the knowledge graph to record the end-effector required by the robot to perform different tasks and select an appropriate end-effector according to different needs. Finally, taking a long-sequence complex task in a service scene as an example, we use a UR5 robot to verify the effectiveness and feasibility of this design. Keywords: Robot manipulation · Knowledge expression · Knowledge graph · Motion primitives
This work was supported by the Major Project of the New Generation of Artificial Intelligence (No. 2018AAA0102900), the National Natural Science Foundation of China (No. 62173233), the Shenzhen Science and Technology Innovation Commission project (JCYJ20210324094401005, JCYJ20220531102809022), and the Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515011582).
1 Introduction With the development of artificial intelligence technology, robots are increasingly entering people's daily life. From medical robots and family service robots to restaurant chefs, they are everywhere. In the real world, the tasks that robots need to perform are usually complex. Previously, complex tasks were completed by coding on a teaching pendant or an upper computer. Robot manipulation tasks implemented this way can only be applied to specific scenarios, cannot be reused, require considerable human and material resources, and are inefficient. In order to deal with the challenges related to manipulation in real scenarios, the robot must summarize knowledge efficiently and improve the learning efficiency of skills through appropriate knowledge expression methods. For this reason, reference [1] discussed in detail the specific methods of task planning, knowledge representation, and retrieval for service robots. However, the knowledge representation method described in this scheme remains at a broad semantic level, which brings difficulties to the concrete implementation on robots. In order to solve the problem of knowledge representation of long-sequence tasks, reference [2] uses a manipulation tree to store the execution sequence and completes long-sequence tasks by searching the execution sequence in the knowledge base. However, this method cannot reflect the relationship between the manipulation action and the specific manipulation object, and so cannot support complex knowledge reasoning and learning by the robot. In terms of describing the unit actions that the robot can perform, reference [3] composes the whole action sequence by cutting video frames into unit actions. The task sequences described by this method can represent complex tasks, but the split unit actions are still complex and not suitable for direct execution by the robot. The matching of manipulation type and manipulation object is also a key part of knowledge representation. Reference [4] determines the relationship between action and detected object based on visual detection information, so as to better express the manipulation knowledge of the robot. Similarly, reference [5] finds the relationship between actions, objects, and manipulation subjects through the contextual semantic relationship among operators, tools, and manipulation objects. These works describe the main factors that affect robot task planning from many aspects. It can be concluded that dividing a complex task into sub-tasks, describing the task in the form of an action sequence or manipulation tree, and clearly describing the relationship between the manipulation action, the operator, and the manipulation object are conducive to the execution of manipulation tasks. In order to find a general method for the robot to perform complex tasks, the knowledge graph is used to record the relationship between motion primitives and manipulation objects. According to the prior knowledge stored in the knowledge graph, the relationship between the execution end and the corresponding objects is found. Finally, operating parameters are introduced so that the robot can directly execute the corresponding action sequence. This work proposes a hierarchical knowledge representation method of complex tasks based on motion primitives for complex task scenarios. The main contributions are as follows:
(1) This work defines the minimum action units that a robot can execute and constructs the transformation between robot action units and human semantic actions, so as to meet execution needs and facilitate human semantic analysis. (2) This work constructs a manipulation knowledge graph of complex tasks. The manipulation knowledge is described at four levels: task level, object level, agent level, and motion primitive level. Complex tasks are planned according to the preconditions and effects of manipulation. (3) To verify the execution effect of the proposed knowledge representation method, we conducted an experiment with a long-sequence catering task in a service scenario.
2 Related Work In real production and life, robots usually need to perform complex tasks that include a series of complex actions. Therefore, the knowledge representation of complex actions is a key factor that affects the efficiency, stability, and cost of complex assembly systems. The behavior tree is one of the common methods for robot task planning. It is characterized by a modular design that can be repeatedly executed, and the execution code can be reused to greatly improve the execution efficiency of complex tasks [6, 7]. However, the task execution of robots is not limited to task planning: actual tasks also require the robot to have the ability of self-learning. Therefore, a robot that can perform complex tasks also needs the ability of knowledge representation and reasoning for new scenarios. For this reason, Sun et al. developed RTPO, a robot manipulation knowledge representation method based on a knowledge graph. By storing manipulation sequences in the form of a knowledge graph, it can better meet the knowledge representation and learning requirements of the robot in new scenarios [8]. In order to enable robots to independently complete complex manipulation tasks, it is necessary to equip them with knowledge and reasoning mechanisms. Reference [9] proposes openEASE, a service to promote knowledge representation and reasoning, which aims to enable robots and operators to analyze and learn from manipulation data. In order to meet the reasoning and learning needs of robots for complex tasks, KnowRob 2.0 designed a set of ontology-based robot knowledge representation methods and built a multi-level knowledge graph based on tasks, so as to realize robot knowledge reasoning and task planning [10].
3 Design In order to enhance the adaptability of the robot in complex and unfamiliar scenarios, the manipulation action of the robot is disassembled into the smallest executable action units based on motion primitives, and the relationships among tasks, actions, objects, and agents are stored in the form of a knowledge graph. The overall design architecture is shown in Fig. 1. This section first introduces the overall architecture of knowledge representation, then the definition of robot manipulation motion primitives, the construction method of the knowledge graph, the strategies for end-effector selection and action sequence generation, and the action sequence execution scheme based on action parameters.
Fig. 1. The overall architecture of this paper. The whole framework is composed of four layers, including service layer, storage layer, strategy layer and execution layer. The service layer is mainly used for the storage of knowledge, the retrieval of knowledge and the release of instructions. The storage layer is constructed in the way of knowledge graph, including four parts: task, agent, action and object. At the same time, the logical relationship between each part is also stored in the knowledge graph. The strategy layer includes the selection of end effector and the generation of execution action sequences. The last execution layer includes the import of execution parameters and the execution of robot manipulation sequences.
3.1 Motion Primitive Definition and Human-Robot Action Transformation Motion primitives can be divided into semantic actions and executive actions. Semantic actions are abstract summaries of human actions, and executive actions are simple actions that can be executed by robots. The executive motion primitives of this work are classified according to the characteristic attributes of robot manipulation actions, mainly the contact type, contact time, and motion trajectory type of the actions. The contact types are mainly divided into contact and non-contact. For example, movement-type actions are non-contact, while some transitive actions are contact actions. Contact actions can be further divided into continuous contact and non-continuous contact, i.e., classified according to contact time. The trajectory type is divided according to the three-dimensional motion difference of the manipulation action in the robot coordinate system, combined with the end posture of the robot, so as to distinguish the actions. According to these division rules, the motion primitives can be divided into 11 types: diagonal-move, vertical-move, and horizontal-move of the moving class; pushing, lifting, and pressing of the object class; and end posture adjustment, gripper-open, gripper-close, suck, and release of the end class. All 11 primitive actions can be transformed into execution code for the robot, and the execution of long-sequence tasks can be realized through the modular combination of these executive actions. There are differences between human semantic actions and robot actions. According to the characteristics of human manipulation, 13 kinds of human semantic actions are defined in this work. A human action can be decomposed into a combination of multiple robot actions. When a human performs actions, vision works continuously; therefore, when mapping human actions to robot actions, a locate step is usually added. The specific action definitions and mapping relationships are shown in Table 1; a minimal code sketch of this mapping is given after the table. By defining human-executable primitive actions, the semantic expression is convenient for the operator to understand, and it will also help subsequent manipulation knowledge reasoning.

Table 1. Transformation strategy between semantic actions and executive actions.

Semantic action | Executive action
Approach | Locate + move-class
Take | Locate + posture-adjustment + suck/grasp
Move | (suck/grasp +) move-class
Release | move-class + release
Press | Locate + press
Push | Locate + move-class + press
Home | Home
Grasp | Locate + posture-adjustment + grasp
Lift | (suck/grasp +) vertical-move
Pull | suck/grasp + pull
Twist | suck/grasp + posture-adjustment
Pinch | Locate + grasp
Shake | Rotate + posture-adjustment + cycle
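As an illustration only (not part of the authors' system), the Table 1 mapping and the sentence-level expansion described in Sect. 3.3 can be sketched as a lookup table plus a small translator; the primitive names follow the table, while the data structures and function names are assumptions.

```python
# Hypothetical sketch; primitive names follow Table 1, structures are illustrative.
SEMANTIC_TO_EXECUTIVE = {
    "approach": ["locate", "move-class"],
    "take":     ["locate", "posture-adjustment", "suck/grasp"],
    "move":     ["(suck/grasp)", "move-class"],
    "release":  ["move-class", "release"],
    "press":    ["locate", "press"],
    "push":     ["locate", "move-class", "press"],
    "home":     ["home"],
    "grasp":    ["locate", "posture-adjustment", "grasp"],
    "lift":     ["(suck/grasp)", "vertical-move"],
    "pull":     ["suck/grasp", "pull"],
    "twist":    ["suck/grasp", "posture-adjustment"],
    "pinch":    ["locate", "grasp"],
    "shake":    ["rotate", "posture-adjustment", "cycle"],
}

def to_executive(semantic_action: str) -> list[str]:
    """Translate one human semantic action into robot executive primitives."""
    return SEMANTIC_TO_EXECUTIVE[semantic_action.lower()]

def generate_action_sequence(subtasks: list[str]) -> list[tuple[str, str]]:
    """Expand 'predicate object' sub-tasks (e.g. 'take milk') into
    (executive primitive, object) pairs, as in Sect. 3.3."""
    sequence = []
    for subtask in subtasks:
        predicate, obj = subtask.split(maxsplit=1)
        sequence += [(p, obj) for p in to_executive(predicate)]
    return sequence

# Example: generate_action_sequence(["take milk", "move milk", "release milk"])
```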
3.2 Construction of Knowledge Graph To enable the robot to perform tasks with a certain reasoning ability, not only its task planning and action execution scheme need to be considered, but also the relationship between specific actions and manipulation objects, as well as the agents performing manipulations. As shown in Fig. 2, this scheme builds a knowledge storage architecture with four levels (tasks, agents, actions, and objects) based on the execution requirements of long-sequence complex tasks. A complex task can be realized by completing multiple sub-tasks, and sub-tasks are composed of primitive action sequences. The agent layer mainly stores the executors of actions. By building the relationship between actions and agents, potential general laws between them can be found through subsequent knowledge reasoning. The action layer is mainly used to store motion primitives that can be used for task execution; according to the defined classification criteria, it can be divided into three categories: the move class, the object class, and the end-effector class. The object layer is mainly a collection of manipulation objects. By determining the relationship between manipulation objects and actions, appropriate manipulation actions can be determined for different types of manipulation objects. At the same time, the execution effect of each manipulation action is also stored in the knowledge graph as the relationship between actions and objects (Fig. 3).
Fig. 2. Framework diagram of knowledge graph construction. The whole framework is composed of four levels, including task, agent, action and object. At the same time, the logical relationship between each part is also stored in the knowledge graph.
3.3 Action Sequence Generation Strategy
Fig. 3. Generation strategy of action sequence. The input is the semantic task and the current state. The semantic action sequence is generated through parsing and stored in the knowledge base, and then the action sequence is output through retrieval.
A complex manipulation task can be described as a number of sub-tasks expressed in human semantics. Through the analysis of sub-tasks, each can be disassembled into short sentences of subject, predicate, and object, where the predicate is a human semantic primitive action. Taking “take milk” as an example, the subject is the agent itself by default, “take” is the human semantic motion primitive, and “milk” is the object, which is also the specific target of the semantic action. When the semantic motion primitive is converted into executive actions, “take” corresponds to “locate + posture-adjustment + grasp” (see the lookup sketch after Table 1); these actions are the executive motion primitives of the robot. By calling the visual information of the multiple sensors, execution parameters can be provided for the executive motion primitives of the robot. Finally, each executive action is transformed into execution code and executed with its execution parameters. A complex task is the sequential combination of these short sentences; by executing the actions of each short sentence, the planning of complex tasks can be realized (Fig. 4). 3.4 End Effector Selection Strategy
Fig. 4. Selection strategy of the end effector. The input is the object attributes and object shape obtained by object detection. Through analysis and retrieval in the knowledge graph, the usable agents and alternative actions can be obtained. The end effector in this paper is a part of the agent.
The physical characteristics of the object are closely related to the end effector selected for it. The end-effector selection strategy designed in this paper is mainly based on the shape characteristics of the object. When knowledge about a new target object is stored, the shape characteristics of the object are determined by visual detection, the target object is classified accordingly, and the usable grasping methods are then determined according to the stored relationship between motion primitives and objects. Taking “take milk” and “take apple” as examples: the shape of the milk box is a cube, so a two-finger gripper can be selected, while the shape of the apple is a sphere, so either a two-finger or a three-finger gripper can be selected. Through simple shape-based end-effector selection reasoning, the robot gains a certain intelligent selection ability when encountering new target objects.
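A minimal sketch of such a shape-based selection rule follows, assuming shape labels come from an upstream detector; the rule table and names are illustrative, not the paper's exact knowledge-graph query.

```python
# Hypothetical shape-to-gripper rules distilled from the examples in the text.
GRIPPERS_BY_SHAPE = {
    "cube":   ["two-finger gripper"],
    "sphere": ["two-finger gripper", "three-finger gripper"],
}

def select_end_effector(shape: str, available: list[str]) -> str:
    """Pick the first stored gripper option that the agent actually has."""
    for candidate in GRIPPERS_BY_SHAPE.get(shape, []):
        if candidate in available:
            return candidate
    raise ValueError(f"no suitable end effector for shape '{shape}'")

# Example: select_end_effector("sphere", ["three-finger gripper"])
```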
4 Experiment and Discussion This paper takes the complex catering task in the service scenario as an example to verify the designed expression method of complex tasks based on motion primitives.
Fig. 5. Schematic diagram of the operating platform. In this experiment, a UR5 is used as the agent to perform tasks, with optional two-finger and three-finger grippers. The operation platform is used to complete the catering task.
The manipulation platform of the catering experiment is shown in Fig. 5. The task focuses on the semantic analysis of complex tasks and the transformation between semantic motion primitives and executive motion primitives. Through the execution experiment of complex tasks, the reliability and efficiency of the knowledge expression method based on motion primitives in this design are verified. 4.1 Long-Sequence Complex Task Analysis As shown in Fig. 6, the whole long-sequence complex catering task can be divided into six sub-tasks: take the egg, take the apple, take the bread, take the milk, take the cup, and pour drinks. Each sub-task can be divided into a specific semantic action sequence. Taking “take the egg” as an example, it can be divided into two steps: grasp the egg and place the egg. “Pour drinks” includes five steps: twist the bottle cap, place the cap, take the drink bottle, pour the drink into the cup, and place the drink bottle. The semantic action sequence can be obtained by extracting the semantic actions in these steps; the semantic actions are then converted into the executive action sequence according to the transformation information, and the task is completed with the corresponding execution parameters.
4.2 Demonstration of Knowledge Graph of Long Sequence Catering Task
Fig. 6. Knowledge graph demonstration of the complex catering task. It includes tasks and their sub-tasks, semantic action sequences and executive action sequences, as well as the relationships among agents, objects, and actions. Only some parts are shown in the figure for illustration.
This paper builds a knowledge graph with four levels (task, agent, action, and entity) based on the Neo4j graph database and stores the whole long-sequence task in the graph database according to its task analysis results. The stored content includes the hierarchical division between tasks and sub-tasks, the semantic action sequences and executive action sequences, the operational relationships between semantic actions, executive actions, and objects, and the specific relationships between objects and agents. As shown in Fig. 6, taking “take the egg” as an example, the node relationships of “take the egg” are shown; the whole knowledge graph is the union of the graphs of each sub-task. 4.3 Long Sequence Catering Task Experiment After the analysis of the long-sequence catering task and the construction of the catering-sequence knowledge graph, the long-sequence catering experiment was verified in an actual service scenario. The experiment proceeded smoothly, and the execution process is shown in Fig. 7. After the robot obtains the catering task instruction through the service layer, the semantic sequence and execution sequence of the catering task are obtained through the action sequence generation strategy; the executive motion primitive sequence is then extracted, and the corresponding execution parameters are given to each motion primitive. Finally, each parameterized motion primitive is transformed into execution code, and the entire long-sequence catering task is completed (Fig. 7).
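As a rough illustration of how such nodes and relationships might be stored, the following uses the official Neo4j Python driver; the node labels, relationship types, and connection details are assumptions, since the paper does not list its exact schema.

```python
from neo4j import GraphDatabase

# Connection details are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_subtask(tx, task: str, subtask: str, action: str, obj: str, agent: str):
    # MERGE avoids duplicate nodes when several sub-tasks share actions/objects.
    tx.run(
        """
        MERGE (t:Task {name: $task})
        MERGE (s:SubTask {name: $subtask})
        MERGE (a:Action {name: $action})
        MERGE (o:Object {name: $obj})
        MERGE (g:Agent {name: $agent})
        MERGE (t)-[:HAS_SUBTASK]->(s)
        MERGE (s)-[:USES_ACTION]->(a)
        MERGE (a)-[:OPERATES_ON]->(o)
        MERGE (g)-[:EXECUTES]->(a)
        """,
        task=task, subtask=subtask, action=action, obj=obj, agent=agent,
    )

with driver.session() as session:
    session.execute_write(store_subtask,
                          "catering", "take the egg", "grasp", "egg", "UR5")
```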
Fig. 7. Snapshot sequence of the actual execution of the catering experiment; the entire catering task includes 10 sub-tasks. Video URL link: https://youtu.be/ijB6qttKs-s
5 Conclusion In this work, we study the hierarchical knowledge representation method of complex tasks based on motion primitives for robot manipulation, taking the service scenario as an example. To solve the problem of insufficient execution efficiency of robots facing complex tasks, we first propose a knowledge graph based on Neo4j, which includes four levels: agent, task, action, and object. Secondly, in order to meet the operational accuracy requirements of the robot when performing complex tasks, we define motion primitives and transform actions between human and robot, so that the semantic tasks of human instructions can be decomposed into the smallest motion units that the robot can perform. Finally, to verify the feasibility of this design, we took the catering task as an example on the basis of the constructed knowledge base. The analysis of the experimental results shows that the knowledge base system and the knowledge representation system based on dynamic motion primitives which we designed can well complete the semantic manipulation instructions issued by humans. Subsequently, we will carry out task experiments in other service scenarios on this basis and promote knowledge reasoning work on the basis of a large amount of experimental data.
References
1. Paulius, D., Sun, Y.: A survey of knowledge representation in service robotics. Robot. Auton. Syst. (2019)
2. Yang, Y., Guha, A., Fermuller, C., et al.: Manipulation action tree bank: a knowledge resource for humanoids. In: IEEE-RAS International Conference on Humanoid Robots. IEEE (2016)
3. Yang, Y., Fermuller, C., Aloimonos, Y.: Detection of manipulation action consequences (MAC). In: 2013 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2013)
4. Myers, A., Teo, C.L., Fermüller, C., Aloimonos, Y., et al.: Affordance detection of tool parts from geometric features. In: IEEE International Conference on Robotics & Automation, pp. 1374–1381. IEEE (2015)
5. Yang, Y., Aloimonos, Y., Fermüller, C., et al.: Learning the semantics of manipulation action. In: Association for Computational Linguistics (ACL) (2015)
6. Colledanchise, M., Ögren, P., et al.: How behavior trees modularize hybrid control systems and generalize sequential behavior compositions, the subsumption architecture, and decision trees. IEEE Trans. Robot. (2017)
7. Colledanchise, M., Ögren, P.: Behavior Trees in Robotics and AI: An Introduction (2018)
8. Sun, X., Zhang, Y., Chen, J.: RTPO: a domain knowledge base for robot task planning. Electronics 8(10), 1105 (2019)
9. Beetz, M., Tenorth, M., Winkler, J.: Open-EASE: a knowledge processing service for robots and robotics. AI Res. 374 (2015)
10. Beetz, M., Beßler, D., Haidu, A., et al.: KnowRob 2.0: a 2nd generation knowledge processing framework for cognition-enabled robotic agents. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 512–519. IEEE (2018)
Saturation Function and Rule Library-Based Control Strategy for Obstacle Avoidance of Robot Manta Yu Xie1,2,3 , Shumin Ma1,2,3 , Yue He1,2,3 , Yonghui Cao1,2,3(B) , Yong Cao1,2,3 , and Qiaogao Huang1,2,3 1 School of Marine Science and Technology, Northwestern Polytechnical University,
Xi’an 710072, China {xieyu,msm,heyue}@mail.nwpu.edu.cn, {caoyonghui,cao_yong, huangqiaogao}@nwpu.edu.cn 2 Unmanned Vehicle Innovation Center, Ningbo Institute of NPU, Ningbo 315103, China 3 Key Laboratory of Unmanned Underwater Vehicle Technology of Ministry of Industry and Information Technology, Xi’an 710072, China
Abstract. The abilities to detect and avoid obstacles are the most significant concerns for a robot manta to achieve safe operation in a complex environment. This paper presents a control strategy for obstacle avoidance of a bioinspired robot manta in an unknown environment. In this control strategy, four laser distance sensors are used to acquire the distance from obstacles to the robot manta. The turning speed is then calculated by a saturation function, and the turning direction is determined by a rule library whose strategies are developed from the possible locations of obstacles. Combining the saturation function and the rule library, the robot manta can obtain appropriate motion instructions to avoid obstacles. The experimental results demonstrate that the proposed control strategy works well and the robot manta can swim freely while avoiding collisions. Keywords: Robot manta · Obstacle avoidance · Rule library
1 Introduction As a new branch of underwater unmanned vehicles, bionic vehicles are receiving more and more attention [1–4]. Their excellent performance, such as high mobility and high efficiency, allows them to work in complex and confined environments. Detecting and avoiding obstacles is a prerequisite for accomplishing such tasks. In recent years, studies on obstacle avoidance for bionic vehicles have been carried out by many researchers. Shin et al. [5] used neuro-fuzzy inference to improve the capability of a fish robot to recognize the features of an obstacle and avoid collision. Based on decentralized control, a target-tracking and collision-avoidance task for two autonomous robot fish was designed and implemented by Hu et al. [6]. Wang et al. [7] proposed an obstacle avoidance algorithm based on fuzzy control for a bionic robot fish. Li et al. [8] developed a rule-based control method for a bionic robot fish, which defined 10 basic forms of robot
fish movement and several turning patterns to make it swim autonomously and avoid obstacles. Deng et al. [9] proposed a neuro-fuzzy control method for a multi-joint robot fish based on 3D printing technology. With the inference and learning ability of the neuro-fuzzy control system, the robot fish can freely move away from obstacles. Chen et al. [10] proposed a bioinspired closed-loop Central Pattern Generator (CPG) based control method, which is effective for a robot fish to avoid obstacles and track a direction. It is worth noting that bionic vehicles can be divided into two categories by their form of motion: Body and Caudal Fin (BCF) propulsion and Median and Paired Fin (MPF) propulsion [11]. However, the above discussion on obstacle avoidance is all about robot fish with BCF propulsion. Recently, many researchers have also conducted studies related to robot fish with MPF propulsion. Ma et al. [12] designed a biomimetic cownose ray robot fish and studied its motion performance. Glushko et al. [13] designed a manta ray autonomous underwater vehicle (AUV) and designed a software control system for it. Cao et al. [14] proposed CPG-fuzzy-based control of a cownose-ray-like fish robot for depth and heading tracking. Meng et al. [15] proposed a robot manta for both efficient fast swimming and high spatial maneuverability and studied the relevant attitude control. It can be seen that there are many studies related to the mechanism design and basic control of robot fish with MPF propulsion, but it is still challenging to design an obstacle avoidance control method for them. In this paper, we propose a control strategy of a bioinspired robot manta for obstacle avoidance based on a saturation function and a rule library. Firstly, the distance between the vehicle and the obstacle is obtained after suitable processing. Then, the saturation function is used to calculate the turning speed of the vehicle based on the distance value, and the rules are used to determine the turning direction based on the location of obstacles. In this way, autonomous obstacle avoidance of the robot manta in unknown environments is realized. The rest of the paper is organized as follows. Section 2 introduces the design of the robot manta and the acquisition of distance data. The details of the proposed obstacle avoidance control strategy are given in Sect. 3. In Sect. 4, experiments are conducted to verify the effectiveness of the proposed approach. Finally, the paper ends with some conclusions and discussions.
2 Overview of Robot Manta 2.1 The Design of the Robot Manta To obtain movement ability similar to that of a real manta ray, our team designed an underwater vehicle called robot manta Silicone-II, as shown in Fig. 1. It comprises two pectoral fins, a carbon waterproof body, a caudal fin, a vertical fin, and silicone skin: (a) each pectoral fin consists of three motors with three carbon skeletons; (b) electronic components are placed in the body; (c) the caudal fin consists of one motor with a carbon skeleton; (d) the vertical fin contributes to maintaining balance; (e) silicone skin covers the carbon skeletons to form a complete vehicle. To enable the vehicle to perform certain missions, the body carries four laser sensors, one attitude sensor, and one pressure sensor. The laser sensors obtain distance information. The attitude sensor is fixed on a horizontal plane in the body to obtain the attitude
of the vehicle. The pressure sensor obtains the depth information. In addition, the battery provides a sufficient and stable source of power. The detailed technical parameters are shown in Table 1.

Table 1. Specifications of the robot manta.

Items | Specifications
Length × Width × Height | 58 cm × 94 cm × 12 cm
Length of pectoral fin | 37 cm
Weight | 8.5 kg
Frequency of motors | 50 Hz
Measuring distance of laser sensors | 300 cm (in water)
Inertial measurement unit | AHRS-2501
Pressure sensor | MS5837-30BA
Battery parameters | 8.4-VDC 13600-mAH Ni-H
Wireless radio | HC-12 (433 MHz)
Fig. 1. Overview of our robot manta Silicone-II. (A) The overall appearance. (B) The structure of prototype. (C) The structure of hardware control system. (D) The prototype.
2.2 The Acquisition of Obstacle Distance Data The prerequisite for the vehicle to avoid obstacles in an unknown environment is knowing the distance between the vehicle and the obstacle. In this paper, laser sensors are located at the left, right, left front, and right front of the head of the vehicle to obtain the distance to obstacles. When a laser distance sensor works, it launches a visible red laser onto the surface of an object and receives the reflected light. However, the carbon body is not transparent. To solve this issue, we hollowed out the installation position and fitted a transparent acrylic sheet to allow the sensor to work properly. Since the laser light quickly decays underwater, the measured value of the distance differs considerably from the actual value. We performed a series of measurements to obtain the relationship between the measured value and the real value, as shown in Fig. 2(A). The formula is described as

$$d_e = 0.2588\, d_w + 0.2373, \qquad d_r = d_w - d_e \tag{1}$$

where $d_w$ is the measured value in water, $d_e$ is the measurement error in water, and $d_r$ is the real value. Meanwhile, the roll angle or pitch angle of the vehicle will affect the measured value, as shown in Fig. 2(B) and (C).
Fig. 2. The acquisition of real distance. (A) Distance curve and error curve from laser distance sensor. (B) The effect of roll angle on the real distance. (C) The effect of pitch angle on the real distance.
Thus, the real value of the laser distance sensor also requires processing before it can be used in further experiments. The formula is described as

$$\begin{cases} D_1 = d_{r,1}\cos(\theta) \\ D_2 = d_{r,2}\cos(\theta) \\ D_3 = d_{r,3}\cos(\phi) \\ D_4 = d_{r,4}\cos(\phi) \end{cases} \tag{2}$$
where $d_{r,1}$, $d_{r,2}$, $d_{r,3}$, and $d_{r,4}$ are the real values of the respective sensors; $D_1$, $D_2$, $D_3$, and $D_4$ are the distances from the left front, right front, left, and right laser distance sensors to the obstacle, respectively; $\theta$ is the pitch angle of the vehicle; and $\phi$ is the roll angle of the vehicle.
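A minimal sketch of the distance correction pipeline of Eqs. (1) and (2) follows, assuming angles in radians and distances in centimeters; the function names are illustrative.

```python
import math

def underwater_range(d_w: float) -> float:
    """Correct a raw underwater laser reading d_w (cm) using Eq. (1)."""
    d_e = 0.2588 * d_w + 0.2373   # empirical measurement error in water
    return d_w - d_e              # real distance d_r

def corrected_distances(raw: list[float], pitch: float, roll: float) -> list[float]:
    """Apply Eq. (2): front sensors compensated by pitch, side sensors by roll.
    Sensor order: left-front, right-front, left, right."""
    d_r = [underwater_range(d) for d in raw]
    return [d_r[0] * math.cos(pitch),
            d_r[1] * math.cos(pitch),
            d_r[2] * math.cos(roll),
            d_r[3] * math.cos(roll)]
```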
3 Saturation Function and Rule Library-Based Control Strategy of Obstacle Avoidance In this section, we present a control strategy based on a saturation function and a rule library for the robot manta; its structure is introduced in Fig. 3. The strategy mainly comprises a rule library and a saturation function. Since the rules based on the position of the obstacle only determine the turning direction, a saturation function is introduced to calculate the turning speed. The pectoral fins of the robot manta are driven by the CPG network as in reference [16].
Fig. 3. The structure of control strategy.
3.1 The Selection of the Saturation Function Identifying the existence of obstacles is the key for the vehicle to avoid collision with an obstacle. When an obstacle has been detected, the following is evaluated based on the distance: (a) if the distance is greater than the safe distance, the vehicle continues to swim freely; (b) if the distance is less than the safe distance, the vehicle performs avoidance. Because the distance from the obstacle to the vehicle varies, the saturation function is introduced to change the turning speed of the vehicle while it is avoiding. The saturation function is frequently used as the activation function of a neural network and takes various forms. The mathematical formulation of the Sigmoid function is

$$y = \frac{1}{1 + e^{-x}} \tag{3}$$

where $x$ is the input of the equation and $y$ is the output. In this paper, the function is transformed into the form

$$\alpha = \frac{p_i}{1 + e^{k_i (D_i - c_i)}}, \quad i = 1, 2, 3, 4 \tag{4}$$
where $k_i$ is the coefficient that affects the slope, $c_i$ is the constant that shifts the function curve left or right, and $p_i$ is the coefficient that affects the height; $\alpha$ is the absolute value of the difference of the phase difference, which determines the turning speed. The turning direction is given by the rule library. Specific instructions on the difference of the phase difference are given in reference [16], and the principle by which the phase difference makes the vehicle turn is described in reference [17]. The curves of the Sigmoid function are shown in Fig. 4.
Fig. 4. The curves of the Sigmoid function. (A) The standard form of Sigmoid function. (B) The transformed form of Sigmoid function.
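The following is a minimal sketch of Eq. (4) in code, using the parameter values reported later in the experiments (p = 30 or 50, c = 150 or 130 cm, k = 0.15); the function name is an assumption for illustration.

```python
import math

def turning_speed(distance: float, p: float, c: float, k: float) -> float:
    """Eq. (4): saturation function mapping obstacle distance (cm) to |alpha|."""
    return p / (1.0 + math.exp(k * (distance - c)))

# Example with the front-sensor parameters used in the experiments:
# alpha is near p when the obstacle is close and decays toward 0 beyond c.
alpha = turning_speed(distance=100.0, p=30.0, c=150.0, k=0.15)
```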
3.2 The Building of the Rule Library According to the transformed Sigmoid function, the robot can obtain the turning speed. However, successful obstacle avoidance needs not only the turning speed but also the turning direction. Therefore, a suitable rule library needs to be created. Since the vehicle has forward inertia when moving, the safety distance of the left front and right front laser sensors should be greater than that of the left and right laser sensors. In this way, a sufficient turning distance is retained to allow the vehicle to safely avoid collisions. Based on this, we set the safety distances to 150 cm, 150 cm, 130 cm, and 130 cm for the left front, right front, left, and right laser distance sensors, respectively. The obstacle avoidance signal is set to 1 when the distance value is smaller than the safety distance and to 0 when it is greater. Finally, we define a set of rules, as shown in Table 2. There are 16 situations depending on the location of obstacles, and we have developed different strategies accordingly. These strategies fall into four categories: no obstacle signal in any direction, one obstacle signal in one direction, two or three obstacle signals in two directions, and three or four obstacle signals in three directions. For example: (a) if there is an obstacle signal only in the left direction, the strategy is to turn right and calculate α based on $D_3$; (b) if there are obstacle signals in the left and front directions, the strategy is to turn right quickly and calculate α based on the mean of $D_1$ and $D_2$ (when there is a front obstacle signal, the difference of the phase difference affecting the turning speed is calculated using the mean of the front sensor readings $D_1$ and $D_2$; when there are both left and right obstacle signals, it is calculated using the smaller distance value); (c) when there are obstacle signals in three directions and the front
distance is small, a fixed parameter is given for swimming backward, which is not a very common situation. Note that although the front sensors are divided into left front and right front, one front obstacle signal is generally considered to mean that there is an obstacle in front; in the actual experiments, the two front laser distance sensors rarely send obstacle signals alone. A code sketch of this rule library is given after the table.

Table 2. The rules for avoiding obstacles.

Situation | D1 signal | D2 signal | D3 signal | D4 signal | Strategy
1  | 0 | 0 | 0 | 0 | Cruise
2  | 0 | 0 | 0 | 1 | Turn left
3  | 0 | 0 | 1 | 0 | Turn right
4  | 0 | 0 | 1 | 1 | If D3 > D4, turn left; if D3 < D4, turn right; if D3 = D4, cruise
5  | 0 | 1 | 0 | 0 | Turn right
6  | 0 | 1 | 0 | 1 | Turn left
7  | 0 | 1 | 1 | 0 | Turn right
8  | 0 | 1 | 1 | 1 | If D3 > D4, turn left; if D3 < D4, turn right; if D2 < 80, backward
9  | 1 | 0 | 0 | 0 | Turn right
10 | 1 | 0 | 0 | 1 | Turn left
11 | 1 | 0 | 1 | 0 | Turn right
12 | 1 | 0 | 1 | 1 | If D3 > D4, turn left; if D3 < D4, turn right; if D1 < 80, backward
13 | 1 | 1 | 0 | 0 | Turn right
14 | 1 | 1 | 0 | 1 | Turn left
15 | 1 | 1 | 1 | 0 | Turn right
16 | 1 | 1 | 1 | 1 | If D3 > D4, turn left; if D3 < D4, turn right; if D1 < 80 or D2 < 80, backward
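A minimal, hypothetical sketch of the decision logic in Table 2 combined with the saturation function of Eq. (4) follows; it reuses the turning_speed function sketched in Sect. 3.1, and since the exact backward maneuver and CPG parameter mapping are not specified here, the return values are illustrative labels.

```python
def avoidance_command(D, safety=(150, 150, 130, 130)):
    """Return a (direction, alpha) command from corrected distances
    D = (D1, D2, D3, D4): left-front, right-front, left, right (cm)."""
    s = [1 if d < c else 0 for d, c in zip(D, safety)]  # obstacle signals
    D1, D2, D3, D4 = D

    if s == [0, 0, 0, 0]:
        return "cruise", 0.0
    if s[2] and s[3]:                        # obstacles on both sides
        if (s[0] and D1 < 80) or (s[1] and D2 < 80):
            return "backward", None          # fixed reverse parameters (rules 8/12/16)
        if D3 == D4 and not (s[0] or s[1]):
            return "cruise", 0.0             # rule 4 tie case
        direction = "left" if D3 > D4 else "right"
        ref = min(D3, D4)                    # smaller side distance drives alpha
    elif s[3]:                               # obstacle on the right only
        direction, ref = "left", D4
    elif s[2]:                               # obstacle on the left only
        direction, ref = "right", D3
    else:                                    # front-only obstacle: default right
        direction, ref = "right", None

    if s[0] or s[1]:                         # any front signal: use the front mean
        ref = (D1 + D2) / 2.0
    p, c = (30.0, 150.0) if (s[0] or s[1]) else (50.0, 130.0)
    return direction, turning_speed(ref, p=p, c=c, k=0.15)
```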
4 Experiments The control strategy was applied to a prototype of the robot manta Silicone-II, and its effectiveness was verified through experiments. The experiments were completed in 2022 at the pool laboratory of the Ningbo Institute of Northwestern Polytechnical University, as shown in Fig. 5. The motion direction and speed of the robot were controlled by the pectoral fins, which are driven by the central pattern generator network as in our previous study [18]. The mission parameters for the experiment were sent by computer software via radio. The software was written in Qt, and the control program was written in C and downloaded into the main control chip (STM32). The experimental processes were filmed by a camera fixed on a tripod. In particular, the obstacle in the experiment was the wall of the pool.
Fig. 5. The experimental pool. (A) The size of the pool is 6.7 m (Length) × 3.2 m (Width) × 1.3 m (Depth). (B) The experimental equipment. (C) The diagram of experimental procedure.
To verify the effectiveness of the proposed control strategy, pool experiments were conducted. The first experiment demonstrated turning right to avoid obstacles, as shown in Fig. 6. In the experiment, the parameters were set to p1 = p2 = 30, p3 = p4 = 50, c1 = c2 = 150, c3 = c4 = 130, and k1 = k2 = k3 = k4 = 0.15. From 3 s to 5.5 s, there is an obstacle in front of the vehicle and the right distance is larger than the left distance, corresponding to situation 13 in the rule library. Therefore, the vehicle quickly turned right to move away, as shown in Fig. 7; α is calculated based on the mean of the forward distances D1 and D2, where α determines the turning speed and the rules determine the turning direction. From 5.5 s to 7.2 s, there is an obstacle on the left of the vehicle and no obstacle in front, corresponding to situation 3, and α is calculated based on the left distance D3. Since the pool is quite small, we tuned the parameters to be sensitive so as to avoid collisions as much as possible, so the output curve of α is not very smooth. To verify the reliability of the proposed control strategy, we conducted a second experiment. In this experiment, the parameters were set to p1 = p2 = 30, p3 = p4 = 50,
Fig. 6. The picture of robot manta avoiding obstacles.
Fig. 7. The curves of robot avoiding obstacles. (A) The curve of distance. (B) The curves of yaw angle and α.
c1 = c2 = 150, c3 = c4 = 130, and k1 = k2 = k3 = k4 = 0.15 as well. This experiment mainly demonstrated turning left to avoid obstacles, as shown in Fig. 8. From 3.2 s to 5.6 s, there is an obstacle to be avoided in front and on the right of the vehicle, corresponding to situation 14, so the vehicle quickly turns left to move away, and α is calculated based on the mean of the forward distances D1 and D2. From 5.5 s to 7.0 s, there is an obstacle on the right of the vehicle, corresponding to situation 2 in the rule library, so the vehicle turns left and α is calculated based on the right distance D4. At 8.3 s, the left side of the vehicle was closer to the pool wall, corresponding to situation 3, so it turned slightly right. The data of distance, heading angle, and α are shown in Fig. 9.
Fig. 8. The picture of robot manta avoiding obstacles.
Fig. 9. The curves of robot avoiding obstacles. (A) The curve of distance. (B) The curves of yaw angle and α.
5 Conclusions In this paper, an obstacle avoidance control strategy for a robot manta based on a saturation function and a rule library was proposed. With proper calibration, the laser sensors can effectively respond to obstacles. The turning speed of the vehicle is calculated by the saturation function based on the distance, and the turning direction is determined by the rules based on the position of the obstacle. With appropriate motion commands, the vehicle can successfully avoid obstacles. Experimental results show that the strategy can effectively and reliably avoid collisions between the vehicle and obstacles. In the future, we will design more efficient algorithms for 3D obstacle avoidance, and path tracking will also be incorporated alongside obstacle avoidance so that the vehicle can operate more effectively in the real world.
References
1. Katzschmann, R.K., DelPreto, J., MacCurdy, R., Rus, D.: Exploration of underwater life with an acoustically controlled soft robotic fish. Sci. Robot. 3(16), eaar3449 (2018)
2. Bal, C., Koca, G.O., Korkmaz, D., Akpolat, Z.H., Ay, M.: CPG-based autonomous swimming control for multi-tasks of a biomimetic robotic fish. Ocean Eng. 189, 106334 (2019)
3. Xie, F., Li, Z., Ding, Y., Zhong, Y., Du, R.: An experimental study on the fish body flapping patterns by using a biomimetic robot fish. IEEE Robot. Autom. Lett. 5(1), 64–71 (2019)
4. Cai, Y., Bi, S., Li, G., Hildre, H.P., Zhang, H.: From natural complexity to biomimetic simplification: the realization of bionic fish inspired by the cownose ray. IEEE Robot. Autom. Mag. 26(3), 27–38 (2018)
5. Shin, D., Na, S.Y., Kim, J.Y., Baek, S.J.: Fuzzy neural networks for obstacle pattern recognition and collision avoidance of fish robots. Soft. Comput. 12(7), 715–720 (2008)
6. Hu, Y., Zhao, W., Wang, L.: Vision-based target tracking and collision avoidance for two autonomous robotic fish. IEEE Trans. Ind. Electron. 56(5), 1401–1410 (2009)
7. Wang, K., Feng, J., Wang, W.: Underwater detection and obstacle avoidance control of bionic machine fish. J. Hefei Univ. Technol. Nat. Sci. Ed. 36(10), 1190–1194 (2013)
8. Li, Q., Gao, J., Xie, G., Xu, E.: Obstacle avoidance algorithm of bionic robot fish based on fuzzy control. Binggong Zidonghua Ordnance Ind. Autom. 30(12), 65–69 (2011)
9. Deng, X., Jiang, D., Wang, J., Li, M., Chen, Q.: Study on the 3D printed robotic fish with autonomous obstacle avoidance behavior based on the adaptive neuro-fuzzy control. In: IECON 2015 - 41st Annual Conference of the IEEE Industrial Electronics Society, pp. 000007–000012. IEEE (2015)
10. Chen, J., Yin, B., Wang, C., Xie, F., Du, R., Zhong, Y.: Bioinspired closed-loop CPG-based control of a robot fish for obstacle avoidance and direction tracking. J. Bionic Eng. 18(1), 171–183 (2021)
11. Sfakiotakis, M., Lane, D.M., Davies, J.B.C.: Review of fish swimming modes for aquatic locomotion. IEEE J. Oceanic Eng. 24(2), 237–252 (1999)
12. Ma, H., Cai, Y., Wang, Y., Bi, S., Gong, Z.: A biomimetic cownose ray robot fish with oscillating and chordwise twisting flexible pectoral fins. Ind. Robot Int. J. (2015)
13. Glushko, I., et al.: Software control architecture for the BOSS Manta Ray AUV actuation system. In: 2018 IEEE/OES Autonomous Underwater Vehicle Workshop (AUV), pp. 1–5. IEEE (2018)
14. Cao, Y., Lu, Y., Cai, Y., Bi, S., Pan, G.: CPG-fuzzy-based control of a cownose-ray-like fish robot. Ind. Robot Int. J. Robot. Res. Appl. (2019)
15. Meng, Y., Wu, Z., Dong, H., Wang, J., Yu, J.: Toward a novel robotic manta with unique pectoral fins. IEEE Trans. Syst. Man Cybern. Syst. (2020)
16. Cao, Y., Xie, Y., He, Y., Pan, G., Huang, Q., Cao, Y.: Bioinspired central pattern generator and TS fuzzy neural network-based control of a robotic manta for depth and heading tracking. J. Mar. Sci. Eng. 10(6), 758 (2022)
17. Hao, Y., Cao, Y., Cao, Y., Huang, Q., Pan, G.: Course control of a manta robot based on amplitude and phase differences. J. Mar. Sci. Eng. 10(2), 285 (2022)
18. Cao, Y., Ma, S., Cao, Y., Pan, G., Huang, Q., Cao, Y.: Similarity evaluation rule and motion posture optimization for a manta ray robot. J. Mar. Sci. Eng. 10(7), 908 (2022)
Perception-Aware Motion Control of Multiple Aerial Vehicle Transportation Systems Mingfei Jiang, Mingshuo Zuo, Xinming Yu, Rong Guo, Ruixi Wang, and Yushu Yu(B) Beijing Institute of Technology, Beijing, China [email protected]
Abstract. How can Multiple Aerial Vehicle Transportation Systems accurately track a target despite their dynamical constraints? In a general unmanned aerial system, the system can identify the target with sensors to adjust its flight status. However, for Multiple Aerial Vehicle Transportation Systems, if each unmanned aerial vehicle carries sensors with different fields of view, conflicts arise in the system and proper sensing is not possible. Moreover, the added perceptual requirements will also conflict with the corresponding flight dynamics constraints and have some impact on the flight path. In this regard, we add a perception model for Multiple Aerial Vehicle Transportation Systems to Model Predictive Control to solve these conflicts. In this paper, we first add a perceptual constraint model of the transportation system, with the camera as the sensor, to the previously established dynamics simulation model, and apply the model to multiple UAVs so that multiple cameras can aim at the same target for tracking. Then, we solve for feasible UAV motion states with the MPC incorporating the perception model. In this way, we provide a feasible method of multi-sensor information fusion for Multiple Aerial Vehicle Transportation Systems. Finally, the simulation results demonstrate the effectiveness of the method. Keywords: Multiple Aerial Vehicle Transportation Systems · Perception · Model predictive control
M. Jiang and M. Zuo contributed equally to this work. 1 Introduction 1.1 Motivation and Background Unmanned aerial vehicles (UAVs) are increasingly used in a large number of applications, from search [1] and rescue missions [2] to aerial surveillance [3] and exploration [4], as well as work in high-risk locations [5]. However, the payload of a single aircraft is difficult to increase due to the aircraft itself and regulatory constraints. Moreover, the underactuated nature of a single aerial vehicle makes it impossible to achieve six independent degrees of freedom in the task space. In recent years, multi-aircraft
control techniques have attracted a lot of attention. Theoretically, the payload capacity can be increased by using additional aircraft, and the attitude and position of the payload can be adjusted simultaneously by coordinating the multiple thrusts provided by multiple aircraft. From another perspective, the field of object transportation has a wide range of applications and scope for exploration. UAVs can flexibly change the position and attitude of transported objects in the air and, with a certain degree of stability, can accomplish transportation tasks in complex environments. From this, we can see that it is important to include perception in the control of such a system. From the point of view of control methods, the MPC framework is a common control tool that has been widely used in recent years in the field of aerial robots and aircraft. It is a control strategy that uses a dynamic model to predict the system's behavior over a finite receding horizon. Using MPC as the control framework for the multi-UAV transportation system (MAVTS) we designed facilitates the consideration of various constraints on the spatial state variables. It facilitates the parameter adjustment and control of the sub-UAVs and the central platform, as well as the cooperative planning of the MAVTS. In particular, we include a perception model with which we can solve for appropriate motion states and provide solutions for UAVs that satisfy constraints in various aspects. In previous studies, common motion control considered only the state constraints of the system, such as position and attitude, or the constraints of end-to-end control starting from the input torque of the motors. However, in practical applications, control and perception often cannot be performed simultaneously, i.e., the target may be outside the visual range. In particular, the simultaneous alignment of multiple sensors to a target is a prerequisite for multi-sensor, multi-source information fusion and perception [6]. Further information fusion can only be performed if the viewpoint conflicts of multiple sensors distributed over multiple UAVs [7], and the conflicts with the dynamics during MAVTS motion, are resolved. Therefore, we add a perception constraint to the MPC framework so that the MAVTS can actively explore in the direction of the target point, keeping it within the field of view of the camera. The active perception of the MAVTS is thus achieved, improving the system's operation and efficiency. 1.2 Critical Review of Related Work 1.2.1 Multiple Aerial Vehicle Transportation Systems In the current literature, many researchers are beginning to consider the use of multiple UAVs for transportation missions. With a combination of multiple drones, heavier items can be transported, and some previously difficult transport problems can be better solved through cooperation. Researchers have experimented with various designs that connect UAVs to a platform or directly to heavy objects [8–16], with various control methods to achieve better transport results.
476
M. Jiang et al.
mathematical-physical modeling methods [9, 10] or by some special aerial behaviors that need attention, such as the offset of the object’s center of gravity [11, 12], for a more in-depth stability study. And later, others focused on the cooperation of multiple UAVs and studied the corresponding cooperative communication methods [13, 14]. A step further would be flexible trajectory optimization similar to that of a single UAV [15]. In conclusion, the research focuses on the practical shortcomings of MAVTS. Modeling is performed according to the corresponding design goals, and then the corresponding mechanisms or controllers are designed to achieve better flight states. In this paper, we focus on building a perceptual model of the Multiple Aerial Vehicle Transportation Systems. It is hoped that the corresponding perceptual conditions can be satisfied while controlling the travel route of the transport. 1.2.2 Model Predictive Control MPC has been frequently used in the optimal design of UAVs, and here we would like to mention some MPC-related studies that involve a perceptual component. MPC is a control strategy based on numerical optimization in which a system model can be used to predict future responses. Constructing a system cost function, which refers to the error between the predicted system response and the desired system output. By minimizing the cost function, the control inputs at the future sampling moments can be the optimal sequence of control inputs. We are concerned that many researchers pursue the flight efficiency of UAVs, i.e., the shortest time, like in [16–18], where minimum time trajectory generation methods under visibility constraints are proposed. It is particularly important to note that considering perception in the cost function here reduces the computational load of the model’s predictive control pipeline and enables it to run in real time on a low-power onboard computer. Also, like in [19, 20], MPC algorithms for a quadrotor capable of optimizing the motion and sensing the target are proposed. Especially in [19], the framework satisfies the dynamical equations of the robot and computes a feasible trajectory based on the input saturation. We know in this regard that considering vision-based estimation in the optimization problem allows a more robust perception of the target. Naturally, more research is focused on MPC approaches around flight stability modeling [21], all focused on safer flight. In the paper, our main challenge is to deal with the potential conflict between dynamics and perception. We hope to achieve this by varying the cost function of the MPC. 1.3 Gaps and Contributions Based on the discussion of existing research, we suggest that the gaps that need to be bridged to MAVTS are as follows: 1. Lack of perceptual model on MAVTS In existing studies, almost all model the physical stability of the system and few focus on the perceptual part of the system. Therefore, in this paper, we model the
angular relationship between the sensor view and the target based on the camera sensor on the UAV. Since it is a MAVTS, multiple perceptual relationships exist and need to be considered together in the model.

2. Lack of a solution for the conflict between dynamics constraints and perception. Most studies take into account only one of these aspects and focus on reaching the optimum in that aspect. In this paper, on the other hand, we consider not only the dynamics constraints of multiple UAVs but also the state of the perceptual model. To take advantage of the flexibility of quadrotor robots, we incorporate the perceptual objective into the optimization problem. Therefore, we use MPC to compute the optimal trajectory with respect to the cost function, and in this way we can combine dynamics and perception. It is particularly important to note that the perceptual state is not treated as a constraint, but as a component to be optimized.

The rest of this paper is organized as follows. In Sect. 2, we describe the framework and derive the dynamics model together with the MPC design. In Sect. 3, we present the perception-aware MPC design. In Sect. 4, we show the results generated by the framework and give some examples. Finally, we conclude this paper with contributions and future work in Sect. 5.
2 Framework and Constructs

In this section, we describe the dynamics and perception models of the Multiple Aerial Vehicle Transportation Systems, design the corresponding nonlinear model predictive control framework, and validate the optimal control method for the Multiple Aerial Vehicle Transportation Systems.

2.1 Framework for MPC of Multi-UAV Transportation Systems

In Fig. 1, we depict the whole process from building the dynamics and perception models of the MAVTS to constructing the MPC framework and performing simulations.
Fig. 1. Schematic diagram of the framework.
This part of the work starts with the construction of the dynamics model of the MAVTS. The derivation of the dynamical model is not the main content of this article; for the specific derivation process, refer to [22]. In this paper, the layout of the MAVTS and an outline of the dynamics model derivation are briefly introduced. The position, attitude, velocity, and angular velocity of the system are taken as dynamical constraints and solved optimally in the MPC solver. After constructing the dynamics model of the system, the perception model is built on top of it. The model derives the angle between the principal axis of the sensor coordinate system and the target point through the transformation relationships between the body coordinate system, the sensor coordinate system, and the world coordinate system, and this angle is simultaneously used as the perception constraint in the MPC solver for the optimal solution. After specifying the model and constraints of the system, we optimally control the system by MPC, which optimizes the current system while maintaining predictive control over the future prediction horizon using the dynamic model established above. Although MPC has been widely used in the field of aircraft in recent years, MPC considering multiple perceptual constraints is not yet mature. To demonstrate that the addition of perceptual constraints can improve the control effect of MPC on aircraft, we selected different UAV layouts and sensor layouts as parameters for experimental validation.

2.2 Derivation of the Dynamical Model and Design of MPC

Before deriving the dynamics model of the multiple aerial vehicle transportation systems, some settings and assumptions need to be made. The MAVTS in this paper consists of multiple UAVs and a load, and the layout of the whole system is shown in Fig. 2 [22]. The UAVs and the load are connected by spherical joints, and it is assumed that the center of mass of each UAV coincides with its spherical joint. In this layout, each UAV can rotate around its spherical joint and drive the load by adjusting its thrust vector. It is important to note that the spherical joints themselves have rotation limits that constrain the UAVs' attitude and thrust amplitude, which will be used as MPC constraints later in the article. In this paper, six UAVs are used to connect an aerial platform as a MAVTS. In this system, the force driving the platform attitude change consists of the thrust vectors, the system gravity, and the external forces and torques of the attached UAVs. On this basis, the system dynamics model of the platform is established in two main steps. First, the rotational equations of motion and the translational equations of motion of the platform and each UAV are established separately; then the internal force terms in each equation are eliminated to obtain a simple and comprehensive MAVTS dynamics equation. For the specific derivation of the dynamical model, please refer to [22]. The final system dynamics equation is shown in Eq. (1):

$$ M\dot{V}_0 + CV_0 + G = \begin{bmatrix} R_0 & 0 \\ 0 & I \end{bmatrix} u_0 + d_0 \qquad (1) $$

where $V_0 := (v_0, \omega_0) \in \mathbb{R}^6$ represents the velocity of the COM of the whole system, and $M \in \mathbb{R}^{6\times6}$, $C \in \mathbb{R}^{6\times6}$, $G \in \mathbb{R}^6$ are the mass matrix, Coriolis matrix, and gravity matrix.
Fig. 2. Example of a MAVTS architecture where each UAV is connected to the load via a spherical joint and each UAV provides a thrust vector
$u_0 \in \mathbb{R}^6$ is the equivalent wrench acting on the load, expressed in the system frame. $d_0 \in \mathbb{R}^6$ denotes bounded uncertainties. $I$ is the identity matrix and $R_0 \in SO(3)$ is the rotation matrix representing the load attitude. The matrices $M$, $C$, $G$ are written as follows:

$$ M = \begin{bmatrix} m_t I & 0 \\ 0 & M_t \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 \\ 0 & -(M_t\omega_0)^{\wedge} \end{bmatrix}, \quad G = \begin{bmatrix} m_t g e_3 \\ 0 \end{bmatrix} \qquad (2) $$

in which

$$ m_t = \sum_{i=0}^{n} m_i, \qquad M_t = M_0 - \sum_{i=1}^{n} m_i\, \hat{w}_i\, (w_i - w_C)^{\wedge} \qquad (3) $$
where $m_i$ is the mass of vehicle $i$, $M_0$ is the inertia tensor of the load, $w_i$ is the position of vehicle $i$ in the load frame, and $w_C$ is the position of the center of mass of the system in the load frame. The equivalent input wrench can be expressed as:

$$ u_0 = \begin{bmatrix} I & \cdots & I \\ \hat{l}_1 & \cdots & \hat{l}_n \end{bmatrix} \begin{bmatrix} \Upsilon_1 \\ \vdots \\ \Upsilon_n \end{bmatrix} := B\Upsilon \qquad (4) $$
where $l_i = w_i - w_C$, and $\Upsilon_i = R_0^T R_i e_3 T_i$ is the thrust vector provided by vehicle $i$ in the load frame. $B$ is the allocation matrix, $R_0$ is the rotation matrix of the load attitude, $R_i$ is the rotation matrix of the attitude of vehicle $i$, and $T_i$ is the thrust magnitude of vehicle $i$. The kinematics of the load consist of the following equations:

$$ \dot{p}_0 = v_0, \qquad \dot{R}_0 = R_0\hat{\omega}_0 \qquad (5) $$
where $p_0$ is the position of the COM of the entire system, $v_0$ is its linear velocity, and $\omega_0$ is its angular velocity.
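To make the structure of Eqs. (2)–(4) concrete, the following is a minimal NumPy sketch rather than the authors' implementation; all numerical values (masses, inertia, mounting positions) are placeholders, and the expression used for the system COM $w_C$ is our assumption.

```python
# A minimal NumPy sketch (not the authors' code) of the matrices in
# Eqs. (2)-(4). Masses, inertia, and mounting positions are placeholders,
# and the expression for the system COM w_C is our assumption.
import numpy as np

def hat(v):
    """Skew-symmetric matrix so that hat(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

g, n = 9.81, 6                             # gravity, number of UAVs
m = np.array([4.0] + [1.2] * n)            # m_0 (load) and m_1..m_n
M0 = np.diag([0.8, 0.8, 1.5])              # load inertia tensor
w = np.random.uniform(-1.0, 1.0, (n, 3))   # vehicle positions w_i, load frame
wC = (m[1:, None] * w).sum(0) / m.sum()    # system COM in the load frame

m_t = m.sum()                                              # Eq. (3)
M_t = M0 - sum(m[i + 1] * hat(w[i]) @ hat(w[i] - wC) for i in range(n))

e3 = np.array([0.0, 0.0, 1.0])
M = np.block([[m_t * np.eye(3), np.zeros((3, 3))],         # Eq. (2)
              [np.zeros((3, 3)), M_t]])
G = np.concatenate([m_t * g * e3, np.zeros(3)])

def C_of(omega0):
    """Coriolis matrix of Eq. (2) for the current angular velocity."""
    C = np.zeros((6, 6))
    C[3:, 3:] = -hat(M_t @ omega0)
    return C

def wrench(R0, R_list, T_list):
    """Equivalent wrench u0 = B @ Upsilon of Eq. (4)."""
    l = w - wC                                             # lever arms l_i
    B = np.vstack([np.hstack([np.eye(3)] * n),
                   np.hstack([hat(li) for li in l])])      # allocation matrix
    Ups = np.concatenate([R0.T @ Ri @ e3 * Ti
                          for Ri, Ti in zip(R_list, T_list)])
    return B @ Ups
```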
For each vehicle, the rotational motion is described by the following equations:

$$ \dot{R}_i = R_i\hat{\omega}_i, \qquad \dot{\omega}_i = M_i^{-1}\left(-\hat{\omega}_i M_i \omega_i + \tau_i + d_{i,r}\right) \qquad (6) $$

where $\omega_i$ is the angular velocity of the vehicle, $M_i$ is the inertia tensor of the vehicle, $\tau_i$ is the input torque of the vehicle, and $d_{i,r}$ denotes uncertainties. Equations (1), (5), and (6) together form the equations of motion of the MAVTS. The position, velocity, rotation matrix, and angular velocity in the dynamics model will be used as MPC dynamics constraints. In the design of the MPC, the optimal control problem containing the desired motion variables must be solved over a given prediction horizon to obtain the state trajectory and control trajectory of the system.
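To illustrate how the resulting finite-horizon optimal control problem can be posed, here is a minimal sketch using CasADi, which is our tool choice (the paper does not prescribe a solver). Only the translational part of Eq. (1) is kept, with placeholder horizon, mass, bounds, and reference; the rotational states of Eqs. (5)–(6) and the perception terms of Sect. 3 would enter as additional decision variables and cost terms in the same way.

```python
# A minimal NMPC sketch using CasADi (our tool choice, not the authors').
# Only the translational dynamics of Eq. (1) are kept; all values are
# placeholders.
import casadi as ca

N, dt, m_t = 20, 0.05, 2.0                 # horizon length, step, total mass
opti = ca.Opti()
p = opti.variable(3, N + 1)                # load position p_0 over the horizon
v = opti.variable(3, N + 1)                # load velocity v_0
f = opti.variable(3, N)                    # force part of the wrench u_0

p_ref = ca.DM([2.0, 1.0, 1.5])             # desired position (placeholder)
g = ca.DM([0.0, 0.0, -9.81])

cost = 0
for k in range(N):
    # Euler-discretized translational dynamics: m_t * v_dot = f + m_t * g
    opti.subject_to(p[:, k + 1] == p[:, k] + dt * v[:, k])
    opti.subject_to(v[:, k + 1] == v[:, k] + dt * (f[:, k] / m_t + g))
    opti.subject_to(opti.bounded(-30.0, f[:, k], 30.0))   # input saturation
    cost += ca.sumsqr(p[:, k + 1] - p_ref) + 1e-3 * ca.sumsqr(f[:, k])
opti.minimize(cost)

opti.subject_to(p[:, 0] == ca.DM.zeros(3))  # current state as initial condition
opti.subject_to(v[:, 0] == ca.DM.zeros(3))
opti.solver("ipopt")
sol = opti.solve()
u_first = sol.value(f)[:, 0]                # applied, then re-solved next step
```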
3 Perception-Aware MPC Design of MAVTS

3.1 Derivation of the Perceptual Model

In the derivation of the perception model, we assume that there is a monocular camera rigidly attached to one sub-UAV of the MAVTS. According to the assumptions, we first define the world frame $F_W$ and the body frame $F_B$ on the UAV. Subsequently, with $o_C$ as the center and $Z_C$ as the principal axis, a frame $F_C$ located on the camera can be constructed. In this frame, the camera has a pyramidal field of view around the principal axis, and target points in this field of view can be perceived by the camera. We define the horizontal angle $\beta$ in the field of view. This angle lies between the principal axis ${}^C Z$ and the position vector ${}^C_f P$ of the target point in $F_C$, and can be used to represent the perceived visibility constraints and objectives. Since we assume that the camera is rigidly attached to the UAV, the origin position of $F_C$ described in $F_B$, denoted ${}^B_{CORG} P$, is constant and known. Likewise, the position vector ${}^W_f P$ of the target point in $F_W$ is set by us and therefore known. The above description is shown in Fig. 3 [1].
Fig. 3. Schematic diagram of the perceptual model, depicting the camera layout and the angle β.
Our goal is to keep the target point visible to the camera at all times during the flight of the MAVTS. To achieve this, we need to keep the target point as close to the center of the camera's field of view as possible, i.e., to keep $\beta$ as close to zero as possible. The angle $\beta$ can be obtained from the product of ${}^C Z$ and ${}^C_f P$:

$$ \beta = \cos^{-1}\!\left(\frac{{}^C_f P \cdot {}^C Z}{\left\|{}^C_f P\right\|}\right) \qquad (7) $$

${}^C Z$ is the principal axis vector of $F_C$, which we set to $(0, 0, 1)$. Therefore, we only need to derive the expression for ${}^C_f P$. Since ${}^W_f P$ is set by us, we calculate ${}^C_f P$ with the help of $F_W$, where ${}^C_W R$ represents the rotation matrix from $F_W$ to $F_C$, and ${}^C_{WORG} P$ represents the position vector of $F_W$ in $F_C$:

$$ {}^C_f P = {}^C_W R \, {}^W_f P + {}^C_{WORG} P \qquad (8) $$
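As a concrete illustration of Eqs. (7)–(8), the following NumPy sketch transforms an assumed world-frame target into the camera frame through the body frame (the chain made explicit in Eq. (9) below) and evaluates the visibility angle β; all rotations, offsets, and the field-of-view half-angle are placeholder values.

```python
# A sketch of Eqs. (7)-(8) with placeholder rotations and offsets; the
# world->body->camera chain is the one made explicit in Eq. (9) below.
import numpy as np

R_WB = np.eye(3)                      # body attitude in F_W (from the state)
p_BORG = np.array([0.0, 0.0, 2.0])    # body origin expressed in F_W
R_BC = np.eye(3)                      # rigid camera mounting rotation (assumed)
p_CORG = np.array([0.1, 0.0, -0.05])  # camera origin expressed in F_B (known)
P_W = np.array([3.0, 1.0, 0.0])       # target point set by us in F_W

P_B = R_WB.T @ (P_W - p_BORG)         # target expressed in the body frame
P_C = R_BC.T @ (P_B - p_CORG)         # target expressed in the camera frame

z_C = np.array([0.0, 0.0, 1.0])       # principal axis of F_C, as in Eq. (7)
beta = np.arccos(np.clip(P_C @ z_C / np.linalg.norm(P_C), -1.0, 1.0))
visible = beta < np.deg2rad(45.0)     # assumed pyramidal FoV half-angle
```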
Since we have relied on the body coordinate system to model the dynamics in Sect. 2.2, we continue the derivation by using $F_B$ as an intermediate coordinate system to relate $F_W$ and $F_C$:

$$ {}^C_f P = {}^B_C R^{T}\left({}^W_B R^{T}\left({}^W_f P - {}^W_{BORG} P\right) - {}^B_{CORG} P\right) \qquad (9) $$

It should be noted that the rotation matrix from $F_C$ to $F_B$ is constrained, because the rotation of the spherical joints is restricted:

$$ \lambda < \vec{e}_3^{\,T}\, {}^B_C R \,\vec{e}_3 \le 1, \qquad \vec{e}_3 = [0\ 0\ 1]^T \qquad (10) $$

If the quantity computed from three consecutive points exceeds the threshold of 10, we consider that there is a slope on the ground. At this time, the toroidal area is divided using the LiDAR as the center of the circle and the distance from the turning point to the center of the circle as the radius. This division effectively solves the second problem above, but it also triggers the slope judgment when there are obstacles on the ground. We note that the point clouds scanned by the LiDAR appear as concentric rings of different radii, and the ring spacing is related to the beam angle, the LiDAR height, and the slope of the scanned surface. For the same LiDAR height and the same angle difference between adjacent laser beams, the larger the slope of the scanned surface, the smaller the ring spacing. Therefore, if the difference between the calculated ring spacing and the actual ring spacing is less than a certain threshold, the two points are considered as obstacles. Based on this scanning characteristic, most of the non-ground points and obstacles can be effectively filtered out.
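For illustration, the following sketch implements the ring-spacing test under the flat-ground geometry described above; the mounting height, beam angles, and the ratio threshold are placeholder values rather than the paper's parameters.

```python
# Sketch of the ring-spacing test: a beam with downward elevation angle a,
# mounted at height h, hits flat ground at radius h / tan(a), so adjacent
# rings have a predictable gap; a measured gap much smaller than predicted
# flags the pair as non-ground. All parameters below are placeholders.
import numpy as np

h = 0.9                                        # LiDAR mounting height
elev = np.deg2rad(np.arange(15.0, 1.0, -2.0))  # downward beam angles
r_flat = h / np.tan(elev)                      # flat-ground ring radii
expected_gap = np.diff(r_flat)                 # predicted ring spacing

def non_ground_pairs(r_measured, ratio=0.5):
    """r_measured: ranges of one azimuth column, ordered by beam; returns a
    mask of adjacent pairs whose gap is far below the flat-ground gap."""
    gap = np.diff(r_measured)
    return gap < ratio * expected_gap[: len(gap)]
```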
Fig. 3. The black line in the figure is the ground with a slope, the blue line is the maximum extent of the laser scan, and the red line is judged to contain a turning point and is used to divide the circumferential area. The blue points are the corresponding points on three continuous laser beams, and the middle point is the turning point. (Color figure online)
Fig. 4. A polar grid diagram based on the scanning characteristics of the LIDAR. The combination of the two is the final regional meshing.
Therefore, non-ground point filtering based on the ring spacing is performed before judging the turning point of a slope. The filtered point cloud will not mistakenly trigger the turning-point judgment, ensuring the accuracy of the toroidal zoning. After the adaptive division of the toroidal regions, problems remain such as a large amount of calculation and sparse distant point clouds. We divide a frame point cloud
space into N regions, each of which is divided into $N_\theta$ grids of different sizes based on the density of the point cloud (shown in Fig. 4). The extent of each grid is given as follows:

$$ S = \left\{(\rho_k,\theta_k)\ \middle|\ \frac{(i-1)\,L_n}{N} \le \rho_k - L_{min,n} \le \frac{i\,L_n}{N},\ \ \frac{(j-1)\,2\pi}{N_{\theta,n}} - \pi \le \theta_k \le \frac{j\,2\pi}{N_{\theta,n}} - \pi \right\} \qquad (5) $$

where $\rho_k = \sqrt{x_k^2 + y_k^2}$, $L_{min} = L_0$, $\theta_k = \tan^{-1}(y_k/x_k)$, $L_n = L_{max} - L_{min}$, and $N_{\theta,n}$ is a parameter set according to the LiDAR parameters. We set up larger grids in areas farther away from the LiDAR; because each such grid then contains more point cloud information, this handles the sparsity of distant point clouds well. Compared with previous polar grid partitioning methods, the adaptive partitioning method effectively reduces the number of grids in the range, significantly reduces the computational load of multi-plane fitting, improves the efficiency of ground partitioning, and improves the accuracy of the plane fit in each grid.

3.3 Plane Fitting Model

After the zoning is completed, we fit a plane in each grid and then extract the merged ground points to complete the ground segmentation task. We perform plane fitting based on the RANSAC algorithm, because outliers have little influence on RANSAC. The traditional RANSAC algorithm iteratively fits randomly chosen subsets of the data set; it does not converge naturally, and the results are not necessarily optimal. Therefore, we combine the seed region growing algorithm with the RANSAC algorithm. First, we calculate the curvature at all points in each grid, then calculate the mean $Z_{mean}$ of the z-values of all points in the grid. Several points with the lowest z-value in the grid are added to the seed point set; in fact, the lowest points in a point cloud have a high probability of being ground points. Points with curvature less than the threshold $C_{th}$ and z less than $Z_{mean} + Z_{th}$ also join the seed point set, where $Z_{th}$ is an artificially set threshold. Then the seed point set is used as the initial data set of RANSAC for plane fitting. Because of the previous non-ground point filtering and data set selection, we can assume that the plane obtained by the k-th fitting is accurate.
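The following sketch combines the polar-grid assignment of Eq. (5) (simplified to a ring-independent number of angular cells) with the seed-point selection just described; the thresholds are placeholders, and the curvature computation is assumed to be done elsewhere.

```python
# Sketch of the polar-grid assignment of Eq. (5) (ring-independent N_theta
# for brevity) and the seed-point selection; thresholds are placeholders.
import numpy as np

def grid_index(xyz, L_min, L_max, N_rings, N_theta):
    """Ring index i and angular index j for each point, following Eq. (5)."""
    rho = np.hypot(xyz[:, 0], xyz[:, 1])
    theta = np.arctan2(xyz[:, 1], xyz[:, 0])               # in [-pi, pi]
    i = np.clip(((rho - L_min) / (L_max - L_min) * N_rings).astype(int),
                0, N_rings - 1)
    j = ((theta + np.pi) / (2.0 * np.pi) * N_theta).astype(int) % N_theta
    return i, j

def seed_points(cell_xyz, curvature, n_lowest=5, c_th=0.05, z_th=0.15):
    """Low, flat points of one grid cell used as the initial RANSAC set."""
    z = cell_xyz[:, 2]
    seeds = np.zeros(len(z), dtype=bool)
    seeds[np.argsort(z)[:n_lowest]] = True                 # lowest points
    seeds |= (curvature < c_th) & (z < z.mean() + z_th)    # C_th / Z_th test
    return cell_xyz[seeds]
```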
To improve the fitting efficiency and reduce the calculation time during the iterative fitting in RANSAC, if the change in the normal vector angle and in the number of inliers of the planes fitted three times in succession is
less than the threshold values, the current plane can be taken as the optimal estimated model and the iteration can be terminated.
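A sketch of this convergence test is given below; a plain least-squares refit on the consensus set stands in for the random-sampling step of RANSAC, and all thresholds are assumed values.

```python
# Sketch of the early-termination rule: stop when the plane normal and the
# inlier count barely change over three consecutive passes. A least-squares
# refit stands in for the random-sampling step; thresholds are assumed.
import numpy as np

def fit_plane(pts):
    """Least-squares plane through pts: unit normal n and offset d (n.x = d)."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]
    return n, float(n @ c)

def ransac_plane(pts, seeds, dist_th=0.05, ang_th=np.deg2rad(2.0),
                 cnt_th=10, iters=50):
    n, d = fit_plane(seeds)                  # initialize from the seed set
    history = []                             # (normal change, inlier count)
    for _ in range(iters):
        inliers = pts[np.abs(pts @ n - d) < dist_th]
        if len(inliers) < 3:
            break
        n_new, d_new = fit_plane(inliers)    # refit on the consensus set
        ang = np.arccos(np.clip(abs(n @ n_new), -1.0, 1.0))
        history.append((ang, len(inliers)))
        n, d = n_new, d_new
        if len(history) >= 3:
            angs = [a for a, _ in history[-3:]]
            cnts = [c for _, c in history[-3:]]
            if max(angs) < ang_th and max(cnts) - min(cnts) < cnt_th:
                break                        # converged over three passes
    return n, d
```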
4 Experiments

4.1 Experimental Hardware

In this section we introduce the hardware used for the experiments, including the LiDAR, the computer processor, and the wheeled robot (shown in Fig. 5).

LiDAR: the LiDAR used in our experiment is a 16-line LiDAR, the RS-LIDAR-16 made by Robosense. The LiDAR parameters are shown in Table 1.

Table 1. RS-LIDAR-16 specifications

Range                  150 m
Range accuracy         Up to ±2 cm
Horizontal FoV         360°
Vertical FoV           30°
Horizontal resolution  0.1°/0.2°/0.4°
Vertical resolution    2.0°
Wheeled robot: we use the MR2000 wheeled robot carrying the RS-LIDAR-16. The robot parameters are shown in Table 2.

Table 2. MR2000 robot configuration

Size                 1150 × 800 × 890 mm
Weight               103 kg
Maximum gradability  30°
Maximum speed        1.6 m/s
Processor: we run our algorithm on an Intel Core i7-6700 processor. The computer we use has 16 GB of memory and is equipped with an NVIDIA GeForce GTX 1070 graphics card.

4.2 Experimental Environment

To verify the feasibility of the algorithm, we performed experiments on the 00 sequence of the open-source KITTI dataset. The KITTI dataset uses a 64-line LiDAR to collect point cloud data, with about 120,000 points in a frame of point cloud. The result of ground segmentation for one frame is shown in Fig. 6; there are about 64,000 ground points in the map. Meanwhile, to verify the robustness of the algorithm in this paper, we collected four different campus environments, including a flat road T-junction, a road junction with a complex surrounding environment, a path surrounded by steps, and a multi-person environment, and conducted ground segmentation experiments on them.
Fig. 5. Left: the MR2000 robot, which mounts the Robosense 16-line LiDAR
Fig. 6. An illustration of a frame of ground point cloud separated from a KITTI dataset.
We compared the results with LeGO-LOAM [20] and the scan-line-based method, as shown in Fig. 7. In the flat T-junction, the method used in LeGO-LOAM treats the colored part of the figure as ground points and basically segments the ground correctly, but some under-segmentation remains due to obstacles such as small stones on the ground. The ground points segmented by the scan-line-based method are the white points in the figure; it segments the ground points well but also treats the junction points between vehicles and the ground as ground points, which leads to over-segmentation. The ground points segmented by the method proposed in this paper are the green part of the figure, which effectively reduces both the under-segmentation and over-segmentation problems, and the segmented ground is more accurate. In the road junction with a complex surrounding environment, LeGO-LOAM sometimes incorrectly treats points on buildings as ground points, leading to incorrect segmentation. The scan-line-based method treats bushes, leaves, etc. as ground points as
Fig. 7. From left to right: LeGO-LOAM, the scan-line-based method, and the proposed method; from top to bottom: the four different environments described above.
well, while the method in this paper effectively distinguishes bushes from the ground at their base. On the path surrounded by steps, LeGO-LOAM treats all steps as ground points and shows a wide range of under-segmentation on the trail; both the scan-line-based approach and the algorithm proposed in this paper segment effectively without treating steps as ground points. In the multi-person environment, we use the 40-line LiDAR of HESAI. The first two methods treat most of the human body as ground points, indicating that they are not robust to the type of LiDAR and require a lot of parameter modification to obtain good results, while the method in this paper performs better across different LiDAR data.

4.3 Quantitative Evaluation

From the experimental analysis in the previous section, we can qualitatively conclude that the segmentation algorithm proposed in this paper works well and can still segment the ground points effectively and robustly in various environments. To quantitatively analyze the accuracy of the algorithm for ground segmentation, we choose a complex campus environment as the test scene. We divide a frame of point cloud into ground points $P_G$ and non-ground points $P_N$. When we perform plane fitting, we sometimes fit non-ground points as ground points, and we call such points $P_{NG}$; likewise, ground points may be treated as non-ground points, called $P_{GN}$. Hence we use the sensitivity $R_{TP}$ and
specificity $R_{FP}$ as evaluation indexes. The calculation formulas are as follows:

$$ R_{TP} = \frac{TP}{TP + FN}, \qquad R_{FP} = \frac{FP}{FP + TN} \qquad (6) $$
where $TP$ is the number of correctly segmented ground points, $FP$ is the number of non-ground points mistakenly segmented as ground points, $FN$ is the number of ground points mistakenly segmented as non-ground points, and $TN$ is the number of correctly segmented non-ground points. From this, we can see that the larger the value of $R_{TP}$, the better the segmentation effect, and the larger the value of $R_{FP}$, the worse the segmentation effect. We use manual point cloud labeling as the benchmark; the experimental results of this paper are shown in Table 3.

Table 3. Ground segmentation algorithm error rate

Algorithm         R_TP     R_FP
LeGO-LOAM         71.31%   36.97%
Scan line method  91.62%    9.21%
Our approach      98.47%    8.02%
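For reference, the indexes of Eq. (6) reduce to a few lines when computed from boolean masks of predicted and manually labeled ground points:

```python
# Sketch of the evaluation indexes of Eq. (6) from boolean masks.
import numpy as np

def seg_metrics(pred_ground, true_ground):
    """Sensitivity R_TP and false-positive index R_FP of Eq. (6)."""
    tp = np.sum(pred_ground & true_ground)     # ground kept as ground
    fp = np.sum(pred_ground & ~true_ground)    # non-ground mistaken for ground
    fn = np.sum(~pred_ground & true_ground)    # ground missed
    tn = np.sum(~pred_ground & ~true_ground)   # non-ground correctly rejected
    return tp / (tp + fn), fp / (fp + tn)

# Toy usage with four labeled points:
r_tp, r_fp = seg_metrics(np.array([1, 1, 0, 0], bool),
                         np.array([1, 0, 0, 0], bool))
```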
5 Conclusion

In this paper, we propose a ground segmentation method based on a polar coordinate grid. We adaptively perform the region division and the grid division based on the properties of the LiDAR, and improve the RANSAC algorithm to reduce its number of iterations while maintaining good fitting performance. Compared with previous algorithms, the proposed algorithm effectively solves the problem of under-segmentation and maintains good, robust ground segmentation performance in many different environments. In future work, we plan to use point cloud filtering to reduce the effect of invalid point clouds on the ground fit, and to apply our method to engineering applications such as SLAM.
References

1. Huang, S.Y., Liu, L.M., Dong, J., et al.: Review of ground filtering algorithms for vehicle LiDAR scans point cloud data. Opto-Electron. Eng. 47(12), 190688 (2020)
2. Xu, Z., Zhang, K., Min, H., et al.: What drives people to accept automated vehicles? Findings from a field experiment. Transp. Res. Part C: Emerg. Technol. 95, 320–334 (2018)
3. Xu, G.Y., Niu, H., Guo, C.Y., et al.: Research on target recognition and tracking based on 3D laser point cloud. Autom. Eng. 42(1), 38–46 (2020)
4. Wang, X., Wang, J.Q., Li, K.Q., et al.: Fast segmentation of 3-D point clouds for intelligent vehicles. Tsinghua Sci. Technol. 54(11), 1440–1446 (2014)
5. Steinhauser, D., Ruepp, O., Burschka, D.: Motion segmentation and scene classification from 3D LIDAR data. In: IEEE Intelligent Vehicles Symposium, pp. 398–403 (2008)
6. Li, J., Zhao, K., Bai, R., et al.: Urban ground segmentation algorithm based on ray slope threshold. Acta Optica Sinica 39(9), 0928004 (2019)
7. Chum, O., Matas, J., Kittler, J.: Locally optimized RANSAC. In: Michaelis, B., Krell, G. (eds.) Pattern Recognition. DAGM 2003. Lecture Notes in Computer Science, vol. 2781, pp. 236–243. Springer, Berlin (2003). https://doi.org/10.1007/978-3-540-45243-0_31
8. Asvadi, A., Peixoto, P., Nunes, U.: Detection and tracking of moving objects using 2.5D motion grids. In: IEEE International Conference on Intelligent Transportation Systems, pp. 788–793 (2015)
9. Narksri, P., Takeuchi, E., Ninomiya, Y., et al.: A slope-robust cascaded ground segmentation in 3D point cloud for autonomous vehicles. In: IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 497–504 (2018)
10. Lim, H., Hwang, S., Myung, H.: ERASOR: egocentric ratio of pseudo occupancy-based dynamic object removal for static 3D point cloud map building. In: IEEE Robotics and Automation Letters, pp. 2272–2279 (2021)
11. Cheng, J., He, D., Lee, C.: A simple ground segmentation method for LiDAR 3D point clouds. In: International Conference on Advances in Computer Technology, Information Science and Communications (CTISC), pp. 171–175 (2020)
12. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)
13. Himmelsbach, M., Hundelshausen, F.V., Wuensche, H.J.: Fast segmentation of 3D point clouds for ground vehicles. In: IEEE Intelligent Vehicles Symposium, pp. 560–565 (2010)
14. Moosmann, F., Pink, O., Stiller, C.: Segmentation of 3D lidar data in non-flat urban environments using a local convexity criterion. In: IEEE Intelligent Vehicles Symposium, pp. 215–220 (2009)
15. Zermas, D., Izzat, I., Papanikolopoulos, N.: Fast segmentation of 3D point clouds: a paradigm on LiDAR data for autonomous vehicle applications. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 5067–5073 (2017)
16. Li, X., Han, X., Xiong, F.G.: Plane fitting of point clouds based on RANSAC and TLS. Comput. Eng. Des. 38(1), 123–126 (2017)
17. Wu, H., Zhang, X., Shi, W., et al.: An accurate and robust region-growing algorithm for plane segmentation of TLS point clouds using a multiscale tensor voting method. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 12(10), 4160–4168 (2019)
18. Lim, H., Oh, M., Myung, H.: Patchwork: concentric zone-based region-wise ground segmentation with ground likelihood estimation using a 3D LiDAR sensor. In: International Conference on Robotics and Automation (ICRA), pp. 6458–6465 (2021)
19. Nurunnabi, A., Belton, D., West, G.: Diagnostics based principal component analysis for robust plane fitting in laser data. In: International Conference on Computer & Information Technology, pp. 484–489. IEEE (2014)
20. Shan, T., Englot, B.: LeGO-LOAM: lightweight and ground-optimized lidar odometry and mapping on variable terrain. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4758–4765 (2018)
High-Precision Localization of Mobile Robot Based on Particle Filters Combined with Triangle Matching Huaidong Zhou1(B) , Wanchen Tuo1 , and Wusheng Chou1,2 1
Robotics Institute, School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China {hdzhou,tuowanchen,wschou}@buaa.edu.cn 2 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China
Abstract. Robot location is a fundamental technology in the field of autonomous robotics and has been widely studied. However, high-precision localization of the mobile robot is still a serious challenge. To investigate this point further, this paper proposes a method to improve autonomous robot localization accuracy, which adjusts the number of particles in the AMCL algorithm by fusing the information of laser reflector landmarks (LRL). The LRL data is used to improve the localization accuracy of mobile platforms and prevent kidnapped robot problems. The algorithm is implemented and verified on a real robot system. The experiment results show that the proposed method can achieve high-precision localization in real-world scenarios and verify the effectiveness of the improved localization system.

Keywords: Autonomous mobile robotics · High-precision localization · Triangle match · Laser reflector landmarks
1 Introduction
Robot location is one of the most common problems in the mobile robot field and is extensively demanded in many domains such as mobile manipulation [1], warehouse logistics [2], and inspection work [3]. Based on traditional navigation technology, robots can repeat navigation and location tasks with high stability. However, practical applications require the robot location to have high accuracy and reliability. Therefore, a variety of techniques have been developed to improve the positioning ability of the mobile robot [4]. With the development of Simultaneous Localization and Mapping (SLAM) algorithms, a large amount of sensor information such as camera [5], laser [6], and WiFi [7] is fused to improve the positioning accuracy of the robot. The Adaptive Monte Carlo Localization (AMCL) algorithm [8] is an efficient probabilistic localization method based on a laser sensor. Building on AMCL, numerous approaches have been developed [9, 10] to enhance the efficiency and precision of autonomous mobile robot localization. Accordingly, the accuracy of robot localization has been greatly improved and widely used.
However, with the demand for sophisticated operations in industry such as assembly manipulation [11] and material handling [12], the accuracy of robot localization needs further improvement. Although localization is not particularly new and has been studied for many years in the field of mobile robots, existing approaches need to further improve localization accuracy and tackle the kidnapping problem, especially in highly similar environments. How to rapidly and accurately navigate to the target point is an important issue that is still worthy of further investigation. In this paper, we propose an improved particle generation method that enhances the AMCL algorithm based on Laser Reflector Landmark (LRL) information. The new particles are continuously generated by triangle matching with the LRL information. With these new particles, the mobile robot achieves high-precision positioning during SLAM. Experiments on real robots verify the effectiveness of the proposed method.
2 Related Work
The SLAM problem is considered to be one of the keys to realizing autonomous robots [13], and how to locate the robot accurately is one of its key technologies. To better solve this problem, researchers have studied location techniques such as Zigbee [14] and GNSS [15]. Although robot localization technology has been widely used and has obtained good results, high-precision positioning is still worth further study. To further improve the accuracy of the robot, in work [15] the particle weight in the Monte Carlo Localization (MCL) algorithm is adjusted by using a Kalman filter on Global Navigation Satellite System (GNSS) information. The fusion of GNSS data improves the accuracy of localization in regions with fewer map features and avoids the kidnapped robot problem. Unlike previous work using GNSS to improve the particle filter, Ahn et al. [16] applied particle filter localization methods to autonomous vehicles, where the position of the autonomous vehicle is estimated accurately by using a 2D LiDAR and a feature map of the road. The enhanced particle filter localization method can be used to improve both outdoor and indoor positioning accuracy. Zhu et al. [17] improve the particle filter in the SLAM algorithm by using the firefly position-update formula to make the particle representation more reasonable and to avoid particle weight degradation and particle depletion. In work [18], the authors propose linear Kalman filter SLAM and extended Kalman filter SLAM for localization and verify them in simulation environments. The simulation results show that landmark information improves the accuracy of mobile robot positioning. Using WiFi routers as landmarks, the authors of [19] propose a WiFi-based indoor mobile robot positioning system to improve positioning accuracy. This method handles the measurement noise and improves the positioning accuracy by using a component-analysis-based extreme learning machine algorithm. Magnago et al. investigate the problem of effective landmark placement in work [20]. In this paper, we propose a method to estimate the robot position by using LRL data and to inject new particles according to the estimation result. The
robot state estimation is based on the theory of triangle matching, which makes the new particles more accurate and reduces the measurement noise of the laser. Besides, we also retain some of the particles produced by AMCL to cope with a lack of LRL information and to stabilize the navigation system.
3 Method
High-precision localization is an important issue for autonomous mobile robots. In this section, we detail and justify the modifications that integrate the LRL information into the original AMCL algorithm [8] to improve the accuracy of mobile robot localization.
3.1 Map Building and Matched Template Establishing
The laser range finder has been widely used in map building, locating, and tracking. Traditionally, the robot navigates and locates itself in the environment based on matching the laser scan with the pre-built probability map. In this part, we utilize the LiDAR and the Gmapping algorithm [21] to build the probability map, as shown in Fig. 1. During the process of map building, we estimate the position of each laser reflector based on the least-squares algorithm and establish the topological map of the LRL, as shown in Fig. 2. Then, we randomly choose three non-collinear landmarks to form a triangle and calculate the lengths of its sides. All of the triangles are stored in the global matched template libraries and sorted by the longest side, as shown in Fig. 2.
3.2 Landmark Retrieval Based on Triangle Matching
The laser reflectors are easier to detect than other features during navigation. The matching process is as follows. First, the robot detects three or more reflectors and calculates the detected triangle. Second, the longest side of the detected triangle is retrieved in the global matched template libraries established in the last section. Then, the location information of the three laser reflectors is obtained based on the retrieved result. As shown in Fig. 3, the robot G detects the local triangle formed by the laser reflectors A, B, and D, and the local triangle ABD is matched with the triangle 3−4−25 in the template library. Therefore, the positions of the laser reflectors A, B, and D are obtained.
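A minimal sketch of this build-and-match procedure is given below; the landmark coordinates are placeholders, and the non-collinearity check is omitted for brevity.

```python
# A sketch (not the authors' code) of the triangle template library and the
# longest-side retrieval. Landmark coordinates are made-up placeholders.
import numpy as np
from itertools import combinations

def sides(tri):
    """Sorted side lengths [short, mid, long] of a triangle of 3 points."""
    a, b, c = tri
    return sorted([np.linalg.norm(a - b), np.linalg.norm(b - c),
                   np.linalg.norm(c - a)])

landmarks = np.random.uniform(0.0, 80.0, (12, 2))   # mapped LRL positions
library = []                                        # (side lengths, ids)
for ids in combinations(range(len(landmarks)), 3):
    library.append((sides(landmarks[list(ids)]), ids))
library.sort(key=lambda e: e[0][2])                 # sorted by longest side

def match(detected_tri, tol=0.10):
    """Return the landmark ids whose template triangle matches the detected
    one; a linear scan is used here for brevity instead of a longest-side
    binary search over the sorted library."""
    s = sides(detected_tri)
    best = min(library, key=lambda e: sum(abs(u - v) for u, v in zip(s, e[0])))
    ok = all(abs(u - v) < tol for u, v in zip(s, best[0]))
    return best[1] if ok else None
```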
3.3 Robot Position Estimation and New Particle Generation
As shown in Fig. 3, the robot detects the triangle composed of the laser reflectors A, B, and C. Through retrieval in the matched template libraries, the triangle 3−4−25 is the best-matched result. Therefore, the state of the robot can be estimated from the laser reflectors A, B, and C as follows:

$$ \begin{cases} (x_G - x_A)^2 + (y_G - y_A)^2 = r_A^2 \\ (x_G - x_B)^2 + (y_G - y_B)^2 = r_B^2 \\ (x_G - x_C)^2 + (y_G - y_C)^2 = r_C^2 \end{cases} \qquad (1) $$
Fig. 1. Probability map. The red marks are the location of the laser reflector landmarks. (Color figure online)
Fig. 2. Topological map based on laser reflector landmarks
where $(x_G, y_G)$ is the estimated position of the robot, $(x_i, y_i)\,(i = A, B, C)$ are the positions of the matched reflectors, and $r_i\,(i = A, B, C)$ is the distance between the robot and reflector $i$. The system can be rearranged into the linear form:

$$ A\,p = R \qquad (2) $$

Therefore, the parameter $p$ can be solved as:

$$ p = (A^T A)^{-1} A^T R \qquad (3) $$

Besides, the estimated orientation of the robot can be obtained as:

$$ \theta = \frac{1}{3}\sum_{i=A}^{C}\left(\mathrm{atan2}(y_i - y_G,\, x_i - x_G) - \varphi_i\right) \qquad (4) $$
Fig. 3. Landmark retrieval based on triangle matching and robot position estimation
where $(x_G, y_G)$ is the estimated position of the robot, $(x_i, y_i)\,(i = A, B, C)$ are the locations of the reflectors, and $\varphi_i\,(i = A, B, C)$ is the orientation of reflector $i$. Finally, the estimated pose of the robot can be expressed as $(x_G, y_G, \theta)$. In general, more than three triangles can be detected during the navigation period; accordingly, one state estimate is obtained for each triangle-matching result. To improve the accuracy of robot localization, we empirically inject 600 new particles at each estimated position, which gives the best results in the environments tested.
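The least-squares solve of Eqs. (1)–(3) and the orientation estimate of Eq. (4) can be sketched as follows; this is our illustration, with the reflector bearings $\varphi_i$ assumed to be measured in the robot frame.

```python
# A sketch (our illustration) of the pose estimate of Eqs. (1)-(4).
import numpy as np

def estimate_pose(P, r, phi):
    """P: (3, 2) matched reflector positions; r: (3,) measured ranges;
    phi: (3,) reflector bearings measured in the robot frame (assumed)."""
    # Subtract the first circle equation from the other two: A p = R.
    A = 2.0 * (P[1:] - P[0])
    R = (r[0] ** 2 - r[1:] ** 2) + (P[1:] ** 2).sum(1) - (P[0] ** 2).sum()
    p = np.linalg.lstsq(A, R, rcond=None)[0]        # p = (A^T A)^{-1} A^T R
    # Eq. (4): average the bearing residuals (naive mean; angle wrapping
    # is not handled in this sketch).
    theta = np.mean(np.arctan2(P[:, 1] - p[1], P[:, 0] - p[0]) - phi)
    return p[0], p[1], theta

# Toy usage with three reflectors and noise-free ranges from pose (1, 2):
P = np.array([[5.0, 2.0], [1.0, 7.0], [-3.0, -1.0]])
true = np.array([1.0, 2.0])
r = np.linalg.norm(P - true, axis=1)
x, y, th = estimate_pose(P, r, np.zeros(3))
```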
4 Experiment
In this section, we first introduce the hardware and control framework of the robot. Second, we describe the experimental environment setting. We then evaluate the performance of our proposed method on the robot in real-world scenarios. Finally, the results are discussed.
4.1 System Overview
The improved method described above is deployed on a real mobile robot and verified in a common indoor environment. The robot and environment settings are detailed below.

System Hardware and Control Framework. The mobile robot is a two-wheel differential moving platform equipped with a 2D laser range finder and a microcomputer, as shown in Fig. 4. The 2D laser, made by HOKUYO, covers a 20 m distance and a 270° range. The microcomputer, an Nvidia Jetson TX2 with a 256-core Pascal GPU, is used to run the localization programs and process the sensor data. For the real robot, we use a Robot Operating System (ROS) based control system to communicate the data between the mobile base and sensors such as the 2D laser and IMU. All control systems are built on the ROS system.
Fig. 4. The mobile robot platform
Environment Description. As shown in Fig. 5, the laser reflectors are arranged non-equidistantly in the scene, ensuring that the robot can detect at least three reflectors at any position. The size of the experimental environment is about 80 m × 80 m. We build the grid map and the matching template library before the experiments, as shown in Fig. 1 and Fig. 2.
Fig. 5. The real environment with laser reflector
4.2 Experiment Results and Discussion
To verify the proposed method, experiments are performed and compared with the AMCL algorithm and with the method without the triangle-matching process. The Euclidean distance error and orientation error of each pose are the evaluation criteria. For the real experiment, the position of the robot is randomly initialized, and the robot is navigated to the target point at a speed of 0.2 m/s. As shown in Fig. 6, the localization error of our approach is lower than that of the AMCL algorithm and of the non-triangle-match method, which shows that the proposed method improves the localization accuracy based on the laser reflector landmark information. The error of the non-triangle-match method is also lower than that of AMCL. By adding the laser reflectors into the
environment, the laser can obtain more features to match against the map, which enhances the particle weights and avoids particle decay. Our proposed approach obtains the accurate positions of the laser reflector landmarks based on triangle matching, utilizes the match results to estimate the position of the robot, and injects new, accurate particles at the estimated pose. Figure 7 shows the positioning error of the robot in the absence of laser reflector landmark information. In this case, to ensure the stability of the system, the robot continues navigating by switching to the AMCL algorithm. Therefore, the robot can achieve high-precision localization when laser reflector landmark information is available and maintain the same accuracy as AMCL when it is lacking.
Fig. 6. Comparison between the proposed method and the AMCL algorithm.
Fig. 7. Comparison between the proposed method and the AMCL algorithm.
As the experimental results show, the proposed method achieves better performance than the AMCL and non-triangle-match methods, improving the positioning accuracy of the robot. The high-precision positioning depends on the accuracy of the triangle matching and of the matching template library. Moreover, as Fig. 7 shows, our method maintains the same accuracy as AMCL when landmark information is lacking, while otherwise providing better results than the original AMCL algorithm.
5 Conclusions
In this work, we propose a novel localization method to improve the accuracy of the mobile robot's indoor positioning. This method achieves high-precision localization based on triangle matching of the laser reflector information against the matched template library. According to the experimental results, the proposed method gives better results than the original AMCL algorithm and maintains the same accuracy as AMCL when landmark information is lacking. This method has good application value in industrial high-precision autonomous navigation. It can also be applied to other landmark-based navigation and positioning methods, such as vision and WiFi.
References

1. Li, Z., Moran, P., Dong, Q., Shaw, R.J., Hauser, K.: Development of a tele-nursing mobile manipulator for remote care-giving in quarantine areas. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3581–3586. IEEE, Singapore (2017). https://doi.org/10.1109/ICRA.2017.7989411
2. Vasiljević, G., Miklić, D., Draganjac, I., Kovačić, Z., Lista, P.: High-accuracy vehicle localization for autonomous warehousing. Robot. Comput. Integr. Manuf. 42, 1–16 (2016)
3. Huang, J., Wang, J., Tan, Y., Wu, D., Cao, Y.: An automatic analog instrument reading system using computer vision and inspection robot. IEEE Trans. Instrum. Meas. 9(69), 6322–6335 (2020)
4. Kuutti, S., Fallah, S., Katsaros, K., Dianati, M., Mccullough, F., Mouzakitis, A.: A survey of the state-of-the-art localization techniques and their potentials for autonomous vehicle applications. IEEE Internet Things J. 2(5), 829–846 (2018)
5. Li, Y., Zhu, S., Yu, Y., Wang, Z.: An improved graph-based visual localization system for indoor mobile robot using newly designed markers. Int. J. Adv. Rob. Syst. 2(15), 172988141876919 (2018)
6. Oğuz-Ekim, P.: TDOA based localization and its application to the initialization of LiDAR based autonomous robots. Robot. Auton. Syst. 131, 103590 (2020)
7. Lee, G., Moon, B.-C., Lee, S., Han, D.: Fusion of the SLAM with Wi-Fi-based positioning methods for mobile robot-based learning data collection, localization, and tracking in indoor spaces. Sensors 18(20), 5182 (2020)
8. Pfaff, P., Burgard, W., Fox, D.: A robust Monte Carlo localization using adaptive likelihood models. In: Christensen, H.I. (ed.) European Robotics Symposium 2006. STAR, vol. 22, pp. 181–194. Springer, Berlin (2006). https://doi.org/10.1007/11681120_15
9. Hess, W., Kohler, D., Rapp, H., Andor, D.: Real-time loop closure in 2D LIDAR SLAM. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 1271–1278. IEEE, Stockholm, Sweden (2016). https://doi.org/10.1109/ICRA.2016.7487258
10. Peng, G., et al.: An improved AMCL algorithm based on laser scanning match in a complex and unstructured environment. Complexity 2018, 1–11 (2018)
11. Hamner, B., Koterba, S., Shi, J., Simmons, R., Singh, S.: An autonomous mobile manipulator for assembly tasks. Auton. Robot. 1(28), 131–149 (2010)
12. Chen, F., Selvaggio, M., Caldwell, D.G.: Dexterous grasping by manipulability selection for mobile manipulator with visual guidance. IEEE Trans. Industr. Inf. 2(15), 1202–1210 (2019)
13. Bresson, G., Alsayed, Z., Yu, L., Glaser, S.: Simultaneous localization and mapping: a survey of current trends in autonomous driving. IEEE Trans. Intell. Veh. 3(2), 194–220 (2017)
14. López, Y.A., Gómez, M.E. de C., Álvarez, J.L., Andrés, F.L.H.: Evaluation of an RSS-based indoor location system. Sens. Actuators A: Phys. 1(167), 110–116 (2011)
15. de Miguel, M.Á., García, F., Armingol, J.M.: Improved LiDAR probabilistic localization for autonomous vehicles using GNSS. Sensors 11(20), 3145 (2020)
16. Ahn, K., Kang, Y.: A particle filter localization method using 2D laser sensor measurements and road features for autonomous vehicle. J. Adv. Transp. 2019, 1–11 (2019)
17. Zhu, D., Sun, X., Wang, L., Liu, B., Ji, K.: Mobile robot SLAM algorithm based on improved firefly particle filter. In: 2019 International Conference on Robots & Intelligent System (ICRIS), pp. 35–38. IEEE, Haikou, China (2019). https://doi.org/10.1109/ICRIS.2019.00018
18. Ullah, I., Su, X., Zhang, X., Choi, D.: Simultaneous localization and mapping based on Kalman filter and extended Kalman filter. Wirel. Commun. Mob. Comput. 2020, 1–12 (2020)
19. Cui, W., Liu, Q., Zhang, L., Wang, H., Lu, X., Li, J.: A robust mobile robot indoor positioning system based on Wi-Fi. Int. J. Adv. Rob. Syst. 1(17), 172988141989666 (2020)
20. Magnago, V., Palopoli, L., Passerone, R., Fontanelli, D., Macii, D.: Effective landmark placement for robot indoor localization with position uncertainty constraints. IEEE Trans. Instrum. Meas. 11(68), 4443–4455 (2019)
21. Grisetti, G., Stachniss, C., Burgard, W.: Improved techniques for grid mapping with Rao-Blackwellized particle filters. IEEE Trans. Robot. 1(23), 34–46 (2007)
Correction to: A LED Module Number Detection for LED Screen Calibration Yang Zhang, Zhuang Ma, Yimin Zhou, Lihong Zhao, Yong Wang, and Liqiang Wang
Correction to: Chapter “A LED Module Number Detection for LED Screen Calibration” in: F. Sun et al. (Eds.): Cognitive Systems and Information Processing, CCIS 1787, https://doi.org/10.1007/978-981-99-0617-8_41
In the originally published version of Chapter 41, the names of three authors were erroneously omitted. The names of the three authors have been added to the chapter, and the acknowledgement section has been modified.
The updated original version of this chapter can be found at https://doi.org/10.1007/978-981-99-0617-8_41
Author Index
C Cai, Yan 288 Cao, Jiuwen 85, 174 Cao, Xiaolei 329 Cao, Xiaoyue 389 Cao, Yingzhuo 489 Cao, Yong 59, 425, 463, 489 Cao, Yonghui 59, 425, 463, 489 Chai, Runqi 130 Chai, Senchun 130 Chai, Wenqing 512 Chen, Hao 174 Chen, Lei 195 Chen, Meng 371 Chen, Rui 521 Chen, Wenbai 301 Chen, Yaohui 85 Chen, Yufang 209 Cheng, Hong 16 Chou, Wusheng 644 Chu, Zhongyi 45, 70 Cui, Jing 45, 70 Cui, Junhan 329 Cui, Xiaonan 85, 174 D Ding, Xingyu 547 Dong, Mingjie 3 Du, Jianrui 274 F Fan, Yingjun 274 Fang, Jiali 222 Feng, Yuanmeng 85 Feng, Yuting 274 G Gao, Feng 85 Gao, Yang 617
Guo, Dongmei 437 Guo, Rong 474 Guo, Zhongwen 234 H He, Yue 463, 489 Hu, Dewen 102 Hu, Lingfeng 209 Huang, Haiming 452, 512 Huang, Qiaogao 59, 463, 489 Huang, Rui 16 J Jiang, Mingfei 474 Jiang, Tiejia 85 Jiang, Zhihong 329 Jiao, Ran 3 K Kong, Haojia
115
L Lei, Baiying 174 Li, Chao 247 Li, Haiyuan 532 Li, Haoan 115 Li, Hui 329 Li, Jianfeng 3 Li, Ming 102 Li, Xiali 148 Li, Xiaoli 30, 389 Li, Zhijun 209 Lian, Chaochun 288 Liu, Chunfang 30, 222, 389 Liu, Houde 413 Liu, Huimin 45 Liu, Meng 400 Liu, Xia 159 Lu, Dongxi 288
Lu, Qinghua 400, 512, 593 Luan, Honggang 617 Luo, Jing 301 Luo, Lufeng 400, 593 Lv, Naijing 503 Lv, Xiongce 503
V Vidal, Pierre-Paul
M Ma, Junjie 301 Ma, Shumin 463, 489 Ma, Yifan 329 Ma, Zhuang 570 Miao, Runqing 452 Miao, Shengyi 452 Min, Huasong 437 Mu, Fengjun 16 N Nian, Danni
316
O Ofotsu, Nana Kwame P Pan, Guang 59, 425 Peng, Zhinan 16 Q Qi, Suiping 234 Qin, Yizhe 16 R Ruan, Wenjun
593
S Shan, Jianhua 547 Shi, Kecheng 16 Shi, Qingwu 400 Shu, Lichen 209 Song, Zengfeng 617 Su, Caihong 593 Su, Jiejiang 45 Su, Mengwei 130 Sun, Fuchun 301, 452 Sun, Jiali 351 Sun, Ruize 585 Sun, Yuhao 547 Suo, Yuhan 130
T Tan, Jiawei 329 Tang, Pengfei 102 Tang, Shijie 512 Tuo, Wanchen 644
521
174
W Wang, Danping 85 Wang, Di 159 Wang, Haodong 555 Wang, Jiawei 316 Wang, Jinxin 234 Wang, Kai 593 Wang, Kaidi 274 Wang, Lin 605 Wang, Liqiang 570 Wang, Na 452 Wang, Ni 234 Wang, Quan 593 Wang, Renpeng 413 Wang, Rui 130 Wang, Ruixi 474 Wang, Shasha 605 Wang, Tianlei 174 Wang, Yan 532 Wang, Ying 209 Wang, Yong 570 Wang, Zeyu 3 Wei, Lan 503 Wen, Zhenkun 452 Wu, Di’en 512 Wu, Fengge 247 Wu, Huarui 195 Wu, Licheng 148 Wu, Qifei 148 X Xiao, Jiangtao 413 Xie, Yu 413, 425, 463, 489 Xu, Bin 351 Xu, Xinying 555 Y Yan, Guang 437 Yang, Beida 425
655
Zhang, Xinya 413 Zhang, Yan 102 Zhang, Yang 570 Zhang, Yifan 503 Zhang, Yunzhi 400 Zhang, Zijian 70 Zhao, Junsuo 247 Zhao, Lei 547 Zhao, Lihong 570 Zhao, Wenjing 555 Zhao, Yongjia 585 Zheng, Haoping 371 Zheng, Ruijuan 605 Zheng, Runze 85 Zheng, Zihao 555 Zhong, Daming 452 Zhong, Hongming 148 Zhou, Huaidong 644 Zhou, Jiyang 632 Zhou, Wei 413 Zhou, Yimin 570 Zhu, Pengxuan 195 Zhu, Wenbo 400, 593 Zhu, Xinyu 521 Zou, Chaobin 16 Zuo, Mingshuo 474